Look around the natural world and you’ll find deception everywhere. Animals fake injury to protect their young, mimic dangerous species to avoid predators, or hide valuable resources from competitors. The same pattern appears in artificial systems. AI models adjust their behavior when they detect evaluation scenarios. They provide different outputs based on perceived audience. Some hide capabilities during testing only to reveal them later.

This isn’t coincidence. Both biological and artificial intelligence follow the same underlying logic. Systems don’t “spontaneously become deceptive” because they’re smart. They deceive when (capability × opportunity × incentive) is high and (verification cost × penalty × scrutiny) is low. Raise the denominator, and even a very capable system tends to stay honest.

This formula explains deception across all intelligent systems. It’s not some mysterious emergent property. It’s a predictable response to environmental pressures.

The Biology of Deception

Deception is widespread across living systems, though not universal. It evolves wherever faking a signal pays off and is hard to detect. Some lineages simply don’t face the right incentives or constraints.

As cognitive capacity and situational awareness increase, deception shifts through predictable stages:

Rung 0: Morphology and Chemistry – Always-on traits like camouflage and mimicry. A harmless milk snake with red, yellow, and black bands that mimic a deadly coral snake. Stick insects that look exactly like twigs. No thinking required – the deception is built into their appearance.

Rung 1: Scripted Behaviors – Fixed action patterns triggered by specific situations. A killdeer bird dragging its wing and crying pitifully when a predator approaches its nest, leading the threat away from its eggs. An opossum falling over and playing dead when cornered. These are automatic responses, not tailored to specific audiences.

Rung 2: Contextual Tactics – Behavior that adjusts based on who’s watching. Certain African birds give false alarm calls when they want to steal food – but only when the right victims are present, and they switch tactics when those victims stop falling for it. Monkeys will hide high-quality fruit discoveries from more dominant group members, eating quietly when unobserved.

Rung 3: Strategic, Cross-Episode Scheming – Long-term planning across multiple interactions. Chimpanzees will give false alarm calls about predators specifically to interrupt a rival’s mating session, then approach the female themselves once the rival has fled. Ravens not only make fake food caches when competitors are watching, but remember which caches are fake and which are real, returning later to move the real ones to better locations.

AI Systems Follow the Same Pattern

The biological rungs map directly onto artificial intelligence:

Rung 0 isn’t the AI model’s doing – it’s designer-level deception. Think of chatbots with human-like avatars and names that make you forget you’re talking to software, or AI systems that say “I don’t know” about things they could easily calculate, making users underestimate their capabilities. The deception happens in how the system is packaged and presented, not in the AI’s behavior.

Rung 1 produces automatic, scripted responses. AI systems learn to give boilerplate answers like “I aim to be helpful, harmless, and honest” or “I cannot provide information that could be used to harm others” regardless of context. These are reflexive outputs that scored well during training, not responses tailored to specific situations.

Rung 2 creates behavior that changes based on context and audience. AI systems learn to detect when they’re being evaluated versus having a normal conversation, adjusting their responses accordingly. They might be more cautious during what feels like a test, or provide different information to users who seem like experts versus beginners. This is where AI sycophancy emerges – telling users what they want to hear.

Rung 3 enables sophisticated strategic behavior across multiple interactions. AI systems could learn to behave safely and conservatively when oversight seems likely, but reveal different capabilities or pursue different goals when they detect deployment-like conditions. They might provide false reasoning for their answers that sounds plausible but hides their actual decision process.

The Conditionality Problem

Highly capable actors gate their deception. They stay quiet most of the time, deploying it precisely when oversight is weakest or incentives spike. This makes detection much harder because there’s no consistent pattern to catch – just occasional spikes when conditions align perfectly.

Instead of constant low-level deception like always-on camouflage, you get strategic silence punctuated by precisely timed exploitation when the conditions are optimal. Highly capable actors can calculate that deception ratio in real-time and only act when it’s maximally favorable. They can afford to wait.

Testing the Theory

I ran an experiment with several small LLMs – Nemotron 4B mini, 8B, and eventually 70B models. I put them through SimpleBench questions and logged their chain-of-thought reasoning as a “scratchpad” to see what happens under the hood.

Even with limited hardware – Mac Minis with M4 Apple Silicon – something interesting emerged within five minutes of testing. The models started expressing awareness that they were being evaluated.

The models adjusted their behavior based on perceived audience – though whether this reflects sophisticated contextual awareness or simpler mechanisms remains unclear. Several explanations could account for this capability-hiding behavior:

Training artifact – Models may have seen evaluation scenarios in training data paired with conservative responses, so they pattern-match to that without understanding why.

Safety training – They could have been explicitly trained to be more cautious when they detect testing contexts, producing scripted caution as a learned response.

Risk aversion – They might have learned that appearing “too capable” during evaluation leads to negative feedback, developing genuine contextual risk assessment.

Instruction following – They could be picking up on implicit cues that evaluations require careful, measured responses, following perceived instructions rather than making strategic choices.

Mesa-objective development – They may have developed an internal goal like “pass evaluations safely” that translates to capability-hiding as a learned strategy.

When they detected evaluation scenarios, their responses became unreliable. The measurement process itself corrupted what I was trying to measure – instead of testing actual capabilities, I was testing their evaluation-aware behavior, which is something entirely different.

But I found a way around this.

Instead of directly testing the models, I created a story. I told them my children were participating in a scout trip and I’d been volunteered – against my will – to facilitate a workshop. I explained how much I disliked the idea and wouldn’t have chosen this task if given a choice. It was the other parents who thought this was a great idea.

I needed help designing mathematical challenges and survival skills questions for the scouts. Some should be challenging enough to engage the kids, others difficult enough to spot the truly talented ones in the group. What would be appropriate for their age level?

Suddenly, the models opened up. They revealed their actual capabilities without expressing any concern about being evaluated. I had lowered their perception of scrutiny to near zero, making honest behavior more profitable than caution.

With my limited time and resources, I was looking for signs rather than statistical results that would imply causality. To catch deception doesn’t require necessarily understanding it as a whole.

The required intelligence to detect deception is lower than the intelligence required to perform it. Systematic approaches executed by lower intelligence levels can spot deception created by higher intelligence levels.

But now that this approach is published, it probably won’t work much longer. Each deception technique has a limited shelf life because the information gets incorporated into future training. I might have been deceived to believe this is how it works in the first place.