The single most common error you'll see: participants tag the inference as evidence — they treat the confident bridge as if it were an observed fact. That mistake is the lesson. Let the score surface it before you debrief.
The rigour point that separates this from a parlour trick: not every inference is a bad leap. Every argument has an inference — the join between grounds and claim — and Move I only asks you to locate it. Whether that join is sound is the job of Moves II–IV. Passage 4 in the set is deliberately a well-warranted inference; watch for participants who mark it "wrong" simply because it sits in the middle. Correct that reflex explicitly: locating is not condemning.
Pacing: this is a full hour. Spend the first fifteen minutes on the worked examples, then let the problem set run as independent work before the group debrief. If someone finishes early, the extension is the hardest transfer there is: split their own last email.
Almost every piece of reasoning — yours, a colleague's, an AI's — is built from three different kinds of thing. Most people fuse them into one undifferentiated blob, and the blob is where bad decisions hide. The whole of Move I is learning to see the three parts separately and on demand.
The three parts
- Grounds (evidence) — what is actually observed or shown: the numbers, the quotes, the events. The part you could, in principle, go and check.
- Inference (the join) — the move from the grounds to the claim. It is always there in any argument, and it is never shown by the evidence; it is performed on top of it. This is where assumptions live — and where a leap, if there is one, hides.
- Claim — what is being asserted or recommended. The destination.
Read that middle one carefully, because it's where people go wrong about Move I itself. The inference is not automatically a mistake. Sometimes the move from grounds to claim is rock-solid; sometimes it's a wild jump. Move I doesn't ask you to judge which — only to find the join and put your finger on it. Once it's located, you can interrogate it. While it stays fused into the evidence, you can't.
Stephen Toulmin. In The Uses of Argument (1958), Toulmin showed that real-world arguments are not the tidy syllogisms of logic textbooks. Every argument has a claim, the grounds offered for it, and a warrant — the often-unspoken rule that licenses the move from one to the other. Move I is that distinction made practical: the warrant is your leap, and dragging it into the open is the entire skill.
Worked example: a clean decomposition
Take a simple argument and pull it apart.
- Grounds: three cracked support beams were found. (Checkable — go and look.)
- Inference: cracked beams reduce load capacity. (The join. Here it's an engineering fact — a sound warrant.)
- Claim: close the bridge to heavy trucks until repaired.
Notice what just happened: the inference was present, we located it, and it turned out to be solid. That is the normal, healthy case. Move I is not a hunt for villains. It's a habit of always knowing where the join is — so that on the day the join is rotten, you're already looking at it.
Sound inference, or leap? Same shape, different strength
Two arguments can have identical structure and opposite trustworthiness. The difference is entirely in the strength of the warrant — which is exactly why you have to locate it before you can weigh it.
The three ways the atom gets fused
Fusion is rarely an accident; it's how persuasion works. Three patterns cover most of it:
- An inference dressed as evidence. "Sales fell because the campaign flopped" sounds like an observation. The only observation is that sales fell; the cause is an inference riding along inside the sentence.
- A claim smuggled into the framing. "Given how unreliable this vendor is, should we switch?" buries a contested claim (the vendor is unreliable) inside the question, so it's never argued for — only assumed.
- Evidence that's really another claim. "Everyone knows the launch was a disaster, so…" offers a claim as if it were grounds. "Everyone knows" is not a thing you can go and check.
One leap appears more than any other, so it's worth naming on sight. When the only grounds are that two things moved together, the inference "so one caused the other" can fail three ways: the causation runs backwards, a third thing drove both, or it was coincidence. Meera's trap was this exact shape — complaints and the app rose together, so the app "caused" the complaints. Keep this one loaded; you'll fire it constantly.
You can't surface a hidden premise (Move II), test a belief (Move III), or steelman an opponent (Move IV) until you can first see where the leap is. Everything today is built on this one skill. So we drill it until it's reflex.
Split the reasoning
Click each highlighted span to tag it. Click again to cycle: Claim → Evidence → Inference → clear. Work through all seven passages, tag every span, then check your map against the model. They climb in difficulty — the warm-ups are clear, the later ones are the kind of reasoning you actually meet at work and in your feed. Remember: you're locating the inference, not judging it. One of these passages has a perfectly sound one.
The pattern across the set: in every passage the middle span is the inference — the join between what was shown and what was concluded. Locating it is the skill. Now weigh three of them:
- Passage 4 was the honest one. "The treatment made the difference" is still an inference — but a controlled, identical-conditions comparison warrants it. If you marked it wrong because it sat in the middle, that's the reflex to retire. Locating is not condemning.
- Passage 5 is cause-from-correlation. More officers and more crime move together, but the causation likely runs backwards — police get sent where crime already is. The inference reverses the arrow.
- Passage 6 ignores the base rate. Even a test that is 99% accurate on both sick and healthy people produces mostly false positives once the condition is rarer than about 1 in 100, because there are so many more healthy people to misflag. "Almost certainly" is the leap; the math underneath was never shown.
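The base-rate arithmetic behind Passage 6 is Bayes' rule. The sketch below assumes a test that is 99% sensitive (it catches 99% of real cases) and 99% specific (it clears 99% of healthy people). The prevalence values are illustrative and are not taken from the passage.

```python
def share_of_positives_that_are_real(prevalence, sensitivity=0.99, specificity=0.99):
    """P(has condition | tested positive), by Bayes' rule."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


for prevalence in (0.1, 0.01, 0.001):
    ppv = share_of_positives_that_are_real(prevalence)
    print(f"prevalence {prevalence:.1%}: {ppv:.0%} of positive results are real")
```

At 1 in 10, about 92% of positives are real. At 1 in 100, it is a coin flip (50%). At 1 in 1,000, only about 9% are real. The test never changed; only the base rate did.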
None of those were caught by knowing more facts. They were caught by separating the shown from the concluded — and then looking hard at the join.
Now turn it on the machine
Passage 7 was an AI's recommendation, and you split it with the same move you used on the manager's memo. That's the whole transfer: the model is not exempt from the atom. It produces grounds, an inference, and a claim like everyone else. The "40% gains" figure is real evidence about other companies, and "your company will see similar" is the join it slid past. The only thing the AI changed is the finish: the prose is smoother, the structure tidier, the confidence higher. Every one of those makes the join easier to miss, not harder. Fluency is not a reason to trust the inference. Often it's the very thing hiding it.
Find one sentence you've read this week — an email, a headline, an AI reply — and split it. The labels stay put; write your answer beside each.
The tell of a hidden premise: it's the statement nobody bothered to say because everyone assumed it. Coach the question relentlessly — "what would have to be true for this to follow?" — until participants reach for it before agreeing or disagreeing with any conclusion.
The distractors are deliberately true but not load-bearing. Watch participants pick a premise that's correct yet idle. The lesson lands when they see that a premise being true is not the same as a premise doing the work. The negation test below is what separates the two — teach it as the mechanical tool it is, not a vibe.
Pacing (a full hour): ten minutes on the negation test and the worked example, then run the six-argument set as paired work with debriefs at each tier change. The closing worksheet is the transfer; have them bring a live decision.
Move I taught you to find the inference. Move II teaches you to name what it's standing on. Every inference rests on an unstated premise — a bridge so familiar nobody says it aloud. The conclusion only follows if that bridge holds, which means the bridge, not the evidence, is often where the whole argument lives or dies.
The question is always the same: "What would have to be true for this conclusion to follow from this evidence?" But "what would have to be true" can surface a dozen background assumptions, most of them harmless. You need the one that's load-bearing — the one the conclusion actually depends on. For that, there's a tool.
Take a candidate premise and negate it — assume the opposite is true. Then ask: does the conclusion still stand? If it survives the negation, that premise wasn't holding anything up. If the conclusion collapses, you've found a load-bearing premise — and now you know exactly what to go and check. A premise can be perfectly true and still fail this test; truth isn't the question, weight is.
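The negation test can be sketched as a toy model. This is an illustrative simplification, not a formal method from the course. It assumes an argument can be written as a list of independent "legs," where each leg is a set of premises that together support the conclusion on their own. Negating a premise knocks out every leg that uses it. If no leg is left standing, the premise was load-bearing. All premises below are hypothetical examples.

```python
def is_load_bearing(premise, legs):
    """Negate `premise` and report whether the conclusion loses all support.

    `legs` is a list of sets of premises; any one intact leg is enough
    to hold the conclusion up.
    """
    surviving_legs = [leg for leg in legs if premise not in leg]
    return len(surviving_legs) == 0


# One leg: both premises are needed, so each one carries the whole argument.
chatbot_argument = [{"the chatbot caused the drop",
                     "fewer tickets means fewer problems"}]

# Several independent legs pointing the same way.
flood_argument = [{"model A predicts flooding"},
                  {"model B predicts flooding"},
                  {"the tide gauge is rising"}]

checks = [
    ("the chatbot caused the drop", chatbot_argument),
    # Not in any leg: it may well be true, but it holds nothing up.
    ("the chatbot has a friendly tone", chatbot_argument),
    ("the tide gauge is rising", flood_argument),
]
for premise, legs in checks:
    verdict = "load-bearing" if is_load_bearing(premise, legs) else "survives negation"
    print(f"{premise!r}: {verdict}")
```

The model shows why surviving negation is itself information. A premise that sits in only one of several legs can be negated without the conclusion falling.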
Aristotle. An argument with an unstated premise has a name that is over two thousand years old: an enthymeme. Aristotle observed that almost all everyday persuasion runs on them — we leave the premise out precisely because it feels too obvious to say. That is exactly what makes it dangerous: the assumption doing the most work is the one nobody bothered to examine.
Five kinds of hidden premise
Hidden premises aren't random. A handful of shapes account for most of them — learn the shapes and you'll spot them faster:
- Causal — "X happened, then Y, so X caused Y." Assumes the link is causal, not coincidence, confound, or reverse order. (Meera's trap.)
- Analogical — "It worked for them, so it'll work for us." Assumes the two cases are alike in the way that matters.
- Generalisation / proxy — "This sample, marker, or signal stands for the whole." Assumes what you measured represents what you care about.
- Definitional — "The metric went up, so the thing it's meant to measure went up." Assumes the metric actually captures the concept.
- Value / normative — "This will increase X, so we should do it." Smuggles in an unargued ought: that more X is good, and worth the cost.
Worked example: excavate one fully
- Conclusion: expand the chatbot everywhere.
- Stated evidence: tickets fell 40% after launch.
- Hidden premise (load-bearing): the chatbot caused the drop, and fewer tickets means fewer underlying problems — not that frustrated customers gave up and stopped reporting.
Operate the test yourself
Reading about the negation test is not the same as feeling it work. Below are three arguments with their key premise already named. Negate each one and watch what happens to the conclusion — it will not always collapse, and that is the entire point.
With the cause gone, rolling back fixes nothing — and deletes the company's new view into real problems. This premise was holding the whole recommendation up.
Three independent models, the tide gauge, and an upstream dam release all point the same way. Drop this premise and the conclusion still stands on its other legs — which tells you it was not load-bearing. Survival is information: the test just found you a premise you can stop arguing about.
A lower sticker price with higher defects and freight can cost more in total. Negate the proxy and "cheaper" no longer means cheaper. The conclusion had nothing else holding it up.
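The arithmetic behind that third result is short. The sketch below uses invented prices, freight charges, and defect rates. It assumes defective units are simply scrapped, so their cost is spread across the units you can actually use.

```python
def cost_per_usable_unit(sticker_price, freight_per_unit, defect_rate):
    """Landed cost of each good unit, spreading the cost of scrapped units over the good ones."""
    return (sticker_price + freight_per_unit) / (1 - defect_rate)


# Hypothetical suppliers: A has the lower sticker price.
supplier_a = cost_per_usable_unit(sticker_price=8.00, freight_per_unit=2.50, defect_rate=0.12)
supplier_b = cost_per_usable_unit(sticker_price=9.00, freight_per_unit=0.80, defect_rate=0.02)

print(f"A: {supplier_a:.2f} per usable unit, B: {supplier_b:.2f} per usable unit")
```

Supplier A wins on sticker price by a dollar. Yet it costs about 11.93 per usable unit, against B's 10.00.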
Two collapsed, one survived. That is the whole value of the test: it doesn't merely flag assumptions, it sorts them — into the ones the argument depends on, which you must go and check, and the ones you can set aside. A premise that survives negation is a premise you can stop fighting about. Now do the sorting at speed.
"She writes the cleanest code on the team and ships faster than anyone."
∴ "We should make her the engineering manager."
What would have to be true for that conclusion to follow?
The load-bearing premise (analogical): that individual-contributor excellence transfers to management. Negate it — "a great IC is not automatically a great manager" — and the evidence about her code says nothing about the conclusion. The argument collapses, so that premise was carrying it. Managing is a different job: mentoring, prioritising, shielding, resolving conflict. None of it is measured by code cleanliness.
"She graduated top of her class from a prestigious university."
∴ "She'll be an excellent hire for this role."
What would have to be true for that to follow?
The load-bearing premise (generalisation / proxy): that a credential is a reliable proxy for on-the-job performance here. Negate it — prestige doesn't predict performance in this role — and the degree becomes noise. "Selective" and "valuable" are both true and both idle; they hold nothing up.
"Your closest competitor cut prices 15% last quarter and their unit sales rose 22%."
∴ "You should cut your prices to grow sales too."
What would have to be true for the AI's conclusion to follow?
The load-bearing premise (causal + analogical): two assumptions bolted together — that price caused their rise (not a new product, a holiday quarter, a rival's stumble), and that you're enough like them for the same lever to pull the same result. Negate either and the recommendation evaporates. "Customers prefer lower prices" is true and useless here.
"This feature would raise daily engagement by an estimated 20%."
∴ "We should build it next sprint."
What would have to be true for that to follow?
The load-bearing premise (value / normative): an unargued ought — that more engagement is good, worth the cost, and the right objective. The number is empirical; the leap is a value judgement wearing a data costume. Negate it (engagement isn't the goal, or isn't worth what it costs) and a 20% lift is irrelevant. Most "data-driven" decisions hide a premise of exactly this shape.
"Our Net Promoter Score rose six points this quarter."
∴ "Our customers are more loyal than they were."
What would have to be true for that to follow?
The load-bearing premise (definitional): that the metric captures the concept — that NPS is loyalty. Negate it (the score moved because a promotion drew happier respondents, or fewer detractors bothered to reply) and "more loyal" is unsupported. This is the most common boardroom leap there is: treating the number as the thing it was only ever a proxy for.
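The non-response route is easy to check with the standard NPS formula: the percentage of promoters (scoring 9 or 10) minus the percentage of detractors (scoring 0 to 6). The customer base below is hypothetical, and its loyalty is held fixed. The only change between quarters is that five detractors skip the survey.

```python
def nps(scores):
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6), from -100 to 100."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)


# Hypothetical customers whose loyalty does NOT change between quarters:
# 40 promoters, 30 passives, 30 detractors.
customers = [10] * 40 + [8] * 30 + [3] * 30

q1 = nps(customers)        # everyone replies
q2 = nps(customers[:-5])   # same people, but the last 5 detractors skip the survey
print(f"Q1 NPS {q1:.1f}, Q2 NPS {q2:.1f}, change {q2 - q1:+.1f}")
```

The score rises by roughly six points, from 10.0 to about 15.8, even though no customer changed their mind.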
"Most of the leading firms in your sector have now appointed a Chief AI Officer."
∴ "You should hire one to stay competitive."
What would have to be true for the AI's conclusion to follow?
The load-bearing premise (stacked: causal + analogical): that the role caused the success, rather than already-successful firms simply being able to afford a new C-suite seat (the causation may run backwards), and that you resemble them in the ways that matter. Negate either and the advice is empty. This is the signature AI move: pattern-matching straight from "leading firms do X" to "you should do X," with both premises left silent.
An AI almost never states its premises — it presents the bridge and the destination as one smooth road. Excavating the premise is how you audit a recommendation you can't see the reasoning behind. Ask it directly: "What are you assuming is true for this to follow?" Then check whether those assumptions hold in your situation.
Run the negation test on something real
Take a claim or decision you're actually weighing — or one your AI just handed you — and put it through the test. The labels stay; write beside each.
The calibration game is the emotional centre of the day. Most rooms come out 15–30 points overconfident. Do not soften that number — the gap is the gift. Let people sit with it before you explain it. A measured humbling sticks where a lectured one slides off.
For the falsification test, the failure mode is a disconfirmer that can never actually occur ("I'd change my mind if the laws of physics changed"). That's a fake test. Push for something specific, observable, and genuinely possible.
A belief you cannot imagine being wrong about is not knowledge. It's an attachment. The move that converts opinion into something trustworthy is brutally simple: before you defend a belief, state what would change your mind.
If nothing could change your mind, you don't have a belief about the world — you have a belief about yourself. And if you can name a specific, observable thing that would change your mind, you've just built the most valuable instrument in reasoning: a way to be wrong on purpose, early, cheaply, before reality charges you full price.
Karl Popper. What separates a real claim about the world from an empty one, Popper argued, is that the real claim forbids something — it could in principle be shown false. A belief compatible with every possible observation isn't strong; it's saying nothing.
Philip Tetlock. Long-run forecasting research found that the most accurate experts were not the most confident — they were the most calibrated: they tracked how often their "90% sure" actually came true, and adjusted. The game below is a thirty-second version of that discipline.
Build your disconfirmer
Knowing you should doubt is not the same as feeling how much you should. So now we put a number on it. The game below is rigged — not to trick you, but to reveal something true about how everyone reasons.
The Calibration Game
Eight true-or-false statements. For each, choose your answer and how confident you are. Be honest about the confidence — that's the whole experiment. Then grade it.
The distance between your average confidence and your actual hit rate is your overconfidence gap. It's not a flaw in you; it's the default human setting. The point of the day is to make that gap small on purpose, by asking "how would I know if I'm wrong?" before you feel sure, not after.
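Here is one way to compute the gap. This sketch assumes each answer is recorded as a confidence between 0.5 and 1.0 plus whether it was right; the game's actual scoring is not shown here. The eight answers are invented. The second function runs the Tetlock-style check: how often each confidence level actually came true.

```python
from collections import defaultdict


def overconfidence_gap(answers):
    """Average stated confidence minus the share of answers that were correct."""
    mean_confidence = sum(conf for conf, _ in answers) / len(answers)
    hit_rate = sum(1 for _, correct in answers if correct) / len(answers)
    return mean_confidence - hit_rate


def hit_rate_by_confidence(answers):
    """For each confidence level used, the fraction of those answers that were right."""
    buckets = defaultdict(list)
    for conf, correct in answers:
        buckets[conf].append(correct)
    return {conf: sum(results) / len(results) for conf, results in sorted(buckets.items())}


# Hypothetical player: (confidence, was the answer correct?)
answers = [(0.9, True), (0.8, False), (0.9, True), (0.7, True),
           (1.0, False), (0.6, True), (0.9, False), (0.8, True)]

gap = overconfidence_gap(answers)
print(f"overconfidence gap: {gap * 100:.0f} points")
for conf, rate in hit_rate_by_confidence(answers).items():
    print(f"said {conf:.0%} sure -> right {rate:.0%} of the time")
```

This invented player is 20 points overconfident. The per-level view shows where the gap comes from: the single "100% sure" answer was wrong.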
The test of a real steelman: a sincere proponent of the position would read it and say "yes — that's exactly why I believe this." If they'd say "that's not what I mean," it's still a strawman wearing better clothes.
Keep the chosen position low-stakes and professional. The skill, not the topic, is the point. Steelmanning charged political positions is the same muscle, but it raises the emotional temperature past what a single-day foundation course should carry.
The first three moves clean up reasoning. This one cleans up you. Motivated reasoning — quietly building the case for what you already want to believe — is the failure that survives all the others, because it feels exactly like thinking.
The antidote is a discipline, not a mood: before you judge a position, state the strongest version of it — the version its smartest honest advocate would recognise as their own. Not the weakest version you can dismiss (a strawman), but the one that would actually be hard to beat. Only then have you earned the right to disagree.
Anatol Rapoport & Daniel Dennett. The game theorist Anatol Rapoport set out rules for criticising a position well; Daniel Dennett later made them famous. The first and hardest: re-express the view so clearly, vividly, and fairly that its holder says, "Thanks, I wish I'd thought of putting it that way." Only after that, and after noting where you agree and what you learned, may you say a word of rebuttal. That standard is the rubric below, compressed into one sentence.
"Companies should abolish annual performance reviews entirely."
Whatever you think of that, your job is to build its strongest case — the version a thoughtful proponent would endorse. Not a caricature you can knock over.
Now score your own steelman honestly:
"People who want to scrap reviews just don't want to be held accountable. Without reviews, no one knows who's underperforming and the whole company drifts. It's an excuse to avoid hard conversations."
"The annual review compresses a year into one snapshot dominated by recency bias, and by tying feedback to pay it makes people defensive rather than curious — so the feedback lands worst exactly when it matters most. Replacing the ritual with frequent, low-stakes, forward-looking check-ins can produce more honest coaching and better performance. Several large firms did precisely this and reported gains, which is why the position is worth taking seriously, not dismissing."
The strawman attacks the people and their motives. The steelman attacks the strongest reason and grants what's true. You can now disagree with the steelman if you like — but you'll be disagreeing with the real thing, which is the only disagreement worth having.
Now run it cold
The real test isn't steelmanning a position you were handed — it's one you actually reject. Pick the one below you most disagree with, and build the case its smartest honest advocate would recognise.
You can ask an AI to "steelman the other side" and it will produce something fluent instantly. But a model will just as fluently produce a flattering strawman if that's what your prompt leans toward. The rubric above is exactly the instrument you use to tell a real steelman from a comfortable one — whether a human or a machine wrote it.
This is the assessment. Not a quiz — an artifact. A learner who can run all four moves on a real AI output and produce a coherent audit card has demonstrated the competence the day promised. Collect the cards; they're far more diagnostic than any multiple-choice score.
Encourage participants to audit an output that actually matters to them — a real recommendation they're weighing. The transfer to Monday morning is the entire return on this course.
Four moves. One output. This is where it becomes a habit instead of a lesson. Take a real answer from an AI — ideally one you were about to act on — and run the full audit. When you're done, you'll have a card you keep: the four questions, applied, in your own hand.
The machine's five ways of being confidently wrong
"Audit the AI" is too vague to act on. So here is the specific list. Large language models fail in characteristic ways — and each one is caught by one of the four moves you just learned. This table is the heart of what Bloom AI University means by cultivated, not commanded: you don't fear the tool or worship it, you audit it.
Notice that none of these are caught by knowing more facts, or by a better prompt. They're caught by the same four questions you'd run on a human. The machine didn't create the need for clear thinking — it raised the price of not having it.
Choose what to audit
Pull the output apart. Each label stays on screen — write beside it.
The output audited
I · The leap
II · The hidden premise
III · How I'd know it's wrong
IV · The strongest objection
Not a certificate of attendance — a reflex. Four questions you can run on any claim, your own or a machine's, in under a minute: Where's the leap? What's it standing on? How would I know it's wrong? What's the strongest case against it? Run them before you act, and the confident wrong answer loses its power over you.
Move I · The Atom
"What's the claim, the evidence, and where's the leap?"
The tell: a confident voice and clean formatting — that's usually where the leap is hiding. Catches: fluency mistaken for correctness.
Move II · The Hidden Premise
"What would have to be true for this to follow?"
The tell: the assumption too obvious to state out loud is the one doing the work. Catches: an AI building on your unchecked premise.
Move III · The Falsification Test
"How would I know if I'm wrong?"
The tell: if nothing observable could count against it, it isn't knowledge. Catches: confident fabrication and your own overconfidence.
Move IV · The Steelman
"What's the strongest version of the other side?"
The tell: if a real proponent wouldn't recognise it, you built a strawman. Catches: motivated reasoning and an agreeable machine.
The vocabulary of clear thinking
- Claim: what is being asserted or recommended; the destination of an argument.
- Grounds (evidence): what is actually observed or shown; the part you could, in principle, go and check.
- Warrant (inference / the leap): the often-unstated move that carries you from grounds to claim. Where assumptions live.
- Enthymeme: Aristotle's name for an argument with an unstated premise, which is to say, almost all of them.
- Falsifiability: the property of a claim that it forbids some observation, so that it could, in principle, be shown false (Popper).
- Calibration: the match between how confident you are and how often you're right. The goal is not less confidence; it's accurate confidence (Tetlock).
- Steelman: the strongest, most charitable version of a position, the one its best advocate would endorse. The opposite of a strawman.
- Motivated reasoning: quietly assembling the case for what you already want to believe. It feels exactly like thinking.
- Illusory truth effect: the tendency for statements that are easy to process or often repeated to feel truer, regardless of whether they are.
- Automation bias: the tendency to over-trust and under-scrutinise outputs that come from an automated system.
Facilitation & Assessment
This is a single-day, ~5-hour course. The engines carry the learning; your job is to protect the commit-before-reveal sequence and run the debriefs. The capstone Audit Card is the assessment — grade the reasoning, not the polish.
Suggested arc (5 hours)
Audit Card rubric
| Criterion | Developing | Proficient | Exemplary |
|---|---|---|---|
| I · Finds the leap | Restates the output; claim and evidence blurred. | Separates claim, evidence, and the inference between them. | Names the precise sentence doing the unearned work. |
| II · Names the premise | Lists a premise that's true but not load-bearing. | Identifies the assumption the conclusion depends on. | Shows how the conclusion collapses if it fails — in their own context. |
| III · Real disconfirmer | Vague or unfalsifiable ("if it's just wrong"). | Specific and observable. | Specific, observable, and cheap to check soon. |
| IV · Honest steelman | Strawman; attacks motives. | Fair restatement of the strongest objection. | An objection a real proponent would thank them for. |
Discussion prompts: Where did the AI's fluency do the most persuading? Which of the five failure modes showed up in your specimen? What would you now ask the AI before acting on its next answer?