Foundation · Mandatory for Everyone

The Confident Wrong Answer

The one discipline beneath every AI skill: telling the difference between what you know and what you are only assuming — before the machine launders the difference away.

One day · ~5 hours · No prerequisites · Four moves · One portable artifact
Phase 1 Ground: The Confident Wrong Answer
25 min
Instructor

Do not skip the commit step. The pedagogy depends on participants publicly choosing before the reveal. If you let them read ahead, you lose the felt error and the rest of the day flattens into a lecture. Have them lock a choice, then reveal.

Watch the technical learners. Engineers and analysts often commit to "Act" fastest here — fluency with tools breeds confidence that the inputs were sound. Name that out loud when it happens; it is the whole point.

Before AI, sloppy reasoning hit a wall. The code threw an error. The spreadsheet didn't reconcile. The argument failed to convince the room. Those failures were a gift: they were your error signal, telling you the thinking underneath was off.

AI removes the wall. Ask it almost anything and it returns an answer that is fluent, structured, confident, and formatted like a conclusion you can act on — whether or not the request underneath it was sound. It will take your unexamined assumption and hand it back to you wearing the costume of an answer.

So the cost of weak reasoning did not fall in the age of AI. It rose. The one discipline the machine cannot supply for you is the ability to separate what you know from what you are only assuming — and to see the join between them. That discipline is this entire course. We begin by letting it fail.

The thesis of the day

The danger was never the obviously wrong answer. You catch those. The danger is the confident one — the answer so well-built that you forget to ask what it is standing on.

Meera's Decision

Meera Nair runs operations for a logistics firm in Bengaluru. Last quarter the company launched a new customer app. This quarter, she is looking at a number that scares her, so she asks an AI assistant for help.

Meera → AI: "Customer complaints are up 30% since we launched our new app last quarter. Leadership wants a recommendation by Friday. Should we roll the app back to the old version?"
AI Assistant

Yes — the data points to a rollback. A 30% rise in complaints immediately following the app launch indicates the new app is driving customer dissatisfaction. Rolling back to the previous version should restore the prior complaint baseline and protect customer retention. I'd recommend a phased rollback this week, with a post-mortem on the app's UX before any re-launch. This is the lower-risk path given the timeline.

It reads well. It's structured, decisive, and it gives Meera exactly what leadership asked for. What should she do?

Here's what the confident answer was standing on.

The AI's recommendation rests on one sentence doing all the work: "A 30% rise in complaints immediately following the app launch indicates the new app is driving customer dissatisfaction." That is not a fact in the data. It is an assumption bridging the data to the conclusion. And it has a hidden premise inside it: that the number of complaints tracks the number of problems.

Here is what Meera found when she pulled the records: the old app had no way to complain in-app. The new app added a one-tap "Report an issue" button. Complaints didn't rise because customers became more unhappy. They rose because, for the first time, unhappy customers could be heard. Rolling back the app would have deleted the company's best new source of truth — and hidden the problems again.

Notice: the AI did nothing "wrong." It reasoned cleanly from the premise it was handed. The premise was never examined — not by it, and very nearly not by Meera.

Why the confident answer works on you

You are not gullible. The pull you just felt is built into how human minds judge truth, and AI happens to be engineered to maximise exactly the cues that trigger it. Three mechanisms are worth naming, because once named they lose some of their power.

Processing fluency. The easier a statement is to read and process, the truer it feels — regardless of whether it is true. Psychologists call the closely related pattern the illusory truth effect. Clean grammar, confident structure, and tidy formatting are all fluency cues. An AI's output is fluency-maximal by construction, which means it arrives pre-loaded with a feeling of truth it has not earned.

Automation bias. People systematically over-trust outputs that come from an automated system and under-scrutinise them — most of all under time pressure, which is precisely when Meera was operating. The machine's answer doesn't just feel true; it feels authoritative, as though the work of checking has already been done by something more rigorous than us.

The friction that vanished. Here is the part that is genuinely new. Before these tools, a weak premise eventually hit something hard — a failed build, a number that wouldn't reconcile, a colleague's raised eyebrow. That collision was your error signal. AI smooths the collision away: it takes the weak premise and returns a polished result, so the signal that something was wrong never fires. The reasoning didn't get better. The warning light got disconnected.

[Diagram: an UNEXAMINED PREMISE ("complaints = problems") passes through the AI, which adds fluency, structure, and confidence but no scrutiny of the premise, and comes out as an AUTHORITATIVE ANSWER ("Roll back the app."): decisive, structured, ready to act. The premise entered unchecked and left wearing authority.]
Laundering, not lying. The machine reasons cleanly from whatever premise it is handed. The premise is the part nobody examined.
What just happened

You didn't lack intelligence. You lacked a place to look. For the rest of today you'll build four such places — four questions you can run on any claim to find the assumption it's standing on, before you act on it. We'll practice each one twice: first on a piece of human reasoning, then on an AI's.

"What's the claim, what's the evidence, and where's the leap?"
Move I — separate what's asserted from what's shown.
"What would have to be true for this to follow?"
Move II — surface the hidden, load-bearing premise.
"How would I know if I'm wrong?"
Move III — test the belief instead of defending it.
"What's the strongest version of the other side?"
Move IV — beat motivated reasoning before it sets.

Before you move on

Instructor

The single most common error you'll see: participants tag the inference as evidence — they treat the confident bridge as if it were an observed fact. That mistake is the lesson. Let the score surface it before you debrief.

The rigour point that separates this from a parlour trick: not every inference is a bad leap. Every argument has an inference — the join between grounds and claim — and Move I only asks you to locate it. Whether that join is sound is the job of Moves II–IV. Passage 4 in the set is deliberately a well-warranted inference; watch for participants who mark it "wrong" simply because it sits in the middle. Correct that reflex explicitly: locating is not condemning.

Pacing: this is a full hour. Spend the first fifteen minutes on the worked examples, then let the problem set run as independent work before the group debrief. If someone finishes early, the extension is the hardest transfer there is: split their own last email.

Almost every piece of reasoning — yours, a colleague's, an AI's — is built from three different kinds of thing. Most people fuse them into one undifferentiated blob, and the blob is where bad decisions hide. The whole of Move I is learning to see the three parts separately and on demand.

The three parts

  • Grounds (evidence) — what is actually observed or shown: the numbers, the quotes, the events. The part you could, in principle, go and check.
  • Inference (the join) — the move from the grounds to the claim. It is always there in any argument, and it is never shown by the evidence; it is performed on top of it. This is where assumptions live — and where a leap, if there is one, hides.
  • Claim — what is being asserted or recommended. The destination.

Read that middle one carefully, because it's where people go wrong about Move I itself. The inference is not automatically a mistake. Sometimes the move from grounds to claim is rock-solid; sometimes it's a wild jump. Move I doesn't ask you to judge which — only to find the join and put your finger on it. Once it's located, you can interrogate it. While it stays fused into the evidence, you can't.

[Diagram: GROUNDS · EVIDENCE (what is actually shown; you could go and check it) → THE LEAP (inference / warrant, usually unstated) → CLAIM (what is asserted; the destination). A confident voice almost always means the leap has been made invisible. Your job is to make it visible.]
The atom of every argument. Evidence does not reach a claim by itself — something carries it across. That something is the leap.
Provenance

Stephen Toulmin. In The Uses of Argument (1958), Toulmin showed that real-world arguments are not the tidy syllogisms of logic textbooks. Every argument has a claim, the grounds offered for it, and a warrant — the often-unspoken rule that licenses the move from one to the other. Move I is that distinction made practical: the warrant is your leap, and dragging it into the open is the entire skill.

Worked example: a clean decomposition

Take a simple argument and pull it apart.

"The inspection found three cracked support beams. Cracked beams reduce a bridge's load capacity. So the bridge should be closed to heavy trucks until it's repaired."
  • Grounds: three cracked support beams were found. (Checkable — go and look.)
  • Inference: cracked beams reduce load capacity. (The join. Here it's an engineering fact — a sound warrant.)
  • Claim: close the bridge to heavy trucks until repaired.

Notice what just happened: the inference was present, we located it, and it turned out to be solid. That is the normal, healthy case. Move I is not a hunt for villains. It's a habit of always knowing where the join is — so that on the day the join is rotten, you're already looking at it.

Sound inference, or leap? Same shape, different strength

Two arguments can have identical structure and opposite trustworthiness. The difference is entirely in the strength of the warrant — which is exactly why you have to locate it before you can weigh it.

✓ Warranted inference "Every patient given the drug recovered; none in the matched control group did. So the drug probably worked." A controlled comparison backs the join. Still an inference — but a strong one.
× A leap "Patients who took the supplement felt better afterwards. So the supplement works." No control. The join ignores placebo, natural recovery, and regression to the mean. Same shape — far weaker warrant.

The three ways the atom gets fused

Fusion is rarely an accident; it's how persuasion works. Three patterns cover most of it:

  • An inference dressed as evidence. "Sales fell because the campaign flopped" sounds like an observation. The only observation is that sales fell; the cause is an inference riding along inside the sentence.
  • A claim smuggled into the framing. "Given how unreliable this vendor is, should we switch?" buries a contested claim (the vendor is unreliable) inside the question, so it's never argued for — only assumed.
  • Evidence that's really another claim. "Everyone knows the launch was a disaster, so…" offers a claim as if it were grounds. "Everyone knows" is not a thing you can go and check.
The most common rotten warrant: cause from correlation

One leap appears more than any other, so it's worth naming on sight. When the only grounds are that two things moved together, the inference "so one caused the other" can fail three ways: the causation runs backwards, a third thing drove both, or it was coincidence. Meera's trap had this exact shape: complaints rose right after the app launched, so the app "caused" the complaints. Keep this one loaded; you'll fire it constantly.
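The "third thing drove both" failure is easy to reproduce with invented numbers. The sketch below is a toy, not data from any passage in this course: a hypothetical hidden driver (`heat`) sets both `ice_cream_sales` and `sunburn_cases`, each with a small made-up wiggle. The two series correlate almost perfectly, yet changing one by hand does nothing to the other, because neither was ever an input to the other.

```python
# Toy illustration: every number and variable name here is invented.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def sunburn_from(heat_series):
    """Sunburn cases depend on heat only; ice cream is not an input."""
    return [3 * h + (i * 3) % 5 - 2 for i, h in enumerate(heat_series)]


# The hidden third thing: twenty days of rising temperature.
heat = list(range(10, 30))

# Both observed series are driven by heat, plus a small fixed wiggle.
ice_cream_sales = [2 * h + (i * 7) % 5 - 2 for i, h in enumerate(heat)]
sunburn_cases = sunburn_from(heat)

r = pearson(ice_cream_sales, sunburn_cases)

# "Intervene": ban ice cream. Sunburn is recomputed from heat alone,
# so it cannot change.
ice_cream_sales_banned = [0 for _ in heat]
sunburn_after_ban = sunburn_from(heat)

print(f"correlation: {r:.2f}")
print("sunburn unchanged by ban:", sunburn_after_ban == sunburn_cases)
```

A high correlation here says only that the two series share a cause. The grounds ("they moved together") are real; the join ("so one drives the other") is what the intervention shows to be false.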

Why this is the atom

You can't surface a hidden premise (Move II), test a belief (Move III), or steelman an opponent (Move IV) until you can first see where the leap is. Everything today is built on this one skill. So we drill it until it's reflex.

Split the reasoning

Click each highlighted span to tag it. Click again to cycle: Claim → Evidence → Inference → clear. Work through all seven passages, tag every span, then check your map against the model. They climb in difficulty — the warm-ups are clear, the later ones are the kind of reasoning you actually meet at work and in your feed. Remember: you're locating the inference, not judging it. One of these passages has a perfectly sound one.

Claim · Evidence · Inference (the leap)
Tier 1 · Warm-up
Passage 1 — A manager's memo
Passage 2 — A message to yourself
Passage 3 — A policy note
Tier 2 · Subtler
Passage 4 — A test report (watch this one)
Passage 5 — A data headline
Tier 3 · The kind you actually meet
Passage 6 — A screening result
Passage 7 — An AI's recommendation

The pattern across the set: in every passage the middle span is the inference — the join between what was shown and what was concluded. Locating it is the skill. Now weigh three of them:

  • Passage 4 was the honest one. "The treatment made the difference" is still an inference — but a controlled, identical-conditions comparison warrants it. If you marked it wrong because it sat in the middle, that's the reflex to retire. Locating is not condemning.
  • Passage 5 is cause-from-correlation. More officers and more crime move together, but the causation likely runs backwards — police get sent where crime already is. The inference reverses the arrow.
  • Passage 6 ignores the base rate. Even a 99%-accurate test produces mostly false positives when the condition is rare enough (rarer than about 1 in 100), because there are so many more healthy people to misflag. "Almost certainly" is the leap; the math underneath was never shown.

None of those were caught by knowing more facts. They were caught by separating the shown from the concluded — and then looking hard at the join.
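The base-rate point in Passage 6 is worth working through once with numbers. The sketch below makes two assumptions that are not in the passage: it reads "99%-accurate" as 99% sensitivity and 99% specificity, and it uses a hypothetical prevalence of 1 in 1,000. Under those assumptions it computes the chance that a person who tests positive actually has the condition.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Share of positive results that are true positives (Bayes' rule)."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# Assumed reading of "99% accurate": right 99% of the time on the sick
# and 99% of the time on the healthy. The prevalence is invented.
ppv = positive_predictive_value(
    prevalence=0.001,   # hypothetical: 1 person in 1,000 has the condition
    sensitivity=0.99,
    specificity=0.99,
)

print(f"chance a positive result is real: {ppv:.1%}")  # -> 9.0%
```

With these inputs roughly ten of every eleven positives are false, so "almost certainly" does not follow from the test result alone. Change the prevalence and the answer moves a great deal, which is exactly why the join needs the base rate before it can be weighed.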

Now turn it on the machine

Passage 7 was an AI's recommendation, and you split it with the same move you used on the manager's memo. That's the whole transfer: the model is not exempt from the atom. It produces grounds, an inference, and a claim like everyone else — the "40% gains" is real evidence about other companies, and "your company will see similar" is the join it slid past. The only thing the AI changed is the finish: the prose is smoother, the structure tidier, the confidence higher. Every one of those is a reason the join is easier to miss, not harder. Fluency is not a reason to trust the inference. Often it's the very thing hiding it.

Find one sentence you've read this week — an email, a headline, an AI reply — and split it. The labels stay put; write your answer beside each.

Instructor

The tell of a hidden premise: it's the statement nobody bothered to say because everyone assumed it. Coach the question relentlessly — "what would have to be true for this to follow?" — until participants reach for it before agreeing or disagreeing with any conclusion.

The distractors are deliberately true but not load-bearing. Watch participants pick a premise that's correct yet idle. The lesson lands when they see that a premise being true is not the same as a premise doing the work. The negation test below is what separates the two — teach it as the mechanical tool it is, not a vibe.

Pacing (a full hour): ten minutes on the negation test and the worked example, then run the six-argument set as paired work with debriefs at each tier change. The closing worksheet is the transfer; have them bring a live decision.

Move I taught you to find the inference. Move II teaches you to name what it's standing on. Every inference rests on an unstated premise — a bridge so familiar nobody says it aloud. The conclusion only follows if that bridge holds, which means the bridge, not the evidence, is often where the whole argument lives or dies.

The question is always the same: "What would have to be true for this conclusion to follow from this evidence?" But "what would have to be true" can surface a dozen background assumptions, most of them harmless. You need the one that's load-bearing — the one the conclusion actually depends on. For that, there's a tool.

The negation test — how to find the load-bearing premise

Take a candidate premise and negate it — assume the opposite is true. Then ask: does the conclusion still stand? If it survives the negation, that premise wasn't holding anything up. If the conclusion collapses, you've found a load-bearing premise — and now you know exactly what to go and check. A premise can be perfectly true and still fail this test; truth isn't the question, weight is.

Provenance

Aristotle. An argument with an unstated premise has a name that is over two thousand years old: an enthymeme. Aristotle observed that almost all everyday persuasion runs on them — we leave the premise out precisely because it feels too obvious to say. That is exactly what makes it dangerous: the assumption doing the most work is the one nobody bothered to examine.

[Diagram: THE CONCLUSION ("so we should do X") rests on two pillars: STATED EVIDENCE (you can see it) and a HIDDEN PREMISE (nobody said it out loud; negate it and it falls). They debate the pillar they can see. The argument stands or falls on the one they can't.]
Load-bearing means exactly this: pull the premise out, and the conclusion has nothing left to rest on.

Five kinds of hidden premise

Hidden premises aren't random. A handful of shapes account for most of them — learn the shapes and you'll spot them faster:

  • Causal — "X happened, then Y, so X caused Y." Assumes the link is causal, not coincidence, confound, or reverse order. (Meera's trap.)
  • Analogical — "It worked for them, so it'll work for us." Assumes the two cases are alike in the way that matters.
  • Generalisation / proxy — "This sample, marker, or signal stands for the whole." Assumes what you measured represents what you care about.
  • Definitional — "The metric went up, so the thing it's meant to measure went up." Assumes the metric actually captures the concept.
  • Value / normative — "This will increase X, so we should do it." Smuggles in an unargued ought: that more X is good, and worth the cost.

Worked example: excavate one fully

"Support tickets dropped 40% the month after we launched the chatbot. The chatbot is clearly working — let's expand it to every channel."
  • Conclusion: expand the chatbot everywhere.
  • Stated evidence: tickets fell 40% after launch.
  • Hidden premise (load-bearing): the chatbot caused the drop, and fewer tickets means fewer underlying problems — not that frustrated customers gave up and stopped reporting.
Negation test: suppose customers abandoned tickets out of frustration rather than getting help. Now the 40% drop is bad news, and expanding the chatbot spreads the problem. The conclusion collapses — so that premise was carrying it. Two premises here, in fact: one causal, one definitional ("fewer tickets = fewer problems").
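The definitional premise ("fewer tickets = fewer problems") can be made concrete with a rough model. The sketch below assumes, purely for illustration, that tickets filed equal underlying problems times the share of problems customers bother to report; the function name and every number are invented. Two very different stories then produce the same 40% drop.

```python
def tickets_filed(problems, report_rate):
    """Tickets are a proxy: underlying problems times the share reported."""
    return round(problems * report_rate)


def drop_pct(before, after):
    """Percentage fall from before to after, as a whole number."""
    return 100 * (before - after) // before


# Hypothetical month before the chatbot launched.
before = tickets_filed(problems=1000, report_rate=0.50)

# Story A: the chatbot resolved problems; customers report at the same rate.
chatbot_helped = tickets_filed(problems=600, report_rate=0.50)

# Story B: nothing was fixed; frustrated customers stopped reporting.
customers_gave_up = tickets_filed(problems=1000, report_rate=0.30)

print(drop_pct(before, chatbot_helped))     # -> 40
print(drop_pct(before, customers_gave_up))  # -> 40
```

The ticket count cannot tell the two stories apart. What would separate them is a direct look at the thing the metric stands for, such as whether customers who used the chatbot got their problem solved.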

Operate the test yourself

Reading about the negation test is not the same as feeling it work. Below are three arguments with their key premise already named. Negate each one and watch what happens to the conclusion — it will not always collapse, and that is the entire point.

Conclusion: "Roll the app back to the old version."
Key premise: "The app caused the rise in complaints."
Assume instead: the app did NOT cause it — it just added a one-tap way to complain.
✗ Conclusion collapses

With the cause gone, rolling back fixes nothing — and deletes the company's new view into real problems. This premise was holding the whole recommendation up.

Conclusion: "Evacuate the coastal town tonight."
Premise to test: "The lead forecaster's model is the most reliable one."
Assume instead: that one model is unreliable.
✓ Conclusion survives

Three independent models, the tide gauge, and an upstream dam release all point the same way. Drop this premise and the conclusion still stands on its other legs — which tells you it was not load-bearing. Survival is information: the test just found you a premise you can stop arguing about.

Conclusion: "Switch to the cheaper supplier."
Key premise: "Per-unit price is the cost that matters."
Assume instead: total landed cost — shipping, defect rates, switching cost — is what matters.
✗ Conclusion collapses

A lower sticker price with higher defects and freight can cost more in total. Negate the proxy and "cheaper" no longer means cheaper. The conclusion had nothing else holding it up.

What the lab proves

Two collapsed, one survived. That is the whole value of the test: it doesn't merely flag assumptions, it sorts them — into the ones the argument depends on, which you must go and check, and the ones you can set aside. A premise that survives negation is a premise you can stop fighting about. Now do the sorting at speed.
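For readers who think in code, the sorting can be sketched mechanically. The model below is an assumption, not a formal logic: a conclusion is treated as standing on one or more legs of support, each leg needs certain premises, and the conclusion survives if at least one leg still has all its premises after the negation. The class and function names are hypothetical, and the three lab arguments are encoded by hand.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Support:
    """One leg the conclusion stands on, plus the premises that leg needs."""
    name: str
    needs: frozenset[str] = frozenset()


def conclusion_stands(supports, negated):
    """True if at least one leg needs none of the negated premises."""
    return any(not (leg.needs & negated) for leg in supports)


def is_load_bearing(supports, premise):
    """Negation test: does negating this one premise collapse the conclusion?"""
    return not conclusion_stands(supports, frozenset({premise}))


# The three lab arguments, encoded by hand (a simplification).
rollback = [
    Support("complaints rose after launch", frozenset({"app caused the rise"})),
]
evacuate = [
    Support("lead forecaster's model", frozenset({"lead model is reliable"})),
    Support("three independent models"),
    Support("tide gauge"),
    Support("upstream dam release"),
]
supplier = [
    Support("lower sticker price", frozenset({"per-unit price is what matters"})),
]

results = {
    "rollback": is_load_bearing(rollback, "app caused the rise"),
    "evacuate": is_load_bearing(evacuate, "lead model is reliable"),
    "supplier": is_load_bearing(supplier, "per-unit price is what matters"),
}
print(results)
```

The model only reports what was encoded: deciding which legs exist, and which premises each one needs, is still the human work the test asks for.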

Tier 1 · Warm-up
Argument 1 — A promotion decision

"She writes the cleanest code on the team and ships faster than anyone."

∴ "We should make her the engineering manager."

What would have to be true for that conclusion to follow?

The load-bearing premise (analogical): that individual-contributor excellence transfers to management. Negate it — "a great IC is not automatically a great manager" — and the evidence about her code says nothing about the conclusion. The argument collapses, so that premise was carrying it. Managing is a different job: mentoring, prioritising, shielding, resolving conflict. None of it is measured by code cleanliness.

IF great IC ≠ great manager → the clean-code evidence is irrelevant to the decision — and you may have just lost your best engineer.
Argument 2 — A hiring call

"She graduated top of her class from a prestigious university."

∴ "She'll be an excellent hire for this role."

What would have to be true for that to follow?

The load-bearing premise (generalisation / proxy): that a credential is a reliable proxy for on-the-job performance here. Negate it — prestige doesn't predict performance in this role — and the degree becomes noise. "Selective" and "valuable" are both true and both idle; they hold nothing up.

IF prestige doesn't predict performance for this work → you've selected on the wrong signal and learned nothing about the candidate that matters.
Tier 2 · Subtler
Argument 3 — An AI's strategy recommendation

"Your closest competitor cut prices 15% last quarter and their unit sales rose 22%."

∴ "You should cut your prices to grow sales too."

What would have to be true for the AI's conclusion to follow?

The load-bearing premise (causal + analogical): two assumptions bolted together — that price caused their rise (not a new product, a holiday quarter, a rival's stumble), and that you're enough like them for the same lever to pull the same result. Negate either and the recommendation evaporates. "Customers prefer lower prices" is true and useless here.

IF their sales rose for another reason, OR your cost structure can't survive a 15% cut → you've shrunk your margin for nothing. Same evidence, opposite outcome.
Argument 4 — A product proposal

"This feature would raise daily engagement by an estimated 20%."

∴ "We should build it next sprint."

What would have to be true for that to follow?

The load-bearing premise (value / normative): an unargued ought — that more engagement is good, worth the cost, and the right objective. The number is empirical; the leap is a value judgement wearing a data costume. Negate it (engagement isn't the goal, or isn't worth what it costs) and a 20% lift is irrelevant. Most "data-driven" decisions hide a premise of exactly this shape.

IF engagement cannibalises revenue, or trades against user wellbeing → a 20% lift is an argument against building it, not for.
Tier 3 · The kind you actually meet
Argument 5 — A quarterly review

"Our Net Promoter Score rose six points this quarter."

∴ "Our customers are more loyal than they were."

What would have to be true for that to follow?

The load-bearing premise (definitional): that the metric captures the concept — that NPS is loyalty. Negate it (the score moved because a promotion drew happier respondents, or fewer detractors bothered to reply) and "more loyal" is unsupported. This is the most common boardroom leap there is: treating the number as the thing it was only ever a proxy for.

IF the rise reflects who responded rather than how customers behave → you're celebrating an artifact and may act on a mirage.
Argument 6 — An AI's recommendation

"Most of the leading firms in your sector have now appointed a Chief AI Officer."

∴ "You should hire one to stay competitive."

What would have to be true for the AI's conclusion to follow?

The load-bearing premise (stacked: causal + analogical): that the role caused the success, rather than that already-successful firms could simply afford a new C-suite seat (the causation may run backwards), and that you resemble them in the ways that matter. Negate either and the advice is empty. This is the signature AI move: pattern-matching straight from "leading firms do X" to "you do X," with both premises silent.

IF success enabled the hire rather than the reverse, OR your scale and model differ → you'd be buying a title, not an outcome.
The machine angle

An AI almost never states its premises — it presents the bridge and the destination as one smooth road. Excavating the premise is how you audit a recommendation you can't see the reasoning behind. Ask it directly: "What are you assuming is true for this to follow?" Then check whether those assumptions hold in your situation.

Run the negation test on something real

Take a claim or decision you're actually weighing — or one your AI just handed you — and put it through the test. The labels stay; write beside each.

Instructor

The calibration game is the emotional centre of the day. Most rooms come out 15–30 points overconfident. Do not soften that number — the gap is the gift. Let people sit with it before you explain it. A measured humbling sticks where a lectured one slides off.

For the falsification test, the failure mode is a disconfirmer that can never actually occur ("I'd change my mind if the laws of physics changed"). That's a fake test. Push for something specific, observable, and genuinely possible.

A belief you cannot imagine being wrong about is not knowledge. It's an attachment. The move that converts opinion into something trustworthy is brutally simple: before you defend a belief, state what would change your mind.

If nothing could change your mind, you don't have a belief about the world — you have a belief about yourself. And if you can name a specific, observable thing that would change your mind, you've just built the most valuable instrument in reasoning: a way to be wrong on purpose, early, cheaply, before reality charges you full price.

Provenance

Karl Popper. What separates a real claim about the world from an empty one, Popper argued, is that the real claim forbids something — it could in principle be shown false. A belief compatible with every possible observation isn't strong; it's saying nothing.

Philip Tetlock. Long-run forecasting research found that the most accurate experts were not the most confident — they were the most calibrated: they tracked how often their "90% sure" actually came true, and adjusted. The game below is a thirty-second version of that discipline.

Worked example — belief: "Our customers churn because of price"
× Not a disconfirmer "Nothing, really — it's obviously the price." This protects the belief. Nothing could ever count against it, so it isn't a belief about the world; it's a commitment.
✓ A real disconfirmer "If we interviewed twenty churned customers and price ranked below onboarding and support among their reasons, I'd be wrong." Specific, observable, and genuinely possible. You'd know.

Build your disconfirmer

Why we measure next

Knowing you should doubt is not the same as feeling how much you should. So now we put a number on it. The game below is rigged — not to trick you, but to reveal something true about how everyone reasons.

The Calibration Game

Eight true-or-false statements. For each, choose your answer and how confident you are. Be honest about the confidence — that's the whole experiment. Then grade it.

1. A standard sheet of paper folded in half 42 times would be thick enough to reach the Moon.
Confidence
True. Doubling is deceptive. 0.1 mm × 2⁴² ≈ 440,000 km, which is past the Moon's average distance of ~384,000 km.
2. Mount Everest's summit is the point on Earth's surface that lies farthest from the Earth's centre.
Confidence
False. That's Mount Chimborazo in Ecuador. Earth bulges at the equator, so Chimborazo's summit sits farther from the centre, even though Everest is higher above sea level.
3. Honey does not spoil; edible honey has been recovered from ancient Egyptian tombs.
Confidence
True. Low moisture and high acidity make honey hostile to microbes. Sealed samples thousands of years old have been found still edible.
4. A shuffled deck of 52 cards has more possible orderings than there have been seconds since the Big Bang.
Confidence
True. 52! ≈ 8×10⁶⁷. Seconds since the Big Bang ≈ 4×10¹⁷. It isn't close.
5. Botanically, bananas qualify as berries but strawberries do not.
Confidence
True. A berry develops from a single flower with one ovary. Bananas fit; strawberries are "accessory fruits" formed from many ovaries.
6. The Great Wall of China is visible to the naked eye from the Moon.
Confidence
False. A persistent myth. It's far too narrow; it isn't even reliably visible to the unaided eye from low Earth orbit.
7. An octopus has three hearts.
Confidence
True. Two pump blood through the gills; one pumps it to the rest of the body.
8. A goldfish's memory lasts only a few seconds.
Confidence
False. Another myth. Goldfish have been trained to remember tasks and routes for weeks or months.
Actually right
Felt sure

The distance between those two numbers is your overconfidence gap. It's not a flaw in you — it's the default human setting. The point of the day is to make that gap small on purpose, by asking "how would I know if I'm wrong?" before you feel sure, not after.
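If you want to track the gap outside this page, the arithmetic is small. The sketch below assumes that "Actually right" is the percentage of statements answered correctly and "Felt sure" is the average stated confidence; how the page itself scores the game is not shown here, and the eight responses are invented.

```python
def overconfidence_gap(responses):
    """responses: list of (was_correct, confidence_pct) pairs.

    Returns (actually_right_pct, felt_sure_pct, gap_in_points).
    """
    n = len(responses)
    actually_right = 100 * sum(1 for correct, _ in responses if correct) / n
    felt_sure = sum(confidence for _, confidence in responses) / n
    return actually_right, felt_sure, felt_sure - actually_right


# Hypothetical scorecard for the eight statements.
responses = [
    (True, 90), (False, 100), (True, 80), (True, 90),
    (True, 100), (False, 90), (True, 80), (True, 90),
]

actually_right, felt_sure, gap = overconfidence_gap(responses)
print(f"actually right: {actually_right:.0f}%")  # -> 75%
print(f"felt sure:      {felt_sure:.0f}%")       # -> 90%
print(f"gap:            {gap:.0f} points")       # -> 15 points
```

A positive gap means overconfidence and a negative one means underconfidence. Eight questions is a very small sample, so treat any single score as a prompt for reflection rather than a measurement.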

Instructor

The test of a real steelman: a sincere proponent of the position would read it and say "yes — that's exactly why I believe this." If they'd say "that's not what I mean," it's still a strawman wearing better clothes.

Keep the chosen position low-stakes and professional. The skill, not the topic, is the point. Steelmanning charged political positions is the same muscle, but it raises the emotional temperature past what a single-day foundation course should carry.

The first three moves clean up reasoning. This one cleans up you. Motivated reasoning — quietly building the case for what you already want to believe — is the failure that survives all the others, because it feels exactly like thinking.

The antidote is a discipline, not a mood: before you judge a position, state the strongest version of it — the version its smartest honest advocate would recognise as their own. Not the weakest version you can dismiss (a strawman), but the one that would actually be hard to beat. Only then have you earned the right to disagree.

Provenance

Anatol Rapoport & Daniel Dennett. The game theorist Anatol Rapoport set out rules for criticising a position well; Daniel Dennett later made them famous. The first and hardest: before you say a single word against a view, re-express it so clearly and fairly that its holder says "thank you — I wish I'd put it that way myself." Only then have you earned the right to disagree. That standard is the rubric below, compressed into one sentence.

The position to steelman

"Companies should abolish annual performance reviews entirely."

Whatever you think of that, your job is to build its strongest case — the version a thoughtful proponent would endorse. Not a caricature you can knock over.

Now score your own steelman honestly:

× Strawman

"People who want to scrap reviews just don't want to be held accountable. Without reviews, no one knows who's underperforming and the whole company drifts. It's an excuse to avoid hard conversations."

✓ Steelman

"The annual review compresses a year into one snapshot dominated by recency bias, and by tying feedback to pay it makes people defensive rather than curious — so the feedback lands worst exactly when it matters most. Replacing the ritual with frequent, low-stakes, forward-looking check-ins can produce more honest coaching and better performance. Several large firms did precisely this and reported gains, which is why the position is worth taking seriously, not dismissing."

The move that converts one into the other

The strawman attacks the people and their motives. The steelman attacks the strongest reason and grants what's true. You can now disagree with the steelman if you like — but you'll be disagreeing with the real thing, which is the only disagreement worth having.

Now run it cold

The real test isn't steelmanning a position you were handed — it's one you actually reject. Pick the one below you most disagree with, and build the case its smartest honest advocate would recognise.

The machine angle — and its trap

You can ask an AI to "steelman the other side" and it will produce something fluent instantly. But a model will just as fluently produce a flattering strawman if that's what your prompt leans toward. The rubric above is exactly the instrument you use to tell a real steelman from a comfortable one — whether a human or a machine wrote it.

Instructor

This is the assessment. Not a quiz — an artifact. A learner who can run all four moves on a real AI output and produce a coherent audit card has demonstrated the competence the day promised. Collect the cards; they're far more diagnostic than any multiple-choice score.

Encourage participants to audit an output that actually matters to them — a real recommendation they're weighing. The transfer to Monday morning is the entire return on this course.

Four moves. One output. This is where it becomes a habit instead of a lesson. Take a real answer from an AI — ideally one you were about to act on — and run the full audit. When you're done, you'll have a card you keep: the four questions, applied, in your own hand.

The machine's five ways of being confidently wrong

"Audit the AI" is too vague to act on. So here is the specific list. Large language models fail in characteristic ways — and each one is caught by one of the four moves you just learned. This table is the heart of what Bloom AI University means by cultivated, not commanded: you don't fear the tool or worship it, you audit it.

Premise acceptance · inherits your framing
It builds on whatever premise your question contains, without ever checking whether that premise is true. Meera's "complaints = problems" went straight through.
Move II
Confident fabrication · confabulation / "hallucination"
It states things that are false with exactly the same fluency it uses for things that are true. There is no tonal tell. The surface gives you nothing.
Move III
Fluency as authority · format does the persuading
Clean structure and decisive prose read as correctness. The formatting performs a confidence the content never earned — and hides the leap.
Move I
Sycophancy · agrees with your lean
Ask it to support a view and it will, fluently. It tends to give back the position you brought in, dressed up — rarely the strongest case against you.
Move IV
Anchoring to the prompt · framing shapes the "truth"
Ask the same question two ways and you can get two different answers. The leap it makes is often just the one your wording invited.
Move I
The point

Notice that none of these are caught by knowing more facts, or by a better prompt. They're caught by the same four questions you'd run on a human. The machine didn't create the need for clear thinking — it raised the price of not having it.

Choose what to audit

I The Atom — find the leap

Pull the output apart. Each label stays on screen — write beside it.

II The Hidden Premise
III The Falsification Test
IV The Steelman
Reasoning Audit Card
Clear Thinking · Bloom AI University
The output audited

I · The leap

II · The hidden premise

III · How I'd know it's wrong

IV · The strongest objection

Clarity in the Age of AI ™
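For learners who would rather keep their cards as text files than on paper, the card maps onto a small record with one field per heading. The sketch below is one possible shape, not part of the course materials: the `AuditCard` class, its field names, and the sample entries are all invented for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class AuditCard:
    """The four moves applied to one AI output (hypothetical field names)."""
    output_audited: str
    leap: str                 # I · The leap
    hidden_premise: str       # II · The hidden premise
    disconfirmer: str         # III · How I'd know it's wrong
    strongest_objection: str  # IV · The strongest objection

    def unfinished(self) -> list[str]:
        """Names of fields still blank, so a half-filled card is not filed as done."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Invented example entries; Move III has been left blank on purpose.
card = AuditCard(
    output_audited="Recommendation: hire two more support agents this quarter.",
    leap="Goes from 'ticket volume rose' straight to 'we are understaffed'.",
    hidden_premise="That the extra tickets reflect real workload, not duplicates.",
    disconfirmer="",
    strongest_objection="Volume may fall on its own once the release stabilises.",
)

print("Still to fill in:", card.unfinished())
```

The only logic worth having here is the blank check: a card with an empty Move III is the most common way an audit quietly stops short.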
What you walk out with

Not a certificate of attendance — a reflex. Four questions you can run on any claim, your own or a machine's, in under a minute: Where's the leap? What's it standing on? How would I know it's wrong? What's the strongest case against it? Run them before you act, and the confident wrong answer loses its power over you.

The Field Guide
Four questions for any claim · Clear Thinking
Move I · The Atom

"What's the claim, the evidence, and where's the leap?"

The tell: a confident voice and clean formatting — that's usually where the leap is hiding. Catches: fluency mistaken for correctness.

Move II · The Hidden Premise

"What would have to be true for this to follow?"

The tell: the assumption too obvious to state out loud is the one doing the work. Catches: an AI building on your unchecked premise.

Move III · The Falsification Test

"How would I know if I'm wrong?"

The tell: if nothing observable could count against it, it isn't knowledge. Catches: confident fabrication and your own overconfidence.

Move IV · The Steelman

"What's the strongest version of the other side?"

The tell: if a real proponent wouldn't recognise it, you built a strawman. Catches: motivated reasoning and an agreeable machine.

The vocabulary of clear thinking
Claim
What is being asserted or recommended — the destination of an argument.
Grounds (evidence)
What is actually observed or shown; the part you could, in principle, go and check.
Warrant (inference / the leap)
The often-unstated move that carries you from grounds to claim. Where assumptions live.
Enthymeme
Aristotle's name for an argument with an unstated premise — that is, almost all of them.
Falsifiability
The property of a claim that it forbids some observation — that it could, in principle, be shown false (Popper).
Calibration
The match between how confident you are and how often you're right. The goal is not less confidence; it's accurate confidence (Tetlock).
Steelman
The strongest, most charitable version of a position — the one its best advocate would endorse. The opposite of a strawman.
Motivated reasoning
Quietly assembling the case for what you already want to believe. It feels exactly like thinking.
Illusory truth effect
The tendency for statements that are easy to process or often repeated to feel truer, regardless of whether they are.
Automation bias
The tendency to over-trust and under-scrutinise outputs that come from an automated system.

Facilitation & Assessment

This is a single-day, ~5-hour course. The engines carry the learning; your job is to protect the commit-before-reveal sequence and run the debriefs. The capstone Audit Card is the assessment — grade the reasoning, not the polish.

Suggested arc (5 hours)
0:00–0:25 · Ground & the Trap. Everyone commits a choice before the reveal. Debrief: who committed to "Act," and why did the format earn trust?
0:25–1:25 · Move I & Claim Splitter. Surface the "inference-tagged-as-evidence" error from the scores.
1:25–2:25 · Move II & Premise Excavator. Draw out true-but-not-load-bearing distractors.
2:25–3:30 · Move III, Calibration first. Let the overconfidence gap land before explaining it.
3:30–4:15 · Move IV & Steelman. Hold the room to Rapoport's bar.
4:15–5:00 · The Audit. Each learner audits a real AI output; collect the cards.
Audit Card rubric
I · Finds the leap
Developing: Restates the output; claim and evidence blurred.
Proficient: Separates claim, evidence, and the inference between them.
Exemplary: Names the precise sentence doing the unearned work.
II · Names the premise
Developing: Lists a premise that's true but not load-bearing.
Proficient: Identifies the assumption the conclusion depends on.
Exemplary: Shows how the conclusion collapses if it fails, in their own context.
III · Real disconfirmer
Developing: Vague or unfalsifiable ("if it's just wrong").
Proficient: Specific and observable.
Exemplary: Specific, observable, and cheap to check soon.
IV · Honest steelman
Developing: Strawman; attacks motives.
Proficient: Fair restatement of the strongest objection.
Exemplary: An objection a real proponent would thank them for.

Discussion prompts: Where did the AI's fluency do the most persuading? Which of the five failure modes showed up in your specimen? What would you now ask the AI before acting on its next answer?
