How to Calibrate AI IELTS Band Predictions

Anchor scripts · Blind tasks · May 2026

Direct answer

Calibration means learning how much your AI tool overshoots or undershoots on your writing and speaking, rather than accepting its headline band. You test the same skill under three conditions: a familiar prompt (baseline), a blind unseen prompt (stress test), and one examiner-style rubric check (ground truth). The gap between the AI band and the human band on blind tasks becomes your personal offset. Without calibration, a Band 7 from the tool is noise.

Step 1: Build anchor scripts per skill

Anchor scripts are short, criterion-labelled samples at Band 5.5, 6.5, and 7.5 that you feed to the same AI checker. If it rates a known Band 6.5 paragraph as 7.5, you have a +1.0 optimism bias on Writing Task Response.

Writing anchor: one TR-weak / LR-strong paragraph at each band
Speaking anchor: 30-second answers with controlled fluency and vocabulary
Log offset: record AI band minus expected band per anchor
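
The offset log above can be kept as a small script. This is a minimal sketch, assuming you store each anchor as a skill, an expected band, and the band the AI tool returned; the sample values and the name `anchor_offsets` are illustrative, not real results.

```python
from statistics import mean

# Hypothetical anchor log: (skill, expected band, band the AI tool returned).
# These numbers are made up for illustration.
anchor_log = [
    ("writing", 5.5, 6.5),
    ("writing", 6.5, 7.5),
    ("writing", 7.5, 8.0),
    ("speaking", 5.5, 6.0),
    ("speaking", 6.5, 7.0),
    ("speaking", 7.5, 7.5),
]


def anchor_offsets(log):
    """Return the mean (AI minus expected) offset per skill, rounded to 2 places."""
    by_skill = {}
    for skill, expected, ai in log:
        by_skill.setdefault(skill, []).append(ai - expected)
    return {skill: round(mean(diffs), 2) for skill, diffs in by_skill.items()}


offsets = anchor_offsets(anchor_log)
print(offsets)  # a positive value means the tool is optimistic for that skill
```

A positive offset means the tool scores above the known band; a negative one means it scores below. Averaging across the three anchors is one simple choice; you could also keep the offsets separate per band if the bias changes with level.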

Step 2: Blind unseen prompts weekly

Calibration fails when you rehearse prompts. Each week, attempt one Writing Task 2 and one Speaking Part 2 from a pool you have not outlined. Submit the response raw, with no editing before scoring. Then subtract your anchor offset from the AI band; do not trust the raw number.
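
Applying the offset to a blind-task score is simple arithmetic. The sketch below assumes the offset is defined as AI band minus expected band, so correcting means subtracting it; rounding to the nearest half band with ties going up is an assumption made here for readability, not an official scoring rule.

```python
import math


def round_half_band(value):
    """Round to the nearest 0.5; ties round up (an assumption, not an official rule)."""
    return math.floor(value * 2 + 0.5) / 2


def adjusted_band(raw_ai_band, offset):
    """Subtract the anchor offset (AI minus expected) from a raw AI band."""
    return round_half_band(raw_ai_band - offset)


# Hypothetical example: the tool says 7.0 on a blind essay and your
# writing offset from the anchors is +0.83.
adjusted = adjusted_band(7.0, 0.83)
print(adjusted)
```

With these illustrative numbers, a raw 7.0 is read as roughly a 6.0 once the optimism is removed.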

Pair with false AI confidence checks if praise stays high on blind tasks.

Step 3: Criterion log instead of overall band

Criterion | AI said | Human/mock said | Gap
Task Response | 7 | 6 | +1 optimism
Coherence | 7 | 6.5 | +0.5
Lexis/Grammar | 6.5 | 6.5 | aligned

Overall bands hide which leak persists; see why scores disagree.
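
To see how an overall figure can mask a single leaking criterion, the log can be computed per criterion. This is a sketch using the three rows from the table above; the plain average is only an illustration of how a summary number dilutes the gap, not the official way an overall band is derived.

```python
# Criterion log: name -> (AI band, human/mock band), taken from the table above.
criterion_log = {
    "Task Response": (7.0, 6.0),
    "Coherence": (7.0, 6.5),
    "Lexis/Grammar": (6.5, 6.5),
}


def criterion_gaps(log):
    """Return AI minus human gap for each criterion."""
    return {name: ai - human for name, (ai, human) in log.items()}


gaps = criterion_gaps(criterion_log)

# The criterion where the tool is most optimistic.
worst = max(gaps, key=gaps.get)

# A simple average of the gaps, for comparison only.
average_gap = sum(gaps.values()) / len(gaps)

for name, gap in gaps.items():
    label = "aligned" if gap == 0 else f"{gap:+.1f}"
    print(f"{name}: {label}")
print(f"Largest gap: {worst}; average gap: {average_gap:+.1f}")
```

Here the average gap is +0.5, while Task Response alone is off by a full band, which is the detail an overall number would hide.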

Key takeaways

  • Anchor scripts reveal your tool's systematic bias per criterion.
  • Blind prompts expose inflation from familiarity and editing.
  • Log criterion gaps, not mood-driven overall bands.
  • Your personal offset becomes reliable once you have 3–4 weeks of data.

FAQ

Three to four weeks of one blind task per skill plus one human check is enough for a reliable offset.
Yes—see ChatGPT grading limits and use the same anchors.
Familiarity inflation—your blind-task offset is the number that matters for test day.
