How to Calibrate AI IELTS Band Predictions
Anchor scripts · Blind tasks · May 2026
Calibration means learning how much your AI tool over- or under-shoots on your writing and speaking, rather than accepting its headline band. You test the same skill under three conditions: a familiar prompt (baseline), a blind unseen prompt (stress test), and one examiner-style rubric check (ground truth). The gap between the AI band and the human band on blind tasks becomes your personal offset. Without calibration, every Band 7 is noise.
Step 1: Build anchor scripts per skill
Anchor scripts are short, criterion-labelled samples at Band 5.5, 6.5, and 7.5 that you feed to the same AI checker. If it rates a known Band 6.5 paragraph as 7.5, the tool has a +1.0 optimism bias on Task Response in Writing.
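As a minimal sketch of the anchor check, the bias for one criterion can be taken as the average of "AI band minus known band" across your anchor scripts. The AI bands below are made-up illustration values, and `anchor_bias` is a hypothetical helper name, not part of any tool.

```python
from statistics import mean

# Hypothetical anchor results for one criterion (Task Response):
# (known band of the anchor script, band the AI checker returned).
anchor_results = [
    (5.5, 6.5),
    (6.5, 7.5),
    (7.5, 8.0),
]

def anchor_bias(results):
    """Mean signed gap, AI minus known band. Positive means the AI is optimistic."""
    return mean(ai - known for known, ai in results)

bias = anchor_bias(anchor_results)
print(f"Task Response bias: {bias:+.2f} bands")
```

Run this once per criterion and per skill, since a tool that is optimistic on Task Response may still be accurate elsewhere.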
Step 2: Blind unseen prompts weekly
Calibration fails when you rehearse prompts. Each week, take one Writing Task 2 and one Speaking Part 2 from a pool you have not outlined. Submit your response raw, with no post-editing before scoring. Then subtract your anchor offset from the AI band; do not trust the raw number.
Pair with false AI confidence checks if praise stays high on blind tasks.
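One way to apply the offset to a blind-task score is a simple subtraction, clamped to the 0 to 9 band scale. This is a sketch under the assumption that your tool's bias is roughly constant across bands; `corrected_band` and the numbers are hypothetical.

```python
def corrected_band(ai_band, offset):
    """Subtract the tool's known optimism offset and clamp to the 0-9 band scale."""
    return max(0.0, min(9.0, ai_band - offset))

# Hypothetical week: the AI gave 7.0 on a blind Task 2 essay,
# and anchor scripts showed a +1.0 optimism offset for this tool.
print(corrected_band(7.0, 1.0))  # 6.0
```

A negative offset (a pessimistic tool) raises the corrected band instead of lowering it.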
Step 3: Criterion log instead of overall band
| Criterion | AI said | Human/mock said | Gap (AI minus human) |
|---|---|---|---|
| Task Response | 7 | 6 | +1.0 (optimistic) |
| Coherence | 7 | 6.5 | +0.5 (optimistic) |
| Lexis/Grammar | 6.5 | 6.5 | 0 (aligned) |
An overall band hides which criterion keeps leaking marks; see why scores disagree.
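The log can be kept as plain data and the gaps computed rather than typed by hand. The sketch below uses the three rows from the table; `criterion_gaps` is a hypothetical helper name.

```python
# Rows copied from the table above: (criterion, AI band, human/mock band).
log = [
    ("Task Response", 7.0, 6.0),
    ("Coherence", 7.0, 6.5),
    ("Lexis/Grammar", 6.5, 6.5),
]

def criterion_gaps(rows):
    """Return {criterion: AI minus human}, so positive values mean AI optimism."""
    return {name: ai - human for name, ai, human in rows}

gaps = criterion_gaps(log)
# The criterion with the largest absolute gap is the one to recheck first.
worst = max(gaps, key=lambda name: abs(gaps[name]))

for name, gap in gaps.items():
    label = "aligned" if gap == 0 else f"{gap:+.1f}"
    print(f"{name}: {label}")
print("Largest gap:", worst)
```

Sorting by absolute gap keeps attention on the criterion where the AI is least reliable, which an overall band would average away.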
Key takeaways
- Anchor scripts reveal your tool's systematic bias per criterion.
- Blind prompts expose inflation from familiarity and editing.
- Log criterion gaps, not mood-driven overall bands.
- Treat your personal offset as reliable only once it holds steady across 3–4 weeks of data.
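Whether an offset has settled can be checked with a rough rule of thumb: enough weeks of data, and weekly gaps that stay within a narrow spread. The half-band tolerance and the three-week minimum below are assumptions chosen for illustration, and `is_stable` is a hypothetical helper.

```python
# Hypothetical weekly gaps (AI minus human) for one criterion.
weekly_gaps = [1.0, 1.0, 0.5, 1.0]

def is_stable(gaps, min_weeks=3, tolerance=0.5):
    """True if there are enough weeks and the gaps span no more than `tolerance` bands."""
    if len(gaps) < min_weeks:
        return False
    return max(gaps) - min(gaps) <= tolerance

print(is_stable(weekly_gaps))  # True
```

If the check fails, keep logging blind tasks before relying on the offset.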