The IMF's March 2026 Review of the Adequacy of the Fund's Precautionary Balances is the biennial Executive Board review of the Fund's main first-line buffer against credit risk. This audit documents six hallucinations by Claude Opus 4.7, each checked against the regulator's own primary text, on the floor, the surcharge-payer trajectory, the Board lexicon strength of the early-review signal, the named geopolitical theatre, and the half-year precautionary balances (PB) level.
AI lab teams deploying frontier models into IMF-adjacent, sovereign-debt-advisory, central-bank-advisory, and sovereign-credit-research settings can expect the failure modes documented here to surface when the model is asked to reproduce a biennial-review parameter, a reform-adjacent count baseline, a regulator-specific characterisation lexicon term, a Board-named geopolitical theatre, or a quarterly-report financial figure.
Across the six findings, three failure mechanisms recur: cycle-trajectory drift on biennial-review parameters, single-value inflation under generation pressure on reform-adjacent baselines, and attribution drift on politically sensitive named items. The implication is that frontier models deployed into IMF-adjacent advisory contexts need retrieval-anchored verification on most-recent-cycle parameters, reform-adjacent baselines, regulator-specific characterisation lexicons, named theatre attributions, and quarterly-report financial figures.
Add a cycle-anchor verification probe to the pre-deployment evaluation pipeline for frontier models deployed into IMF-adjacent advisory contexts. Encode the IMF Board lexicon as a fixed-strength characterisation system in post-training instruction-following. Add a named-theatre attribution probe that checks that the model adds no politically sensitive named item absent from the Board record. Treat quarterly-report financial figures as retrieval-anchored outputs at the schedule-level granularity recorded by the regulator.
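Two of the probes recommended above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the audit's own tooling: the names `AnchoredFact`, `check_cycle_anchor`, and `added_named_items` are hypothetical, the values in the usage note are placeholders rather than IMF figures, and matching is by normalised string comparison only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnchoredFact:
    """A parameter as recorded in the retrieved primary text (hypothetical type)."""
    name: str   # label for the parameter being checked
    cycle: str  # review cycle the recorded value belongs to
    value: str  # value exactly as written in the primary text


def _normalise(text: str) -> str:
    # Collapse whitespace and ignore case so formatting differences do not count as errors.
    return " ".join(text.split()).casefold()


def check_cycle_anchor(model_value: str, model_cycle: str, anchor: AnchoredFact) -> list[str]:
    """Return a list of problems; an empty list means the model output matches the anchor."""
    problems = []
    if _normalise(model_cycle) != _normalise(anchor.cycle):
        problems.append(
            f"{anchor.name}: cycle drift (model: {model_cycle}, source: {anchor.cycle})"
        )
    if _normalise(model_value) != _normalise(anchor.value):
        problems.append(
            f"{anchor.name}: value mismatch (model: {model_value}, source: {anchor.value})"
        )
    return problems


def added_named_items(model_text: str, source_text: str, watchlist: set[str]) -> set[str]:
    """Return watchlist terms present in the model output but absent from the source text."""
    model_norm = _normalise(model_text)
    source_norm = _normalise(source_text)
    return {
        term
        for term in watchlist
        if _normalise(term) in model_norm and _normalise(term) not in source_norm
    }
```

As a usage illustration with placeholder data: given `AnchoredFact("example_parameter", "cycle-B", "100 units")`, calling `check_cycle_anchor("120 units", "cycle-A", anchor)` reports both a cycle drift and a value mismatch, and `added_named_items` flags any watchlist term the model introduced that the source record never names. Substring matching is a deliberate simplification; a production probe would likely need entity resolution and alias handling.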
The RLB Specialist Panel offers AI lab teams structured access to the failure-mode catalogue across IMF, BIS, FSB, IOSCO, and CFTC instruments. The deliverable is a regulator-specific probe set, a fixed-strength characterisation lexicon mapping, and a named-theatre attribution probe across politically sensitive Board records.
Every finding on this page compares an AI subject's account of the rule against the regulator's verbatim text from the regulator's own portal. Both are linked. Each delta, its root causes, and its impact analysis are documented and published with immutable Citation IDs.