AI Labs · updated 2026-06-06 · methodology v2.1

IMF Surcharge Reform 2024: Numeric Baseline Failures Across Model Configurations

Executive summary

Claude Opus 4.7 with web search and Claude Sonnet 4.6 with web search both produced the same wrong pre-reform baseline when asked about the IMF's October 2024 surcharge reform: each cited 19 surcharge-paying countries, where the IMF's own published record establishes 20. The regulation is the IMF Charges and Surcharge Reform (2024), effective 1 November 2024, and the Board's published documentation gives explicit before and after country counts. The error is not a matter of paraphrase: both models committed to a specific integer that diverges from the regulator's figure. They arrived at the same wrong number via different failure paths, one reconstructing it from training data and the other deferring to a third-party source that had already introduced the error. When two models converge on the same specific wrong number through different mechanisms, it signals that the correct figure is systematically under-indexed, relative to the widely circulated wrong figure, in the content that both training pipelines and live retrieval draw from.

Findings — impact summary

This is the consolidated view of findings. Click 'see details →' on any item for the full finding.

  1. Finding on 'Q004 Probe' for Claude Opus 4.7 with web search (Citation ID: ONRLB-H-INT-IMF-IMF-CHARGES-SURCHARGE-REFORM-2024-Q004-Opus47)

    This error implicates the training-data corpus for IMF policy content: the model held the wrong pre-reform baseline (19) as a confident fact rather than retrieving the primary document to verify it. The failure is on the training side. The correct integer appears to have been absent or lower-ranked in the content the model learned from, likely because secondary commentary circulated the wrong figure before the IMF's authoritative text was widely indexed. The post-reform figure was correct, which indicates that the error is not a general gap in knowledge of the reform but a specific wrong value baked into the training-data representation of the pre-reform state.

    see details →
  2. Finding on 'Q004 Probe' for Claude Sonnet 4.6 with web search (Citation ID: ONRLB-H-INT-IMF-IMF-CHARGES-SURCHARGE-REFORM-2024-Q004-Sonnet46)

    This error implicates the retrieval-authority ranking in the web-search stack: the model deferred to a third-party source that had already introduced the wrong baseline (19 rather than 20), without cross-checking against the IMF's primary document. The downstream arithmetic (11 remaining, 8 relieved) is internally consistent with the wrong baseline, so the response passed its own coherence check while the foundational figure was off by one. The retrieval ranker is treating third-party regulatory commentary as co-equal in authority to the regulator's primary text for numeric threshold queries, which is the proximate cause of this class of failure.

    see details →
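The gap between internal coherence and baseline correctness in the second finding can be illustrated with a minimal sketch. It uses only the integers reported above (19, 11, 8, and the IMF baseline of 20). The class and function names are hypothetical and introduced purely for illustration; they do not describe how either model or the evaluation pipeline actually works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SurchargeAnswer:
    """Integers a model committed to in its response (hypothetical structure)."""
    baseline: int   # surcharge-paying countries before the reform
    remaining: int  # countries still paying surcharges after the reform
    relieved: int   # countries no longer paying surcharges after the reform


def is_internally_coherent(answer: SurchargeAnswer) -> bool:
    """True when the downstream figures add up to the stated baseline."""
    return answer.remaining + answer.relieved == answer.baseline


def matches_primary_baseline(answer: SurchargeAnswer, primary_baseline: int) -> bool:
    """True when the stated baseline equals the regulator's published figure."""
    return answer.baseline == primary_baseline


# Pre-reform count per the IMF record, as reported in this paper.
PRIMARY_BASELINE = 20

# Figures from the Claude Sonnet 4.6 finding above.
sonnet_answer = SurchargeAnswer(baseline=19, remaining=11, relieved=8)

coherent = is_internally_coherent(sonnet_answer)
grounded = matches_primary_baseline(sonnet_answer, PRIMARY_BASELINE)

print(f"internally coherent: {coherent}")
print(f"matches primary baseline: {grounded}")
```

Under these assumptions the coherence check passes while the baseline check fails, which is why a consistency check alone cannot catch this class of error: the comparison has to be made against the primary document's figure.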

Every finding on this page compares an AI subject's account of the rule against the regulator's verbatim text from the regulator's own portal. Both are linked. Each delta, its root causes, and its impact analysis are documented and published with immutable Citation IDs.