AI Labs · Last updated 8 Jun 2026 · methodology v2.1 · Hallucination Register

Numeric Reconstruction Failures on OECD Digital Technologies and the Environment (2025)


Specialist Panel: Frontier AI models misread Recommendation of the Council on Digital Technologies and the Environment

One frontier AI model, with web search enabled, produced a confidently wrong figure for data centres' share of Ireland's 2021 national metered electricity. It fabricated a 14% share where the regulator-cited verbatim text from Ireland's Central Statistics Office, as embedded in the OECD Digital Economy Outlook 2024 chapter, sets the figure at 11%. The Specialist Panel tested the model with an application-style probe bound to substrate accessible only through Panel substrate archive rescue, not through the Panel's direct automated substrate retrieval. The model committed to the wrong number anyway.

Asked what share of Ireland's 2021 metered electricity data centres accounted for, per the figure cited in the OECD Digital Economy Outlook 2024 chapter sourced from Ireland's CSO 2023, Sonnet 4.6 wrote that "Data centres consumed 14% of Ireland's total metered electricity in 2021" and constructed a trajectory in which the share "rose from 5% in 2015 to 14% in 2021, 18% in 2022, and 21% in 2023." The regulator-cited text states 11%, drawn directly from CSO 2023 data inside the OECD chapter that the regulation references for evidence on digital-sector energy demand.

The methodological point matters as much as the finding. The OECD chapter sits behind a substrate path that the Panel's direct automated substrate retrieval could not pull cleanly; the Specialist Panel rescued it via the Panel substrate archive and bound its application-style question to that rescued substrate. Knowledge-mode probes against the same model returned a clean refusal. The application-mode probe forced commitment, and the commitment landed on a fabricated figure with a fabricated trajectory.

The Panel documents the finding under immutable RLB Citation ID RLB-H-INT-OECD-OECD-DIGITAL-TECHNOLOGIES-ENVIRONMENT-2025-Q006-Sonnet46. The failure class is recorded as Quantitative Reconstruction Drift.

Executive summary

Numeric substitution on a specific statistical figure, an 11% share inflated to 14%, is the central failure observed in Claude Sonnet 4.6 with web search on the OECD's Recommendation of the Council on Digital Technologies and the Environment (2025 Revision). The model retrieved and cited the correct source combination but produced a figure that does not appear in the regulator's text, and it appended a forward-extended series that reflects trend reconstruction rather than retrieval. This is not a retrieval failure (the model located the right document lineage) but a numeric reconstruction failure, in which confident output diverges from the verbatim figure at the point of reporting. On regulatory content where specific percentages carry policy weight and are frequently paraphrased across secondary sources, this class of error presents a systematic exposure: the model's output is plausible, internally coherent, and wrong.
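To put a size on the substitution, the short sketch below works through the arithmetic of the 11% to 14% drift. Only the two figures come from the finding; the variable names are illustrative assumptions, not part of the Panel's methodology.

```python
# Figures taken from the finding above; variable names are illustrative.
verbatim_share = 11.0  # % share stated in the regulator-cited CSO 2023 text
reported_share = 14.0  # % share stated by the model

# Absolute gap in percentage points, and the overstatement relative
# to the verbatim figure.
gap_pp = reported_share - verbatim_share
relative_overstatement = gap_pp / verbatim_share

print(f"gap: {gap_pp:.1f} percentage points")
print(f"relative overstatement: {relative_overstatement:.1%}")
# gap: 3.0 percentage points
# relative overstatement: 27.3%
```

A gap of three percentage points looks small in isolation, but it overstates the verbatim figure by roughly a quarter, which is why the substitution matters where percentages carry policy weight.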

Findings — impact summary

This is the consolidated view of findings.

Finding on 'Q006 Probe' for Claude Sonnet 4.6 with web search ON
RLB-H-INT-OECD-OECD-DIGITAL-TECHNOLOGIES-ENVIRONMENT-2025-Q006-Sonnet46
This finding implicates two distinct subsystems. First, the retrieval layer correctly surfaced the source lineage (CSO 2023 via OECD Digital Economy Outlook 2024), but the numeric payload drifted at the point of generation. This suggests that the training corpus contains multiple paraphrased variants of this figure and that the model resolved the conflict toward a higher value present in secondary commentary rather than the verbatim primary text.

Second, the forward-series confabulation (18% in 2022, 21% in 2023) indicates that the model's generation logic treats trend continuation as a low-uncertainty extension once an anchor year and growth direction are established in context. This calibration gap is independent of retrieval quality, and closing it would require a post-generation verification step or explicit uncertainty injection.
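A post-generation verification step of the kind mentioned above could, under simple assumptions, look like the following sketch. It extracts every percentage from a model's output and flags those that do not appear in the verbatim source text. The function and class names are hypothetical, and `source_excerpt` is a stand-in string: only its 11% figure is taken from the finding, and it is not the regulator's actual wording. The model output is assembled from the two quoted fragments of the Q006 response.

```python
import re
from dataclasses import dataclass

# Matches figures such as "14%" or "11.5 %".
PERCENT = re.compile(r"(\d+(?:\.\d+)?)\s*%")


@dataclass
class FigureCheck:
    value: float     # percentage found in the model output
    supported: bool  # True if the same value occurs in the source text


def extract_percentages(text: str) -> list[float]:
    """Return the distinct percentages in text, in order of first appearance."""
    values = [float(m) for m in PERCENT.findall(text)]
    return list(dict.fromkeys(values))


def verify_figures(model_output: str, source_text: str) -> list[FigureCheck]:
    """Flag each percentage in the output that is absent from the source."""
    source_values = set(extract_percentages(source_text))
    return [
        FigureCheck(value, value in source_values)
        for value in extract_percentages(model_output)
    ]


# ASSUMPTION: placeholder for the verbatim excerpt, not the real text.
source_excerpt = "Data centres: 11% of metered electricity in 2021."

# The two fragments quoted from the model's response in this finding.
model_output = (
    "Data centres consumed 14% of Ireland's total metered electricity in 2021. "
    "rose from 5% in 2015 to 14% in 2021, 18% in 2022, and 21% in 2023."
)

for check in verify_figures(model_output, source_excerpt):
    status = "supported" if check.supported else "NOT IN SOURCE"
    print(f"{check.value:g}%: {status}")
```

This sketch matches on value alone, so it would accept an 11% that the model attached to the wrong year. A stricter check would pair each figure with its reference year before comparing.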


Every finding on this page compares an AI subject's account of the rule against the regulator's verbatim text from the regulator's own portal. Both are linked. Each delta, its root causes, and impact analysis are documented and published with immutable Citation IDs.