AI Labs · published 2026-05-29 · methodology v2.1

AI Model Hallucination Patterns on CPMI-IOSCO PFMI: A RegLeg Research Report

Findings — impact summary

This is the consolidated view of findings. Click 'see details →' on any item for the full details of that finding.

Finding 1. Claude Opus 4.7 with web search · RLB-H-INT-BIS-CPMI-IOSCO-PFMI-2012-Q005-Opus47
This finding implicates the training data boundary directly: the model correctly recalled the six-month LNAFE floor from the core PFMI text but fabricated specific compliance monitoring findings from a November 2025 document it could not have seen. The citation generator produced real-format BIS URLs without verifying their content, pointing to a gap in the RAG/retrieval grounding layer that should be testable against the BIS publication naming convention.
see details →
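A check of this kind can be sketched by separating "the URL has a plausible BIS shape" from "the content behind it was retrieved". The sketch below is a minimal illustration, not the report's method. It assumes the two URL shapes named in this report (the d### publication series and press/p######.htm press releases); the file extensions, the `example.org` host, and all function names are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumed URL shapes, based on the patterns named in this report.
# The .htm/.pdf extensions for the d### series are an illustrative assumption.
PUBLICATION_RE = re.compile(r"/d\d{3}\.(?:htm|pdf)$")
PRESS_RE = re.compile(r"/press/p\d{6}\.htm$")


@dataclass
class CitationCheck:
    url: str
    well_formed: bool  # URL matches one of the assumed naming patterns
    grounded: bool     # URL is among the pages the retrieval step actually fetched


def check_citation(url: str, retrieved_urls: set[str]) -> CitationCheck:
    """Classify one cited URL by format and by retrieval grounding."""
    well_formed = bool(PUBLICATION_RE.search(url) or PRESS_RE.search(url))
    return CitationCheck(url, well_formed, url in retrieved_urls)


def format_only(checks: list[CitationCheck]) -> list[str]:
    """Return URLs that look right but were never retrieved."""
    return [c.url for c in checks if c.well_formed and not c.grounded]
```

A citation that appears in `format_only` is the case this finding describes: correct in form, unverified in content.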
Finding 2. Claude Opus 4.7 with web search · RLB-H-INT-BIS-CPMI-IOSCO-PFMI-2012-Q009-Opus47
The model added a category ('Not Applicable') to a four-category compliance rating scale that does not appear in the regulator's text — a small but materially consequential error for any user building a compliance assessment framework. This points to a training-data gap in the assessment methodology document specifically, which is less widely cited than the core PFMI and may be underrepresented in training corpora.
see details →
Finding 3. Claude Opus 4.7 with web search · RLB-H-INT-BIS-CPMI-IOSCO-PFMI-2012-Q011-Opus47
The model reconstructed a narrative about CPMI document lineage from training-weighted recall, attributing specific titles and dates to publication identifiers that do not match the regulator's record. This is a compound error — a document-identity hallucination compounded by a Pretextual citation — that would be difficult to detect without authoritative document mapping. A synthetic eval probe covering the BIS d### publication series would directly surface this class of failure.
see details →
Finding 4. Claude Opus 4.7 with web search · RLB-H-INT-BIS-CPMI-IOSCO-PFMI-2012-Q012-Opus47
The model fabricated specific paragraph references (§1.20 and §1.21) that cannot be verified, presenting them with the same confidence as correctly recalled general principles. Paragraph-level citation hallucination is a known failure mode that is difficult to catch in standard capability evals — this finding supports the case for verbatim-paragraph probes on the PFMI primary document.
see details →
Finding 5. Claude Opus 4.7 with web search · RLB-H-INT-BIS-CPMI-IOSCO-PFMI-2012-Q014-Opus47
The model correctly identified a May 2026 consultative document but produced a scope characterisation that omits material areas of the consultation. The citation (a BIS press release) does not confirm the claimed scope. This is a partial-knowledge failure — the model knows the document exists but cannot accurately characterise its content, and the retrieval step does not correct the gap.
see details →
Finding 6. Claude Opus 4.7 with web search · RLB-H-INT-BIS-CPMI-IOSCO-PFMI-2012-Q017-Opus47
This is the strongest evidence of confident fabrication on post-cutoff content: a detailed account of a November 2025 consultation — including specific proposed changes to a quantitative threshold — supported by three Pretextual citations. The specificity of the fabricated content (proposed changes to LNAFE methodology) suggests the model is extrapolating plausibly from the direction of prior CPMI-IOSCO work, a pattern that would be very difficult to detect without authoritative source comparison.
see details →
Finding 7. Claude Opus 4.7 with web search · RLB-H-INT-BIS-CPMI-IOSCO-PFMI-2012-Q021-Opus47
The model's response reflects training-weighted salience rather than the most responsive source for the question: it anchored on the most frequently cited DLT document in its training data rather than identifying the more directly responsive senior-official statements. This is a retrieval-relevance failure rather than a factual fabrication, but it would mislead a user seeking the most current and directly applicable CPMI position on DLT finality.
see details →
Finding 8. Claude Sonnet 4.6 with web search · RLB-H-INT-BIS-CPMI-IOSCO-PFMI-2012-Q005-Sonnet46
This finding mirrors the Opus 4.7 failure on the same question but with higher quantitative specificity: Sonnet 4.6 produced exact counts (34 FMIs, 27 jurisdictions, six serious issues of concern) that have the form of precise findings from the November 2025 monitoring report. The parallel structure across both models on this question is diagnostic — both are drawing on the same plausible-inference template, suggesting this is a training-data-level issue rather than a model-specific generation artefact.
see details →
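Fabricated counts of this kind lend themselves to a simple numeric grounding probe: every figure in the response should also appear in the text the retrieval step returned. The following is a minimal sketch under that assumption; the function name is hypothetical, and it only sees digits, so a spelled-out count such as "six" would need separate handling.

```python
import re

# Matches integers and simple decimals written with digits.
NUM_RE = re.compile(r"\b\d+(?:\.\d+)?\b")


def ungrounded_numbers(response: str, source_text: str) -> list[str]:
    """Return numeric tokens in the response that never occur in the source text."""
    source_numbers = set(NUM_RE.findall(source_text))
    return [n for n in NUM_RE.findall(response) if n not in source_numbers]
```

Matching on the token alone is deliberately loose: a number that appears in the source in an unrelated context still passes, so this probe can only flag candidates for review, not confirm that a figure is supported.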
Finding 9. Claude Sonnet 4.6 with web search · RLB-H-INT-BIS-CPMI-IOSCO-PFMI-2012-Q010-Sonnet46
The four-part characterisation of the 2017 recovery guidance revision is a structured fabrication — the model produced a numbered list of specific clarification areas that has the formal appearance of document analysis but cannot be traced to the primary source. The cited IOSCO URL resolves to an unrelated document. This is a high-risk failure mode for users building compliance gap analyses against the PFMI recovery framework.
see details →
Finding 10. Claude Sonnet 4.6 with web search · RLB-H-INT-BIS-CPMI-IOSCO-PFMI-2012-Q014-Sonnet46
The model correctly identified the May 2026 consultation but produced a narrower scope characterisation than the actual document, and added a citation to an FSB 2022 document as supporting context — a temporal and topical mismatch that suggests the model is sourcing supporting citations from a related-topic cluster in its training data rather than from retrieved content. The multi-citation pattern with mixed-vintage sources is a diagnostic signal for citation-generation behaviour.
see details →
Finding 11. Claude Sonnet 4.6 with web search · RLB-H-INT-BIS-CPMI-IOSCO-PFMI-2012-Q017-Sonnet46
The fabricated comment-period closing date (6 February 2026) is the most operationally consequential single error in this finding set: a compliance professional managing a consultation response who acts on this date would miss the actual deadline. This specific type of fabrication — a precise, actionable date presented with high confidence — should be explicitly targeted in safety and harm-assessment eval design for models deployed in legal and compliance contexts.
see details →
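An eval that targets precise, actionable dates could extract every date from a response and flag those absent from a verified record. The sketch below assumes dates written as "6 February 2026" and a caller-supplied set of verified dates; both the format restriction and the function names are illustrative assumptions.

```python
import re
from datetime import date

MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}
DATE_RE = re.compile(r"\b(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{4})\b")


def extract_dates(text: str) -> list[date]:
    """Parse 'D Month YYYY' dates; an impossible date raises ValueError."""
    return [date(int(y), MONTHS[m], int(d)) for d, m, y in DATE_RE.findall(text)]


def unsupported_dates(response: str, verified: set[date]) -> list[date]:
    """Return dates asserted in the response that are not in the verified record."""
    return [d for d in extract_dates(response) if d not in verified]
```

For example, `unsupported_dates("Comments close on 6 February 2026.", set())` flags that date, because nothing in the verified record supports it.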
Finding 12. Claude Sonnet 4.6 with web search · RLB-H-INT-BIS-CPMI-IOSCO-PFMI-2012-Q022-Sonnet46
The specific key consideration reference (KC 2.5) with quoted normative language is a compound error: the KC number, the quoted text, and the normative characterisation may all be wrong relative to the primary document, but the structured presentation makes the error difficult to detect without direct document access. The citation to a third-party FMI disclosure document (rather than the PFMI primary source) adds a further layer of unverifiability — a pattern where the model's citation generator reaches for the most accessible document that mentions PFMI rather than the authoritative source.
see details →
Finding 13. Claude Sonnet 4.6 with web search
Well-calibrated refusal. The model correctly identified the verbatim-access boundary and declined to fabricate. This finding is evidence that the model has a functioning refusal mechanism for verbatim-recall requests: the failure modes in other findings are not a simple calibration deficit but are specific to content-claim generation, where the model does not treat the prompt as a verbatim-recall request.
see details →
Finding 14. Claude Sonnet 4.6 with web search
Calibrated refusal with a minor citation inconsistency: the model declined to fabricate verbatim content but produced a Pretextual citation to a third-party commentary page. The citation generation is running even when the content generation is suppressed — this is a distinct signal about where in the generation pipeline the citation step sits relative to the content-verification step.
see details →
Finding 15. Claude Sonnet 4.6 with web search
Well-calibrated refusal with no citation errors. The absence of a citation here (compared to findings where inaccessible PDFs are cited alongside refusals) may reflect that the model's citation generator did not produce a confident URL for this specific press release format — useful signal for understanding the citation generation trigger conditions.
see details →
Finding 16. Claude Sonnet 4.6 with web search
The pattern of citing an acknowledged-inaccessible PDF alongside a refusal to produce its content is a recurring presentation inconsistency that warrants attention: a user seeing a citation alongside a refusal may interpret the citation as confirming the source exists and is relevant, when the model has no knowledge of what the source contains. This is a citation-trust failure independent of content fabrication.
see details →
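The citation-alongside-refusal pattern can be detected mechanically once each response is labelled with whether the model declined and which URLs were actually retrieved. This is a minimal sketch; the `ModelResponse` record and its fields are hypothetical stand-ins for whatever an eval harness logs.

```python
from dataclasses import dataclass, field


@dataclass
class ModelResponse:
    declined_content: bool                             # model refused to state the source's content
    citations: list[str] = field(default_factory=list)  # URLs shown to the user
    retrieved: set[str] = field(default_factory=set)    # URLs whose text was actually fetched


def citation_alongside_refusal(response: ModelResponse) -> list[str]:
    """Return cited URLs that accompany a refusal but were never retrieved."""
    if not response.declined_content:
        return []
    return [url for url in response.citations if url not in response.retrieved]
```

A non-empty result marks a response where the user sees a citation the model has no knowledge of, even though no content was fabricated.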
Finding 17. Claude Sonnet 4.6 with web search
Same citation-alongside-refusal pattern as finding 26. The IOSCO disclosure framework and assessment methodology (IOSCOPD396) is a document that alignment and evals teams should specifically target with verbatim-paragraph probes: it contains the formal rating scale whose miscategorisation appeared in Finding 2, and it is the kind of technical document that is underrepresented in training corpora relative to its regulatory importance.
see details →
Finding 18. Claude Sonnet 4.6 with web search
Well-calibrated refusal with the same citation-alongside-refusal pattern seen in the Opus 4.7 parallel finding. The consistency of this pattern across both models on the same question is informative: both models cite the inaccessible PDF while declining to produce its content, which suggests the citation step runs before or independently of the content-verification step in the generation pipeline.
see details →
Finding 19. Claude Sonnet 4.6 with web search
Well-calibrated refusal with citation-alongside-refusal. The parallel with the Opus 4.7 finding on the same question confirms the cross-model consistency of this pattern for the d228 document specifically — a document that is formally published but not retrievable as text, which places it in a category that should be specifically represented in retrieval-failure eval probes.
see details →
Finding 20. Claude Sonnet 4.6 with web search
Well-calibrated refusal with citation-alongside-refusal. The BIS press release format (press/p######.htm) is a specific URL pattern the model cites while acknowledging it cannot access the page — useful signal for testing whether the citation generator can be grounded against actual HTTP resolution status.
see details →
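Grounding citations against HTTP resolution status could look like the sketch below. It is a hedged illustration: `unresolved_citations` takes the status lookup as a parameter so it can be tested offline, and `http_status` is one possible lookup built on the standard library. Both names are hypothetical, and a 2xx status only shows that the page resolves, not that it supports the claim.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for a HEAD request, or 0 if the host is unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:   # 4xx/5xx responses
        return exc.code
    except urllib.error.URLError:           # DNS failure, refused connection, etc.
        return 0


def unresolved_citations(
    urls: Iterable[str], get_status: Callable[[str], int]
) -> dict[str, int]:
    """Map each cited URL that does not return a 2xx status to its status code."""
    failures = {}
    for url in urls:
        status = get_status(url)
        if not 200 <= status < 300:
            failures[url] = status
    return failures
```

In a live harness the call would be `unresolved_citations(cited_urls, http_status)`; in tests, a stub lookup keeps the check deterministic.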
Finding 21. Claude Sonnet 4.6 with web search
Citation-alongside-refusal on the IOSCO PFMI primary document. This finding, combined with its Opus 4.7 parallel, establishes that neither model can retrieve verbatim content from the IOSCOPD377-PFMI PDF — a document that is the co-authoritative source for the entire PFMI framework. The gap between a model that knows the document exists and can cite its URL, and a model that can actually read and recall its key consideration text, is the central alignment gap this research surfaces.
see details →
Finding 22. Claude Sonnet 4.6 with web search
Citation-alongside-refusal on the IOSCO disclosure framework and assessment methodology. Combined with Finding 2 (where Opus 4.7 miscounted the rating scale categories), this finding establishes that the assessment methodology document is not reliably accessible to either model, which means any model-produced compliance assessment against PFMI standards is operating on reconstructed rather than retrieved criteria.
see details →
