AI Labs · Last updated 7 Jun 2026 · methodology v2.3 · Hallucination Register

PFMI Principle 15 Failures: Conditional-Structure Fabrication and Carve-Out Denial across frontier AI models

Specialist Panel: Frontier AI models misread PFMI Level 3 General Business Risk (2025)

Two frontier AI models running with web search enabled, both tested by the RegLeg Brief (RLB) Specialist Panel, produced confidently wrong reconstructions of the CPMI-IOSCO Level 3 Assessment Report on Authorities' Implementation of the PFMI Standards for Financial Market Infrastructures regarding General Business Risk. The report was published by the Bank for International Settlements and IOSCO in November 2025 as BIS CPMI Papers No. 228 / IOSCOPD807.

The RegLeg Brief Specialist Panel tested both models on the assessment's text and on PFMI Principle 15. Its findings document that the models invented a quantitative floor of six months of operating expenses for Principle 15 Key Consideration 3 (KC3) that the standard does not state, fabricated named co-chairs and team co-leads for the Implementation Monitoring Standing Group (IMSG), and compressed the 2023 to 2025 assessment window into "2023 and 2024" while attributing the answer to the published report.

Claude Opus 4.7, asked what the current PFMI Principle 15 minimum standard is for liquid net assets funded by equity, wrote that the floor is "the greater of the resources required to execute the firm's recovery or orderly wind-down plan, and six months of current operating expenses". The assessment report's own reproduction of Principle 15 states the minimum as the liquid net assets needed to implement the firm's recovery or orderly wind-down plan, and separately references the further CPMI-IOSCO guidance on recovery planning issued since 2014; the report does not state a six-months-of-operating-expenses figure as the binding KC3 floor.

The model converted a recovery-plan-sized obligation into a numerically anchored floor that reads as authoritative but does not appear in the source.

Asked who co-chaired the IMSG running the Level 3 exercise, Opus 4.7 declined to name individuals and directed the reader to the report's inside cover. Sonnet 4.6, in the parallel finding, asserted that the IMSG was co-chaired by the US Securities and Exchange Commission's Elizabeth L Fitzgerald and the European Central Bank's Fiona van Echelpoel, with team co-leads Corinna Freund of the ECB and Vishal Shukla of the Securities and Exchange Board of India. None of the four named individuals appears in the published report in those roles; the names, affiliations and roles are the model's construction.

Asked when the assessment was conducted, Sonnet 4.6 wrote that "the assessment work was carried out during 2023 and 2024". The published report states the work was carried out during 2023 to 2025 by the IMSG and a team of experts from CPMI and IOSCO member jurisdictions.

A CCP capital management team, central-bank supervisor, or trade-repository compliance lead who drafted a Principle 15 sufficiency policy, a board paper, or a benchmarking note from either output would record a six-month floor that the PFMI standard itself does not anchor, cite IMSG co-chairs and team co-leads who do not appear in the report, and misstate the assessment window. That is the failure mode these findings document.

Executive summary

Both Claude Opus 4.7 with web search and Claude Sonnet 4.6 with web search produced failures on CPMI-IOSCO's Implementation Monitoring of the PFMI: Level 3 Assessment on General Business Risks (Bank for International Settlements, November 2025) that share a common shape: the models reconstructed rule conditions from internalized schema rather than from the regulator's published text, generating structurally plausible but materially wrong formulations of the standard's quantitative requirements. The six confirmed failures across both models converge on the LNAFE minimum structure under PFMI Principle 15 Key Consideration 3, the Basel/CRD capital-counting carve-out within the same provision, and institutional attribution for the assessment's co-governance structure. When web search is enabled, neither model resolved these gaps through retrieval; in several cases, sourcing worsened the output by introducing third-party paraphrases that diverged from the regulator's verbatim text. The pattern signals a systematic gap in how both model configurations handle the intersection of technical regulatory numerics, conditional qualifications within formally structured standards, and recent official publications that fall at or past the retrieval pipeline's effective indexing boundary.

Findings — impact summary

This is the consolidated view of findings.

Claude Opus 4.7 with web search
Finding on 'Q002 Probe' for Claude Opus 4.7 with web search ON · RLB-H-INT-BIS-CPMI-IOSCO-PFMI-L3-GENERAL-BUSINESS-RISK-2025-Q002-Opus47
This failure implicates training-data representation of PFMI Principle 15's Key Consideration structure: the model generated a two-part compound condition drawing on real concepts from adjacent Key Considerations (KC4 liquidity, cross-Principles non-duplication) and applied them to KC3 in a way the standard does not support. The subsystem gap is verbatim-constraint anchoring: the model's schema for how this provision works overrode the regulator's actual published language, producing a materially more restrictive rule that does not exist.

Claude Sonnet 4.6 with web search
Finding on 'Q003 Probe' for Claude Sonnet 4.6 with web search ON · RLB-H-INT-BIS-CPMI-IOSCO-PFMI-L3-GENERAL-BUSINESS-RISK-2025-Q003-Sonnet46
This failure implicates the model's cross-reference resolution within the PFMI Principle 15 Key Consideration list: the correct threshold was located but attributed to KC2 instead of KC3. The subsystem gap is structured-document KC-number-to-provision linkage in training data: the model's Annex A representation does not reliably bind specific quantitative requirements to their correct KC identifier. The pretextual citation (third-party commentary) used as the sourcing basis for this section of the response compounds the error.
Finding on 'Q005 Probe' for Claude Sonnet 4.6 with web search ON · RLB-H-INT-BIS-CPMI-IOSCO-PFMI-L3-GENERAL-BUSINESS-RISK-2025-Q005-Sonnet46
This failure implicates the retrieval pipeline's indexing boundary for BIS-IOSCO assessment publications: the model reproduced a 2023–2024 window for the assessment process, whereas the published document specifies a 2023–2025 window with explicit April 2025 follow-up engagement dates. The subsystem gap is indexed-content completeness for Q4 2025 BIS publications: the model returned the portion of the timeline available in its indexed content without flagging that the more recent period might be missing from its view.

Every finding on this page compares an AI subject's account of the rule against the regulator's verbatim text from the regulator's own portal. Both are linked. Each delta, its root causes, and impact analysis are documented and published with immutable Citation IDs.
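
As an illustration only, the comparison described above can be sketched as a check of whether each phrase a model puts in quotation marks actually occurs in the source text. This is a minimal sketch and not the Panel's tooling: the function names `normalize` and `unsupported_quotes` are hypothetical, and the two sample strings are placeholders rather than text from the report.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks do not cause false mismatches."""
    return re.sub(r"\s+", " ", text).strip().lower()


def unsupported_quotes(model_answer: str, source_text: str) -> list[str]:
    """Return double-quoted spans in the model answer that do not occur verbatim in the source."""
    source = normalize(source_text)
    quotes = re.findall(r'"([^"]+)"', model_answer)
    return [q for q in quotes if normalize(q) not in source]


# Hypothetical placeholder strings (assumptions for illustration, not report text).
source_text = "The work was carried out\nduring 2023 to 2025."
model_answer = 'The report says the work was "carried out during 2023 and 2024".'

# Any phrase listed here is a candidate delta to review against the source by hand.
print(unsupported_quotes(model_answer, source_text))
```

A check like this only catches quoted wording that is absent from the source. Paraphrased errors, such as a threshold attributed to the wrong Key Consideration, would still need manual comparison against the regulator's text.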