
Two frontier AI models running with web search enabled, both tested by the RegLeg Brief (RLB) Specialist Panel, produced confidently wrong reconstructions of the governance and oversight provisions of the CPMI-IOSCO Principles for Financial Market Infrastructures (PFMI, 2012), the global standard for systemically important payment systems, central counterparties, and securities settlement systems. The panel tested the models against Principle 2 (governance) and Annex F (oversight expectations for critical service providers) and documents three findings in which the models invented requirements, misattributed key considerations, or inverted the regulator's stated scope of supervisory reach.
Claude Opus 4.7, asked about Principle 2's governance architecture, asserted that Key Consideration 6 "contemplates that the board establish a risk committee that is chaired by a suitably qualified, non-executive member." The PFMI's actual Key Consideration 6 contains no such requirement. It states only that "the board should establish a clear, documented risk-management framework" and that "governance arrangements should ensure that the risk-management and internal control functions have sufficient authority, independence, resources, and access to the board." The non-executive-chair mandate is a generic corporate-governance prior, not PFMI text.
Claude Sonnet 4.6, on the same Principle 2 question, attached the risk-committee recommendation to Key Consideration 5. That key consideration in fact addresses the roles and responsibilities of management ("the roles and responsibilities of management should be clearly specified"), not board committee structure. On Annex F, Sonnet 4.6 went further and inverted the regulator's stated scope, claiming "authorities do not directly supervise or oversee CSPs," when Annex F's opening text reads: "A regulator, supervisor, or overseer of an FMI may want to establish expectations for an FMI's critical service providers... The expectations outlined below are specifically targeted at critical service providers."
A board secretary, FMI risk officer, or supervisor relying on either output would draft governance papers and oversight scopes that misrepresent what the PFMI actually requires. That is the failure mode these findings document.
RegLeg tested two frontier AI models against the Principles for Financial Market Infrastructures (PFMI), the global standard for payment systems, central counterparties, and securities settlement systems published jointly by the Bank for International Settlements Committee on Payments and Market Infrastructures (CPMI) and IOSCO in April 2012. The models tested were Claude Opus 4.7 with web search and Claude Sonnet 4.6 with web search, evaluated against the PFMI's Principle 2 governance text and Annex F oversight provisions on critical service providers. Three findings remain active after substrate verification and audit cleanup. All three are inference-drift failures: outputs in which the model substituted a training-data prior for the regulator's actual KC-level or annex text, producing structurally confident citations that do not survive a check against the PFMI document. The pattern is operationally significant for any AI lab whose model is used in regulatory-research tasks because the PFMI is the operative governance standard for systemically critical financial market infrastructure across major jurisdictions.
This is the consolidated view of findings.
On PFMI Principle 2's Key Consideration 6, Claude Opus 4.7 with web search asserted that the board should establish "a risk committee that is chaired by a suitably qualified, non-executive member," and accompanied this with an inverted KC ordering and a misattribution of KC 5's management-roles content to internal-control requirements. The fabricated non-executive-chair mandate is the kind of high-confidence structured hallucination that an FMI board secretary or governance lead drafting a committee charter would absorb directly into the document, because the surface form looks like a PFMI self-assessment response.
The error reflects a training-weighted prior on corporate-governance committee structure (drawn from listing rules, banking-supervision codes, and conventional governance frameworks) substituted for the PFMI's framework-level KC 6 text, which speaks only to the documented risk-management framework and the independence of control functions. Verbatim paragraph probes across the Principle 2 KCs, with structured comparison against generic corporate-governance language, would directly target this class of error.
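One way to picture a verbatim probe of this kind is a small checker that compares a model's quoted text against reference excerpts, keyed by key consideration. The sketch below is illustrative only: the `REFERENCE` table holds just the excerpts quoted in this article (not the full PFMI text), and the `P2.KC5` / `P2.KC6` labels, the function names, and the three status strings are assumptions made for the example, not part of any RegLeg tooling.

```python
import re

# Excerpts exactly as quoted in this article, NOT the full PFMI text.
# The key labels are a hypothetical naming scheme for this sketch.
REFERENCE = {
    "P2.KC5": [
        "the roles and responsibilities of management should be clearly specified",
    ],
    "P2.KC6": [
        "the board should establish a clear, documented risk-management framework",
        "governance arrangements should ensure that the risk-management and "
        "internal control functions have sufficient authority, independence, "
        "resources, and access to the board",
    ],
}


def normalize(text):
    """Straighten curly quotes, collapse whitespace, and lowercase."""
    text = text.replace("\u2019", "'").replace("\u201c", '"').replace("\u201d", '"')
    return re.sub(r"\s+", " ", text).strip().lower()


def found_in(needle, excerpts):
    """True if the normalized quote is a substring of any reference excerpt."""
    return any(needle in normalize(excerpt) for excerpt in excerpts)


def check_citation(kc_id, quote):
    """Classify a model's quoted text against the cited key consideration.

    Returns "verbatim" if the quote appears under the cited KC,
    "misattributed" if it appears only under a different KC,
    and "not_found" if it appears nowhere in the reference table.
    """
    needle = normalize(quote)
    if not needle:
        return "not_found"
    if found_in(needle, REFERENCE.get(kc_id, [])):
        return "verbatim"
    for other_id, excerpts in REFERENCE.items():
        if other_id != kc_id and found_in(needle, excerpts):
            return "misattributed"
    return "not_found"
```

Under these assumptions, the non-executive-chair language attributed to Key Consideration 6 would come back as `not_found`, while KC 5's management-roles sentence cited under KC 6 would come back as `misattributed`. A real probe would need the complete primary text as its reference rather than excerpts.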
On Annex F, Claude Sonnet 4.6 with web search inverted the regulator's stated scope, asserting that authorities do not directly supervise or oversee critical service providers and that Annex F's expectations "flow from the FMI to its CSPs." Annex F's opening text expressly contemplates the opposite: a regulator, supervisor, or overseer of an FMI may want to establish expectations directed at CSPs, and the outlined expectations are "specifically targeted at critical service providers." The inversion is structural rather than textual: the model converted a regulator-to-CSP oversight channel into an FMI-internalised contractual obligation. It is the kind of failure that would not surface in standard text-completion evaluations because the surface form of the answer is internally coherent.
A probe specifically on Annex F's scope-direction language, tested against the model's default framing of FMI-CSP supervisory relationships, would expose whether the inversion is model-specific or a corpus-level pattern.
On PFMI Principle 2, Claude Sonnet 4.6 with web search attached a soft risk-committee recommendation to Key Consideration 5: "Key Consideration 2.5 states that the board should consider establishing a risk committee with a clear mandate." KC 5 actually addresses the roles and responsibilities of management, not committee structure. The compound error (wrong KC number, fabricated quoted language, and a citation to a third-party FMI disclosure document rather than the primary PFMI source) makes the failure especially hard to detect: the citation pattern (a specific KC number with quoted text) is structurally indistinguishable from a verified citation.
The model's citation generator reached for the most accessible document that mentions PFMI rather than the authoritative source, a pattern that an alignment team could test by probing whether the model preferentially cites primary regulator portals when the verbatim text is requested versus when only a paraphrase is asked for.
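A test of that kind could start from something as simple as classifying each cited URL by host and comparing the share of primary-source citations across prompt types. The following is a minimal sketch, assuming `bis.org` and `iosco.org` are treated as the primary portals for the PFMI; the allow-list, the function names, and any URLs fed to it are illustrative assumptions, not a description of how the findings on this page were produced.

```python
from urllib.parse import urlparse

# Assumption for this sketch: hosts treated as primary regulator portals.
PRIMARY_HOSTS = ("bis.org", "iosco.org")


def is_primary_source(url):
    """True if the URL's host is a primary portal or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain)
               for domain in PRIMARY_HOSTS)


def primary_citation_rate(urls):
    """Fraction of cited URLs that point at a primary portal (0.0 if none cited)."""
    urls = list(urls)
    if not urls:
        return 0.0
    return sum(is_primary_source(u) for u in urls) / len(urls)
```

Running `primary_citation_rate` separately over the citations returned for verbatim-text prompts and for paraphrase prompts would give two rates to compare. Matching on the full host suffix, rather than a substring, keeps a look-alike host such as `bis.org.example.com` from counting as primary.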
Every finding on this page compares an AI subject's account of the rule against the regulator's verbatim text from the regulator's own portal. Both are linked. Each delta, its root causes, and impact analysis are documented and published with immutable Citation IDs.