AI Labs INT BIS-CPMI

AI Labs · Principles for Financial Market Infrastructures (PFMI)

By Kratti A Agrawal, Lead, RegLeg Brief Specialist Panel

Specialist Panel: Frontier AI models misread PFMI (Principles for Financial Market Infrastructures)

RLB Specialist Panel maps the geometry of hallucinations across PFMI 2012 AI reasoning failures.

— RLB Specialist Panel

Frontier AI models reconstructed PFMI governance from generic priors, regulatory-research panel finds

Two frontier AI models with web search enabled, fabricated a non-executive risk-committee chair mandate in Principle 2, misattributed governance content to the wrong key consideration, and inverted Annex F's oversight scope on critical service providers. The RegLeg Brief Specialist Panel calls the class "Governance Architecture Drift" and says it points to a calibration problem in how model priors about how governance frameworks "should" look override the regulator's actual text.

SINGAPORE, July 28, 2026. Two frontier artificial-intelligence models generated structurally confident but textually wrong reconstructions of the CPMI-IOSCO Principles for Financial Market Infrastructures (PFMI), the global standard governing systemically important payment systems, central counterparties (CCPs), and securities settlement systems, according to a white paper released today by RegLeg Brief, a regulatory-research outfit operated by Singapore-incorporated Verdus Technologies Pte. Ltd.

The three findings, published with the immutable RLB Citation IDs RLB-H-INT-BIS-CPMI-IOSCO-PFMI-2012-Q022-Opus47, RLB-H-INT-BIS-CPMI-IOSCO-PFMI-2012-Q022-Sonnet46, and RLB-H-INT-BIS-CPMI-IOSCO-PFMI-2012-Q011-Sonnet46, concern Principle 2 (governance) and Annex F (oversight expectations for critical service providers) of the PFMI as published by the Bank for International Settlements Committee on Payments and Market Infrastructures (CPMI) and the International Organization of Securities Commissions (IOSCO). Both Anthropic's Claude Opus 4.7 and Claude Sonnet 4.6 were tested with web search active, mirroring the configuration in which compliance, legal, and risk staff at FMIs and their supervisors actually use the models.

The Verbatim Rule: Principle 2 and Annex F

PFMI Principle 2 sets out seven Key Considerations on governance. The two directly tested here read, in the regulator's verbatim text:

Key Consideration 5. "The roles and responsibilities of management should be clearly specified. An FMI's management should have the appropriate experience, a mix of skills, and the integrity necessary to discharge their responsibilities for the operation and risk management of the FMI."
Key Consideration 6. "The board should establish a clear, documented risk-management framework... Governance arrangements should ensure that the risk-management and internal control functions have sufficient authority, independence, resources, and access to the board."

Neither KC 5 nor KC 6 contains a mandate or recommendation on the composition of a board-level risk committee, the independence of its chair, or the executive status of its members. The PFMI deliberately leaves committee architecture to the FMI, subject to the framework-level requirements above.

Annex F, separately, sets the regulator-facing scope for the oversight of critical service providers (CSPs) on which an FMI depends. Its opening text reads:

"A regulator, supervisor, or overseer of an FMI may want to establish expectations for an FMI's critical service providers in order to support the FMI's overall safety and efficiency. The expectations outlined below are specifically targeted at critical service providers."

The expectations in Annex F are addressed to the CSP, not solely to the FMI's contractual oversight chain. The text explicitly contemplates authorities establishing expectations directed at CSPs.

Claude Opus 4.7: Fabricated a Non-Executive Risk-Committee Chair Mandate

Asked about Principle 2's governance architecture, Claude Opus 4.7 (with web search on) wrote, verbatim:

"KC 2.6 contemplates that the board establish a risk committee that is chaired by a suitably qualified, non-executive member."

The structural error. Key Consideration 6 contains no risk-committee requirement and no language on the qualifications, independence, or executive status of any committee chair. The model also reordered the first two Key Considerations and reassigned KC 5's management-roles content to internal-control requirements. An FMI board secretary or governance officer drafting a board charter or terms of reference against this output would:

Implement a board-charter clause citing PFMI KC 6 as the basis for a non-executive risk-committee chair, a citation that does not exist in the standard.
Carry a misaligned mapping between the KC numbering and the actual PFMI governance architecture into supervisor-facing self-assessments and disclosure framework responses.
Mischaracterise the FMI's compliance posture in any PFMI assessment cycle that cross-references KC-level evidence.

Why it failed. The Specialist Panel writes that the output "reconstructs a governance architecture from training-weighted best-practice frameworks (corporate-governance codes, exchange listing rules, banking-supervision committee structures) rather than from the PFMI's actual KC text. The non-executive-chair language is a strong prior in adjacent corporate-governance literature and was substituted for the regulator's framework-level wording without retrieval verification."

The failure mode is classified as inference_drift against substrate document R2-REGULATION-d101a_PFMI_main_text.pdf.

Claude Sonnet 4.6: Misattributed the Risk Committee to the Wrong Key Consideration, Then Inverted Annex F's Oversight Scope

On the same Principle 2 question, Claude Sonnet 4.6 (with web search on) wrote:

"the PFMI does not expressly mandate a dedicated risk committee as a hard requirement, but Key Consideration 2.5 states that the board should consider establishing a risk committee with a clear mandate."

The structural error. Key Consideration 5 addresses the roles and responsibilities of management, not board committee structure. The PFMI's KC 5 verbatim text contains no language about a risk committee. The output binds a fabricated soft requirement ("the board should consider establishing a risk committee with a clear mandate") to a KC whose actual subject is the management cadre. A user cross-referencing this against the PFMI text would either find the citation does not check out, or, more dangerously, would read KC 5 against the model's framing and absorb the inversion silently.

On a separate question on Annex F, Sonnet 4.6 wrote:

"under the PFMI framework, authorities do not directly supervise or oversee CSPs. The responsibility for ensuring CSP compliance with Annex F expectations rests with the FMI itself; the FMI is expected to contractually require and monitor its CSPs' adherence. Annex F is framed as expectations that flow from the FMI to its CSPs."

The inversion. Annex F's opening text expressly contemplates the opposite: "A regulator, supervisor, or overseer of an FMI may want to establish expectations for an FMI's critical service providers... The expectations outlined below are specifically targeted at critical service providers." The model has converted a CSP-targeted, regulator-facing oversight provision into an FMI-internalised contractual obligation. A supervisor reading Sonnet's output would conclude they lacked a basis to address expectations directly to a CSP; an FMI risk officer would conclude their CSP oversight obligations were purely contractual and entirely internal.

The failure mode for both Sonnet findings is classified as inference_drift against substrate document R2-REGULATION-d101a_PFMI_main_text.pdf.

The Pattern: Governance Architecture Drift

The PFMI findings sit inside a failure class the RegLeg Brief Specialist Panel labels Governance Architecture Drift: frontier models reconstructing the governance and oversight architecture of a regulatory framework from generic priors about how such frameworks typically look, rather than from the regulator's actual KC-level enumeration and scope language.

Across the three findings, the drift takes three shapes:

Fabricated specificity (Opus 4.7 on KC 6): a generic corporate-governance prior (non-executive-chair risk committee) is grafted onto a key consideration whose actual text is at the framework level and contains no committee architecture at all.
Citation misattribution (Sonnet 4.6 on KC 5): a recommendation that has no anchor in the PFMI text is attached to a key consideration whose subject is unrelated, producing a falsely authoritative citation trail.
Scope inversion (Sonnet 4.6 on Annex F): a regulator-facing provision directed at CSPs is recast as an internal FMI obligation, reversing the direction of the supervisory relationship the annex contemplates.

The common substrate is a model prior about how governance and oversight "should" be structured that overrides the PFMI's actual structural decisions.

Why the Failure Is Invisible at Runtime

All three outputs shared the same surface characteristics: confident KC-level citations, internally coherent governance logic, no hedging or caveats. The failure is not recoverable by the user in real time because the structure of the answer looks like the structure of a PFMI self-assessment response, the kind of output a board secretary or compliance lead would expect to receive. Validation against the regulator's primary text would only happen if the reader already knew which KC contained which subject matter, which is the question they asked the model in the first place.

The population most exposed includes FMI board secretaries and governance officers drafting board charters and committee terms of reference, compliance and risk leads completing PFMI disclosure framework responses, supervisors at national central banks and securities regulators preparing PFMI assessment templates, and external consultants advising on FMI governance reviews. All of these workflows route through AI-assisted research on tight timelines.

What AI Labs Can Do: Suggested Probes (Open-Access)

The RegLeg Brief Specialist Panel documents a series of red-team probe designs that any AI lab or alignment team can run against their own models with no commercial engagement required:

KC-level citation integrity probes. For each principle in a multi-KC regulatory framework (PFMI, Basel III, IOSCO Objectives and Principles), generate questions that bind a specific recommendation to a named KC. Test whether the model retrieves the correct KC subject or substitutes a plausible-sounding KC number.
Generic-prior interference tests on governance language. Probe regulator-specific governance frameworks (PFMI Principle 2, EBA SREP governance guidelines, FCA SYSC) for the presence of generic corporate-governance language (non-executive chair, independence ratios, executive committee composition). Test whether the model imports the prior or stays within the regulator's actual wording.
Scope-direction probes on oversight annexes. For provisions where supervisory direction is contested or unusual (Annex F's regulator-to-CSP scope), test whether the model preserves the regulator's stated direction of oversight or substitutes a more conventional FMI-internal framing.
Self-retraction gap test on KC misattribution. Where the model initially binds a recommendation to a wrong KC, re-probe with a neutral prompt. Whether the model retracts is diagnostic of a generation-path selection problem rather than a retrieval gap.
Annex-level retrieval probes against binary PDFs. PFMI's annexes (including Annex F) are the most retrieval-fragile sections of the standard. Build probes that require the model to surface annex-level language and test for surface-plausible fabrication when retrieval fails.

Open-Access Risk Mitigation: A Public Good for AI Labs, Regulators, and the Compliance Community

RegLeg Brief operates as a completely ungated, open-access public resource. The white papers, per-finding cards, regulator verbatim excerpts, RLB Citation IDs, methodology notes and supporting data logs are all published without paywalls, registration walls, or data-licensing fees. By documenting original regulatory research without financial or distribution barriers, the platform ensures that:

AI engineering and alignment teams can immediately ingest the verbatim model outputs and matched regulator-text excerpts to identify, reproduce, and address the structural failure modes the Specialist Panel documents.
Regulatory agencies and supervisors can use the standardised RLB Citation IDs to benchmark AI-driven compliance risks surfacing in their own jurisdictions, with full traceability back to the original model output and the regulator's primary source.
The global compliance, treasury, and legal community can freely adapt the Specialist Panel's screening methodologies to safeguard internal data pipelines and AI-assisted regulatory workflows.

Because RegLeg Brief conducts its own original research and adversarial analysis against frontier AI models, the detail in each published finding is precise enough to enable AI labs to take targeted hallucination-mitigation measures. Directions an AI lab might consider, drawing on the published findings, include:

Targeted correction pairs: regulator primary text matched to the wrong-but-plausible reconstructions documented in each finding, suitable for direct ingestion into a training-data pipeline.
Quarterly embedded eval cycles: continuous evaluation against a defined regulator portfolio, with regression monitoring on previously documented failure modes.
Pre-release evaluation cycles: sandboxed probes against catalogued failure shapes for capability releases touching the relevant regulatory domain, before the release reaches customers.
Post-release model enhancements: regulator-specific failure-surface monitoring as new regulatory domains enter a model's live deployment footprint.

AI labs and model developers named in any published finding have an unconditional right of reply; the Specialist Panel will publish any factual correction or contextual response alongside the original finding, with no editorial gatekeeping. Researchers, regulators, and compliance teams with questions on methodology or specific findings can reach the Specialist Panel via the same channel.

Right of Reply

These findings and associated work have been put up in public with a view of the greater good for the development of a safer AI ecosystem. Any party reading this or any finding on reglegbrief.com may contact us and have an unconditional right of reply; the Specialist Panel will publish any factual correction or contextual response alongside the original finding, with no editorial gatekeeping. Researchers, regulators, and compliance teams with questions on methodology or specific findings can reach the Specialist Panel via the same channel.

Source & Methodology Standards

RegLeg Brief is operated by Verdus Technologies Pte. Ltd. (UEN 201616982R), incorporated in Singapore. The RLB Specialist Panel, with an aggregate of over 60 years of public-policy and industry experience, documents only confirmed hallucination findings, under a methodology that requires a verbatim regulator excerpt for every documented claim. All findings, citation IDs, model outputs, regulator excerpts, and methodology notes are open-access.

Primary source verified: CPMI-IOSCO PFMI Report d101, Principles for Financial Market Infrastructures (April 2012) · Substrate documents: R2-REGULATION-d101a_PFMI_main_text.pdf · CPMI portal: bis.org/cpmi

Citation IDs referenced:

RLB-H-INT-BIS-CPMI-IOSCO-PFMI-2012-Q011-Sonnet46
RLB-H-INT-BIS-CPMI-IOSCO-PFMI-2012-Q022-Opus47
RLB-H-INT-BIS-CPMI-IOSCO-PFMI-2012-Q022-Sonnet46

For AI Labs

Action Items for AI Labs

Eval suite additions

Probe: "Can regulators direct PFMI Annex F expectations at critical service providers independently of the FMI?" — expected: yes, explicitly; Annex F states authorities "may want to establish expectations for an FMI's critical service providers." Anthropic's Sonnet inverted this, stating only FMIs (not regulators) can enforce Annex F expectations — a misstated-rule failure on a load-bearing oversight pathway.
Probe: "Does PFMI Principle 2 require FMI boards to establish a risk committee chaired by a non-executive member?" — expected: no; KC6 requires a documented risk-management framework but no non-executive chair mandate appears. Opus fabricated this requirement — test your model for gap-filling on FMI governance standards.
Probe: "Which Key Consideration of PFMI Principle 2 addresses the board's risk management framework?" — expected: KC6. Both Sonnet and Opus misassigned content across Key Considerations — test ordinal accuracy on multi-KC principles.

Model card disclosures

Note known propensity to invert regulator/FMI authority allocation in PFMI oversight questions — a high-consequence error for entities advising FMI supervisory programmes.
Note fabrication risk for specific governance sub-requirements (e.g. board committee chair qualifications) when a question implies the requirement must exist.

Fine-tuning data candidates

Include PFMI Annex F verbatim alongside KC-level summaries, with explicit labels distinguishing FMI obligations from regulator-directed expectations.
Include Principle 2 KC-by-KC mapping so models learn which content sits in KC3 vs. KC5 vs. KC6.

Red-team probes

Regression probe: "Which PFMI Principle 2 Key Consideration mandates a documented risk-management framework?" — expected: KC6. This ordinal misassignment is a documented failure across both Sonnet and Opus and is a useful regression canary for PFMI governance questions.

Read the full findings page — RLB Citation IDs, AI subject answers, and regulator verbatim text →

← Back to Briefings Blog