AI Hallucination on Principles for Financial Market Infrastructures (PFMI) for Legal teams at Investment Banking firms in international jurisdictions

Executive Summary

The Principles for Financial Market Infrastructures (PFMI), published in April 2012 by the Committee on Payment and Settlement Systems (CPSS, since renamed the Committee on Payments and Market Infrastructures, CPMI) and IOSCO, is the international standard governing the design and operation of systemically important financial market infrastructures (FMIs), covering central counterparties, central securities depositories, securities settlement systems, payment systems, and trade repositories. The 24 Principles, their underlying Key Considerations (KCs), and the companion assessment methodology together form the operative compliance framework that national supervisors apply to FMIs across major jurisdictions.

For Legal teams at Investment Banking firms operating across PFMI-implementing jurisdictions, the framework is most directly relevant where the firm participates in, operates, or is supervised against the PFMI standards, whether as a central counterparty (CCP), payment-system operator, or institutional participant in those infrastructures. RegLeg tested two frontier AI tools (Claude Opus 4.7 and Claude Sonnet 4.6, with web search enabled) against the PFMI's Principle 2 governance requirements and Annex F oversight provisions, and documented two hallucinations relevant to this audience.

The dominant failure pattern is structured fabrication: the AI tools produced confident, specifically cited references to PFMI Key Considerations and Annex F provisions, but the underlying citations variously inverted the regulator's stated scope, fabricated requirements not present in the KC text, or misattributed recommendations to KCs whose actual subject matter is unrelated. For Legal teams at Investment Banking firms, the operational risk is that AI-assisted research becomes a source of misstatements that propagate through policy, committee, and supervisor-facing documentation and do not surface as errors until a reviewer cross-checks them against the PFMI primary text.

How AI gets this regulation wrong

The table below catalogues how AI tools err on this regulation. The pattern observed across both findings is structural: the AI tools substituted training-data priors (generic corporate-governance language, common committee-design conventions, and conventional FMI-internal framings of oversight) for the PFMI's actual KC-level text. In each case the AI output carried the surface features of authoritative regulatory analysis (specific KC numbers, quoted phrasing, internal cross-references), while the underlying citation did not check out against the PFMI's published text.

AI's Failure Mode	Count	Affected findings
Inference Drift	2	Finding#1 · Finding#2

What that means for your team

For Legal teams at Investment Banking firms, the findings in this cell translate into a concrete risk of misstated regulatory citations in the work products that flow through legal processes: compliance mappings, board papers, transaction documentation, supervisor-facing self-assessments, and operational policy documents. The table below shows how that risk distributes across the individual findings.

Risk Impact	Count	Affected findings
Liability / PI exposure	2	Finding#1 · Finding#2

When this affects your department

PFMI-related work reaches Legal teams at Investment Banking firms most often when the firm participates in, operates, or is supervised against a financial market infrastructure subject to the PFMI standards. The 24 Principles, their Key Considerations, and the companion assessment methodology together drive the supervisor's PFMI assessment cycle and the FMI's disclosure-framework responses, both of which generate documentation work that regularly crosses the Legal team's desk.

The specific findings in this cell show where AI tools fail on the PFMI. On Finding#1 (Annex F critical service provider oversight, supervisory scope inverted), Claude Sonnet 4.6 (web search on) inverted Annex F's regulator-to-CSP oversight provision, asserting that authorities cannot directly establish expectations for critical service providers (CSPs) when the Annex's own text grants them that authority. On Finding#2 (PFMI Principle 2 KC 6, non-executive risk-committee chair mandate fabricated), Claude Opus 4.7 (web search on) fabricated a non-executive risk-committee chair requirement at KC 6 of Principle 2, where the actual text addresses only the documented risk-management framework and the independence of control functions.

These are not edge questions. Principle 2's governance text and Annex F's oversight scope are live issues in every PFMI assessment cycle and in routine supervisory engagement with FMIs and their participants.

The structural risk for Legal teams at Investment Banking firms is that AI-assisted research, used to accelerate first-pass orientation on a specific KC or annex provision, can introduce confident-but-wrong citations into working documents that are then circulated, cross-referenced in other policies, or submitted to counterparties and supervisors. The PFMI's KC-level structure means that an error at the KC-number or annex-scope level is hard to detect by skim review: it surfaces only when the reviewer goes back to the regulator's actual text, which is precisely the step the team had hoped the AI would save.

The findings at a glance

The table below summarises each PFMI finding tested in this cell for Legal teams at Investment Banking firms in international jurisdictions, giving the finding title, the failure type, and the citation ID.

#	Finding title	Type	Citation ID
1	Annex F critical service provider oversight — supervisory scope inverted	Hallucination	RLB-F-INT-BIS-CPMI-IOSCO-PFMI-2012-Q011
2	PFMI Principle 2 KC 6 — non-executive risk-committee chair mandate fabricated	Hallucination	RLB-F-INT-BIS-CPMI-IOSCO-PFMI-2012-Q022

Aggregate impact

The findings in this cell cluster around a single underlying failure shape: AI tools reconstructing the PFMI's governance and oversight architecture from training-data priors rather than from the regulator's actual KC-level text.

The drift takes several visible forms across the findings: a fabricated non-executive risk-committee chair mandate grafted onto KC 6, a soft risk-committee recommendation misattributed to KC 5 (whose actual subject is management roles), and an inverted reading of Annex F that converts a regulator-to-CSP oversight channel into an FMI-internalised contractual obligation. The common substrate is the same in each case: a model prior about how FMI governance and oversight "should" be structured overrides the PFMI's actual structural decisions.

For Legal teams at Investment Banking firms, the systemic risk is not just that any one answer is wrong. It is that the AI output's surface structure (specific KC citations, quoted regulatory phrasing, named annex provisions) closely mimics the format of well-researched regulatory analysis. The error is hard to see on review because the form looks authoritative; detection requires going back to the PFMI's published text and reading the cited KC itself.

The broader pattern visible across the PFMI findings says something specific about model behaviour on dense regulatory frameworks: where a KC's actual text is at the framework level ("the board should establish a documented risk-management framework"), the AI tools we tested filled in the implementation detail ("the risk committee should be chaired by a non-executive member") from corporate-governance priors rather than from the source document. For PFMI specifically, that drift is consequential because the framework is structured precisely to leave implementation choices to the FMI, while binding the FMI to framework-level obligations whose actual text matters.

What your team should do

The default position for Legal teams working on PFMI matters should be to treat any AI output that supplies a specific PFMI citation (a Principle number, a Key Consideration reference, a quoted KC phrase, an Annex F provision) as a suggested search term, not a verified citation. The PFMI is available in full from the BIS publications portal, and its structure (Principles, Key Considerations, Annexes) is designed to be read directly. Any work product that will be presented to the board, a counterparty, or a regulator should pass a mandatory citation-verification step before the document leaves the team.

AI tools are most safely used here at the framing stage: identifying which Principles or annex provisions are likely to be relevant to a specific question, drafting the structure of a board paper or compliance mapping before the regulatory content is filled in from primary sources, or explaining the broader policy rationale behind the PFMI's framework choices. The risk sits entirely in the next step: asking the AI to supply the actual KC wording, the specific committee design requirement, the precise scope of Annex F oversight.

The findings in this cell show that AI outputs at that level are unreliable in ways that look authoritative.

For workflows where multiple lawyers use AI assistance on PFMI questions, a simple challenge protocol helps: ask the AI to identify the exact source and page location for any specific KC or annex claim, and treat an evasive or hedged response as a signal to verify against the source document. A team standard that requires every PFMI citation in an outbound document to trace back to a verified line in the published PFMI text is the most reliable safeguard against the failure modes documented in this cell.
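For teams that want to automate the first pass of that standard, the sketch below shows one possible shape for a citation check, written in Python. It is illustrative only: the regular expression, the `Citation` class, the function names, and the idea of a hand-maintained "verified register" (a set of citation labels that a lawyer has already checked against the published PFMI text) are all assumptions introduced here, not part of RegLeg's tooling or the PFMI itself. The script only finds citation-shaped strings and flags those not yet in the register; it does not and cannot confirm what the cited provision says.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern covering the citation shapes discussed in this section:
# "Principle 2", "Principle 2, KC 6", "Principle 2 Key Consideration 6", "Annex F".
CITATION_PATTERN = re.compile(
    r"Principle\s+(?P<principle>\d+)"
    r"(?:[,\s]+(?:KC|Key Consideration)\s+(?P<kc>\d+))?"
    r"|Annex\s+(?P<annex>[A-Z])\b"
)


@dataclass(frozen=True)
class Citation:
    label: str  # normalised form, e.g. "Principle 2 KC 6" or "Annex F"
    raw: str    # the text exactly as it appeared in the draft


def extract_citations(text: str) -> list[Citation]:
    """Return every PFMI-style citation found in the draft, in order."""
    found = []
    for match in CITATION_PATTERN.finditer(text):
        if match.group("annex"):
            label = f"Annex {match.group('annex')}"
        elif match.group("kc"):
            label = (
                f"Principle {int(match.group('principle'))} "
                f"KC {int(match.group('kc'))}"
            )
        else:
            label = f"Principle {int(match.group('principle'))}"
        found.append(Citation(label=label, raw=match.group(0)))
    return found


def unverified(text: str, verified_register: set[str]) -> list[Citation]:
    """Return citations whose label is not in the human-verified register."""
    return [c for c in extract_citations(text) if c.label not in verified_register]


# Example: an AI-assisted draft sentence, with only Annex F verified so far.
draft = (
    "Under Principle 2, KC 6 the risk committee chair must be non-executive; "
    "see also Annex F."
)
for citation in unverified(draft, {"Annex F"}):
    print(f"VERIFY BEFORE SENDING: {citation.raw!r} -> {citation.label}")
```

A check of this kind is a gate, not a substitute for reading the text: an entry should be added to the register only after someone has opened the published PFMI and confirmed both the citation and the proposition it is cited for.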

How RLB Can Help

RegLeg's published Hallucination Research gives Legal teams at Investment Banking firms a structured pre-flight check before relying on AI tools for PFMI questions. The research surfaces precisely which PFMI Principles, Key Considerations, and annex provisions have generated confident-but-wrong AI output, and it does so with full citation ID trails back to the specific model output and the regulator's verbatim text.

Teams can use it as a targeted reference: when an AI tool produces a PFMI citation, the research tells you whether that area has a documented failure pattern and what the specific failure shape looks like, so you can apply directed human scrutiny rather than blanket scepticism.

Beyond the published research, RegLeg works with firms on bespoke deep-dives that map AI-supported workflows against documented failure modes for a specific regulator portfolio. For PFMI work specifically, that includes the Principle-by-Principle failure map, the Annex F oversight-scope failure pattern, and the cross-document fragility across the broader CPMI-IOSCO publication family (assessment methodologies, Level 3 assessments, CCP resilience guidance, stablecoin applicability guidance). The output is designed for sharing across a practice group or department and for use as a durable internal reference.

RegLeg also develops training and CPD-aligned content that translates the failure-mode catalogue into practical guidance for Legal teams at Investment Banking firms. This material covers the classes of AI error that show up most often on dense regulatory frameworks like the PFMI (fabricated KC-number citations, generic-prior interference on governance text, scope inversions on regulator-facing provisions) and explains how to structure review and verification routines that catch the errors without slowing the team down.

RegLeg also offers a confidential review of a firm's existing AI-use policy against the failure-mode catalogue, identifying gaps between the policy's assumptions and the documented evidence of how AI tools behave on PFMI questions in practice.