AI Hallucination on Implementation Monitoring of the PFMI: Level 3 Assessment on General Business Risks for Treasury teams at Payment Institutions in international jurisdictions

Executive Summary

The PFMI Level 3 assessment examines whether FMIs actually implement Principle 15's general business risk standard, in particular the liquid net assets funded by equity (LNAFE) provisions that translate the standard into a buffer a firm must hold, instruments it can count, and conditions it must satisfy. For Treasury teams at Payment Institutions operating across international markets, whether as direct FMI participants, indirect members, or firms whose products route through systemically critical infrastructure, that Key Consideration (KC) level detail feeds directly into capital planning, FMI counterparty assessment, and regulatory mapping against equivalent domestic frameworks.

Across three tested questions on this regulation, AI assistants consistently mischaracterised the most operationally consequential provisions: where the six-month LNAFE floor sits in the KC structure, how the LNAFE minimum is calculated, and what condition actually governs whether Basel/CRD equity can be counted. Every failure was an exposed fabrication: AI tools gave confident, specific answers and inverted their position or admitted uncertainty only when pressed directly against the source text, which indicates that they had no reliable access to the underlying standard.

How AI gets this regulation wrong

Every failure recorded on this regulation follows the same pattern: AI tools produced confident, specific answers, complete with Key Consideration references, quantitative thresholds, and qualifying conditions, and retracted or reversed only when pressed directly against the source text. The errors are not peripheral misreadings; they include inverting which Key Consideration contains the operative quantitative minimum, inventing a calculation structure with no basis in the standard, and either fabricating or flatly denying the Basel equity carve-out condition that KC3 expressly states.

AI's Failure Mode	Count	Affected findings
Exposed Fabrication	2	Finding#1 · Finding#2

What that means for your team

The risk is concentrated in regulatory enforcement: firms that build LNAFE policy, calibrate capital buffers, or respond to supervisory queries using AI-generated summaries of Principle 15 face direct exposure if the underlying KC mapping is wrong. For a Payment Institution whose liquidity or capital framework references PFMI standards, either because it participates in or connects to an FMI, or because a domestic regulator has applied equivalent requirements, a miscalibrated floor or a misstated Basel carve-out condition is not a drafting inaccuracy; it is a compliance failure that CPMI-IOSCO's Level 3 assessment process is specifically designed to surface.

Risk Impact	Count	Affected findings
Regulatory enforcement	2	Finding#1 · Finding#2

When this affects your department

Treasury teams at Payment Institutions engage with PFMI Principle 15 in a range of operational contexts: mapping the firm's liquidity and capital requirements against FMI membership rules, advising business development on the buffer implications of new corridors or settlement arrangements, preparing internal briefings when a domestic regulator applies PFMI-equivalent standards, and conducting due diligence on FMI counterparties as part of settlement and systemic risk assessments.

In cross-border environments where multiple jurisdictions apply the Level 3 framework with different implementation gaps, the KC-level detail matters, and AI tools are a natural shortcut when a business line or a senior stakeholder needs a quick answer on what Principle 15 actually requires.

The specific danger here is that Principle 15's KC2 and KC3 are closely related but structurally distinct: KC2 governs the scenario-analysis sizing obligation; KC3 sets the six-month quantitative floor and the Basel equity carve-out. AI tools tested on this regulation consistently blurred that boundary, attributing KC3's minimum to KC2, merging KC2's scenario analysis into KC3's floor as a fabricated "greater of" construct, and either misrepresenting or denying outright the Basel carve-out condition that KC3 expressly states.

A Treasury analyst who receives that answer, trusts it, and embeds it in a policy paper, capital adequacy memo, or regulatory response has built a compliance position on incorrect text.

The enforcement exposure is direct. CPMI-IOSCO's Level 3 assessment is specifically designed to identify gaps between the published standard and actual FMI implementation, and regulators applying that lens will cross-reference firms' stated LNAFE policies against the KC text. A Payment Institution that has drafted its framework using AI-generated KC attributions, calculated its buffer against a non-existent "greater of" floor, or applied the wrong qualifying condition for Basel equity inclusion will not be able to defend the position under supervisory scrutiny, and the discovery that the error originated from an AI-assisted briefing does not mitigate the compliance gap.

The findings at a glance

All three findings concern PFMI Principle 15's LNAFE provisions, tested across the KC-level structure, the minimum calculation method, and the Basel equity carve-out qualifier. These are the provisions Treasury teams are most likely to rely on when calibrating buffers or drafting policy.

#	Finding title	Type	Citation ID
1	KC3 Basel equity carve-out condition fabricated	Hallucination	RLB-F-INT-BIS-CPMI-IOSCO-PFMI-L3-GENERAL-BUSINESS-RISK-2025-Q002
2	LNAFE minimum recast as non-existent greater-of floor	Hallucination	RLB-F-INT-BIS-CPMI-IOSCO-PFMI-L3-GENERAL-BUSINESS-RISK-2025-Q003

Aggregate impact

The three findings cluster tightly on a single sub-section of the standard: PFMI Principle 15, Key Considerations 2 and 3, and specifically the LNAFE framework. This is not a random scatter of errors; it reflects a systematic failure by AI tools to maintain the structural boundary between KC2 (scenario-based sizing) and KC3 (the quantitative floor and equity qualification rules). In every tested question where that distinction mattered, AI tools conflated or reassigned the provisions, producing answers that are internally plausible but wrong at the KC level.

The practical consequence for a Treasury function is that the most operational layer of Principle 15, the part that translates into a number to hold, an instrument to count, and a condition to satisfy, is exactly where AI tools are unreliable. Finding 2 shows an AI inventing a "greater of" dual-track structure for KC3 that imports KC2's scenario analysis leg as a co-equal minimum. Finding 3 shows a different AI placing the six-month floor in KC2 entirely, treating KC3 as solely a segregation clause.

Finding 1 shows AI tools either replacing KC3's actual Basel carve-out qualifier with a fabricated KC4 liquidity test, or denying that the carve-out exists in KC3 at all. These are not edge interpretations; they are errors on the verbatim text of the standard.

For a Payment Institution operating across international markets, the systemic risk is that this cluster of errors survives internal review. A Treasury analyst querying AI on Principle 15 will receive a plausible, internally consistent answer that mislocates the floor, misstates the carve-out, or conflates the KCs, and a reviewer checking the output would need to go to the PFMI source text itself to catch it. Firms using AI-assisted regulatory mapping for Principle 15 should subject the entire KC2/KC3 intersection to mandatory verification rather than treat it as an AI-safe zone.

What your team should do

The default position for Treasury teams using AI on PFMI Principle 15 is straightforward: AI output on KC-level attribution is unverified until cross-referenced against the published PFMI source text. The errors documented here are not in interpretive grey areas; they involve specific KC assignments, a specific quantitative minimum (six months of current operating expenses in KC3), and a specific qualifying condition for Basel equity inclusion. All are verifiable in under two minutes against the CPMI-IOSCO Principles for Financial Market Infrastructures (April 2012) and the November 2025 Level 3 assessment report.
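
The six-month minimum is simple arithmetic, which makes it easy to check an AI-supplied figure by hand. The sketch below is illustrative only. It assumes "current operating expenses" can be approximated as an annual figure divided by twelve; a firm's own definition may differ. It deliberately models nothing else: the KC2 scenario-based sizing and the Basel equity carve-out are separate questions. The names kc3_floor_check and FloorCheck and the sample amounts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FloorCheck:
    floor: float      # six months of current operating expenses
    held: float       # LNAFE actually held
    shortfall: float  # 0.0 when the floor is met

    @property
    def meets_floor(self) -> bool:
        return self.shortfall == 0.0


def kc3_floor_check(annual_operating_expenses: float, lnafe_held: float) -> FloorCheck:
    """Compare LNAFE held with six months of current operating expenses.

    Assumption: monthly operating expenses = annual figure / 12.
    """
    if annual_operating_expenses < 0 or lnafe_held < 0:
        raise ValueError("amounts must be non-negative")
    floor = annual_operating_expenses / 12 * 6
    shortfall = max(0.0, floor - lnafe_held)
    return FloorCheck(floor=floor, held=lnafe_held, shortfall=shortfall)


# Illustrative figures only, not drawn from any firm or from the assessment report.
check = kc3_floor_check(annual_operating_expenses=24_000_000, lnafe_held=10_000_000)
print(check.floor, check.shortfall, check.meets_floor)
```

A check of this kind confirms the arithmetic only; which Key Consideration the floor belongs to, and which equity counts towards it, still has to be read from the PFMI text.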

Any Treasury policy, capital memo, or regulatory submission that cites KC2 or KC3 specifics should carry a verification flag until the KC text has been read directly.

The sharpest risk is in briefing-type queries, where a senior stakeholder or business line asks for a quick summary of what Principle 15 requires and Treasury responds with AI-generated output. The KC2/KC3 boundary confusion survives in a summary, gets quoted in a business case, and appears in a regulatory response before anyone checks the original. The structural safeguard is citation discipline: any internal output on Principle 15 KC-level detail should explicitly cite the paragraph of the PFMI, not a paraphrase, and should note whether it has been source-verified.

That discipline also protects the firm when a regulator requests the basis for a stated compliance position.
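
One way to make the verification flag and citation discipline concrete is to record each KC-level statement together with its PFMI reference and verification status. The following is a minimal sketch under stated assumptions, not a prescribed control: the class KcStatement, its fields, and the releasable rule are all hypothetical and would need adapting to a firm's own document workflow.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class KcStatement:
    claim: str            # the statement as it appears in the work product
    pfmi_reference: str   # the cited provision, not a paraphrase of it
    ai_generated: bool
    verified_on: Optional[date] = None
    verified_by: Optional[str] = None

    @property
    def source_verified(self) -> bool:
        # Verified only when both a date and a named reviewer are recorded.
        return self.verified_on is not None and bool(self.verified_by)

    def label(self) -> str:
        status = "source-verified" if self.source_verified else "UNVERIFIED"
        return f"[{self.pfmi_reference} | {status}] {self.claim}"


def releasable(statements: List[KcStatement]) -> bool:
    """True only if every AI-generated statement has been source-verified."""
    return all(s.source_verified for s in statements if s.ai_generated)


draft = KcStatement(
    claim="The six-month LNAFE minimum is set in KC3.",
    pfmi_reference="PFMI Principle 15, KC3",
    ai_generated=True,
)
print(draft.label())
print(releasable([draft]))

# After a reviewer has read the KC text directly:
draft.verified_on = date.today()
draft.verified_by = "reviewer initials"
print(releasable([draft]))
```

Keeping the reviewer and date alongside the citation gives the firm a record it can produce when a regulator asks for the basis of a stated position.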

AI tools remain usable for orientation: identifying that Principle 15 addresses general business risk, that LNAFE is the operative concept, and that the CPMI-IOSCO Level 3 process exists and published assessment findings in November 2025. At that level of abstraction the risk of a material error is lower. The danger zone is precision: as soon as a query asks for a specific number, a specific qualifying condition, or a specific KC assignment, treat the AI response as a hypothesis that requires source verification before it enters any work product.

How RLB Can Help

RegLeg's published Hallucination Research is a practical pre-flight check for any Treasury team running AI tools against regulatory questions. Before you rely on an AI-generated interpretation of safeguarding thresholds, capital buffer calculations, or cross-border settlement finality rules, the research tells you where those tools have already been caught fabricating or inverting the position.

For a Payment Institution's Treasury function, where a misread PSD2 derogation or a garbled EMD2 safeguarding ratio leads directly to a compliance breach, the research is not an abstract risk catalogue; it is a live read on which questions you should be routing to the tool and which ones you should not.

Beyond the published findings, RLB works with Treasury teams on regulator-specific deep-dives that map AI-supported workflows to their actual hallucination exposure. Payment Institutions running multi-jurisdiction operations carry a specific pattern: the AI tools tend to perform well on home-jurisdiction rules they've seen repeatedly in training and degrade sharply on host-country PSD2 transpositions, local safeguarding equivalents, and FX settlement window rules that vary by corridor.

We can scope that exposure systematically, by workflow, by jurisdiction, by the regulatory surface area your team actually touches, so you know precisely where to insert human review rather than blanketing everything or missing the real gaps.

If your firm already has an AI-use policy in place, RLB can run a confidential review against our failure-mode catalogue, flagging where the policy's permitted-use boundaries don't account for the specific failure patterns we've documented in payment regulation contexts, and producing a prioritised remediation list your team can work through. We can also build that work into CPD-aligned training material, structured around the actual Treasury workflows (liquidity reporting, safeguarding reconciliations, regulatory capital monitoring) rather than generic AI literacy content, so your team has a defensible, documented basis for how it uses and quality-controls AI output in regulated processes.