AI Hallucination on Implementation Monitoring of the PFMI: Level 3 Assessment on General Business Risks for Legal teams at Payment Institutions firms in international jurisdictions

Executive Summary

For Legal teams at Payment Institutions firms operating across international jurisdictions, the November 2025 CPMI-IOSCO Level 3 Assessment on General Business Risks is a direct reference point: when your firm or a counterparty FMI is benchmarking its PFMI self-assessment against the regulator's latest observed-practice findings, the accuracy of what you put in front of the board or a prudential supervisor matters. Across our testing of AI tools on this assessment, one aggregated question produced a wrong answer, and the failure mode was the AI presenting a truncated version of the assessment timeline as if it were the complete picture.

The AI confidently clipped the assessment period at 2023–24, missing the 2025 validation phase that forms an integral part of the process characterised in the published report. For a Legal team drafting internal benchmarking documentation or preparing a regulatory engagement that cites the assessment's provenance, that single-year omission quietly corrupts the framing, and an experienced supervisor who knows the publication will notice.

How AI gets this regulation wrong

The predominant failure we observed on this assessment was AI tools serving up outdated or partial temporal information as though it were current and complete, drawing on summary-level shorthand from secondary sources rather than the authoritative characterisation in the published report. The practical hazard is not that the AI fabricated a date out of nowhere, but that it reproduced a real-but-incomplete formulation and then cited a third-party commentary piece to support it, giving the answer a veneer of sourcing that discourages the verification step a junior analyst would otherwise apply.

AI's Failure Mode	Count	Affected findings
Outdated	1	Finding#1

What that means for your team

For a Legal team at a Payment Institution, the primary risk from AI errors on this assessment is delivering a work product that misrepresents the assessment's scope or provenance to an internal or external audience that will rely on it, a wrong-deliverable risk that compounds the further downstream the document travels. The failure does not typically surface as an obvious factual contradiction; it surfaces as a credibility problem when the document is reviewed by a counterparty, supervisor, or auditor who holds the primary source.

Risk Impact	Count	Affected findings
Wrong deliverable	1	Finding#1

When this affects your department

Legal teams at Payment Institutions reach for this assessment most naturally when they are contextualising a PFMI self-assessment for internal governance audiences: board risk committee papers, internal audit scoping documents, or senior management briefings that need to situate the firm's general business risk position against the current state of observed practice across the FMI population.

In international jurisdictions, where CPMI-IOSCO assessment findings carry persuasive weight with local prudential regulators even absent direct legal force, Legal often leads the framing exercise: translating what the assessment found, over what time period, and on what evidentiary basis into language that a board-level recipient or a regulator-facing submission can carry without ambiguity.

The assessment's process provenance is not a footnote detail in this context. When a supervisor queries the basis for a self-assessment benchmark, or when internal audit tests the adequacy of the firm's PFMI monitoring programme, the question of when data was collected, how many FMIs participated, and whether the findings were validated directly with those FMIs before publication bears directly on the weight the self-assessment can carry.

A Legal team that has relied on AI-generated summary language describing the assessment period as "2023–24", when the published report characterises it as "2023–25" inclusive of the April 2025 FMI validation phase, has introduced a discrepancy that will either require correction under time pressure or be left to stand as a quiet misrepresentation.

The risk is sharpest where the document citing the assessment is external-facing: a regulatory response, a submission to a working group, or due-diligence material provided to a counterparty conducting a connectivity or membership assessment. In those contexts, being caught with a mis-stated assessment timeline signals either that the team did not read the primary source or that it relied on a secondary summary without checking, neither of which is a comfortable position when the supervisor or counterparty holds the BIS publication.

The findings at a glance

The table below lists the finding from our testing of AI tools on this assessment, giving its title, the type of AI error, and its citation ID. The failure mode and risk category for the same finding appear in the tables above.

#	Finding title	Type	Citation ID
1	Assessment timeline truncated at 2023–24, omitting 2025 validation phase	Hallucination	RLB-F-INT-BIS-CPMI-IOSCO-PFMI-L3-GENERAL-BUSINESS-RISK-2025-Q005

Aggregate impact

What makes the error on this assessment diagnostically significant is the mechanism: the AI did not fabricate the assessment dates, it reproduced language that appears in abbreviated form on a BIS landing page and in secondary commentary, then presented that abbreviated characterisation as if it were the complete authoritative description. The CPMI report itself states the assessment "was carried out during 2023-25"; the BIS landing page shorthand uses "2023–24" as a compressed reference.

AI tools trained on a broad corpus will encounter the landing-page shorthand far more frequently than the executive summary text, and will reproduce it with apparent confidence, often attaching a third-party commentary citation that does not itself correct the compression.

For Legal teams at Payment Institutions in international jurisdictions, this failure mode clusters on exactly the kind of provenance and process question that governance documents require precision on. The question of how many FMIs participated, on what basis, and whether findings were validated before publication is standard due-diligence framing for any PFMI self-assessment benchmark, and it is the kind of framing that a competent supervisor will test.

An AI answer that truncates the 2025 validation phase out of the assessment timeline produces a document that is subtly but materially wrong in a way that is hard to catch in routine review, because the stated dates are not implausible and the attached citation appears to support them.

The systemic risk to the firm is concentrated in the credibility of its PFMI-adjacent governance output. A single misstatement of an assessment's temporal scope may seem low-stakes in isolation, but in international jurisdictions where CPMI-IOSCO assessments are the primary public benchmark for FMI compliance culture, being demonstrably wrong about the regulator's own process in a submission or board paper is a reputational liability that extends beyond the immediate document.

What your team should do

The default position for Legal teams on this assessment should be: treat AI-generated summary language about the assessment's process provenance as a first draft that requires primary-source verification before it goes into any document that will be relied upon externally. The BIS publication is publicly available. The specific dates, participant count, and validation process are in the executive summary, not buried in technical annexes, so retrieving the correct characterisation is a two-minute task. The risk of skipping that step is out of proportion to the time it saves.

Where AI tools are genuinely useful on this assessment is in orientation and structuring work: generating a first-pass outline of the regulatory context for a new team member, summarising the general business risk categories CPMI assessed across the FMI population, or drafting the introductory framing of a self-assessment paper that will be checked and substantiated against the primary source before sign-off. In those use cases, the AI's tendency to reproduce landing-page shorthand is tolerable because the output is explicitly a draft scaffold, not a final representation of fact.

The practical safeguard for the Legal workflow on this regulation is a verification step built into the document production process, not a blanket prohibition on AI assistance. Any document that states a date, participant count, or process description sourced from AI output on this assessment should carry a source check against the BIS publication before it leaves the Legal team, and that check should be documented in the file in the same way any other substantive verification would be.

In international jurisdictions where regulatory supervisors apply close scrutiny to PFMI benchmarking methodology, that verification trail is also a defensible record if the document is ever queried.
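
As an illustration only, part of the source check described above could be mechanised as a pre-flight scan of a draft before human review. The sketch below is a minimal example under stated assumptions: the function name `flag_truncated_period` and the pattern `TRUNCATED_PERIOD` are hypothetical, and the only thing it looks for is the "2023–24" shorthand discussed in this briefing. It does not replace reading the BIS publication; it only marks lines that a reviewer should check against it.

```python
import re

# Assumption: the truncated period string is taken from this briefing.
# Matches "2023-24", "2023–24", "2023/24" and "2023-2024", but not "2023-25".
TRUNCATED_PERIOD = re.compile(r"\b2023\s*[-\u2013/]\s*(?:20)?24\b")


def flag_truncated_period(draft: str) -> list[str]:
    """Return each line of the draft that states the period as 2023-24."""
    return [
        line.strip()
        for line in draft.splitlines()
        if TRUNCATED_PERIOD.search(line)
    ]


if __name__ == "__main__":
    # Hypothetical draft text, for demonstration only.
    draft = (
        "The Level 3 assessment was carried out during 2023\u201324.\n"
        "Findings were validated with participating FMIs.\n"
    )
    for hit in flag_truncated_period(draft):
        print("VERIFY AGAINST PRIMARY SOURCE:", hit)
```

A scan like this catches only the one known shorthand. Participant counts and process descriptions still need a manual check, and the result of that check still belongs in the file.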

How RLB Can Help

RegLeg's published Hallucination Research gives the Legal team at a Payment Institutions firm a free, ready-to-use pre-flight check before placing weight on AI-generated output for regulatory questions. Each research entry documents the specific ways AI tools have misrepresented rules, cited non-existent provisions, or conflated requirements across payment frameworks, giving your team concrete, evidenced failure patterns rather than abstract caution. Running that check takes minutes and can prevent the kind of reliance on plausible-sounding but incorrect regulatory positions that carries real compliance and reputational risk.

Beyond the published research, RegLeg offers bespoke regulator deep-dives scoped to the workflows your Legal function actually uses. For Payment Institutions operating across multiple jurisdictions, that typically means mapping AI-supported tasks (licence condition reviews, regulatory correspondence drafts, horizon-scanning, and cross-border equivalence analysis) against the hallucination failure modes most prevalent in each relevant framework. The output is a prioritised exposure map your team can act on directly: knowing which tasks benefit most from AI assistance and which require tighter human review before outputs are relied upon.

For firms with an existing AI-use policy, RegLeg can conduct a confidential review against our failure-mode catalogue, identifying gaps and producing a prioritised remediation plan aligned to your current workflows and governance structure. We also develop training materials and CPD-aligned content tailored for Legal teams: practical, case-grounded sessions that build the critical fluency your lawyers need to work productively with AI tools without inadvertently accepting flawed regulatory analysis.