AI Hallucination on Harmonised ISO 20022 Data Requirements for Enhancing Cross-Border Payments - Updated Report for Compliance teams at Statutory Boards & Agencies firms in international jurisdictions

Executive Summary

Compliance teams at Statutory Boards & Agencies firms in international jurisdictions rely on accurate ISO 20022 adoption data to benchmark their own implementation progress, advise on interoperability obligations, and report accurately to governing boards and finance ministers on where the global payments ecosystem stands. On the CPMI Harmonised ISO 20022 Data Requirements Updated Report, the AI tools we tested gave wrong statistics for the most-referenced monitoring datapoint in the regulation: the share of faster payment systems and RTGS systems that have adopted ISO 20022.

The failure was not a marginal rounding error: AI tools collapsed two materially different figures (faster payment systems at over three-quarters; RTGS systems approaching half) into a single inflated number applied uniformly to both system types, overstating RTGS adoption by roughly 30 percentage points. When pressed on the discrepancy, the AI tools acknowledged uncertainty, confirming that the confident initial answer had no reliable grounding. A Compliance function that built a policy brief, board paper, or regulator submission around the AI's figure would be documenting a factual misrepresentation of the current state of ISO 20022 deployment in the RTGS segment.

How AI gets this regulation wrong

The failure mode surfaced on this regulation is confident fabrication: AI tools produced a specific, citable-looking statistic with no hesitation, then retreated from it when challenged. The table below counts the findings by failure mode. In each, the AI's answer was internally plausible but materially wrong, and the error would survive a quick read without triggering a red flag.

AI's Failure Mode	Count	Affected findings
Exposed Fabrication	2	Finding#1 · Finding#2

What that means for your team

For Compliance functions at Statutory Boards & Agencies firms in international jurisdictions, the risk here concentrates in the wrong-deliverable category: the AI's answer was plausible enough to flow into a board paper, regulatory submission, or internal gap analysis before the error was caught. The table below shows how the findings fall by risk impact.

Risk Impact	Count	Affected findings
Wrong deliverable	2	Finding#1 · Finding#2

When this affects your department

Compliance teams at Statutory Boards & Agencies firms in international jurisdictions consult AI tools on this regulation most heavily when they are assembling briefing materials for a governing board or finance ministry, scoping an internal implementation roadmap against where the global ecosystem sits, or responding to a business line asking whether their payment infrastructure is compliant with the direction of travel CPMI has set. Adoption-rate statistics sit at the heart of all three use cases: they frame the urgency of the firm's own migration, calibrate peer-benchmarking arguments, and anchor the regulatory context in any board paper or external communication.

The specific failure here, overstating RTGS adoption by roughly 30 percentage points, would misrepresent the competitive and regulatory landscape in exactly the documents that carry the most institutional weight. A board paper asserting that close to 80 percent of RTGS systems have adopted ISO 20022 will frame the firm's own timeline as a catch-up exercise when, on the RTGS side, the actual picture is closer to a half-adoption market. That framing affects capital allocation decisions, vendor contract timelines, and the strength of the business case for accelerating internal migration.

For a firm that interfaces with a central bank, a finance ministry, or a supranational body, all normal counterparties for a Statutory Boards & Agencies compliance function, presenting an inflated RTGS adoption figure in a formal submission or regulatory dialogue creates a specific credibility risk. Counterparties working from the same CPMI monitoring data will immediately identify the discrepancy, and the reputational damage of producing a demonstrably wrong statistic in a regulatory context is disproportionate to how easy the error would have been to prevent.

The findings at a glance

The table below lists the findings tested against this regulation, one per AI tool, each with its failure type and citation ID. Both findings concern the same question: how the AI tools' answer on adoption rates diverges from what the CPMI monitoring data actually states.

#	Finding title	Type	Citation ID
1	ISO 20022 adoption rate conflation: RTGS vs faster payments (Opus 4.7)	Hallucination	RLB-F-INT-BIS-CPMI-ISO-20022-HARMONISATION-UPDATED-2026-Q006-Opus47
2	ISO 20022 adoption rate conflation: RTGS vs faster payments (Sonnet 4.6)	Hallucination	RLB-F-INT-BIS-CPMI-ISO-20022-HARMONISATION-UPDATED-2026-Q006-Sonnet46

Aggregate impact

With both findings in this cell arising from the same question, the pattern is clear and concentrated: the AI failure occurs on a quantitative regulatory datapoint where the source text makes a deliberate and material distinction between two payment system types. The authoritative CPMI monitoring survey, as reported by FSB officials, explicitly separates faster payment system adoption (more than three-quarters) from RTGS adoption (approaching half). AI tools collapsed that distinction into a single figure and applied it uniformly. The resulting answer was internally consistent and looked credible because it cited a real-sounding percentage, but it systematically misrepresented the RTGS segment.

This matters disproportionately for a Compliance function at a Statutory Boards & Agencies firm because RTGS infrastructure is often the firm's direct operational terrain. Regulatory benchmarking for a statutory body typically centres on the RTGS segment, not faster payment retail rails. An inflated RTGS adoption figure sets the wrong baseline for gap analysis, misrepresents the firm's position relative to peers, and generates an inaccurate urgency assessment for internal prioritisation.

The systemic risk is that this class of error, confident fabrication of a specific statistic, is unusually hard to catch in a Compliance workflow that relies on AI for research efficiency. The answer looks well-sourced, uses the right terminology, and lands within a broadly plausible range. A junior analyst under time pressure has no obvious signal to pause on. The correction only surfaced when the AI was directly challenged, a step that rarely happens when a statistic is embedded in a longer draft rather than standing alone as the primary output.

What your team should do

The default position for any Compliance work that cites ISO 20022 adoption statistics should be to go to the primary CPMI monitoring survey output or the FSB Payments Summit speech record directly, not to use AI as the lookup layer. The CPMI monitoring data is publicly available and stable; the cost of pulling the actual figure is low, and the cost of carrying a wrong figure into a board paper or regulatory submission is not. Treat AI answers on quantitative regulatory monitoring data as a starting hypothesis that needs source verification before it reaches any deliverable.

For the specific workflow risk here, the practical safeguard is a house rule on AI-assisted drafting: any sentence that contains a percentage, adoption rate, implementation count, or survey-derived figure must have the primary source cited in the draft, not just attributed to "CPMI monitoring data." If a junior can't produce the URL or document section, the figure doesn't go in. That rule catches this class of fabrication before it reaches review, because the AI's figure, when traced, does not match the source text.
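As an illustration only, the house rule above could be supported by a simple pre-review check that flags any sentence containing a numeric percentage but no citation marker. The sketch below is a minimal Python example under stated assumptions: the function names, the "[source: ...]" tag convention, and the sample draft are all hypothetical, and the check only sees figures written as digits. Worded figures such as "three-quarters" or "approaching half" would still need a human reader, and a flagged sentence still has to be traced to the primary source by hand.

```python
import re

# Figures written as digits: "75%", "80 percent", "30 per cent", "30 percentage points".
FIGURE_PATTERN = re.compile(
    r"\d+(?:\.\d+)?\s*(?:%|per\s?cent\b|percentage points?\b)",
    re.IGNORECASE,
)

# Assumed house convention (hypothetical): a citation is either a URL
# or a bracketed tag of the form "[source: ...]" inside the sentence.
CITATION_PATTERN = re.compile(r"https?://\S+|\[source:[^\]]+\]", re.IGNORECASE)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def unsourced_figures(text):
    """Return the sentences that contain a numeric figure but no citation marker."""
    return [
        sentence
        for sentence in split_sentences(text)
        if FIGURE_PATTERN.search(sentence) and not CITATION_PATTERN.search(sentence)
    ]


# Hypothetical draft text; the source tags are placeholders, not real references.
draft = (
    "Close to 80 percent of RTGS systems have adopted ISO 20022. "
    "Adoption is tracked in the monitoring survey [source: CPMI survey, section TBD]. "
    "Faster payment adoption exceeds 75% [source: CPMI survey, section TBD]."
)

flagged = unsourced_figures(draft)
for sentence in flagged:
    print("NEEDS PRIMARY SOURCE:", sentence)
```

A check like this does not verify that a cited figure is correct. It only enforces that every numeric claim arrives at review with something traceable attached, which is the point at which the fabricated RTGS figure would fail to match the source text.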

AI tools are genuinely useful on this regulation for tasks that don't depend on getting a specific number right: summarising the structure of the harmonised data requirements, mapping the LEI/IBAN/BIC field obligations against the firm's existing payment system data model, or drafting a gap-analysis framework for comparing current message formats against the target state. Those tasks leverage the AI's ability to synthesise structured regulatory text, where the failure mode is different and more detectable. Keep AI away from any work-product where a single wrong statistic would compromise the entire deliverable.

How RLB Can Help

RegLeg's published Hallucination Research gives Compliance teams at Statutory Boards and Agencies a practical pre-flight check before placing weight on AI-assisted output for regulatory questions. Because the research is openly available, it can be incorporated into existing review workflows without additional licensing or procurement. Teams can consult the relevant failure-mode findings at the point where AI tools are being used to interpret obligations, draft submissions, or assess enforcement exposure, and adjust their reliance accordingly.

Where published research is not granular enough for a specific operating context, RLB offers bespoke regulator deep-dives tailored to the Compliance function's actual workflow. These engagements map the AI-supported tasks that carry the highest hallucination exposure for a Statutory Board or Agency, typically areas such as multi-jurisdictional obligation mapping, condition-of-licence interpretation, and regulatory correspondence drafting, and produce a prioritised picture of where human verification effort should be concentrated.

RLB also conducts confidential reviews of a firm's existing AI-use policy against RegLeg's failure-mode catalogue, identifying gaps and producing a prioritised remediation roadmap that the Compliance team can action within its normal governance cycle.

To support capability building within the team, RLB develops training material and CPD-aligned content that Compliance staff can use internally. This content is designed to be delivered by the team's own leads rather than requiring ongoing external facilitation, and is calibrated to the regulatory environment and AI tools already in use at the firm. The aim is to leave the Compliance function better equipped to make its own informed judgements about AI reliability, not dependent on external sign-off each time a new workflow is introduced.