AI Hallucination on Harmonised ISO 20022 Data Requirements for Enhancing Cross-Border Payments - Updated Report for Product & Business Development teams at Payment Institutions firms in international jurisdictions

Executive Summary

For Product & Business Development teams at Payment Institutions firms operating across international jurisdictions, the CPMI ISO 20022 harmonisation framework is a live product constraint, shaping corridor strategy, partner onboarding requirements, and the business case for data enrichment investment. We tested AI tools against questions drawn directly from official CPMI and FSB statements on this regulation and found two hallucinations across the questions we examined. Both failures fall in quantitative territory: the adoption statistics that underpin market-opportunity framing and the efficiency metrics that drive internal investment cases.

In the first failure, an AI assistant collapsed two materially distinct adoption rates (one for faster payment systems, one for RTGS) into a single conflated figure, then admitted on challenge that the number had been reconstructed. In the second, an AI missed the official CPMI/FSB inquiry-rate and resolution-time data entirely, returning a false negative and misattributing the one figure it did surface to commercial sources rather than official statements. Both errors feed directly into the deliverables Product & Business Development teams build: business cases, product roadmaps, client-facing positioning, and corridor analysis.

How AI gets this regulation wrong

The failures we observed on this regulation split two ways. One AI conflated distinct statistics into a single invented figure and admitted it when pressed; another searched official sources, found nothing, and returned a false negative while misattributing a related figure to the wrong source. Both failures sit on the quantitative spine of the regulation: the adoption benchmarks and the operational efficiency claims that give the harmonisation agenda its strategic weight.

AI's Failure Mode	Count	Affected findings
Exposed Fabrication	2	Finding#1 · Finding#3
Inference Drift	1	Finding#2

What that means for your team

Both failures land in the same risk category: wrong deliverable. For a Product & Business Development function at a Payment Institutions firm, that translates to misquoted statistics circulating in board papers, investment cases, or client materials: numbers that have been laundered through an AI tool and stripped of their provenance before anyone noticed they were fabricated or misattributed.

Risk Impact	Count	Affected findings
Wrong deliverable	3	Finding#1 · Finding#2 · Finding#3

When this affects your department

Product & Business Development teams at Payment Institutions firms reach for this regulation when scoping ISO 20022 migration timelines against corridor partners, building the business case for data enrichment investment, or positioning the firm's capabilities to PSPs, correspondent banks, and enterprise clients who need to know how far along the network really is. The adoption statistics (how many faster payment systems and RTGS operators are live on ISO 20022) are the foundation of market-opportunity framing.

Get them wrong and the product roadmap is built on a false premise; pitch them wrong in a client deck and your counterpart, who runs treasury at a G-SIB, will notice.

The inquiry-rate and resolution-time figures sit at the ROI end of the same conversation. When you're justifying internal investment in structured remittance data handling, enrichment APIs, or exception management tooling, the Panetta numbers (1-3% inquiry rate, 5-10 manual touchpoints, up to 80% reduction in resolution time) are exactly the quantitative anchors that make a business case land. A junior analyst who is asked to populate a business case or investor briefing and uses AI to retrieve these figures will get either a conflated number or a false negative, and in neither case will they know to flag it.

The compounding risk is attribution. When an AI returns a figure attributed to "SWIFT" or "commercial bank research" rather than an FSB co-chair statement delivered at a G20 summit, the sourcing chain in the downstream document is wrong. For a firm preparing regulatory submissions, investor materials, or partner-facing documentation where source quality matters, the misattribution is an independent problem on top of the factual error.

The findings at a glance

The findings below cover the two quantitative benchmarks most directly relevant to product and business development work on this regulation: adoption rates across payment system types, and the operational efficiency metrics underpinning the case for harmonised implementation.

#	Finding title	Type	Citation ID
1	ISO 20022 adoption rate conflation: RTGS vs faster payments (Opus 4.7)	Hallucination	RLB-F-INT-BIS-CPMI-ISO-20022-HARMONISATION-UPDATED-2026-Q006-Opus47
2	Missing official inquiry-rate and resolution-time benchmarks	Hallucination	RLB-F-INT-BIS-CPMI-ISO-20022-HARMONISATION-UPDATED-2026-Q007-Sonnet46
3	ISO 20022 adoption rate conflation: RTGS vs faster payments (Sonnet 4.6)	Hallucination	RLB-F-INT-BIS-CPMI-ISO-20022-HARMONISATION-UPDATED-2026-Q006-Sonnet46

Aggregate impact

Both failures cluster on the same type of content: quantitative benchmarks from official statements made at the March 2026 FSB cross-border payments summit. That clustering matters. The Andrew Bailey and Fabio Panetta speeches from that summit represent the most recent authoritative data on ISO 20022 adoption progress and harmonisation benefits. These are the figures that would appear in any current business case or market analysis on this topic.

AI tools that are queried about this regulation are being asked about content that postdates or is underrepresented in their training data, and the failures show exactly what happens: one AI reconstructed a plausible-sounding figure by collapsing two distinct metrics, and another simply failed to surface the relevant speech at all despite searching.

For Product & Business Development teams, the systemic risk is that these are precisely the figures most likely to be delegated to a junior analyst with an AI tool. Adoption statistics and efficiency benchmarks feel like factual retrieval tasks: look up the number, drop it into the deck. The AI's confident presentation of a wrong adoption rate, or its failure to surface the inquiry-rate data, will not trigger any obvious red flag. The conflated 79% figure is internally coherent; the false negative ("no official statistic found") looks like due diligence. Both pass a casual review.

Across both findings, the errors are not edge cases or obscure interpretive questions; they concern the foundational numbers used to frame the strategic and commercial rationale for ISO 20022 investment. A Payment Institutions firm that builds its product roadmap, partner pitch, or investor narrative on AI-sourced versions of these figures is working from a corrupted quantitative foundation, and the error will only surface when a better-informed counterpart (a correspondent bank, a regulator, an institutional investor) cites the correct source.

What your team should do

The default position on any quantitative benchmark from this regulation should be primary source only. The Bailey and Panetta speeches from the March 2026 FSB cross-border payments summit are publicly available on the BIS website and are short documents; retrieval takes minutes. No business case, investor briefing, or client-facing document should carry an adoption rate or efficiency figure that has not been traced back to the original speech or CPMI monitoring report, regardless of what an AI tool returned.

The cost of this check is trivial relative to the cost of a figure that gets challenged in a board meeting or a partner negotiation.

Practically, the exposure points to address are templated deliverables and junior analyst workflows. If your team uses AI to populate standard sections of business cases (market-size tables, regulatory-landscape summaries, benchmark statistics), those sections need a verification step before the document moves to review. The specific failure mode here (AI conflating FPS and RTGS adoption rates into a single figure) is the kind of error that survives internal review because it is numerically plausible. Build the verification expectation into the workflow, not just the sign-off.
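
One way to picture that verification step is as a simple gate on each figure in a draft. The sketch below is illustrative only: the class, function, and source-category names are assumptions made for this example, not part of any CPMI, FSB, or RLB tooling, and the placeholder values stand in for whatever numbers appear in your own draft.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: illustrative category names, not an official taxonomy.
PRIMARY_SOURCE_TYPES = {"official_speech", "cpmi_monitoring_report"}


@dataclass
class Figure:
    label: str                          # what the number measures
    value: str                          # the figure as written in the draft
    source_type: Optional[str] = None   # None = untraced (e.g. AI output only)
    verified_by: Optional[str] = None   # person who checked it against the source


def blocked_figures(figures: List[Figure]) -> List[Figure]:
    """Return figures that lack a primary source or a named human check."""
    return [
        f for f in figures
        if f.source_type not in PRIMARY_SOURCE_TYPES or not f.verified_by
    ]


def ready_for_review(figures: List[Figure]) -> bool:
    """A section may move to review only when no figure is blocked."""
    return not blocked_figures(figures)


# Hypothetical draft section: one traced figure, one AI-retrieved and untraced.
draft = [
    Figure("FPS adoption rate", "<value from draft>", "official_speech", "A. Analyst"),
    Figure("RTGS adoption rate", "<value from draft>"),
]

for f in blocked_figures(draft):
    print(f"Blocked: {f.label} needs a primary source and a named check")
```

Keeping the FPS and RTGS rates as separate entries, each with its own source and reviewer, is the design choice that matters here: a single merged "adoption rate" row is exactly where the conflation described above would hide.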

AI tools are genuinely useful for this regulation in structural and interpretive work: mapping the CPMI data requirements against your firm's message flows, identifying gaps between your current enrichment capabilities and the harmonised data set, drafting initial versions of gap analysis frameworks or training materials. The failures we observed are concentrated in quantitative retrieval from recent official statements, which is a discrete and manageable exclusion. Keep AI in the analytical layer; source your numbers from the primary record.

How RLB Can Help

RegLeg's published Hallucination Research gives Product & Business Development teams a concrete pre-flight check before acting on AI-assisted regulatory analysis. When your team is assessing licensing pathways into a new corridor, structuring a partner programme around an e-money framework, or pressure-testing product eligibility against safeguarding or passporting rules, the research flags where AI tools have demonstrably misfired on the same regulatory text you are querying: wrong jurisdictional scope, fabricated supervisory guidance, inverted conditions on capital or float requirements. That record is publicly verifiable and jurisdiction-specific, which means you can calibrate reliance before a product decision is already downstream.

Beyond the published findings, RLB runs bespoke regulator deep-dives scoped to the workflows where your function carries the most exposure. For Product & Business Development at a Payment Institution, that typically concentrates around regulatory change tracking for scheme rule amendments and PSD-equivalent transposition variances across corridors, go-to-market sequencing against authorisation timelines, and commercial structuring that turns on interpretation of safeguarding mechanics or agent/distributor liability rules.

A deep-dive maps which of those workflows have the highest hallucination surface, given how AI tools handle fragmented multi-jurisdictional source material. It gives you a ranked view of where human review adds the most value, rather than a blanket "verify everything" instruction your team cannot operationalise.

For firms that have already embedded AI tooling in their regulatory workflows, RLB offers a confidential review of existing AI-use policy against the failure-mode catalogue the research has built up across payment institution regulatory texts. The output is a prioritised remediation list, not a compliance checkbox exercise, focused on the specific decision types where a hallucination carries commercial or regulatory consequence for a Product & Business Development function.

Where teams want to build that capability internally, RLB can develop training material and CPD-aligned content scoped to payment regulation and product development contexts, so the learning lands with the people making the calls rather than sitting in a generic AI-literacy module.