AI Hallucination on Harmonised ISO 20022 Data Requirements for Enhancing Cross-Border Payments for Lawyers in International Jurisdictions

Executive Summary

Four questions put to AI tools about the CPMI harmonised ISO 20022 data requirements produced four hallucinations; every answer was wrong. The errors span governance attribution, adoption statistics, official performance benchmarks, and Fedwire's specific technical format requirements: a profile that maps almost exactly onto the questions lawyers advising cross-border payment clients actually ask. Two failures involved AI asserting incorrect figures with apparent confidence before admitting, when pressed, that its answers were reconstructed or inferred rather than sourced from authoritative documents.

The remaining two involved AI attributing positions to sources that said something materially different, including misidentifying which central bank chairs the CPMI working group, with a named individual at the wrong institution. For lawyers drafting compliance opinions, advising on implementation annexes, or building regulatory submissions around these requirements, the failure rate here leaves no reliable floor.

How AI gets this regulation wrong

The failures on this regulation split between AI asserting wrong answers with apparent confidence, then walking them back when challenged, and AI attributing positions to sources that say something materially different or citing the wrong institution entirely. In both patterns, the surface presentation is polished and specific enough to pass a first read without triggering doubt. The underlying error only surfaces when the cited source is checked directly or the AI is pressed to account for its own reasoning.

AI's Failure Mode	Count	Affected findings
Exposed Fabrication	2	Finding#1 · Finding#3
Inference Drift	1	Finding#2

What that means for your practice

Three of the four failures fall into wrong-deliverable territory: advice built on them would be factually incorrect in ways that survive casual review until a counterparty or regulator checks the primary source. The fourth has professional indemnity (PI) exposure written directly into it: it concerns Fedwire's specific postal address format requirements under the hybrid approach, where an incorrect description of the optional component would propagate directly into implementation guidance, compliance annexes, and any client sign-off derived from them.

Risk Impact	Count	Affected findings
Wrong deliverable	3	Finding#1 · Finding#2 · Finding#3

When this affects lawyers

Lawyers advising correspondent banks, payment infrastructure operators, or fintech clients across multiple jurisdictions routinely reach for AI tools when scoping new engagements or drafting first-pass advice on the ISO 20022 harmonisation requirements, particularly for questions where the answer is expected to be a matter of public record. Governance attribution, adoption timelines, and official performance statistics are precisely the kind of factual anchors that appear in opinion letters, compliance reports, and regulatory submissions without extensive caveating, because they appear verifiable and the legal drafter assumes they are not in dispute.

The calibration risk here is specific. An opinion that cites the Federal Reserve Bank of New York as chairing the CPMI ISO 20022 working group, rather than the Reserve Bank of Australia, is not an ambiguous interpretive position; it is a factual error that undermines the opinion's credibility in any context where the counterparty has checked the primary source.

Equally, collapsing the distinct FPS and RTGS adoption rates into a single figure misrepresents the current state of standard rollout in a way that matters when advising clients on timing strategy, benchmarking, or the regulatory expectations they face in a given market.

The Fedwire implementation question sits in a different risk register. Advice on a named payment system's specific hybrid postal address format goes directly into compliance annexes, system specification sign-offs, and implementation workstreams.

The AI tools we tested inverted the nature of the optional address component, substituting optional structured fields drawn from general CBPR+ address schema knowledge for the source's actual specification of "optional free-format lines of 70 characters each." A lawyer who delivers that description to a client implementing Fedwire's ISO 20022 requirements, or includes it in a technical annex without independent verification, has delivered a wrong answer with PI exposure attached.
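The length constraint quoted above is the kind of detail that can be checked mechanically once it has been confirmed against the operator's published specification. The following is a minimal Python sketch, not Fedwire's actual validation logic. The only value taken from the source is the 70-character limit on free-format lines; the constant name and function name are hypothetical, and the sketch deliberately does not model how many lines are permitted or which structured elements are required, because those details must come from the primary source.

```python
# Assumption: 70 is the per-line limit quoted in the source for the optional
# free-format address lines. Confirm against the operator's specification.
MAX_FREE_FORMAT_LINE_LENGTH = 70


def overlong_address_lines(lines):
    """Return (line_number, length) for each free-format line over the limit.

    Line numbers start at 1. An empty result means no line exceeded the limit;
    it does not mean the address is valid in any other respect.
    """
    return [
        (number, len(line))
        for number, line in enumerate(lines, start=1)
        if len(line) > MAX_FREE_FORMAT_LINE_LENGTH
    ]
```

A check like this catches only one class of error. It would not have caught the failure described here, where the AI described the wrong kind of optional component altogether.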

The findings at a glance

Each row below captures one AI failure tested against this regulation for lawyers in international jurisdictions: the finding title, the failure type, and the citation ID.

#	Finding title	Type	Citation ID
1	ISO 20022 adoption rate conflation: RTGS vs faster payments (Opus 4.7)	Hallucination	RLB-F-INT-BIS-CPMI-ISO-20022-HARMONISATION-UPDATED-2026-Q006-Opus47
2	Fedwire hybrid postal address schema over-specification	Hallucination	RLB-F-INT-BIS-CPMI-ISO-20022-HARMONISATION-UPDATED-2026-Q010-Opus47
3	ISO 20022 adoption rate conflation: RTGS vs faster payments (Sonnet 4.6)	Hallucination	RLB-F-INT-BIS-CPMI-ISO-20022-HARMONISATION-UPDATED-2026-Q006-Sonnet46

Aggregate impact

The four failures here are not random noise. They cluster on the exact categories of factual information that lawyers reach for when drafting regulatory advice on ISO 20022 harmonisation: who governs the process, how widely the standard is adopted across different system types, what official bodies say about the operational performance case, and what the precise technical requirements are at a named implementation point. AI tools produced a wrong answer on every one.

The governance and statistics failures share a structural pattern: AI tools produced confident, specific answers that looked authoritative and were wrong in ways that required primary-source verification to catch. The CPMI working group chairmanship finding is the sharpest example: the AI named a specific individual at a different central bank, fabricating both the institution and the person in the same response. A lawyer who repeated that attribution in a client briefing, regulatory submission, or opinion memo would have done so on the basis of a hallucination indistinguishable, in form, from a researched fact.

The FPS and RTGS adoption conflation is similarly invisible at first read: a single figure looks more precise, not less, and the error only surfaces when the underlying official statement is checked against both system types separately.

The Fedwire technical failure sits apart in its consequence profile. It is not a governance or statistics question; it is a direct question about a named system's format specification, where the AI inverted the nature of the optional address component. The error propagates silently: a compliance annex or implementation specification derived from it would be technically wrong in a way that might not surface until client testing or regulatory audit.

Critically, the AI had the Fedwire implementation date correct, which creates a false calibration signal: accuracy on one detail increases trust in the surrounding technical description, and the format error rides that trust into a client deliverable.

What your team should do

The default position for this regulation is: do not rely on AI tools for any factual assertion you intend to put in an opinion, submission, or implementation annex without tracing it to a primary source you have personally read. The four failures here cover governance attribution, adoption statistics, official performance benchmarks, and named-system format specifications. Together, that is most of the factual scaffolding of cross-border payment advice on ISO 20022 harmonisation. This is not a regulation where AI tools fail only on edge cases.

For governance and institutional attribution (who chairs which working group, what body issued which statement, which official speech contains which figure), treat AI tools as a pointer to where to look, not as a source. The RBA chairmanship failure illustrates the specific risk: a named individual at a wrong institution is harder to catch than a vague wrong answer precisely because it is specific and internally consistent. Any institutional attribution in a deliverable should be traced to a primary BIS publication, a central bank press release, or an official speech transcript.

For official statistics and performance benchmarks (adoption rates, inquiry rates, resolution time figures), verify against the underlying official speech or regulatory publication, not against AI-summarised versions. The FPS and RTGS adoption conflation shows how two distinct figures can be averaged into one plausible-looking number with no signal of error on the face of the response.
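One way to guard against this kind of conflation in a working file is to record each figure against its system type and its primary source, and to refuse to report any figure that lacks a source. The sketch below is a hypothetical Python illustration only: the class name, function name, and field names are assumptions introduced for this example, and it contains no real adoption figures.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AdoptionFigure:
    system_type: str  # e.g. "RTGS" or "FPS"; never a blend of the two
    percent: float    # the figure exactly as stated in the primary source
    source: str       # the primary document the figure was read from


def describe_adoption(figures: List[AdoptionFigure]) -> List[str]:
    """Return one line per figure, refusing any figure with no recorded source."""
    lines = []
    for fig in sorted(figures, key=lambda f: f.system_type):
        if not fig.source.strip():
            raise ValueError(f"{fig.system_type}: no primary source recorded")
        lines.append(f"{fig.system_type}: {fig.percent:g}% (source: {fig.source})")
    return lines
```

The design choice is that the function never computes an average or a combined rate: each system type keeps its own line and its own citation, so a single blended number cannot appear in the output.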

For named-system technical specifications (Fedwire, SWIFT CBPR+, any national RTGS operator's implementation of the hybrid or end-state approach), AI tools trained on general ISO 20022 schema knowledge will substitute that general knowledge for system-specific format requirements without flagging the substitution. Before advising on any system's specific address format, field optionality, or character constraints, verify directly against the operator's published specifications: FRB Services documentation for Fedwire, SWIFT's CBPR+ implementation guides, or equivalent primary sources for other systems. Do not rely on AI-reconstructed technical summaries for specifications that go into client deliverables, compliance annexes, or implementation sign-offs.

Where AI tools are useful on this regulation (background reading, identifying the relevant regulatory instruments, summarising the general harmonisation timeline), treat the output as a research starting point requiring verification, not a finished work product.

How RLB can help

RegLeg's published Hallucination Research is available as a free pre-flight check for lawyers working on regulatory matters. Before relying on AI-assisted output, whether for advice, drafting, or due diligence, lawyers can consult the research to understand which failure modes have been observed for the specific regulation in question. This is not a substitute for legal judgement, but it is a structured, independent reference that flags where AI tools have historically misfired, allowing practitioners to focus their human verification effort on the highest-risk points.

For firms where multiple lawyers work across the same regulatory portfolio, RegLeg offers bespoke deep-dive engagements. These go beyond the published research to examine the specific regulations, jurisdictions, and question types most relevant to the firm's practice. The output is a tailored briefing that legal teams can use as a standing reference, updated as the regulatory landscape evolves, giving the whole team a shared, consistent picture of where AI tools should be treated with caution and where they have performed reliably.

RegLeg also works with legal teams on training and CPD-aligned content. This covers the categories of failure lawyers are most likely to encounter, including outdated regulatory text, cross-jurisdictional confusion, and misattributed citations, framed around real regulatory examples rather than abstract AI theory. Separately, RegLeg can conduct a confidential review of a firm's existing AI-use policy, assessing it against the failure-mode catalogue the research has surfaced. The output is a structured gap analysis: which risks the policy already addresses, which it does not, and where practical amendments would strengthen the firm's position.