AI Hallucination on Harmonised ISO 20022 Data Requirements for Enhancing Cross-Border Payments - Updated Report for Treasury teams at Corporate Banking firms in international jurisdictions

Executive Summary

Treasury teams at Corporate Banking firms operating across international payment corridors are among the most directly exposed audiences to the CPMI's harmonised ISO 20022 data requirements, the standard that is reshaping how structured payment data must flow across correspondent networks, central bank infrastructure, and SWIFT channels alike. Across two questions put directly to AI tools about this regulation's specific requirements and official supporting data, AI assistants produced wrong answers in both cases.

The failures followed two distinct patterns. In one, the AI asserted that no official figures existed when they do, in a publicly available BIS speech from March 2026. In the other, the AI gave a confident, plausible-sounding answer on Fedwire's postal address format requirements that, once scrutinised, turned out to invert the actual requirement. For a Treasury function managing correspondent relationships, payment operations, and cross-border transaction throughput, both failure types carry operational and reputational consequences, particularly where the AI-generated content feeds internal policy, training materials, or counterparty-facing communications.

How AI gets this regulation wrong

AI tools tested on this regulation failed in two distinct ways: one overstated the absence of official data, asserting that precise regulatory figures simply do not exist when they are on the public record, while the other substituted plausible-sounding but incorrect technical detail, inverting the actual field structure required by the implementing infrastructure. The table below classifies both findings under a single failure mode, Inference Drift, and lists the findings affected.

AI's Failure Mode	Count	Affected findings
Inference Drift	2	Finding#1 · Finding#2

What that means for your team

Both failures in this cell fall into the same risk category (the AI produces a wrong deliverable), but the downstream damage differs substantially depending on which part of the Treasury workflow picks it up first. The table below groups the findings by risk impact.

Risk Impact	Count	Affected findings
Wrong deliverable	2	Finding#1 · Finding#2

When this affects your department

Treasury's most direct exposure to this regulation sits at the intersection of correspondent banking operations and internal payment infrastructure policy. When the firm is scoping its ISO 20022 migration roadmap, reviewing correspondent readiness, or building the internal briefings that flow down to cash management teams and transaction banking product owners, AI tools are a natural first port of call, particularly for quantifying the business case or extracting technical field-level requirements that would otherwise require sustained engagement with dense BIS documentation and vendor FAQs from individual central bank operators.

The specific failure points tested here sit in two of those use cases. The first, establishing what the official CPMI and FSB record says about inquiry volumes and resolution time improvements, is precisely the kind of regulatory evidence base Treasury builds when presenting to the business or defending a project budget to the CFO. If the AI incorrectly signals that no official figures exist, the Treasury team either abandons the regulatory citation and weakens its case, or invests unnecessary time in manual research.

The second, understanding how implementing infrastructure such as Fedwire has interpreted the hybrid postal address format, is squarely a technical implementation question. Treasury teams advising on correspondent payment instructions, reviewing payment system connectivity, or scoping what changes are needed to origination systems need the right field-level answer. An incorrect description of which address components are structured versus free-format is not a rounding error; it determines whether the firm's payment messages will pass validation or generate exceptions at the receiving end.

Across both, the risk is identical in structure: a junior analyst or a business line partner takes the AI answer at face value, it enters a policy note, a training deck, a system configuration spec, or an internal Q&A document, and the error propagates before anyone with direct BIS source exposure reviews it. Given the pace at which ISO 20022 implementation is moving across correspondent networks, Treasury does not always have the luxury of a second review cycle before the output is acted on.

The findings at a glance

The two findings below cover the specific questions on which AI tools produced incorrect answers when tested against this regulation and its associated official documentation.

#	Finding title	Type	Citation ID
1	Missing official inquiry-rate and resolution-time benchmarks	Hallucination	RLB-F-INT-BIS-CPMI-ISO-20022-HARMONISATION-UPDATED-2026-Q007-Sonnet46
2	Fedwire hybrid postal address schema over-specification	Hallucination	RLB-F-INT-BIS-CPMI-ISO-20022-HARMONISATION-UPDATED-2026-Q010-Opus47

Aggregate impact

Both failures in this cell cluster on the implementation-facing layer of the CPMI's ISO 20022 harmonisation work: not abstract principles, but the operational specifics that Treasury teams need to get right. Those specifics are what regulators have actually said on the public record about the payment friction this standard is meant to fix, and exactly how a major central bank infrastructure operator has implemented the required address data model. That clustering is not coincidental.

It reflects where AI tools are weakest on a still-evolving standard: the gap between the long-standing message specification and the recent, implementation-level documentation that has accumulated around it in the last twelve to eighteen months.

The first failure is particularly dangerous for Treasury's internal communications and business case work. The official figures from the BIS, a 1–3% cross-border inquiry rate, 5–10 manual touchpoints per exception, and up to 80% resolution-time reduction through harmonised ISO 20022 implementation, are the kind of data points that appear in board papers, CFO briefings, and project business cases. An AI that tells a Treasury analyst these figures don't exist in official CPMI or FSB material is not just unhelpful; it actively misdirects the team away from source material that would strengthen their regulatory position and investment case simultaneously.

The risk is that the team concludes no authoritative data exists and either drops the quantitative framing entirely or cites a weaker commercial source instead.
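
To show how the quoted ranges might feed a business case, the sketch below applies them to a payment volume. The ranges are the ones cited in this report; the function names, the sample volume, the baseline resolution time, and the simplification that every inquiry counts as one exception are assumptions made purely for illustration.

```python
# Ranges quoted in this report from the official BIS figures.
INQUIRY_RATE_PCT = (1, 3)            # 1-3% cross-border inquiry rate
TOUCHPOINTS_PER_EXCEPTION = (5, 10)  # 5-10 manual touchpoints per exception
MAX_RESOLUTION_REDUCTION_PCT = 80    # "up to 80%" resolution-time reduction


def inquiry_load(annual_payments: int) -> dict:
    """Low/high estimates of inquiries and manual touchpoints.

    Assumption: each inquiry is treated as exactly one exception.
    """
    inquiries = tuple(annual_payments * pct // 100 for pct in INQUIRY_RATE_PCT)
    touchpoints = tuple(
        n * t for n, t in zip(inquiries, TOUCHPOINTS_PER_EXCEPTION)
    )
    return {"inquiries": inquiries, "touchpoints": touchpoints}


def best_case_resolution_hours(baseline_hours: float) -> float:
    """Resolution time if the full 'up to 80%' reduction were achieved."""
    return baseline_hours * (100 - MAX_RESOLUTION_REDUCTION_PCT) / 100


# Hypothetical inputs: 100,000 payments a year, 10 hours per inquiry today.
print(inquiry_load(100_000))
print(best_case_resolution_hours(10))
```

With the hypothetical 100,000 payments, this gives 1,000 to 3,000 inquiries and 5,000 to 30,000 manual touchpoints a year. The figures that go into a real board paper should come from the firm's own volumes and from the source transcript, not from this sketch.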

The second failure is operationally sharper. Fedwire's interpretation of the hybrid postal address format, specifically that the optional component uses free-format lines of up to 70 characters rather than optional structured fields like Street Name, Building Number, and Post Code, is not a nuance that surfaces in general ISO 20022 reference material. An AI trained on CBPR+ address conventions will pattern-match to structured optional subfields, because that is the dominant model across the SWIFT ecosystem. But Fedwire's FAQ specifies the opposite for the residual component.

A Treasury team relying on the AI's answer when reviewing origination system configurations, correspondent payment instructions, or vendor connectivity specs will build to the wrong field structure, generating validation failures or exception queues that would require remediation once the system is live.
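
The difference between the two address models can be made concrete with a minimal validation sketch. It assumes, following the general ISO 20022 hybrid model rather than anything stated in this report, that town name and country are the structured elements. The 70-character limit is the one described above; the cap of two free-format lines, the class name, and the field names are hypothetical and should be checked against the operator's FAQ before any real use.

```python
from dataclasses import dataclass, field
from typing import List

MAX_LINE_LENGTH = 70  # from the report: free-format lines of up to 70 characters
MAX_FREE_LINES = 2    # assumption for illustration; confirm with the operator FAQ


@dataclass
class HybridAddress:
    town_name: str  # structured element (assumed mandatory)
    country: str    # structured element (assumed two-letter country code)
    address_lines: List[str] = field(default_factory=list)  # free-format residual


def validate_hybrid_address(addr: HybridAddress) -> List[str]:
    """Return a list of problems; an empty list means none were found."""
    problems = []
    if not addr.town_name.strip():
        problems.append("town_name is empty")
    if len(addr.country) != 2 or not (addr.country.isalpha() and addr.country.isupper()):
        problems.append("country is not a two-letter uppercase code")
    if len(addr.address_lines) > MAX_FREE_LINES:
        problems.append(f"more than {MAX_FREE_LINES} free-format lines")
    for i, line in enumerate(addr.address_lines, start=1):
        if len(line) > MAX_LINE_LENGTH:
            problems.append(f"address line {i} exceeds {MAX_LINE_LENGTH} characters")
    return problems
```

The point of the sketch is the shape of the data: street, building, and post code details travel inside the free-format lines rather than in their own structured subfields, which is the opposite of what the AI answer described.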

What your team should do

The default position for Treasury when using AI tools on this regulation should be: AI is useful for orientation and summarisation of the published CPMI framework, but its answers on official quantitative statements and on implementation-level technical detail from individual infrastructure operators require direct source verification before any use in a deliverable. That is not a general caveat; it is specific to the failure pattern this regulation has produced.

The AI's knowledge of the BIS's recent speeches and the operator-level FAQs that have followed the migration deadlines is demonstrably patchy, and the failures are not random gaps: they concentrate precisely where the regulation is most live and most consequential for Treasury.

For quantitative work (building a business case, scoping exception management resource requirements, or briefing senior stakeholders on what harmonisation is expected to deliver), go directly to BIS and FSB published speech transcripts and working group reports rather than asking AI to summarise what officials have said. The Panetta speech of March 2026 is publicly indexed; it takes less time to find it directly than to iterate with an AI tool that may not have surfaced it.

For technical field-level implementation questions on specific infrastructure operators (Fedwire, T2 (formerly TARGET2), CHAPS, or any other system your firm connects to), treat the AI answer as a first-draft orientation only. Operator FAQs and technical release notes are the authoritative source, and they change on a migration-by-migration cycle that AI training data inevitably lags.

Where AI adds genuine value in this regulatory space is in navigating the high-level CPMI framework itself: understanding the staged approach, the data element hierarchy, and the relationship between CBPR+, HVPS+, and other market practice groups, or generating a first-pass gap analysis against the firm's current message structures. These are areas where the underlying standard has been stable long enough to be reliably represented in training data, and where an AI summary can usefully accelerate work that a junior analyst would otherwise spend days doing manually.

The control is straightforward: verify any answer that touches a specific date, figure, or operator-level configuration before it leaves the Treasury function.
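
As one way to operationalise that control, the sketch below flags AI answers that mention a year, a numeric figure, or a named infrastructure operator, so they can be routed to source verification. It is a deliberately crude heuristic under stated assumptions: the operator list, the patterns, and the function name are all hypothetical, and it will over-flag (any four-digit year also counts as a figure).

```python
import re
from typing import Set

# Illustrative operator names only; extend to the systems your firm connects to.
OPERATOR_TERMS = ("fedwire", "chaps", "target2")

YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")           # e.g. 2026
FIGURE_PATTERN = re.compile(r"\d+(?:\.\d+)?\s*%|\b\d+\b")  # e.g. 80%, 70


def verification_flags(answer: str) -> Set[str]:
    """Return the reasons an AI answer needs checking against a primary source."""
    flags = set()
    if YEAR_PATTERN.search(answer):
        flags.add("date")
    if FIGURE_PATTERN.search(answer):
        flags.add("figure")
    lowered = answer.lower()
    if any(term in lowered for term in OPERATOR_TERMS):
        flags.add("operator")
    return flags
```

An answer with no flags is not thereby verified; the check only identifies the outputs that must not leave the Treasury function without a source review.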

How RLB Can Help

RegLeg's published Hallucination Research gives Treasury teams a practical pre-flight check before placing weight on AI-assisted regulatory output. The findings are public and regulation-specific, so your team can run a cross-reference against any AI-generated briefing on liquidity requirements, FX exposure limits, or collateral treatment before it reaches a risk committee or an ops desk. That's not a subscription ask; it's just due diligence infrastructure that's already available.

Where the published research doesn't cover your specific regulatory stack, we work directly with Corporate Banking Treasury functions to map which AI-supported workflows carry the highest hallucination exposure in your jurisdiction set. That typically means tracing where AI tools are already embedded (regulatory interpretation for LCR/NSFR calibration, cross-border capital allocation queries, real-time collateral eligibility checks) and stress-testing those use cases against the failure-mode patterns we've documented across equivalent regulatory regimes. The output is a prioritised exposure map, scoped to your actual workflows, not a generic AI-risk taxonomy.

We also work with firms that have existing AI-use policies but haven't validated them against granular failure-mode evidence. If your Treasury function has approved AI tools for regulatory research or compliance support, we can run a confidential review of that policy against our catalogue, identifying where your current guardrails are well-calibrated and where they have gaps that reflect how these models actually fail on Treasury-relevant material.

The same content can be packaged as CPD-aligned training for the team: structured around real failure examples from analogous regulations, designed for practitioners who already know the underlying regulatory framework and don't need it explained.