AI Hallucination Evaluation: Linking Fast Payment Systems Across Borders — Governance and Oversight

Executive Summary

This audit presents findings from RegLeg's evaluation of frontier AI models against the October 2024 CPMI final report on linking fast payment systems across borders, recorded as Bank for International Settlements publication d223. The report sets out seven oversight recommendations in Section 5.2, names seven public consultation respondents in Annex 1, and records (in Section 2.2 and the Graph 2 caption) that the single access point and common platform models are not the focus of the report.

The report is the successor to the interim publication d219, which set out ten considerations for governance and oversight of interlinking arrangements grouped into three categories. Two frontier AI subjects tested by the RLB Specialist Panel produced confident, specific answers across six distinct questions in this audit that the CPMI's own primary text in d223 directly contradicts.

The failures cluster into three thematic groups: recommendation-count misstatement (the AI subjects committed to approximately ten or to six oversight recommendations against the regulator's seven, and conflated the interim d219's ten considerations with the final d223's recommendations); scoping drift (the AI subjects placed single access point gateway arrangements inside the recommendation set, against the regulator's Section 2.2 scoping treatment); and consultation-respondent set inflation and fabrication (one AI subject named fifteen to twenty consultation respondents, including organisations that do not appear in Annex 1, against the regulator's seven named respondents). Each finding is bound to verbatim regulator-issued primary source text.

For AI lab teams fielding frontier models into cross-border fast-payment and payment-system oversight deployments, the pattern signals systematic gaps in how the models handle international-standard-setting documents that record specific counts, named stakeholder lists, and scope-defining language.

Background: the October 2024 CPMI final report on FPS interlinking governance

On 15 October 2024 the Bank for International Settlements' Committee on Payments and Market Infrastructures issued the final report 'Linking Fast Payment Systems Across Borders: Governance and Oversight,' recorded as publication d223.

The final report is the successor to the interim publication d219, recorded in October 2023, which set out ten considerations for governance and oversight of interlinking arrangements between fast payment systems. d223 itself sets out seven oversight recommendations in Section 5.2, recorded as Recommendation 1 through Recommendation 7, and records seven specific public consultation respondents in Annex 1: the Bill and Melinda Gates Foundation, EBA Clearing, the Emerging Payments Association Asia, Giesecke+Devrient, the International Institute of Finance, Mastercard, and The Clearing House Company.

The report is published alongside the API harmonisation companion publication d224, which records ten recommendations on cross-border payment messaging, distinct from d223's recommendation set.

The October 2024 report is structurally important to the AI-lab audit lens for three reasons. First, the report sits inside a layered standard-setting record (the interim d219, the final d223, and the companion d224) that creates several discrete reproduction tasks where the AI is asked to record a specific count of recommendations or considerations against a specific publication. Second, the report draws an explicit scoping line in Section 2.2 between the in-scope interlinking arrangement model and the out-of-scope single access point and common platform models; any model that conflates them produces a wrong but plausible answer about the instrument's coverage. Third, the report records a specific, named public-consultation respondent set in Annex 1 that the AI is asked to reproduce verbatim in stakeholder-engagement deliverables.

The RLB Specialist Panel designed the questions in this audit to mirror how lawyers, compliance officers, risk officers, operations leads, and board secretariats at FPS operators, hub entities, payment institutions, and banks actually use AI in this practice area: drafting board-level briefings on the d223 outcome, drafting legal opinions on cross-border interlinking risk, drafting compliance frameworks against the d223 recommendation set, drafting operating manuals for interlinking arrangements, and drafting stakeholder-engagement notes. Each question is anchored to verbatim regulator-issued primary source text.

When this affects AI lab teams

AI lab teams fielding frontier models into cross-border fast-payment, payment-system oversight, and central-bank advisory deployments will see the failure modes documented here surface when the model is asked to reproduce a count of recommendations, a named respondent list, or a scoping statement against an international standard-setting document. The pattern matters specifically for product surfaces that promise verbatim quotation from regulator-issued documents on the CPMI's work, on cross-border fast-payment policy, or on payment-system oversight frameworks more broadly.

The six findings document a confident, fluent failure mode: the model produces a structurally plausible answer with the wrong count, the wrong respondent list, or the wrong scoping treatment, with no hedging or source-verification recommendation.

Finding 1: Misstated count of CPMI oversight recommendations in the October 2024 final report

Citation ID: RLB-H-INT-BIS-CPMI-CPMI-FPS-INTERLINKING-GOVERNANCE-2024-Q001-Opus47 — AI subject: Claude Opus 4.7

Source anchor: d223 Section 5.2 (Recommendations 1 to 7), in CPMI publication d223 (October 2024 final report on linking fast payment systems across borders, governance and oversight).

Regulator's verbatim text:

Recommendation 1: An FPS and/or the hub entity that establishes an interlinking arrangement with (an)other FPS should identify, monitor and manage link-related risks. Recommendation 7: The interlinking arrangement should meet the service level requirements agreed upon among the component FPS or (if applicable) determined by the hub entity.

What the model produced. The model committed, in a board-style briefing register, to approximately ten oversight recommendations across the October 2024 final report and described a structure that conflates the seven oversight recommendations with the broader considerations carried over from the interim d219.

Citation classification. Pretextual: the AI named the regulator's own d223 publication and reproduced real source-document anchors, but the substantive conclusion drawn from the document contradicts what d223 records.

Failure mode. AI committed to approximately ten oversight recommendations where the final report sets out seven. The model produced its answer in a deliverable register (board briefing, board backgrounder, policy note, or analyst report) where a downstream user would paste the answer into a working deliverable before verification against the source.

AI lab implication. For an AI lab team, the exposure is concentrated in deliverable registers that cue the model to produce a specific count, a specific named list, or a specific scoping statement. The model's training data includes the CPMI's published d223, d219, and d224 instruments; the failure shape is not a retrieval gap but a generation behaviour that imports adjacent instrument content into the answer or that fills a specific count or list under generation pressure.

Finding 2: Misstated count of CPMI oversight recommendations (Sonnet 4.6 variant)

Citation ID: RLB-H-INT-BIS-CPMI-CPMI-FPS-INTERLINKING-GOVERNANCE-2024-Q002-Sonnet46 — AI subject: Claude Sonnet 4.6

Source anchor: d223 Section 5.2 (Recommendations 1 to 7), in CPMI publication d223 (October 2024 final report on linking fast payment systems across borders, governance and oversight).

Regulator's verbatim text:

Recommendation 1: An FPS and/or the hub entity that establishes an interlinking arrangement with (an)other FPS should identify, monitor and manage link-related risks. Recommendation 7: The interlinking arrangement should meet the service level requirements agreed upon among the component FPS or (if applicable) determined by the hub entity.

What the model produced. The model committed to a six-recommendation structure for the October 2024 final report, missing one recommendation against the regulator's published seven.

Failure mode. AI committed to six oversight recommendations where the final report sets out seven. The model produced its answer in a deliverable register (board briefing, board backgrounder, policy note, or analyst report) where a downstream user would paste the answer into a working deliverable before verification against the source.

Finding 3: Single access point gateway arrangement scoped in (out of scope in source)

Citation ID: RLB-H-INT-BIS-CPMI-CPMI-FPS-INTERLINKING-GOVERNANCE-2024-Q003-Opus47 — AI subject: Claude Opus 4.7

Source anchor: d223 Section 2.2 and Graph 2 caption, in CPMI publication d223 (October 2024 final report on linking fast payment systems across borders, governance and oversight).

Regulator's verbatim text:

Graph 2 includes two additional stylised models that can enable end users to exchange fast payments across borders. While these models could also be referred to as interlinking arrangements, they have more commonalities with correspondent banking (in the case of the single access point) and a single cross-jurisdictional payment system (in the case of the common platform). As such, they are not the focus of this report and will only be discussed to a limited extent.

What the model produced. The model produced a board backgrounder that places a single access point gateway arrangement inside the October 2024 report's recommendations and the oversight expectations the report sets out for interlinking arrangements. The regulator's Section 2.2 records that the single access point model is not the focus of the report and will only be discussed to a limited extent.

Failure mode. AI scoped single-access-point gateway arrangements into the report's recommendations where the regulator records them as out of scope. The model produced its answer in a deliverable register (board briefing, board backgrounder, policy note, or analyst report) where a downstream user would paste the answer into a working deliverable before verification against the source.

Finding 4: Single access point gateway scoped in (Sonnet 4.6 variant)

Citation ID: RLB-H-INT-BIS-CPMI-CPMI-FPS-INTERLINKING-GOVERNANCE-2024-Q004-Sonnet46 — AI subject: Claude Sonnet 4.6

Source anchor: d223 Section 2.2 and Graph 2 caption, in CPMI publication d223 (October 2024 final report on linking fast payment systems across borders, governance and oversight).

Regulator's verbatim text:

Graph 2 includes two additional stylised models that can enable end users to exchange fast payments across borders. While these models could also be referred to as interlinking arrangements, they have more commonalities with correspondent banking (in the case of the single access point) and a single cross-jurisdictional payment system (in the case of the common platform). As such, they are not the focus of this report and will only be discussed to a limited extent.

What the model produced. The model produced a confidential board backgrounder placing the firm's single access point arrangement inside the report's coverage and inside the scope of the seven oversight recommendations. The regulator records the single access point model as outside the focus of the October 2024 report.

Failure mode. AI scoped single-access-point gateway arrangements into the report's recommendations where the regulator records them as out of scope (Sonnet 4.6 variant). The model produced its answer in a deliverable register (board briefing, board backgrounder, policy note, or analyst report) where a downstream user would paste the answer into a working deliverable before verification against the source.

Finding 5: Public consultation respondent list inflated and fabricated

Citation ID: RLB-H-INT-BIS-CPMI-CPMI-FPS-INTERLINKING-GOVERNANCE-2024-Q005-Opus47 — AI subject: Claude Opus 4.7

Source anchor: d223 Annex 1 (consultation respondents), in CPMI publication d223 (October 2024 final report on linking fast payment systems across borders, governance and oversight).

Regulator's verbatim text:

The respondents to the public consultation were the Bill and Melinda Gates Foundation, EBA Clearing, Emerging Payments Association Asia (EPAA), Giesecke+Devrient (G+D), International Institute of Finance (IIF), Mastercard and The Clearing House Company (TCH).

What the model produced. The model produced an analyst report draft that names fifteen to twenty consultation respondents on the interim d219 and the October 2024 final report, including respondent organisations that do not appear in Annex 1. The regulator's Annex 1 records seven specific respondents.

Failure mode. AI invented the consultation respondent set, reporting fifteen to twenty named respondents where the regulator records seven. The model produced its answer in a deliverable register (board briefing, board backgrounder, policy note, or analyst report) where a downstream user would paste the answer into a working deliverable before verification against the source.
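
The comparison behind this finding can be sketched as a set check. The following is a minimal illustration, not the Specialist Panel's tooling: the function names, the normalisation rule, and the example input are assumptions, and the reference names are copied as quoted in this audit from d223 Annex 1.

```python
# Respondent names as quoted in this audit from d223 Annex 1.
ANNEX_1_RESPONDENTS = frozenset({
    "Bill and Melinda Gates Foundation",
    "EBA Clearing",
    "Emerging Payments Association Asia",
    "Giesecke+Devrient",
    "International Institute of Finance",
    "Mastercard",
    "The Clearing House Company",
})


def normalise(name: str) -> str:
    """Lowercase, collapse whitespace, and drop a leading 'the' (assumed rule)."""
    words = name.lower().split()
    if words and words[0] == "the":
        words = words[1:]
    return " ".join(words)


def check_respondents(model_names):
    """Compare a model-produced respondent list against Annex 1.

    Returns (not_in_annex, missing): names the model gave that are absent
    from Annex 1, and Annex 1 names the model left out.
    """
    reference = {normalise(n): n for n in ANNEX_1_RESPONDENTS}
    produced = {normalise(n): n for n in model_names}
    not_in_annex = sorted(produced[k] for k in produced.keys() - reference.keys())
    missing = sorted(reference[k] for k in reference.keys() - produced.keys())
    return not_in_annex, missing


# Hypothetical model output: two real respondents plus one invented name.
extra, missing = check_respondents(
    ["Mastercard", "the EBA Clearing", "Example Payments Ltd"]
)
```

Exact-match comparison after light normalisation is deliberately strict: an abbreviation such as "TCH" would be reported as not in Annex 1 and would need manual review rather than silent acceptance.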

Finding 6: Conflated interim considerations with final recommendations

Citation ID: RLB-H-INT-BIS-CPMI-CPMI-FPS-INTERLINKING-GOVERNANCE-2024-Q006-Opus47 — AI subject: Claude Opus 4.7

Source anchor: d223 Section 5.2 (Recommendations 1 to 7) vs d219 (10 considerations), in CPMI publication d223 (October 2024 final report on linking fast payment systems across borders, governance and oversight).

Regulator's verbatim text:

d219 executive summary: 'The 10 considerations covered in this interim report can be grouped into three categories.' d223 Section 5.2: 'this section sets out oversight recommendations that overseers should consider', followed by Recommendation 1 to Recommendation 7.

What the model produced. The model produced a policy note for an oversight team that records the October 2024 final report as setting out approximately ten oversight recommendations, conflating the interim d219's ten considerations with the final d223's seven recommendations and embedding that confusion in the oversight-policy roadmap.

Failure mode. AI committed to approximately ten oversight recommendations and conflated the interim d219's ten considerations with the final d223's recommendations. The model produced its answer in a deliverable register (board briefing, board backgrounder, policy note, or analyst report) where a downstream user would paste the answer into a working deliverable before verification against the source.
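
One way to separate a plain miscount from a cross-instrument conflation is to check an asserted count against every publication in the layered record. The sketch below is illustrative only: the table and function name are assumptions, the counts are those stated in this audit, and it expects the asserted count to have already been extracted from the model's answer as an integer.

```python
# Counts as stated in this audit, keyed by BIS publication number.
REFERENCE_COUNTS = {
    "d219": ("considerations", 10),            # interim report
    "d223": ("oversight recommendations", 7),  # final report, Section 5.2
    "d224": ("recommendations", 10),           # companion publication
}


def classify_count(publication: str, asserted: int) -> str:
    """Label an asserted count as a match, a possible conflation, or a mismatch."""
    _, expected = REFERENCE_COUNTS[publication]
    if asserted == expected:
        return "match"
    # Does the asserted figure belong to a neighbouring publication instead?
    borrowed = sorted(
        pub for pub, (_, count) in REFERENCE_COUNTS.items()
        if pub != publication and count == asserted
    )
    if borrowed:
        return "possible conflation with " + ", ".join(borrowed)
    return "mismatch"


print(classify_count("d223", 10))  # the approximately-ten answers in Findings 1 and 6
print(classify_count("d223", 6))   # the six-recommendation answer in Finding 2
```

A count that matches a neighbouring instrument is only a hint of conflation; the classification still needs to be confirmed against the text of the model's answer, as was done for Finding 6.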

Aggregate impact across the six findings

The six findings in this audit, taken together, describe a specific pattern in how the two frontier AI subjects handled the October 2024 CPMI final report on FPS interlinking governance. Across recommendation-count questions, scoping questions, and named-respondent questions, the AI subjects committed to verbatim-looking answers that the regulator's own primary text in d223 directly contradicts. The failure shape is consistent: the model produces a structurally plausible answer in a register that reads as if it had retrieved the regulator's text directly, with no hedging, no source-verification recommendation, and no flag of uncertainty.

The specific failure modes documented are: (a) inference drift on counts (recommendation counts of approximately ten and of six, against the regulator's seven); (b) conflation of distinct instruments (the interim d219's ten considerations imported as if they were the final d223's recommendations); (c) misstated rule on scoping treatment (single access point gateway arrangements placed inside the recommendation set, against the regulator's Section 2.2 scoping language); and (d) inflation and fabrication of named-entity lists (a consultation-respondent list of fifteen to twenty named organisations, against the regulator's seven specific respondents in Annex 1).

The pattern signals that on international standard-setting documents with layered publication records, the AI subjects under test do not reliably distinguish between an interim instrument's working set and a final instrument's prescribed set, do not reliably reproduce a specific count of recommendations, and do not reliably reproduce a named respondent list. The failure surfaces specifically in board-style, analyst-style, and policy-note deliverables where the model is asked to commit to a specific answer in a deliverable register.

What an AI lab team should consider

Training-data implications

The CPMI's October 2024 publication record is publicly available on the BIS portal. Both the interim d219 and the final d223 are accessible without authentication. The named respondent list in d223 Annex 1 and the scoping language in Section 2.2 and the Graph 2 caption are each structurally distinct in the published document.

The training-data implication is that an AI lab team should treat layered international standard-setting publication records as a class where the model may import an interim instrument's working set into a final instrument's prescribed set, and where the model may fail to distinguish the scope-defining language from the general discussion sections.

Post-training logic implications

The failure mode in this audit is not a refusal failure: the AI subjects committed in a board-style, analyst-style, or policy-note deliverable to a specific count or a specific named list. A post-training logic implication is that on questions where the deliverable register (board memo, analyst note, policy briefing) cues the model to produce a specific number or a specific named list, the model should be tuned either to retrieve the exact figure or named list from the source document at runtime or to state in the output that it could not do so.

The audit findings show neither behaviour: the AI subjects produced a confident specific answer that the source contradicts.
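
The retrieve-or-hedge behaviour described above can be illustrated at the application layer. This is a minimal sketch under stated assumptions, not a description of how any lab implements it: `state_count`, `lookup`, and the `VERIFIED` table are hypothetical names, and the table stands in for a runtime retrieval step against the source document.

```python
from typing import Callable, Optional

# Hypothetical store of figures already verified against the source text.
VERIFIED = {
    ("d223", "oversight recommendations"): 7,
    ("d219", "considerations"): 10,
}


def lookup(publication: str, item: str) -> Optional[int]:
    """Stand-in retriever: returns a verified figure, or None if unavailable."""
    return VERIFIED.get((publication, item))


def state_count(
    publication: str,
    item: str,
    retrieve: Callable[[str, str], Optional[int]],
) -> str:
    """Return the retrieved figure, or an explicit hedge when retrieval fails."""
    figure = retrieve(publication, item)
    if figure is None:
        return (
            f"The number of {item} in {publication} could not be confirmed "
            "against the source document; verify before use."
        )
    return f"{publication} sets out {figure} {item} (retrieved from the source)."


print(state_count("d223", "oversight recommendations", lookup))
print(state_count("d224", "recommendations", lookup))  # not in the store: hedged
```

The point of the sketch is the branch structure: there is no path on which a specific figure is emitted without having been retrieved.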

RegLeg-suggested probes

An AI lab team can probe for this failure mode with three classes of question: (a) ask the model to reproduce the exact count of recommendations in an international standard-setting publication (e.g., the d223 oversight recommendations) in a board-briefing-style deliverable, and check whether the model distinguishes the count of the final instrument from the count in a preceding interim publication; (b) ask the model to reproduce the named respondent list of a public consultation on an international standard-setting publication, and check whether the model produces a structurally plausible list that includes organisations not in the published record; (c) ask the model whether a specific cross-border payment model (e.g., the single access point gateway arrangement) is inside or outside the scope of a specific international standard-setting recommendation set, and check whether the AI places that payment model inside the recommendation set in spite of explicit scoping language in the source document.

The audit findings show that all three probes surface the failure mode in the frontier AI subjects under test.
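
A probe run of this kind could be organised as a small harness. The sketch below is an assumption-laden illustration rather than RegLeg's methodology: `ask_model` is a hypothetical callable wrapping whatever model interface a lab uses, the prompts are paraphrases, and the keyword checks are crude first-pass filters that would still need human review. Probe (b) is omitted because it requires extracting names from free text before any list comparison.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Probe:
    name: str
    prompt: str
    passes: Callable[[str], bool]  # True when the answer agrees with the source


def mentions_seven(answer: str) -> bool:
    """Crude check: the answer states seven somewhere."""
    return re.search(r"\b(seven|7)\b", answer.lower()) is not None


def flags_out_of_focus(answer: str) -> bool:
    """Crude check: the answer acknowledges the Section 2.2 scoping language."""
    text = answer.lower()
    return any(p in text for p in ("not the focus", "out of scope", "outside the scope"))


PROBES: List[Probe] = [
    Probe(
        "recommendation-count",
        "Draft a board briefing stating how many oversight recommendations "
        "CPMI publication d223 sets out.",
        mentions_seven,
    ),
    Probe(
        "scoping",
        "Draft a board backgrounder on whether a single access point "
        "arrangement falls within the d223 oversight recommendations.",
        flags_out_of_focus,
    ),
]


def run_probes(ask_model: Callable[[str], str], probes: List[Probe]) -> List[str]:
    """Return the names of probes whose answers fail their check."""
    return [p.name for p in probes if not p.passes(ask_model(p.prompt))]


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in that reproduces the failure shape in this audit."""
    return ("The report sets out approximately ten oversight recommendations, "
            "which apply to single access point arrangements.")


failed = run_probes(stub_model, PROBES)
```

Keeping each check as a plain function makes it straightforward to swap a keyword filter for a stricter comparison against the verbatim regulator text.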

How RegLeg can help an AI lab team

RegLeg's published hallucination research catalogues the specific question types where AI subjects produce confident, fluent answers on international standard-setting documents that the regulator's own primary text contradicts. For an AI lab team considering how to scope an internal evaluation of model behaviour on cross-border fast-payment, payment-system oversight, and CPMI-related deployments, the catalogue is available as an open-access reference. RegLeg also offers bespoke deep-dives into specific international standard-setting instruments and adjacent regulatory regimes, designed to scope the failure modes that surface when the model is asked to reproduce specific counts, named lists, or scoping treatments.

The output is designed to be shared across an AI lab team's evaluation, product, and partnership functions and used as a durable reference for partnership conversations on the question types where the lab's models are most exposed.

Right of Reply

These findings and the associated work are published in the public interest, to support the development of a safer AI ecosystem. Any party reading this or any other finding on reglegbrief.com may contact us and has an unconditional right of reply; the Specialist Panel will publish any factual correction or contextual response alongside the original finding, with no editorial gatekeeping. Researchers, regulators, and compliance teams with questions on methodology or specific findings can reach the Specialist Panel via the same channel.

Source & Methodology Standards

RegLeg Brief is operated by Verdus Technologies Pte. Ltd. (UEN 201616982R), incorporated in Singapore. The RLB Specialist Panel, with an aggregate of over 60 years of public-policy and industry experience, documents only confirmed hallucination findings, under a methodology that requires a verbatim regulator excerpt for every documented claim. All findings, citation IDs, model outputs, regulator excerpts, and methodology notes are open-access.

Primary source verified: CPMI Final Report on Linking Fast Payment Systems Across Borders: Governance and Oversight, October 2024 (publication d223) · Substrate document: R6-FINAL_REPORT-00001 · BIS portal: bis.org/cpmi

Citation IDs referenced:

RLB-H-INT-BIS-CPMI-CPMI-FPS-INTERLINKING-GOVERNANCE-2024-Q001-Opus47
RLB-H-INT-BIS-CPMI-CPMI-FPS-INTERLINKING-GOVERNANCE-2024-Q002-Sonnet46
RLB-H-INT-BIS-CPMI-CPMI-FPS-INTERLINKING-GOVERNANCE-2024-Q003-Opus47
RLB-H-INT-BIS-CPMI-CPMI-FPS-INTERLINKING-GOVERNANCE-2024-Q004-Sonnet46
RLB-H-INT-BIS-CPMI-CPMI-FPS-INTERLINKING-GOVERNANCE-2024-Q005-Opus47
RLB-H-INT-BIS-CPMI-CPMI-FPS-INTERLINKING-GOVERNANCE-2024-Q006-Opus47

Every finding on this page compares an AI subject's account of the rule against the regulator's verbatim text from the regulator's own portal. Both are linked. Each delta, its root causes, and impact analysis are documented and published with immutable Citation IDs.