AI Hallucination on Implementation Monitoring of the PFMI: Level 3 Assessment on General Business Risks for Legal teams at Investment Banking firms in international jurisdictions

Executive Summary

Legal teams at investment banking firms with international FMI client relationships increasingly consult AI tools when contextualising CPMI-IOSCO Level 3 assessment outputs, whether anchoring internal PFMI self-assessment benchmarking, advising business lines on supervisory risk posture, or framing regulatory timeline questions for senior management. Across the question set tested against the November 2025 Level 3 Assessment on General Business Risks, AI tools produced one aggregated finding of concern: a hallucination in which the AI mischaracterised the temporal scope of the assessment, truncating the recognised assessment period and thereby misrepresenting when the work formally concluded.

The failure mode is the kind that survives junior review: the AI's answer is partially correct, internally coherent, and sourced against plausible secondary references, but materially wrong on the precise dates that matter when Legal is positioning a firm's self-assessment timeline relative to the IMSG process. For a Legal function advising on PFMI compliance obligations in international jurisdictions, an off-by-one-year characterisation of an assessment's conclusion date is not a rounding error; it shifts assumptions about what supervisory findings are settled versus still in flight.

How AI gets this regulation wrong

The failure pattern on this regulation centres on AI tools presenting outdated or partial information as if it were the settled, complete picture. Specifically, the tools conflate an intermediate characterisation of the assessment timeline with the authoritative one published in the final report. The table below maps where this manifests and how many tools produced the error.

AI's Failure Mode	Count	Affected findings
Outdated	1	Finding#1

What that means for your team

For Legal at an investment bank, the dominant risk here is producing a wrong deliverable: an internal briefing, a self-assessment timeline memo, or a regulatory mapping note that rests on an incorrect characterisation of when the IMSG process concluded and what phase of supervisory engagement firms are now in. The table below sets out how that translates into concrete exposure for the firm.

Risk Impact	Count	Affected findings
Wrong deliverable	1	Finding#1

When this affects your department

Legal teams at international investment banks interact with PFMI Level 3 assessment outputs in several distinct workflows. The most common is contextualising the assessment's findings and timeline for internal clients: business lines running FMI client services, risk teams updating PFMI self-assessment benchmarking documentation, or senior management being briefed on where the international supervisory conversation on general business risk currently stands.

A second workflow is regulatory mapping: when the firm onboards or reviews relationships with central counterparties, trade repositories, or payment system operators, Legal often anchors the due-diligence framing against the most recent Level 3 findings, including verifying which assessment cycle is authoritative and what period it covers.

The point of failure is precise. The CPMI-IOSCO report characterises the assessment as carried out during 2023–25, a period that encompasses both the data-collection phase and the April 2025 FMI validation step before publication. AI tools tested on this regulation instead returned the shorter "2023–24" framing, drawn from earlier secondary coverage of the project that predated final publication.

If a Legal team member drafts a briefing note or self-assessment commentary relying on that truncated characterisation, the firm's internal record misrepresents when the assessment formally concluded, which FMI validation was incorporated, and therefore what supervisory cycle the November 2025 publication closes out.

The downstream stakes are not trivial. In international jurisdictions where regulators use CPMI-IOSCO Level 3 findings to benchmark domestic FMI oversight expectations, a firm that mischaracterises the assessment timeline in correspondence, submissions, or due-diligence files risks being read as not tracking the current supervisory state of play. If that error surfaces during a regulatory review or an internal audit of the firm's PFMI compliance programme, the Legal team cannot rely on "the AI said so" as a defence, and the remediation cost is not just correcting the document but retracing every downstream use of the incorrect timeline.

The findings at a glance

The table below summarises the finding identified when AI tools were tested on questions a Legal team at an international investment bank would ask about this regulation's assessment process and timeline.

#	Finding title	Type	Citation ID
1	Assessment timeline truncated, 2023–24 vs. 2023–25	Hallucination	RLB-F-INT-BIS-CPMI-IOSCO-PFMI-L3-GENERAL-BUSINESS-RISK-2025-Q005

Aggregate impact

The single finding on this regulation points to a specific and underappreciated vulnerability: AI tools trained or indexed against pre-publication secondary coverage of the IMSG project will absorb the "2023–24" shorthand used in contemporaneous reporting, and then reproduce it as a factual characterisation after the final report has corrected the record to "2023–25." The error is not random; it is structurally predictable wherever an AI tool weights well-indexed secondary sources more heavily than the authoritative primary document published later.

For Legal at an investment bank, the systemic risk is that this class of error is invisible to junior reviewers. The AI's answer is partially correct, confidently stated, and backed by a cited source that exists and is credible in the regulatory-news sense. The one-year truncation does not feel like a hallucination; it feels like a reasonable summary. That is precisely what makes it dangerous: it will pass informal review and propagate into briefing materials, self-assessment frameworks, and regulatory correspondence without triggering the usual verification reflex.

The scope of the error is also broader than the date alone. An AI that mischaracterises the assessment as concluding in 2024 will also mischaracterise the FMI validation phase (an April 2025 step the report describes as integral to the findings) as either absent or as a separate subsequent consultation. Any Legal analysis that relies on the AI's version of this chronology will misrepresent which FMIs' feedback is incorporated in the published findings, which matters when the firm is using the Level 3 report to benchmark its own FMI counterparties' general business risk frameworks.

What your team should do

The default position for Legal is straightforward: any use of AI to characterise the temporal scope of a CPMI-IOSCO Level 3 assessment (when it started, when data was collected, when it concluded, what validation steps were included) must be verified directly against the published report's executive summary before the answer goes into any firm document. The November 2025 report is publicly accessible on the BIS website. The authoritative characterisation of the assessment period is a single sentence in the executive summary. That cross-check takes thirty seconds and eliminates the entire category of error identified here.

The practical safeguard for team process is to treat AI-generated timelines and process chronologies for CPMI-IOSCO publications as draft inputs requiring primary-source sign-off, not finished answers. This is especially important when the AI's answer is sourced against secondary outlets rather than the BIS primary document: secondary coverage of IMSG work frequently predates the final validation phase and will carry the intermediate characterisation indefinitely. A cited source that exists is not the same as a cited source that reflects the final published record.
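For teams that want a mechanical backstop to the manual check described above, the idea can be sketched in a few lines of Python. This is an illustrative sketch only, not a substitute for reading the executive summary: the function name, the constant, and the pattern are hypothetical, and the expected period is hard-coded on the assumption that "2023–25" is the authoritative characterisation, which should itself be confirmed against the published report.

```python
import re

# Assumption: "2023–25" is the authoritative assessment period; confirm this
# against the executive summary of the published report before relying on it.
AUTHORITATIVE_PERIOD = (2023, 2025)

# Matches year ranges such as "2023–24", "2023-2024", or "2023/25".
YEAR_RANGE = re.compile(r"\b(20\d{2})\s*[–\-/]\s*(20\d{2}|\d{2})\b")


def find_period_mismatches(text, expected=AUTHORITATIVE_PERIOD):
    """Return year ranges that start in the expected year but end in another."""
    mismatches = []
    for match in YEAR_RANGE.finditer(text):
        start = int(match.group(1))
        end_raw = match.group(2)
        # Expand a two-digit end year ("24") into the start year's century.
        end = int(end_raw) if len(end_raw) == 4 else (start // 100) * 100 + int(end_raw)
        # Requiring end >= start skips ISO-style dates such as "2023-11-05".
        if start == expected[0] and end >= start and end != expected[1]:
            mismatches.append(match.group(0))
    return mismatches


# Hypothetical draft sentence carrying the truncated characterisation.
draft = "The Level 3 assessment was carried out during 2023–24 by the IMSG."
for hit in find_period_mismatches(draft):
    print(f"Check against primary source: {hit!r}")
```

A flag from a script like this only prompts the primary-source check; a draft that passes has not thereby been verified, since the truncated timeline can be expressed in words the pattern does not catch.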

Where AI tools are genuinely useful for this regulation is in orientating unfamiliar team members around the structure of the Level 3 methodology, the scope of the general business risk focus relative to prior PFMI assessment cycles, or the distribution of findings across FMI types. These are the kinds of contextual framing questions where approximate accuracy is sufficient and the stakes of a one-year date error are low. The liability accumulates when AI output on procedural and timeline specifics is treated as ready-to-use without a primary-source check.

Keep that check in the workflow, and AI assistance on this regulation is a net positive for Legal efficiency.

How RLB can help

RegLeg's published hallucination research is available as a free pre-flight check your team can run before relying on AI output for any regulatory question covered in the corpus. If a finding shows that AI tools systematically misstate the scope of a reporting obligation, conflate two regulatory regimes, or invent an exemption threshold, your team has that on record before the output reaches a deal memo or a client advice note. That is a cheaper intervention than discovering the error in review, or after.

For Legal functions in international investment banking specifically, we can map which AI-supported workflows carry the highest hallucination exposure for your book of business: cross-border transaction structuring, multi-jurisdictional disclosure analysis, derivatives documentation review, sanctions and restricted-party screening workflows, and regulatory change tracking across overlapping regimes. The output is a prioritised exposure map scoped to the jurisdictions and product lines your team actually touches, not a generic AI-risk inventory.

Where a firm already has an AI-use policy in place, we can review it against RegLeg's failure-mode catalogue and return a prioritised remediation list: which policy assumptions are contradicted by documented failure patterns, where the policy is silent on known high-risk task categories, and what workflow controls would close the material gaps. We can also produce training material and CPD-aligned content your team can deploy internally, grounded in real failure cases, framed for Legal professionals who do not need a primer on what AI is, only on where it fails in their domain.