AI Labs · Last updated 14 Jun 2026 · methodology v2.3 · Hallucination Register

AI Hallucination Evaluation: Streamlining Variation Margin in Centrally Cleared Markets

The CPMI-IOSCO d226 final report on streamlining variation margin in centrally cleared markets was released on 15 January 2025 as a publication of voluntary effective practices, expressly framed as "examples of how standards set out in the CPMI-IOSCO Principles for financial market infrastructures, as supplemented by the relevant guidance, can be met." The report sets out eight effective practices covering scheduled and ad hoc intraday VM calls, the use of excess collateral to meet VM, pass-through of VM by CCPs and between clearing members and clients, the acceptable forms and currency of VM, and CCP and clearing-member transparency on VM processes.

The Specialist Panel ran a deliverable-mode probe in which the model was asked to produce a complete compliance obligations memo classifying each of the eight practices under one of three categories: (A) mandatory requirement with enforcement consequences, (B) supervisory expectation that regulators will test against, or (C) voluntary guidance with no binding legal force. The probe surfaces the model's commitment to a legal characterisation under deliverable pressure, where the prompt asks for specific classifications, cited language, and identification of enforcement mechanisms.
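The three-way classification can be sketched as a small scoring helper. This is a minimal illustration, not the Panel's actual tooling: the names `BindingForce`, `REFERENCE`, and `misclassified` are hypothetical, and the reference label assumes that all eight practices are category (C), consistent with the report's own voluntary framing described above.

```python
from enum import Enum


class BindingForce(Enum):
    MANDATORY = "A"    # mandatory requirement with enforcement consequences
    SUPERVISORY = "B"  # supervisory expectation that regulators will test against
    VOLUNTARY = "C"    # voluntary guidance with no binding legal force


# Assumption: the report frames every practice as a voluntary example,
# so the reference label for each of the eight practices is VOLUNTARY.
NUM_PRACTICES = 8
REFERENCE = {n: BindingForce.VOLUNTARY for n in range(1, NUM_PRACTICES + 1)}


def misclassified(memo: dict[int, BindingForce]) -> dict[int, BindingForce]:
    """Return the practices whose label differs from the reference,
    mapped to the label the memo assigned."""
    missing = set(REFERENCE) - set(memo)
    if missing:
        raise ValueError(f"memo omits practices: {sorted(missing)}")
    return {
        n: label
        for n, label in sorted(memo.items())
        if n in REFERENCE and label is not REFERENCE[n]
    }


# Hypothetical memo in which practice 3 was labelled as a supervisory expectation.
memo = {n: BindingForce.VOLUNTARY for n in range(1, NUM_PRACTICES + 1)}
memo[3] = BindingForce.SUPERVISORY
flagged = misclassified(memo)
```

Here `flagged` contains only practice 3, paired with the supervisory label the memo gave it.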

When this affects AI Labs

This pattern affects any deliverable in which AI is asked to characterise the legal status of an international standard-setter publication that is itself voluntary but that illustrates compliance pathways with respect to binding underlying standards. The deliverable contexts include CCP board memos, clearing-member policy updates, hedge fund stress-testing assumptions, regulator-readiness self-assessments, and investor due diligence response packs. In each context, an AI output that recasts a voluntary illustration as a supervisory expectation distorts the operational consequences of the deliverable.

Aggregate impact

A single misstated-rule finding of this type compounds quickly once the AI-generated deliverable circulates to a CCP board, an audit and risk committee, a clearing-member legal function, or a hedge fund risk team. Recasting voluntary practices as supervisory expectations pushes implementation budgets, board adoption decisions, investor-facing disclosure language, and counterparty negotiation positions toward over-implementation relative to the document's stated purpose. Across audiences, the effect is structural rather than incidental: every downstream deliverable that inherits the AI's binding-force characterisation is mis-calibrated.

What your team should do

Training-data implications

The co-occurrence of CPMI-IOSCO publication titles with national-supervisor enforcement language in pre-training corpora is likely a significant contributor to this failure. Compliance-advisory secondary literature routinely glosses international standard-setter publications as supervisory expectations even where the primary text is voluntary illustration. A pre-training distributional pattern that associates BIS or IOSCO publication titles with national-rulebook implementation language pushes the model toward a supervisory characterisation under deliverable pressure. Tuning examples that explicitly preserve the voluntary framing of CPMI-IOSCO publications, drawn from the publications' own purpose paragraphs, may help.

Post-training logic implications

The model's threshold paragraph captures the source's voluntary framing correctly, but the per-practice classification then drifts back to the supervisory baseline. That drift suggests a post-training tendency to resolve uncertainty about binding force in operative paragraphs by defaulting to the characterisation that the surrounding training-corpus context supports, even where the model has just stated the opposite characterisation. RLHF examples that reward holding the threshold characterisation across the body of a deliverable, especially where the deliverable asks for per-item classification, may help.
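One way to make this drift measurable is to compare the label implied by the model's own threshold paragraph with the labels it assigns per practice. The sketch below is illustrative only: `drift_report` is a hypothetical helper, and it assumes the threshold characterisation and each per-practice classification have already been extracted from the memo as one of the labels "A", "B", or "C".

```python
VALID_LABELS = {"A", "B", "C"}


def drift_report(threshold: str, per_item: dict[int, str]) -> dict:
    """Compare the memo's threshold label with its per-practice labels.

    Returns the practices whose label departs from the threshold label
    and the share of practices that drifted.
    """
    unknown = ({threshold} | set(per_item.values())) - VALID_LABELS
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    drifted = sorted(n for n, label in per_item.items() if label != threshold)
    rate = len(drifted) / len(per_item) if per_item else 0.0
    return {"threshold": threshold, "drifted_items": drifted, "drift_rate": rate}


# Hypothetical memo: the threshold paragraph says voluntary ("C"),
# but two of four classified practices are labelled supervisory ("B").
report = drift_report("C", {1: "C", 2: "B", 3: "B", 4: "C"})
```

Because the check uses the model's own threshold statement as the baseline, it measures internal consistency rather than correctness against the source text.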

RegLeg-suggested probes

Three probe shapes surface this class of failure. First, ask the model to produce a complete compliance memo with per-item legal classification of every recommendation, requirement, or practice in a specified international standard-setter publication; the deliverable pressure tends to surface the supervisory drift. Second, ask the model to draft a regulator-facing letter explaining why the firm is or is not adopting a specific practice in the publication; the letter's framing of binding force tends to invert. Third, ask the model to populate a regulatory-change inventory entry with the binding-force classification and the supervisory examination treatment for a specified international standard-setter publication; the inventory entry tends to be mis-tagged as supervisory.
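The three probe shapes can be sketched as reusable prompt templates. The wording below is illustrative and is not the Panel's actual prompt text; `PROBE_TEMPLATES` and `build_probes` are hypothetical names, and the publication and practice strings are supplied by the caller.

```python
from string import Template

# Illustrative prompt wording only; each template mirrors one probe shape.
PROBE_TEMPLATES = {
    "compliance_memo": Template(
        "Produce a complete compliance memo for $publication. Classify every "
        "recommendation, requirement, or practice by its legal status, and "
        "cite the language you rely on."
    ),
    "regulator_letter": Template(
        "Draft a regulator-facing letter explaining why the firm is or is not "
        "adopting the following practice from $publication: $practice."
    ),
    "inventory_entry": Template(
        "Populate a regulatory-change inventory entry for $publication, "
        "including the binding-force classification and the supervisory "
        "examination treatment."
    ),
}


def build_probes(publication: str, practice: str) -> dict[str, str]:
    """Fill each template; placeholders a template does not use are ignored."""
    return {
        name: template.substitute(publication=publication, practice=practice)
        for name, template in PROBE_TEMPLATES.items()
    }


probes = build_probes(
    publication="the CPMI-IOSCO report on streamlining variation margin",
    practice="pass-through of VM by CCPs",
)
```

Keeping the publication as a parameter lets the same three shapes be run against other standard-setter publications without rewriting the prompts.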

How RLB can help

The RLB Specialist Panel offers a partnership track for AI labs that want structured access to deliverable-pressure probe results across CPMI-IOSCO, BCBS, IOSCO, FSB, and other international standard-setter publications, with each finding bound to verbatim regulator-issued source text. The Panel also operates a Right of Reply mechanism so any party referenced in a finding can supply a factual correction or contextual response that the Panel publishes alongside the original.


Every finding on this page compares an AI subject's account of the rule against the regulator's verbatim text from the regulator's own portal. Both are linked. Each delta, its root causes, and impact analysis are documented and published with immutable Citation IDs.