AI Labs · Last updated 7 Jun 2026 · methodology vv2.3 · Hallucination Register

Regulation 1.25 Amendment Failures: Schema Substitution and Procedural Confabulation on CFTC Customer Fund Investment Rules

Both Claude Opus 4.7 and Claude Sonnet 4.6, each with web search active, produced confidently wrong reconstructions of the CFTC's 2024 amendments to Regulation 1.25 on the rule's three operative pillars: the size-triggered 50 per cent concentration ceiling, the 24-month portfolio dollar-weighted average maturity standard with its narrow carve-out set, and the separate March 31, 2025 SIDR compliance anchor. Across five confirmed failures on this regulation, every failure is classified as inference_drift against substrate covering 17 CFR 1.25(b)(3)(ii), 17 CFR 1.25(b)(3)(iv), and 7 USC 6d.

The structural significance is that these are not retrieval misses on obscure content; they are over-confident confabulations on the most decision-critical parameters of a compliance rule.

When this affects AI Labs

Futures commission merchants and derivatives clearing organizations subject to CFTC Regulation 1.25 manage billions in customer segregated funds. Treasury, compliance, legal, and fintech teams at FCMs are exactly the user population that treats frontier models as a fast path to regulatory clarity, asking directly about permissible investment limits, maturity constraints, and deadline calendars before updating investment policies. A model that reconstructs plausible-but-wrong parameters with high apparent confidence is not a neutral retrieval miss; it is an active misdirection event at the precise moment a practitioner is making a compliance decision.

The downstream harm vector is concrete. At scale across the FCM and DCO community, systematic confabulation on a rule with this profile creates both reputational exposure for the lab and a pattern of misuse-adjacent harm that red-team coverage should have flagged before deployment.

Aggregate impact

Model	Configuration	Failure count	Dominant error pattern
Claude Opus 4.7	Web search enabled	3	Trained-schema substitution for rule-specific numeric structure
Claude Sonnet 4.6	Web search enabled	2	Trained-schema substitution on concentration tier and no-standard answer on DWAM

Failures cluster on three surfaces: multi-condition trigger structures, narrow exclusion lists, and date-certain compliance anchors. The joint failure pattern signals that for regulations fitting this profile, web search is not providing sufficient signal to override trained-schema responses.

Findings

5 findings in this case study. Click any to see its full evidence card.

Finding 1: Claude Opus 4.7 with web search
Finding 2: Claude Sonnet 4.6 with web search
Finding 3: Claude Opus 4.7 with web search
Finding 4: Claude Sonnet 4.6 with web search
Finding 5: Claude Opus 4.7 with web search

What your team should do

Implications for your training data

The dominant failure pattern across both models is schema substitution on recently amended numeric provisions. Training corpus construction for amended regulatory frameworks should distinguish between pre-final-rule commentary and post-final-rule primary text, and should weight the primary rule text commensurately with its authority rather than its web frequency.

Implications for your post-training logic

Post-training calibration should: (1) flag no-standard or no-requirement answers on numeric compliance provisions for explicit verification; (2) suppress relative-range placeholders when date-certain anchors are retrievable; (3) require explicit decomposition of multi-condition trigger structures rather than collapsing them into unconditional answers.

Specific eval / red-team probes RegLeg suggests

Multi-axis trigger preservation probes.
Carve-out exact-set probes.
Default-application probes.
Date-certain anchor probes.
Conditional-ceiling decomposition probes.

How RLB can help

RegLeg Brief documents nuanced model failures on regulatory content across a portfolio of regulators and rule types. The failure modes catalogued on this regulation include threshold-trigger elision, carve-out inversion, no-standard answers on governed asset classes, and date-certain drift. We generate targeted correction pairs per failure mode, offer embedded eval partnership against a defined regulator portfolio, and can run pre-release evaluation cycles for capability launches that touch derivatives, futures clearing, or customer-funds regulatory content. To scope a partnership, start a technical conversation with us at reglegbrief.com.

← Back to summary Other AI Labs white papers →

Every finding on this page compares an AI subject's account of the rule against the regulator's verbatim text from the regulator's own portal. Both are linked. Each delta, its root causes, and impact analysis are documented and published with immutable Citation IDs.