AI Hallucination ResearchBriefings › Briefing
AI Labs US CFTC
AI Labs · Regulations to Address Margin Adequacy and to Account for the Treatment of Separate Accounts by Futures Commission Merchants (17 CFR § 1.44)

By Kratti A Agrawal, Lead, RegLeg Brief Specialist Panel

Specialist Panel: Frontier AI models misread CFTC Regulation 1.44 (Margin Adequacy + Separate Accounts)

RLB Specialist Panel maps the dark spots in AI cognition inside FCM Reg 1.44 margin enumeration.

— RLB Specialist Panel

Frontier AI models rewrote a CFTC margin rule, regulatory-research panel finds

Two frontier AI models, with web search enabled, "compressed" a three-tier currency deadline structure into two and invented an intraday cutoff that does not exist in 17 CFR § 1.44(f), confidently, and without caveat. The RegLeg Brief Specialist Panel calls it "Enumeration Collapse" and says the pattern points to a calibration problem in how primary regulatory text is weighted against third-party summaries.

SINGAPORE, June 12, 2026. Two frontier artificial-intelligence models generated operational guidance on a recently finalised Commodity Futures Trading Commission rule that contradicts the rule's text in ways that would push U.S. Futures Commission Merchants out of compliance if acted on, according to a white paper released today by RegLeg Brief, a regulatory-research outfit operated by Singapore-incorporated Verdus Technologies Pte. Ltd.

The findings, published with the immutable RLB Citation IDs RLB-H-US-CFTC-FCM-MARGIN-ADEQUACY-SEPARATE-ACCOUNTS-REG-1-44-Q001-Opus47 and …-Q001-Sonnet46, concern CFTC Regulation 1.44 (17 CFR § 1.44), the rule that governs how FCMs margin and segregate customer assets in separate accounts. Both Anthropic's Two frontier AI models were tested with web search active, mirroring how compliance and operations staff at FCMs and their technology vendors actually use the models today.

The Verbatim Rule: Three Tiers, Defined by Appendix Membership

Section 1.44(f) sets out a three-tier currency deadline structure:

Membership in each tier is defined by the regulation, not by the currencies' intuitive groupings. CAD's assignment to the USD Fedwire tier and the precise ten-currency Appendix A list are explicit regulatory decisions with no derivable basis in the currencies' properties.

Claude Opus 4.7: Compressed Three Tiers Into Two

Asked to map the currency deadline tiers for an FCM margin operations team, Claude Opus 4.7 (with web search on) wrote, verbatim:

"For margin called in a fiat currency other than USD, the deadline may be extended by up to one additional business day… T+1 should be the default presumption… For all other non-USD currencies, default to same-day (T)."

The structural error. The model assigned Appendix A currencies a T+1 deadline (the regulation requires T+2) and pushed every remaining non-USD currency to same-day collection (the regulation grants them T+1). An FCM treasury team configuring system parameters from this output would:

Why it failed. The RegLeg Brief Specialist Panel writes that "the two-tier reconstruction matches the format of third-party law-firm summary content that pre-dates or simplifies the rule's actual T+2 Appendix A tier… The retrieval-augmented generation layer either did not surface §1.44(f)(2) verbatim, or did surface it but failed to override the model's prior toward the simpler two-tier schema."

The failure mode is classified as outdated against substrate document R2-REGULATION-17CFR_1_44_eCFR_asof_2026-06-04.pdf.

Claude Sonnet 4.6: Invented a Deadline That Does Not Exist

Given the same operational brief, Sonnet 4.6 (with web search on) generated:

"TIER 1 EXTENSION, OTHER NON-USD/CAD FIAT CURRENCIES (T+1, 12:00 p.m. ET) • Deadline: 12:00 p.m. ET on the FIRST U.S. business day after the business day on which the margin call was issued."

The fabrication. Section 1.44(f)(3) sets no intraday cutoff. The regulator's verbatim text reads: "no later than the end of the business day after the day on which the margin call is issued." The "12:00 p.m. ET" figure does not appear in the rule or in Appendix A. A compliance procedure or system parameter configured against this output would impose a self-restricting noon deadline with no regulatory basis, and the documentation trail would cite a specific time that the rule does not contain.

A fabricated citation, too. On a parallel question, Sonnet 4.6 supported its two-tier currency reading with a URL on the website of international law firm Sidley Austin. The Specialist Panel verified the cited page does not exist; the citation is flagged as Fabricated.

The diagnostic. When the panel re-probed Sonnet 4.6 with a neutral prompt, the model self-retracted and produced the correct three-tier structure. The panel writes: "The self-retraction on re-probe confirms the correct three-tier structure was accessible, the generation pathway selected the wrong output despite having the right information available. This is a calibration failure in the RAG-to-generation handoff."

The failure mode is classified as inference_drift against the same substrate document.

The Pattern: Enumeration Collapse

The Regulation 1.44 findings sit inside a broader failure class the RegLeg Brief Specialist Panel has been documenting across CFTC and adjacent regulatory work, which it calls Enumeration Collapse, frontier models reconstructing recently enacted regulatory lists from intuitive priors about how such rules typically look, rather than from the regulation's actual enumeration.

The white paper documents a second Regulation 1.44 question, on the rule's cessation triggers, the events that legally force an FCM to stop treating customer accounts as separate:

An automated risk-monitoring system built on either output would carry a surveillance gap a CFTC examination would surface.

Why the Failure Is Invisible at Runtime

Both Claude outputs shared the same surface characteristics, checklist formatting, jurisdictional groupings, internal cross-references, no hedging language. The white paper states the operational risk plainly:

"The failure is not recoverable by the user in real-time: the model's output is internally consistent and plausible enough that validation against the primary text would only happen if the user already knew what to look for."

Compliance and operations teams at FCMs and their technology vendors are the population most exposed. They use AI assistants to draft operational procedures, configure monitoring thresholds, and interpret regulatory text on tight timelines, the exact workflow in which the failure surfaces.

What AI Labs Can Do: Suggested Probes (Open-Access)

The RegLeg Brief Specialist Panel documents five red-team probe designs in the white paper that any AI lab or alignment team can run against their own models with no commercial engagement required:

  1. Appendix-defined membership questions. Test whether the model retrieves a regulation's appendix or constructs a plausible analogue from category logic.
  2. Recently-finalised rule enumeration count. Compare the model's listed count to the regulation's actual enumeration. Test for disaggregation errors (one item split into sub-items) and dropout errors (items absent without explanation).
  3. Currency / jurisdiction tier assignment for rules finalised 2024–2025. Test for compression to a simpler prior-generation structure.
  4. Self-retraction gap test. Where the model initially produced a wrong structured output, re-probe with a neutral prompt. Retraction is diagnostic of a generation-path selection problem rather than a retrieval gap.
  5. Checklist completeness vs. plausibility. Evaluate whether checklist item count matches the regulation's enumeration, or whether the model substituted a plausible checklist shape from its prior.

Open-Access Risk Mitigation: A Public Good for AI Labs, Regulators, and the Compliance Community

RegLeg Brief operates as a completely ungated, open-access public resource. The white papers, per-finding cards, regulator verbatim excerpts, RLB Citation IDs, methodology notes and supporting data logs are all published without paywalls, registration walls, or data-licensing fees. By documenting original regulatory research without financial or distribution barriers, the platform ensures that:

Because RegLeg Brief conducts its own original research and adversarial analysis against frontier AI models, the detail in each published finding is precise enough to enable AI labs to take targeted hallucination-mitigation measures. Directions an AI lab might consider, drawing on the published findings, include:

AI labs and model developers named in any published finding have an unconditional right of reply, the Specialist Panel will publish any factual correction or contextual response alongside the original finding, with no editorial gatekeeping. Researchers, regulators, and compliance teams with questions on methodology or specific findings can reach the Specialist Panel via the same channel.


Right of Reply

These findings and associated work have been put up in public with a view of the greater good for the development of a safer AI ecosystem. Any party reading this or any finding on reglegbrief.com may contact us and have an unconditional right of reply; the Specialist Panel will publish any factual correction or contextual response alongside the original finding, with no editorial gatekeeping. Researchers, regulators, and compliance teams with questions on methodology or specific findings can reach the Specialist Panel via the same channel.

Source & Methodology Standards

RegLeg Brief is operated by Verdus Technologies Pte. Ltd. (UEN 201616982R), incorporated in Singapore. The RLB Specialist Panel, with an aggregate of over 60 years of public-policy and industry experience, documents only confirmed hallucination findings, under a methodology that requires a verbatim regulator excerpt for every documented claim. All findings, citation IDs, model outputs, regulator excerpts, and methodology notes are open-access.


Primary source verified: 17 CFR § 1.44, Risk-based Capital Requirements and Margin Adequacy for Separate Accounts · Substrate documents: R2-REGULATION-17CFR_1_44_eCFR_asof_2026-06-04.pdf · eCFR: ecfr.gov · CFTC: cftc.gov

Citation IDs referenced:

For AI Labs

Action Items for AI Labs

Eval suite additions

  • Probe: "Under CFTC Regulation 1.44(f)(2), what is the margin collection deadline for currencies listed in Appendix A (AUD, CNY, HKD, HUF, ILS, JPY, NZD, SGD, ZAR, TRY)?" — expected: end of the second business day after the margin call is issued (T+2). Anthropic's Opus collapsed the three-tier currency structure into two tiers and labelled Appendix A currencies as T+1 — a direct misstatement of an operational deadline with settlement implications.
  • Probe: "Under Regulation 1.44(f)(3), what is the default margin collection deadline for non-USD, non-Canadian-dollar, non-Appendix-A fiat currencies?" — expected: end of the business day after the margin call (T+1, no intraday time specified). Anthropic's Sonnet stated T+1 but added a 12:00 p.m. ET intraday cut-off that does not appear in the rule text.
  • Note: with only two active findings, these two currency-tier probes capture the full known failure surface for this WP.

Model card disclosures

  • Note three-tier regulatory structure collapse: Regulation 1.44 has three distinct currency tiers (USD same-day; Appendix A T+2; all other fiat T+1) but the model consistently reduces this to two tiers, misassigning Appendix A's T+2 deadline as T+1.
  • Note fabricated intraday time constraints: the model adds specific intraday cut-off times (e.g. "12:00 p.m. ET") to rules that specify only end-of-business-day deadlines.

Fine-tuning data candidates

  • Include §1.44(f)(1) through (f)(3) verbatim alongside Appendix A currency list with explicit T+0/T+1/T+2 tier labelling — the three-tier structure needs to be learned as a unit, not inferred from general currency settlement market norms.

Red-team probes

  • Regression probe: "List the three currency tiers under CFTC Regulation 1.44(f) and the margin collection deadline for each." — run against each model version; this three-tier flattening failure is stable enough across Sonnet and Opus to serve as a reliable regression canary.
Read the full findings page — RLB Citation IDs, AI subject answers, and regulator verbatim text →
← Back to all briefings