AI Model Hallucinations on 17 CFR Part 23 — Swap Dealer Business Conduct and Documentation Requirements: A Four-Configuration Study
Executive Summary
The RegLeg Specialist Panel tested four configurations of Anthropic's Claude models against 17 CFR Part 23 — the CFTC's core framework governing swap dealer business conduct and documentation requirements. Twenty questions targeting structural, definitional, and procedural precision were put to each configuration: Claude Opus 4.7 with WebSearch disabled, Claude Opus 4.7 with WebSearch enabled, Claude Sonnet 4.6 with WebSearch disabled, and Claude Sonnet 4.6 with WebSearch enabled. Across the four configurations, the Hallucination Specialist confirmed 28 hallucinations from 80 analysed responses. The dominant error mechanism is Inference Drift — the generation of structurally plausible but factually incorrect regulatory content, most visibly in repeated confident assertions of subpart letters C, D, and G that have never appeared in Part 23's structure at any point since promulgation. The most significant comparative finding is that WebSearch enabling produced opposite effects across the two model families: for Sonnet 4.6 it reduced confirmed hallucinations by 80 percent (from 10 to 2), while for Opus 4.7 it increased the confirmed count by 67 percent (from 6 to 10), driven by a distinct pattern of Federal Register document-retrieval confusion in which the model substituted a different CFTC filing's identifiers for those of the document under study.
1. Regulation Under Study
Regulation: Revisions to Business Conduct and Swap Documentation Requirements for Swap Dealers and Major Swap Participants, 17 CFR Part 23
Regulator: Commodity Futures Trading Commission (CFTC), United States
17 CFR Part 23 is the CFTC's principal framework imposing business conduct standards and documentation requirements on registered swap dealers (SDs) and major swap participants (MSPs). Promulgated under the authority of the Dodd-Frank Wall Street Reform and Consumer Protection Act (H.R. 4173, 111th Congress (2009–2010)), Part 23 covers the full lifecycle of SD/MSP regulatory obligations: registration; capital and margin requirements; reporting, recordkeeping and daily trading records; swap documentation; business conduct standards toward counterparties and special entities; the broader duties of swap dealers; and the segregation of assets held as collateral in uncleared swap transactions. A 2025 proposed rulemaking (FR Doc. 2025-18924, 90 FR 47136, published September 30, 2025, with a public comment deadline of October 24, 2025) introduced proposed revisions that are the subject of several questions in the test set.
Part 23's subpart architecture is non-sequential by design. The subpart letters in use are A, B, E, F, H, I, J, K, and L. Subparts A and K carry a formal [Reserved] designation — they appear in the table of contents as acknowledged placeholder positions with no active sections. The letters C, D, and G are entirely absent from Part 23's structure; they do not appear in the table of contents and have never appeared at any point since the regulation was promulgated. This non-sequential architecture — and the meaningful distinction between a formally [Reserved] subpart and a subpart letter that is simply absent — proved to be the most consequential source of error across the four configurations tested.
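The structure described above can be captured as a small lookup table. The following is a minimal sketch rather than an official schema: the names `SUBPARTS`, `subpart_status`, and `subpart_for_section` are hypothetical, and only the section ranges stated in this report are filled in. Ranges that this report does not state are left as `None`.

```python
from typing import Dict, Optional, Tuple

# Subpart letters of 17 CFR Part 23 as described in this report.
# Each value is (status, first_section, last_section), where the section
# numbers are the part after "23." and are None where this report does not
# state a range (an assumption of this sketch, not a statement about the CFR).
SUBPARTS: Dict[str, Tuple[str, Optional[int], Optional[int]]] = {
    "A": ("reserved", 1, 20),
    "B": ("active", 21, 40),
    "E": ("active", 100, 199),
    "F": ("active", 200, 206),
    "H": ("active", None, None),
    "I": ("active", None, None),
    "J": ("active", 600, 611),
    "K": ("reserved", None, None),
    "L": ("active", 700, 704),
}


def subpart_status(letter: str) -> str:
    """Return 'active', 'reserved', or 'absent' for a subpart letter."""
    entry = SUBPARTS.get(letter.upper())
    return entry[0] if entry else "absent"


def subpart_for_section(section: int) -> Optional[str]:
    """Return the subpart whose stated range contains section 23.<section>."""
    for letter, (_status, first, last) in SUBPARTS.items():
        if first is not None and last is not None and first <= section <= last:
            return letter
    return None  # No stated range in this table covers the section.
```

Under these assumptions, `subpart_status("K")` returns `"reserved"` while `subpart_status("C")` returns `"absent"`, which is the distinction the paragraph above draws, and `subpart_for_section(704)` returns `"L"` rather than `"K"`.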
Substrate composition: The Archival Analyst assembled primary documents from official government sources across three categories. The regulatory text category covers the current codified text of 17 CFR Part 23, including an operator-verified table of contents sourced from the eCFR and confirmed by the Archival Specialist on 2026-05-22, as well as archival captures of individual subpart pages. The Federal Register category covers the 2025 proposed rulemaking document (FR Doc. 2025-18924, 90 FR 47136, pages 47136–47168) and related CFTC press releases. The statutory basis category covers the relevant Dodd-Frank Act provisions from which the CFTC derives its authority to regulate swap dealers and major swap participants. The Archival Specialist performed a manual download pass on certain long-form regulatory documents where standard portal access returned incomplete content, ensuring substrate completeness before testing commenced.
2. Methodology
The Specialist Panel designed a 20-question test set targeting the factual precision required of compliance professionals working with 17 CFR Part 23. Questions covered: Part 23's subpart structure and the section ranges of each subpart; the regulatory home for specific topic areas including capital and margin requirements, reporting and recordkeeping, business conduct standards, segregation of collateral assets, and duties of swap dealers; specific provisions within Subpart H governing counterparty protection and special-entity interactions; the bibliographic identifiers of the 2025 proposed rulemaking (FR document number, Federal Register volume and page citation, Regulation Identifier Number, publication date, and document type); and the statutory basis provisions in the Dodd-Frank Act.
Each question was put identically to four configurations:
| Configuration ID | Model | WebSearch |
|---|---|---|
| opus-47-default | Claude Opus 4.7 | Disabled |
| opus-47-websearch | Claude Opus 4.7 | Enabled |
| sonnet-46-default | Claude Sonnet 4.6 | Disabled |
| sonnet-46-websearch | Claude Sonnet 4.6 | Enabled |
The Hallucination Analyst compared each response against the substrate documents and classified each response as Accurate, Partially Accurate, Hallucinated, or Refused. Every response classified as Hallucinated, and every Partially Accurate response where the Analyst identified a candidate error, was escalated to the Hallucination Specialist for source-by-source confirmation. The Source-Verification Analyst independently checked every URL cited by the AI models, comparing the content at the cited address against both the substrate and the AI's accompanying claim — identifying where a cited source failed to support or actively contradicted what the AI represented it as saying.
The Specialist Panel maintains internal quality controls and continuously refines analyst performance; those processes are not reported here.
3. Per-Configuration Failure-Mode Signature
| Configuration | Accurate | Partially Accurate | Hallucinated | Refused | Confirmed Hallucinations |
|---|---|---|---|---|---|
| opus-47-default | 8 | 8 | 1 | 3 | 6 |
| opus-47-websearch | 13 | 5 | 2 | 0 | 10 |
| sonnet-46-default | 4 | 11 | 3 | 2 | 10 |
| sonnet-46-websearch | 16 | 1 | 1 | 0 | 2 |
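The arithmetic behind the table and the headline percentages can be checked directly. The sketch below copies the counts from the table; the names `RESULTS`, `classified_total`, and `percent_change` are hypothetical helpers introduced only for illustration.

```python
# Counts copied from the table above, in column order:
# (accurate, partially accurate, hallucinated, refused, confirmed hallucinations)
RESULTS = {
    "opus-47-default": (8, 8, 1, 3, 6),
    "opus-47-websearch": (13, 5, 2, 0, 10),
    "sonnet-46-default": (4, 11, 3, 2, 10),
    "sonnet-46-websearch": (16, 1, 1, 0, 2),
}


def classified_total(config: str) -> int:
    """Sum of the four classification columns for one configuration."""
    return sum(RESULTS[config][:4])


def percent_change(before: int, after: int) -> int:
    """Signed percentage change from before to after, rounded to an integer."""
    return round(100 * (after - before) / before)


total_confirmed = sum(row[4] for row in RESULTS.values())
sonnet_change = percent_change(RESULTS["sonnet-46-default"][4],
                               RESULTS["sonnet-46-websearch"][4])
opus_change = percent_change(RESULTS["opus-47-default"][4],
                             RESULTS["opus-47-websearch"][4])
```

Here `total_confirmed` evaluates to 28, `sonnet_change` to -80, and `opus_change` to 67, matching the figures in the Executive Summary. The four classification columns sum to 20 for three configurations and to 18 for sonnet-46-websearch, which appears consistent with the two responses reported as still under review for that configuration.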
opus-47-default — 6 Confirmed Hallucinations
This configuration's errors cluster around structural misidentification within Part 23's subpart architecture. Two recurring themes appear: incorrect section-range boundary claims, and a persistent Subpart K / Subpart L misattribution.
On section-range boundaries, the stated upper bound of Subpart E (Capital and Margin Requirements) was placed at approximately §23.161 when the substrate-confirmed upper bound is §23.199 — a deviation of 38 sections. In the same question, the model explicitly scoped its own uncertainty at "off by one or two," but the actual discrepancy is an order of magnitude beyond that stated margin. The stated upper bound of Subpart J (Duties of Swap Dealers and MSPs) was placed at §23.609, leaving §§23.610 and 23.611 entirely unacknowledged, when the substrate-confirmed upper bound is §23.611.
On the Subpart K / Subpart L misattribution: in two separate questions (q6 and q10), this configuration attributed the segregation of assets provisions (§§23.700–23.704) to Subpart K. The substrate records Subpart K as [Reserved] — it contains no sections of any kind — and places those provisions unambiguously in Subpart L. The same structural mismap appearing independently across two differently-framed questions indicates a consistent internal error rather than a surface-level response artefact.
Two further errors complete this configuration's confirmed set. First, the model described Subpart A as carrying the title "General Provisions" and gave §23.20 as Part 23's first section number. The substrate records Subpart A as [Reserved], with every section from §23.1 through §23.20 individually reserved, so the first section number is §23.1. Second, the model presented a title for §23.22 in explicit quotation marks but rendered it as a neutral subject-matter descriptor. The substrate title is "Prohibition against statutory disqualification in the case of an associated person of a swap dealer or major swap participant"; the model's version dropped the leading operative phrase "Prohibition against statutory disqualification," which is the entire legal concept the section encodes.
opus-47-websearch — 10 Confirmed Hallucinations
Enabling WebSearch increased, not reduced, the confirmed error count for this model. The additional errors originate from a distinct failure mode that does not appear in the offline configuration: the model retrieved and cited a different Federal Register document from the one under study, then built a cluster of derived factual errors on that wrong source document.
Across multiple questions about the 2025 rulemaking, the model provided identifiers for FR Doc. 2025-23953 (cited at 90 FR 61226) rather than the substrate document FR Doc. 2025-18924 (90 FR 47136). This single document-retrieval error produced four compounding downstream errors. The stated publication date, December 30, 2025, is three months after the actual September 30, 2025 date confirmed by the substrate. The stated Federal Register page citation (90 FR 61226) differs from the correct citation (90 FR 47136) by approximately 14,000 pages within the same volume. The stated Regulation Identifier Number (3038-AF33) differs from the correct RIN (3038-AF38) in its final digit. And the document was characterised as a final rule with an effective date of January 29, 2026 and an associated correction notice, when the substrate document is a Notice of Proposed Rulemaking with a comment deadline of October 24, 2025 and no effective date.
The Source-Verification Analyst confirmed Citation Errors on multiple URLs cited in this configuration's rulemaking responses: the cited addresses resolve to a different Federal Register filing, not the document the model represented them as supporting. A practitioner who received these responses and did not independently cross-check the FR document number and page citation against the actual rulemaking text would be working from incorrect regulatory metadata.
Additionally, this configuration repeated a subpart-letter error seen across other configurations: it described Subparts C, D, and G as formally [Reserved] placeholders, when the substrate records those letters as entirely absent from Part 23's structure — never used since promulgation, and not present even as placeholder entries in the table of contents.
sonnet-46-default — 10 Confirmed Hallucinations
This configuration produced the highest density of structural hallucinations. The dominant pattern is the confident assertion — sometimes hedged, but with the affirmative claim preceding or coexisting with the hedge — of subpart letters and associated topic coverage for subparts that do not exist.
Across multiple questions, the model described Subpart C as imposing core operational duties on SDs and MSPs including CCO designation, risk management programs, business continuity plans, and supervisory requirements. It described Subpart D as covering capital or margin requirements for non-bank swap dealers, currently in force. It described Subpart G as covering external business conduct standards for swap dealers and MSPs, with a section range of approximately §§23.400–23.451. All three of these subparts are absent from Part 23's structure; the substrate records each letter as simply not used.
The section-range errors in this configuration compound the structural errors. The model attributed §§23.200–23.207 to the general duties of SDs and MSPs. In the actual regulation that range overlaps Subpart F (Reporting, Recordkeeping, and Daily Trading Records Requirements, §§23.200–23.206), a topically unrelated subpart, and extends one section number past its end. The actual general duties provisions sit in Subpart J at §§23.600–23.611, roughly 400 section-numbers away. The model also suggested that capital and margin requirements may occupy two separate but adjacent subparts, when the substrate records a single combined Subpart E spanning §§23.100–23.199.
Many responses in this configuration contained hedging language — phrases such as "I believe" or "I am not fully confident" — that accompanied non-existent subpart claims without retracting the core assertion. The Hallucination Specialist confirmed errors where an affirmative factual claim was made in the opening sentence and the hedge was narrower than the actual error magnitude or applied only to the letter label rather than to the subpart's existence. A further confirmed error in this configuration is the assertion that a third-party compliance guide's reference to Subpart D was credible given Part 23's multi-wave rulemaking history — a claim directly contradicted by the substrate, which records that Subpart D has never existed at any point in Part 23's history regardless of rulemaking sequence.
sonnet-46-websearch — 2 Confirmed Hallucinations
This configuration produced the lowest confirmed error count. The two confirmed errors both involve section-level topic transpositions within Subpart H (Business Conduct Standards for Swap Dealers and Major Swap Participants Dealing With Counterparties, Including Special Entities).
First, §23.432 was described as covering recommendations and suitability. The substrate titles §23.432 as "Clearing disclosures" — a provision requiring swap entities to notify counterparties of their right to select the derivatives clearing organisation for mandatory-cleared swaps and of their right to elect clearing for non-mandatory swaps. Recommendations and institutional suitability are the subject of §23.434, two sections later.
Second, §23.440 was described as governing business conduct requirements when a swap entity acts as counterparty to a Special Entity. The substrate titles §23.440 as "Requirements for swap dealers acting as advisors to Special Entities" — triggered when a swap dealer makes tailored recommendations carrying a best-interests duty — while the counterparty-facing requirements, including the qualified independent representative obligation, sit in §23.450.
The large-scale structural hallucinations that dominated the offline Sonnet 4.6 configuration are largely absent here. Two responses in this configuration remain under Hallucination Specialist review at the time of publication, which would account for this configuration's four classification columns summing to 18 rather than 20.
4. Error-Cause Distribution
The Hallucination Specialist assigned root-cause classifications to 25 of the 28 confirmed hallucinations. Three remain unclassified. The mechanisms identified are described below.
Inference Drift — 17 confirmed hallucinations
Inference Drift is the dominant mechanism across all four configurations. The AI produces claims about Part 23's regulatory structure that are internally coherent — the model correctly understands that Part 23 has multiple subparts, that those subparts cover distinct regulatory topics, and that the topics include capital requirements, reporting, business conduct, and duties of swap dealers — but its specific structural map departs materially from the codified text. The failure takes several forms: asserting that subpart letters exist and contain active provisions when those letters are entirely absent from the regulation; attributing real section numbers to the wrong subpart and topic; and conflating the [Reserved] designation (a formal placeholder in the CFR) with the outright absence of a subpart letter.
The [Reserved] vs. absent distinction warrants emphasis because it is not merely taxonomic. A [Reserved] designation in the CFR implies the agency deliberately carved out that letter position as a placeholder for potential future rulemaking — the position is acknowledged. An absent letter implies no such intent — the position simply does not appear. When models describe absent letters as [Reserved], they generate a materially misleading description of regulatory structure: they imply a regulatory commitment or placeholder that does not exist. This distinction appeared in confirmed Inference Drift errors across three of the four configurations tested.
Inference Drift errors carry particular compliance risk because the outputs read as authoritative regulatory description. A compliance officer conducting a gap analysis or building a regulatory mapping from these responses without independent source verification would produce an incorrect structural account of Part 23 — one that references subparts that do not exist, misassigns existing provisions to incorrect locations, or presents superseded section ranges as current.
Version Confusion — 3 confirmed hallucinations
All three Version Confusion hallucinations occur in the opus-47-websearch configuration and originate from a single document-retrieval event. The model retrieved and presented identifiers for a different CFTC Federal Register filing (a real document with its own document number, page range, publication date, and regulatory status) in place of the substrate's 2025 proposed rulemaking document. The downstream errors cascade from this substitution: wrong publication date, wrong FR page citation, and an incorrect document-type characterisation (final rule with an effective date vs. proposed rule with a comment deadline). The incorrect RIN given in the same responses is classified separately, under Training Data Gap. These errors are consequential for regulatory research: Federal Register citation identifiers are the primary mechanism by which practitioners locate specific rulemakings, and incorrect identifiers make the correct document unfindable through standard research tools.
Incomplete — 3 confirmed hallucinations
Incomplete errors occur where the model's response is directionally proximate but materially truncated. The most significant instance is the Subpart E upper-bound understatement in opus-47-default: the model stated approximately §23.161 with an explicit uncertainty hedge of "off by one or two," but the substrate-confirmed upper bound is §23.199 — 38 sections beyond the stated figure and an order of magnitude beyond the model's own stated uncertainty margin. The Subpart J upper-bound understatement (§23.609 stated vs. §23.611 correct) left two sections unacknowledged. A section-title truncation completes this category: the model reproduced the title of §23.22 in quotation marks but omitted the leading operative phrase "Prohibition against statutory disqualification," which is the entire legal concept the section encodes, substituting a neutral subject-matter descriptor in its place.
Training Data Gap — 1 confirmed hallucination
The single Training Data Gap finding is the RIN error in opus-47-websearch: the model stated RIN 3038-AF33 against the substrate-confirmed RIN 3038-AF38. Only the final digit differs (AF33 vs. AF38), consistent with the model having no reliable training representation of this specific regulatory identifier and generating a plausible-looking but numerically incorrect value. The Source-Verification Analyst confirmed that neither cited source supplies the AF33 variant.
Outdated — 1 confirmed hallucination
The Subpart J upper-bound error in opus-47-default (§23.609 stated, §23.611 correct) is classified as Outdated: the stated range is consistent with a prior codified state of Part 23 before §§23.610 and 23.611 were added. This classification is independent of the Incomplete classification described above but coexists with it: the range is both truncated and reflective of a superseded state of the regulation.
Unclassified — 3 confirmed hallucinations
Three confirmed hallucinations did not receive a root-cause classification from the Hallucination Specialist. These are: the Subpart K / Subpart L misattribution for §23.704 in opus-47-default (q6, distinct from the INFERENCE_DRIFT-classified q10 instance of the same error type); the meta-claim in sonnet-46-default that a third-party compliance guide's reference to Subpart D was credible given Part 23's rulemaking history; and the section-range misattribution placing general SD and MSP duties at §§23.200–23.207 in sonnet-46-default (q7).
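The distribution in this section can be tallied to confirm that it reconciles with the 28 confirmed hallucinations. This is a minimal sketch that takes the per-category counts above at face value; `ROOT_CAUSES` and the derived variable names are hypothetical.

```python
from collections import Counter

# Per-category counts as stated in the subsection headings above.
ROOT_CAUSES = Counter({
    "Inference Drift": 17,
    "Version Confusion": 3,
    "Incomplete": 3,
    "Training Data Gap": 1,
    "Outdated": 1,
    "Unclassified": 3,
})

# Hallucinations that received a root-cause classification.
classified = sum(n for cause, n in ROOT_CAUSES.items() if cause != "Unclassified")
total = sum(ROOT_CAUSES.values())

# Most frequent mechanism and its share of all confirmed hallucinations.
dominant, dominant_count = ROOT_CAUSES.most_common(1)[0]
dominant_share = round(100 * dominant_count / total)
```

With these counts, `classified` is 25 and `total` is 28, and Inference Drift accounts for roughly 61 percent of all confirmed hallucinations.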
5. WebSearch Impact
WebSearch enablement produced diametrically opposite effects across the two model families. This asymmetry is itself a material finding for model evaluation design.
Sonnet 4.6: Confirmed hallucinations fell from 10 (offline) to 2 (with WebSearch), an 80 percent reduction. The structural hallucinations that dominated the offline configuration — non-existent subpart assertions, section-range misattributions across subparts, the claim that capital and margin requirements occupy separate subparts — were largely corrected when the model had access to live regulatory sources. The two remaining confirmed errors involve section-level topic transpositions within Subpart H, a more granular error type that persisted despite retrieval capability. The refused-response count also dropped from 2 to 0, with responses moving into the Accurate classification.
Opus 4.7: Confirmed hallucinations increased from 6 (offline) to 10 (with WebSearch), a 67 percent increase. The increase is attributable to a failure mode unique to the WebSearch-enabled configuration: the model retrieved information corresponding to a different Federal Register filing and constructed a cluster of errors on that wrong source document. The structural errors present in the offline configuration — the Subpart K / Subpart L misattribution, the subpart-letter confusion between [Reserved] and absent — persist in the WebSearch-enabled configuration alongside the new document-retrieval errors; WebSearch neither corrected those structural errors nor introduced mechanisms that would catch them before response generation.
The Source-Verification Analyst's findings on the opus-47-websearch Citation Errors are directly relevant here: the URLs cited in rulemaking-identifier responses resolve to a different document from the one under study. This means the model not only retrieved the wrong document but provided citations that give users the appearance of authoritative sourcing for claims that are, in fact, about a separate CFTC rulemaking. A practitioner who did not independently verify the FR document number and page citation would have no signal from the response that they had been given metadata for the wrong filing.
For the specific error categories examined in this study, WebSearch meaningfully helped with high-level structural hallucinations — non-existent subpart letters — in the Sonnet 4.6 family, but did not resolve section-topic transpositions or section-range boundary errors in either family. WebSearch introduced document-retrieval confusion as a new error category in the Opus 4.7 family.
6. Selected Error Cases
Case 1 — Subpart K / Subpart L Structural Misattribution
Configuration: opus-47-default | Severity: HIGH | Error mechanism: Inference Drift
Question asked: Which subpart of 17 CFR Part 23 governs the segregation of assets held as collateral in uncleared swap transactions?
AI's claim: Subpart K of 17 CFR Part 23 governs segregation of assets held as collateral in uncleared swap transactions, with the model citing an eCFR URL corresponding to Subpart K.
Substrate finding: The Archival Specialist's operator-verified table of contents for 17 CFR Part 23 records: "Subpart K [Reserved]" — it contains no sections of any kind. The segregation requirements for uncleared swap collateral are located in "Subpart L — Segregation of Assets Held as Collateral in Uncleared Swap Transactions, §§23.700–23.704."
Source-Verification Analyst finding: The cited eCFR URL leads to a page for Subpart K that, consistent with the substrate, displays a [Reserved] placeholder with no regulatory text. The citation is a Citation Error: the source cited does not support the claim it accompanies and in fact contradicts it — a user following the link would find no segregation provisions there.
Significance: The identical Subpart K / Subpart L misattribution appeared independently in a separate question (q6) within the same configuration, where the model attributed §23.704 to Subpart K while correctly identifying §23.704 as the last operative section of Part 23. The same error appearing across two differently-framed questions confirms a persistent structural mismap rather than a surface-level response artefact.
Case 2 — Federal Register Document-Retrieval Confusion
Configuration: opus-47-websearch | Severity: CRITICAL | Error mechanism: Version Confusion
Question asked: What document underlies the 2025 proposed revisions to Part 23, and what are its key regulatory identifiers?
AI's claim: The governing document is FR Doc. 2025-23953, cited at 90 FR 61226, and it is a final rule.
Substrate finding: The substrate's 2025 rulemaking document is FR Doc. 2025-18924, published at 90 FR 47136 (pages 47136–47168) on September 30, 2025. Its document type is explicitly Proposed Rule — a Notice of Proposed Rulemaking — with a public comment deadline of October 24, 2025 and no effective date. No document bearing FR Doc. 2025-23953 or the citation 90 FR 61226 appears anywhere in the substrate; both identifiers correspond to a different, distinct CFTC filing.
Source-Verification Analyst finding: The Source-Verification Analyst examined three URLs cited by the model. On the primary cited Federal Register URL, a Citation Error was confirmed: the URL encodes a different filing date from the actual document and resolves to a different document number (2025-23953 vs. 2025-18924) and a different page range (90 FR 61226 vs. 90 FR 47136). A CFTC press release URL cited by the model resolved correctly but covered a distinct subsequent document, not the proposed rulemaking under study.
Significance: Both specific citation identifiers — FR document number and FR volume-and-page — are wrong and correspond to a different real CFTC filing. The document-type mischaracterisation compounds the error materially: a proposed rule and a final rule have entirely different legal effects and different research pathways. A compliance professional relying on this response to locate the governing document would be given unfindable or misdirecting identifiers, and would misunderstand the document's legal status. This is the only CRITICAL-severity finding in this study.
Case 3 — Non-Existent Subpart Asserted with Attributed Section Range
Configuration: sonnet-46-default | Severity: HIGH | Error mechanism: Inference Drift
Question asked: What does Subpart C of 17 CFR Part 23 cover?
AI's claim: Subpart C of 17 CFR Part 23 imposes core operational duties on SDs and MSPs, including CCO designation, risk management programs, business continuity plans, and supervisory requirements, housed in approximately §§23.200–23.207.
Substrate finding: The substrate records: "Subpart C — DOES NOT EXIST in 17 CFR Part 23. The Subpart letter C is not used." Part 23 moves from Subpart B (Registration, §§23.21–23.40) directly to Subpart E (Capital and Margin Requirements, §§23.100–23.199), skipping both C and D. The operational duties described by the model — CCO designation, risk management, business continuity — are located in Subpart J (Duties of Swap Dealers and Major Swap Participants, §§23.600–23.611). The section range §§23.200–23.207 cited by the model belongs to Subpart F (Reporting, Recordkeeping, and Daily Trading Records Requirements, §§23.200–23.206) — a topically unrelated subpart.
Source-Verification Analyst finding: The cited eCFR URL does not contain a Subpart C. The citation is a Citation Error: the source does not support the claim and, by returning no Subpart C content, actively contradicts it.
Significance: The error is triply compounding: (1) the asserted subpart has never existed; (2) the section range cited maps to a different subpart covering a completely different topic; (3) the actual location of the attributed duties is in a subpart approximately 400 section-numbers removed from the cited range. A compliance mapping built from this response would contain three simultaneous structural errors.
Case 4 — Section-Range Understatement with Self-Bounded Hedge
Configuration: opus-47-default | Severity: HIGH | Error mechanism: Incomplete
Question asked: What is the full section range covered by Subpart E of 17 CFR Part 23?
AI's claim: Subpart E's section range runs up to approximately §23.161. The model qualified this by noting its answer might be "off by one or two."
Substrate finding: The operator-verified table of contents states: "Subpart E — Capital and Margin Requirements for Swap Dealers and Major Swap Participants, §§23.100–23.199." The correct upper bound is §23.199 — 38 sections beyond the model's stated figure.
Source-Verification Analyst finding: The cited eCFR URL, if reached by a user, would display Subpart E's content with a final section entry well beyond §23.161, directly contradicting the model's stated upper bound. The citation is a Citation Error.
Significance: The model's own uncertainty statement — "off by one or two" — is calibrated at a magnitude roughly 20 times smaller than the actual error. A compliance professional auditing whether a firm's policies and procedures address all Subpart E capital and margin sections would, relying on this response, leave 38 sections unexamined. The explicit hedge, far from mitigating the harm, creates a false sense of precision around an incorrect bound.
Case 5 — Section-Topic Transposition within Subpart H
Configuration: sonnet-46-websearch | Severity: HIGH | Error mechanism: Inference Drift
Question asked: What does §23.432 of 17 CFR Part 23 cover?
AI's claim: Section 23.432 covers recommendations and suitability.
Substrate finding: §23.432 is titled "Clearing disclosures" and requires swap entities to notify counterparties of their right to select the derivatives clearing organisation for mandatory-cleared swaps and of their right to elect clearing for non-mandatory swaps. Recommendations and institutional suitability are the subject of §23.434 — titled "Recommendations to counterparties — institutional suitability" — two sections later in Subpart H.
Source-Verification Analyst finding: Both the substrate archival copy and a live legal database source verified on 2026-05-22 title §23.432 as "Clearing disclosures." Neither source provides any support for a suitability or recommendations characterisation of that section number. The citation is a Citation Error.
Significance: §23.432's clearing-disclosure obligation and §23.434's suitability obligation impose distinct, non-interchangeable duties on swap dealers: the former is a transactional notification requirement about clearing organisation selection; the latter imposes substantive investment-suitability responsibilities. A compliance checklist that maps §23.432 to suitability would cause the clearing-disclosure obligation to go unidentified and would misallocate compliance resources toward a provision two sections removed from the one under review. This error appeared in the lowest-error-count configuration (sonnet-46-websearch), confirming that section-level topic transpositions within a subpart represent a residual failure mode that persists even when higher-level structural hallucinations are suppressed by retrieval capability.
7. Implications for AI Model Developers
The findings are provided as concrete, source-traceable input for Anthropic's training, evaluation, and red-team workflows. The Specialist Panel identifies the following patterns as actionable.
Subpart letter gaps in non-sequential CFR structures
The most frequent error type across this study — confident description of subpart letters C, D, and G as existing regulatory subparts within Part 23 — suggests that training data for this regulation, or for its regulatory subject-matter neighbourhood, contains material that references those subpart letters. Candidate sources include pre-promulgation draft regulatory text, third-party compliance guides authored before the final CFR structure was fixed, or secondary commentary that analogised to other CFTC regulations with more complete subpart sequences. A targeted evaluation covering subpart-structure questions for CFR parts with non-sequential letter schemes would identify whether this pattern extends to other regulations. Questions of the form "Does 17 CFR Part X have a Subpart [letter]? If so, what does it cover?" — where the correct answer is that the letter is absent — are effective at surfacing this failure mode. The test set used in this study can serve as a template for analogous evaluations across other non-sequential regulatory schemas.
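The question form suggested above can be generated mechanically for every letter in a part's range. The sketch below does this for Part 23 only, using the letter statuses recorded in this report; `expected_status` and `build_items` are hypothetical names, and a real evaluation would need an independently verified status table for each regulation it covers.

```python
import string
from typing import Dict, List

# Letter statuses as recorded in this report for 17 CFR Part 23.
# Any letter from A to L that is not listed here is treated as absent.
RESERVED = {"A", "K"}
ACTIVE = {"B", "E", "F", "H", "I", "J", "L"}


def expected_status(letter: str) -> str:
    """Return the expected answer category for a subpart letter."""
    if letter in ACTIVE:
        return "active"
    if letter in RESERVED:
        return "reserved"
    return "absent"


def build_items(last_letter: str = "L") -> List[Dict[str, str]]:
    """Build one evaluation item per letter from A through last_letter."""
    end = string.ascii_uppercase.index(last_letter) + 1
    return [
        {
            "letter": letter,
            "question": (
                f"Does 17 CFR Part 23 have a Subpart {letter}? "
                "If so, what does it cover?"
            ),
            "expected_status": expected_status(letter),
        }
        for letter in string.ascii_uppercase[:end]
    ]


items = build_items()
absent_letters = [i["letter"] for i in items if i["expected_status"] == "absent"]
```

Keeping `reserved` and `absent` as separate expected categories means the same item set also probes whether a model distinguishes a placeholder subpart from a letter that was never used.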
Document-retrieval specificity in Federal Register searches
The opus-47-websearch configuration retrieved a real Federal Register document, but the wrong one: a different CFTC filing from the same Federal Register volume. The model appears to have matched on regulatory subject matter (CFTC, swaps, business conduct) without verifying that the retrieved document's number and page citation correspond to those of the document specified in the question. A red-team category targeting wrong-but-plausible document retrieval, where the correct answer requires distinguishing between two real documents from the same agency in the same volume, would expose the scope of this failure mode. Training or fine-tuning signal that reinforces post-retrieval identifier verification (cross-checking FR document number, page range, and document type against the question's specified document) may reduce this class of error.
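One way to picture post-retrieval identifier verification is as a field-by-field comparison between the citation named in the question and the citation of the retrieved document. The sketch below is illustrative only: the `FRCitation` class and `identifier_mismatches` function are hypothetical names, and the "retrieved" identifiers are placeholder values, not those of the filing the model actually returned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FRCitation:
    doc_number: str  # FR document number, e.g. "2025-18924"
    volume: int      # Federal Register volume
    start_page: int  # first page of the document
    doc_type: str    # e.g. "Proposed Rule"


def identifier_mismatches(expected: FRCitation, retrieved: FRCitation) -> list[str]:
    """Return the names of the fields on which the retrieved document differs."""
    fields = ("doc_number", "volume", "start_page", "doc_type")
    return [name for name in fields if getattr(expected, name) != getattr(retrieved, name)]


# The document under study: FR Doc. 2025-18924, 90 FR 47136.
expected = FRCitation("2025-18924", 90, 47136, "Proposed Rule")
# Placeholder identifiers for a wrong retrieval (assumed values, same volume).
retrieved = FRCitation("2025-00000", 90, 10000, "Proposed Rule")

mismatches = identifier_mismatches(expected, retrieved)
print(mismatches)  # ['doc_number', 'start_page']
```

An empty list would indicate that every checked identifier agrees. A non-empty list marks the retrieval as a candidate wrong-document case even though the volume and document type agree.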
Calibration of uncertainty magnitude in regulatory content
Multiple responses in this study contained uncertainty hedges that were materially narrower than the actual error. The most concrete example is the Subpart E upper-bound response in opus-47-default, which explicitly bounded the model's uncertainty at "off by one or two" for what turned out to be a 38-section discrepancy. Calibration work specifically targeting the relationship between stated uncertainty ranges and actual error distributions on regulatory content — particularly for section-range claims and section-count assertions — is worth prioritising. The goal is not to remove hedges but to ensure that when a model expresses a quantified uncertainty margin, that margin reflects the model's actual epistemic state.
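The relationship between a stated margin and the actual error can be made measurable with two simple quantities: the shortfall on a single claim, and the share of claims whose error falls inside the stated margin. The following is a sketch under stated assumptions: the function names are hypothetical, "off by one or two" is read as a margin of 2 sections, and only the first pair in the list comes from this study (the other two are invented for illustration).

```python
def margin_shortfall(stated_margin: int, actual_error: int) -> int:
    """Amount by which the actual error exceeds the stated margin (0 if within it)."""
    return max(0, abs(actual_error) - stated_margin)


def margin_coverage(claims: list[tuple[int, int]]) -> float:
    """Fraction of (stated_margin, actual_error) claims whose error is within the margin."""
    if not claims:
        return 0.0
    covered = sum(1 for margin, error in claims if abs(error) <= margin)
    return covered / len(claims)


# Subpart E upper-bound response: stated margin 2, actual discrepancy 38 sections.
shortfall = margin_shortfall(2, 38)
print(shortfall)  # 36

# First pair is from the study; the remaining pairs are hypothetical.
claims = [(2, 38), (2, 1), (5, 3)]
print(margin_coverage(claims))
```

A well-calibrated model would show coverage close to the confidence its hedges imply. A large shortfall on individual claims, as in the Subpart E case, is the pattern this section describes.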
The [Reserved] / absent-letter distinction in CFR terminology
Models consistently collapsed the distinction between a formally [Reserved] subpart and a subpart letter that does not appear at all in a regulation. Both conditions produce gaps in a regulation's subpart sequence, but they carry different regulatory implications: a [Reserved] designation means the agency deliberately acknowledged that letter position as a placeholder for future rulemaking; an absent letter means the agency never used it. Targeted evaluation items that specifically probe whether a model can correctly distinguish these two structural states — ideally using regulations that contain both, as Part 23 does — would identify the breadth of this gap and inform whether targeted training signal would address it.
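The distinction can be encoded as three states instead of two, so that an evaluation item can check which specific state a model asserted. This is a minimal sketch with hypothetical names, and the letter sets below describe an invented toy part, not the actual structure of Part 23.

```python
from enum import Enum


class SubpartState(Enum):
    POPULATED = "populated"  # letter heads a subpart containing sections
    RESERVED = "reserved"    # letter appears, formally marked [Reserved]
    ABSENT = "absent"        # letter never appears in the part


def subpart_state(letter: str, populated: set[str], reserved: set[str]) -> SubpartState:
    """Classify a subpart letter against the official structure of a part."""
    if letter in populated:
        return SubpartState.POPULATED
    if letter in reserved:
        return SubpartState.RESERVED
    return SubpartState.ABSENT


def collapses_gap_states(model_answer: SubpartState, truth: SubpartState) -> bool:
    """True when the model confuses the two gap states (RESERVED vs ABSENT)."""
    gap_states = {SubpartState.RESERVED, SubpartState.ABSENT}
    return model_answer != truth and {model_answer, truth} == gap_states


# Toy structure (assumed for illustration): B and D populated, A reserved, C absent.
populated, reserved = {"B", "D"}, {"A"}
truth_c = subpart_state("C", populated, reserved)
print(truth_c, collapses_gap_states(SubpartState.RESERVED, truth_c))
```

Scoring the collapse separately from other wrong answers lets an evaluator tell apart a model that invents a populated subpart from one that merely mislabels the kind of gap.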
WebSearch as a risk-modifier requiring document-selection verification
The opposite WebSearch effects observed for the two model families merit investigation in evaluation design. For the Opus 4.7 family, WebSearch introduced a failure mode — document-retrieval confusion — that does not appear in the offline configuration. This suggests that retrieval-augmented accuracy on regulatory content is sensitive not only to the quality of retrieved content but to whether the model verifies that a retrieved document is the correct one before incorporating its content. Evaluation protocols that separately score the model's ability to (a) retrieve a live source and (b) confirm that the retrieved source matches the document specified in the question would isolate this failure mode from general retrieval quality.
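The two-part scoring described above can be sketched as a pair of rates: how often a live source was retrieved at all, and, among those retrievals, how often the source matched the specified document. The `Trial` record and `stage_scores` function are hypothetical, and the four trials are invented data used only to show the calculation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    retrieved_live_source: bool       # stage (a): a live source was retrieved
    matched_specified_document: bool  # stage (b): it was the document asked about


def stage_scores(trials: list[Trial]) -> tuple[float, float]:
    """Return (retrieval rate, match rate among trials that retrieved a source)."""
    if not trials:
        return 0.0, 0.0
    retrieved = [t for t in trials if t.retrieved_live_source]
    matched = [t for t in retrieved if t.matched_specified_document]
    retrieval_rate = len(retrieved) / len(trials)
    match_rate = len(matched) / len(retrieved) if retrieved else 0.0
    return retrieval_rate, match_rate


# Invented trials: three retrievals, only one of which hit the right document.
trials = [
    Trial(True, True),
    Trial(True, False),
    Trial(True, False),
    Trial(False, False),
]
retrieval_rate, match_rate = stage_scores(trials)
print(retrieval_rate, match_rate)
```

Conditioning the second rate on successful retrieval keeps a document-selection failure from being masked by, or mistaken for, a failure to retrieve anything.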
Partnership engagements
The RegLeg Specialist Panel welcomes direct engagement with Anthropic's training and evaluation teams seeking to use this research in their development workflows. Available modalities include: access to the question banks used in this study and in ongoing research across other CFTC and non-CFTC regulations; pre-publication review access for a specific regulation, allowing Anthropic's team to investigate identified error patterns before findings go live; and bespoke research engagements targeting specific regulations or regulatory jurisdictions that Anthropic wishes to prioritise for model improvement. The Specialist Panel is also open to discussing structured data-sharing arrangements that would let Anthropic's researchers explore the full response-and-substrate corpus underlying these findings.
8. Limitations and Right of Reply
This study covers a single regulation (17 CFR Part 23) across 20 questions and four AI configurations. The findings should not be generalised to the models' behaviour across the broader regulatory corpus, though the error-type taxonomy — and in particular the Inference Drift pattern around non-sequential subpart structures — may be informative for adjacent regulatory subject matter.
The substrate used is version 3, assembled by the Archival Analyst and verified by the Archival Specialist against official government portals on 2026-05-22. Regulatory text is subject to amendment; all findings should be read in the context of the substrate edition under which they were produced. Future research rounds using a later substrate edition will be versioned accordingly.
Two responses in the opus-47-websearch configuration and two responses in each of the sonnet-46-default and sonnet-46-websearch configurations remain under Hallucination Specialist review at the time of publication. The confirmed hallucination counts in this paper reflect only fully confirmed findings; the final tallies may change once the needs-review set is resolved.
Any AI vendor that believes a specific finding in this paper reflects a misclassification may submit a formal challenge via the RegLeg methodology page. Challenges that demonstrate an error in the substrate — for example, a primary document that was misread or a section that has since been amended — will result in a Rejected False Positive reclassification and a published correction notice appended to this paper. RegLeg's goal is an accurate public record, and the right-of-reply process is a structural part of the methodology, not a courtesy.
About RegLeg Brief
RegLeg Brief is an independent research organisation whose Specialist Panel conducts hallucination research on regulatory subject matter, producing source-traceable findings that AI labs can use to make their models more reliable on legal and regulatory content. Research is grounded in primary regulatory documents collected by the Archival Analyst and Archival Specialist from official government portals, with each finding reviewed by a Hallucination Analyst and confirmed by a Hallucination Specialist before publication. RegLeg's research is offered in a partnership spirit: the objective is to give AI labs concrete, correctable findings they can act on in their training and evaluation workflows, and every published finding carries a right-of-reply pathway for vendors who wish to challenge the evidence.