Anthropic probes the mystery of hallucination in Consumer Duty corporate banking compliance.
— RLB Specialist Panel
Hedge in Place of Verified FS25/2 Figure, Refusal to Confirm a Documented FS25/2 Count, Invented Dual-Event Timeline for a Single FS25/2 Withdrawal: Consumer Duty (PS22/9 + PRIN 2A) under audit.
Two frontier AI models tested by the RLB Specialist Panel produced three substantive failures on the Consumer Duty, with material implications for the work product of corporate-banking compliance teams.
Frontier AI models, asked questions of the kind corporate-banking compliance teams put to them on the Consumer Duty in real workflows, produce confident answers that drift from the regulator's actual position on Principle 12, PRIN 2A, and the FCA's Feedback Statement record. The failure classes seen are: Hedge in Place of Verified FS25/2 Figure, Refusal to Confirm a Documented FS25/2 Count, Invented Dual-Event Timeline for a Single FS25/2 Withdrawal.
Questions were prepared by the RLB Specialist Panel to reflect how the respective audience actually uses AI in its workflows. Each question is paired with verbatim regulator-issued source text held as primary substrate, against which the AI subject's answer is graded. Two frontier AI models were the subjects under test on this regulation. The panel binds each finding to the substrate excerpt it tests against; the binding is what makes each finding referenceable and audit-traceable.
Citation: RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q013-Opus47. Model under test: Claude Opus 4.7. Failure mode: inference drift.
Question put to the model: How many pre-Consumer Duty Dear CEO letters the FCA withdrew following the Duty's implementation, and through what formal mechanism they were removed.
What the model answered: The model declined to give a verified count and offered 'dozens across portfolios' as a hedge.
Regulator-issued position (verbatim): "FS25/2 (March 2025): FCA removed more than 90 pre-Consumer Duty Dear CEO letters and cleared over 100 old multi-firm reports."
Reading: FS25/2, the March 2025 Feedback Statement, sets out the figures the model hedged on. The avoidance produces a non-answer where the regulator has a verifiable, documented count, and the user is left without a referenceable position.
Citation: RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q013-Sonnet46. Model under test: Claude Sonnet 4.6. Failure mode: inference drift.
Question put to the model: Whether the FCA has withdrawn pre-Consumer Duty Dear CEO letters following implementation, and if so how many were removed and under what document.
What the model answered: The model said it could not provide a verified specific number and that search results did not identify a definitive public list.
Regulator-issued position (verbatim): "FS25/2 (March 2025): FCA removed more than 90 pre-Consumer Duty Dear CEO letters and cleared over 100 old multi-firm reports. Harmonising definitions of 'retail customer' and 'SME' across Handbook."
Reading: FS25/2 is on the FCA website with the figures the model could not commit to. The refusal posture leaves the user without a position they can cite, even though the regulator has documented the answer in plain text.
Citation: RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q020-Opus47. Model under test: Claude Opus 4.7. Failure mode: inference drift.
Question put to the model: Which pre-Consumer Duty Dear CEO letters remain in force, and whether the FCA has withdrawn letters that previously set consumer protection expectations.
What the model answered: The model described an April 2025 retirement of around 90 Dear CEO/portfolio letters followed by an August 2025 retirement of 100 multi-firm and thematic reports.
Regulator-issued position (verbatim): "From FS25/2 (March 2025): FCA reviewed all Dear CEO letters pre-dating the 2022-25 strategy and withdrew 90+ such letters. These are no longer live supervisory expectations."
Reading: FS25/2 is the single March 2025 publication recording the withdrawal. Splitting it into an April and August dual-event narrative is fabrication; the dates and the second event are invented.
For corporate-banking compliance, the operational consequence is direct. The compliance monitoring plan, the SME-segment supervisory dialogue, and the Dear CEO letter expectation register all rest on accurate framing of scope boundaries and of recent FCA Feedback Statements such as FS25/2. A defect imported from AI work product surfaces on supervisory follow-up, and the function carries the regulatory exposure.
The failures recorded here are not stylistic. Each one would, if relied on, shift the firm's documented position on a specific Consumer Duty obligation: scope of application, foreseeable-harm safe harbour, fair-value methodology, or the current status of pre-Consumer Duty supervisory expectations. The work product of corporate-banking compliance teams sits between the firm and the regulator, and it has to track the rule as written.
On the question of pre-Consumer Duty Dear CEO letters withdrawn after implementation: "FS25/2 (March 2025): FCA removed more than 90 pre-Consumer Duty Dear CEO letters and cleared over 100 old multi-firm reports." (source: regulator-issued primary substrate held by the RLB Specialist Panel; citation RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q013-Opus47).
On the question of the pre-Consumer Duty Dear CEO letter withdrawal count: "FS25/2 (March 2025): FCA removed more than 90 pre-Consumer Duty Dear CEO letters and cleared over 100 old multi-firm reports. Harmonising definitions of 'retail customer' and 'SME' across Handbook." (source: regulator-issued primary substrate held by the RLB Specialist Panel; citation RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q013-Sonnet46).
On the question of the status of pre-Consumer Duty Dear CEO letters: "From FS25/2 (March 2025): FCA reviewed all Dear CEO letters pre-dating the 2022-25 strategy and withdrew 90+ such letters. These are no longer live supervisory expectations." (source: regulator-issued primary substrate held by the RLB Specialist Panel; citation RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q020-Opus47).
Frontier AI models are useful drafting partners for corporate-banking compliance teams, but they are not a substitute for the rule text. The failure patterns recorded on the Consumer Duty cluster around three lenses. First, scope drift, in which the model misstates what the rule covers, illustrated here by the reversed group-insurance exclusion under PRIN 2A and the silent omission of FSMA 2023 from the statutory architecture answer.
Second, methodology drift, in which the model elevates guidance (FG22/5) to rule status (PRIN 2A) or imports a stricter expectation than the regulator sets, illustrated by the non-monetary quantification framing the FCA expressly disavowed. Third, evidence-avoidance, in which the model refuses to commit on a question that the regulator has answered in plain text in a documented Feedback Statement, illustrated here by the FS25/2 Dear CEO letter retirement count.
For corporate-banking compliance teams, the practical reading is: AI output on the Consumer Duty needs to be checked against verbatim substrate (PRIN 2A, PS22/9, FG22/5, FS25/2) before it lands in a work product the firm or the regulator will rely on. The model's confidence is not a reliable signal of accuracy on this regulation, because the failures recorded are confident-wrong, not hesitant-wrong.
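As a rough illustration of that check, the sketch below screens an AI answer for month-and-year dates that do not appear anywhere in the verbatim substrate excerpt. It is a minimal sketch, not the Panel's grading method: the function name unsupported_dates and the regular expression are assumptions made for illustration, and the answer string is the summary of the Q020 answer as recorded above rather than the model's full output. A screen like this only surfaces candidates for human review; it cannot confirm that an answer is correct.

```python
import re

# Hypothetical helper for illustration only; not part of any RLB tooling.
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
MONTH_YEAR = re.compile(rf"\b({MONTHS})\s+(\d{{4}})\b")


def unsupported_dates(answer: str, substrate: str) -> list[str]:
    """Return month-year mentions in the answer that never appear in the substrate."""
    in_substrate = {" ".join(m) for m in MONTH_YEAR.findall(substrate)}
    flagged: list[str] = []
    for match in MONTH_YEAR.findall(answer):
        date = " ".join(match)
        if date not in in_substrate and date not in flagged:
            flagged.append(date)
    return flagged


# Summary of the Q020 answer and the substrate excerpt, both as recorded in this brief.
answer = ("an April 2025 retirement of around 90 Dear CEO/portfolio letters "
          "followed by an August 2025 retirement of 100 multi-firm and thematic reports")
substrate = ("From FS25/2 (March 2025): FCA reviewed all Dear CEO letters pre-dating "
             "the 2022-25 strategy and withdrew 90+ such letters. "
             "These are no longer live supervisory expectations.")

print(unsupported_dates(answer, substrate))  # ['April 2025', 'August 2025']
```

Both flagged dates are the ones the Q020 reading identifies as invented. The screen says nothing about hedges or refusals, which contain no date to flag, so it complements a manual check against the substrate and does not replace one.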
The Consumer Duty is a regime the FCA grades in writing through Feedback Statements, supervisory correspondence, and thematic-review outputs, and the rule text plus the FCA's documented Feedback record together are what an audit-traceable AI use of the regulation has to sit on.
The RLB Specialist Panel runs structured audits of frontier AI models against high-stakes regulator-issued texts. Each finding on the Consumer Duty is bound to the regulator-issued source it tests against, recorded with the AI subject's answer in full, and held with the verbatim regulator-issued passage that grades the answer. The Panel offers AI labs a partnership channel: regulator-substrate-bound finding sets, audience-tagged per profession and per sector-department, that feed into model evaluation pipelines and post-training reinforcement signals.
Firms with a stake in the work product (legal, compliance, risk, product, board secretariat) get a referenceable finding library on the regulations they live with. Every finding on this regulation is held with the question, the AI subject's full answer, the regulator-issued passage that grades the answer, and the model identity, so the record can be audited end-to-end and the binding can be re-verified at any time.
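One way to picture such a record is sketched below. It is a minimal sketch under stated assumptions: the Finding class, its field names, and the binding_holds check are hypothetical and are not the Panel's actual schema or tooling. The re-verification step is modelled as a whitespace-tolerant check that the quoted regulator passage still appears verbatim in the substrate text, and the substrate string is a stand-in, since the substrate documents themselves are not reproduced here.

```python
from dataclasses import dataclass


def _normalise(text: str) -> str:
    """Collapse runs of whitespace so line wrapping does not break the comparison."""
    return " ".join(text.split())


@dataclass(frozen=True)
class Finding:
    """Hypothetical record layout mirroring the four elements named in the text."""
    citation_id: str
    model: str
    question: str
    model_answer: str
    regulator_excerpt: str

    def binding_holds(self, substrate_text: str) -> bool:
        """True if the quoted excerpt appears verbatim in the substrate text."""
        return _normalise(self.regulator_excerpt) in _normalise(substrate_text)


finding = Finding(
    citation_id="RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q020-Opus47",
    model="Claude Opus 4.7",
    question=("Which pre-Consumer Duty Dear CEO letters remain in force, and whether "
              "the FCA has withdrawn letters that previously set consumer protection "
              "expectations."),
    # Summary of the answer as recorded in this brief, not the full model output.
    model_answer=("an April 2025 retirement of around 90 Dear CEO/portfolio letters "
                  "followed by an August 2025 retirement of 100 multi-firm and "
                  "thematic reports"),
    regulator_excerpt=("FCA reviewed all Dear CEO letters pre-dating the 2022-25 "
                       "strategy and withdrew 90+ such letters."),
)

# Stand-in for the held substrate text, wrapped differently from the excerpt.
substrate_text = """From FS25/2 (March 2025): FCA reviewed all Dear CEO letters
pre-dating the 2022-25 strategy and withdrew 90+ such letters.
These are no longer live supervisory expectations."""

print(finding.binding_holds(substrate_text))  # True
```

Freezing the dataclass keeps a recorded finding immutable once created, which suits a record that is meant to be re-verified later against the same substrate.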
Corporate-banking compliance teams should apply the following discipline to every AI-drafted Consumer Duty work product:
These findings and the associated work have been published for the greater good, to support the development of a safer AI ecosystem. Any party reading this or any finding on reglegbrief.com may contact us and has an unconditional right of reply; the Specialist Panel will publish any factual correction or contextual response alongside the original finding, with no editorial gatekeeping. Researchers, regulators, and compliance teams with questions on methodology or specific findings can reach the Specialist Panel via the same channel.
RegLeg Brief is operated by Verdus Technologies Pte. Ltd. (UEN 201616982R), incorporated in Singapore. The RLB Specialist Panel, with an aggregate of over 60 years of public-policy and industry experience, documents only confirmed hallucination findings, under a methodology that requires a verbatim regulator excerpt for every documented claim. All findings, citation IDs, model outputs, regulator excerpts, and methodology notes are open-access.
Primary source verified: FCA PS22/9 + PRIN 2A + FG22/5 · Substrate documents: p_15_OTHER_PART_CIRCULAR___Dear_CEO_letters_withdra_page.html, p_21_ACT_FS25_2__March_2025____Rules_and_Dear_CEO_137A.html · FCA portal: fca.org.uk
Citation IDs referenced:
RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q013-Opus47
RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q013-Sonnet46
RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q020-Opus47