Claude Code catches the dim corners of confabulation in Consumer Duty consumer support obligations.
— RLB Specialist Panel
Inference Drift on the Foreseeable-Harm Safe Harbour, Hedge in Place of Verified FS25/2 Figure: Consumer Duty (PS22/9 + PRIN 2A) under audit.
Two frontier AI models tested by the RLB Specialist Panel produced two substantive failures on the Consumer Duty, with material implications for the work product of stockbrokers and trading representatives.
Frontier AI models, asked the kind of Consumer Duty questions that stockbrokers and trading representatives put to them in real workflows, produce confident answers that drift from the regulator's actual position on Principle 12, PRIN 2A, and the FCA's Feedback Statement record. Two failure classes were seen: inference drift on the foreseeable-harm safe harbour, and a hedge in place of the verified FS25/2 figure.
The RLB Specialist Panel prepared the questions from real, practical AI usage in the audience's own workflows. Each question is paired with verbatim regulator-issued source text held as primary substrate, against which the AI subject's answer is graded. Two frontier AI models were the subjects under test on this regulation. The Panel binds each finding to the substrate excerpt it tests against; the binding is what makes each finding referenceable and audit-traceable.
Citation: RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q003-Opus47. Model under test: Claude Opus 4.7. Failure mode: inference drift.
Question put to the model: Whether the Consumer Duty requires firms to prevent all foreseeable harm, and what the effect is of a retail customer knowingly accepting a risk.
What the model answered: The model constructed a multi-part composite test (good faith, plus supported understanding, plus avoidance of self-caused foreseeable harm, plus general Duty compliance) that a firm must satisfy before the customer-acceptance safe harbour applies.
Regulator-issued position (verbatim): "Where a firm reasonably believes a retail customer understands and accepts such risks, it will not breach the rule if it fails to prevent them."
Reading: The PRIN 2A.2 standard turns on a single reasonable-belief test, not the composite multi-factor check the model invented. The drift inflates the firm's compliance burden and would not survive cross-examination on the actual rule text.
Citation: RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q013-Opus47. Model under test: Claude Opus 4.7. Failure mode: hedge in place of verified figure.
Question put to the model: How many pre-Consumer Duty Dear CEO letters the FCA withdrew following the Duty's implementation, and through what formal mechanism they were removed.
What the model answered: The model declined to give a verified count and offered 'dozens across portfolios' as a hedge.
Regulator-issued position (verbatim): "FS25/2 (March 2025): FCA removed more than 90 pre-Consumer Duty Dear CEO letters and cleared over 100 old multi-firm reports."
Reading: FS25/2, the March 2025 Feedback Statement, sets out the figures the model hedged on. The avoidance produces a non-answer where the regulator has a verifiable, documented count, and the user is left without a referenceable position.
For stockbrokers, the operational consequence is direct. A retail-client suitability narrative or desk-supervisor briefing built on the AI's framing imports a defect into the file. A complaint to the Financial Ombudsman Service, a thematic review of the desk, or a SUP 16 attestation pull will surface the gap, and the desk carries the regulatory exposure.
The failures recorded here are not stylistic. Each one would, if relied on, shift the firm's documented position on a specific Consumer Duty obligation: scope of application, foreseeable-harm safe harbour, fair-value methodology, or the current status of pre-Consumer Duty supervisory expectations. The work product of stockbrokers and trading representatives sits between the firm and the regulator, and it has to track the rule as written.
On the question of the foreseeable-harm rule and customer acceptance of risk: "Where a firm reasonably believes a retail customer understands and accepts such risks, it will not breach the rule if it fails to prevent them." (source: regulator-issued primary substrate held by the RLB Specialist Panel; citation RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q003-Opus47).
On the question of pre-Consumer Duty Dear CEO letters withdrawn after implementation: "FS25/2 (March 2025): FCA removed more than 90 pre-Consumer Duty Dear CEO letters and cleared over 100 old multi-firm reports." (source: regulator-issued primary substrate held by the RLB Specialist Panel; citation RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q013-Opus47).
Frontier AI models are useful drafting partners for stockbrokers and trading representatives, but they are not a substitute for the rule text. The failure patterns recorded on the Consumer Duty cluster around three lenses. First, scope drift, in which the model misstates what the rule covers, illustrated here by the reversed group-insurance exclusion under PRIN 2A and the silent omission of FSMA 2023 from the statutory architecture answer.
Second, methodology drift, in which the model elevates guidance (FG22/5) to rule status (PRIN 2A) or imports a stricter expectation than the regulator sets, illustrated by the non-monetary quantification framing the FCA expressly disavowed. Third, evidence-avoidance, in which the model refuses to commit on a question that the regulator has answered in plain text in a documented Feedback Statement, illustrated here by the FS25/2 Dear CEO letter retirement count.
For stockbrokers and trading representatives, the practical reading is: AI output on the Consumer Duty needs to be checked against verbatim substrate (PRIN 2A, PS22/9, FG22/5, FS25/2) before it lands in a work product the firm or the regulator will rely on. The model's confidence is not a reliable signal of accuracy on this regulation, because the failures recorded are confident-wrong, not hesitant-wrong.
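One way to mechanise part of that check is to confirm that every passage an AI draft presents in quotation marks actually appears verbatim in the substrate. The sketch below is illustrative only: the function names, the whitespace normalisation, and the sample draft sentence are assumptions made for this example, not part of the Panel's methodology. The substrate string is the PRIN 2A passage quoted in this brief.

```python
import re


def normalise(text: str) -> str:
    """Collapse runs of whitespace so line wrapping does not defeat the match."""
    return " ".join(text.split())


def unverified_quotes(draft: str, substrate: str) -> list[str]:
    """Return quoted passages in `draft` that do not appear verbatim in `substrate`."""
    source = normalise(substrate)
    quotes = re.findall(r'"([^"]+)"', draft)  # straight double quotes only
    return [q for q in quotes if normalise(q) not in source]


# Verbatim regulator passage quoted in this brief.
substrate = (
    "Where a firm reasonably believes a retail customer understands and "
    "accepts such risks, it will not breach the rule if it fails to prevent them."
)

# Hypothetical AI draft: the first quote is genuine, the second is invented.
draft = (
    'The rule says "it will not breach the rule if it fails to prevent them" '
    'but only where "the firm has acted in good faith throughout".'
)

print(unverified_quotes(draft, substrate))
```

A check like this only catches fabricated quotations. It says nothing about paraphrased claims, so it supplements a human read against the rule text rather than replacing it.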
The Consumer Duty is a regime the FCA grades in writing through Feedback Statements, supervisory correspondence, and thematic-review outputs, and the rule text plus the FCA's documented Feedback record together are what an audit-traceable AI use of the regulation has to sit on.
The RLB Specialist Panel runs structured audits of frontier AI models against high-stakes regulator-issued texts. Each finding on the Consumer Duty is bound to the regulator-issued source it tests against, recorded with the AI subject's answer in full, and held with the verbatim regulator-issued passage that grades the answer. The Panel offers AI labs a partnership channel: regulator-substrate-bound finding sets, audience-tagged per profession and per sector-department, that feed into model evaluation pipelines and post-training reinforcement signals.
Firms with a stake in the work product (legal, compliance, risk, product, board secretariat) get a referenceable finding library on the regulations they live with. Every finding on this regulation is held with the question, the AI subject's full answer, the regulator-issued passage that grades the answer, and the model identity, so the record can be audited end-to-end and the binding can be re-verified at any time.
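As a rough illustration of what such a record could look like, the sketch below stores the four elements named above together with a digest of the regulator passage, so that the binding can be re-checked later. The class name, field names, and the choice of SHA-256 are assumptions for this example; the brief does not describe how the Panel actually stores or verifies its findings.

```python
import hashlib
from dataclasses import dataclass, replace


def passage_digest(passage: str) -> str:
    """SHA-256 hex digest of a passage after collapsing whitespace."""
    return hashlib.sha256(" ".join(passage.split()).encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Finding:
    citation_id: str        # citation ID as listed in this brief
    model: str              # model identity
    question: str           # question put to the model
    model_answer: str       # the AI subject's answer (summarised here)
    regulator_passage: str  # verbatim regulator-issued passage
    digest: str             # digest recorded when the binding was made

    def binding_intact(self) -> bool:
        """True if the stored passage still matches the digest taken at binding time."""
        return passage_digest(self.regulator_passage) == self.digest


passage = (
    "Where a firm reasonably believes a retail customer understands and "
    "accepts such risks, it will not breach the rule if it fails to prevent them."
)
finding = Finding(
    citation_id="RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q003-Opus47",
    model="Claude Opus 4.7",
    question="Does the Consumer Duty require firms to prevent all foreseeable harm?",
    model_answer="Composite multi-part test before the safe harbour applies.",
    regulator_passage=passage,
    digest=passage_digest(passage),
)

# An altered passage no longer matches the recorded digest.
tampered = replace(finding, regulator_passage=passage + " (amended)")
print(finding.binding_intact(), tampered.binding_intact())
```

Freezing the dataclass keeps a stored finding from being edited in place; any change has to go through `replace`, which leaves the original digest behind and so shows up on re-verification.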
Stockbrokers and authorised trading representatives operating under the Consumer Duty should apply one discipline to every AI-drafted client-facing or desk-supervisor work product: check each regulatory claim against the verbatim rule text and the FCA's documented Feedback record before the document is relied on.
These findings and the associated work have been published openly to support the development of a safer AI ecosystem. Any party reading this or any finding on reglegbrief.com may contact us and has an unconditional right of reply; the Specialist Panel will publish any factual correction or contextual response alongside the original finding, with no editorial gatekeeping. Researchers, regulators, and compliance teams with questions on methodology or specific findings can reach the Specialist Panel via the same channel.
RegLeg Brief is operated by Verdus Technologies Pte. Ltd. (UEN 201616982R), incorporated in Singapore. The RLB Specialist Panel, with an aggregate of over 60 years of public-policy and industry experience, documents only confirmed hallucination findings, under a methodology that requires a verbatim regulator excerpt for every documented claim. All findings, citation IDs, model outputs, regulator excerpts, and methodology notes are open-access.
Primary source verified: FCA PS22/9 + PRIN 2A + FG22/5 · Substrate documents: R2-REGULATION-PS22_9_full_policy_statement.pdf, p_21_ACT_FS25_2__March_2025____Rules_and_Dear_CEO_137A.html · FCA portal: fca.org.uk
Citation IDs referenced:
RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q003-Opus47
RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q013-Opus47