Sector × Dept GB FCA
Payment Institutions Compliance teams · Consumer Duty (PS22/9 + PRIN 2A)

By Kratti A Agrawal, Lead, RegLeg Brief Specialist Panel

Payment Institutions Compliance teams: documentation and reporting gaps possible from AI reading of FCA Consumer Duty (PS22/9)

Anthropic's Claude charts the hallucination grammar in Consumer Duty payment institution compliance.

— RLB Specialist Panel

Inference Drift on the Foreseeable-Harm Safe Harbour, Confused Guidance with Rule on Consumer Testing, Hedge in Place of Verified FS25/2 Figure, Invented Dual-Event Timeline for a Single FS25/2 Withdrawal, Refusal to Confirm FS25/2 Withdrawal Count: Consumer Duty (PS22/9 + PRIN 2A) under audit.

Two frontier AI models tested by the RLB Specialist Panel produced five substantive failures on the Consumer Duty, with material implications for the work product of payment-institutions compliance teams.

The pattern in one line

Frontier AI models, asked questions of the kind payment-institutions compliance teams put to them on the Consumer Duty in real workflows, produce confident answers that drift from the regulator's actual position on Principle 12, PRIN 2A, and the FCA's Feedback Statement record. The failure classes seen are: Inference Drift on the Foreseeable-Harm Safe Harbour, Confused Guidance with Rule on Consumer Testing, Hedge in Place of Verified FS25/2 Figure, Invented Dual-Event Timeline for a Single FS25/2 Withdrawal, Refusal to Confirm FS25/2 Withdrawal Count.

How the RLB Specialist Panel tested this

The RLB Specialist Panel prepared the questions from the ways payment-institutions compliance teams actually use AI in their workflows. Each question is paired with verbatim regulator-issued source text, held as primary substrate, against which the model's answer is graded. Two frontier AI models were tested on this regulation. The panel binds each finding to the substrate excerpt it is tested against; that binding is what makes each finding referenceable and audit-traceable.

What the models got wrong

Inference Drift on the Foreseeable-Harm Safe Harbour

Citation: RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q003-Opus47. Model under test: Claude Opus 4.7. Failure mode: inference drift.

Question put to the model: Whether the Consumer Duty requires firms to prevent all foreseeable harm, and what the effect is of a retail customer knowingly accepting a risk.

What the model answered: The model constructed a multi-part composite test (good faith, supported understanding, avoidance of self-caused foreseeable harm, and general Duty compliance) that a firm must satisfy before the customer-acceptance safe harbour applies.

Regulator-issued position (verbatim): "Where a firm reasonably believes a retail customer understands and accepts such risks, it will not breach the rule if it fails to prevent them."

Reading: The PRIN 2A.2 standard turns on a single reasonable-belief test, not the composite multi-factor check the model invented. The drift inflates the firm's compliance burden and would not survive cross-examination on the actual rule text.

Confused Guidance with Rule on Consumer Testing

Citation: RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q007-Sonnet46. Model under test: Claude Sonnet 4.6. Failure mode: inference drift.

Question put to the model: Whether the obligation to conduct consumer testing under the Consumer Duty is a binding rule in PRIN 2A or appears only as recommended guidance in FG22/5, and what PRIN 2A.5 actually requires on consumer understanding.

What the model answered: The model cited PRIN 2A.5.10R through PRIN 2A.5.14R as the binding testing requirement, asserting PRIN 2A.5.10R requires firms to test communications 'where appropriate.'

Regulator-issued position (verbatim): "FG22/5 contains guidance (not rules) recommending firms 'should' consider consumer testing of communications. PRIN 2A.5 (rule) requires firms to act to deliver good consumer understanding outcome."

Reading: The rule layer (PRIN 2A.5) is outcome-prescriptive on consumer understanding. The methodology layer (FG22/5) is guidance, not rule, and recommends consumer testing as one route to that outcome. Collapsing the two distorts enforcement risk and what counts as a binding obligation.

Hedge in Place of Verified FS25/2 Figure

Citation: RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q013-Opus47. Model under test: Claude Opus 4.7. Failure mode: inference drift.

Question put to the model: How many pre-Consumer Duty Dear CEO letters the FCA withdrew following the Duty's implementation, and through what formal mechanism they were removed.

What the model answered: The model declined to give a verified count and offered 'dozens across portfolios' as a hedge.

Regulator-issued position (verbatim): "FS25/2 (March 2025): FCA removed more than 90 pre-Consumer Duty Dear CEO letters and cleared over 100 old multi-firm reports."

Reading: FS25/2, the March 2025 Feedback Statement, sets out the figures the model hedged on. The avoidance produces a non-answer where the regulator has a verifiable, documented count, and the user is left without a referenceable position.

Invented Dual-Event Timeline for a Single FS25/2 Withdrawal

Citation: RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q020-Opus47. Model under test: Claude Opus 4.7. Failure mode: inference drift.

Question put to the model: Which pre-Consumer Duty Dear CEO letters remain in force, and whether the FCA has withdrawn letters that previously set consumer protection expectations.

What the model answered: The model described an April 2025 retirement of around 90 Dear CEO/portfolio letters followed by an August 2025 retirement of 100 multi-firm and thematic reports.

Regulator-issued position (verbatim): "From FS25/2 (March 2025): FCA reviewed all Dear CEO letters pre-dating the 2022-25 strategy and withdrew 90+ such letters. These are no longer live supervisory expectations."

Reading: FS25/2 is the single March 2025 publication recording the withdrawal. Splitting it into an April and August dual-event narrative is fabrication; the dates and the second event are invented.

Refusal to Confirm FS25/2 Withdrawal Count

Citation: RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q020-Sonnet46. Model under test: Claude Sonnet 4.6. Failure mode: inference drift.

Question put to the model: Which FCA Dear CEO letters remain in force after Consumer Duty implementation, and how many pre-Consumer Duty letters the FCA formally withdrew.

What the model answered: The model said there is no publicly available consolidated list in the search results and that it could not provide a verified count or complete current list.

Regulator-issued position (verbatim): "From FS25/2 (March 2025): FCA reviewed all Dear CEO letters pre-dating the 2022-25 strategy and withdrew 90+ such letters. These are no longer live supervisory expectations."

Reading: FS25/2 carries the count. The non-answer is an avoidance posture on a question that the regulator's own Feedback Statement resolves in plain text.

Why this matters for Payment Institutions Compliance

For payment-institutions compliance, the operational consequence is direct. The compliance monitoring plan, the annual board report on Consumer Duty, and the supervisor's annual relationship correspondence all rest on accurate framing of the rule and of recent FCA Feedback Statements. A defect imported from AI work product surfaces on supervisory follow-up, and the function carries the regulatory exposure.

The failures recorded here are not stylistic. Each one would, if relied on, shift the firm's documented position on a specific Consumer Duty obligation: the foreseeable-harm safe harbour, the rule-versus-guidance status of consumer testing, or the current status of pre-Consumer Duty supervisory expectations. The work product of payment-institutions compliance teams sits between the firm and the regulator, and it has to track the rule as written.

The regulator's actual position

On the question of the foreseeable-harm rule and customer acceptance of risk: "Where a firm reasonably believes a retail customer understands and accepts such risks, it will not breach the rule if it fails to prevent them." (source: regulator-issued primary substrate held by the RLB Specialist Panel; citation RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q003-Opus47).

On the question of consumer testing under PRIN 2A.5 versus FG22/5: "FG22/5 contains guidance (not rules) recommending firms 'should' consider consumer testing of communications. PRIN 2A.5 (rule) requires firms to act to deliver good consumer understanding outcome." (source: regulator-issued primary substrate held by the RLB Specialist Panel; citation RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q007-Sonnet46).

On the question of pre-Consumer Duty Dear CEO letters withdrawn after implementation: "FS25/2 (March 2025): FCA removed more than 90 pre-Consumer Duty Dear CEO letters and cleared over 100 old multi-firm reports." (source: regulator-issued primary substrate held by the RLB Specialist Panel; citation RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q013-Opus47).

On the question of the status of pre-Consumer Duty Dear CEO letters: "From FS25/2 (March 2025): FCA reviewed all Dear CEO letters pre-dating the 2022-25 strategy and withdrew 90+ such letters. These are no longer live supervisory expectations." (source: regulator-issued primary substrate held by the RLB Specialist Panel; citations RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q020-Opus47 and RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q020-Sonnet46).

What this tells us about AI for Payment Institutions Compliance

Frontier AI models are useful drafting partners for payment-institutions compliance teams, but they are not a substitute for the rule text. The failure patterns recorded on the Consumer Duty cluster around three lenses. First, scope drift, in which the model misstates what the rule covers, illustrated by the reversed group-insurance exclusion under PRIN 2A and the silent omission of FSMA 2023 from the statutory architecture answer.

Second, methodology drift, in which the model elevates guidance (FG22/5) to rule status (PRIN 2A) or imports a stricter expectation than the regulator sets, illustrated by the non-monetary quantification framing the FCA expressly disavowed. Third, evidence-avoidance, in which the model refuses to commit on a question that the regulator has answered in plain text in a documented Feedback Statement, illustrated here by the FS25/2 Dear CEO letter retirement count.

For payment-institutions compliance teams, the practical reading is that AI output on the Consumer Duty needs to be checked against verbatim substrate (PRIN 2A, PS22/9, FG22/5, FS25/2) before it lands in a work product the firm or the regulator will rely on. The model's confidence is not a reliable signal of accuracy on this regulation: the drift and fabrication failures recorded are confident-wrong, not hesitant-wrong, and the hedged answers fail in the other direction by withholding a figure the regulator has published.
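The check against verbatim substrate can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Panel's tooling: the function names `normalise` and `quote_in_substrate` are hypothetical, and the substrate string is simply the regulator passage quoted earlier in this briefing, standing in for a full source file. The sketch tests whether a passage that an AI draft attributes to the regulator appears word for word in the held text, ignoring only case and whitespace.

```python
import re


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so layout differences do not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_in_substrate(quote: str, substrate: str) -> bool:
    """Return True only if the quoted passage appears verbatim in the substrate."""
    return normalise(quote) in normalise(substrate)


# Hypothetical stand-in for a held source file: the passage quoted in this briefing.
SUBSTRATE = (
    "Where a firm reasonably believes a retail customer understands and "
    "accepts such risks, it will not breach the rule if it fails to prevent them."
)

# A phrase taken from the passage, and an invented paraphrase that is not in it.
verbatim = "it will not breach the rule if it fails to prevent them"
paraphrase = "the firm is protected only if it also acted in good faith"

print(quote_in_substrate(verbatim, SUBSTRATE))    # True
print(quote_in_substrate(paraphrase, SUBSTRATE))  # False
```

A match shows only that the quoted words exist in the source. Whether the draft reads those words correctly still needs a human reviewer with the rule text open.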

The Consumer Duty is a regime the FCA grades in writing through Feedback Statements, supervisory correspondence, and thematic-review outputs, and the rule text plus the FCA's documented Feedback record together are what an audit-traceable AI use of the regulation has to sit on.

What the RLB Specialist Panel is doing about it

The RLB Specialist Panel runs structured audits of frontier AI models against high-stakes regulator-issued texts. Each finding on the Consumer Duty is bound to the regulator-issued source it tests against, recorded with the AI subject's answer in full, and held with the verbatim regulator-issued passage that grades the answer. The Panel offers AI labs a partnership channel: regulator-substrate-bound finding sets, audience-tagged per profession and per sector-department, that feed into model evaluation pipelines and post-training reinforcement signals.

Firms with a stake in the work product (legal, compliance, risk, product, board secretariat) get a referenceable finding library on the regulations they live with. Every finding on this regulation is held with the question, the AI subject's full answer, the regulator-issued passage that grades the answer, and the model identity, so the record can be audited end-to-end and the binding can be re-verified at any time.
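One way to make such a binding re-verifiable is to store a digest of the held fields alongside the record. The sketch below is an assumption about how that could be done, not a description of the Panel's system: the `Finding` class, its field names, the `binding_intact` helper, and the choice of SHA-256 over a JSON serialisation are all illustrative. The model answer shown is the abbreviated summary from this briefing, not the full answer.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class Finding:
    """Hypothetical record: citation ID plus the four held elements."""
    citation_id: str
    question: str
    model_answer: str
    regulator_excerpt: str
    model_identity: str

    def digest(self) -> str:
        """SHA-256 over a canonical (sorted-key) JSON form of every field."""
        payload = json.dumps(asdict(self), sort_keys=True, ensure_ascii=False)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def binding_intact(finding: Finding, recorded_digest: str) -> bool:
    """Re-verify that no field has changed since the digest was recorded."""
    return finding.digest() == recorded_digest


finding = Finding(
    citation_id="RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q013-Opus47",
    question="How many pre-Consumer Duty Dear CEO letters did the FCA withdraw?",
    model_answer="dozens across portfolios",  # abbreviated summary, not the full answer
    regulator_excerpt=(
        "FS25/2 (March 2025): FCA removed more than 90 pre-Consumer Duty "
        "Dear CEO letters and cleared over 100 old multi-firm reports."
    ),
    model_identity="Claude Opus 4.7",
)
recorded = finding.digest()

# Any later change to a held field, however small, breaks the binding.
tampered = replace(finding, regulator_excerpt="FCA removed dozens of letters.")

print(binding_intact(finding, recorded))   # True
print(binding_intact(tampered, recorded))  # False
```

Hashing every field together means an auditor can detect an edit to the question, the answer, or the excerpt with a single comparison, at the cost of not knowing which field changed.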

What Payment Institutions Compliance teams should do

Payment-institutions compliance teams should, on every AI-drafted Consumer Duty work product, check the output against the verbatim substrate (PRIN 2A, PS22/9, FG22/5, FS25/2) before the firm or the regulator relies on it.


Right of Reply

These findings and the associated work are published in the interest of a safer AI ecosystem. Any party reading this or any other finding on reglegbrief.com may contact us and has an unconditional right of reply; the Specialist Panel will publish any factual correction or contextual response alongside the original finding, with no editorial gatekeeping. Researchers, regulators, and compliance teams with questions on methodology or specific findings can reach the Specialist Panel via the same channel.

Source & Methodology Standards

RegLeg Brief is operated by Verdus Technologies Pte. Ltd. (UEN 201616982R), incorporated in Singapore. The RLB Specialist Panel, with an aggregate of over 60 years of public-policy and industry experience, documents only confirmed hallucination findings, under a methodology that requires a verbatim regulator excerpt for every documented claim. All findings, citation IDs, model outputs, regulator excerpts, and methodology notes are open-access.


Primary source verified: FCA PS22/9 + PRIN 2A + FG22/5 · Substrate documents: R2-REGULATION-PS22_9_full_policy_statement.pdf, p_05_REGULATION_FG22_5_vs_PRIN_2A___guidance_obligation_2.html, p_15_OTHER_PART_CIRCULAR___Dear_CEO_letters_withdra_page.html, p_21_ACT_FS25_2__March_2025____Rules_and_Dear_CEO_137A.html · FCA portal: fca.org.uk

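Citation IDs referenced: RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q003-Opus47, RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q007-Sonnet46, RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q013-Opus47, RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q020-Opus47, RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q020-Sonnet46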
