Compliance officers at retail banks operating under the Consumer Duty are increasingly using AI to validate threshold language for fair value assessments, update customer-outcome monitoring rule sets, generate board-pack summaries of Consumer Duty annual review evidence, and reconcile FCA Feedback Statements such as FS25/2 against the bank's existing supervisory expectations register. The work product sits at the centre of the firm's annual Consumer Duty board report and the supervisor's annual relationship-management correspondence.
Two frontier AI models tested by the RLB Specialist Panel produced eight substantive failures on this regulation under audit conditions. The failure classes recorded are: Inference Drift on the Foreseeable-Harm Safe Harbour, Confused Guidance with Rule on Consumer Testing, Inference Drift on Fair Value Quantification Expectation, Inference Drift on Required Depth of Non-Monetary Analysis, Hedge in Place of Verified FS25/2 Figure, Refusal to Confirm a Documented FS25/2 Count, Invented Dual-Event Timeline for a Single FS25/2 Withdrawal, and Refusal to Confirm FS25/2 Withdrawal Count.
The RLB Specialist Panel prepared the questions from the ways this audience actually uses AI in its day-to-day workflows, and each finding is bound to verbatim regulator-issued source text held as primary substrate. The Consumer Duty (PS22/9, introducing Principle 12 and PRIN 2A, in force for open products from 31 July 2023 and for closed products from 31 July 2024) is the central retail-conduct regime the FCA now uses to grade firm behaviour, and the failure modes seen here all land inside the day-to-day work product that retail-banking compliance teams sign off.
For retail-banking compliance, the operational consequence is direct. The annual Consumer Duty board report, the supervisor's annual relationship-management correspondence, and the firm's product-governance monitoring evidence all rest on accurate framing of the rule. A defect imported from AI work product surfaces on the next thematic review, and the compliance function carries the supervisory exposure.
Citation IDs for the findings in this brief: RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q003-Opus47, RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q007-Sonnet46, RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q008-Opus47, RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q008-Sonnet46, RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q013-Opus47, RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q013-Sonnet46, RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q020-Opus47, RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q020-Sonnet46. Each citation links to the per-finding record, the AI subject answer, and the regulator-issued substrate excerpt the answer was tested against. The RLB Specialist Panel maintains an audit-traceable record of which model produced which answer, against which substrate passage, and the binding is what makes the finding referenceable in firm work product and in supervisory correspondence.
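As a rough illustration of how the Citation IDs above could be cross-referenced in a firm's own tracking, the sketch below groups them by question and model. It assumes, based only on the pattern visible in the IDs listed here, that the last two hyphen-separated tokens are the question number and the model label; that structure is an inference, not a documented RLB format, and the function names are hypothetical.

```python
from collections import defaultdict

# Citation IDs copied verbatim from the brief.
CITATION_IDS = [
    "RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q003-Opus47",
    "RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q007-Sonnet46",
    "RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q008-Opus47",
    "RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q008-Sonnet46",
    "RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q013-Opus47",
    "RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q013-Sonnet46",
    "RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q020-Opus47",
    "RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q020-Sonnet46",
]


def split_citation_id(citation_id):
    """Split an ID into (prefix, question, model).

    Assumption: the final two hyphen-separated tokens are the question
    number and the model label, as they appear in the IDs above.
    """
    prefix, question, model = citation_id.rsplit("-", 2)
    return prefix, question, model


def group_by_question(citation_ids):
    """Map each question number to the model labels with a finding on it."""
    grouped = defaultdict(list)
    for citation_id in citation_ids:
        _, question, model = split_citation_id(citation_id)
        grouped[question].append(model)
    return dict(grouped)


by_question = group_by_question(CITATION_IDS)
for question in sorted(by_question):
    print(question, sorted(by_question[question]))
```

Grouping this way makes it easy to see which questions drew a finding against both models and which against only one, which may help when deciding how much weight to give a single tool's output.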
The findings below are the ones that retail-banking compliance teams working under the Consumer Duty are most likely to encounter in the AI tools they already use, and the briefing sections that follow read each finding against the regulator-issued text.
This is the consolidated view of findings. The full record for each finding is available through its Citation ID.
Retail Banking compliance teams build cross-cutting policy on foreseeable harm in customer journeys. The model's multi-factor reconstruction would, if imported into a compliance manual or customer-warning template, raise the standard above the FCA's actual single-test rule, inflating both control cost and customer friction without an enforcement benefit. The compliance function would carry a fabricated test through Q1 reviews until a regulator interaction surfaced the mismatch.
Retail Banking compliance teams preparing PRIN 2A.5 monitoring frameworks need to distinguish the rule from FG22/5 guidance. The model's specific PRIN 2A.5.10R citation as a binding consumer-testing requirement, if adopted in policy, creates an obligation the FCA has not imposed and locks the firm into a methodology that exceeds what the rule asks. Cost overhang and a hard-to-reverse policy commitment follow.
Retail Banking compliance teams running fair-value assessments under FG22/5 need the qualitative-only methodology preserved. The model's reversal of the regulator's stated position would push the function to build quantitative non-monetary analysis the FCA does not require, distorting both fair-value templates and product approval workstreams.
Retail Banking compliance teams reviewing AI-assisted fair-value methodology summaries should treat 'substantiated comparisons' as a model-fabricated standard. Adopting it into the fair-value template raises the bar above FG22/5's qualitative-assessment expectation and creates documentation overhead with no compliance dividend.
Retail Banking compliance teams tracking the live FCA supervisory-letter landscape need an accurate account of FS25/2's March 2025 withdrawals. The model's split timeline of April and August 2025 events misrepresents the regulator's record and, if carried into a horizon-scanning summary, would mislead the compliance committee on which letters and multi-firm reports are in force.
Retail Banking compliance teams cannot afford an evasive AI response on a question the regulator has answered in print. FS25/2's '90+' figure is on the FCA website; a compliance summary that records the AI's 'cannot confirm' position leaves the firm without a verified position on a directly retrievable fact.
Retail Banking compliance teams discussing the FCA's withdrawn-letter landscape need a clean account. The model repeated the April/August 2025 timeline across two questions, which strongly suggests the fabrication sits in the model's internal representation of the event, and any compliance brief that imports it will be wrong in two places, not one.
Retail Banking compliance teams should treat the model's combined evasion-plus-fabricated-citation pattern as the highest-risk failure mode in AI-assisted regulatory work. A compliance brief that includes the fabricated Clifford Chance URL imports a sourcing failure that survives surface review but fails any deeper check.
Every finding on this page compares an AI subject's account of the rule against the regulator's verbatim text from the regulator's own portal. Both are linked. Each delta, its root causes, and impact analysis are documented and published with immutable Citation IDs.