AI Hallucination on Consumer Duty for Financial Advisers in the United Kingdom

Executive Summary

Financial Advisers responsible for retail customer outcomes work daily inside the Consumer Duty framework: customer-journey design, fair-value assessments, disclosure templates, and product-governance reviews all turn on the FCA's actual text. Across five findings in this cell, frontier AI models reversed FG22/5's qualitative-only fair-value methodology into an affirmative quantification standard (in both Opus and Sonnet variants), confused FG22/5 guidance with binding PRIN 2A.5 rules on consumer testing, added conditions to the foreseeable-harm safe harbour at PRIN 2A.2, and reversed the express PRIN 2A.1.8R scope exclusion for group insurance distribution.

Each of these maps directly onto the kinds of templates, reviews, and product specs that financial advisers produce, and each carries real over-build cost when relied on without source-text verification.

How AI gets this regulation wrong

The findings in this cell fall into two failure modes: inference drift and rule misstatement. The models tested committed to specific operational answers where the FCA's actual text sets a single-test standard (PRIN 2A.2), a qualitative-only methodology (FG22/5 fair-value), or an express scope exclusion (PRIN 2A.1.8R). The errors are particularly damaging because each one would push a financial adviser to build a more demanding compliance programme than the FCA has asked for.

AI's Failure Mode	Count	Affected findings
Inference Drift	4	Finding#1 · Finding#2 · Finding#3 · Finding#4
Misstated Rule	1	Finding#5

What that means for your practice

For Financial Advisers, the five findings cluster on operational over-build risk: customer-warning standards built above the rule, fair-value templates built to a quantification bar the FCA has not set, and scope decisions that pull excluded activities into the Duty's compliance programme. Each represents real cost (additional template review cycles, additional fair-value documentation, additional product-governance steps) with no offsetting compliance benefit, because the underlying rule does not require any of it.

Risk Impact	Count	Affected findings
Regulatory enforcement / professional liability exposure	5	Finding#1 · Finding#2 · Finding#3 · Finding#4 · Finding#5

When this affects Financial Advisers

Financial Advisers responsible for retail customer outcomes encounter the Consumer Duty across product-governance reviews, customer-journey design, fair-value assessment templates, disclosure design, and ongoing customer-monitoring frameworks. Each of these workstreams turns on the FCA's published text in PRIN 2A and FG22/5, and each is increasingly supported by AI-assisted research at the framing stage.

The findings in this cell map onto the most operationally consequential questions a financial adviser receives. The foreseeable-harm safe harbour (Finding#1) underpins customer-warning templates across every retail product line, and the model's multi-factor reconstruction would build a defensive standard the rule does not require. The PRIN 2A.5 versus FG22/5 confusion (Finding#2) is the most expensive: the model treats consumer testing as a binding rule under PRIN 2A.5.10R, whereas it is FG22/5 guidance that recommends it, and running mandatory testing programmes that no rule requires carries a real cost in adviser time and money.

The fair-value inversion (Finding#3 and Finding#4) is the most pervasive: both Opus and Sonnet, in different surface forms, push the firm to quantify non-monetary costs and benefits when FG22/5 expressly says qualitative is sufficient. The group insurance reversal (Finding#5) affects financial advisers in the insurance distribution chain who would otherwise apply the Duty to activities the FCA has excluded.

Each of these is a question a financial adviser is most likely to ask an AI tool when scoping a new product approval or a fair-value review cycle, and each is a question where the testing showed the AI's answer is wrong in a way that consistently elevates the apparent compliance burden.

The findings at a glance

The table below lists each finding from the Consumer Duty testing in this cell, showing the question area, the AI's failure mode, and the citation identifier for the underlying finding record.

#	Finding title	Type	Citation ID
1	Fabricated multi-part safe harbour for foreseeable-harm rule	Hallucination	RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q003-Opus47
2	Confused FG22/5 guidance with PRIN 2A.5 rule on consumer testing	Hallucination	RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q007-Sonnet46
3	Inverted FG22/5 on fair-value quantification for non-monetary benefits	Hallucination	RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q008-Opus47
4	Imposed substantiated-comparison expectation FG22/5 does not require	Hallucination	RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q008-Sonnet46
5	Reversed the PRIN 2A scope exclusion for group insurance distribution	Hallucination	RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q018-Opus47

Aggregate impact

The five findings in this cell describe a coherent pattern of model behaviour on retail-conduct rules: the models tested consistently produced answers that elevated the apparent compliance burden above the FCA's actual standard. The foreseeable-harm safe harbour got an extra three conditions, the consumer-understanding outcome got a binding-rule consumer-testing requirement it does not have, the fair-value assessment got a quantification expectation FG22/5 expressly disclaims, and the group insurance distribution chain got pulled into a scope the FCA has expressly excluded.

For financial advisers, this pattern is structurally dangerous because the over-build is invisible at the point of adoption. A fair-value template that asks for quantitative non-monetary analysis looks more rigorous, not less; a customer-warning programme that runs through multiple compliance conditions looks more careful, not less; a Duty-compliant programme that covers group insurance distribution looks more comprehensive, not less. Each of these increases cost and adviser time without any underlying regulatory requirement.

The implication is that AI-assisted research on retail-conduct workstreams cannot be relied on for methodology decisions, scope determinations, or rule-versus-guidance distinctions. Each of these is a question type that produces wrong answers in the direction of more work for the financial adviser, not less.

What your team should do

Financial Advisers should treat AI tools as a starting point for Consumer Duty workstream design, not as a source of methodology or scope decisions. Any output that names a fair-value standard, characterises a rule-versus-guidance distinction, or recites a scope position requires direct verification against the FCA Handbook or FG22/5 before it can be built into a template, review process, or product spec.

For practical safeguards on retail-product work: (a) When an AI tool describes the FCA's expectation on fair-value methodology, pull the FG22/5 passage on qualitative versus quantitative assessment before building or reviewing a template. (b) When an AI tool characterises a PRIN 2A provision as binding, confirm the rule-versus-guidance status by reading the provision directly in the FCA Handbook; the typographic distinction (R, G, E) is the operative signal. (c) When an AI tool gives a scope position on a particular product line (group insurance, large-risk commercial contracts, reinsurance), confirm against PRIN 2A.1.8R before adjusting the product-governance programme.
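For teams that log Handbook citations in a review checklist, safeguard (b) can be sketched as a small helper that reads the status letter off a provision reference. This is a minimal illustration, not an FCA tool: the function name, the reference pattern, and the return strings are all assumptions made for the example, and it covers only the R, G, and E suffixes named above. It is only meaningful when the reference has been copied from the FCA Handbook itself; a citation supplied by an AI tool may carry the wrong letter or point at a provision that does not say what the tool claims.

```python
import re

# Status letters named in safeguard (b): R = rule, G = guidance,
# E = evidential provision. Other Handbook suffixes are deliberately
# not handled in this sketch.
STATUS_BY_SUFFIX = {
    "R": "rule",
    "G": "guidance",
    "E": "evidential provision",
}

# Assumed reference shape for illustration: module code, chapter (which may
# carry a letter, as in "2A"), optional dotted numbers, optional status letter.
PROVISION_RE = re.compile(r"^([A-Z]+)\s+(\d+[A-Z]?(?:\.\d+)*)([A-Z])?$")


def provision_status(reference: str) -> str:
    """Return the status implied by the suffix letter of a Handbook reference."""
    match = PROVISION_RE.match(reference.strip())
    if match is None:
        # Not in Handbook-provision form (for example a guidance paper title).
        return "unrecognised reference"
    suffix = match.group(3)
    if suffix is None:
        # Chapter- or section-level reference with no status letter.
        return "no status suffix"
    return STATUS_BY_SUFFIX.get(suffix, "unknown suffix")


if __name__ == "__main__":
    for ref in ["PRIN 2A.1.8R", "PRIN 2A.2", "FG22/5"]:
        print(ref, "->", provision_status(ref))
```

A reference that comes back as "no status suffix", "unknown suffix", or "unrecognised reference" is a prompt to open the source document, not a finding in itself; the helper only records what the label says and does not replace reading the provision.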

Where AI tools are most safely used in this practice area: framing customer-journey design, identifying which Duty workstreams are likely relevant to a new product line, drafting customer-facing summaries for review against the source text, and surfacing cross-references between Duty workstreams and adjacent FCA expectations. The risk concentrates in the next step, where the AI is asked to specify the actual methodology, the applicable scope, or the rule-versus-guidance status. At that point the source document is the only reliable input.

How RLB Can Help

RegLeg's published Hallucination Research gives UK Financial Advisers a free pre-flight check before relying on AI tools for Consumer Duty workstream design. Before an AI-assisted template, review process, or product spec is finalised, the research identifies which areas of the Duty (the foreseeable-harm safe harbour, the rule-versus-guidance boundary on consumer testing, the fair-value methodology in FG22/5, and the scope exclusions in PRIN 2A.1.8R) have historically generated confident but incorrect AI output that would push the firm to over-build its compliance programme.

Beyond the published research, RegLeg works with UK retail-facing firms on bespoke regulator deep-dives that map AI-supported workflows within financial-adviser practice to their actual hallucination exposure. The deep-dive identifies which workstreams (customer-warning templates, fair-value assessments, product-governance reviews, distribution-chain scope decisions) warrant additional controls or independent verification steps. RegLeg also conducts a confidential review of the firm's existing AI-use policy against the failure-mode catalogue, delivering a prioritised remediation plan.

For teams that want to build durable in-house capability, RegLeg develops training material and CPD-aligned content tailored to the UK Financial Adviser context, covering how to interpret AI-generated regulatory summaries critically and how to document AI-assisted decision-making consistently with FCA conduct standards.