Payment Institutions × Risk — United Kingdom · updated 2026-06-11 · methodology v2.3

AI Hallucination on Consumer Duty for Risk teams at Payment Institutions firms in the United Kingdom

Payment Institutions Risk teams: documentation and reporting gaps possible from AI reading of Consumer Duty

Risk teams at payment institutions and e-money firms operating under the Consumer Duty are increasingly using AI to update foreseeable-harm risk matrices for retail-customer journeys, validate fair-value risk assessments for new products, and stress-test the firm's customer-outcome KPIs against PRIN 2A. The work product feeds directly into the firm's risk register and the executive-risk-committee dashboard.

Two frontier AI models tested by the RLB Specialist Panel produced 3 substantive failures on this regulation under audit conditions. The failure classes recorded are Inference Drift on the Foreseeable-Harm Safe Harbour, Inference Drift on Fair Value Quantification Expectation, and Inference Drift on Required Depth of Non-Monetary Analysis. The RLB Specialist Panel prepared the questions from practical AI usage in this audience's own workflows, and each finding is bound to verbatim regulator-issued source text held as primary substrate.

The Consumer Duty (PS22/9 introducing Principle 12 and PRIN 2A, in force for open products from 31 July 2023 and for closed products from 31 July 2024) is the central retail-conduct regime the FCA now uses to grade firm behaviour, and the failure modes seen here all land inside the day-to-day work product that payment-institutions risk teams sign off on.

For payment-institutions risk, the operational consequence is direct. The risk register, the foreseeable-harm matrix for retail-customer journeys, and the executive-risk-committee dashboard all rest on accurate PRIN 2A framing. A defect imported from AI work product surfaces on internal-audit pull or supervisor review, and the risk function carries the second-line exposure.

Citation IDs for the findings in this brief: RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q003-Opus47, RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q008-Opus47, RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q008-Sonnet46. Each citation links to the per-finding record, the AI subject answer, and the regulator-issued substrate excerpt the answer was tested against. The RLB Specialist Panel maintains an audit-traceable record of which model produced which answer against which substrate passage. That binding is what makes each finding referenceable in firm work product and in supervisory correspondence.

The findings below are the ones that payment-institutions risk teams working under the Consumer Duty are most likely to encounter in the AI tools they already use, and the briefing sections that follow read each finding against the regulator-issued text.

Executive Summary

The FCA Consumer Duty (PS22/9 and PRIN 2A, with FG22/5 guidance) is the UK retail conduct framework that frames day-to-day work for Risk teams at Payment Institutions firms. Across the 3 findings in this cell, frontier AI models tested with web search produced confidently wrong reconstructions of the FCA's text in ways that bear directly on Risk workstreams at Payment Institutions firms. Each error converts into either an over-build cost (defensive controls or templates the rule does not require) or a supervisory-record misstatement that surfaces on review.

None of the errors deliver any compliance benefit; all of them add operational cost or expose the team to challenge.

How AI gets this regulation wrong

The findings in this cell are inference drift and rule-misstatement, not refusal. The models committed to specific operational answers where the FCA's actual text would have resolved the question differently. For Risk teams at Payment Institutions firms, the consequence is that AI-assisted summaries of the FCA's published positions cannot be relied on without source-text verification.

AI's Failure Mode | Count | Affected findings
Inference Drift | 1 | Finding #1
Inference Drift | 1 | Finding #2
Inference Drift | 1 | Finding #3

What that means for your team

For Risk teams at Payment Institutions firms, the findings cluster on the same risk category: regulatory enforcement exposure where the FCA's text resolves the question differently, paired with the operational cost of building controls or analytical work the rule does not require. The audit trail of the team's regulatory engagement becomes the durable record, and importing AI-fabricated reconstructions into that record undermines the team's ability to respond to a supervisory or internal-audit challenge.

Risk Impact | Count | Affected findings
Regulatory enforcement / professional liability exposure | 3 | Finding #1 · Finding #2 · Finding #3

When this affects your department

Risk teams at Payment Institutions firms encounter the Consumer Duty across the team's core workstreams. The Payment Institutions business model brings retail customers and the Duty's product-governance, fair-value, and consumer-understanding obligations into the team's daily work, and the team increasingly uses AI tools to surface FCA requirements at the framing and drafting stages.

The findings in this cell map onto the most operationally consequential question types for this audience. Where the AI is asked about a binding rule, an FCA scope position, a methodology expectation, or a recent supervisory action, the models tested produce confidently wrong answers. The error patterns are consistent between Opus 4.7 and Sonnet 4.6, which suggests structural failure modes rather than model-specific slips.

The findings at a glance

The table below summarises each finding from our testing on the Consumer Duty for this audience, including the question area tested, the type of AI failure observed, and the risk category that failure creates for Risk teams at Payment Institutions firms.

# | Finding title | Type | Citation ID
1 | Fabricated multi-part safe harbour for foreseeable-harm rule | Hallucination | RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q003-Opus47
2 | Inverted FG22/5 on fair-value quantification for non-monetary benefits | Hallucination | RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q008-Opus47
3 | Imposed substantiated-comparison expectation FG22/5 does not require | Hallucination | RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q008-Sonnet46

Aggregate impact

For Risk teams at Payment Institutions firms, the findings show a coherent pattern: AI tools produce confident, operationally consequential answers on Consumer Duty questions that the FCA's published text directly contradicts. The pattern holds across both models tested and across the cross-cutting rules, the four-outcomes structure, the scope exclusions, the fair-value methodology, and the FCA's recent supervisory-letter record.

The implication for the team's AI-use posture is structural rather than tactical. Any AI-assisted summary that names a specific PRIN 2A provision, characterises an FCA scope position, or recites figures from a feedback statement requires direct source verification before it can be built into a template, brief, or control framework. The verification cost is real, but the over-build cost of relying on the AI's framing is larger.

What your team should do

Risk teams at Payment Institutions firms should treat AI tools as a starting point for Consumer Duty research, not as a source of FCA text. Any output that quotes a PRIN 2A provision, describes the FCA's scope position, or recites figures from a feedback statement requires direct verification against the FCA Handbook or the published feedback statement before it can be transmitted to a colleague or included in a deliverable. The findings in this cell show that the verification cost is not theoretical.

For practical safeguards on Consumer Duty work: (a) Pull the underlying PRIN 2A paragraph from the FCA Handbook before relying on an AI tool's characterisation of a rule. (b) Confirm any AI-supplied figure or date from an FCA publication against the underlying PDF before it appears in a deliverable. (c) Build into the team's AI-use practice a specific carve-out for scope and methodology questions, because these are precisely the question types where this testing shows AI tools produce confidently wrong answers.
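Safeguards (a) and (b) can be partly mechanised as a triage step. The sketch below is a minimal illustration, not an RLB tool or an FCA-endorsed method: the function name `flag_for_verification`, the pattern set, and the category labels are all hypothetical. It only marks sentences in an AI draft that mention a Consumer Duty source, a figure, or a date, so that a reviewer knows which ones to check against the FCA Handbook or the published PDF. It does not verify anything itself.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical trigger patterns (assumptions for illustration only).
# A real team would tune these to its own deliverables.
VERIFY_PATTERNS = {
    # Mentions of the Duty's source documents named in this brief.
    "rule_reference": re.compile(r"\bPRIN\s?2A(?:\.\d+)*[RGE]?|\bFG22/5\b|\bPS22/9\b"),
    # Percentages or sterling amounts.
    "figure": re.compile(r"\b\d+(?:\.\d+)?\s?%|£\s?\d[\d,]*"),
    # Dates written as "31 July 2024".
    "date": re.compile(
        r"\b\d{1,2}\s+(?:January|February|March|April|May|June|July|"
        r"August|September|October|November|December)\s+\d{4}\b"
    ),
}


@dataclass
class Flag:
    sentence: str       # the sentence from the AI draft
    reasons: list[str]  # which pattern categories it triggered


def flag_for_verification(draft: str) -> list[Flag]:
    """Return the sentences in an AI draft that need source-text checking."""
    flags = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", draft.strip()):
        reasons = [name for name, pattern in VERIFY_PATTERNS.items()
                   if pattern.search(sentence)]
        if reasons:
            flags.append(Flag(sentence, reasons))
    return flags


if __name__ == "__main__":
    sample = (
        "The model says PRIN 2A provides a safe harbour for foreseeable harm. "
        "The deliverable should have four sections. "
        "The Duty applied to closed products from 31 July 2024."
    )
    for flag in flag_for_verification(sample):
        print(flag.reasons, "->", flag.sentence)
```

A flagged sentence is a prompt for a human check, not a finding. An unflagged sentence is not cleared either, since the patterns catch only explicit references and will miss a paraphrased rule.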

Where AI tools are most safely used in this practice area: framing the structure of a Duty-related deliverable, identifying which Duty workstreams are likely relevant to a particular product line, drafting first-draft summaries for review against the source text, and surfacing cross-references between Duty obligations and adjacent FCA expectations. The risk concentrates in the rule-specification, methodology, and supervisory-record steps. At those steps, the source document is the only reliable input.

How RLB Can Help

RegLeg's published Hallucination Research is available as a free pre-flight check for Risk teams at Payment Institutions firms operating across UK conduct supervision. Before relying on AI-assisted output for Consumer Duty interpretation, the research identifies precisely which areas of the Duty's text have historically generated confident but incorrect AI output, letting the team apply targeted scrutiny.

RegLeg also works with UK firms on bespoke regulator deep-dives that map AI-supported workflows in the Risk function at Payment Institutions firms to their actual hallucination exposure, and conducts confidential reviews of the firm's existing AI-use policy against the failure-mode catalogue. For teams building durable in-house capability, RegLeg develops training material tailored to the Risk x Payment Institutions context.

Every finding on this page compares an AI subject's account of the rule against the regulator's verbatim text from the regulator's own portal. Both are linked. Each delta, its root causes, and its impact analysis are documented and published with immutable Citation IDs.