Payment Institutions × Legal — United Kingdom · updated 2026-06-11 · methodology v2.3

AI Hallucination on Consumer Duty for Legal teams at Payment Institutions firms in the United Kingdom

Payment Institutions Legal teams: documentation and reporting gaps possible from AI reading of Consumer Duty

In-house legal teams at payment institutions and e-money firms operating under the Consumer Duty are increasingly using AI to validate Principle 12 scope opinions for retail-customer-facing services, draft scope memos on PRIN 2A exclusions, and prepare director briefings on FCA Feedback Statements such as FS25/2. The work product sits at the centre of new-product legal opinions, scope-of-application memos, and supervisor-correspondence drafting.

Two frontier AI models tested by the RLB Specialist Panel produced four substantive failures on this regulation under audit conditions. The failure classes recorded are: Misstated Statutory Architecture; Reversed the PRIN 2A Group-Insurance Exclusion; Invented Dual-Event Timeline for a Single FS25/2 Withdrawal; and Refusal to Confirm FS25/2 Withdrawal Count. Questions were prepared by the RLB Specialist Panel based on how this audience actually uses AI in its workflows, and each finding is bound to verbatim regulator-issued source text held as primary substrate.

The Consumer Duty (PS22/9 introducing Principle 12 and PRIN 2A, in force for open products from 31 July 2023 and for closed products from 31 July 2024) is the central retail-conduct regime the FCA now uses to grade firm behaviour, and the failure modes seen here all land inside the day-to-day work product that payment-institutions in-house legal teams sign off on.

For payment-institutions legal, the operational consequence is direct. Scope-of-application memos, new-product legal opinions, and director attestations on Consumer Duty applicability all rest on accurate PRIN 2A scope and FSMA-statute framing. A defect imported from AI work product surfaces on legal-file review or board challenge, and the in-house function carries the professional exposure.

Citation IDs for the findings in this brief: RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q002-Sonnet46, RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q018-Opus47, RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q020-Opus47, RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q020-Sonnet46. Each citation links to the per-finding record, the AI subject answer, and the regulator-issued substrate excerpt the answer was tested against. The RLB Specialist Panel maintains an audit-traceable record of which model produced which answer against which substrate passage. That binding is what makes each finding referenceable in firm work product and in supervisory correspondence.

The findings below are the ones that payment-institutions in-house legal teams working under the Consumer Duty are most likely to encounter in the AI tools they already use, and the briefing sections that follow read each finding against the regulator-issued text.


Executive Summary

The FCA Consumer Duty (PS22/9 and PRIN 2A, with FG22/5 guidance) is the UK retail conduct framework that shapes day-to-day work for Legal teams at Payment Institutions firms. Across the four findings in this cell, frontier AI models tested with web search produced confidently wrong reconstructions of the FCA's text, in ways that bear directly on those teams' workstreams. Each error converts into either an over-build cost (defensive controls or templates the rule does not require) or a supervisory-record misstatement that surfaces on review.

None of the errors deliver any compliance benefit; all of them add operational cost or expose the team to challenge.

How AI gets this regulation wrong

The findings in this cell are inference drift and rule-misstatement, not refusal. The models committed to specific operational answers where the FCA's actual text would have resolved the question differently. For Legal teams at Payment Institutions firms, the consequence is that AI-assisted summaries of the FCA's published positions cannot be relied on without source-text verification.

| AI's Failure Mode | Count | Affected findings |
| --- | --- | --- |
| Misstated Rule | 1 | Finding #1 |
| Misstated Rule | 1 | Finding #2 |
| Inference Drift | 1 | Finding #3 |
| Inference Drift | 1 | Finding #4 |

What that means for your team

For Legal teams at Payment Institutions firms, the findings cluster on the same risk category: regulatory enforcement exposure where the FCA's text resolves the question differently, paired with the operational cost of building controls or analytical work the rule does not require. The audit trail of the team's regulatory engagement becomes the durable record, and importing AI-fabricated reconstructions into that record undermines the team's ability to respond to a supervisory or internal-audit challenge.

| Risk Impact | Count | Affected findings |
| --- | --- | --- |
| Regulatory enforcement / professional liability exposure | 2 | Finding #1 · Finding #2 |
| Operational decisions based on a fabricated regulator record | 2 | Finding #3 · Finding #4 |

When this affects your department

Legal teams at Payment Institutions firms encounter the Consumer Duty across the team's core workstreams. The Payment Institutions business model brings retail customers and the Duty's product-governance, fair-value, and consumer-understanding obligations into the team's daily work, and the team increasingly uses AI tools to surface FCA requirements at the framing and drafting stages.

The findings in this cell map onto the most operationally consequential question types for this audience. Where the AI is asked about a binding rule, an FCA scope position, a methodology expectation, or a recent supervisory action, the models tested produce confident wrong answers. The error patterns are consistent between Opus 4.7 and Sonnet 4.6, suggesting structural failure modes rather than model-specific slips.

The findings at a glance

The table below summarises each finding from our testing on the Consumer Duty for this audience, including the question area tested, the type of AI failure observed, and the risk category that failure creates for Legal teams at Payment Institutions firms.

| # | Finding title | Type | Citation ID |
| --- | --- | --- | --- |
| 1 | Misstated FSMA 2023 role in creating the Consumer Duty | Hallucination | RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q002-Sonnet46 |
| 2 | Reversed the PRIN 2A scope exclusion for group insurance distribution | Hallucination | RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q018-Opus47 |
| 3 | Repeated FS25/2 fabricated April/August 2025 timeline across a second question | Hallucination | RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q020-Opus47 |
| 4 | Combined evasion with a fabricated Clifford Chance citation on Dear CEO letters | Hallucination | RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q020-Sonnet46 |

Aggregate impact

For Legal teams at Payment Institutions firms, the findings show a coherent pattern: AI tools produce confident, operationally consequential answers on Consumer Duty questions that the FCA's published text directly contradicts. The pattern holds across both models tested and across the cross-cutting rules, the four-outcomes structure, the scope exclusions, the fair-value methodology, and the FCA's recent supervisory-letter record.

The implication for the team's AI-use posture is structural rather than tactical. Any AI-assisted summary that names a specific PRIN 2A provision, characterises an FCA scope position, or recites figures from a feedback statement requires direct source verification before it can be built into a template, brief, or control framework. The verification cost is real but the over-build cost of relying on the AI's framing is larger.

What your team should do

Legal teams at Payment Institutions firms should treat AI tools as a starting point for Consumer Duty research, not as a source of FCA text. Any output that quotes a PRIN 2A provision, describes the FCA's scope position, or recites figures from a feedback statement requires direct verification against the FCA Handbook or the published feedback statement before it can be transmitted to a colleague or included in a deliverable. The findings in this cell show that the verification cost is not theoretical.

For practical safeguards on Consumer Duty work: (a) pull the underlying PRIN 2A paragraph from the FCA Handbook before relying on an AI tool's characterisation of a rule; (b) confirm any AI-supplied figure or date from an FCA publication against the underlying PDF before it appears in a deliverable; and (c) build into the team's AI-use practice a specific carve-out for scope and methodology questions, since these are precisely the question types where this testing shows AI tools produce confident wrong answers.
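As one illustration of how a team might operationalise these safeguards, the Python sketch below scans a piece of AI output and lists the items that would need checking against the source before use. It is a minimal sketch under stated assumptions, not a tool described in this research: the names `CHECKS`, `Flag`, and `flag_for_verification` are hypothetical, the patterns are deliberately narrow, and the sample sentence is invented test input rather than a statement of the rules.

```python
import re
from dataclasses import dataclass

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

# Hypothetical, illustrative patterns only; a real review list would be broader.
CHECKS = {
    # Handbook citations such as "PRIN 2A" or "PRIN 2A.1.1R"
    "handbook_reference": re.compile(r"\bPRIN\s*2A(?:\.\d+)*[RGE]?\b"),
    # FCA publication codes such as "PS22/9", "FS25/2", "FG22/5"
    "fca_publication": re.compile(r"\b(?:PS|FS|FG)\d{2}/\d+\b"),
    # Dates written as "31 July 2023"
    "date": re.compile(rf"\b\d{{1,2}}\s+(?:{MONTHS})\s+\d{{4}}\b"),
    # Percentage figures such as "40%" or "40 per cent"
    "figure": re.compile(r"\b\d+(?:\.\d+)?\s*(?:%|per cent)"),
}


@dataclass(frozen=True)
class Flag:
    kind: str  # which check matched
    text: str  # the exact span to verify against the source document


def flag_for_verification(ai_output: str) -> list[Flag]:
    """Return every span in the AI output that a reviewer should verify."""
    flags = []
    for kind, pattern in CHECKS.items():
        for match in pattern.finditer(ai_output):
            flags.append(Flag(kind, match.group(0)))
    return flags


if __name__ == "__main__":
    # Invented sample of AI output, used only to exercise the patterns.
    sample = (
        "PRIN 2A.1.1R applies; see FS25/2, issued 31 July 2023, "
        "covering 40% of firms."
    )
    for flag in flag_for_verification(sample):
        print(f"{flag.kind}: verify '{flag.text}' against the source")
```

A check of this kind only identifies what to verify; it cannot confirm that a cited provision, date, or figure is correct, so the comparison against the FCA Handbook or the published document remains a manual step.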

AI tools are most safely used in this practice area for framing the structure of a Duty-related deliverable, identifying which Duty workstreams are likely relevant to a particular product line, drafting first-draft summaries for review against the source text, and surfacing cross-references between Duty obligations and adjacent FCA expectations. The risk concentrates in the rule-specification, methodology, and supervisory-record steps. At those steps the source document is the only reliable input.

How RLB Can Help

RegLeg's published Hallucination Research is available as a free pre-flight check for Legal teams at Payment Institutions firms operating across UK conduct supervision. Before relying on AI-assisted output for Consumer Duty interpretation, the research identifies precisely which areas of the Duty's text have historically generated confident but incorrect AI output, letting the team apply targeted scrutiny.

RegLeg also works with UK firms on bespoke regulator deep-dives that map AI-supported workflows in the Legal function at Payment Institutions firms to their actual hallucination exposure, and conducts confidential reviews of the firm's existing AI-use policy against the failure-mode catalogue. For teams building durable in-house capability, RegLeg develops training material tailored to the Legal x Payment Institutions context.

Every finding on this page compares an AI subject's account of the rule against the regulator's verbatim text from the regulator's own portal. Both are linked. Each delta, its root causes, and impact analysis are documented and published with immutable Citation IDs.