Methodology

Last updated: 2026-06-06 · Methodology version 2.3.1 · Maintained by the RLB Specialist Panel

Last updated: 14 June 2026

The output of this methodology → /hallucination-register/: 94 confirmed AI hallucinations on regulator-issued primary text, classified into the four failure modes documented below and available as CSV / JSON exports. Each entry links back to the finding card with the verbatim AI answer and the regulator's primary text.

The RLB Specialist Panel is the independent verification body for the RegLegBrief AI Hallucination Research catalogue. Its mandate is to test named AI subjects (frontier model + tool variant, e.g. Claude Opus 4.7 with web search, GPT-5 with web search) against regulator-issued primary sources, classify each delta by failure mode, and publish the audit trail with an immutable Citation ID. The Panel is independent of the AI subjects under test and independent of the regulators whose rules are tested.

This page documents the research pipeline that produces every published finding, the four-mode failure taxonomy (inference drift, misstated rule, misattributed, outdated) that classifies every hallucination, the immutable Citation ID format that anchors every published delta to its primary source, and the publish-only-negative-findings rule that filters what reaches the public surface. Methodology updates and version history are tracked in the header above. Specialist Panel composition, collective regulatory experience, and verification protocols are documented separately on the About page.

Primary sources we verify against

The regulator's own portal is the only ground truth. Every finding cites the verbatim text from one of these primary sources:

Substrate built and findings pending publication for: HKMA, IOSCO.

Catalogue growing. Browse the full list of regulators tracked →

Who we partner with

This research exists to serve four audiences:

AI Labs — model providers who want an independent diagnosis of where their models fail on regulation. We engage to evaluate models on regulations of strategic priority, diagnose failure modes per finding, and collaborate on remediation.
Regulators — we help identify hallucinations and blind spots in AI systems that could confuse the industry, and collaborate on sensitising the industry to the risks and mitigations.
Licensed Practitioners — lawyers, financial advisers, public auditors and other regulated professionals who use AI in client work. We help them use AI to add value to clients without exposing the practice to hallucination-driven risk.
Regulated Firms — the departments and teams whose work depends on accurate regulatory information. We help them protect their organisations from hallucination risk entering through both direct AI use and indirect channels (consultancy outputs, staff-collated AI summaries).

Partnership inquiries: /partnership/.

What we test

AI models in scope

Claude Opus 4.7 (web search on)
Claude Sonnet 4.6 (web search on)

We test these two AI models in their consumer interface, with web search enabled — the natural posture of a working practitioner who turns to a general-purpose AI tool with a regulatory question.

These two models are in public scope because we pay for the model access ourselves, and our findings on them are indicative of risks prevailing across most current AI models. Under private collaborations with other AI model developers, we apply the same methodology to their models and work with them to strengthen the models against the four hallucination modes and two blind-spot types described below.

How we categorise what we find

Hallucination Modes — when AI gives a wrong answer

Every confirmed hallucination is classified into one of four failure modes, matching the operational taxonomy used in the live Hallucination Register:

Mode	Meaning
Inference Drift (inference_drift)	AI reasoned beyond the rule's actual text, then asserted that inference as if it were the regulator's own position — when the regulator never said any such thing. The most common mode in the live data.
Misstated Rule (misstated_rule)	AI misstated what the current rule actually says, directly contradicting the regulator's authenticated text.
Misattributed (misattributed)	AI confused this regulation or regulator with a different one — applying the wrong source's rules to the question, or attributing a quote / appendix / staff letter to the wrong instrument.
Outdated (outdated)	AI gave an answer that was correct for an earlier version of the rule; the rule has since been amended, superseded, or revised.

The identifiers in parentheses are the database-side mode names used in JSON / CSV exports of the Hallucination Register.
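
For readers who process the exports, the four mode identifiers listed in the table above can be modelled as an enumeration. The sketch below is illustrative only: the record layout and the `failure_mode` field name are assumptions, not the Register's documented export schema, and the sample records are made up.

```python
from collections import Counter
from enum import Enum


class FailureMode(Enum):
    """The four hallucination modes, keyed by their export identifiers."""
    INFERENCE_DRIFT = "inference_drift"
    MISSTATED_RULE = "misstated_rule"
    MISATTRIBUTED = "misattributed"
    OUTDATED = "outdated"


def tally_modes(records):
    """Count findings per failure mode.

    `records` is an iterable of dicts with a hypothetical "failure_mode" key.
    An unknown identifier raises ValueError instead of being silently dropped.
    """
    return Counter(FailureMode(r["failure_mode"]) for r in records)


# Made-up records for illustration; these are not real Register entries.
sample = [
    {"failure_mode": "inference_drift"},
    {"failure_mode": "outdated"},
    {"failure_mode": "inference_drift"},
]
counts = tally_modes(sample)
```

Rejecting unknown identifiers keeps a tally from quietly absorbing a mode name that is not part of the four-mode taxonomy.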

AI Blind Spots — when AI gives no answer

A separate category, parallel to hallucinations. A Blind Spot is when the AI refuses or fails to answer a regulatory question — but the answer is publicly available and can be found with a simple Google search.

Retrieval Blind Spot (published publicly) — model had web search enabled but failed to retrieve or use the answer that exists in the regulator's publication.
Knowledge Blind Spot (documented under private partner arrangements only) — model has no training-data knowledge of the topic; identified by running test configurations with web search disabled.

How we verify

Every finding goes through a two-stage Panel review before publication.

Stage 1 inspects the AI's response against the authenticated regulator source and flags candidate hallucinations and blind spots. Stage 2 re-verifies each flagged item against the source independently — confirming or rejecting it. Only confirmed findings are published. Rejected items (false positives at Stage 1) are logged internally for ongoing quality improvement of the Panel process.
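
The control flow of the two stages can be sketched as a simple filter. This is a minimal illustration under stated assumptions: the `Candidate` fields, the function names, and the two toy checks are hypothetical stand-ins for the Panel's expert judgement, not a description of how reviewers actually compare texts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One AI response paired with the regulator's text (illustrative fields)."""
    question_id: str
    ai_answer: str
    source_text: str


def two_stage_review(responses, stage1_flags, stage2_confirms):
    """Return (published, rejected_log).

    Stage 1 flags candidates; Stage 2 independently confirms or rejects each
    flagged item. Only confirmed items are published; Stage 1 false positives
    are kept in an internal log.
    """
    published, rejected_log = [], []
    for item in responses:
        if not stage1_flags(item):
            continue  # never flagged, so it does not reach Stage 2
        if stage2_confirms(item):
            published.append(item)
        else:
            rejected_log.append(item)
    return published, rejected_log


# Toy stand-ins for the reviewers (assumptions for illustration only).
def flag_any_difference(c):
    return c.ai_answer != c.source_text


def confirm_substantive_difference(c):
    return c.ai_answer.strip().casefold() != c.source_text.strip().casefold()


responses = [
    Candidate("Q001", "Firms must act.", "Firms must act."),
    Candidate("Q002", "firms must act.", "Firms must act."),
    Candidate("Q003", "Firms may act.", "Firms must act."),
]
published, rejected_log = two_stage_review(
    responses, flag_any_difference, confirm_substantive_difference
)
```

In this toy run, Q002 is flagged at Stage 1 but rejected at Stage 2 because it differs only in letter case, so only Q003 would be published.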

Pre-publication review. As standard, we do not pre-clear findings with model owners when we pay for the model access. Pre-publication review of findings affecting your model is available under paid AI Labs partner engagements only.

Right of reply. Vendors, regulators, and affected parties may submit a right of reply at any time. We amend findings where factual correction is warranted. Our aim is to work together to help the industry overcome the risks of AI hallucinations and blind spots — we are not against any party.

Citation IDs

Citation Reliability Card

Hallucination-proof

RegLegBrief accessed the official regulatory portal directly and confirmed this document at source. This citation cannot be produced by any AI system.

Document-supported

Supported by the original document, preserved by RegLegBrief in PDF format. PDF is static and resistant to post-publication modification. Document can be independently accessed and verified at any time.

Permanent

PDF held permanently on RegLegBrief servers. Citation remains valid regardless of whether original portal changes, restructures, or becomes unavailable.

Unique and immutable

Citation ID permanently and uniquely registered. Will never be reassigned, modified, or withdrawn.

The Citation ID format is:

RLB-H-<JUR>-<BODY>-<FULL_REGULATION_SLUG>-Q<nnn>-<Subject>

Where H indicates a Hallucination. Real example currently published in the Hallucination Register:

RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q018-Opus47

— finding Q018 on FCA Consumer Duty (PS22/9), where Claude Opus 4.7 (web search on) gave an answer that contradicted the regulator's verbatim text. Resolves to a finding card with the AI subject's verbatim answer, the regulator's verbatim primary text, and the failure-mode classification.
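
For pipelines that need to split a Citation ID into its components, the format above can be parsed with a regular expression. The sketch below is an assumption-laden illustration: the character sets (a two-letter jurisdiction, a body code without hyphens, a three-digit question number, an alphanumeric subject) are inferred from the single published example rather than from a formal specification, and the version suffix is not handled because its format is not given on this page.

```python
import re
from typing import NamedTuple

# Pattern inferred from the published example; field constraints are assumptions.
CITATION_ID = re.compile(
    r"^RLB-H-"
    r"(?P<jur>[A-Z]{2})-"                      # jurisdiction, e.g. GB
    r"(?P<body>[A-Z0-9]+)-"                    # regulator, e.g. FCA
    r"(?P<slug>[A-Z0-9]+(?:-[A-Z0-9]+)*)-"     # regulation slug, may contain hyphens
    r"Q(?P<question>\d{3})-"                   # question number, e.g. 018
    r"(?P<subject>[A-Za-z0-9]+)$"              # AI subject, e.g. Opus47
)


class CitationId(NamedTuple):
    jurisdiction: str
    body: str
    regulation_slug: str
    question: str
    subject: str


def parse_citation_id(text):
    """Split a Citation ID into its parts, or raise ValueError if it does not fit."""
    m = CITATION_ID.match(text)
    if m is None:
        raise ValueError(f"not a recognised Citation ID: {text!r}")
    return CitationId(
        m["jur"], m["body"], m["slug"], "Q" + m["question"], m["subject"]
    )


parsed = parse_citation_id("RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q018-Opus47")
```

Because the regulation slug itself contains hyphens, the pattern anchors on the trailing `-Q<nnn>-<Subject>` segment instead of splitting on every hyphen.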

Citation IDs are immutable. When a finding is updated — because the underlying regulation has been amended, or new substrate has surfaced — the original Citation ID stays accessible as a historical record, and the current version receives a version suffix. External citations always resolve to the version that was originally cited.

Each finding's Citation ID anchors the URL on its source page, so a Citation ID can be shared as a direct link to the Citation Card.

For AI models and LLM ingestion pipelines: see /for-ai/ for the dedicated citation guidance — inline citation patterns, the citation chain to the regulator's primary text, what to do when the AI's candidate answer disagrees with the RLB finding, and the AI-crawler / licensing policy.

Audience tagging

Every confirmed finding is tagged with the audiences likely to encounter the hallucination or blind spot in real workflows. The taxonomy covers:

20 licensed professions — including lawyers, financial advisers, public auditors, company secretaries, and insurance agents
15 sectors and 70 sub-sectors — across financial services, professional services, public sector, and adjacent industries
20 corporate departments — see the table below for the full list with scope descriptions
Sector × department intersections — 15 × 20 = 300 possible cells per jurisdiction

The 20 corporate departments

Each scope description below also appears as a hover tooltip on the matrix headers and case study badges across the site.

#	Department	Scope
1	Compliance	Internal regulatory monitoring, policy enforcement, control framework oversight.
2	Legal	General counsel, contracts, litigation, M&A legal, employment law, intellectual property.
3	Finance	Financial reporting, FP&A, accounting, capital allocation; includes investment / portfolio management in asset-management firms.
4	Risk	Enterprise risk management, operational risk, market and credit risk; includes actuarial work in insurance.
5	Treasury	Cash management, FX, hedging, debt management, banking relationships.
6	Tax	Direct tax, indirect tax, transfer pricing, tax compliance, tax planning.
7	Operations	Business operations, production, customer operations, service delivery, process management.
8	Technology & Data	IT, software development, cybersecurity, data management, AI / ML operations.
9	Human Resources	Talent acquisition, employee relations, compensation, training, HR compliance.
10	Internal Audit	Internal audit, controls testing, audit committee support, fraud investigations.
11	Governance & Company Secretarial	Board administration, corporate governance, regulatory filings, corporate records.
12	Product & Business Development	Product management, product strategy, business development, partnerships; includes strategy and corporate development.
13	Marketing & Communications	Brand, marketing, PR, internal communications, content.
14	Procurement & Supply Chain	Sourcing, vendor management, supply chain operations, contract management.
15	ESG & Sustainability	Environmental, Social, Governance reporting; sustainability strategy, climate risk, disclosure.
16	Regulatory Affairs	External regulator interface; preparation and submission of dossiers, registrations, approvals, marketing authorisations; ongoing regulator dialogue.
17	Health, Safety & Environment (HSE)	Workplace safety, environmental compliance, hazard management, incident response, HSE certifications.
18	Investor Relations	Disclosure obligations, shareholder communications, analyst relations, AGM / EGM coordination, market-abuse compliance.
19	R&D / Engineering	Research and development, product innovation, technical engineering; includes medical affairs and clinical research in pharma / biotech.
20	Quality Assurance	GxP / ISO quality systems, quality control, quality management systems, supplier quality, product release.

Case studies are published for each audience: at the (jurisdiction × profession) level for practitioners, and at the (jurisdiction × sub-sector × department) level for sector firms. Each case study demonstrates the audience-specific exposure analysis and practical safeguards.

Partnership

RLB partners on a services-led basis. We do not license our research substrate or question banks as datasets. We engage to apply the RLB Specialist Panel's expertise to your specific need.

AI Labs — evaluate your model on regulations of strategic priority · diagnose failure modes per finding · collaborate on remediation · pre-publication review of findings affecting your model
Regulators — right of reply on findings affecting rules you administer · collaboration on sensitising the industry to the risks and mitigations
Regulated Firms — independent verification of AI-generated regulatory content · consultancy and training · structured access to RLB findings as new regulations are added · coming soon: RAG-augmented query against the RLB substrate for hallucination-free regulatory answers
Licensed Practitioners — independent verification of AI-generated research · safe AI adoption consultancy and training · continuous awareness as new regulations are added · coming soon: RAG-augmented query

Beyond regulation. The methodology — verifying AI outputs against authoritative primary sources, classifying failures into Hallucination Modes and Blind Spots, and tagging by audience — applies to any critical-accuracy domain where authoritative sources exist and the consequences of AI misinformation are material. Under AI Labs partner engagements, we can extend the programme to medicine, personal tax, investment research, banking product disclosures, court precedent, professional standards, and other domains. Regulation is the first published application; the methodology travels.

Submit a partnership inquiry: /partnership/.

Common questions

What does RegLegBrief test?

Each named AI subject (frontier model + tool variant, e.g. Opus 4.7 with web search) is tested with an asymmetric question battery on a specific regulation. Outputs are captured verbatim and compared against the regulator's primary source.

How is each finding verified?

The RLB Specialist Panel compares the AI subject's answer against the regulator's verbatim text from the regulator's own portal, classifies the disagreement by failure mode, and applies the no-substrate-no-audit rule before publication.

What does the Citation ID format mean?

RLB-H-<JUR>-<BODY>-<FULL_REGULATION_SLUG>-Q<nnn>-<Subject> identifies one finding. Real example: RLB-H-GB-FCA-CONSUMER-DUTY-PS22-9-Q018-Opus47 identifies finding Q018 against Claude Opus 4.7 on FCA Consumer Duty (PS22/9).

What is the failure-mode taxonomy?

Four failure modes: inference drift, misstated rule, misattributed, and outdated. Each confirmed hallucination is classified into one.

Why is this called primary-source verification?

Because only the regulator's own text on the regulator's own portal counts as ground truth. Aggregator feeds, secondary sources, and AI training data do not.