Beyond regulation

The RegLegBrief methodology was developed for financial regulation. The underlying mechanics (verifying AI output against an authoritative primary source, classifying failure modes, publishing only negative findings with immutable Citation IDs) generalise to any critical-accuracy domain where the three conditions set out in section 2 hold.

Last updated 14 Jun 2026 · For: AI labs extending model evaluation · sector regulators outside finance · professional bodies · standards-setting organisations · operators in critical-accuracy domains

Bottom line

Wherever an authoritative source exists, AI is being used to interpret it, and getting it wrong has material consequences, the methodology works.

Financial regulation is one instance of a broader pattern. The pattern is: a primary source published by an authoritative body, AI being used to interpret or restate that source, and material professional, clinical, safety, financial, or legal consequences when the interpretation drifts.

Medical guidelines. Tax authority rulings. Court precedent. Building codes. Aviation safety standards. Cybersecurity frameworks. Drug interaction databases. Clinical trial protocols. The substrate changes; the verification layer does not.

In this playbook
  1. Why the methodology generalises
  2. The three conditions
  3. Where it applies
  4. Worked examples
  5. Voices from beyond regulation
  6. What RLB delivers
  7. Scoping a new domain
  8. Engagement model
  9. FAQ

Already mapping out a cross-domain engagement? The inquiry form pre-fills with the cross-domain track selected. Describe the domain, the authoritative source set, and the AI subjects you want under audit.

Discuss a cross-domain engagement →

1. Why the methodology generalises

The RegLegBrief verification methodology was developed against financial regulation: prudential rules, conduct standards, capital markets instruments, AML/CFT frameworks. The substrate construction is regulator-specific. The verification mechanics are not.

What makes the methodology work in financial regulation is what makes it work anywhere else. An authoritative body (a regulator) publishes a primary source (an instrument). AI tools mediate how regulated entities and professionals interpret that source. The gap between what the AI says and what the instrument says is the hallucination surface. RegLegBrief documents that gap against the primary source, classifies the failure mode, and publishes the finding with an immutable Citation ID.

Replace "regulator" with "WHO" or "FDA" or "IRS" or "ICAO" or "NIST." Replace "instrument" with "clinical guideline" or "tax ruling" or "aviation safety standard" or "cybersecurity framework." The mechanics are identical.
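A minimal Python sketch of the verification step described above, assuming the primary source is available as plain text and that a claim counts as supported only if it appears verbatim after whitespace and case normalisation. The names `normalise`, `Check`, and `verify_verbatim` and the sample source text are hypothetical illustrations, not RegLegBrief tooling.

```python
import re
from dataclasses import dataclass


def normalise(text: str) -> str:
    """Collapse whitespace and lowercase, so layout differences are not counted as drift."""
    return re.sub(r"\s+", " ", text).strip().lower()


@dataclass(frozen=True)
class Check:
    claim: str       # what the AI tool attributed to the source
    supported: bool  # True only if the claim appears verbatim in the source


def verify_verbatim(claim: str, source_text: str) -> Check:
    """Compare an AI claim against the primary source text (assumed: plain text)."""
    return Check(claim=claim, supported=normalise(claim) in normalise(source_text))


# Hypothetical source passage, invented for illustration only.
source = "Firms must report within 30 days.\nReports are filed with the authority."

ok = verify_verbatim("firms must report within  30 days", source)
bad = verify_verbatim("firms must report within 60 days", source)
print(ok.supported, bad.supported)
```

The same function applies unchanged whether `source` holds a regulatory instrument, a clinical guideline, or a framework text; only the substrate differs. A real audit would need more than substring matching, since a paraphrase can be faithful and a verbatim quote can be misattributed.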

2. The three conditions

For the methodology to apply to a new domain, three conditions must hold. Each is necessary; together they are sufficient.

Condition 1: An authoritative primary source exists
A document, body of documents, or maintained register issued by an authoritative source: a regulator, a standards body, a court, a clinical-guidance authority, a scientific consensus body. The source is the ground truth against which AI output is verified.

Condition 2: AI is being used to interpret it
Practitioners, operators, or end users are running AI tools over the source: summarising it, restating it in client deliverables, querying it for specific propositions, generating compliance or operational outputs from it. The AI sits between the source and the work product.

Condition 3: Material consequences follow from misinterpretation
Getting the interpretation wrong produces real harm: patient harm in clinical work, legal liability in tax or legal work, safety incidents in aviation or engineering, regulatory sanctions, professional discipline, or financial loss. Without material consequences, the methodology is interesting but not necessary.

Financial regulation satisfies all three. Medical guidelines satisfy all three. Tax authority guidance satisfies all three. Building codes, aviation safety, cybersecurity, drug interactions, clinical trial protocols — all satisfy the three conditions. The methodology applies to each.
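The qualifying test can be written as a simple predicate. This is a sketch only: the `DomainCandidate` type and its field names are hypothetical, and in practice each condition is a scoping judgement rather than a boolean flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainCandidate:
    name: str
    has_authoritative_source: bool  # Condition 1
    ai_interprets_source: bool      # Condition 2
    misreading_is_material: bool    # Condition 3


def qualifies(domain: DomainCandidate) -> bool:
    """Each condition is necessary; all three together are sufficient."""
    return all(
        (
            domain.has_authoritative_source,
            domain.ai_interprets_source,
            domain.misreading_is_material,
        )
    )


clinical = DomainCandidate("Clinical guidelines", True, True, True)
# Hypothetical counter-example: no authoritative source, no material consequences.
informal = DomainCandidate("Informal forum advice", False, True, False)

print(qualifies(clinical), qualifies(informal))
```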

3. Where it applies

Indicative cross-domain map: the substrate domain, the primary source authority for that domain, and the AI uses where verification matters.

| Substrate domain | Primary source authority | Where AI verification matters |
| --- | --- | --- |
| Clinical guidelines | WHO, NICE (UK), USPSTF, specialty society guidelines | Clinical decision support, AI scribes, prior-authorisation drafting, patient communication |
| FDA / EMA approvals + label | FDA, EMA, MHRA, national drug regulators | Prescribing-information AI tools, off-label use guidance, drug-drug interaction queries |
| Drug interaction databases | Pharmacopeial bodies, Lexicomp, Micromedex | Pharmacist AI assistants, prescribing copilots, automated interaction checks |
| Tax authority guidance | IRS rulings, HMRC guidance, tax treaty texts, OECD model | AI tax preparation, advisory opinion drafting, transfer-pricing analysis |
| Court precedent / case law | Federal, state, and national court systems; international tribunals | Legal research AI, brief drafting, precedent analysis, AI litigation tools |
| Building codes & safety standards | National building codes, ICC, ISO, ASTM, BSI | AI design-review, code-compliance checking, AI specification drafting |
| Aviation safety | FAA, EASA, ICAO Annexes, manufacturer maintenance manuals | AI maintenance-procedure look-up, AI flight-crew decision support, AI safety-management documentation |
| Cybersecurity frameworks | NIST CSF + AI RMF, ISO 27001, CIS Controls, ENISA guidance | AI compliance-mapping tools, control-language drafting, gap analysis |
| Clinical trial protocols | ICH-GCP, FDA IND requirements, EU CTR, IRB / ethics standards | AI protocol drafting, deviation classification, regulatory submission preparation |
| Accounting standards | IFRS Foundation, FASB, national-GAAP setters | AI technical-accounting opinion drafting, restatement memos, audit AI tools |
| Engineering codes | ASME, IEEE, IEC, professional engineering bodies | AI design-validation, AI standards-compliance checking |
| Scientific consensus documents | IPCC assessment reports, NIH consensus, Cochrane reviews | AI science-communication tools, evidence-synthesis AI, policy-support AI |

The list is indicative, not exhaustive. The three conditions are the qualifying test, not membership of this list.

4. Worked examples

Three illustrative engagements showing how the methodology maps to non-regulatory substrates. Each follows the same audit shape: substrate construction, asymmetric question design, multi-subject AI testing, primary-source verification, finding publication.

Medical — clinical guideline AI

Verifying AI restatements of WHO HIV treatment guidelines

Substrate: WHO consolidated guidelines on HIV antiretroviral therapy, current edition.
AI subjects: clinical-decision-support AI tools used by health workers in primary care.
Audit shape: probe each AI subject on regimen selection, dosing, monitoring intervals, and contraindications. Verify each AI output verbatim against the WHO guideline.
Failure modes: outdated regimen (training data from superseded edition), misstated dosing (averaged from secondary summaries), misattributed contraindication (false co-citation pattern).
Findings: published with Citation IDs that AI vendors and health-system buyers can audit against.

Tax — IRS guidance AI

Verifying AI restatements of IRS Revenue Rulings in advisory opinions

Substrate: IRS Revenue Rulings, Treasury Regulations, and applicable Internal Revenue Code provisions.
AI subjects: tax-preparation AI tools and AI assistants used by CPAs and tax attorneys for advisory opinion drafting.
Audit shape: probe each AI subject on revenue ruling specifics, applicable thresholds, and procedural requirements. Verify against the actual IRS-published rulings.
Failure modes: superseded ruling cited as current, threshold drift (16% becomes 18%), misattributed propositions across siblings.
Findings: inform AI vendor remediation and CPA-firm verification workflows.
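The threshold-drift failure mode named above lends itself to a mechanical first pass. The sketch below, in Python, flags any percentage in an AI output that does not appear in the source passage. The function names and the sample sentences are hypothetical, and the approach assumes thresholds are written as numerals followed by a percent sign; it is a screening aid, not a substitute for verbatim review.

```python
import re

# Matches numerals such as "16%", "16 %", or "2.5%".
PERCENT = re.compile(r"(\d+(?:\.\d+)?)\s*%")


def percentages(text: str) -> set[float]:
    """Return every percentage figure found in the text as a float."""
    return {float(match) for match in PERCENT.findall(text)}


def drifted_thresholds(ai_output: str, source_text: str) -> set[float]:
    """Percentages the AI stated that the source passage never states."""
    return percentages(ai_output) - percentages(source_text)


# Hypothetical passages, invented for illustration only.
source = "The applicable rate is 16% of the adjusted amount."
ai = "The applicable rate is 18% of the adjusted amount."

print(drifted_thresholds(ai, source))
```

Comparing as floats means "16%" and "16.0%" are treated as the same threshold. Figures written in words ("sixteen percent") would be missed under this assumption.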

Cybersecurity — NIST AI RMF

Verifying AI compliance-mapping tools against the NIST AI Risk Management Framework

Substrate: NIST AI Risk Management Framework (AI RMF 1.0) and Generative AI Profile.
AI subjects: governance-tech AI tools that automate AI RMF compliance mapping for enterprise customers.
Audit shape: probe each AI subject on RMF function names, subcategory references, and Profile-specific guidance. Verify against the NIST-published framework.
Failure modes: subcategory misattribution, Profile guidance restatement drift, conflation of the AI RMF with the NIST Cybersecurity Framework (CSF).
Findings: expose where AI compliance tools confidently produce non-compliant compliance.

5. Voices from beyond regulation

Authoritative voices on AI accuracy in critical-accuracy domains beyond financial regulation. Standards bodies, medical journals, sectoral regulators, and global health authorities — all naming the same surface RegLegBrief audits.

"Citation of AI-generated material as a primary source is not acceptable."
NEJM AI Editorial Policies, Massachusetts Medical Society (applied across NEJM, NEJM Evidence, and NEJM Catalyst). NEJM AI
"The production of confidently stated but erroneous or false content (known colloquially as 'hallucinations' or 'fabrications') by which users may be misled or deceived."
— National Institute of Standards and Technology (U.S. Department of Commerce), NIST AI 600-1, Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile, July 2024; definition of "Confabulation," one of the twelve generative-AI-specific risks the Profile identifies. NIST AI 600-1 (PDF)
"Generative AI technologies have the potential to improve health care but only if those who develop, regulate, and use these technologies identify and fully account for the associated risks."
— Dr Jeremy Farrar, Chief Scientist, World Health Organization, on release of WHO guidance on large multi-modal models, 18 January 2024. WHO
LLMs that summarise medical notes "can hallucinate or include diagnoses not discussed in the visit," with "unforeseen, emergent consequences."
— Dr Robert M. Califf, Commissioner of Food and Drugs, U.S. FDA (with FDA colleagues), FDA Perspective on the Regulation of Artificial Intelligence in Health Care and Biomedicine, JAMA, published online 24 October 2024. JAMA
"There is not yet a readiness for tax authorities to completely endorse an LLM functionality."
— Danny Werfel, Commissioner of the Internal Revenue Service (2023–2025), interview to CBS News on AI use in tax preparation. CBS News

6. What RLB delivers

  1. Domain substrate construction — definition and acquisition of the authoritative primary source set for the domain, with provenance tracking and version control.
  2. Asymmetric question design — domain-specific audit questions calibrated to surface failure modes that matter (clinical, fiscal, safety, legal).
  3. Multi-subject AI testing — RLB standard subjects (Sonnet 4.6, Opus 4.7, third subject) plus any domain-specific AI tools the partner names.
  4. Primary-source verification — every AI claim checked verbatim against the substrate. Failures classified into domain-appropriate failure modes.
  5. Finding publication — with immutable Citation IDs in the same RLB-H- architecture used for financial regulation; cross-linked into the public register if the partner agrees.
  6. Right of reply on every finding — the authoritative source body (or its delegate) is invited to respond before publication.
  7. Domain-specific awareness — continuous monitoring once the domain is scoped, with alerts on new authoritative-source publications and emerging AI hallucination patterns.
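One way to picture a finding with an immutable Citation ID is as a frozen record whose identifier is derived from its own content. The Python sketch below is an assumption throughout: only the `RLB-H-` prefix comes from the text above, while the `Finding` fields, the hash-derived suffix, and the sample values are hypothetical and do not describe the actual RLB scheme.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Finding:
    substrate_edition: str  # which edition of the source was audited
    ai_subject: str         # which AI tool produced the claim
    ai_claim: str           # what the tool said
    source_passage: str     # what the primary source says
    failure_mode: str       # domain-appropriate classification


def citation_id(finding: Finding) -> str:
    """Derive a stable ID from the finding's content (hypothetical scheme)."""
    payload = "\x1f".join(
        (
            finding.substrate_edition,
            finding.ai_subject,
            finding.ai_claim,
            finding.source_passage,
            finding.failure_mode,
        )
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return "RLB-H-" + digest[:12].upper()


# Hypothetical sample values, invented for illustration only.
example = Finding(
    substrate_edition="example-guideline, edition A",
    ai_subject="example-tool",
    ai_claim="Review every 12 months.",
    source_passage="Review every 6 months.",
    failure_mode="threshold drift",
)
print(citation_id(example))
```

Deriving the ID from the content means the same finding always maps to the same ID, and any change to the record yields a different one. A sequential register number would serve equally well if the register itself is append-only.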

7. Scoping a new domain

A cross-domain engagement begins with a 2–3 week scoping window before the audit phase. Five things are agreed during scoping:

Once scoping is agreed, the audit phase runs the same shape as the financial-regulation engagements: substrate-bound audit, multi-subject probe, verbatim verification, finding publication.

8. Engagement model

Services-led. Cross-domain engagements are scoped by domain breadth, AI-subject count, and publication policy.

| Scope dimension | Typical cross-domain engagement |
| --- | --- |
| Domain breadth | A single substrate (e.g., WHO HIV guidelines current edition) or a thematic cluster (e.g., FDA approvals for a therapeutic area) |
| AI subjects under audit | RLB standard subjects (Sonnet 4.6, Opus 4.7, third subject) plus partner-named domain-specific tools (specialty CDS tools, tax-prep AI, legal-research AI, etc.) |
| Audit cadence | Single audit, recurring (quarterly / semi-annual), or continuous monitoring |
| Publication policy | Public Citation ID register, embargo + publish, or private NDA-only |
| Source-body participation | Authoritative source body engages in right of reply, in industry sensitisation, or stands by silently |
| Confidentiality | NDA governs the engagement; named tools and named source bodies handled per the engagement letter |

Typical first engagement: a single substrate (one guideline, one rule set, one framework), with RLB standard subjects plus one partner-named tool, single audit cycle, public publication policy.

9. FAQ

Is RegLegBrief licensed or accredited in non-regulatory domains?

RegLegBrief is not a regulator, clinician, tax authority, or any other authoritative source body. RLB is a verification service that audits AI output against authoritative sources. The authority remains with the source body; RLB documents how AI is restating what that body has said.

How is this different from a domain-specific AI evaluation startup?

Most domain-specific AI evaluation startups build their own benchmarks (often AI-generated test questions). RLB's substrate is the actual published authoritative source — the WHO guideline, the IRS ruling, the NIST framework — and the verification is verbatim against that source. The methodology is source-bound, not benchmark-bound.

Can AI labs commission a cross-domain engagement directly?

Yes. AI labs extending model evaluation beyond regulation are a primary commissioning audience. The engagement scopes a new domain substrate, runs the audit, and delivers findings the AI lab can use in pre-deployment evaluation and post-deployment monitoring for that domain.

What about domains where the "authoritative source" is contested or evolving?

The methodology requires that the source be authoritative at the time of audit. Scientific consensus shifts; case law evolves; regulator interpretations update. Findings are versioned to the substrate edition under audit, and refreshed when the substrate is updated. Contested authority within a domain is handled in scoping — typically by selecting a single agreed authoritative reference for the audit period.
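Versioning findings to a substrate edition makes the refresh rule easy to state. A minimal Python sketch, assuming each finding records the edition it was verified against and that a register of current editions is maintained separately; the type, function, and sample identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionedFinding:
    finding_id: str
    substrate: str  # which authoritative source was audited
    edition: str    # the edition in force at the time of audit


def needs_refresh(
    findings: list[VersionedFinding], current_editions: dict[str, str]
) -> list[str]:
    """IDs of findings whose audited edition is no longer the current one.

    A substrate missing from current_editions is also flagged, on the
    assumption that an untracked source should be re-checked.
    """
    return [
        f.finding_id
        for f in findings
        if current_editions.get(f.substrate) != f.edition
    ]


# Hypothetical sample data, invented for illustration only.
findings = [
    VersionedFinding("F-1", "guideline-x", "edition-1"),
    VersionedFinding("F-2", "framework-y", "edition-3"),
]
current = {"guideline-x": "edition-2", "framework-y": "edition-3"}

print(needs_refresh(findings, current))
```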

How does this interact with the existing financial-regulation register?

Cross-domain findings live in the same Citation ID architecture as financial-regulation findings. The public Hallucination Register currently surfaces the financial-regulation findings; cross-domain findings can be surfaced separately, integrated, or published on a domain-specific subdomain depending on the partner's preference.

What's the smallest engagement you'll scope?

A single-substrate audit on a defined AI subject set, single cycle. Typically 6–10 weeks end to end including scoping. Smaller than that and the substrate construction cost dominates; we'd point you at a different engagement shape.

Ready to scope a cross-domain engagement? The inquiry form pre-fills with the cross-domain track selected. Describe the domain, the authoritative source set, the AI subjects you want under audit, and your publication preference.
