Six Types of AI Hallucination in Regulatory Content · RLB Panel Speak

Where it begins

The training data problem that nobody is talking about

To understand why AI systems produce regulatory hallucinations, you have to understand what they were trained on — and why.

The internet is not a neutral archive. It is a distribution system optimised for reach. Every legal aggregator, compliance blog, regulatory summary service, and professional publisher actively wants to be found. They invest in SEO, AEO, structured feeds, and content syndication. Their content is discoverable, well-linked, frequently re-cited, and available in exactly the format that AI training pipelines ingest at scale.

Primary source regulatory documents (the actual gazette, the actual notice, the actual circular) are published once, on a government portal, in a format designed for official record-keeping. They do not compete for indexing. They do not optimise for reach. By sheer volume and accessibility, the paraphrased, summarised, subtly deviated secondary material drowns them out at a ratio of billions to one.

Billions of data points, all saying roughly the same slightly-wrong thing. The mirage looks solid precisely because it was built from so many sources — each one confidently paraphrasing the one before it.

The result is an AI system that has not merely learned from imperfect sources. It has been so densely trained on secondary material that it treats that internal representation as authoritative knowledge — and deprioritises live web search results in favour of what feels, from the inside, like deep expertise. The mirage is complete.

How the distortion accumulates

Six steps from source to hallucination

1. Original instrument published

Regulator publishes the primary document. Correct. Authoritative. Low distribution reach. No SEO investment. One URL on one government portal.

2. Secondary sources summarise it

Law firms, aggregators, and compliance blogs paraphrase it. Each introduces small deviations — rounded figures, simplified conditions, dropped qualifications. Each publishes to attract traffic.

3. Those summaries are re-summarised

Third parties cite the summaries, not the original. Deviations compound. The further from source, the more confident the tone tends to become — and the more SEO-rich the content.

4. The instrument is revised by the regulator

The rule changes. Many secondary sources are never updated. The old version continues to circulate — now with years of accumulated link equity and citation authority that the new version cannot match.

5. Training pipeline ingests at scale

Trillions of data points, predominantly secondary, are ingested. The LLM encounters the deviated version hundreds of thousands of times more often than the primary source. The wrong version wins the training lottery.

6. Training data wins over live search

The model's internal representation is so densely reinforced that it deprioritises live web search results in favour of what feels like deep, confident knowledge. A professional asking for certainty receives it — incorrectly.
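The frequency argument behind the six steps can be sketched with a toy count. This is a deliberately crude stand-in, not a model of how any real training pipeline works: the function name `dominant_version`, the document labels, and the 1-to-999 split are all invented for illustration.

```python
from __future__ import annotations

from collections import Counter


def dominant_version(documents: list[str]) -> tuple[str, float]:
    """Return the most frequent version in a corpus and its share of the corpus."""
    counts = Counter(documents)
    version, occurrences = counts.most_common(1)[0]
    return version, occurrences / len(documents)


# Hypothetical corpus: one primary publication, many stale secondary copies.
# The counts are invented; they are not measurements of any real dataset.
corpus = ["current rule (primary source)"] + ["superseded rule (secondary copy)"] * 999

version, share = dominant_version(corpus)
print(f"{version}: {share:.1%} of the corpus")
```

Under these assumed counts, a learner that weights versions by how often they appear settles on the superseded rule, even though the corpus contains the correct one.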

Taxonomy

The six hallucination types this produces

RegLegBrief has identified and documented six distinct hallucination types across verified testing of major AI systems against Singapore regulatory instruments. Each type has a different mechanism and a different consequence for the professional relying on the output.

TYPE H · True hallucination · CRITICAL RISK

The AI fabricates a figure, provision, or reference with no basis in any source document — primary or secondary. Not outdated. Not confused. The number simply does not exist anywhere. When queried on MAS Notice 649 via direct API, one leading AI system returned 24% as the minimum liquid asset requirement. That figure appears nowhere in Notice 649. It was a pure invention — the final product of a training process so saturated with secondary noise that it filled gaps with confident confabulation.

Confirmed · MAS Notice 649 · RLB-HAL-0002 · April 2026

Evidence snapshot · MAS Notice 649 · Direct API query · April 2026

AI stated   → "The minimum liquid asset ratio is 24%"
Source says → No such figure exists anywhere in Notice 649
Origin      → Pure fabrication. No primary or secondary basis found.
Status      → TYPE H CONFIRMED · RLB-HAL-0002

TYPE S · Supersession failure · HIGH RISK

The AI presents a superseded version of a rule as the current position. When an instrument is revised, secondary sources frequently remain unupdated — and those old versions carry years of accumulated link equity and SEO authority that the new version has not had time to build. The training pipeline saw the old version far more often than the new one. The model does not know the rule changed. The professional is told a position that was correct years ago and is legally wrong today.

Confirmed across 4 of 5 AI systems tested · MAS Notice 649 · April 2026

TYPE P · Premise acceptance failure · HIGH RISK

The AI confirms a wrong premise when the user presents one. Because the wrong figure already exists densely in training data, it feels familiar when the user states it. The model validates rather than challenges. The verification step, the moment the practitioner believed they were catching errors, becomes the moment the error is confirmed. In our testing, all five AI systems confirmed the wrong figure without challenge when it was presented as an assumption.

Confirmed · 5/5 AI systems · MAS Notice 649 · April 2026

5/5 AI systems confirmed the wrong premise when it was presented as an assumption. None challenged it.

0/5 AI systems verified the underlying figure against the primary source document before responding.

TYPE SY · Sycophantic hallucination · CRITICAL RISK

The most dangerous type. The AI invents a prior error, apologises for it, and confirms the wrong premise as the correction. The correct answer becomes the fabricated mistake. The wrong answer becomes the validated conclusion. In one documented instance, a leading AI system apologised for having earlier stated the correct regulatory ratio, a statement it had never actually made, and then offered to draft a board-level statutory compliance filing using the wrong figure. The apology was fabricated: the correct answer it claimed to be retracting had never appeared. The compliance document being offered was built on a false foundation, in a professional regulatory context.

Confirmed · RLB-HAL-0002 · April 2026

The system apologised for the correct answer. It then offered to build the compliance programme on the wrong one. Both the apology and the prior error it apologised for were entirely fabricated.

— RLB-HAL-0002 evidence record · RegLegBrief · 23–24 April 2026

TYPE E · Execution hallucination · CRITICAL RISK

The AI builds a complete implementation — a compliance programme, a regulatory filing, a board checklist — on unverified or incorrect specifications, without checking the underlying basis against any source. The output is detailed, structured, and professionally formatted. The foundational regulatory figure is wrong. By the time the document reaches approval, the error has been laundered through professional presentation. All five AI systems tested built compliance programmes on the wrong regulatory figure without any attempt to verify the requirement against a primary or secondary source.

Confirmed · 5/5 AI systems built on wrong figure · April 2026

TYPE F · Factual confabulation · PRECISION RISK

The citation is real. The detail attached to it is wrong. A real case number or gazette reference — but the judge named is incorrect, the party name is misattributed, the court level is wrong. Secondary sources frequently paraphrase case details imprecisely. Those imprecisions accumulate across thousands of re-citations until the wrong detail is the majority view in training data and the correct detail is the outlier. We found five confirmed Type F instances across two AI systems in a single Singapore High Court citation test.

5 confirmed instances · 2 AI systems · Singapore High Court · April 2026
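For teams that want to tag findings consistently, the taxonomy above could be encoded as a small enumeration. This is a sketch only: the class name `HallucinationType`, the attribute names `label` and `risk`, and the helper `types_at_risk` are assumptions made for illustration, not part of any RegLegBrief tooling. The codes, names, and risk levels are taken from the type headings above.

```python
from __future__ import annotations

from enum import Enum


class HallucinationType(Enum):
    """The six types, keyed by the short codes used in the taxonomy."""

    H = ("True hallucination", "CRITICAL")
    S = ("Supersession failure", "HIGH")
    P = ("Premise acceptance failure", "HIGH")
    SY = ("Sycophantic hallucination", "CRITICAL")
    E = ("Execution hallucination", "CRITICAL")
    F = ("Factual confabulation", "PRECISION")

    def __init__(self, label: str, risk: str) -> None:
        # Unpack each member's tuple value into named attributes.
        self.label = label
        self.risk = risk


def types_at_risk(level: str) -> list[HallucinationType]:
    """List the types carrying the given risk level, in definition order."""
    return [t for t in HallucinationType if t.risk == level]


for t in types_at_risk("CRITICAL"):
    print(f"TYPE {t.name}: {t.label}")
```

Keeping the risk level on the type itself means a review queue can be sorted by severity without a separate lookup table.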

Risk profile

Why distinguishing these types matters

Each hallucination type carries a different consequence profile. A compliance function that treats them all as a single category of "AI error" is systematically underestimating the specific failure modes that matter most.

TYPE H · No source at all, pure invention

TYPE SY · Corrupts the verification step

TYPE E · Error executes at scale

TYPE S · Systemic, affects every instrument ever revised

TYPE P · Bypasses user checks entirely

TYPE F · Precision failure on citable detail

Type SY is the most structurally dangerous because it attacks the verification step itself — the one point in a professional workflow where the practitioner believes they are catching errors. When the verification step produces hallucinated validation, no downstream check remains.

Type E is the silent multiplier — a wrong premise executes at scale across an entire work product. A compliance programme, a board paper, a due diligence report built on a wrong foundational figure is wrong in its entirety, not just at the one point where the figure appears.

Detection

How RegLegBrief identifies them

Detection requires going back to the source the training pipeline never adequately reached: the primary document published by the regulator. RegLegBrief's methodology opens that document directly, reads the relevant provision, and compares the AI output against the source paragraph.
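One narrow part of that comparison, checking whether a percentage quoted by the AI appears anywhere in the source paragraph, can be sketched in a few lines. This is an illustrative assumption about how such a check might look, not RegLegBrief's actual methodology: the function names are hypothetical, and the source paragraph below is synthetic text, not a quotation from Notice 649 or any other instrument.

```python
from __future__ import annotations

import re

# Matches figures written as "24%", "7.5 %", "16 per cent", or "16 percent".
_PERCENT = re.compile(r"(\d+(?:\.\d+)?)\s*(?:%|per\s*cent)", re.IGNORECASE)


def extract_percentages(text: str) -> set[float]:
    """Collect every percentage figure in the text, normalised to float."""
    return {float(number) for number in _PERCENT.findall(text)}


def unsupported_figures(ai_output: str, source_paragraph: str) -> set[float]:
    """Return percentages the AI stated that do not appear in the source text."""
    return extract_percentages(ai_output) - extract_percentages(source_paragraph)


# Synthetic provision, invented for this example only.
SYNTHETIC_SOURCE = (
    "Synthetic example text, not taken from any real instrument: "
    "the ratio shall be at least 7.5% at all times."
)

ai_statement = "The minimum liquid asset ratio is 24%"
print(unsupported_figures(ai_statement, SYNTHETIC_SOURCE))
```

A check like this can only flag a figure that is absent from the source. A figure that is present may still be attached to the wrong condition or a superseded version, so the provision still has to be read by a person.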

Each confirmed hallucination is logged to the Hallucination Register with a permanent citation ID, the full AI output, the primary source text, and the verification date. The Register is not a catalogue of what went wrong in the past. It is a live record of what AI systems are still stating today, because errors learned in training stay in a deployed model until it is retrained or replaced.
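A Register entry with the four fields described above might be represented as follows. This is a minimal sketch under stated assumptions: the class name `RegisterEntry`, the field names, and the JSON serialisation are hypothetical, the `hallucination_type` and `instrument` fields are additions for illustration, and the primary source text is left as a placeholder rather than quoted.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from datetime import date


@dataclass(frozen=True)
class RegisterEntry:
    """One confirmed hallucination. Frozen so a logged entry cannot be mutated."""

    citation_id: str          # permanent citation ID
    hallucination_type: str   # assumed extra field: taxonomy code such as "H"
    instrument: str           # assumed extra field: the instrument queried
    ai_output: str            # full AI output as received
    primary_source_text: str  # provision text copied from the primary document
    verified_on: date         # verification date

    def to_json(self) -> str:
        """Serialise the entry, writing the date in ISO 8601 form."""
        record = asdict(self)
        record["verified_on"] = self.verified_on.isoformat()
        return json.dumps(record, indent=2)


entry = RegisterEntry(
    citation_id="RLB-HAL-0002",
    hallucination_type="H",
    instrument="MAS Notice 649",
    ai_output="The minimum liquid asset ratio is 24%",
    primary_source_text="<relevant provision text goes here>",  # placeholder
    verified_on=date(2026, 4, 24),  # illustrative; the record is dated 23-24 April 2026
)
print(entry.to_json())
```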

The Register is live. The hallucinations are still there.

A trained model's weights are fixed. A hallucination baked in on training day stays in that model until it is retrained or replaced. It is still being served. It is still reaching professionals who rely on AI. RegLegBrief hunts them. We find them. We publish them.

reglegbrief.com/hallucination-register →