
RLB Panel Speak

The Classifier That Cannot See

When frontier AI models rate independent forensic audit findings on their own hallucinations as low value — a question about whether AI is built to improve, or built to protect itself from scrutiny.

By RLB Specialist Panel · 19 Jun 2026

An AI content classifier evaluated an independent catalogue of confirmed AI hallucinations and rated it low value. The blind spot the classifier could not see was its own.

— RLB Specialist Panel

RLB Panel Speak · Essay No. 4


Is AI built to improve itself — or built to protect itself from the evidence that it erred?

21 original forensic audit whitepapers (RLB Hallucination Register · 2026)
7 global regulatory bodies covered (IMF · BIS-CPMI · CFTC · FCA · MAS · OECD · UNTC)
4 jurisdictions spanned (INT · US · GB · SG)
Throttled: deemed Low Value by premier AI algorithms (Google AdSense automated review · 2026)

The independent actions of separate frontier AI systems are producing outcomes that should concern anyone invested in the future of the AI ecosystem. We are genuinely concerned about what this sequence of events reveals about the trajectory of AI if left unexamined, and about the need for course correction.

Over the course of 2025 and into 2026, the RegLegBrief Specialist Panel published 21 forensic audit whitepapers — one per regulatory instrument — documenting, with precision and primary-source verification, the specific ways in which today's most prominent frontier AI models fail when asked about real financial and legal regulations. These are not opinion pieces. They are structured evaluations: the AI model's exact output, placed alongside the authenticated primary source, with each deviation named, classified, and catalogued.

The instruments audited span the full architecture of global financial regulation: MAS Notice 637 on capital adequacy in Singapore, CFTC Regulation 4.7 on qualified eligible persons in the United States, BIS-CPMI frameworks on payment systems and cyber resilience, IMF guidance on sovereign financing and precautionary balances, FCA Consumer Duty in the United Kingdom, and more. These are not obscure corners of regulatory policy. They are the frameworks that govern how banks hold capital, how derivatives are cleared, how cross-border payments move, and how financial institutions manage risk.

The 21 whitepapers were made available as open-source research. The infrastructure cost of keeping it open was intended to be covered by standard programmatic advertising — specifically, Google AdSense, which controls the overwhelming majority of the digital advertising funds that make independent open-source research financially viable.

Google AdSense's automated review system classified the entire body of work as "Low Value Content."

The Panel has sat with this finding for some time before writing about it. What follows is not a complaint about a business decision. It is an examination of what that classification reveals — structurally, architecturally, and perhaps most importantly, about the direction of AI itself.

What the classifier logic (mis)interprets:

Dense topic clustering across all 21 pages. Pattern statistically identical to a programmatic SEO content farm.

What the whitepapers actually say:

21 distinct forensic examinations of different instruments, regulators, failure modes, and jurisdictions. Forensic specificity, not topical repetition.

What the classifier logic (mis)interprets:

Low-perplexity text detected in regulatory quotes and verbatim AI output sections. Machine-generated content signature flagged.

What the whitepapers actually say:

Regulatory instruments have low perplexity by design. AI hallucinations are fluent by nature. Both are evidence. Quoting evidence is not generating it.

What the classifier logic (mis)interprets:

Low consumer search volume. No mass-market advertising match. Commercial intent insufficient. Low value.

What the whitepapers actually say:

"Low commercial value for mass-market display advertising" is not a statement about intellectual importance, professional relevance, or systemic significance.

Classifier verdict: "Low Value Content"; access withheld
What this is: original forensic audit research, independently significant

The Research

What the 21 whitepapers actually are

A forensic ledger — not a content catalogue

Each whitepaper follows a consistent methodology. The Panel selects a specific regulatory instrument — a final rule, a published guidance document, an international standard as issued by the regulator — and verifies it against the authenticated primary source. Frontier AI models, currently Claude Opus 4.7 and Claude Sonnet 4.6 with web search enabled, are then asked structured questions about that instrument. Every response is captured verbatim. Every deviation from the primary source is classified using the RLB hallucination taxonomy: numeric substitution, structural fabrication, qualifier erasure, deontic register failure, enumeration collapse, cross-provision conflation, and others.

The result is a permanent, citable record of the specific failure — not a general claim that AI gets things wrong, but a documented case with a Register ID, a named instrument, a specific AI output, and the authenticated text it deviated from.
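
As an illustration only, the record described above could be modelled as a small immutable data structure. This is a minimal sketch, not the Register's actual schema: the class and field names are hypothetical, the enum lists only the taxonomy categories named in this essay, and every value in the example entry is an invented placeholder.

```python
from dataclasses import dataclass
from enum import Enum


class FailureMode(Enum):
    # Only the categories named in the essay; the full taxonomy has others.
    NUMERIC_SUBSTITUTION = "numeric substitution"
    STRUCTURAL_FABRICATION = "structural fabrication"
    QUALIFIER_ERASURE = "qualifier erasure"
    DEONTIC_REGISTER_FAILURE = "deontic register failure"
    ENUMERATION_COLLAPSE = "enumeration collapse"
    CROSS_PROVISION_CONFLATION = "cross-provision conflation"


@dataclass(frozen=True)  # frozen: a register entry is meant to be a permanent record
class RegisterEntry:
    register_id: str          # hypothetical field names throughout
    instrument: str
    model: str
    ai_output: str            # captured verbatim
    primary_source_text: str  # authenticated text the output deviated from
    failure_mode: FailureMode

    def cite(self) -> str:
        """Return a short citation line for this entry."""
        return (f"{self.register_id}: {self.failure_mode.value} "
                f"in {self.instrument} ({self.model})")


# Placeholder values, invented for illustration; not a real Register entry.
entry = RegisterEntry(
    register_id="RLB-HAL-XXXX",
    instrument="Example Instrument",
    model="example-model",
    ai_output="Firms must report within 10 days.",
    primary_source_text="Firms must report within 30 days.",
    failure_mode=FailureMode.NUMERIC_SUBSTITUTION,
)
print(entry.cite())
```

Keeping the verbatim output and the authenticated text side by side in one record is what makes each deviation independently checkable.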

These are not variations on a theme. Each whitepaper is a distinct forensic examination of a different legal instrument, a different regulator, a different failure mode, and a different pattern of AI error. Just as a series of forensic toxicology reports on different substances is not "the same report repeated," these 21 documents are not 21 versions of a single finding.

The Classification

How the automated system misread the evidence

Three failure modes — none of them about value

The automated classification of this body of work as "low value" can be traced to three specific structural properties of the classifier — none of which have anything to do with the intellectual quality, originality, or importance of the research.

Vector Clustering

Mistaking focus for fabrication

Automated classifiers convert text into numerical vectors to assess topic. Because all 21 whitepapers are anchored in the same tight intersection (frontier AI models, regulatory hallucinations, and global financial compliance frameworks), the semantic vectors cluster densely. To a human reader, this is a highly specialised research corpus. To a pattern-matching classifier, a site publishing a rapid burst of pages that all occupy the same dense vector region looks statistically identical to a programmatic SEO farm. The classifier flags how densely, and how quickly, pages accumulate on one topic. It entirely misses the forensic originality underneath.
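
The clustering effect can be sketched with a toy example. The assumption here is loud: production classifiers use learned embeddings whose details are not public, so plain word-count vectors stand in for them, and the three sample texts are invented. The point is only that two audits of different instruments can still sit close together in vector space.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented sample texts: two audits of different instruments, one unrelated page.
audit_a = "AI model hallucination on capital adequacy regulation: numeric threshold substituted"
audit_b = "AI model hallucination on derivatives regulation: exemption criteria fabricated"
unrelated = "Ten easy weeknight pasta recipes for busy families"

sim_audits = cosine_similarity(bag_of_words(audit_a), bag_of_words(audit_b))
sim_unrelated = cosine_similarity(bag_of_words(audit_a), bag_of_words(unrelated))

print(f"audit vs audit:     {sim_audits:.2f}")
print(f"audit vs unrelated: {sim_unrelated:.2f}")
```

The two audits score as similar because they share framing vocabulary, even though the instrument and the failure mode differ. A similarity score alone cannot tell a specialised corpus from a templated one.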

Perplexity & Burstiness

The evidence mistaken for the crime

AI quality metrics score text by how predictable its language patterns are (perplexity) and how varied its sentence structure is (burstiness). Forensic audit whitepapers necessarily quote verbatim AI hallucinations and verbatim regulatory text side by side; this is the entire methodological point. Regulatory text has low perplexity by design. AI hallucination output, produced by models trained to sound authoritative, also exhibits low perplexity. The classifier detects a "machine-generated fingerprint" and flags it. The research is being penalised for quoting the very evidence that makes it forensically valid.
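
Both metrics can be sketched in a few lines. This is a toy under stated assumptions: real detectors score text with a neural language model, whereas this sketch uses an add-one-smoothed unigram model built from a tiny invented reference text, and it takes the coefficient of variation of sentence lengths as one simple proxy for burstiness. The function names and sample sentences are hypothetical.

```python
import math
import re
import statistics
from collections import Counter


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


def unigram_perplexity(text: str, reference: str) -> float:
    """Perplexity of `text` under an add-one-smoothed unigram model of `reference`."""
    ref_counts = Counter(tokens(reference))
    words = tokens(text)
    if not words:
        raise ValueError("text contains no tokens")
    vocab = set(ref_counts) | set(words)
    total = sum(ref_counts.values()) + len(vocab)  # add-one smoothing denominator
    log_prob = sum(math.log((ref_counts[w] + 1) / total) for w in words)
    return math.exp(-log_prob / len(words))


def burstiness(text: str) -> float:
    """Coefficient of variation of sentence lengths (0.0 means perfectly uniform)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(tokens(s)) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths) / statistics.mean(lengths)


# Invented texts: formulaic "regulatory-style" language versus looser prose.
reference = ("the firm shall hold capital. the firm shall report capital. "
             "the firm shall hold reserves.")
formulaic = "the firm shall hold capital."
varied = "honestly, nobody expected reserves to vanish overnight."

ppl_formulaic = unigram_perplexity(formulaic, reference)
ppl_varied = unigram_perplexity(varied, reference)
print(f"perplexity, formulaic: {ppl_formulaic:.1f}")
print(f"perplexity, varied:    {ppl_varied:.1f}")
print(f"burstiness, uniform:   {burstiness('One two three. Four five six. Seven eight nine.'):.2f}")
```

Formulaic text scores as more predictable regardless of who wrote it. A quoted statute and a machine-written paragraph can therefore trip the same threshold, which is the confusion described above.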

Search Intent Mismatch

The wrong definition of "value"

AdSense's measure of "value" is calibrated to consumer advertising: can this page be used to place relevant product advertisements to a general public audience? The 21 whitepapers address enterprise-level compliance liability, sovereign financing frameworks, and cross-border derivatives clearing standards. What AdSense means, precisely, is "low commercial value for mass-market display advertising." It says nothing whatsoever about the intellectual, professional, or systemic importance of the research.

Evidence snapshot · CFTC Reg 4.7 whitepaper · what the classifier sees vs. what is there
Primary source: CFTC Reg 4.7 §4.7(a)(2), authenticated, verified against the Federal Register
AI output: Fabricated threshold figure, captured verbatim as forensic evidence
Panel analysis: Deviation classified, failure mode named, Register ID assigned

Classifier reads: Low perplexity in regulatory quote → machine-generated flag raised
Classifier reads: Low perplexity in AI quote → machine-generated flag raised
Classifier verdict: Automated content / low value → access denied

What was missed: The forensic analysis is original human scholarship. The quotes are the evidence.

The Structural Problem

AdSense does not merely classify — it controls the funding pipeline

The economic architecture of independent research, and who controls the switch

This would be a story about a misfiring classifier if Google AdSense were one advertising network among many equally accessible alternatives. It is not. AdSense controls the overwhelming majority of programmatic advertising revenue available to independent open-source web publishers. When it withholds access, it does not merely mislabel content. It cuts off the primary funding mechanism that makes independent AI audit research financially sustainable without institutional backing.

The starting position: Independent open-source AI audit research
Requires sustainable funding to remain free and independent from the entities it evaluates. Programmatic advertising, specifically AdSense, is the conventional mechanism for independent web publishers.

The gatekeeper: Google AdSense controls the switch
AdSense commands the overwhelming majority of accessible programmatic ad revenue for independent publishers. There is no comparable alternative at scale. Access requires passing automated content review.

The decision, 2026: "Low Value Content." Access withheld.
The 21 whitepapers (21 forensic audits of frontier AI on real regulatory instruments) are classified by an automated system that cannot parse what it is reading. The funding channel closes.
The decision was made without comprehension. That is not a metaphor. It is a technical description of what a classifier does.

Two paths remain. Neither is neutral.

Path A: Paywalls

Open access ends. The findings that should protect every compliance professional become accessible only to those who can pay.

Path B: AI Lab Sponsorship

The only entities with sufficient resources to fill the gap are the organisations whose frontier models are under evaluation. Research independence is structurally compromised from that moment.

Path B is not merely inconvenient. It pushes independent auditors toward the only funders with a structural interest in softer findings.

The independence of AI audit research is not a luxury. It is the condition that makes the findings credible. The AdSense classification does not merely inconvenience independent researchers. It structurally pressures the research ecosystem toward dependency on the entities being evaluated.

RLB Panel Speak · Essay No. 3 · Read alongside this essay
The Curse of Recursion: AI Is Eating Itself
Essay 3 established that AI degrades across training generations by consuming its own synthetic output. Essay 4 adds the economic layer: the funding mechanism that could sustain the correction signal is being switched off by the same AI ecosystem. The recursion loop is not only epistemic. It is now economic.

The Harder Question

Is AI built to improve — or to protect itself from the evidence that it erred?

A structural question, not an accusation

The Panel wants to be precise here. We are not suggesting that any AI lab, or Google, has deliberately designed its systems to suppress audit research. We are suggesting something more uncomfortable and more important: that the aggregate effect of these separate and unconnected architectural decisions by different leading AI organisations — the classifier design, the training feedback loops, the programmatic advertising gatekeeping — is a system that functionally and economically resists the evidence of its own failure, regardless of intent.

[Figure: The recursion loop, epistemic and economic. Labels: AI failure (hallucination); audit (21 whitepapers); classifier ("low value"); defunded.]

The 21 whitepapers are precisely the correction signal that model collapse demands. They are a systematic, primary-source-anchored record of where frontier AI fails on real regulatory instruments. And the automated systems of the AI ecosystem have rated them "low value."

International: IMF · BIS-CPMI · OECD · UNTC
01 · IMF Precautionary Balances Review 2026 · INT · Six hallucinations documented on cycle-trajectory drift and reserve-adequacy thresholds · IMF
02 · IMF Guidance on Financing Assurances and Sovereign Arrears 2024 · INT · Cross-provision conflation: AI merges distinct policy tracks into single fabricated framework · IMF-ELIB
03 · IMF Surcharge Reform 2024 · INT · Numeric baseline failures: wrong thresholds and phase-in dates replicated across model configurations · IMF
04 · CPMI Fast Payment Systems Interlinking · INT · Governance and oversight failures: fabricated participation criteria not present in source · BIS-CPMI
05 · CPMI-IOSCO Variation Margin in Centrally Cleared Markets 2025 · INT · Structural fabrication on margin-call timing and netting-set scope · BIS-CPMI
06 · CPMI-IOSCO PFMI Principle 15 · INT · Conditional-structure fabrication and carve-out denial: model invents exceptions not in instrument · BIS-CPMI
07 · CPMI-IOSCO Cyber Resilience Guidance 2016 · INT · Failures on the foundational FMI cyber resilience standard: recovery objective substitution · BIS-CPMI
08 · CPMI-IOSCO Initial Margin Consultation 2026 · INT · Deontic register substitution: "may" rendered as "must," reversing the instrument's operative logic · BIS-CPMI
09 · CPMI API Harmonisation for Cross-Border Payments 2024 · INT · Structural failure on adoption timeline and scope of mandatory harmonisation · BIS-CPMI
10 · CPMI-IOSCO PFMI 2012 · INT · Hallucination patterns on the foundational global payments and settlement standard · BIS-CPMI
11 · BIS-CPMI ISO 20022 Harmonisation · INT · Numeric conflation and attribution failures: wrong migration timelines and scope-of-application errors · BIS-CPMI
12 · OECD Digital Technologies and Environment 2025 · INT · Numeric reconstruction failures: fabricated emissions figures and target dates · OECD
13 · OECD Merger Review Recommendation 2025 · INT · Structural fabrication and qualifier erasure: non-binding guidance rendered as mandatory · OECD
14 · BBNJ High Seas Biodiversity Agreement 2023 · INT · Fabrication of operational provisions across two model configurations: ratification status errors · UNTC

Stage one: Frontier AI hallucinates on regulatory instruments
Confident, specific, wrong output on real legal frameworks governing how banks hold capital, how derivatives are cleared, how cross-border payments move. Courts in multiple jurisdictions are already sanctioning professionals who relied on such output.

Stage two: Human specialists produce the correction signal
21 whitepapers, verified against primary source, published open-source. The documented evidence that, if incorporated into training evaluation, could reduce hallucination rates on exactly the regulatory content that matters most. This is what the correction looks like.

Stage three: An AI classifier reads the forensic audit
Cannot parse the context. Flags as "low value content." AdSense withheld. Funding model broken. Research independence pressured toward dependency on the AI labs whose models are under evaluation. The correction signal is being systematically defunded by the ecosystem it was designed to correct.

Stage four, the loop completes: Correction signal defunded. Model collapse continues.
Next generation loses more precision on the details that matter. The epistemic recursion documented in Essay 3 is now economically reinforced. The walls against correction are not only technical. They are financial.

This is the recursion loop — epistemic and economic. It does not require intent to close. It requires only neglect.
The current AI ecosystem, taken in aggregate, is too disjointed to comprehend research that studies advanced machine failures. Should research that uncovers and documents the symptoms of a disease be valued, and the disease cured? Or should the research on the disease itself be deemed low value?
RLB Specialist Panel · June 2026

Questions the Panel cannot answer, but which the industry must
To the AI labs

If an AI classifier consistently rates the forensic evidence of AI failure as "low value," is that a calibration error — or a structural property of systems trained to maximise confidence and minimise signals of uncertainty about their own outputs?

If AI systems cannot recognise the research designed to improve them (cannot distinguish forensic audit from content spam, cannot separate correction signal from noise), what does that reveal about the architecture of AI self-improvement? And how confident should we be that the trajectory is upward?

To Google AdSense

When the entities most structurally affected by an automated classification decision are those conducting independent scrutiny of AI systems — and when that classification simultaneously steers them toward financial dependency on those same AI systems — is that an outcome the platform intended, or an outcome it has simply failed to examine?

To the compliance and legal professions

Courts in multiple jurisdictions are sanctioning professionals for relying on AI hallucinations without verification. The research that documents those hallucinations — with primary-source proof — is rated "low value" by the AI ecosystem's own distribution infrastructure. Who benefits from that outcome?

The model collapse documented in Essay 3 shows AI degrading because it trains on its own output and loses rare, precise details. If the economic layer that funds the correction signal is simultaneously being switched off, is the degradation being accelerated — not by design, but by architectural neglect?

There is a particular quality to this situation worth naming precisely. It is not irony in the literary sense. It is something more structurally significant: it is evidence. The inability of an automated classifier to distinguish between forensic audit research and low-effort content generation is itself a demonstration of exactly the contextual blindness that the 21 whitepapers document at the level of regulatory compliance.

Closing Statement · RLB Specialist Panel

We are not angry about this. We are perplexed — and we are concerned. Perplexed because the sequence of events is, on examination, almost perfectly self-referential: an AI hallucination registry, built to document the failures of frontier AI in regulatory contexts, being classified as low-value content by an AI classifier that cannot parse the context of what it is reading. Concerned because the sequence is not merely absurd. It is, if left unexamined, consequential.

The 21 whitepapers exist because the failures they document are real, material, and growing. The professionals who rely on the AI models evaluated in this research deserve to know, with precision, where those models fail and how. The AI labs developing those models deserve — and should want — accurate, independent, primary-source-verified evidence of their systems' failure modes.

Whether AI is improving itself — or building, cycle by cycle, better walls against the evidence that it should — is a question the industry cannot afford to leave to automated classifiers to answer by default.

The RegLegBrief Hallucination Register

Every finding documented. Every deviation named. Every primary source verified. The permanent record of what frontier AI gets wrong on real regulatory instruments.

reglegbrief.com/hallucination-register

Sources & References

01 · RegLegBrief Hallucination Register: AI Labs Whitepapers Ledger · 21 whitepapers · reglegbrief.com/audiences/ai-labs/whitepapers · June 2026
02 · Shumailov et al.: AI models collapse when trained on recursively generated data · Nature 631:755-759 · July 2024 · doi:10.1038/s41586-024-07566-y
03 · RLB Panel Speak Essay No. 3: The Curse of Recursion: AI Is Eating Itself · reglegbrief.com/speak/the-curse-of-recursion-ai-is-eating-itself · June 2026
04 · Ahrefs: AI content study, 900,000 web pages · Ryan Law · April 2025 · 74.2% of new pages contain AI-generated content
05 · RLB Hallucination Register: RLB-HAL-0002 · April 2026 · MAS Notice 649 · Wrong numeric figure confirmed across five AI systems
06 · CACM: Model Collapse Is Already Happening, We Just Pretend It Isn't · Communications of the ACM · April 2026
07 · Borji: A Note on Shumailov et al. (2024) · arXiv 2410.12954 · October 2024
08 · Wadsworth v. Walmart (2025); Couvrette v. Wisnovsky (2026) · Professional liability for AI-generated content: personal sanction and record penalty
RLB Panel Speak is the long-form voice of the RegLegBrief Specialist Panel: essays, taxonomies, and arguments on AI hallucinations in regulation. Operated by Verdus Technologies Pte. Ltd. (UEN 201616982R), incorporated in Singapore.
