Over the course of 2025 and into 2026, the RegLegBrief Specialist Panel published 21 forensic audit whitepapers — one per regulatory instrument — documenting, with precision and primary-source verification, the specific ways in which today's most prominent frontier AI models fail when asked about real financial and legal regulations. These are not opinion pieces. They are structured evaluations: the AI model's exact output, placed alongside the authenticated primary source, with each deviation named, classified, and catalogued.
The instruments audited span the full architecture of global financial regulation: MAS Notice 637 on capital adequacy in Singapore, CFTC Regulation 4.7 on qualified eligible persons in the United States, BIS-CPMI frameworks on payment systems and cyber resilience, IMF guidance on sovereign financing and precautionary balances, FCA Consumer Duty in the United Kingdom, and more. These are not obscure corners of regulatory policy. They are the frameworks that govern how banks hold capital, how derivatives are cleared, how cross-border payments move, and how financial institutions manage risk.
The 21 whitepapers were made available as open-source research. The infrastructure cost of keeping it open was intended to be covered by standard programmatic advertising — specifically, Google AdSense, which controls the overwhelming majority of the digital advertising funds that make independent open-source research financially viable.
Google AdSense's automated review system classified the entire body of work as "Low Value Content."
The Panel sat with this finding for some time before writing about it. What follows is not a complaint about a business decision. It is an examination of what that classification reveals: structurally, architecturally, and perhaps most importantly, about the direction of AI itself.
Dense topic clustering across all 21 pages. Pattern statistically identical to a programmatic SEO content farm.
21 distinct forensic examinations of different instruments, regulators, failure modes, and jurisdictions. Forensic specificity, not topical repetition.
Low-perplexity text detected in regulatory quotes and verbatim AI output sections. Machine-generated content signature flagged.
Regulatory instruments have low perplexity by design. AI hallucinations are fluent by nature. Both are evidence. Quoting evidence is not generating it.
Low consumer search volume. No mass-market advertising match. Commercial intent insufficient. Low value.
"Low commercial value for mass-market display advertising" is not a statement about intellectual importance, professional relevance, or systemic significance.
What the 21 whitepapers actually are
A forensic ledger — not a content catalogue
Each whitepaper follows a consistent methodology. The Panel selects a specific regulatory instrument — a final rule, a published guidance document, an international standard as issued by the regulator — and verifies it against the authenticated primary source. Frontier AI models, currently Claude Opus 4.7 and Claude Sonnet 4.6 with web search enabled, are then asked structured questions about that instrument. Every response is captured verbatim. Every deviation from the primary source is classified using the RLB hallucination taxonomy: numeric substitution, structural fabrication, qualifier erasure, deontic register failure, enumeration collapse, cross-provision conflation, and others.
The result is a permanent, citable record of the specific failure — not a general claim that AI gets things wrong, but a documented case with a Register ID, a named instrument, a specific AI output, and the authenticated text it deviated from.
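One way to picture such a record is as a small, immutable data structure. The sketch below is illustrative only: the class and field names are hypothetical, the category list covers only the taxonomy entries named above (the full RLB taxonomy has others), and the example values are placeholders rather than findings from any whitepaper.

```python
from dataclasses import dataclass
from enum import Enum


class Deviation(Enum):
    # Only the categories named in the text; the real taxonomy is longer.
    NUMERIC_SUBSTITUTION = "numeric substitution"
    STRUCTURAL_FABRICATION = "structural fabrication"
    QUALIFIER_ERASURE = "qualifier erasure"
    DEONTIC_REGISTER_FAILURE = "deontic register failure"
    ENUMERATION_COLLAPSE = "enumeration collapse"
    CROSS_PROVISION_CONFLATION = "cross-provision conflation"


@dataclass(frozen=True)  # frozen: a register entry is a permanent record
class RegisterEntry:
    register_id: str          # hypothetical ID format
    instrument: str           # the named regulatory instrument
    model: str                # the AI model that produced the output
    ai_output: str            # captured verbatim
    primary_source_text: str  # the authenticated text it deviated from
    deviations: tuple[Deviation, ...]

    def has_deviation(self) -> bool:
        return len(self.deviations) > 0


# Placeholder values only; not taken from any real instrument or audit.
entry = RegisterEntry(
    register_id="EXAMPLE-0001",
    instrument="Example Instrument (placeholder)",
    model="example-model",
    ai_output="The threshold is 10 units.",
    primary_source_text="The threshold is 12 units.",
    deviations=(Deviation.NUMERIC_SUBSTITUTION,),
)
print(entry.register_id, [d.value for d in entry.deviations])
```

Keeping the verbatim output and the primary-source text in the same record is what makes each entry independently checkable.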
These are not variations on a theme. Each whitepaper is a distinct forensic examination of a different legal instrument, a different regulator, a different failure mode, and a different pattern of AI error. Just as a series of forensic toxicology reports on different substances is not "the same report repeated," these 21 documents are not 21 versions of a single finding.
How the automated system misread the evidence
Three failure modes — none of them about value
The automated classification of this body of work as "low value" can be traced to three specific structural properties of the classifier — none of which have anything to do with the intellectual quality, originality, or importance of the research.
Mistaking focus for fabrication
Automated classifiers convert text into numerical vectors to assess topic. Because all 21 whitepapers are anchored in the same tight intersection (frontier AI models, regulatory hallucinations, and global financial compliance frameworks), their semantic vectors cluster densely. To a human reader, this is a highly specialised research corpus. To a pattern-matching classifier, a site publishing a rapid burst of pages that all occupy the same dense vector region looks statistically identical to a programmatic SEO farm. The classifier flags the density of the topic cluster and the speed at which it appeared. It entirely misses the forensic originality underneath.
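The clustering effect can be reproduced with a toy example. The sketch below assumes plain word-count vectors and cosine similarity; how AdSense actually vectorises pages is not public, and production systems would more likely use learned embeddings. The sample sentences and function names are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def term_vector(text: str) -> Counter:
    # Bag-of-words counts: a crude stand-in for a semantic embedding.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_pairwise_similarity(docs: list[str]) -> float:
    vectors = [term_vector(d) for d in docs]
    pairs = list(combinations(vectors, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Illustrative one-line "pages": a focused corpus and an unrelated mix.
focused = [
    "ai model hallucination audit of capital adequacy regulation",
    "ai model hallucination audit of derivatives clearing regulation",
    "ai model hallucination audit of payment systems regulation",
]
mixed = [
    "ai model hallucination audit of capital adequacy regulation",
    "slow cooker recipes for winter evenings",
    "trail running shoes buying guide",
]

focused_score = mean_pairwise_similarity(focused)
mixed_score = mean_pairwise_similarity(mixed)
print(f"focused corpus: {focused_score:.2f}")
print(f"mixed corpus:   {mixed_score:.2f}")
```

The focused corpus scores far higher than the mixed one even though each focused page covers a different subject. A density measure of this kind cannot tell a specialised research series from templated pages, because it sees only shared vocabulary.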
The evidence mistaken for the crime
AI quality metrics measure text by how predictable its language patterns are (perplexity) and how varied its sentence structure is (burstiness). Forensic audit whitepapers necessarily quote verbatim AI hallucinations and verbatim regulatory text side by side; this is the entire methodological point. Regulatory text has low perplexity by design. AI hallucination output, produced by models trained to sound authoritative, also exhibits low perplexity. The classifier detects a "machine-generated fingerprint" and flags it. The research is being penalised for quoting the very evidence that makes it forensically valid.
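Both measures can be illustrated with a deliberately simplified sketch. It assumes a unigram model with add-one smoothing for perplexity and the coefficient of variation of sentence lengths for burstiness; real detectors typically score tokens with a neural language model, and their exact definitions may differ. All sample sentences are invented placeholders, not quotations from any regulation or AI output.

```python
import math
import re
import statistics
from collections import Counter


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())


def unigram_perplexity(text: str, reference: str) -> float:
    """Perplexity of `text` under an add-one-smoothed unigram model of `reference`."""
    ref_counts = Counter(tokens(reference))
    words = tokens(text)
    vocab = set(ref_counts) | set(words)
    total = sum(ref_counts.values()) + len(vocab)  # add-one smoothing mass
    log_prob = sum(math.log((ref_counts[w] + 1) / total) for w in words)
    return math.exp(-log_prob / len(words))


def burstiness(text: str) -> float:
    """Standard deviation of sentence length divided by the mean (in words)."""
    lengths = [len(tokens(s)) for s in re.split(r"[.!?]+", text) if tokens(s)]
    return statistics.pstdev(lengths) / statistics.mean(lengths)


# Placeholder texts written for this example.
reference = "The institution shall maintain capital. The institution shall report capital."
formulaic = "The institution shall hold capital."
casual = "Honestly, nobody expected rain today."

uniform = (
    "The institution shall maintain capital. "
    "The institution shall report capital. "
    "The institution shall hold capital."
)
varied = "Rain. After three hours at the shuttered station, nobody spoke. We left."

print("perplexity, formulaic:", round(unigram_perplexity(formulaic, reference), 2))
print("perplexity, casual:   ", round(unigram_perplexity(casual, reference), 2))
print("burstiness, uniform:  ", round(burstiness(uniform), 2))
print("burstiness, varied:   ", round(burstiness(varied), 2))
```

Formulaic, rule-like prose scores low on both measures regardless of who wrote it. A page that quotes such text at length inherits those scores, which is the mechanism described above.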
The wrong definition of "value"
AdSense's measure of "value" is calibrated to consumer advertising: can this page be used to place relevant product advertisements to a general public audience? The 21 whitepapers address enterprise-level compliance liability, sovereign financing frameworks, and cross-border derivatives clearing standards. What AdSense means, precisely, is "low commercial value for mass-market display advertising." It says nothing whatsoever about the intellectual, professional, or systemic importance of the research.
AdSense does not merely classify — it controls the funding pipeline
The economic architecture of independent research, and who controls the switch
This would be a story about a misfiring classifier if Google AdSense were one advertising network among many equally accessible alternatives. It is not. AdSense controls the overwhelming majority of programmatic advertising revenue available to independent open-source web publishers. When it withholds access, it does not merely mislabel content. It cuts off the primary funding mechanism that makes independent AI audit research financially sustainable without institutional backing.
Paywalls
Open access ends. The findings that should protect every compliance professional become accessible only to those who can pay.
AI Lab Sponsorship
The only entities with sufficient resources to fill the gap are the organisations whose frontier models are under evaluation. Research independence is structurally compromised from that moment.
The independence of AI audit research is not a luxury. It is the condition that makes the findings credible. The AdSense classification does not merely inconvenience independent researchers. It structurally pressures the research ecosystem toward dependency on the entities being evaluated.
Is AI built to improve — or to protect itself from the evidence that it erred?
A structural question, not an accusation
The Panel wants to be precise here. We are not suggesting that any AI lab, or Google, has deliberately designed its systems to suppress audit research. We are suggesting something more uncomfortable and more important: that the aggregate effect of these separate and unconnected architectural decisions by different leading AI organisations — the classifier design, the training feedback loops, the programmatic advertising gatekeeping — is a system that functionally and economically resists the evidence of its own failure, regardless of intent.
The 21 whitepapers are precisely the correction signal that model collapse demands. They are a systematic, primary-source-anchored record of where frontier AI fails on real regulatory instruments. And the automated systems of the AI ecosystem have rated them "low value."
The current aggregated AI ecosystem is literally too disjointed to comprehend research that studies advanced machine failures. Should research that uncovers and documents the symptoms of a disease be valued, and the disease be cured? Or should the research on the disease itself be deemed low value?
RLB Specialist Panel · June 2026
If an AI classifier consistently rates the forensic evidence of AI failure as "low value," is that a calibration error — or a structural property of systems trained to maximise confidence and minimise signals of uncertainty about their own outputs?
If AI systems cannot recognise the research designed to improve them — cannot distinguish forensic audit from content spam, cannot identify correction signal from noise — what does that reveal about the architecture of AI self-improvement? And how confident should we be that the trajectory is upward?
When the entities most structurally affected by an automated classification decision are those conducting independent scrutiny of AI systems — and when that classification simultaneously steers them toward financial dependency on those same AI systems — is that an outcome the platform intended, or an outcome it has simply failed to examine?
Courts in multiple jurisdictions are sanctioning professionals for relying on AI hallucinations without verification. The research that documents those hallucinations — with primary-source proof — is rated "low value" by the AI ecosystem's own distribution infrastructure. Who benefits from that outcome?
The model collapse documented in Essay 3 shows AI degrading because it trains on its own output and loses rare, precise details. If the economic layer that funds the correction signal is simultaneously being switched off, is the degradation being accelerated — not by design, but by architectural neglect?
There is a particular quality to this situation worth naming precisely. It is not irony in the literary sense. It is something more structurally significant: it is evidence. The inability of an automated classifier to distinguish between forensic audit research and low-effort content generation is itself a demonstration of exactly the contextual blindness that the 21 whitepapers document at the level of regulatory compliance.
We are not angry about this. We are perplexed — and we are concerned. Perplexed because the sequence of events is, on examination, almost perfectly self-referential: an AI hallucination registry, built to document the failures of frontier AI in regulatory contexts, being classified as low-value content by an AI classifier that cannot parse the context of what it is reading. Concerned because the sequence is not merely absurd. It is, if left unexamined, consequential.
The 21 whitepapers exist because the failures they document are real, material, and growing. The professionals who rely on the AI models evaluated in this research deserve to know, with precision, where those models fail and how. The AI labs developing those models deserve — and should want — accurate, independent, primary-source-verified evidence of their systems' failure modes.
Whether AI is improving itself — or building, cycle by cycle, better walls against the evidence that it should — is a question the industry cannot afford to leave to automated classifiers to answer by default.
The RegLegBrief Hallucination Register
Every finding documented. Every deviation named. Every primary source verified. The permanent record of what frontier AI gets wrong on real regulatory instruments.
reglegbrief.com/hallucination-register
Sources & References
21 whitepapers · reglegbrief.com/audiences/ai-labs/whitepapers · June 2026
Nature 631:755-759 · July 2024 · doi:10.1038/s41586-024-07566-y
reglegbrief.com/speak/the-curse-of-recursion-ai-is-eating-itself · June 2026
Ryan Law · April 2025 · 74.2% of new pages contain AI-generated content
April 2026 · MAS Notice 649 · Wrong numeric figure confirmed across five AI systems
Communications of the ACM · April 2026
arXiv 2410.12954 · October 2024
Professional liability for AI-generated content — personal sanction and record penalty