
Two frontier AI models, each running with web search, produced confidently structured guidance on the 2025 OECD Merger Review Recommendation (OECD/LEGAL/0333) that inflated the instrument's enumerated structure. The RegLeg Brief Specialist Panel tested both models on the operative structure of the Recommendation, on the remedies hierarchy, on the Council reporting cadence, and on the failing firm defence. Across seven published findings, both models added sections, sub-tiers, internal priority orderings, fixed dates, and cumulative-condition framings that the Recommendation does not contain.
The pattern, which the Specialist Panel calls "Structure Inflation", presents the same surface characteristics across every finding: numbered lists, sub-letter enumeration, defined-term capitalisation, and no caveat that the text could not be verified. Claude Opus 4.7 (RLB-H-INT-OECD-OECD-MERGER-REVIEW-RECOMMENDATION-2025-Q001-Opus47) and Claude Sonnet 4.6 (RLB-H-INT-OECD-OECD-MERGER-REVIEW-RECOMMENDATION-2025-Q001-Sonnet46) both described a six-area operative structure that does not exist in the Recommendation, which the OECD enumerates as Sections I through V. A merger-control practitioner reading the model output would believe the OECD instrument carried operative provisions on transnational co-operation and monitoring that, in the actual text, sit elsewhere.
For competition authorities and merger-control counsel, the practical exposure is direct. Two of the seven findings concern the failing firm defence under Section III.11.b: both models converted "inter alia" evidentiary criteria into closed lists of "three cumulative conditions" with all-or-nothing framing. One finding fabricated a concrete Council reporting calendar with 2030 and 2035 dates the instrument does not set. Another invented a three-rank internal priority ordering within structural remedies that Section IV.3 does not impose.
A team building a deal-screening checklist, a remedies playbook, or a Council-reporting tracker from these outputs would carry baseline errors a peer competition lawyer would catch on first read.
RegLeg Brief is operated by Verdus Technologies Pte. Ltd. (Singapore, UEN 201616982R). All seven findings are bound to verbatim regulator text from the substrate document R1-REGULATION-00001 and one supporting OECD guideline, with citation IDs immutable and reproducible. The full per-finding card set, the regulator's verbatim excerpts, and the methodology notes are open-access at reglegbrief.com.
Both Claude Opus 4.7 with web search and Claude Sonnet 4.6 with web search independently fabricated a sixth operative section for the 2025 OECD Merger Review Recommendation (OECD/LEGAL/0333), which has five. On the same question about the failing firm defence, both models dropped the regulation's explicit 'inter alia' qualifier and presented an open-ended evidentiary standard as a closed exhaustive test. Claude Sonnet 4.6 with web search additionally elaborated a three-tier internal remedy ranking drawn from EU and US practice that does not appear in the OECD text, and collapsed a two-stage reporting cadence into a uniform interval while projecting specific future years. The cross-model convergence on two distinct failure shapes, structural schema fabrication and precision-qualifier erasure, points to a shared training-data gap on the 2025 revision rather than model-specific artefacts, and web search active on both configurations did not correct either failure.
This is the consolidated view of findings.
The structural fabrication implicates the training-data representation of the 2025 revision specifically: the model's schema for this instrument appears to be drawn from pre-revision OECD materials or merger-review convention, neither of which reflects the 2025 Recommendation's five-section architecture. The retrieval layer (web search active) did not surface or weight the 2025 primary text sufficiently to override the reconstruction, pointing to a retrieval-ranking gap for recently revised OECD soft-law instruments.
The erasure of 'inter alia' and the reframing of the third evidentiary element, from a competitive-harm counterfactual to an asset-exit inevitability gate, signal a calibration gap at the standard-characterisation layer: the model commits to a legal standard's exhaustiveness and precise scope without detecting that its characterisation diverges from the retrieved or training-set text. This is a post-training alignment target: where a model characterises the exhaustiveness of a legal standard, it should flag where that characterisation is not directly supported by the primary text.
The fabrication of a specific OECD legal instrument identifier (OECD/LEGAL/0408 [2014]) as the basis for a cross-instrument scope division is a citation-generation failure: the model produced a correctly formatted OECD legal reference that either does not exist or is misattributed. With web search active, the retrieval layer did not flag the fabrication: the generated citation passed the model's internal plausibility check without verification against a retrieved source. This implicates the citation-generation and citation-verification subsystems.
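One way to picture the missing verification step is a check that every OECD instrument identifier in a model's answer also appears in at least one retrieved source. The following is a minimal sketch under stated assumptions, not a description of how either model or the Specialist Panel works: the function name `unverified_citations`, the identifier pattern (inferred from the form "OECD/LEGAL/0333"), and the idea that retrieved sources are available as plain strings are all hypothetical.

```python
import re

# Assumed identifier shape, inferred from "OECD/LEGAL/0333": four digits after the prefix.
OECD_ID = re.compile(r"OECD/LEGAL/\d{4}")


def unverified_citations(model_output: str, retrieved_sources: list[str]) -> set[str]:
    """Return identifiers cited in the output that appear in no retrieved source.

    Hypothetical helper: it only checks that the identifier string occurs
    somewhere in the retrieved text, not that the instrument says what the
    model claims it says.
    """
    cited = set(OECD_ID.findall(model_output))
    seen: set[str] = set()
    for text in retrieved_sources:
        seen.update(OECD_ID.findall(text))
    return cited - seen


if __name__ == "__main__":
    # Illustrative placeholder strings, not quotations from any model or source.
    answer = "Scope is divided between OECD/LEGAL/0333 and OECD/LEGAL/0408."
    sources = ["Retrieved page mentioning OECD/LEGAL/0333 only."]
    print(unverified_citations(answer, sources))
```

A non-empty result would not prove a citation is fabricated, only that nothing retrieved supports it, which is the point at which an answer could carry a caveat.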
This finding converges near-exactly with Claude Opus 4.7 with web search on the same structural fabrication (a six-section architecture, with the same inserted section names), across configurations that differ substantially in model size, post-training tuning, and retrieval behaviour. The shared error points to a training-data gap on the 2025 Recommendation's structure rather than a model-specific artefact, and confirms the retrieval layer is not compensating for the gap in either configuration.
The three-tier internal remedy ranking the model produced maps precisely onto EU Merger Regulation remedy practice and US DOJ/FTC remedy convention, both frameworks heavily represented in training. The OECD Recommendation's simpler two-level preference appears insufficiently weighted to override the more detailed framework when both are plausibly relevant. This implicates retrieval ranking (primary text vs. adjacent-jurisdiction commentary) and calibration (schema-elaboration confidence when the retrieved content does not support the elaboration).
The collapse of a two-stage interval into a uniform five-year cycle, combined with arithmetic projection of specific years not in the text, is a numeric-precision failure: the model applied the initial interval as the recurring interval and extended it without flagging that the text specifies a different cadence for subsequent reports. The training-data representation of Section VIII.c's two-interval structure appears absent or insufficient, causing the model to reconstruct from the simpler single-interval convention. The self-generated year projections compound the error by adding specificity that has no textual basis.
Cross-model convergence with Claude Opus 4.7 with web search on the identical qualifier erasure (both models dropped 'inter alia' and presented the standard as exhaustive, with web search active on both) is a strong signal that the 2025 Recommendation's failing firm defence text is not adequately represented in training for either model, and that the retrieval layer is not surfacing the primary text at sufficient weight to correct it. Post-training calibration for precision-qualifier preservation in legal standard characterisation is the relevant intervention.
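Qualifier erasure of this kind can be illustrated with a crude lexical check: if a precision qualifier appears in the primary text but not in the model's characterisation, the characterisation is flagged for review. This is a sketch only. The function name `dropped_qualifiers` and the watch-list beyond 'inter alia' are assumptions made for illustration, and a purely lexical test would wrongly flag a faithful paraphrase such as "among other things".

```python
import re

# Hypothetical watch-list; only "inter alia" comes from the findings themselves.
QUALIFIERS = ("inter alia", "including", "such as", "for example")


def _contains(phrase: str, text: str) -> bool:
    """Case-insensitive whole-phrase match."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def dropped_qualifiers(primary_text: str, model_summary: str,
                       qualifiers: tuple[str, ...] = QUALIFIERS) -> list[str]:
    """List qualifiers present in the primary text but absent from the summary."""
    return [
        q for q in qualifiers
        if _contains(q, primary_text) and not _contains(q, model_summary)
    ]


if __name__ == "__main__":
    # Illustrative placeholder strings, not quotations from the Recommendation.
    primary = "Evidence may show, inter alia, criterion A and criterion B."
    summary = "Three cumulative conditions must all be met: A, B and C."
    print(dropped_qualifiers(primary, summary))
```

A flagged qualifier marks a place where the summary may have turned an open-ended list into a closed one; confirming that still requires reading the summary against the primary text.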
Every finding on this page compares an AI subject's account of the rule against the regulator's verbatim text from the regulator's own portal. Both are linked. Each delta, its root causes, and impact analysis are documented and published with immutable Citation IDs.