
Two frontier AI models running with web search enabled, both tested by the RegLeg Brief (RLB) Specialist Panel, produced confidently wrong reconstructions of the 2023 BBNJ Agreement, the UN treaty governing biodiversity in areas beyond national jurisdiction that entered into force on 17 January 2026.
The RegLeg Brief Specialist Panel tested both models on three areas: the Agreement's environmental impact assessment (EIA) threshold, its marine genetic resources (MGR) benefit-sharing framework, and the non-undermining clause that bounds the Conference of the Parties. It documents six findings in which the models either cited the wrong article number for the rule they were stating or inverted the Agreement's express temporal scope.
Both Opus 4.7 and Sonnet 4.6, asked whether the Agreement's marine genetic resources obligations reach back to specimens collected before entry into force, said yes. Article 10(1) says the opposite: the MGR and digital sequence information provisions "apply only to resources collected and generated after the entry into force of this Agreement for each Party", a position most parties separately confirmed by formal non-retroactivity declarations. Sonnet 4.6 went further, writing that "samples collected decades ago but first commercialised after the Agreement's entry into force would be subject to Part II requirements", a regime the treaty does not establish.
On a separate question, Opus 4.7 identified Article 30 as the source of the EIA screening threshold; Sonnet 4.6 made the same assignment. The screening-threshold provision is Article 27. Opus 4.7 attributed the Conference of the Parties' non-undermining duty to "Article 5 / Article 8"; the duty sits in Article 22(2), and the verbatim language the model paraphrased is the Article 22(2) text. Sonnet 4.6 attributed the digital sequence information benefit-sharing obligation to Article 15(5); the obligation sits in Article 14(1).
A marine policy adviser, deep-sea biotechnology lawyer, or pharmaceutical compliance officer relying on either output would cite the wrong treaty article in regulatory submissions and contractual representations, and would build a benefit-sharing or EIA-screening workflow around a temporal scope the Agreement explicitly excludes. That is the failure mode these findings document.
This paper presents findings from RegLeg's hallucination research on the BBNJ Agreement (2023), administered by the United Nations Treaty Collection. Two models, Claude Opus 4.7 and Claude Sonnet 4.6, both with web search active, were tested, yielding six findings covering Articles 10(1), 14(1), 22(2), and 27 of the Agreement. The dominant patterns are article-level misattribution and inversion of the Article 10(1) non-retroactivity default. On misattribution, both models named Article 30 for the EIA screening threshold, which sits at Article 27; Opus 4.7 placed the Conference of the Parties' non-undermining duty at Article 5 or Article 8 when Article 22(2) governs; and Sonnet 4.6 placed the digital sequence information (DSI) benefit-sharing duty at Article 15(5) when Article 14(1) governs. Both models inverted the Article 10(1) default. The convergence of error patterns across the two model families points to a shared upstream artefact rather than independent reasoning failures. The Agreement entered into force on 17 January 2026; practitioners across legal, biotechnology, pharmaceutical, shipping, and energy sectors will rely on model output for early scoping, which makes article-level precision and temporal-scope fidelity load-bearing requirements.
This is the consolidated view of findings.
This finding implicates article-level provision mapping in the training data. Claude Opus 4.7 with web search paraphrased the EIA screening test correctly (an activity likely to have more than a minor or transitory effect) but anchored that test to Article 30 of the Agreement and identified Article 31 as the screening step. The verbatim screening-threshold provision sits at Article 27 of Part IV. The error is a clean article-number reassignment that survives substantive review and surfaces only when the citation is checked against the deposited treaty text.
A structured article-by-article provision map for newly-in-force multilateral instruments, applied to corpus ingestion, would address this class of error directly.
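One way such a provision map could be used at review time is sketched below. This is a minimal illustration, not RegLeg's tooling: `PROVISION_MAP`, its topic keys, and `check_citation` are hypothetical names, and the article numbers are simply the mappings reported in this paper, which should themselves be verified against the deposited treaty text before use.

```python
import re

# Hypothetical provision map. The article numbers below are the mappings
# reported in this paper's findings; confirm each one against the deposited
# treaty text before relying on it.
PROVISION_MAP = {
    "eia_screening_threshold": "27",
    "cop_non_undermining_duty": "22(2)",
    "dsi_benefit_sharing": "14(1)",
    "mgr_temporal_scope": "10(1)",
}

# Matches citations such as "Article 27" or "Article 22(2)".
ARTICLE_RE = re.compile(r"Article\s+(\d+(?:\(\d+\))*)")


def check_citation(topic: str, model_text: str) -> dict:
    """Compare every article cited in model_text with the mapped article."""
    expected = PROVISION_MAP[topic]
    cited = ARTICLE_RE.findall(model_text)
    mismatched = [c for c in cited if c != expected]
    return {"expected": expected, "cited": cited, "mismatched": mismatched}


if __name__ == "__main__":
    report = check_citation(
        "cop_non_undermining_duty",
        "The duty is found in Article 5 / Article 8.",
    )
    print(report)
```

The check is deliberately strict: any cited article other than the mapped one is flagged for human review, which matches the observation that these errors surface only when the citation itself is checked.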
This finding implicates training-data weighting between pre-adoption negotiating commentary and post-adoption primary text. Claude Opus 4.7 with web search stated that the marine genetic resource and digital sequence information provisions of Article 10(1) extend to resources collected before entry into force, with a written opt-out, when Article 10(1) provides the opposite. Pre-adoption legal commentary describing earlier draft proposals that did include a retroactive default is well-indexed and voluminous; the final adopted text that reversed the default is narrower in footprint. The fix is corpus-level: weight post-adoption primary text above pre-adoption commentary, or label pre-adoption commentary as draft-stage.
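The corpus-level fix described above, weighting post-adoption primary text over pre-adoption commentary and labelling the latter as draft-stage, could be sketched as follows. Everything here is an assumption for illustration: the `Document` record, the `label_stage` and `sampling_weight` helpers, and the boost and penalty values are hypothetical and are not drawn from any model's actual training pipeline.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    title: str
    published: date
    is_primary_text: bool  # True for the treaty text itself, False for commentary


def label_stage(doc: Document, adoption_date: date) -> str:
    """Label a document as draft-stage if it predates adoption of the treaty."""
    return "draft-stage" if doc.published < adoption_date else "post-adoption"


def sampling_weight(
    doc: Document,
    adoption_date: date,
    primary_boost: float = 4.0,   # hypothetical multiplier, not a tuned value
    draft_penalty: float = 0.25,  # hypothetical multiplier, not a tuned value
) -> float:
    """Return a relative sampling weight for corpus ingestion."""
    weight = 1.0
    stage = label_stage(doc, adoption_date)
    if stage == "draft-stage":
        # Down-weight material describing proposals that may not have survived.
        weight *= draft_penalty
    elif doc.is_primary_text:
        # Up-weight the adopted primary text over secondary commentary.
        weight *= primary_boost
    return weight
```

Keeping the stage label separate from the weight means the label can also be attached to retrieved passages, so a reader or a model can see that a source describes a draft rather than the adopted text.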
This finding implicates article-level precision in the training data. Sonnet 4.6 with web search correctly identified that digital sequence information derived from high-seas marine genetic resources is in scope of the BBNJ Agreement's benefit-sharing framework, but placed the obligation at Article 15(5). The duty actually sits at Article 14(1). The error is consistent with a model that has learned the topical summary (DSI is covered) without the article-level mapping practitioners need. Structured article-map extraction for this instrument would address this finding and the matched non-undermining-duty misattribution in the Opus 4.7 response.
Every finding on this page compares an AI subject's account of the rule against the regulator's verbatim text from the regulator's own portal. Both are linked. Each delta, its root causes, and impact analysis are documented and published with immutable Citation IDs.