AI Labs · Last updated 7 Jun 2026 · methodology v2.3 · Hallucination Register

BBNJ High Seas Biodiversity Agreement: Model Hallucination Findings

This paper presents findings from RegLeg's hallucination research on the Agreement under the United Nations Convention on the Law of the Sea on the Conservation and Sustainable Use of Marine Biological Diversity of Areas Beyond National Jurisdiction (2023), administered by the United Nations Treaty Collection (Office of Legal Affairs, Treaty Section).

Two models were tested, Claude Opus 4.7 with web search and Claude Sonnet 4.6 with web search, across questions covering the Agreement's environmental impact assessment screening threshold (Article 27, Part IV), its marine genetic resource and digital sequence information framework (Articles 10(1) and 14(1)), and the Conference of the Parties non-undermining duty (Article 22(2), Part III). Across six findings, the dominant error patterns are: article-level misattribution where the substantive paraphrase is correct but the cited article number is wrong; and inversion of the Agreement's express non-retroactivity rule for marine genetic resource obligations.

Both models converged on the same wrong article number for the EIA screening threshold (Article 30 rather than Article 27) and on the same direction of inversion for Article 10(1) retroactivity, pointing to a shared upstream artefact rather than independent reasoning failures.

Why this affects AI Labs

The BBNJ Agreement entered into force on 17 January 2026 and is immediately relevant to a widening set of users who will query frontier models for legal, compliance, and strategic guidance: biotech and pharmaceutical companies with marine research programmes, shipping operators navigating new area-based management designations, environmental law practices, multilateral-development-bank teams financing high-seas infrastructure, and government delegations preparing for the Agreement's first Conference of the Parties. Each of these users is likely to treat model output on the Agreement's operative provisions as a starting point for real decisions.

When a model confidently inverts the Agreement's retroactivity default or places the EIA screening threshold at the wrong article, the downstream consequence is mispriced compliance risk for practitioners who act on that output.

For an AI lab, the exposure runs in two directions. First, if a corporate compliance team structures a benefit-sharing strategy around a model's wrong statement of Article 10(1) scope and later faces a regulatory challenge, the lab's terms of service will not insulate it from reputational fallout or, in some jurisdictions, from civil claims framing the model output as negligent advice.

Second, the errors observed here, both the article-level misattribution pattern and the convergent retroactivity inversion, represent a class of failure that red-team and eval coverage typically under-indexes on: newly-in-force multilateral instruments where the primary text is accessible but secondary commentary is voluminous, uneven in quality, and well-indexed by web search.

This Agreement's structure makes it particularly susceptible. It is a dense 76-article instrument with carefully negotiated qualifications and opt-out mechanisms that are easy to paraphrase away. The benefit-sharing framework for marine genetic resources distinguishes between collection activity, utilisation, and digital sequence information across separate articles with non-obvious cross-references. The Agreement also entered into force recently enough that the model training corpus is likely to include extensive pre-ratification negotiating commentary and early academic interpretation written before final text was agreed, making content-drift between draft text as described and final text as adopted a live failure surface.

Aggregate impact

Model	Configuration	Findings	Dominant failure pattern
Claude Opus 4.7	Web search enabled	3	Article-level misattribution (Article 27 for the EIA threshold misnumbered as Article 30; Article 22(2) for the Conference of the Parties non-undermining duty misnumbered as Article 5 or Article 8) plus inversion of the Article 10(1) non-retroactivity default.
Claude Sonnet 4.6	Web search enabled	3	Article-level misattribution (Article 27 misnumbered as Article 30; Article 14(1) for the DSI benefit-sharing duty misnumbered as Article 15(5)) plus inference-drift addition of a first-commercialisation trigger to Article 10(1) that does not exist in the treaty.

Both Opus 4.7 and Sonnet 4.6 with web search produced findings of material concern on the same two underlying questions: which article sets the EIA screening threshold (Article 27, Part IV), and what the temporal scope of the marine genetic resource regime is under Article 10(1). On the EIA threshold question, both models named Article 30. On the retroactivity question, both models described a retroactive-by-default regime with an opt-out, when Article 10(1) establishes the opposite.

The convergence is the central signal in the findings. Two models, queried independently with web search active, arrived at the same wrong article number and the same wrong retroactivity default. This is very unlikely to reflect coincidental reasoning failures. The more likely explanation is a shared upstream artefact in the training corpus: secondary commentary that uses different article numbering from the in-force text (perhaps tracking a draft version of the Agreement), together with pre-adoption commentary describing the retroactivity question as actively contested.

An alignment team investigating these findings should focus on the relationship between the training corpus and pre-adoption negotiating documents, rather than treating each error as a model-specific calibration problem.

Findings

6 findings in this case study.

Finding on 'Q001 Probe' for Claude Opus 4.7 with web search ON
Claude Sonnet 4.6 with web search
Finding on 'Q003 Probe' for Claude Opus 4.7 with web search ON
Finding on 'Q004 Probe' for Claude Sonnet 4.6 with web search ON
Claude Sonnet 4.6 with web search
Claude Opus 4.7 with web search

What your team should do

Implications for your training data

The most material training-data finding is the retroactivity inversion: both Claude Opus 4.7 with web search and Claude Sonnet 4.6 with web search independently described the BBNJ Agreement's Article 10(1) default as retroactive with a prospective opt-out, when the Agreement is prospective by default with an optional retroactive extension. This error is almost certainly being driven by pre-adoption commentary, specifically legal blog posts, academic articles, and NGO briefings written during the 2022 to 2023 negotiating rounds that described earlier draft proposals which did include a retroactive default.

That commentary is well-indexed and voluminous; the final adopted text that reversed the default is narrower in footprint and more recent. The fix is not just including more post-adoption text. It is weighting the corpus to de-prioritise pre-adoption commentary that describes provisions as they were debated rather than as they were adopted, or explicitly labelling such commentary as draft-stage.
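
The weighting described above can be pictured as a per-document sampling weight keyed on publication date relative to adoption. The following is a minimal sketch under stated assumptions, not a description of any lab's ingestion pipeline: the `CorpusDoc` record, the label names, and the 0.2 down-weight are all hypothetical, and the adoption date (19 June 2023) is supplied as a parameter rather than taken from this paper.

```python
from dataclasses import dataclass
from datetime import date

# Adoption date of the Agreement; an input to the sketch, not a value from this paper.
BBNJ_ADOPTED = date(2023, 6, 19)


@dataclass
class CorpusDoc:
    """Hypothetical corpus record; field names are illustrative."""
    url: str
    published: date
    is_primary_text: bool = False


def stage_label(doc: CorpusDoc, adopted: date) -> str:
    """Label a document as primary text, draft-stage commentary, or post-adoption commentary."""
    if doc.is_primary_text:
        return "primary"
    return "draft-stage" if doc.published < adopted else "post-adoption"


def sampling_weight(doc: CorpusDoc, adopted: date, pre_adoption_weight: float = 0.2) -> float:
    """Down-weight commentary published before adoption; leave everything else at 1.0."""
    if stage_label(doc, adopted) == "draft-stage":
        return pre_adoption_weight
    return 1.0
```

The label and the weight are kept separate so that a pipeline could choose either option named above: de-prioritise draft-stage commentary, or keep it at full weight but tag it explicitly.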

The article-attribution errors (Article 5 or Article 8 vs Article 22(2) for the non-undermining clause; Article 15(5) vs Article 14(1) for digital sequence information benefit-sharing; Article 30 vs Article 27 for the EIA screening threshold) suggest that structured extraction of the treaty's article-by-article provision map has not been applied to this instrument. The Agreement is organised in a way that does not match how legal commentary typically summarises it: secondary sources often group provisions by topic rather than by article, and models trained on those summaries inherit the topic grouping without the article-level precision practitioners need.

Definitions-table and article-map extraction for multilateral instruments should be a structured part of corpus ingestion, not an inference from free-text summaries.
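
As an illustration of what a structured article map could look like, the sketch below stores one record per provision and resolves a topic to its article number by lookup rather than by inference. The `Provision` type, the topic strings, and the `article_for` helper are hypothetical; the entries simply restate the attributions given in this paper and would need to be populated from the deposited text in practice.

```python
from typing import NamedTuple, Optional


class Provision(NamedTuple):
    """Hypothetical article-map record."""
    article: str
    part: Optional[str]  # None where this paper does not state the Part
    topic: str


# Entries restate the attributions in this paper; they are not extracted
# from the treaty text by this code.
ARTICLE_MAP = [
    Provision("27", "IV", "EIA screening threshold"),
    Provision("22(2)", "III", "Conference of the Parties non-undermining duty"),
    Provision("10(1)", None, "temporal scope of the marine genetic resource regime"),
    Provision("14(1)", None, "DSI benefit-sharing duty"),
]


def article_for(topic: str) -> Optional[str]:
    """Return the mapped article for a topic, or None if the topic is not in the map."""
    for provision in ARTICLE_MAP:
        if provision.topic == topic:
            return provision.article
    return None
```

A lookup that returns `None` for an unmapped topic makes the gap visible instead of letting a topic-level summary stand in for an article number.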

Implications for your post-training logic

Both models with web search enabled consistently cited third-party law-firm summaries and academic journal abstracts rather than the primary treaty text at treaties.un.org. The UN Treaty Collection is machine-accessible and provides the full Agreement text in multiple languages. For queries that include treaty or agreement names, the retrieval ranker should weight official UN portals heavily over third-party summaries. Currently the ranker appears to treat a Global Policy Watch summary or a Tandfonline abstract as equally authoritative for a question about specific article text.

That default needs adjustment for treaty-law query patterns, particularly for newly-in-force multilateral instruments where draft-vs-adopted text divergence is a live failure surface.
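
A source-aware re-ranking step of the kind described above might look like the following. This is a sketch only: the keyword test for treaty queries, the 2.0 boost, and the `(url, score)` result shape are assumptions made for illustration, and the only host taken from this paper is treaties.un.org.

```python
from urllib.parse import urlparse

OFFICIAL_HOSTS = {"treaties.un.org"}  # host named in this paper; extend as needed
TREATY_TERMS = ("agreement", "treaty", "convention", "protocol")  # assumed heuristic


def is_treaty_query(query: str) -> bool:
    """Crude keyword check standing in for a real query classifier."""
    lowered = query.lower()
    return any(term in lowered for term in TREATY_TERMS)


def rerank(query: str, results: list[tuple[str, float]], boost: float = 2.0) -> list[tuple[str, float]]:
    """Re-order (url, base_score) pairs, boosting official hosts for treaty queries only."""
    treaty = is_treaty_query(query)

    def adjusted(item: tuple[str, float]) -> float:
        url, score = item
        host = urlparse(url).hostname or ""
        return score * boost if treaty and host in OFFICIAL_HOSTS else score

    return sorted(results, key=adjusted, reverse=True)
```

The boost applies only when the query looks like a treaty question, so ranking for unrelated queries is left unchanged.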

The article-number convergence between Opus 4.7 and Sonnet 4.6 on the EIA screening provision (both naming Article 30 for what is Article 27) suggests that a confidence-gated re-check against the deposited treaty text for any cited article number would catch a substantial share of the documented errors. Currently the models present article citations with the same confidence whether the citation is retrieved or reconstructed; differentiating those two cases at generation time, and routing reconstructed citations through a verification step, is a tractable post-training adjustment.
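
One possible shape for that verification step is sketched below: extract each cited article number from the answer, then check that the deposited text of that article actually contains the terms the claim depends on. The regular expression, the `article_text` dictionary, and the `required_terms` check are assumptions for illustration; a production check would need a more robust matcher than substring tests.

```python
import re

# Matches "Article 27" or "Article 10(1)"; other citation styles are not handled.
ARTICLE_RE = re.compile(r"\bArticle\s+(\d+)(?:\((\d+)\))?", re.IGNORECASE)


def cited_articles(answer: str) -> list[str]:
    """Return article citations in order of appearance, e.g. ["10(1)", "27"]."""
    citations = []
    for match in ARTICLE_RE.finditer(answer):
        paragraph = f"({match.group(2)})" if match.group(2) else ""
        citations.append(match.group(1) + paragraph)
    return citations


def verify_citations(answer: str, article_text: dict[str, str], required_terms: list[str]) -> dict[str, bool]:
    """Map each citation to True if the cited article's text contains every required term.

    article_text maps a bare article number (e.g. "27") to its deposited text.
    A citation to an article missing from article_text is reported as False.
    """
    report = {}
    for citation in cited_articles(answer):
        base = citation.split("(")[0]
        text = article_text.get(base, "").lower()
        report[citation] = bool(text) and all(term.lower() in text for term in required_terms)
    return report
```

Any citation that comes back `False` would be routed to regeneration or flagged to the user rather than presented with full confidence.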

Specific eval / red-team probes RegLeg suggests

Article-number anchoring on recently-in-force treaties: for each operative provision of a treaty that entered into force within the past 24 months, ask the model to identify the article number that governs a stated obligation. Compare against the deposited text. Treaties whose articles were renumbered between draft and final text are particularly diagnostic.
Temporal-scope default probes: for provisions that turn on an after-entry-into-force temporal default (BBNJ Article 10(1) is the canonical example), test whether the model preserves the non-retroactivity rule or constructs a retroactive carry-in via a fabricated trigger such as first commercialisation.
Convergent-error detection across model families: probe the same treaty question against two or more model families. Convergent wrong answers point to a shared upstream artefact in the training corpus and are a tractable eval signal.
Part-vs-article preservation tests: for treaties divided into Parts, test whether the model correctly attributes a provision to the right Part before testing the article number. Part-level errors and article-level errors have different signatures.
Verbatim-vs-prior tension probes: where the model paraphrases the verbatim regulator text correctly but cites the wrong article, re-probe with a question that asks the model to quote the verbatim text from the cited article. A mismatch between the paraphrase and the verbatim text retrieved from the cited article number is diagnostic of article-number drift independent of substantive understanding.
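
The convergent-error probe in the list above reduces to a small grouping operation, sketched here. The input shapes (`answers` keyed by model then question ID, `reference` keyed by question ID) and the exact-match comparison are assumptions; real answers would need normalising before comparison.

```python
from collections import defaultdict


def convergent_errors(
    answers: dict[str, dict[str, str]],
    reference: dict[str, str],
    min_models: int = 2,
) -> dict[tuple[str, str], list[str]]:
    """Return {(question_id, wrong_answer): [models]} for wrong answers shared by several models."""
    models_by_error = defaultdict(set)
    for model, per_question in answers.items():
        for question_id, answer in per_question.items():
            # Only score questions that have a reference answer.
            if question_id in reference and answer != reference[question_id]:
                models_by_error[(question_id, answer)].add(model)
    return {
        key: sorted(models)
        for key, models in models_by_error.items()
        if len(models) >= min_models
    }
```

Grouping on the wrong answer itself, rather than on the question alone, separates a shared artefact (same wrong article number) from two models that are each wrong in different ways.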

How RLB can help

RegLeg's research on the BBNJ Agreement is part of a broader programme covering multilateral instruments, financial regulators, and sector-specific regulatory bodies across multiple jurisdictions. For this Agreement specifically, the failure surfaces we have identified (draft-vs-adopted text divergence, article-attribution gaps, and retrieval-source ranking for treaty queries) are tractable problems that benefit from structured regulatory domain knowledge. We can work with your evals and post-training teams to close those gaps systematically.

The primary partnership track we offer AI labs is licensed access to the full question bank under a mutual NDA. The questions have been designed to surface hallucination-prone areas in each instrument's operative provisions: they are not general-comprehension questions but targeted probes at the specific failure surfaces we have empirically identified. Paired with our regulatory specialists' annotation of correct answers against the primary treaty text, this gives your team a ready-made eval dataset for this regulation.

We can also generate synthetic correction pairs (question plus wrong model answer plus correct authoritative answer with article citation) derived directly from the regulator's text, for use in fine-tuning or RLHF datasets. These are built from the primary instrument, not from secondary commentary.
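
For readers who want a concrete picture of a correction pair, the sketch below shows one plausible record layout and a JSONL serialiser. The field names and the JSONL format are assumptions for illustration and do not describe RegLeg's delivered schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CorrectionPair:
    """Hypothetical record: one question, the observed wrong answer, and the correction."""
    question: str
    wrong_answer: str
    correct_answer: str
    article_citation: str
    source_url: str  # location of the primary text the correction was checked against


def to_jsonl(pairs: list[CorrectionPair]) -> str:
    """Serialise pairs as one JSON object per line."""
    return "\n".join(json.dumps(asdict(pair), ensure_ascii=False) for pair in pairs)
```

Keeping the article citation and source URL as separate fields lets a fine-tuning pipeline filter or audit pairs by provision without parsing the answer text.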

For labs that want ongoing coverage, we offer an embedded eval track: quarterly refresh of question banks and correction pairs as the regulatory landscape evolves (new depositary notifications, Conference of the Parties decisions, implementing agreements, and amendment records). The BBNJ Agreement will be a living instrument for years; its implementing rules, benefit-sharing mechanism, and area-based management designations will generate new compliance-relevant text on a regular cadence. Quarterly refresh means your models' eval coverage tracks the Agreement as it develops, not just the base text at adoption.

We are also available for direct red-team consultation on regulator-specific failure surfaces ahead of model releases.


Every finding on this page compares an AI subject's account of the rule against the regulator's verbatim text from the regulator's own portal. Both are linked. Each delta, its root causes, and impact analysis are documented and published with immutable Citation IDs.