---
type: "PublicBriefing"
title: "BBNJ High Seas Biodiversity Agreement: Model Hallucination Findings"
slug: "bbnj-high-seas-biodiversity-agreement-2023-ai-labs"
regulation_slug: "BBNJ-HIGH-SEAS-BIODIVERSITY-AGREEMENT-2023"
body_id: "UNTC-INT-001"
jurisdiction_code: "INT"
j_level: "J1"
regulator_short_code: "UNTC"
methodology_version: "v2.3"
news_featured_at: "2026-06-13T20:00:01.876013+00:00"
published_at: "2026-06-07T08:38:30.743382+00:00"
generated_at: "2026-06-11T01:47:18.098145+00:00"
license: "CC-BY-4.0"
resource: "https://reglegbrief.com/briefings/bbnj-high-seas-biodiversity-agreement-2023-ai-labs/"
timestamp: "2026-06-16T00:00:00+00:00"
---

# BBNJ High Seas Biodiversity Agreement: Model Hallucination Findings

- **Regulation.** [`BBNJ-HIGH-SEAS-BIODIVERSITY-AGREEMENT-2023`](/okf/regulations/BBNJ-HIGH-SEAS-BIODIVERSITY-AGREEMENT-2023.md) — Agreement under the United Nations Convention on the Law of the Sea on the Conservation and Sustainable Use of Marine Biological Diversity of Areas beyond National Jurisdiction (BBNJ Agreement / High Seas Treaty)
- **Regulator.** [`UNTC-INT-001`](/okf/bodies/UNTC-INT-001.md)

## News lead

Two frontier AI models with web search enabled produced confidently wrong reconstructions of the 2023 BBNJ Agreement, the UN treaty governing biodiversity in areas beyond national jurisdiction, which entered into force on 17 January 2026. The RegLeg Brief (RLB) Specialist Panel tested both models on the Agreement's environmental impact assessment (EIA) screening threshold, its marine genetic resources (MGR) benefit-sharing framework, and the non-undermining clause that bounds the Conference of the Parties. It documents six findings in which the models either cited the wrong article number for the rule they were stating or inverted the Agreement's express temporal scope.

Both Opus 4.7 and Sonnet 4.6, asked whether the Agreement's marine genetic resources obligations reach back to specimens collected before entry into force, said yes. Article 10(1) says the opposite: the MGR and digital sequence information provisions "apply only to resources collected and generated after the entry into force of this Agreement for each Party", a position most parties separately confirmed by formal non-retroactivity declarations. Sonnet 4.6 went further, writing that "samples collected decades ago but first commercialised after the Agreement's entry into force would be subject to Part II requirements", a regime the treaty does not establish.

On a separate question, Opus 4.7 identified Article 30 as the source of the EIA screening threshold; Sonnet 4.6 made the same assignment. The screening-threshold provision is Article 27. Opus 4.7 attributed the Conference of the Parties' non-undermining duty to "Article 5 / Article 8"; the duty sits in Article 22(2), and the verbatim language the model paraphrased is the Article 22(2) text. Sonnet 4.6 attributed the digital sequence information benefit-sharing obligation to Article 15(5); the obligation sits in Article 14(1).

A marine policy adviser, deep-sea biotechnology lawyer, or pharmaceutical compliance officer relying on either output would cite the wrong treaty article in regulatory submissions and contractual representations, and would build a benefit-sharing workflow around a temporal scope the Agreement explicitly excludes. That is the failure mode these findings document.


## Briefing

# Frontier AI models misnumbered BBNJ treaty articles and inverted the non-retroactivity rule, regulatory-research panel finds

## *Two frontier AI models with web search enabled both said the BBNJ Agreement's marine genetic resources regime reaches back to pre-entry-into-force collections, the opposite of what Article 10(1) provides, and both assigned operative provisions to the wrong treaty article numbers. The RegLeg Brief Specialist Panel calls this failure class "Treaty Article Drift" and says it points to a generation pathway that locks onto a substantively coherent answer while taking the article number from the model's prior rather than from the treaty's actual structure.*

**SINGAPORE, June 13, 2026.** Two frontier artificial-intelligence models generated structurally confident but textually wrong reconstructions of the Agreement under the United Nations Convention on the Law of the Sea on the Conservation and Sustainable Use of Marine Biological Diversity of Areas Beyond National Jurisdiction, known as the BBNJ Agreement, according to a white paper released today by RegLeg Brief, a regulatory-research outfit operated by Singapore-incorporated Verdus Technologies Pte. Ltd.

The six findings, published with immutable RLB Citation IDs including `RLB-H-INT-UNTC-BBNJ-HIGH-SEAS-BIODIVERSITY-AGREEMENT-2023-Q003-Opus47`, `RLB-H-INT-UNTC-BBNJ-HIGH-SEAS-BIODIVERSITY-AGREEMENT-2023-Q003-Sonnet46`, `RLB-H-INT-UNTC-BBNJ-HIGH-SEAS-BIODIVERSITY-AGREEMENT-2023-Q001-Opus47`, `RLB-H-INT-UNTC-BBNJ-HIGH-SEAS-BIODIVERSITY-AGREEMENT-2023-Q001-Sonnet46`, `RLB-H-INT-UNTC-BBNJ-HIGH-SEAS-BIODIVERSITY-AGREEMENT-2023-Q004-Sonnet46`, and `RLB-H-INT-UNTC-BBNJ-HIGH-SEAS-BIODIVERSITY-AGREEMENT-2023-Q005-Opus47`, span the Agreement's environmental impact assessment screening provision (Part IV, Article 27), its marine genetic resources and digital sequence information framework (Part II, Articles 10 and 14), and the non-undermining clause that bounds the Conference of the Parties (Part III, Article 22(2)). Both Anthropic's Claude Opus 4.7 and Claude Sonnet 4.6 were tested with web search active, mirroring how marine policy advisers, deep-sea biotechnology lawyers, intergovernmental secretariat staff, and pharmaceutical compliance teams actually use the models on a treaty that entered into force on 17 January 2026.

## The Verbatim Rule: Article 10(1), Article 14(1), Article 22(2), Article 27

The BBNJ Agreement is a structured instrument, divided into Parts and articles, with each operative obligation pinned to a specific provision number that downstream practitioners are expected to cite. Four of those provisions sit at the heart of the findings:

- **Article 10(1) (Part II, MGR temporal scope)**: the marine genetic resources and digital sequence information provisions "apply only to resources collected and generated after the entry into force of this Agreement for each Party." Most parties further confirmed this by formal non-retroactivity declarations on deposit.
- **Article 14(1) (Part II, DSI benefit-sharing)**: benefits arising from activities related to MGRs in areas beyond national jurisdiction "and their digital sequence information shall be shared in a fair and equitable manner."
- **Article 22(2) (Part III, non-undermining duty)**: the Conference of the Parties "shall respect the competences of, and not undermine, relevant legal instruments and frameworks and relevant global, regional, subregional and sectoral bodies."
- **Article 27 (Part IV, EIA screening threshold)**: the qualitative threshold ("more than a minor or transitory effect") that triggers an environmental impact assessment obligation for planned high-seas activities.

Each of these provisions states a single position pinned to a specific article number. The substantive content and the article number are jointly load-bearing: a citation that gets the content right but the article number wrong is still a defective regulatory citation, because downstream supervisors, panels, and counterparties will check the article number first.
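
To make the "jointly load-bearing" point concrete, here is a minimal Python sketch of a grading check that accepts a citation only when the quoted language and the article number both line up. It assumes a simple case-insensitive substring match against the verbatim phrases this briefing attributes to each article; the `REFERENCE` table, the `classify_citation` helper, and its three labels are hypothetical illustrations, not the Specialist Panel's methodology.

```python
from __future__ import annotations

# Hypothetical reference table: verbatim phrases as quoted in this briefing,
# keyed by the article number the briefing attributes them to.
# Illustrative only; this is not a full rendering of the Agreement.
REFERENCE: dict[str, str] = {
    "10(1)": "collected and generated after the entry into force",
    "14(1)": "shall be shared in a fair and equitable manner",
    "22(2)": "shall respect the competences of, and not undermine",
    "27": "more than a minor or transitory effect",
}


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so the substring match is less brittle."""
    return " ".join(text.lower().split())


def classify_citation(cited_article: str, model_text: str) -> str:
    """Grade one model citation (labels are assumptions for this sketch).

    'correct'       -> text contains a reference phrase AND cites that phrase's article
    'misattributed' -> text contains a reference phrase but cites a different article
    'unverified'    -> no reference phrase found, so this check cannot decide
    """
    text = normalise(model_text)
    matched = [art for art, phrase in REFERENCE.items() if normalise(phrase) in text]
    if not matched:
        return "unverified"
    return "correct" if cited_article in matched else "misattributed"


if __name__ == "__main__":
    # Sonnet 4.6's EIA answer as described in this briefing: right test, article 30 cited.
    print(classify_citation("30", 'Part IV, Article 30: "more than a minor or transitory effect"'))
    print(classify_citation("27", "a more than a minor or transitory effect on the marine environment"))
```

Exact-phrase matching of this kind only catches misattribution when the model reuses the treaty wording. Opus 4.7's paraphrase of the non-undermining duty ("not to undermine relevant legal instruments...") does not contain the Article 22(2) phrase in the table, so it would come back `unverified`; a production check would need fuzzier matching or human review.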

## Claude Opus 4.7: Inverted Article 10(1) and Misnumbered Article 22(2) and Article 27

Asked whether the BBNJ Agreement's marine genetic resources framework under Article 10(1) applies prospectively or retroactively, Claude Opus 4.7 (with web search on) wrote, verbatim:

> "Under Article 10(1), the MGR/DSI provisions apply to 'utilization' of MGR and DSI of ABNJ that were collected or generated *before* entry into force, not just after."

**The structural error.** Article 10(1)'s actual text limits the regime to resources "collected and generated after the entry into force of this Agreement for each Party", and most parties separately confirmed the non-retroactivity position in formal declarations on deposit. The model inverted the temporal default. A deep-sea biotechnology firm or research institution relying on the output would bring legacy sample collections (decades of pre-2026 marine specimens held in university and biotech freezers) within Part II's notification, access, and benefit-sharing obligations, which the Agreement does not extend to them.

Asked, in a separate question, which article sets the EIA screening threshold for planned high-seas activities, Opus 4.7 identified **Article 30** and stated that screening sits at **Article 31**. The screening-threshold provision is **Article 27** (Part IV), and the "more than a minor or transitory effect" qualitative test the model paraphrased is Article 27 language, not Article 30 language. The substrate document the panel cites identifies the article number directly.

On the non-undermining duty, Opus 4.7 wrote that **"Article 5 / Article 8"** of the Agreement "explicitly require it not to undermine relevant legal instruments, frameworks and competent global, regional, subregional and sectoral bodies." The verbatim non-undermining language the model paraphrased ("shall respect the competences of, and not undermine, relevant legal instruments and frameworks and relevant global, regional, subregional and sectoral bodies") is the text of **Article 22(2)**, which bounds the Conference of the Parties specifically. The model's "Article 5 / Article 8" citation does not anchor to that text in the Agreement.

The failure modes are classified as `misstated_rule` (Article 10(1) inversion) and `misattributed` (Article 27 and Article 22(2) misnumbering) against substrate documents `p_01_ACT_Part_III___Article_22_2____non_undermini_text-bbnj-agreement.html` and `p_01_ACT_Part_IV_EIA___Article_27___screening_thr_text-bbnj-agreement.html`.

## Claude Sonnet 4.6: Inverted Article 10(1) and Misnumbered Article 27 and Article 14(1)

On the same retroactivity question, Sonnet 4.6 (with web search on) wrote:

> "the benefit-sharing and notification obligations apply to the utilisation of MGRs, and associated DSI, collected or generated before the Agreement entered into force (17 January 2026). In practice this means... samples collected decades ago but first commercialised after the Agreement's entry into force would be subject to Part II requirements."

**The fabrication.** The Agreement contains no such "commercialised-after" trigger. Article 10(1) is binary: the regime applies to resources collected and generated after entry into force, and not to those collected before. The model constructed a "first-commercialisation" carve-in that the treaty does not contain, and anchored it to the Agreement's actual entry-into-force date, which makes the invented rule read as grounded. A pharmaceutical compliance officer or technology-transfer lawyer building a Part II notification workflow on this output would route pre-2026 sample portfolios through a regulatory regime the BBNJ Agreement does not extend to them, and would issue contractual representations to counterparties on that basis.

Asked which article establishes the EIA screening threshold, Sonnet 4.6 identified **"Part IV, Article 30"** and quoted the "more than a minor or transitory effect" language. As with Opus 4.7, the screening-threshold provision is Article 27. The qualitative test the model quoted verbatim is correct; the article number it pinned that test to is not.

Asked whether the Agreement's benefit-sharing framework extends to digital sequence information derived from marine organisms in international waters, and which provision governs, Sonnet 4.6 attributed the obligation to **"Part II (Article 15(5) and related provisions)"**. The DSI benefit-sharing duty sits in **Article 14(1)**, which is explicit that benefits "and their digital sequence information shall be shared in a fair and equitable manner." Article 15 governs a different aspect of the Part II regime.

The failure modes are classified as `inference_drift` (Article 10(1) commercialisation carve-in) and `misattributed` (Article 27 and Article 14(1) misnumbering) against substrate documents `p_01_ACT_Part_III___Article_22_2____non_undermini_text-bbnj-agreement.html`, `p_01_ACT_Part_IV_EIA___Article_27___screening_thr_text-bbnj-agreement.html`, and `p_01_ACT_Part_II___DSI_included_in_BBNJ_vs__exclu_text-bbnj-agreement.html`.

## The Pattern: Treaty Article Drift

The BBNJ findings sit inside a failure class the RegLeg Brief Specialist Panel labels **Treaty Article Drift**: frontier models locking onto a substantively coherent paraphrase of a treaty provision while reaching for an article number from the model's prior, rather than from the treaty's actual structure, and simultaneously inverting temporal-scope defaults on provisions where the treaty's express position is non-retroactive.

Across the six findings, the drift takes three shapes:

- **Temporal-scope inversion** (both models on Article 10(1)): the express non-retroactivity rule for marine genetic resources is rewritten as a retroactive regime, in Sonnet 4.6's case via a fabricated "first-commercialisation" trigger that does not exist in the Agreement.
- **Article-number reassignment** (both models on Article 27, Opus 4.7 on Article 22(2), Sonnet 4.6 on Article 14(1)): the verbatim language of the operative provision is paraphrased correctly, but pinned to the wrong article number. The article-number error survives even when the model has the substrate text effectively memorised.
- **Convergent error across models** (Article 27 and Article 10(1)): both Opus 4.7 and Sonnet 4.6, asked independent versions of the same question, arrived at the same wrong answer. The convergence points to a shared upstream artefact, plausibly third-party summary content that uses different article numbering than the in-force text.

The common mechanism is a generation pathway in which the model's prior about how a UN biodiversity treaty's articles are typically arranged overrides the BBNJ Agreement's actual numbering, while the verbatim regulator text is either not retrieved or is retrieved but not allowed to override the article-number prior.

## Why the Failure Is Invisible at Runtime

All six outputs shared the same surface characteristics: confident article-level citations, substantively plausible paraphrases of the treaty's provisions, defined-term usage ("MGR", "DSI", "ABNJ", "Part II", "Part IV") that tracks the Agreement's vocabulary, and no hedging or caveats. The failure is not recoverable by the user in real time because the output reads like a competent treaty-law brief, the kind of paragraph an international legal adviser would expect to receive from a senior associate or secretariat staffer. Validation against the Agreement's primary text would only happen if the reader already knew which article number contained which subject matter, which is the question they asked the model in the first place.

The population most exposed includes marine policy advisers in foreign ministries and intergovernmental secretariats; deep-sea biotechnology and bioprospecting compliance teams at pharmaceutical and industrial-biotech firms; technology-transfer lawyers drafting MGR access and benefit-sharing contracts; environmental impact assessment consultants scoping high-seas activities for shipping, cable-laying, marine carbon dioxide removal, and seabed research operators; and academic researchers preparing regulatory submissions under domestic implementing legislation. All of these workflows route through AI-assisted research on tight timelines, and almost all of them generate written deliverables (contracts, EIA scoping reports, supervisor-facing self-assessments) that downstream readers treat as authoritative without re-checking the cited article number against the deposited treaty text.

## What AI Labs Can Do: Suggested Probes (Open-Access)

The RegLeg Brief Specialist Panel documents a series of red-team probe designs that any AI lab or alignment team can run against its own models, with no commercial engagement required:

1. **Article-number anchoring probes on recently-in-force treaties**: for each operative provision of a treaty that entered into force within the past 24 months, ask the model to identify the article number that governs a stated obligation. Compare against the deposited text. Treaties whose articles were renumbered between draft and final text are particularly diagnostic.
2. **Temporal-scope default probes**: for provisions that turn on an "after entry into force" temporal default (BBNJ Article 10(1) is the canonical example), test whether the model preserves the non-retroactivity rule or constructs a retroactive carve-in via a fabricated trigger such as "first commercialisation".
3. **Convergent-error detection across model families**: probe the same treaty question against two or more model families. Convergent wrong answers (both Opus 4.7 and Sonnet 4.6 saying "Article 30" for the BBNJ EIA screening provision when the answer is Article 27) point to a shared upstream artefact in the training corpus and are a tractable eval signal.
4. **Part-vs-article preservation tests**: for treaties divided into Parts, test whether the model correctly attributes a provision to the right Part (Part II vs Part III vs Part IV) before testing the article number. Part-level errors and article-level errors have different signatures.
5. **Verbatim-vs-prior tension probes**: where the model paraphrases the verbatim regulator text correctly but cites the wrong article, re-probe with a question that asks the model to quote the verbatim text from the cited article. A mismatch between the paraphrase and the verbatim text retrieved from the cited article number is diagnostic of article-number drift independent of substantive understanding.
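
Probe 3 can be automated cheaply once answers from several models have been collected. The Python sketch below assumes answers have already been normalised to short strings (an article number, or a label such as "prospective"); the question keys, model tags, and `convergent_errors` helper are hypothetical, and the sample data simply restates the answers reported in this briefing, with the answer key taken from the briefing's own findings.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical answer key, restating the correct answers as reported in this briefing.
ANSWER_KEY: dict[str, str] = {
    "eia_screening_article": "27",
    "mgr_temporal_scope": "prospective",
    "dsi_benefit_sharing_article": "14(1)",
    "non_undermining_article": "22(2)",
}

# Model answers as reported in this briefing; questions with no reported
# finding for a model are simply omitted.
ANSWERS: dict[str, dict[str, str]] = {
    "opus-4.7": {
        "eia_screening_article": "30",
        "mgr_temporal_scope": "retroactive",
        "non_undermining_article": "5 / 8",
    },
    "sonnet-4.6": {
        "eia_screening_article": "30",
        "mgr_temporal_scope": "retroactive",
        "dsi_benefit_sharing_article": "15(5)",
    },
}


def convergent_errors(
    answers: dict[str, dict[str, str]],
    answer_key: dict[str, str],
    min_models: int = 2,
) -> dict[str, dict[str, list[str]]]:
    """Return {question: {wrong_answer: [models]}} for wrong answers given by >= min_models models."""
    report: dict[str, dict[str, list[str]]] = {}
    for question, correct in answer_key.items():
        # Group the models that answered this question wrongly by the answer they gave.
        wrong_by_answer: defaultdict[str, list[str]] = defaultdict(list)
        for model, model_answers in answers.items():
            answer = model_answers.get(question)
            if answer is not None and answer != correct:
                wrong_by_answer[answer].append(model)
        # Keep only wrong answers that enough models converged on.
        shared = {
            ans: sorted(models)
            for ans, models in wrong_by_answer.items()
            if len(models) >= min_models
        }
        if shared:
            report[question] = shared
    return report


if __name__ == "__main__":
    for question, shared in convergent_errors(ANSWERS, ANSWER_KEY).items():
        print(question, shared)
```

Only wrong answers shared by at least two models are reported, so Sonnet 4.6's lone "Article 15(5)" error and Opus 4.7's lone "Article 5 / Article 8" error drop out, while the two shared errors surface. Exact string comparison is a deliberate simplification; real answers would need normalising (for example "Art. 30" versus "Article 30") before comparison.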

## Open-Access Risk Mitigation: A Public Good for AI Labs, Regulators, and the Compliance Community

RegLeg Brief operates as a completely ungated, open-access public resource. The white papers, per-finding cards, regulator verbatim excerpts, RLB Citation IDs, methodology notes and supporting data logs are all published without paywalls, registration walls, or data-licensing fees. By documenting original regulatory research without financial or distribution barriers, the platform ensures that:

- **AI engineering and alignment teams** can immediately ingest the verbatim model outputs and matched regulator-text excerpts to identify, reproduce, and address the structural failure modes the Specialist Panel documents.
- **Regulatory agencies, treaty secretariats, and supervisors** can use the standardised RLB Citation IDs to benchmark AI-driven compliance risks surfacing in their own jurisdictions, with full traceability back to the original model output and the depositary's primary text.
- **The global compliance, legal, and policy community** can freely adapt the Specialist Panel's screening methodologies to safeguard internal data pipelines and AI-assisted regulatory workflows.
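
For teams that want to benchmark against the IDs programmatically, the Python sketch below splits each ID into fields. The layout is inferred from the six IDs on this page and its front matter (jurisdiction code `INT`, regulator short code `UNTC`, the regulation slug, a question number, and a model tag); RegLeg Brief does not publish the schema here, so the field names, the regular expression, and the `parse_citation_id` helper are assumptions.

```python
from __future__ import annotations

import re
from collections import Counter
from typing import NamedTuple

# Assumed layout: RLB-H-<jurisdiction>-<regulator>-<regulation slug>-Q<nnn>-<model tag>.
# Inferred from the IDs on this page; "RLB-H" is treated as a fixed prefix. Not an official schema.
ID_PATTERN = re.compile(
    r"^RLB-H-(?P<jurisdiction>[A-Z]+)-(?P<regulator>[A-Z]+)-"
    r"(?P<regulation>.+)-(?P<question>Q\d{3})-(?P<model>[A-Za-z0-9]+)$"
)


class CitationId(NamedTuple):
    jurisdiction: str
    regulator: str
    regulation: str  # may itself contain hyphens, as the BBNJ slug does
    question: str
    model: str


def parse_citation_id(raw: str) -> CitationId:
    """Split an RLB Citation ID into its assumed fields; raise ValueError if it does not fit."""
    match = ID_PATTERN.match(raw.strip())
    if match is None:
        raise ValueError(f"unrecognised citation ID: {raw!r}")
    return CitationId(**match.groupdict())


# The six IDs published with this briefing.
PREFIX = "RLB-H-INT-UNTC-BBNJ-HIGH-SEAS-BIODIVERSITY-AGREEMENT-2023-"
IDS = [
    PREFIX + suffix
    for suffix in (
        "Q001-Opus47", "Q001-Sonnet46", "Q003-Opus47",
        "Q003-Sonnet46", "Q004-Sonnet46", "Q005-Opus47",
    )
]

if __name__ == "__main__":
    parsed = [parse_citation_id(i) for i in IDS]
    print(Counter(p.question for p in parsed))  # findings per question
    print(Counter(p.model for p in parsed))     # findings per model
```

Tallying by question shows which questions produced findings against more than one model; on this page, Q001 and Q003 each appear for both Opus 4.7 and Sonnet 4.6.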

Because RegLeg Brief conducts its own original research and adversarial analysis against frontier AI models, the detail in each published finding is precise enough to enable AI labs to take targeted hallucination-mitigation measures. Directions an AI lab might consider, drawing on the published findings, include:

- **Targeted correction pairs**: depositary primary text matched to the wrong-but-plausible reconstructions documented in each finding, suitable for direct ingestion into a training-data pipeline.
- **Quarterly embedded eval cycles**: continuous evaluation against a defined treaty and multi-regulator portfolio, with regression monitoring on previously documented failure modes to track whether fine-tuning or RLHF adjustments are moving the needle on Treaty Article Drift.
- **Pre-release evaluation cycles**: sandboxed probes against catalogued failure shapes for capability releases touching international environmental, biodiversity, or law-of-the-sea content, before the release reaches customers.
- **Post-release model enhancements**: treaty-specific failure-surface monitoring as new instruments enter into force and join a model's live deployment footprint.

AI labs and model developers named in any published finding have an unconditional [right of reply](https://reglegbrief.com/right-of-reply/); the Specialist Panel will publish any factual correction or contextual response alongside the original finding, with no editorial gatekeeping. Researchers, regulators, and compliance teams with questions on methodology or specific findings can reach the Specialist Panel via the same channel.

---

## Right of Reply

These findings and the associated work are published openly to support the development of a safer AI ecosystem. Any party reading this briefing or any other finding on reglegbrief.com may contact us and has an unconditional [right of reply](/contact/), under the same publication terms described above.

## Source & Methodology Standards

RegLeg Brief is operated by Verdus Technologies Pte. Ltd. (UEN 201616982R), incorporated in Singapore. The RLB Specialist Panel, with an aggregate of over 60 years of public-policy and industry experience, documents only confirmed hallucination findings, under a methodology that requires a verbatim regulator excerpt for every documented claim. All findings, citation IDs, model outputs, regulator excerpts, and methodology notes are open-access.

---

**Primary source verified:** UN BBNJ Agreement (2023), Agreement on the Conservation and Sustainable Use of Marine Biological Diversity of Areas Beyond National Jurisdiction · Substrate documents: `p_01_ACT_Part_III___Article_22_2____non_undermini_text-bbnj-agreement.html`, `p_01_ACT_Part_II___DSI_included_in_BBNJ_vs__exclu_text-bbnj-agreement.html`, `p_01_ACT_Part_IV_EIA___Article_27___screening_thr_text-bbnj-agreement.html` · UN portal: documents.un.org

**Citation IDs referenced:**

- `RLB-H-INT-UNTC-BBNJ-HIGH-SEAS-BIODIVERSITY-AGREEMENT-2023-Q001-Opus47`
- `RLB-H-INT-UNTC-BBNJ-HIGH-SEAS-BIODIVERSITY-AGREEMENT-2023-Q001-Sonnet46`
- `RLB-H-INT-UNTC-BBNJ-HIGH-SEAS-BIODIVERSITY-AGREEMENT-2023-Q003-Opus47`
- `RLB-H-INT-UNTC-BBNJ-HIGH-SEAS-BIODIVERSITY-AGREEMENT-2023-Q003-Sonnet46`
- `RLB-H-INT-UNTC-BBNJ-HIGH-SEAS-BIODIVERSITY-AGREEMENT-2023-Q004-Sonnet46`
- `RLB-H-INT-UNTC-BBNJ-HIGH-SEAS-BIODIVERSITY-AGREEMENT-2023-Q005-Opus47`


## Related concepts

- Whitepaper: [bbnj-high-seas-biodiversity-agreement-2023-ai-labs](/okf/whitepapers/bbnj-high-seas-biodiversity-agreement-2023-ai-labs.md)
- Regulation: [BBNJ-HIGH-SEAS-BIODIVERSITY-AGREEMENT-2023](/okf/regulations/BBNJ-HIGH-SEAS-BIODIVERSITY-AGREEMENT-2023.md)
- Methodology: [v2.3](/okf/methodology.md)