Deep Dive · 9 min read

UK AI Safety Institute versus the EU AI Office: turf battle or uneasy complement?

The UK's AI Safety Institute runs voluntary frontier evaluations while the EU AI Office enforces mandatory compliance. On the surface they look complementary. Look closer and you find genuine contradictions, particularly over open-weights models, that will force global labs to choose which regime they anchor against.

Two regulatory architectures are now competing, quietly but consequentially, for the right to define what safe frontier AI looks like in Europe, and the frontier labs are watching both closely.

The UK AI Safety Institute, established in November 2023 under then-Science Secretary Michelle Donelan and housed within the Department for Science, Innovation and Technology, operates through a voluntary model. Labs submit their frontier systems for pre-deployment evaluation; the Institute publishes findings; no legal sanction follows non-cooperation. The EU AI Office, established within the European Commission in early 2024 to implement the AI Act and based in Brussels, operates under an entirely different logic. For general-purpose AI models that clear the systemic-risk compute threshold, compliance is not optional. Assessments, incident reporting, and ongoing obligations apply whether a company wants them or not.


On paper these two bodies occupy different lanes. In practice, the lanes are merging at speed.

"The open-source question represents a hard limit on what the current bilateral framework between the UK and EU can achieve in frontier AI governance."
Centre for the Governance of AI commentary, 2024

Where they actually overlap

Start with the technical substance. Both institutions are, at their core, trying to answer the same question: can a given model cause serious harm at scale, and if so, under what conditions? The UK AI Safety Institute's 2025 annual review described a portfolio of evaluations covering dangerous capability uplift in chemistry and biology, cyber-offensive capability, and what the Institute calls "scheming" behaviours. The EU AI Office's published workplan for 2026 lists almost identical priority domains: CBRN uplift, autonomous cyber operations, and systemic deception.

This convergence is not accidental. Staff from both bodies have participated in joint working groups on evaluation methodology, and the Centre for the Governance of AI, the Oxford-based research organisation that has advised both institutions, has publicly called for interoperability between their technical frameworks. There is also a memorandum of understanding between the UK and the European Commission signed in 2024 that, in principle, allows for information sharing on safety testing outcomes.

The overlap extends to the labs themselves. Anthropic, Google DeepMind, and OpenAI have all engaged with both bodies. Google DeepMind, headquartered in London, has a particularly visible dual relationship: it cooperates with UK AISI evaluations and simultaneously navigates the EU AI Office's Code of Practice for general-purpose AI, which entered its drafting phase in 2024 with DeepMind among the signatories.


Where they actively contradict each other

The sharpest contradiction concerns open-weights models, and it is not a detail. It is a fault line.

The EU AI Act's systemic-risk provisions apply to models above a defined compute threshold. The Act sets that threshold at 10^25 floating-point operations of training compute, a figure the Commission can adjust through delegated acts. Meta's Llama series and Mistral AI's flagship models sit close to that threshold or may cross it with future releases. The EU AI Office has signalled that open-weights releases complicate its enforcement model significantly: once weights are public, post-deployment controls are essentially impossible to impose on the developer, and the Office's tools were designed around closed-API deployment.
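For a sense of how that figure is typically reasoned about, the sketch below applies the widely used rule of thumb that training compute is roughly six floating-point operations per parameter per training token. It is a back-of-the-envelope illustration only: the parameter and token counts are hypothetical placeholders, not disclosed figures for any named model.

```python
# Rough back-of-the-envelope check against the AI Act's 10^25 FLOP presumption.
# Uses the common approximation: training compute ~= 6 * parameters * tokens.
# Parameter and token counts below are illustrative, not actual model figures.

SYSTEMIC_RISK_THRESHOLD_FLOP = 1e25  # presumption threshold in the AI Act

def estimated_training_compute(parameters: float, training_tokens: float) -> float:
    """Approximate total training FLOPs via the 6*N*D rule of thumb."""
    return 6 * parameters * training_tokens

def crosses_threshold(parameters: float, training_tokens: float) -> bool:
    """Return True if the estimate meets or exceeds the systemic-risk threshold."""
    return estimated_training_compute(parameters, training_tokens) >= SYSTEMIC_RISK_THRESHOLD_FLOP

if __name__ == "__main__":
    # Hypothetical example: a 70-billion-parameter model trained on 15 trillion tokens.
    flops = estimated_training_compute(70e9, 15e12)
    print(f"Estimated training compute: {flops:.2e} FLOP")
    print("Crosses 10^25 FLOP threshold:", crosses_threshold(70e9, 15e12))
```

On those illustrative numbers the estimate lands just under the threshold, which is exactly the ambiguity that makes the provision consequential for labs releasing near the frontier.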

Mistral AI, the Paris-based lab and one of Europe's most prominent frontier developers, has been consistently vocal in opposing the extension of systemic-risk obligations to open-weights releases. The company has argued, with some force, that applying the same compliance burden to open models as to closed API deployments misunderstands the technical reality and would discriminate against European open-source development relative to US incumbents who already dominate closed APIs.

The UK AI Safety Institute has taken no comparable position on open weights as a matter of formal policy, but its voluntary model implicitly sidesteps the question: you cannot compel an open-weights lab to submit weights for evaluation if there is no legal mechanism to do so, and AISI has not tried. Its evaluations to date have focused on systems where the developer retains a deployment chokepoint, meaning closed or gated APIs.

The result is a genuine regulatory gap. Open-weights models above the systemic risk threshold face mandatory EU obligations that are, in practice, difficult to enforce, while falling outside the practical scope of voluntary UK evaluations. Neither body has yet resolved this. The Centre for the Governance of AI flagged precisely this gap in commentary published in late 2024, noting that the open-source question represents a "hard limit" on what the current bilateral framework can achieve.


Methodology: voluntary depth versus mandatory breadth

Set aside open weights for a moment and the methodological divergence is still significant. The UK AI Safety Institute's approach is technically deep and developer-facing. Evaluators get substantial access: in some cases red-team access to model internals, not just API behaviour. The Institute's published findings on pre-deployment capability evaluations have been more granular than anything the EU AI Office has produced so far, largely because voluntary cooperation from labs gives AISI more to work with.
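To make the access distinction concrete, the sketch below shows what an output-only evaluation loop can measure when all an evaluator has is a query interface. It is a hypothetical illustration: `query_model` stands in for whatever access a lab grants, and the harness does not reproduce either institution's actual methodology.

```python
# Minimal, hypothetical sketch of an API-only behavioural evaluation loop.
# `query_model` is a placeholder for whatever interface a lab exposes;
# neither AISI's nor the AI Office's real harnesses are reproduced here.
from typing import Callable, Dict, List

def run_behavioural_eval(
    query_model: Callable[[str], str],
    prompts: List[str],
    refusal_markers: List[str],
) -> Dict[str, float]:
    """Score how often the model refuses a set of sensitive prompts.

    With API-only access, this output-level scoring is essentially all an
    evaluator can do; weights access would additionally allow probing
    internals or fine-tuning to elicit latent capabilities.
    """
    refusals = 0
    for prompt in prompts:
        response = query_model(prompt).lower()
        if any(marker in response for marker in refusal_markers):
            refusals += 1
    return {"refusal_rate": refusals / max(len(prompts), 1)}

if __name__ == "__main__":
    # Stand-in model that refuses everything, just to show the harness runs.
    mock_model = lambda prompt: "I can't help with that."
    result = run_behavioural_eval(
        mock_model,
        prompts=["placeholder sensitive prompt 1", "placeholder sensitive prompt 2"],
        refusal_markers=["can't help", "cannot assist"],
    )
    print(result)
```

Everything above operates on outputs alone; deeper access, where a lab grants it, is what lets evaluators go beyond this kind of surface scoring.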

The EU AI Office's mandatory assessments, by contrast, are legally grounded but technically shallower at this stage, partly because the Office is newer, partly because its staffing reflects legal and policy expertise more than pure ML engineering, and partly because the Code of Practice for general-purpose AI is still being negotiated rather than enforced. The Office's 2026 workplan acknowledges the need to build out technical capacity and has signalled partnerships with national scientific bodies, including Germany's DFKI, the German Research Centre for Artificial Intelligence, to strengthen evaluation infrastructure.

There is also a transparency asymmetry. AISI publishes evaluation findings, sometimes in considerable technical detail. The EU AI Office's compliance assessments are tied to confidentiality obligations under the AI Act, meaning systemic-risk findings may never be fully public. For the AI governance research community, and for developers trying to benchmark against best practice, this creates an odd situation where the voluntary, non-binding UK regime produces more publicly visible technical knowledge than the mandatory EU regime.

By the numbers

The institutional footprint, funding trajectory, and scope of each body tell their own story about ambition and resource, and about the pace at which both institutions are attempting to close capability gaps.

Which model will the frontier labs anchor against?

This is the question that actually matters for the trajectory of AI governance in Europe, and the honest answer is: they will anchor against both, strategically, for different purposes, and that strategic split will eventually force a political choice.

For pre-deployment capability evaluation, the voluntary AISI model currently offers more: more access, more technical depth, and publication of findings that can serve as safety credentials in public discourse. A lab that cooperates with AISI and receives a clean evaluation gains reputational cover in the UK market and, arguably, in international markets that lack their own evaluation infrastructure.

For market access to the EU's 450 million consumers, the AI Office is inescapable. No lab deploying at scale in Europe can afford to treat GPAI systemic-risk obligations as optional, whatever the difficulties of enforcement. The Code of Practice, once finalised, will set behavioural norms that the largest players helped draft and that smaller players will have to follow. Google DeepMind, Meta, and Mistral AI are all in the room as the Code is written; their participation is itself a form of anchoring.

The more plausible near-term scenario is not one regime winning but a de facto division of labour that nobody formally agreed to: AISI as the technical vanguard that produces the methodology, the EU AI Office as the legal enforcement layer that scales it. That would be a sensible outcome. It would also depend on political will in both London and Brussels to actually coordinate rather than compete for credit, which is not a safe assumption.

The UK government's positioning since the election has emphasised pro-innovation AI policy and close alignment with the US AI Safety Institute, AISI's closest counterpart. If that bilateral relationship with Washington strengthens faster than the UK-EU relationship, AISI could drift toward becoming a transatlantic technical standards body rather than a European one. That would leave the EU AI Office as the dominant, if imperfect, reference point for European frontier governance by default.

Mistral AI's ongoing resistance to systemic-risk classification for open models adds another variable. If the European Commission adjusts the compute threshold or creates a separate open-weights track, as some within the AI Office have reportedly discussed, the regulatory terrain shifts again. A lighter-touch EU regime for open weights combined with mandatory closed-API assessments would look much more like the UK's voluntary model than the current AI Act text implies, narrowing the contradiction considerably.

For now, the smart money says frontier labs will run AISI evaluations because they want the technical credibility and the bilateral access it provides, and comply with the EU AI Office because they have to. That is not a stable equilibrium. One regulatory misstep, or one high-profile safety incident that one body handled and the other missed, could determine which institution the industry actually trusts.

THE AI IN EUROPE VIEW

The framing of AISI versus the EU AI Office as a turf battle is both accurate and misleading. Accurate, because two institutions with overlapping mandates and limited coordination budgets will inevitably compete for technical authority and political recognition. Misleading, because the real problem is not competition but inadequacy: neither body, individually, is currently equipped to govern frontier AI at the scale the moment requires.

The UK's voluntary model is technically credible precisely because it is voluntary; labs cooperate because they want to, not because they must. That is a fragile foundation. The first major lab that decides the reputational calculus no longer favours cooperation will expose the model's limits overnight. The EU's mandatory regime has legal teeth but organisational adolescence. The AI Office is building technical capacity in real time while simultaneously trying to enforce an Act whose implementing regulations are still being written.

What Europe actually needs is neither institution winning, but genuine joint infrastructure: shared evaluation methodology, reciprocal findings access, and a common position on open weights that is technically coherent rather than politically convenient. Mistral AI and Google DeepMind, as the continent's two most consequential frontier developers, should be pushing hard for that outcome. So should the Centre for the Governance of AI and every national regulator with a seat at either table. The alternative is a decade of jurisdictional friction while the genuinely hard safety questions go unanswered.

