Weights Without Substance: Why European Open-Source AI Will Shrink to a Label Within 18 Months
Mistral and Aleph Alpha built their European credibility on open-source rhetoric. But the EU AI Act's general-purpose AI provisions are quietly making full openness unaffordable. Within 18 months, expect 'open' to mean weights-only, with training data locked behind closed doors and commercial riders attached to everything else.
European open-source AI is heading for a quiet identity crisis, and the labs driving it know perfectly well what is happening. The promise that differentiated Paris-based Mistral AI and Heidelberg-based Aleph Alpha from their American closed-weight rivals was always partly a marketing position, but it was also, for a time, a genuine technical commitment. That commitment is now being eroded by a combination of commercial pressure, regulatory compliance costs, and a licensing drift that the wider industry has been reluctant to name plainly.
The trajectory is not difficult to read. Mistral, which launched in 2023 with models released under Apache 2.0 licences and was celebrated across the open-source community for doing so, shifted its approach with subsequent releases. Mistral Large, its most capable model, launched in early 2024 as an API-only product with no open weights. Codestral arrived in May 2024 under a Non-Production Licence, and the weights of Mistral Large 2 followed in July 2024 under the Mistral Research Licence, both of which rule out commercial use without a separate agreement. The company frames this as tiering, not retreat, but the direction of travel is unambiguous: the frontier models stay closed, and the open weights belong to the second tier.
Aleph Alpha, whose Luminous series was positioned as a sovereignty-first, European-values alternative to GPT-4, has been even more circumspect about data transparency. The company discloses weights for research purposes under tightly controlled conditions, but training data provenance remains largely opaque. For a company that has built its pitch around trustworthy AI for European public sector clients, including partnerships with the German federal government, that opacity carries particular irony.
"The frontier models stay closed, and the open weights belong to the second tier. That is not a licensing strategy; it is a branding strategy wearing a licensing strategy's clothes."
AI in Europe editorial analysis
The structural reason for this drift is the EU AI Act, which entered into force in August 2024, and specifically its provisions on general-purpose AI models, which apply from August 2025. The Act creates a two-tier system for GPAI providers. Models trained with more than 10^25 floating-point operations are presumed to pose systemic risk and face the most demanding obligations, including adversarial testing, serious-incident reporting, and cooperation with the EU AI Office. But even below that threshold, all GPAI providers must maintain technical documentation, comply with EU copyright law, and publish a sufficiently detailed summary of the content used for training.
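For readers who prefer the structure spelled out, the tiering reduces to a single numeric comparison. The sketch below is illustrative only: the 10^25 FLOP threshold is taken from the Act, while the field names and obligation lists are loose paraphrases of the legal text, not a restatement of it.

```python
# Illustrative only: a toy encoding of the AI Act's two-tier GPAI structure.
# The 10**25 FLOP threshold appears in the Act; the obligation strings below
# are simplified paraphrases, not legal language.

SYSTEMIC_RISK_THRESHOLD_FLOPS = 1e25

BASELINE_OBLIGATIONS = [
    "maintain technical documentation",
    "put a copyright-compliance policy in place",
    "publish a sufficiently detailed summary of training content",
]

SYSTEMIC_RISK_OBLIGATIONS = BASELINE_OBLIGATIONS + [
    "adversarial testing and model evaluation",
    "serious-incident reporting",
    "cooperation with the EU AI Office",
]


def gpai_obligations(training_compute_flops: float) -> list[str]:
    """Return the (simplified) obligation set for a GPAI provider."""
    if training_compute_flops > SYSTEMIC_RISK_THRESHOLD_FLOPS:
        return SYSTEMIC_RISK_OBLIGATIONS
    return BASELINE_OBLIGATIONS


# A model trained with roughly 2e25 FLOPs lands in the systemic-risk tier;
# a 7B-parameter model trained with ~1e23 FLOPs does not.
print(gpai_obligations(2e25))
print(gpai_obligations(1e23))
```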
That last requirement is the quiet killer for genuine openness. Publishing a meaningful summary of training data is not just a disclosure exercise; it is a legal liability exercise. If a lab publishes its weights and its training data simultaneously, it hands copyright claimants a direct roadmap. European publishers, collecting societies, and news organisations have been watching US litigation against OpenAI and Meta with close attention. Any European lab that releases full training data under a permissive licence is effectively inviting a lawsuit that its US competitors, with their far deeper legal war chests, are only now learning to manage.
The EU AI Office, which sits within the European Commission's DG CNECT and became operational in early 2024, has published clarifications on what the GPAI rules mean for open-source providers. The Office has acknowledged that open-source models benefit from a lighter-touch regime in some respects, but it has been careful not to exempt them from the copyright summary obligation. The practical result is that labs face a choice: invest heavily in data auditing and legal clearance to enable genuine transparency, or release weights only and call it open source.
The Licensing Drift in Practice
Hugging Face, the model-hosting platform founded in New York but with substantial European operations and a significant portion of its research staff based in Paris, maintains one of the most granular licence trackers in the industry. Its model cards show a clear pattern across European submissions over the past 18 months: the proportion of models released under genuinely permissive licences such as Apache 2.0 or MIT has fallen, while custom research licences, non-commercial licences, and Responsible AI Licence variants have grown. This is not a Hugging Face problem; it reflects what labs are choosing to submit.
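The pattern is straightforward to check. The snippet below is a minimal sketch using the public huggingface_hub client: it tallies the licence tags an organisation has attached to its hosted models. The organisation name is only an example, the tag format reflects how the Hub currently exposes licence metadata, and the count says nothing about models a lab has chosen not to host at all.

```python
# A minimal sketch: tally licence tags across an organisation's public models
# on the Hugging Face Hub. Requires `pip install huggingface_hub`; the org
# name below is an example and results reflect only what is publicly hosted.
from collections import Counter
from huggingface_hub import HfApi


def licence_breakdown(org: str) -> Counter:
    api = HfApi()
    counts = Counter()
    for model in api.list_models(author=org):
        # Licence information is exposed as a tag of the form "license:<id>".
        licences = [t.split(":", 1)[1] for t in (model.tags or []) if t.startswith("license:")]
        counts[licences[0] if licences else "unspecified"] += 1
    return counts


if __name__ == "__main__":
    print(licence_breakdown("mistralai"))
```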
The Open Source Initiative, which published its formal definition of open-source AI in October 2024, requires that a qualifying system include not just weights but also the data used for training, or at minimum a description detailed enough to allow reproduction. By that standard, essentially no major European frontier model currently qualifies. Mistral's open releases come closest on the weights dimension, but training data disclosure remains incomplete. Aleph Alpha does not come close on either dimension for its commercial-grade models.
This matters because the European story about open source has been tied to a specific political argument: that European labs, unlike their American counterparts, can be trusted with open access because they operate under European data protection rules, answer to European regulators, and share European democratic values. If European open source narrows to weights-but-not-data, that argument weakens considerably. Weights alone do not tell a researcher or a regulator very much about what a model was trained on, what biases it may carry, or whether its outputs can be trusted in high-stakes public sector applications.
The scale of the shift shows up wherever you look for it: licensing restrictions are tightening, regulatory compliance costs are rising, and the gap between open-weight and genuinely open-source releases is widening. All of it points in the same direction, and all of it translates structural pressure into concrete terms.
What the Labs Say, and What They Do
Mistral's public communications continue to emphasise openness as a core value. Chief executive Arthur Mensch has spoken repeatedly about the importance of open models for European AI sovereignty, and the company's smaller releases, including Mistral 7B and Mixtral 8x7B, remain available under licences that permit broad use. The argument from Mistral, and it is not an entirely dishonest one, is that releasing capable small models openly provides genuine value to the ecosystem even if the frontier models are commercial.
But the frontier is where the political argument lives. When European governments, including France's Direction interministérielle du numérique and Germany's Bundesministerium für Digitales und Verkehr, justify support for domestic AI labs on sovereignty grounds, they are not talking about 7-billion-parameter models. They are talking about models capable of replacing dependence on GPT-4 and Claude for sensitive applications. And those models, at Mistral as at Aleph Alpha, are not open.
Aleph Alpha's position is in some ways more intellectually consistent. The company, led by Jonas Andrulis, has never pretended that full openness is compatible with its enterprise-first, security-focused business model. Its Aleph Alpha Research division publishes academic work, and the company has participated in European research consortia, but the commercial Luminous and Pharia model families are proprietary by design. The sovereignty pitch is about data residency, auditability by authorised parties, and European legal jurisdiction, not about public access to weights or data. That is a coherent position, even if it sits awkwardly with the broader open-source narrative that European AI has adopted.
The tension is most acute for Mistral precisely because it has leaned furthest into the open-source identity. The company's Series B funding at a valuation of around 6 billion euros, completed in mid-2024, brings with it investor expectations that are structurally incompatible with giving away frontier capability. Apache 2.0 and unicorn valuations do not coexist easily at the model tier that actually matters commercially.
The 18-Month Horizon
Within the next 18 months, the most likely outcome is a stable but narrowed definition of European open-source AI. Labs will continue releasing weights for mid-tier models, and those releases will carry licences that permit research and limited commercial use. Training data will remain closed, and the AI Act's training-data summary obligation will be met through minimal disclosures that satisfy the letter of the law without enabling genuine reproducibility. The EU AI Office will accept this arrangement because it has limited enforcement capacity, and because the alternative, pushing labs toward US or non-EU jurisdictions, serves no one.
Hugging Face will continue tracking these releases and will, to its credit, keep flagging the distinction between open-weight and open-source. The Open Source Initiative's definition will remain the principled benchmark, and European labs will continue falling short of it for any model that matters commercially. The word open will persist in press releases while its meaning contracts.
This is not a catastrophe. Weights-only openness still has real value: it enables fine-tuning, security auditing, academic research, and integration into sovereign infrastructure. A European public authority that can run Mistral's weights on its own servers, audit the inference process, and fine-tune on its own data is in a meaningfully better position than one dependent on a black-box API. But it is not what was promised, and the gap between promise and delivery deserves to be named.
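To make that concrete, here is a minimal sketch of what weights-only openness permits in practice, assuming the transformers library and Mistral's Apache-2.0-licensed Mistral-7B-Instruct-v0.2 checkpoint; nothing in it depends on an external API, which is precisely the point.

```python
# A minimal sketch of what "weights-only openness" buys in practice: the model
# runs entirely on infrastructure you control, with no external API involved.
# Assumes `pip install transformers torch accelerate` and enough local memory;
# the checkpoint is Mistral's Apache-2.0-licensed 7B instruct model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Summarise the difference between open-weight and open-source AI."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same local control is what makes fine-tuning on sensitive data and in-house auditing possible at all; a closed API offers neither.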
The European AI sector has a habit of treating sovereignty and openness as synonyms. They are not. Sovereignty is about control; openness is about access. A weights-only model hosted on European infrastructure gives sovereignty without openness. Full open source, training data included, gives openness that may actually compromise sovereignty if that data includes sensitive national information. The AI Act, whatever its other merits, has clarified that the two values pull in different directions, and European labs are quietly resolving that tension in favour of the one that pays the bills.
THE AI IN EUROPE VIEW
The European open-source narrative was always doing two jobs simultaneously: competing with American closed-weight giants on technical credibility, and reassuring European governments that domestic AI investment served public rather than purely private interests. For a brief window in 2023, when Mistral released Mistral 7B under Apache 2.0 and briefly made believers of the open-source community, it looked as though those two jobs might be compatible. They are not, and the AI Act has made that incompatibility structural rather than contingent.
We are not arguing that European labs should impoverish themselves in the name of ideological purity about open source. Commercial viability matters; a lab that cannot raise capital cannot build frontier models, and a Europe without frontier AI capability is a worse outcome than one with capable but partly proprietary labs. But the policy community, and the public sector clients who are being asked to justify billion-euro bets on European AI sovereignty, deserve clarity about what they are actually buying. Weights are not data. Licences with commercial riders are not Apache 2.0. And a definition of open source that shrinks to fit investor expectations is a definition that has lost its meaning. Name it plainly, price it honestly, and stop using the word open as a free pass.