Falcon, Jais, and ALLaM: What the Arabic LLM Race Means for European AI Strategy
13 min read

Three sovereign large language models, each backed by a different Gulf state, are competing to define Arabic AI for a generation. For European policymakers, developers, and enterprises eyeing the Arabic-speaking market, understanding Falcon-H1, Jais 2, and ALLaM is no longer optional. Here is what separates them technically and strategically.

Mistral AI has a strategic decision to make, and it has not yet made it. The Paris-based frontier-model lab, Europe's most credible answer to OpenAI and Anthropic, has built its reputation on multilingual capability and open-weight licensing. Arabic is the next obvious test of both. With over 422 million speakers, deep diaspora communities across France, Germany, Belgium, and the United Kingdom, and a public-sector procurement pipeline that increasingly requires bilingual French-Arabic and German-Arabic deployment, the Arabic LLM market matters to European frontier labs in a way it does not to American ones. Three sovereign Gulf-state models, Falcon-H1 Arabic, Jais 2, and ALLaM, have just landed in Mistral's path.

The choice is binary. European labs can either build their own Arabic-first capability and compete with Abu Dhabi and Riyadh for both technical credibility and developer mindshare, or they can partner with one of the three sovereign stacks and accept a structural dependency on a non-European model. Aleph Alpha, the Heidelberg-based lab that has reorganised around AI Act compliance tooling, faces its own version of the same dilemma in microcosm. The technical comparison that follows matters because, for European AI strategy, this is no longer a regional curiosity. It is the next live decision point.

Falcon: The Open-Source Champion from Abu Dhabi

Technology Innovation Institute (TII), the applied research pillar of Abu Dhabi's Advanced Technology Research Council, has turned the Falcon series into one of the most recognised open-source model families in the world. The journey began in 2023 with Falcon 40B, a 40-billion-parameter model trained on one trillion tokens of the RefinedWeb dataset using 384 A100 GPUs over two months. It topped the Hugging Face Open LLM Leaderboard on release, outperforming Meta's LLaMA-30B and LLaMA-65B.

Falcon 180B followed, scaling to 180 billion parameters trained on 3.5 trillion tokens across 4,096 A100 GPUs. The training consumed approximately seven million GPU-hours on Amazon SageMaker, producing a model that scored 68.74 on the MMLU benchmark. In February 2024, TII launched the Falcon Foundation, a non-profit entity dedicated to advancing open-source generative AI, with an initial pledge of 300 million US dollars.
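Those compute figures can be sanity-checked with simple arithmetic: seven million GPU-hours spread across 4,096 GPUs implies roughly ten weeks of continuous training. A minimal back-of-envelope check:

```python
# Back-of-envelope check on the Falcon 180B training figures
# cited above: ~7 million GPU-hours across 4,096 A100 GPUs.
gpu_hours = 7_000_000
num_gpus = 4_096

wall_clock_hours = gpu_hours / num_gpus
wall_clock_days = wall_clock_hours / 24

print(f"{wall_clock_hours:.0f} hours, about {wall_clock_days:.0f} days")
# About 1709 hours, i.e. roughly 71 days of continuous training,
# consistent with the "two months" timescale of the earlier Falcon 40B run.
```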

The Falcon 3 family arrived in December 2024, releasing thirty model checkpoints ranging from one billion to ten billion parameters, trained on 14 trillion tokens of web, code, STEM, and curated multilingual data using 1,024 H100 GPUs. The family introduced Falcon3-Mamba-7B, a state-of-the-art State Space Language Model with a 32,000-token context length.

The real breakthrough for Arabic came on 5 January 2026, with the launch of Falcon-H1 Arabic.

Falcon-H1 Arabic: The Technical Details

Falcon-H1 Arabic employs a hybrid Mamba-Transformer architecture, a deliberate break from the pure transformer design that dominates the field. This hybrid approach combines the efficiency of state-space models for long sequences with the attention mechanisms that transformers excel at for contextual understanding. The model ships in three sizes: 3B, 7B, and 34B parameters.
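TII has not published Falcon-H1's exact layer layout, but the general idea of a hybrid Mamba-Transformer can be sketched as interleaving linear-time state-space blocks with quadratic-time attention blocks. The toy numpy sketch below is illustrative only: the dimensions, the alternating mixing pattern, and the block internals are assumptions, not Falcon-H1's actual design.

```python
import numpy as np

def ssm_block(x, A, B, C):
    """Toy state-space block: a linear recurrence over the sequence.
    Cost grows linearly with sequence length, which is the efficiency
    argument for Mamba-style layers on long inputs."""
    seq_len, d = x.shape
    state = np.zeros(d)
    out = np.empty_like(x)
    for t in range(seq_len):
        state = A * state + B * x[t]   # elementwise recurrence (diagonal A)
        out[t] = C * state
    return out

def attention_block(x):
    """Toy single-head self-attention: quadratic in sequence length,
    but every position can attend directly to every other."""
    d = x.shape[1]
    scores = x @ x.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x

def hybrid_forward(x, n_layers=4):
    """Alternate SSM and attention layers with residual connections --
    the hybrid pattern, not Falcon-H1's real configuration."""
    d = x.shape[1]
    A, B, C = np.full(d, 0.9), np.ones(d), np.ones(d)
    for layer in range(n_layers):
        if layer % 2 == 0:
            x = x + ssm_block(x, A, B, C)   # efficient long-range mixing
        else:
            x = x + attention_block(x)      # precise contextual mixing
    return x

tokens = np.random.randn(16, 8)   # (sequence length, hidden size)
print(hybrid_forward(tokens).shape)
```

The design trade-off is visible in the two block functions: the SSM block touches each position once, while the attention block builds a full position-by-position score matrix.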

The benchmark results are striking. On the Open Arabic LLM Leaderboard (OALL v2), the 3B model scores 61.87 per cent, ten full points ahead of Microsoft's Phi-4 Mini 4B. The 7B model reaches 71.47 per cent, surpassing all models in the approximately 10B parameter class. The 34B flagship scores 75.36 per cent, outperforming both Qwen2.5 72B and Llama-3.3 70B despite being roughly half their size.

On specialised Arabic benchmarks, the picture is equally compelling. ArabCulture scores reach approximately 80 per cent for both the 7B and 34B variants. On the 3LM STEM benchmark, the 34B model achieves 96 per cent on native questions and 94 per cent on synthetic. The AraDice dialect evaluation shows coverage across Egyptian, Gulf, Levantine, and Maghrebi Arabic, with the 34B model averaging approximately 53 per cent across all dialect categories.

Falcon-H1 Arabic is released under the Falcon Licence 2.0, an Apache 2.0-based permissive licence with an acceptable use policy. It is fully open source, available on Hugging Face, and free for commercial use with no hosting restrictions. That licensing posture matters enormously to European developers, who increasingly operate under procurement rules and AI Act compliance obligations that favour transparent, auditable model weights.

[Image: rows of servers inside a European high-performance computing facility, such as the LUMI supercomputer centre in Finland or the Barcelona Supercomputing Center in Spain.]

Jais 2: Speed, Scale, and the Cerebras Advantage

Inception, founded in late 2017 as part of the G42 ecosystem, has taken a markedly different path with Jais. Where Falcon emphasises architectural innovation and open-source reach, Jais has focused on training data quality, inference speed, and commercial deployment infrastructure.

The original Jais 13B launched in August 2023, built on a Llama 2 foundation with an expanded tokeniser that doubled the base vocabulary. It was trained on 116 billion Arabic tokens. Jais 70B followed in 2024, scaling to 70 billion parameters trained on 370 billion tokens, 330 billion of them Arabic, the largest Arabic dataset for any open-source model at the time.

G42's corporate trajectory provides essential context. The company raised 800 million US dollars from Silver Lake in April 2021, followed by Microsoft's landmark 1.5-billion-dollar investment in April 2024. Total funding reached 2.3 billion dollars. G42 also launched partnerships with OpenAI and anchored the Stargate UAE project, a one-gigawatt compute cluster.

Jais 2, announced on 9 December 2025, is a ground-up rebuild. Developed by Inception in partnership with Cerebras Systems and MBZUAI's Institute of Foundational Models, it ships in two sizes, 8B and 70B parameters, pretrained from scratch on 2.6 trillion curated Arabic, English, and code tokens.

The standout metric is inference speed. Running on Cerebras hardware, Jais 2 achieves up to 2,000 tokens per second. For enterprise applications handling millions of Arabic-language queries daily, whether in government services, banking, or telecommunications, this speed advantage is not merely technical. It is commercial. European system integrators deploying Arabic AI for clients in North Africa or the Levant will find this throughput figure difficult to ignore.
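The commercial weight of that throughput figure is easy to quantify. The 2,000 tokens per second is the quoted Jais 2 number; the 300-token average response length below is an illustrative assumption, not a vendor figure.

```python
tokens_per_second = 2_000    # quoted Jais 2 throughput on Cerebras hardware
avg_response_tokens = 300    # illustrative assumption for a chat-style reply
seconds_per_day = 86_400

# Daily response capacity of a single inference instance at full utilisation.
responses_per_day = tokens_per_second * seconds_per_day // avg_response_tokens
print(f"{responses_per_day:,} responses per day per instance")
# 576,000 responses/day -- "millions of queries daily" becomes a
# question of a handful of instances rather than a large fleet.
```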

On the AraGen benchmark, Jais-2-70B achieves the highest scores across nearly all metrics, outperforming both Qwen2.5-72B and Llama-3.3-70B. The model excels particularly in culturally rooted domains: poetry, religion, cuisine, translation, summarisation, and financial analysis. Its dialect coverage spans Modern Standard Arabic and regional variants, with specific engineering for code-switching and informal tone.

Jais 2 is released with full open-source weights on Hugging Face and deployed for production use through the Azure AI Model Catalog, a distribution channel that makes it immediately accessible to European enterprise customers already inside the Microsoft ecosystem.

ALLaM: Saudi Arabia's Sovereign Stack

If Falcon is the open-source champion and Jais is the speed-optimised commercial play, ALLaM is the sovereign infrastructure model, designed to serve Saudi Arabia's national AI ambitions and, through HUMAIN, to anchor an entire domestic AI ecosystem.

ALLaM is developed by SDAIA, the Saudi Data and Artificial Intelligence Authority, established by royal decree in August 2019. SDAIA oversees the National Strategy for Data and AI, which targets SAR 75 billion (about GBP 16 billion / EUR 18 billion) in investments by 2030. ALLaM is the linguistic foundation of that strategy.

The model family includes ALLaM 7B, available on Hugging Face, and ALLaM 34B, which launched on 25 August 2025 as the engine powering the HUMAIN Chat application. An enterprise variant reportedly scales to 1.8 trillion parameters, positioning ALLaM as a government and enterprise-grade model rather than a consumer or developer tool.

What sets ALLaM apart is not its benchmark performance. On the Open Arabic LLM Leaderboard, the 7B variant trails both Falcon-H1 Arabic 7B and Jais 2 8B. What distinguishes it is integration into national infrastructure. HUMAIN, the PIF-owned company that deploys ALLaM, has plans for 500 megawatts of compute capacity, eleven data centres each with 200-megawatt capacity, and an initial deployment of 18,000 GB300 GPUs. The SDAIA Hexagon data centre in Riyadh offers 480 megawatts of power across 2.78 million square metres.

ALLaM's licensing model reflects this sovereign posture. Unlike Falcon and Jais, it is positioned for enterprise and government use within the HUMAIN ecosystem, a deliberate choice to build a vertically integrated AI stack controlled domestically.

[Image: liquid-cooled GPU servers in a European data-centre cleanroom, an engineer walking the aisle.]

Head-to-Head: How They Compare

| Dimension | Falcon-H1 Arabic | Jais 2 | ALLaM |
| --- | --- | --- | --- |
| Developer | TII (Abu Dhabi) | Inception / Cerebras / MBZUAI | SDAIA / HUMAIN (Riyadh) |
| Sizes | 3B, 7B, 34B | 8B, 70B | 7B, 34B, 1.8T enterprise |
| Architecture | Hybrid Mamba-Transformer | Redesigned from scratch | Not publicly detailed |
| Training data | 14T tokens base, Arabic-adapted | 2.6T curated Arabic/English/code | Not publicly disclosed |
| Benchmark standing (best) | 75.36% on OALL v2 (34B) | State of the art on AraGen (70B) | Trails rivals on OALL v2 (7B) |
| Inference speed | Standard | Up to 2,000 tokens/sec (Cerebras) | Not disclosed |
| Dialect coverage | MSA, Egyptian, Levantine, Gulf, Maghrebi | MSA plus regional variants, code-switching | Not specified publicly |
| Licence | Falcon Licence 2.0 (Apache-based, open) | Full open-source weights | Enterprise/government (HUMAIN) |
| Commercial access | Hugging Face, free | Hugging Face plus Azure AI Model Catalog | HUMAIN ecosystem |
| Strategic model | Open ecosystem, global developer reach | Speed-first, enterprise deployment | Sovereign stack, domestic control |

The Benchmarks That Matter for Arabic AI

Understanding these models requires understanding the benchmarks designed specifically for Arabic AI, an evaluation ecosystem that has matured significantly since 2023.

The Open Arabic LLM Leaderboard (OALL v2) evaluates models across six multiple-choice tasks, including Arabic MMLU, Arabic Exams, Alghafa, MadinahQA, and Aratrust, plus one generative task. It is the closest equivalent to the English-language Open LLM Leaderboard and the primary ranking system for Arabic models.

AraGen focuses on generative capabilities: translation, summarisation, financial analysis, and culturally rooted domains including poetry, religion, and cuisine. This benchmark captures something OALL misses, namely how well a model generates natural, contextually appropriate Arabic rather than simply selecting correct answers.

AraDice evaluates dialect and cultural understanding across Egyptian, Gulf, and Levantine Arabic. For any model claiming to serve the Arabic-speaking world, dialect performance is arguably the most important metric. A model that handles Modern Standard Arabic beautifully but fails on Egyptian colloquial is useless for most consumer applications. European developers building Arabic-facing products should treat AraDice scores as the primary filter, not OALL.

BALSAM provides a comparative platform specifically designed for benchmarking Arabic LLMs, while ALUE (Arabic Language Understanding Evaluation) tests eight core language understanding tasks. Together, these benchmarks create a rigorous evaluation framework that did not exist three years ago, itself a sign of the Arabic AI ecosystem's maturation. The existence of this infrastructure also matters for EU AI Act compliance purposes: third-country models deployed in Europe for Arabic-language use cases will need to demonstrate performance against credible benchmarks, and these are now credible.
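The procurement logic recommended above, screening on dialect performance first and only then ranking on general capability, can be sketched in a few lines. Only Falcon-H1 34B's figures below are taken from this article; the other model records and the 50 per cent threshold are illustrative assumptions.

```python
# Hypothetical model records for a procurement shortlist. Only the
# falcon-h1-34b figures (AraDice ~53%, OALL v2 75.36%) are quoted in
# the article; the rest are placeholders for illustration.
models = [
    {"name": "falcon-h1-34b", "aradice": 0.53, "oall_v2": 0.7536},
    {"name": "model-b",       "aradice": 0.40, "oall_v2": 0.78},
    {"name": "model-c",       "aradice": 0.55, "oall_v2": 0.70},
]

ARADICE_FLOOR = 0.50  # illustrative dialect-coverage threshold

def shortlist(candidates, floor=ARADICE_FLOOR):
    """Filter on dialect coverage first, then rank by OALL v2 score."""
    passing = [m for m in candidates if m["aradice"] >= floor]
    return sorted(passing, key=lambda m: m["oall_v2"], reverse=True)

for m in shortlist(models):
    print(m["name"])
# model-b is excluded despite having the highest OALL v2 score,
# because it fails the dialect floor -- the AraDice-first logic
# the article argues for.
```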

Three Strategies, Three Visions of Sovereignty

The technical comparison, while essential, only tells half the story. Falcon, Jais, and ALLaM embody three fundamentally different theories of how a sovereign state builds AI capability.

Falcon's theory is that sovereignty comes through influence. By releasing the world's best Arabic model as open source, TII ensures that every developer, startup, and government agency building Arabic AI applications starts from a UAE-originated foundation. The 300-million-dollar Falcon Foundation is not charity; it is an investment in ecosystem control through ubiquity. When a Moroccan fintech or a Tunisian healthtech startup fine-tunes Falcon for its use case, Abu Dhabi's AI influence extends without requiring any commercial agreement. European open-source advocates at organisations such as EleutherAI or within the Hugging Face community in Paris will recognise this playbook immediately.

Jais's theory is that sovereignty comes through infrastructure partnerships. G42's relationships with Microsoft, OpenAI, Cerebras, and Oracle create a commercial ecosystem where Jais is not just a model but a deployment platform. The Azure AI Catalog integration and the Cerebras inference acceleration ensure that Jais can be deployed at scale in enterprise environments. Critically for European buyers, Azure is already a compliant cloud provider under GDPR and the EU AI Act's transitional provisions, meaning Jais 2 can be procured through existing enterprise agreements with minimal additional legal overhead.

ALLaM's theory is that sovereignty comes through vertical integration. Saudi Arabia is building every layer of the stack domestically, from energy infrastructure powering the data centres, to GPU clusters, to ALLaM's language capabilities, to the applications built on top. This approach sacrifices the developer ecosystem breadth that Falcon enjoys and the deployment flexibility that Jais offers, but it achieves something neither rival can claim: complete domestic control over the entire AI value chain. EU strategists drafting the proposed European AI Infrastructure Act should study this model carefully, because it is arguably what several member states are attempting to replicate with far smaller budgets.

Why This Matters in Brussels and Beyond

European AI policy has spent the past three years focused almost exclusively on frontier model governance, largely defined by OpenAI, Google DeepMind, and Anthropic. The emergence of high-quality sovereign Arabic LLMs complicates that picture in at least three ways.

First, European enterprises operating in Arabic-speaking markets now have genuine model choices. Neither GPT-4o nor Gemini 1.5 Pro was designed with Arabic dialectal diversity as a primary objective. Falcon-H1 Arabic's five-dialect coverage and Jais 2's AraGen performance suggest that domain-specific sovereign models can outperform general-purpose frontier models on non-English tasks. Philipp Schmid, technical lead at Hugging Face in Paris, has noted publicly that multilingual model quality is now the primary driver of enterprise adoption outside North America and Western Europe, a point that applies directly to Arabic-facing deployments.

Second, the open-source versus closed-source dynamic playing out in Arabic AI mirrors debates happening inside the EU's AI Office in Brussels. The AI Office, which took on supervisory responsibility for general-purpose AI models under the AI Act from August 2024, is currently developing the Code of Practice for GPAI providers. Whether open-weight models receive meaningfully different treatment than closed proprietary systems remains contested. The Arabic LLM race, where open-weight Falcon and Jais 2 appear to be outperforming the more closed ALLaM on public benchmarks, provides a useful empirical data point for that debate.

Third, the compute dependency question is acute. Both UAE and Saudi models rely heavily on NVIDIA hardware, with US export controls shaping which nations can access the latest chips. European AI sovereignty advocates, including those at ETH Zurich's AI Center and within the ELLIS network, have long argued that Europe's dependence on non-European semiconductor supply chains is its primary structural vulnerability. The Arabic LLM race illustrates what happens when that vulnerability is exposed at national scale: even well-funded sovereign AI programmes are structurally dependent on American hardware policy.

What Comes Next

The Arabic LLM landscape in early 2026 looks radically different from eighteen months ago. Where there were once no competitive Arabic-first language models, there are now three major families, among at least 53 Arabic models identified globally. The benchmarking infrastructure has matured. Commercial deployment pathways exist.

Several developments will determine which of these models comes to define Arabic AI for the next decade.

Dialect coverage will separate the serious from the symbolic. A model that handles Modern Standard Arabic and one regional dialect but fails on Egyptian, Levantine, or Maghrebi Arabic cannot claim to serve the Arabic-speaking world. Falcon-H1 Arabic's explicit five-dialect coverage is currently the most comprehensive, but gaps remain, and rivals are closing them.

Commercial deployment will matter more than benchmarks. The model embedded in government services, banking platforms, healthcare systems, and e-commerce applications will generate the data flywheel and developer ecosystem that sustains long-term dominance. Jais 2's Azure integration gives it immediate access to European enterprise procurement channels. ALLaM's HUMAIN integration gives it a captive Saudi market. Falcon's open-source posture gives it the broadest potential developer base.

The open-source question will ultimately resolve the competitive landscape. In the English-language AI world, the open-source versus closed-source contest reshaped the market within eighteen months of Llama's release. The same dynamics will play out in Arabic AI, and European developers, regulators, and enterprises should position themselves now rather than wait for the outcome.

AI Terms in This Article

  • LLM: a large language model, meaning software trained on massive text data to generate human-like text.
  • inference: when an AI model processes input and produces output; the actual 'thinking' step.
  • tokens: small chunks of text (words or word fragments) that AI models process.
  • parameters: the internal settings an AI model learns during training; more parameters generally means more capable.
  • transformer: the neural network architecture behind most modern AI language models.
  • generative AI: AI that creates new content (text, images, music, code) rather than just analysing existing data.
