AI's inner workings baffle experts at the world's biggest machine learning summit
Record attendance at NeurIPS in San Diego underlined a troubling reality: the most powerful AI systems on the planet remain opaque even to the engineers who built them. European regulators and researchers are watching closely, as the interpretability gap raises direct questions about the EU AI Act's transparency requirements.
The field of artificial intelligence is deploying systems of extraordinary power whilst understanding almost nothing about how they actually work. That uncomfortable admission dominated NeurIPS 2024, the annual Neural Information Processing Systems conference, which drew a record 26,000 attendees to San Diego, double the figure from just six years ago. For European regulators already wrestling with the transparency obligations baked into the EU AI Act, the message from Silicon Valley could hardly be more awkward.
[[KEY-TAKEAWAYS:NeurIPS 2024 drew a record 26,000 attendees, double the figure from six years ago|Google has abandoned near-complete reverse-engineering goals, shifting to practical decade-horizon methods|OpenAI insists on full neural-network understanding as its long-term interpretability target|Current AI benchmarks were designed for simpler systems and cannot reliably assess modern reasoning|European regulators face a transparency paradox: the Act demands explainability that developers cannot yet deliver]]
The great AI opacity problem
Interpretability, the discipline concerned with understanding how AI models reach their outputs, has moved from a niche research interest to the defining challenge of the moment. A striking consensus emerged at NeurIPS among researchers and company leaders alike: even the organisations that train frontier models have limited insight into what is happening inside them. The models are not broken; they simply operate through billions of interacting parameters in ways that resist straightforward human comprehension.
The stakes for Europe are concrete. The EU AI Act, which entered into force in August 2024, classifies certain AI applications as high-risk and imposes strict transparency and explainability requirements on their developers. If the developers themselves cannot explain their systems, those requirements become extraordinarily difficult to satisfy. Margrethe Vestager, who as European Commission Executive Vice President shaped much of the EU's digital regulatory agenda, has argued consistently that trustworthy AI must be explainable AI. The NeurIPS findings suggest the industry is not yet close to meeting that bar.
Tech giants split on how to respond
The conference revealed a genuine strategic divergence between the two most prominent American AI laboratories.
Google's interpretability team announced a significant pivot. Neel Nanda, Google DeepMind's interpretability research lead, acknowledged that the ambitious goal of near-complete reverse-engineering of large language models is currently out of reach. Google is instead concentrating on practical, impact-driven methods, with tangible results expected within a decade. It is a candid and arguably realistic recalibration.
OpenAI is taking the opposite position. Leo Gao, OpenAI's head of interpretability, reaffirmed the company's commitment to deep, comprehensive understanding of neural network operations. OpenAI wants full insight into how its models function, even if the timeline for achieving that remains uncertain. It is an ambitious stance, and one that sceptics regard as aspirational rather than operational.
A third perspective came from Adam Gleave, chief executive of FAR.AI, a safety-focused research organisation. Gleave argued that deep learning models may be inherently too complex for simple human comprehension, but expressed cautious optimism about understanding model behaviour at various functional levels, even without a complete mechanistic account.
The three positions can be summarised as follows:
Google DeepMind: practical, impact-driven interpretability with measurable results within a decade, abandoning near-complete reverse-engineering as a near-term goal.
OpenAI: deep, comprehensive interpretability targeting full understanding of neural network operations, with no fixed timeline.
FAR.AI: behavioural analysis as a pragmatic alternative, accepting that full mechanistic interpretability may never be achievable.
Measurement tools are not keeping pace
Interpretability is not the only methodological gap on display at NeurIPS. Researchers also raised serious concerns about the adequacy of evaluation benchmarks. Current measurement tools were largely designed for earlier, narrower AI systems and fail to assess complex capabilities such as reasoning, creativity, and general intelligence in modern models.
Sanmi Koyejo, director of the Stanford Trustworthy AI Research group, highlighted the scale of the problem. Many existing benchmarks test specific, constrained tasks and provide little meaningful signal about how a model will perform in open-ended, real-world deployments. That gap matters enormously for businesses and public-sector bodies in Europe that are trying to make evidence-based procurement decisions about AI systems.
The problem is even more acute in specialised scientific domains. Ziv Bar-Joseph of Carnegie Mellon University and founder of GenBio AI described evaluation frameworks for biological AI applications as being in what he called "extremely, extremely early stages."
The shortcomings of current benchmarks span several dimensions:
Existing tests focus on narrow tasks that do not reflect general AI capability.
No reliable methods exist for assessing real-world reasoning under novel conditions.
The gap between AI capability and measurement sophistication is widening, not narrowing.
For European enterprises, this is not an abstract concern. Organisations investing in AI solutions across healthcare, financial services, and critical infrastructure need credible performance metrics. Without them, procurement due diligence becomes largely guesswork, a situation that sits poorly with the risk-management obligations the EU AI Act imposes on deployers as well as developers.
Science accelerates despite the opacity
The interpretability crisis has not slowed AI's impact on scientific research, and NeurIPS 2024 offered considerable evidence of genuine progress. For the fourth consecutive year, a dedicated offshoot conference focused on AI's role in scientific discovery, and by multiple accounts it was the most energetic yet.
Jeff Clune, professor of computer science at the University of British Columbia and a regular presence at European AI research events, observed that enthusiasm for AI-driven scientific discovery has gone "through the roof" compared with a decade ago, when the field was largely overlooked. Ada Fang, a Harvard PhD student researching AI applications in chemistry, described this year's gathering as a "great success," emphasising the cross-domain exchange of ideas among researchers applying AI to problems from drug discovery to materials science.
Shriyash Upadhyay, co-founder of Martian, an interpretability-focused startup that has launched a prize worth roughly 790,000 pounds to accelerate progress in the field, offered a historical analogy that resonated widely: people built bridges before Isaac Newton formalised the laws of mechanics. Practical deployment, in other words, does not have to wait for theoretical completeness. That is reassuring for developers but less reassuring for regulators who need to certify systems they cannot fully interrogate.
Martian's prize is a small but notable signal that private capital is beginning to treat interpretability as a commercially valuable problem rather than a purely academic one. Several European deep-tech investors have made similar noises; the question is whether the funding will match the rhetoric.
What this means for Europe
The EU AI Act's transparency requirements do not exist in a vacuum. They reflect a political and social consensus, particularly strong in Germany, France, and the Nordic countries, that AI deployed in consequential decisions must be accountable. The NeurIPS findings put pressure on that consensus from an unexpected direction: not from companies resisting regulation, but from researchers conceding that the technical foundations for compliance are not yet in place.
The European AI Office, established within the Commission to oversee the Act's implementation, will need to grapple with this gap. So will national competent authorities across member states. The alternative, accepting opacity as a temporary feature of high-capability AI and finding pragmatic proxies for transparency, is politically difficult but may be unavoidable in the near term.
Yoshua Bengio, the Montreal-based deep learning pioneer who has become one of the most prominent voices on AI safety in European policy circles, has argued repeatedly that interpretability is not optional for safe AI deployment. His position is increasingly shared by academics and civil society groups feeding into the EU's AI regulatory process. Whether the commercial pressure to deploy at speed will override those concerns remains the central tension heading into 2025.
AI terms in this article
parameters
The internal settings an AI model learns during training. More parameters generally mean a more capable model.
neural network
Software loosely inspired by how brain cells connect, used to find patterns in data.
deep learning
Machine learning using neural networks with many layers to learn complex patterns.
AI-driven
Primarily guided or operated by artificial intelligence.
pivot
Fundamentally changing a business strategy or product direction.
AI safety
Research focused on ensuring AI systems behave as intended without causing harm.