AI's Inner Workings Still Baffle Experts as NeurIPS Draws Record Crowds

At the world's largest AI conference, Google, OpenAI and their peers admitted their most advanced models remain opaque even to their own engineers. The interpretability crisis has moved from fringe concern to the defining challenge of the field, with major implications for Europe's AI Act and the regulators tasked with enforcing it.

The field's most powerful AI systems are, by the admission of the people who built them, still fundamentally mysterious. That uncomfortable truth dominated discussions at the annual Neural Information Processing Systems (NeurIPS) conference in San Diego, which drew a record 26,000 attendees, double the figure from just six years ago. The explosive attendance mirrors AI's transformation from academic niche to global industrial force. Yet beneath the hype, one foundational question overshadowed every product announcement and benchmark claim: how do frontier AI systems actually work?

Key Takeaways

  • Record 26,000 researchers attended NeurIPS, doubling attendance from six years ago.
  • Google has abandoned near-complete reverse-engineering goals as currently out of reach.
  • OpenAI is pursuing full neural network understanding despite uncertain timelines.
  • Current AI benchmarks were designed for older, narrower models and are already obsolete.
  • EU AI Act enforcement depends on interpretability tools that do not yet fully exist.


The Great AI Mystery

A striking consensus emerged among leading researchers and company executives: they have limited understanding of how their own most advanced models function internally. The pursuit of that understanding, known as interpretability, has become the field's most pressing technical and regulatory challenge. For European policymakers already wrestling with the obligations of the EU AI Act, whose requirements have been phasing in since 2024, the admission is more than academically inconvenient. Meaningful conformity assessments for high-risk AI systems depend on tools that, by the frank account of the researchers building them, do not yet exist in mature form.

Shriyash Upadhyay, co-founder of Martian, an interpretability-focused company, drew a striking historical parallel. He likened the current moment to early physics, when scientists were still debating whether particles such as electrons existed and could be reliably measured at all. His company has launched a prize worth approximately 1 million US dollars to accelerate progress in the area. The paradox is hard to ignore: the core mechanisms of large language models remain opaque whilst commercial demand for those same models soars.

[Image: researchers at dual-monitor workstations inside a European AI research facility, surrounded by visualisations of neural network activation patterns]

Tech Giants Split on Strategy

NeurIPS exposed a genuine strategic divergence between the two most prominent players. Google's interpretability team announced a significant pivot away from ambitious reverse-engineering goals. Neel Nanda, Google's interpretability lead, acknowledged that attempts at near-complete mechanistic understanding of large models are currently out of reach. The company is instead concentrating on practical, impact-driven methods, with tangible results expected within a decade rather than within a few years.

OpenAI has taken the opposite position. Leo Gao, OpenAI's head of interpretability, reaffirmed the company's commitment to deep, comprehensive understanding of neural network operations, even if short-term success is uncertain. The contrast is not merely a question of corporate temperament. It reflects a genuine and unresolved scientific disagreement about whether full interpretability is achievable at all within the current deep-learning paradigm.

Adam Gleave, founder of FAR.AI, a safety-focused research organisation, articulated the sceptical case: deep learning models may simply be too complex for straightforward human comprehension. He remains cautiously optimistic, however, about understanding model behaviour at multiple levels of abstraction, even if internal mechanisms stay opaque.

The three broad approaches currently on the table from major organisations look roughly as follows:

  • Google: practical, impact-driven interpretability targeting demonstrable results within a decade, stepping back from full reverse-engineering ambitions.
  • OpenAI: deep, comprehensive interpretability aimed at full understanding of neural network operations, accepting longer and less certain timelines.
  • FAR.AI and similar safety labs: behavioural analysis, studying what models do at various levels rather than attempting to trace every internal mechanism.

For European regulators, none of these approaches yet delivers the kind of auditability that the AI Act's high-risk provisions implicitly assume. The European AI Office, established in early 2024 within the European Commission to oversee the Act's implementation, will need to grapple with the fact that the most capable models on the market cannot be fully explained by anyone, including the companies seeking to place them on the market.

Measurement Tools Are Falling Behind

The interpretability problem is compounded by a parallel crisis in evaluation. Current benchmarks were designed for earlier, narrower AI systems. They increasingly fail to assess complex capabilities such as reasoning, contextual judgement, and general problem-solving in today's frontier models.

Researchers at institutions including ETH Zurich and University College London have raised concerns about benchmark saturation, the phenomenon whereby models rapidly achieve near-perfect scores on established tests without demonstrating genuine generalisation. The measurement gap matters enormously in specialised domains. Researchers working on biological AI described the evaluation landscape in their field as being at an extremely early stage, with no agreed framework for assessing whether an AI's predictions in areas such as protein interaction or drug discovery are reliably trustworthy.

The practical consequences for European industry are significant. Organisations deploying AI in regulated sectors, including healthcare, finance, and critical infrastructure, face a growing disconnect between what vendors claim their systems can do and what can actually be verified through available testing methods. The problem is not simply academic. It affects procurement decisions, liability assessments, and regulatory filings across the bloc.

The main gaps in current evaluation infrastructure include:

  • Benchmarks calibrated to older, narrower tasks that advanced models saturate almost immediately.
  • No agreed evaluation frameworks for specialised domains such as biology, medicine, and climate modelling.
  • Inadequate tools for testing real-world reasoning as opposed to pattern-matching on familiar data.
  • A widening gap between AI capability growth and the sophistication of the metrics used to measure it.

[Image: a packed conference auditorium during a machine-learning research presentation, rows of attendees in tiered seating seen from behind]

Science Presses Ahead Regardless

Despite interpretability challenges, AI systems are already producing meaningful results in scientific research, and the momentum is accelerating. Upadhyay noted that engineers built reliable bridges long before Isaac Newton formalised classical mechanics: practical application routinely precedes theoretical comprehension, and that has always been how engineering progresses.

For the fourth consecutive year, a NeurIPS offshoot dedicated to AI in scientific discovery drew strong attendance and produced notable results. Researchers working on AI applications in chemistry, materials science, and genomics reported a step-change in enthusiasm compared with even three years ago. Interest in AI systems capable of autonomous discovery, generating and testing hypotheses without constant human direction, has grown sharply, contrasting with the relative indifference the field attracted a decade ago.

Europe has particular stakes in this trajectory. The European Research Council and national funding bodies in Germany, France, and the Netherlands have all increased allocations to AI-driven science programmes in the past two years. Institutions such as the Helmholtz Association and EMBL are embedding AI tooling into core research pipelines well before any consensus on interpretability has emerged, following the same pragmatic logic that Upadhyay described. Whether that pragmatism will satisfy regulators when those same tools migrate into clinical or industrial settings remains an open question.

What European Regulators and Industry Must Confront

The NeurIPS discussions carry direct implications for Europe's regulatory posture. The EU AI Act requires providers of high-risk AI systems to demonstrate technical robustness, transparency, and the ability to explain outputs to the degree necessary for appropriate oversight. The conference made clear that the scientific foundations for meeting those requirements are still under active construction.

Yoshua Bengio, scientific director of Mila and a prominent voice in AI safety globally, has argued publicly that deploying systems whose internal logic cannot be audited in safety-critical contexts is a governance failure waiting to happen. He is not alone among European-linked researchers in pressing for interpretability investment to be treated as infrastructure, not optional research.

Meanwhile, the Commission team that has taken over Margrethe Vestager's digital portfolio and the newly constituted AI Office face the practical task of issuing guidance on conformity assessments without the benefit of mature interpretability tooling. The risk is that compliance becomes a documentation exercise rather than a genuine safety guarantee, precisely because the underlying science has not yet caught up with the policy ambition.

The field is not standing still. Prize competitions, dedicated research programmes, and the divergent but active strategies at Google and OpenAI all suggest serious investment. However, as this year's NeurIPS demonstrated, the gap between what frontier AI can do and what anyone can reliably verify about how it does it is not closing quickly. For European business, that gap is not a theoretical concern. It is a compliance risk, a procurement challenge, and increasingly a boardroom conversation.

AI Terms in This Article

  • Neural network: software loosely inspired by how brain cells connect, used to find patterns in data.
  • Deep learning: machine learning using neural networks with many layers to learn complex patterns.
  • Embedding: converting text or images into numbers that capture their meaning, so AI can compare them.
  • Benchmark: a standardised test used to compare AI model performance.
  • AI-driven: primarily guided or operated by artificial intelligence.
  • Pivot: fundamentally changing a business strategy or product direction.
