Demis Hassabis founded DeepMind with a deceptively simple two-part mission: “solve intelligence,” and then use that to “solve everything else.” It’s a bold claim, the kind that usually belongs in science fiction novels rather than business plans. Yet, as we move deeper into 2025, that promise is beginning to look less like fiction and more like a messy, expensive, and incredibly lucrative reality.
OpenAI CEO Sam Altman has promised us enormous gains in quality of life, and Dario Amodei of Anthropic predicts a “country of geniuses in a data centre” by 2026. But for those of us watching the numbers, the question isn’t just about raw intelligence anymore. It’s about utility. Can these systems actually do science, or are they just very good at passing exams?
The answer, it seems, is a complicated “yes, but.” OpenAI’s release of the FrontierScience benchmark has given us our clearest map yet of where the silicon ends and the scientist begins. It is a proving ground for machines, split into two tracks: an Olympiad tier, filled with the kind of physics and chemistry problems that earn gold medals for brilliant teenagers, and a Research tier, designed by PhDs to simulate the messy, open-ended misery of actual discovery.
The results are a perfect microcosm of the AI industry right now. On the Olympiad track, OpenAI’s latest model, GPT-5.2, is a savant, scoring an impressive 77.1%. It breezes through theoretical physics derivations that would baffle most of us.
But ask it to work like a researcher, proposing hypotheses and navigating ambiguity, and its score plummets to 25.2%. It’s a stark reminder of the “Research Gap”: we have built engines that can ace the test but still struggle to run the lab.
Even so, the trajectory is undeniable. Just two years ago, GPT-4 scored a mere 39% on the predecessor to these tests, the GPQA benchmark. Today, GPT-5.2 scores 92% on that same metric. The line is going up and to the right, and it’s dragging billions of dollars of investment along with it.
Billion-dollar lab partners
While generalist models struggle to design experiments, specialised AI agents are already reshaping the economics of the physical world. If you want to see where the money is really moving, look away from the chatbots and toward the biology labs. The release of AlphaFold 3 by Google DeepMind has fundamentally altered the landscape of drug discovery.
Unlike its predecessors, which focused primarily on proteins, AlphaFold 3 can predict the structure and interactions of DNA, RNA, and small-molecule ligands with unprecedented accuracy, a leap DeepMind says improves prediction accuracy for protein-molecule interactions by at least 50% over existing methods.
Isomorphic Labs, the commercial spin-off led by Hassabis, is aggressively monetising this capability. In 2025 alone, they expanded a strategic partnership with Novartis and secured a massive deal with Eli Lilly worth up to $1.7 billion to discover small-molecule therapeutics.
By mid-2025, Isomorphic raised $600 million in a financing round led by Thrive Capital to push its own internal pipeline of drugs into clinical trials. When you compress the timeline of drug discovery by even 20% or 30%, you are potentially saving billions in R&D spend.
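A back-of-the-envelope calculation shows why. The sketch below uses illustrative assumptions, a widely cited ~$2.3 billion all-in cost and ten-year timeline per approved drug, rather than Isomorphic’s actual economics:

```python
# Back-of-the-envelope sketch. All figures are illustrative assumptions,
# not Isomorphic Labs' actual economics: a widely cited ~$2.3bn all-in
# cost and ~10-year timeline per approved drug, spent roughly evenly.
COST_PER_DRUG = 2.3e9   # assumed R&D cost per approved drug (USD)
PIPELINE_SIZE = 10      # hypothetical number of programmes

for compression in (0.20, 0.30):
    # If spend scales roughly with elapsed time, compressing the
    # timeline by x% trims roughly x% of the cost.
    saving_per_drug = COST_PER_DRUG * compression
    print(f"{compression:.0%} compression: "
          f"~${saving_per_drug / 1e9:.2f}bn per drug, "
          f"~${saving_per_drug * PIPELINE_SIZE / 1e9:.1f}bn across the pipeline")
```

Even on those rough numbers, a 20% to 30% compression is worth around half a billion dollars per programme, and billions across a pipeline.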
The market is pricing this in. The global sector for AI in drug discovery is currently valued at roughly $6.93 billion, but analysts project it will surge to over $16.5 billion by 2034. We are witnessing the industrialisation of biology, where wet labs are becoming data centres, and pipettes are being guided by algorithms.
In the world of material science, the numbers are equally staggering. Google DeepMind’s GNoME project has already identified 2.2 million new crystal structures, a haul DeepMind estimates is equivalent to roughly 800 years of knowledge gained through traditional experimentation.
Microsoft is countering with Azure Quantum Elements, aiming to compress 250 years of chemistry research into the next 25. These tools are hunting for the battery materials and superconductors of the future, and they are finding them at a pace that human intuition simply cannot match.
Hidden infrastructure of intelligence
Behind every AI scientist who makes a breakthrough, there is a hidden economy of human experts teaching it how to think. This is the “picks and shovels” layer of the AI gold rush, and it is minting unicorns at a breakneck pace.
Take Surge AI. You might not see them in the headlines as often as OpenAI, but they are the ones feeding the brains of the operation. Founded by Edwin Chen, Surge AI took a contrarian bet. Instead of using low-paid click-workers to label data, they hired PhDs, linguists, and scientists. That bet paid off.
By late 2025, the company, which initially bootstrapped itself without massive venture capital injections, was reportedly generating annual revenue exceeding $1 billion. Its valuation has now hit the stratosphere, with reports placing it at around $25 billion. It turns out that in a world of abundant computing power, high-quality human reasoning is the scarcest commodity of all.
This demand for human expertise has also fuelled the rise of Mercor, a platform that uses AI to vet and match human talent. They recently closed a Series C round that values the company at $10 billion, a five-fold increase from their valuation earlier in the year. The irony is palpable. To build artificial intelligence, we are paying record sums for human intelligence.
However, the road to the “autonomous scientist” is paved with potholes. While the infrastructure is booming, the agents themselves can be dangerously overconfident. Sakana AI, a research lab based in Tokyo, captured the world’s attention with “The AI Scientist,” a system designed to automate the entire scientific loop, from reading papers to writing code and drafting manuscripts. They even claimed it navigated the peer review process at a machine learning workshop.
But when independent researchers popped the hood, the engine was sputtering. An evaluation by researchers at the University of Siegen found that “The AI Scientist” suffered from severe “novelty hallucinations.” It would reinvent well-known techniques, like micro-batching for stochastic gradient descent, and present them as groundbreaking discoveries.
Worse, the system lacked basic robustness; in testing, 42% of its proposed experiments failed to execute at all due to coding errors. It is the digital equivalent of a brilliant but chaotic grad student who writes beautiful essays but burns down the chemistry lab.
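For context, “micro-batching” is better known as gradient accumulation, a pattern any machine learning practitioner would recognise. The minimal PyTorch sketch below (the model, data, and hyperparameters are placeholders) shows how unremarkable the “discovery” is:

```python
import torch

# Minimal sketch of "micro-batching" (gradient accumulation), the
# textbook technique the AI Scientist reportedly presented as novel.
# The model, data, and hyperparameters here are stand-ins.
model = torch.nn.Linear(64, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()
accum_steps = 4  # effective batch = accum_steps * micro-batch size

loader = [(torch.randn(8, 64), torch.randn(8, 1)) for _ in range(16)]

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = loss_fn(model(x), y) / accum_steps  # average over micro-batches
    loss.backward()                            # gradients accumulate in .grad
    if (step + 1) % accum_steps == 0:
        optimizer.step()       # one update per accumulated "macro" batch
        optimizer.zero_grad()
```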
There is a darker side to this productivity, too. The ability to generate scientific text at scale has weaponised academic fraud. We are currently drowning in a flood of AI-generated noise. A study published in Science Advances estimated that 13.5% of academic abstracts in 2024 showed signs of AI generation, with some subfields nearing 40%. Even peer reviews are being written by bots. An analysis of the ICLR conference found that nearly 16% of reviews were partially authored by LLMs. We are entering a dangerous feedback loop where AI writes the papers and AI reviews them, potentially detaching the scientific record from reality.
So, where does this leave us? Are we on the verge of a golden age or a deluge of digital noise? The economic data suggest the former, provided we can solve the reliability problem. A 2025 report by the RAND Corporation modelled the impact of AI “agents,” systems that can act autonomously rather than just answering questions. Their analysis suggests that if we truly unlock this “Agent World,” it could boost annual economic growth by 3.8 percentage points between now and 2045. That is the difference between a stagnant economy and a booming one.
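Compounding makes the point vivid. The quick calculation below assumes a 2% baseline growth rate, an illustrative figure of mine rather than RAND’s:

```python
# Compounding a 3.8-percentage-point growth boost over 2025-2045.
# The 2% baseline growth rate is an illustrative assumption, not RAND's.
baseline = 0.02
boosted = baseline + 0.038
years = 2045 - 2025

ratio = ((1 + boosted) / (1 + baseline)) ** years
print(f"After {years} years, the boosted economy is ~{ratio:.2f}x "
      f"the size of the baseline path")  # ~2.08x
```

Two decades of that differential roughly doubles the size of the economy relative to the baseline path.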
The “Industrialisation of Intelligence” is messy, expensive, and filled with triumph and fraud. We have superhuman tools like AlphaFold that can see the machinery of life, and we have eager apprentices like GPT-5.2 that are brilliant in theory but clumsy in practice. As we move toward 2030, the winners won’t just be the ones with the smartest models, but the ones who figure out how to turn that raw silicon intelligence into reliable, verifiable science.
