Just this last week, all the frontier labs announced new, sophisticated AI research tools claiming exceptional results. Headlines such as “U. researchers unveil AI-powered tool for disease prediction with ‘unprecedented accuracy’” or “Microsoft's new AI platform to revolutionize scientific research” gush about these new tools’ abilities.
Meanwhile, Nick McGreivy, a physics and machine learning PhD, shared his own experience with using AI in scientific discovery, and his story reads very differently:
“I've come to believe that AI has generally been less successful and revolutionary in science than it appears to be.”
He elaborates:
“When I compared these AI methods on equal footing to state-of-the-art numerical methods, whatever narrowly defined advantage AI had usually disappeared. […] 60 out of the 76 papers (79 percent) that claimed to outperform a standard numerical method had used a weak baseline. […] Papers with large speedups all compared to weak baselines, suggesting that the more impressive the result, the more likely the paper had made an unfair comparison.”
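To make the weak-baseline point concrete, here is a minimal, purely illustrative sketch (not McGreivy's actual benchmark): the same hypothetical surrogate runtime looks dramatically faster when compared against a naive pure-Python solver than against an ordinary vectorized implementation of the very same scheme. The heat-equation setup and the 5 ms surrogate time are invented for illustration.

```python
# Illustrative sketch: the same "new method" runtime yields very different
# speedup claims depending on whether the baseline is naive or optimized.
# All numbers here are synthetic and for illustration only.

import time
import numpy as np


def solve_heat_naive(u0, alpha, dx, dt, steps):
    """Weak baseline: explicit finite differences with pure-Python loops."""
    u = list(u0)
    n = len(u)
    r = alpha * dt / dx**2
    for _ in range(steps):
        new_u = u[:]
        for i in range(1, n - 1):
            new_u[i] = u[i] + r * (u[i + 1] - 2 * u[i] + u[i - 1])
        u = new_u
    return np.array(u)


def solve_heat_vectorized(u0, alpha, dx, dt, steps):
    """Stronger baseline: the same scheme, vectorized with NumPy."""
    u = np.array(u0, dtype=float)
    r = alpha * dt / dx**2
    for _ in range(steps):
        u[1:-1] = u[1:-1] + r * (u[2:] - 2 * u[1:-1] + u[:-2])
    return u


if __name__ == "__main__":
    n, steps = 2000, 500
    x = np.linspace(0.0, 1.0, n)
    u0 = np.sin(np.pi * x)
    alpha, dx, dt = 0.01, x[1] - x[0], 1e-5

    t0 = time.perf_counter()
    solve_heat_naive(u0, alpha, dx, dt, steps)
    t_naive = time.perf_counter() - t0

    t0 = time.perf_counter()
    solve_heat_vectorized(u0, alpha, dx, dt, steps)
    t_vec = time.perf_counter() - t0

    # Pretend a learned surrogate takes a fixed 5 ms per solve (made up).
    t_surrogate = 0.005

    print(f"speedup vs weak baseline:   {t_naive / t_surrogate:6.1f}x")
    print(f"speedup vs strong baseline: {t_vec / t_surrogate:6.1f}x")
```

The surrogate's runtime never changes between the two printed lines; only the baseline does, which is the kind of unfair comparison McGreivy describes.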
And in summary:
"I expect AI to be much more a normal tool of incremental, uneven scientific progress than a revolutionary one.”
And so the discussion about what is hype and what is reality in AI continues…