To Reason or Not to Reason

Large reasoning models (LRMs) are all the rage these days. LRMs are LLMs that have been fine-tuned with incentives for step-by-step argumentation and self-verification. Every frontier model now ships with at least one “reasoning” mode, and the claims by AI companies are extraordinary: “[…] with some even claiming they are capable of generalized reasoning and innovation in reasoning-intensive fields such as mathematics, physics, medicine, and law.” A new paper examined these claims, and, as is so often the case, the results are mixed:

We find that the performance of LRMs drops abruptly at sufficient complexity and does not generalize. […] We find the majority of real-world examples fall inside the LRMs' success regime, yet the long tails expose substantial failure potential.

Reasoning Models Reason Well, Until They Don’t

Pascal Finette @radical