This (long) post is well worth reading in full – Arvind Narayanan, Benedikt Ströbl, and Sayash Kapoor do an excellent job of drilling into the current challenges with scaling AI models, why we shouldn't take industry insiders at their word when they claim to have AI scaling figured out, and why we still lag years behind in making use of the capabilities today's models already provide.
The furious debate about whether there is a capability slowdown is ironic, because the link between capability increases and the real-world usefulness of AI is extremely weak. The development of AI-based applications lags far behind the increase of AI capabilities, so even existing AI capabilities remain greatly underutilized. One reason is the capability-reliability gap --- even when a certain capability exists, it may not work reliably enough that you can take the human out of the loop and actually automate the task (imagine a food delivery app that only works 80% of the time). And the methods for improving reliability are often application-dependent and distinct from methods for improving capability. That said, reasoning models also seem to exhibit reliability improvements, which is exciting.
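That "80% of the time" figure is worth sitting with. A quick sketch (my own hypothetical numbers, not from the article) shows why per-task reliability at that level rules out taking the human out of the loop: the chance that *every* order in a batch is handled correctly collapses fast as volume grows.

```python
# Hypothetical illustration of the capability-reliability gap:
# a service whose AI handles each order correctly with probability p,
# and what happens when you need a run of orders to all go right.

def p_all_succeed(per_task_reliability: float, n_tasks: int) -> float:
    """Probability that every one of n independent tasks succeeds."""
    return per_task_reliability ** n_tasks

# A "food delivery app" that works 80% of the time per order:
for n in (1, 5, 10, 20):
    print(f"{n:>2} orders all handled correctly: {p_all_succeed(0.8, n):.1%}")
```

With 80% per-order reliability, ten consecutive orders all succeed only about 11% of the time, which is why the authors argue that improving reliability is its own (often application-specific) problem, distinct from raw capability.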