On one hand, we have OpenAI announcing HealthBench, a physician-designed benchmark that rigorously evaluates AI models on 5,000 realistic health conversations across diverse scenarios, paving the way for Doc ChatGPT to become a reality. On the other hand, you have LLM skeptic Gary Marcus trying something as trivial as asking ChatGPT to draw a map, to hilarious effect:
It was very good at giving me bullshit, dog-ate-my-homework excuses, offering me a bar graph after I asked for a map, falsely claiming that it didn’t know how to make maps. A minute later, as I turned to a different question, I discovered that it turns out ChatGPT does know how to draw maps. Just not very well.
After quite the saga (it’s worth reading Gary’s article in full), he concludes:
How are you supposed to do data analysis with 'intelligent' software that can’t nail something so basic? Surely this is not what we always meant by AGI.