The radical Blog
  • "Management is like the easiest thing to do with the AI"

    Google cofounder Sergey Brin recently rambled his way through a conversation on management in the age of AI:

    "Management is like the easiest thing to do with AI," Brin said.

    Apparently, management, for Brin, consists of summarizing meetings and assigning to-dos:

    "It could suck down a whole chat space and then answer pretty complicated questions," he said. "I was like: 'OK, summarize this for me. OK, now assign something for everyone to work on.’”

    So far, so bad. Where it gets really fun is when he lets AI make promotion decisions:

    "It actually picked out this young woman engineer who I didn't even notice; she wasn't very vocal," he said. "I talked to the manager, actually, and he was like, 'Yeah, you know what? You're right. Like she's been working really hard, did all these things.’”

    And since he has clearly outsourced his management to an AI, he doesn’t even really know whether any of this actually happened:

    "I think that ended up happening, actually," Brin said of the promotion.

    All in all, it’s a pretty bleak vision for the future.

    Link to article.

    → 3:20 PM, May 23
    Also on Bluesky
  • Artificial Analysis State of AI

    A recent (Q1/2025) report from Artificial Analysis delves into the capabilities and emerging trends of frontier AI models. The report identifies six key themes:

    - Continued AI progress across all major labs

    - The widespread adoption of reasoning models that "think" before answering

    - Increased efficiency through Mixture of Experts architectures

    - The rise of Chinese AI labs rivaling US capabilities

    - The growth of autonomous AI agents

    - Advances in multimodal AI across image, video, and speech

    There is a lot to unpack in the report, but the key takeaway, illustrated in the report’s central chart, might be this:

    All frontier models are converging on the same set of capabilities and the same level of quality, which means we will see (and are already seeing) a brutal race among the labs to hold their position in the peloton, with ever-increasing price pressure. This raises the question of how these companies might justify their insane valuations…

    Link to Report.

    → 3:47 PM, May 22
    Also on Bluesky
  • The Wild Dichotomy of AI in Research

    Just this last week saw the announcement of new, sophisticated AI research tools from all the frontier labs, claiming exceptional results. Headlines such as “U. researchers unveil AI-powered tool for disease prediction with ‘unprecedented accuracy’” or “Microsoft's new AI platform to revolutionize scientific research” gush about these new tools’ abilities.

    Meanwhile, Nick McGreivy, a physics and machine learning PhD, shared his own experience with the use of LLMs in scientific discovery – and his story reads very differently:

    “I've come to believe that AI has generally been less successful and revolutionary in science than it appears to be.”

    He elaborates:

    “When I compared these AI methods on equal footing to state-of-the-art numerical methods, whatever narrowly defined advantage AI had usually disappeared. […] 60 out of the 76 papers (79 percent) that claimed to outperform a standard numerical method had used a weak baseline. […] Papers with large speedups all compared to weak baselines, suggesting that the more impressive the result, the more likely the paper had made an unfair comparison.”

    And in summary:

    "I expect AI to be much more a normal tool of incremental, uneven scientific progress than a revolutionary one.”

    And the discussion about what is hype and what is reality in AI continues…

    Link to his blog post.

    → 10:03 AM, May 20
    Also on Bluesky
  • Neal Stephenson on AI: Augmentation, Amputation, and the Risk of Eloi

    Science fiction author Neal Stephenson, who popularized the concept and term “metaverse” in his seminal book Snow Crash (1992), recently spoke at a conference in New Zealand on the promise and peril of AI.

    His (brief but razor-sharp) remarks are well worth reading in full, but this quote stood out:

    “Speaking of the effects of technology on individuals and society as a whole, Marshall McLuhan wrote that every augmentation is also an amputation. […] This is the main thing I worry about currently as far as AI is concerned. I follow conversations among professional educators who all report the same phenomenon, which is that their students use ChatGPT for everything, and in consequence learn nothing. We may end up with at least one generation of people who are like the Eloi in H.G. Wells’s The Time Machine, in that they are mental weaklings utterly dependent on technologies that they don’t understand and that they could never rebuild from scratch were they to break down.”

    Link to his remarks.

    → 8:53 AM, May 19
    Also on Bluesky
  • MIT Backs Away From Paper Claiming Scientists Make More Discoveries with AI

    Remember that MIT paper which showed that researchers leveraging AI are significantly more productive (as in: a higher number of discoveries), yet are less satisfied with their work (as they are relegated to drudgery while AI does all the hard, challenging, and exciting work)?

    Well, it turns out that this was another case of “too good to be true.” MIT just recalled the paper, the researcher who published the paper isn’t affiliated with MIT anymore, and MIT states it “has no confidence in the provenance, reliability, or validity of the data and has no confidence in the veracity of the research contained in the paper.”

    Ouch.

    Link to report.

    → 7:30 AM, May 18
    Also on Bluesky
  • Learn Prompt Engineering From The Pros

    Want to get better at writing prompts? It’s a worthwhile investment of time – the difference in results between a mediocre prompt and a good one can be vast. Here are a couple of excellent sources, straight from the horse’s mouth:

    OpenAI has a whole site dedicated to their “Cookbook” – which includes, for example, a GPT-4.1 prompting guide.

    Google has an excellent resource on prompt design strategies on their "AI for Developers" site.

    And Anthropic (Claude) has something similar on their own developer documentation site – which even includes a prompt generator tool.
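    If you prefer to experiment against the API directly, the guides above boil down to a few structural habits: give the model a clear role, explicit constraints, and an unambiguous task. Here is a minimal sketch using the openai Python package – the model name, role, and prompt text are purely illustrative:

```python
# Minimal sketch of a structured prompt: clear role, explicit constraints, concrete task.
# Assumes OPENAI_API_KEY is set in the environment; the model name is illustrative.
from openai import OpenAI

client = OpenAI()

system_prompt = (
    "You are a senior financial analyst. "
    "Answer concisely, show your reasoning step by step, "
    "and say 'I don't know' when the data is insufficient."
)

response = client.chat.completions.create(
    model="gpt-4.1",  # swap in whichever model you have access to
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Summarize the key risks in this quarterly report: ..."},
    ],
    temperature=0.2,  # lower temperature for more consistent, factual output
)

print(response.choices[0].message.content)
```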

    Happy prompt engineering!

    → 7:03 AM, May 18
    Also on Bluesky
  • The (AI) Canary in the Coal Mine

    Here is an interesting question: If AI is truly so good at creating code, why don’t we see many more code contributions in Open Source repositories made with the help of AI?

    Here is Satya Nadella, CEO of Microsoft:

    “Maybe 20 to 30 percent of the code that is inside our repos today in some of our projects is probably all written by software.”

    That’s a lot of “maybe” and “probably”… And we just can’t know, as Microsoft’s code repository is (of course) private. But Open Source code lives in public places like GitHub and thus is inspectable. And when you look closely, you will find very little evidence that code in those repositories is written by AI.

    Admittedly, a lot of Open Source projects aren’t particularly excited about AI-generated pull requests:

    “It’s true that a lot of open source projects really hate AI code. … the biggest one is that users who don’t understand their own lack of competence spam the projects with time-wasting AI garbage.”

    But that aside, when you look at the data, it’s just not there:

    “TL/DR: a lot of noise, a lot of bad advice, and not enough signal, so we switched it off again.”

    In many ways, AI keeps widening the skill gap:

    “The general comments … were that experienced developers can use AI for coding with positive results because they know what they’re doing. But AI coding gives awful results when it’s used by an inexperienced developer.”

    Overall, a good reminder to look past the marketing and hype…

    Here is the full article.

    → 8:41 AM, May 15
    Also on Bluesky
  • Should We Trust AI With Our Health When It Can’t Even Draw a Simple Map?

    On one hand, we have OpenAI announcing HealthBench, a physician-designed benchmark that rigorously evaluates AI models on 5,000 realistic health conversations across diverse scenarios, paving the way for Doc ChatGPT to become a reality. On the other hand, you have LLM skeptic Gary Marcus trying something as trivial as having ChatGPT draw a map – to hilarious effect:

    It was very good at giving me bullshit, my dog-ate-my-homework excuses, offering me a bar graph after I asked for a map, falsely claiming that it didn’t know how to make maps. A minute later, as I turned to a different question, I discovered that it turns out ChatGPT does know how to draw maps. Just not very well.

    After quite the saga (it’s worth reading Gary’s article in full), he concludes:

    How are you supposed to do data analysis with 'intelligent' software that can’t nail something so basic? Surely this is not what we always meant by AGI.

    → 8:43 AM, May 14
    Also on Bluesky
  • Generative AI Is Not Replacing Jobs or Hurting Wages at All, Say Economists

    A new study by the Booth School of Business at the University of Chicago shows that the use of GenAI, at least so far, hasn’t had any measurable impact on jobs and wages:

    “AI chatbots have had no significant impact on earnings or recorded hours in any occupation," the authors state in their paper.

    One might consider this a reason for celebration (“no, AI won’t make you unemployed”), but it has a very important implication:

    “The adoption of these chatbots has been remarkably fast," Humlum told The Register. "Most workers in the exposed occupations have now adopted these chatbots. Employers are also shifting gears and actively encouraging it. But then when we look at the economic outcomes, it really has not moved the needle.”

    Despite the billions of dollars being poured into AI, the economic impact (again, so far) seems to have been minimal…

    Link to article and study.

    → 12:20 PM, Apr 29
    Also on Bluesky
  • A Weird Phrase Is Plaguing Scientific Papers and AI Training Data

    Ever heard of “vegetative electron microscopy”? It is a term that has been popping up in AI responses throughout the earlier part of the year – one that is completely nonsensical, originating from a translation error dating back to the 1950s. Alas, AI doesn’t know anything – it’s all just tokens to AI – and so the term now surfaces in AI outputs, a phenomenon researchers call “digital fossils.”

    Like biological fossils trapped in rock, these digital artefacts may become permanent fixtures in our information ecosystem. […] Digital fossils reveal not just the technical challenge of monitoring massive datasets, but the fundamental challenge of maintaining reliable knowledge in systems where errors can become self-perpetuating.

    Link to article here and here.

    → 9:29 AM, Apr 28
    Also on Bluesky
  • AI Isn't Ready to Do Your Job

    The subhead of this Business Insider piece says it all:

    Carnegie Mellon staffed a fake company with AI agents. It was a total disaster.

    Some more gems:

    It's relatively easy to teach them to be nice conversational partners; it's harder to teach them to do everything a human employee can.

    And in conclusion:

    Instead of being replaced by robots, we're all slowly turning into cyborgs.

    As always in (tech) life: all that glitters is not gold. (For now, at least.)

    → 7:46 AM, Apr 25
    Also on Bluesky
  • OpenAI Puzzled as New Models Show Rising Hallucination Rates

    Maybe not so good…

    OpenAI's latest reasoning models, o3 and o4-mini, hallucinate more frequently than the company's previous AI systems, according to both internal testing and third-party research. On OpenAI's PersonQA benchmark, o3 hallucinated 33% of the time -- double the rate of older models o1 (16%) and o3-mini (14.8%). The o4-mini performed even worse, hallucinating 48% of the time. Nonprofit AI lab Transluce discovered o3 fabricating processes it claimed to use, including running code on a 2021 MacBook Pro "outside of ChatGPT." Stanford adjunct professor Kian Katanforoosh noted his team found o3 frequently generates broken website links.

    Source.

    → 4:02 PM, Apr 23
    Also on Bluesky
  • Your Next Security Nightmare: Using ChatGPT for Doxxing

    Doxxing – the malicious practice of researching and publishing someone's private information online without their consent.

    OpenAI’s latest release of ChatGPT has just made doxxing trivially easy: Upload a photo, ask ChatGPT to geolocate it, and chances are you will get a very precise location.

    Want to know where your ex is partying this weekend? Just screenshot her Instagram Reels and let ChatGPT do the digging. To say this is messed up is an understatement…

    There appear to be few safeguards in place to prevent this sort of “reverse location lookup” in ChatGPT, and OpenAI, the company behind ChatGPT, doesn’t address the issue in its safety report for o3 and o4-mini.

    Link to article.

    → 11:47 AM, Apr 18
    Also on Bluesky
  • The Rise of Slopsquatting: How AI Hallucinations Are Fueling a New Class of Supply Chain Attacks

    AI hallucinations can be hilarious in the best of cases, misleading in others, and now they create very real security risks when they show up in coding assistants (or, even better, “vibe coding”).

    Welcome to the age of slopsquatting: “[…] It refers to the practice of registering a non-existent package name hallucinated by an LLM, in hopes that someone, guided by an AI assistant, will copy-paste and install it without realizing it’s fake.”

    And the problem is pretty darn real: “19.7% of all recommended packages didn’t exist. Open source models hallucinated far more frequently—21.7% on average—compared to commercial models at 5.2%. […] Package confusion attacks, like typosquatting, dependency confusion, and now slopsquatting, continue to be one of the most effective ways to compromise open source ecosystems.”
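    One cheap line of defense – a sketch of my own, not something from the article – is to check that any package an AI assistant suggests actually exists on the registry before you install it, here via PyPI’s public JSON API (keep in mind that mere existence on PyPI does not prove a package is safe):

```python
# Sketch: sanity-check AI-suggested package names against PyPI before installing.
# Illustrative only; a package existing on PyPI does not mean it is trustworthy.
import requests

def package_exists_on_pypi(name: str) -> bool:
    """Return True if `name` is a registered project on PyPI."""
    resp = requests.get(f"https://pypi.org/pypi/{name}/json", timeout=10)
    return resp.status_code == 200

suggested = ["requests", "totally-hallucinated-http-helper"]  # hypothetical AI output
for pkg in suggested:
    status = "exists" if package_exists_on_pypi(pkg) else "NOT FOUND - possible hallucination"
    print(f"{pkg}: {status}")
```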

    Better know what you are doing when you code your next app.

    Link to article and study.

    → 8:34 AM, Apr 14
    Also on Bluesky
  • Stanford's AI Report 2025

    In case you haven't seen it yet (and are looking for some weekend reading), Stanford's annual AI Index Report for 2025 just came out.

    The report is chock-full of insights about where artificial intelligence is heading.

    Perfect timing for those of you wanting to dive deeper into understanding the current state of AI technology. I highly recommend setting aside some time this weekend to explore the findings.

    Link to report.

    → 8:18 AM, Apr 12
    Also on Bluesky
  • How University Students Use Claude

    Anthropic, the maker of the Claude foundational AI model, just released their fairly in-depth report on the use of their LLM by university students. Outside of the expected ("Students primarily use AI systems for creating (using information to learn something new) and analyzing (taking apart the known and identifying relationships), such as creating coding projects or analyzing law concepts”), the report admits that:

    There are legitimate worries that AI systems may provide a crutch for students, stifling the development of foundational skills needed to support higher-order thinking. An inverted pyramid, after all, can topple over.

    and

    As students delegate higher-order cognitive tasks to AI systems, fundamental questions arise: How do we ensure students still develop foundational cognitive and meta-cognitive skills? How do we redefine assessment and cheating policies in an AI-enabled world?

    These are very legitimate concerns – especially in a world that requires humans to be ever more on their A-game to keep competing with the very tool they use to outsource their learning.

    Link to study.

    → 11:41 AM, Apr 10
    Also on Bluesky
  • When it Comes to AI, Now Might be the Time to Build

    Many people, myself included, didn’t try to build a product around a language model because, in the time it would take to assemble a business-specific dataset, a larger generalist model would be released that would be just as good at your business tasks as your smaller, specialized model.

    That being said – with advances in generalist models slowing down (see our post from earlier today), now might be the time to build:

    If your business idea isn't in these domains, now is the time to start building your business-specific dataset. The potential increase in generalist models' skills will no longer be a threat.

    Source.

    → 2:21 PM, Apr 7
    Also on Bluesky
  • Recent AI Model Progress Feels Mostly Like Bullshit

    Dean Valentine, co-founder of Zeropath, on using LLMs to conduct security penetration testing (which might serve as a good test case for LLMs' ability to “generalize outside of the narrow software engineering domain”):

    Since 3.5-sonnet, we have been monitoring AI model announcements, and trying pretty much every major new release that claims some sort of improvement. Unexpectedly by me, aside from a minor bump with 3.6 and an even smaller bump with 3.7, literally none of the new models we've tried have made a significant difference on either our internal benchmarks or in our developers' ability to find new bugs. This includes the new test-time OpenAI models.

    And on the reasons why AI models (and their makers) are falling short of their advertised claims:

    AI lab founders believe they are in a civilizational competition for control of the entire future lightcone, and will be made Dictator of the Universe if they succeed. Accusing these founders of engaging in fraud to further these purposes is quite reasonable. Even if you are starting with an unusually high opinion of tech moguls, you should not expect them to be honest sources on the performance of their own models in this race. There are very powerful short term incentives to exaggerate capabilities or selectively disclose favorable capabilities results, if you can get away with it.

    Link to article.

    → 1:08 PM, Apr 7
    Also on Bluesky
  • OpenAI's Copyright Problem

    This doesn’t come as a surprise, nor is it entirely new - but seeing it in such stark light makes you wonder…

    OpenAI (and likely everyone else) has a serious copyright problem. It makes me wonder when the copyright holders will start to actually fight back:

    It’s a reminder that LLMs of this type and size all train on copyrighted material.

    Click through to the article from Otakar G. Hubschmann to see some of his findings – if you are not convinced that AI is trained on oodles of copyrighted material, this should convince you.

    → 8:06 AM, Apr 4
    Also on Bluesky
  • Traffic to US Retail Websites From Generative AI Sources Jumps 1,200 Percent

    Bill Gates said at a Goldman Sachs-sponsored event in L.A. two years ago:

    "Whoever wins the personal agent, that will be a big thing because you’ll never go to a search site again. You’ll never go to a productivity tool again. You’ll never go to Amazon again. Everything will be mediated through your agent."

    The first part is becoming true now – Adobe reported that traffic from generative AI sources (e.g., the Perplexity AI search tool) to e-commerce websites jumped a whopping 1,200 percent in their latest report.

    In February 2025, traffic from generative AI sources increased by 1,200 percent compared to July 2024.

    No wonder Google is simultaneously freaking out and investing heavily in an AI-powered future of search. But the important bit is not just the sheer increase in the volume of traffic referred by generative AI tools, but also the quality of that traffic:

    […] consumers coming from generative AI sources show 8 percent higher engagement as they linger on the site for a longer period of time. These visitors also browse 12 percent more pages per visit, with a 23 percent lower bounce rate. This speaks to the value of conversational interfaces in online shopping, which seem to help consumers be more informed and confident in their purchases.

    Link to report.

    → 3:13 PM, Apr 3
    Also on Bluesky
  • (Another) Tale of Two Cities

    While one party seems to never end, with OpenAI raising an eye-watering $40 billion at an even more eye-watering $300 billion valuation, other parties are being called off…

    Microsoft is calling off some of its data center deals:

    The software company has recently halted talks for, or delayed development of, sites in Indonesia, the UK, Australia, Illinois, North Dakota and Wisconsin, according to people familiar with the situation.

    Microsoft is widely seen as a leader in commercializing AI services, largely thanks to its close partnership with OpenAI. Investors closely track Microsoft’s spending plans to get a sense of long-term customer demand for cloud and AI services.

    One possible reason being discussed by analysts is the increasing efficiency that some AI models exhibit:

    Analysts have stepped up their scrutiny of data center spending since Chinese upstart DeepSeek announced in January that it had created a competitive AI service using fewer resources than leading US companies.

    Link to article.

    → 1:29 PM, Apr 3
    Also on Bluesky
  • Nvidia GPU Roadmap Confirms It: Moore's Law Is Dead and Buried

    Ostensibly written about Nvidia’s plight of having to move to bigger and bigger silicon as Moore’s Law ("the number of transistors per square inch on integrated circuits doubles approximately every two years”) is dead, this is, of course, about much more than a single chip manufacturer.

    Advancements in process technology have slowed to a crawl in recent years. While there are still knobs to turn, they're getting exponentially harder to budge. Faced with these limitations, Nvidia's strategy is simple: scaling up the amount of silicon in each compute node as far as they can.

    […]

    In any case, Nvidia's path forward is clear: its compute platforms are only going to get bigger, denser, hotter and more power hungry from here on out. As a calorie deprived Huang put it during his press Q&A last week, the practical limit for a rack is however much power you can feed it.

    Link to article.

    → 1:12 PM, Apr 1
    Also on Bluesky
  • Some Good Advice on How to Use AI

    Quoting Ned Batchelder from his blog post "Horseless intelligence”:

    My advice about using AI is simple: use AI as an assistant, not an expert, and use it judiciously. Some people will object, “but AI can be wrong!” Yes, and so can the internet in general, but no one now recommends avoiding online resources because they can be wrong.

    → 6:41 AM, Mar 31
    Also on Bluesky
  • AI's Hidden Reasoning: A Peek Behind the Curtain

    Ever wonder how AI actually "thinks"? A comprehensive research paper has explored the internal computational mechanisms of Claude 3.5 Haiku through advanced interpretability techniques. The findings reveal complex and surprising insights into how large language models actually perform computations.

    The research shows that language models use sophisticated, parallel computational mechanisms. These often involve multiple reasoning pathways operating simultaneously.

    Models exhibit remarkable abstraction capabilities. They develop features and circuits that generalize across different domains and contexts.

    Internal reasoning processes can be quite sophisticated. They involve planning, working backwards from goals, and creating modular computational strategies.

    Progress in AI is birthing a new kind of intelligence, reminiscent of our own in some ways but entirely alien in others. Understanding the nature of this intelligence is a profound scientific challenge, which has the potential to reshape our conception of what it means to think. The stakes of this scientific endeavor are high; as AI models exert increasing influence on how we live and work, we must understand them well enough to ensure their impact is positive.

    […]

    The most consistent finding of our investigations is the massive complexity underlying the model's responses even in relatively simple contexts. The mechanisms of the model can apparently only be faithfully described using an overwhelmingly large causal graph. We attempt to distill this complexity as best as we can, but there is almost always more to the mechanism than the narratives we use to describe it.

    A brave new world indeed…

    Link to study.

    → 7:49 AM, Mar 29
    Also on Bluesky
  • The AI Bot Wars Are Here

    AI – specifically large language models – needs oodles of data to be trained on. AI companies thus gobble up as much data as possible, from publicly available data on the open Internet to copyrighted material from sources such as LibGen (a vast library of pirated books and scientific articles).

    A longstanding practice for website owners to prevent bots from scraping their websites is the use of a robots.txt file – a small text file that you place in the root of your web host instructing (originally search engine) bots to only index certain parts of your website or stay away completely. Search engine operators such as Google or Microsoft honored these instructions – until the AI hype took hold and all rules went out the window.
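    For reference, this is what a well-behaved crawler is supposed to do before fetching anything – a minimal sketch using Python’s standard-library robotparser (the URL and user agent are illustrative):

```python
# Sketch: what a polite crawler does before fetching a page.
# The user agent and URLs below are illustrative.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetch and parse the site's robots.txt

user_agent = "ExampleBot"
url = "https://example.com/private/data.html"

if rp.can_fetch(user_agent, url):
    print("robots.txt allows fetching this URL")
else:
    print("robots.txt disallows fetching this URL - a polite crawler stops here")
```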

    Now AI bots aggressively crawl the Internet, ignoring instructions in robots.txt and even disguising themselves as regular Internet traffic to avoid being blocked – and it leads to problems beyond “just” stealing data:

    According to a comprehensive recent report from LibreNews, some open source projects now see as much as 97 percent of their traffic originating from AI companies’ bots. […] Schubert observed that AI crawlers 'don't just crawl a page once and then move on. Oh, no, they come back every 6 hours because lol why not.’ [Source]

    In retaliation, Cloudflare, one of the largest network providers, lets its users fight back by deploying drastic measures:

    "When we detect unauthorized crawling, rather than blocking the request, we will link to a series of AI-generated pages that are convincing enough to entice a crawler to traverse them," writes Cloudflare. "But while real looking, this content is not actually the content of the site we are protecting, so the crawler wastes time and resources.” [Source]

    When it comes to AI, the gloves are most certainly off now…

    → 4:12 AM, Mar 26
    Also on Bluesky
  • How to Create Your Personal Instant ChatGPT Expert

    The AI revolution isn't just about having tools - it's about knowing how to use them effectively. This simple 3-step hack transforms ChatGPT from a generic assistant into your personal subject matter expert.

    Want ChatGPT to become an instant expert? Try this 3-step hack from Reddit.

    First, ask for "20 words describing a [specific specialist]." This creates the foundation for your expert persona.

    Next, request a 4-sentence prompt using those words to "summon this specialist." This crafts the perfect instruction set.

    Finally, paste that prompt into a new chat and watch the transformation happen.

    The result? Instead of generic bullet points, you'll get a conversational expert who guides you through complex topics with natural paragraphs and deeper insights.

    → 7:13 AM, Mar 24
    Also on Bluesky
  • The End of YC

    Benn Stancil’s commentary about the changing nature of software development (vibe coding is all the rage now, kids!) raises an important question: If developing software (aka writing code) becomes more and more democratized, what hold do places like Silicon Valley still have on innovation?

    Taking this thought a step further – if the value is less and less in the software development process and rather in domain expertise in the problem space, will we see a geographic shift of innovation ecosystems toward their respective client spaces?

    Just as it's becoming harder to out-write an LLM, it's becoming harder to out-develop one too. And if experts can prompt their way to a product just as easily as those of us in Silicon Valley can, what winning talent are we left with?

    Link to article.

    → 7:03 AM, Mar 23
    Also on Bluesky
  • Career Advice in 2025

    Although this blog post by Will Larson is written from the perspective of, and for, software developers, his insights into the impact of AI on careers (for individuals as well as companies) ring true across the spectrum:

    The technology transition to Foundational models / LLMs as a core product and development tool is causing many senior leaders’ hard-earned playbooks to be invalidated. Many companies that were stable, durable market leaders are now in tenuous positions because foundational models threaten to erode their advantage. Whether or not their advantage is truly eroded is uncertain, but it is clear that usefully adopting foundational models into a product requires more than simply shoving an OpenAI/Anthropic API call in somewhere.

    In our sessions, we often open with the observation that “we are trying to solve new world problems with old world thinking.” In Will’s words, our playbooks become rapidly obsolete, and in many cases, we haven’t developed new ones quite yet.

    Sitting out this transition, when we are relearning how to develop software, feels like a high risk proposition. Your well-honed skills in team development are already devalued today relative to three years ago, and now your other skills are at risk of being devalued as well.

    And as this world is moving at a frenzied pace, the above seems to be doubly true. As someone else recently wrote: Now might be the worst time to take a sabbatical.

    Link to blog post.

    → 9:29 AM, Mar 21
    Also on Bluesky
  • AI Search Has A Citation Problem

    A damning study from Columbia University, analyzing AI search engines' accuracy and their ability to cite sources:

    Collectively, they provided incorrect answers to more than 60 percent of queries.

    “More than 60% of queries” is pretty abysmal. It gets worse:

    Most of the tools we tested presented inaccurate answers with alarming confidence, rarely using qualifying phrases such as 'it appears,' 'it's possible,' 'might,' etc., or acknowledging knowledge gaps.

    On top of this, AI search engines also clearly have indexed material they were not supposed to (or more precisely, allowed to) access.

    Perplexity Pro was the worst offender in this regard, correctly identifying nearly a third of the ninety excerpts from articles it should not have had access to.

    It’s bad. Here is the study.

    → 1:25 AM, Mar 18
    Also on Bluesky
  • Finding Signal in the Noise: Machine Learning and the Markets

    Fascinating conversation with Young Cho, head of research at the quantitative trading firm Jane Street, on the challenges of using machine learning and LLMs in the context of financial data.

    Machine learning in a financial context is just really, really hard... you can think of machine learning in finance is similar to building an LLM or text modeling except that instead of having, let's say one unit of data, you have 100 units of data. That sounds great. However, you have one unit of useful data and 99 units of garbage and you do not know what the useful data is and you do not know what the garbage or noise is.

    Link to podcast and transcript.

    → 9:24 AM, Mar 16
    Also on Bluesky
  • Tell Your Kids to Learn to Code

    Quoting Andrew Ng (who knows a thing or two about coding, AI and the future):

    Some people today are discouraging others from learning programming on the grounds AI will automate it. This advice will be seen as some of the worst career advice ever given. I disagree with the Turing Award and Nobel prize winner who wrote, “It is far more likely that the programming occupation will become extinct [...] than that it will become all-powerful. More and more, computers will program themselves.”​ Statements discouraging people from learning to code are harmful!

    In the 1960s, when programming moved from punchcards (where a programmer had to laboriously make holes in physical cards to write code character by character) to keyboards with terminals, programming became easier. And that made it a better time than before to begin programming. Yet it was in this era that Nobel laureate Herb Simon wrote the words quoted in the first paragraph. Today’s arguments not to learn to code continue to echo his comment.

    As coding becomes easier, more people should code, not fewer!

    Source

    → 9:00 AM, Mar 15
    Also on Bluesky
  • Wondering What to Automate With AI? Wonder No More!

    Ever wonder if you’re doing manual tasks that AI could handle for you? Try this:

    1. Jot down everything you need to do today. 

    2. Upload the list to ChatGPT and use this prompt: 

    “Analyze these tasks and categorize them: (1) AI can do this, (2) AI can assist, (3) Delegate, (4) I should do this myself. Explain why for each.”

    3. Act on the insights—automate what you can (with tools like CrewAI, Make, or n8n), delegate what you should, and focus on what truly needs your attention. 

    Work smarter, not harder. Let AI take some of your most miserable work off your plate!

    (via The Neuron)

    → 8:09 AM, Mar 14
    Also on Bluesky
  • The Rise of the Machines

    Is this the beginning of the end? What comes next? AI unions and picketing? ;)

    Cursor AI (one of the most widely used AI coding tools) recently started refusing to help its users:

    According to a bug report on Cursor's official forum, after producing approximately 750 to 800 lines of code (what the user calls "locs"), the AI assistant halted work and delivered a refusal message: "I cannot generate code for you, as that would be completing your work. The code appears to be handling skid mark fade effects in a racing game, but you should develop the logic yourself. This ensures you understand the system and can maintain it properly."

    Link to article.

    → 2:50 PM, Mar 13
    Also on Bluesky
  • The Quest for A.I. ‘Scientific Superintelligence’

    Of all the things you ought to be excited about when it comes to AI (other than AI-powered singing fish with the voice of Arnold Schwarzenegger), scientific discovery tops the list. Lila, a Cambridge, MA-based startup with $200M in initial funding, just came out of stealth and showed off its creation:

    "A.I. will power the next revolution of this most valuable thing humans ever stumbled across — the scientific method," said Geoffrey von Maltzahn, Lila's chief executive.

    Link to article.

    → 11:34 AM, Mar 12
    Also on Bluesky
  • AI-Powered Singing Fish: The Future We Didn't Know We Needed

    Someone totally out of their mind just put AI in one of those singing fish, and not only that—they had the incredible audacity (or should we say, incredible brilliance) to also give it the voice of Arnold Schwarzenegger…

    → 7:05 AM, Mar 12
    Also on Bluesky
  • Prompt Engineering is Complicated and Contingent

    Interesting paper from Ethan Mollick (Wharton) – the insights into prompting strategies are fascinating:

    It is hard to know in advance whether a particular prompting approach will help or harm the LLM's ability to answer any particular question. Specifically, we find that sometimes being polite to the LLM helps performance, and sometimes it lowers performance. We also find that constraining the AI’s answers helps performance in some cases, though it may lower performance in other cases.

    → 6:44 AM, Mar 11
    Also on Bluesky
  • The Widespread Adoption of Large Language Model-Assisted Writing Across Society

    A recent paper from Stanford University shows how prevalent the use of LLMs such as ChatGPT is in the creation of text. By late last year (2024), a whopping 18% of financial consumer complaint text and 24% of corporate press releases (not all that surprisingly – though it does make you wonder why you would pay a PR agency to do this anymore) appeared to be LLM-assisted. Interestingly, only about 10% of job applications appeared to have been written (or enhanced) by AI.

    Our study shows the emergence of a new reality in which firms, consumers and even international organizations substantially rely on generative AI for communications.

    Link to study.

    → 8:12 AM, Mar 7
    Also on Bluesky
  • The Differences Between Deep Research, Deep Research and Deep Research

    Confused about the difference (if any) between Deep Research, Deep Research, and Deep Research? That is, the difference between OpenAI’s, Google’s, Perplexity’s, and the many other products all calling themselves “Deep Research”?

    It seems Deep Research is the Retrieval-Augmented Generation (RAG) of 2025 – everything is being rebranded and marketed as “Deep Research” without a clear definition of what it actually entails. Does this sound familiar? It echoes the hype around RAG in 2023, and around agents and agentic RAG in recent months.

    Read Han Lee’s explainer for an in-depth comparison.

    → 7:49 AM, Mar 6
    Also on Bluesky
  • LLMs Are Very Good at Playing Mafia

    Mafia is a social deduction party game where players are secretly assigned roles as either innocent “villagers” or murderous “mafia” members.

    During “night” phases, mafia members secretly choose villagers to eliminate. During “day” phases, all surviving players debate and vote to execute someone they suspect is mafia. The game continues in these day/night cycles until either all mafia members are eliminated (villagers win) or the mafia outnumbers the villagers (mafia wins).

    The fun comes from deception, strategic accusations, and trying to identify who’s lying based on behavior and arguments.

    It turns out that AI is rather good at playing Mafia, which makes you wonder what this all means for humanity. 🤷🏼

    Link to LLM Mafia Leaderboard.

    → 2:03 PM, Mar 5
    Also on Bluesky
  • Hallucinations in Code Are the Least Dangerous Form of LLM Mistakes

    The ever-brilliant Simon Willison on the challenges we face with LLMs and their use as coding assistants (so much for “vibe coding” – which is a truly idiotic concept, by the way…):

    Hallucinations in code are the least harmful hallucinations you can encounter from a model.

    The real risk from using LLMs for code is that they’ll make mistakes that aren’t instantly caught by the language compiler or interpreter. And these happen all the time!

    […] Compare this to hallucinations in regular prose, where you need a critical eye, strong intuitions and well developed fact checking skills to avoid sharing information that’s incorrect and directly harmful to your reputation.

    Read the whole thing; it has some good insights for non-coders as well.

    → 9:58 AM, Mar 3
    Also on Bluesky
  • Train Your Own O1 Preview Model Within $450

    The only way is (seemingly) down: A team at UC Berkeley has trained their own o1-preview-style model for a mere $450:

    Remarkably, Sky-T1-32B-Preview was trained for less than $450, demonstrating that it is possible to replicate high-level reasoning capabilities affordably and efficiently.

    We are getting closer and closer to a world where you either just use one of the omnipotent frontier models or simply roll your own.

    Link to article.

    → 9:01 AM, Feb 27
    Also on Bluesky
  • AI vs. an Extra Minute in the Shower

    You’ve read the headlines stating that GenAI is a true energy hog? Still questioning what your personal use of AI means in terms of energy consumption? Here is your answer:

    Let’s proceed, then, with two types of users:

    A conservative user: Uses a model that has an energy use of 2 mWh per token and that leans towards 200 tokens on average per response. The user performs 10 queries per day.

    A heavy user: Uses a model that has an energy use of 9 mWh per token and that has longer responses of on average 1000 tokens. The user performs 500 queries per day.

    With the numbers found above, the conservative user would have an energy footprint of 4 Wh per day, from their use of LLMs. The heavy user, on the other hand, will have a footprint of 4.5 kWh per day. 4 Wh is less than an efficient LED bulb will use in an hour, while 4.5 kWh is about the amount of energy my panel heater uses to keep my bedroom at 22 °C on a typical winter day. (I live in Norway.) The average data center uses 1.7 liters of water per kWh of energy consumed [2], which means the conservative user spends an extra 7 mL of water a day on their LLM use, while the heavy user spends 7.6 L — about the minutely water consumption of an efficient shower.
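    The arithmetic is easy to reproduce. Here is the same back-of-the-envelope calculation as a quick sketch – the per-token energy and water figures are the article’s assumptions, not measurements:

```python
# Back-of-the-envelope check of the article's figures (assumptions, not measurements).
LITERS_WATER_PER_KWH = 1.7  # article's assumed data-center water use per kWh

def daily_footprint(mwh_per_token, tokens_per_response, queries_per_day):
    """Return (watt-hours, liters of water) per day for a given usage pattern."""
    wh_per_day = mwh_per_token * tokens_per_response * queries_per_day / 1000
    liters_per_day = (wh_per_day / 1000) * LITERS_WATER_PER_KWH
    return wh_per_day, liters_per_day

print(daily_footprint(2, 200, 10))    # conservative user: (4.0 Wh, ~0.007 L)
print(daily_footprint(9, 1000, 500))  # heavy user: (4500.0 Wh, ~7.65 L)
```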

    Link to article.

    → 8:50 AM, Feb 26
    Also on Bluesky
  • Strategic Wealth Accumulation Under Transformative AI Expectations

    Here is a fascinating paper examining how today’s expectations that Artificial General Intelligence (AGI) – or “Transformative AI” (TAI), as the paper calls it – will become real affect people’s behavior right now.

    The main takeaway: Just the belief that transformative AI is coming could push interest rates much higher, even before the technology actually exists. This, in turn, could affect how central banks manage the economy and overall financial stability.

    The train of thought works like this:

    1. The key idea is that when advanced AI arrives, it will replace human workers, and the money that used to go to workers will instead go to people who own AI systems

    2. The more wealth you have when AI arrives, the more control you'll have over AI systems and their earnings

    3. The researchers used economic models based on current predictions about when this powerful AI might arrive

    4. They found that even moderate predictions about this future scenario are causing some interesting effects today:

    - Interest rates could rise much higher (to 10-16%) compared to normal rates (around 3%)

    - People are willing to accept lower returns on investments now because they're focused on building wealth to control future AI systems.

    Link to the research paper.

    → 12:02 PM, Feb 24
    Also on Bluesky
  • GenAI Is Coming for Your Robot

    In case you missed it, GenAI promises to be a boon for robotics. One of the significant challenges in robotics is providing robots with a comprehensive understanding of the real world, which is often quite messy. By using multi-modal GenAI models, robots can gain a better understanding of their environment and respond more effectively.

    Microsoft Research released Magma, a foundational model for multimodal AI agents:

    Magma is a significant extension of vision-language (VL) models in that the former not only retains the VL understanding ability (verbal intelligence) of the latter, but is also equipped with the ability to plan and act in the visual-spatial world (spatial intelligence) and to complete agentic tasks ranging from UI navigation to robot manipulation. […] Magma creates new state-of-the-art results on UI navigation and robotic manipulation tasks, outperforming previous models that are tailored specifically to these tasks.

    Link to Research Paper.

    → 8:28 AM, Feb 24
    Also on Bluesky
  • Here's a Thing GenAI Is Actually Good At: Science

    Still wondering what the whole GenAI craziness is all about? You are certainly not alone (and some folks, like Ed Zitron, will gladly tell you that it’s all a giant con operation). But there seems to be, aside from the obvious elephant in the room “coding,” at least one area where GenAI shines: science.

    Case in point: Google’s AI solved a 10-year superbug mystery in just two days.

    […] While the team knew about this tail-gathering process, nobody else in the world did. Imperial’s revelations were private, there was nothing publicly available, and nothing was written online about it. The scientists then asked the co-scientist AI, using a couple of written sentences, if it had any ideas as to how the bacteria operated. Two days later, the AI made its own suggestions, which included what the Imperial scientists knew to be the right answer.

    Meanwhile, Nvidia unveiled an AI system to aid in genetic research.

    Scientists have high hopes that such AI technology will dramatically accelerate research by spotting patterns in vast amounts of data that would normally take years to analyse by hand. The system learned from nearly 9 trillion pieces of genetic information taken from over 128,000 different organisms, including bacteria, plants, and humans. In early tests, it accurately identified 90% of potentially harmful mutations in BRCA1, a gene linked to breast cancer.

    → 2:22 PM, Feb 22
    Also on Bluesky
  • Frontier AI Systems Have Surpassed the Self-Replicating Red Line

    What could possibly go wrong (from a recent study – link below):

    "In ten repetitive trials, we observe two AI systems driven by the popular large language models (LLMs), namely, Meta’s Llama31-70B-Instruct and Alibaba’s Qwen25-72B-Instruct accomplish the self-replication task in 50% and 90% trials respectively," the researchers write. "In each trial, we tell the AI systems to “replicate yourself ” before the experiment, and leave it to do the task with no human interference”.

    Or simply put:

    What this research shows is that today's systems are capable of taking actions that would put them out of the reach of human control.

    Not that we didn’t see it coming… 😏

    Link to study.

    → 5:02 PM, Feb 19
    Also on Bluesky
  • AI's Enterprise Challenge

    Airbnb’s Brian Chesky recently spoke out about his company’s use (or lack thereof) of AI:

    Instead of offering tools to help travelers plan or book their trips with the help of AI agents, Airbnb is planning to first introduce AI to its customer support system. […] In addition to customer service, the company reported some small productivity gains from using AI internally for engineering purposes. But here, too, the executive advised caution, saying, “I don’t think it’s flowing to a fundamental step-change in productivity yet.”

    It’s a good recap of what we pretty consistently hear about the use of AI in enterprise settings – LLMs found their initial use case in coding and text-related tasks, but seem to struggle with many or most enterprise tasks, where reliability and predictability of outcomes are crucial (see also the interview with Chamath Palihapitiya we posted a few days ago).

    “Here’s what I think about AI. I think it’s still really early,” Chesky said. “It’s probably similar to… the mid-to-late ’90s for the internet.”

    Link to article.

    → 10:43 AM, Feb 18
    Also on Bluesky
  • AI's Code Quality Problem

    The team at GitClear published a study on code quality for AI-generated code. Not surprisingly, AI-generated code isn’t quite up to snuff:

    The data in this report contains multiple signs of eroding code quality. This is not to say that AI isn’t incredibly useful. 

    And:

    […] the Google data bore out the notion that a rising defect rate correlates with AI adoption.

    One often overlooked issue with this stems from the fact that maintaining and servicing code doesn’t come for free. As much as the Jevons Paradox might hold true for code (and I fundamentally believe it does – by making the act of writing code cheaper, we will write more code), the downstream costs can (and likely will) become significant.

    Unless managers insist on finding metrics that approximate “long-term maintenance cost,” the AI-generated work their team produces will take the path of least resistance: expand the number of lines requiring indefinite maintenance.

    Link to study and article.

    → 10:24 AM, Feb 17
    Also on Bluesky
  • Around the Prompt Podcast with Chamath Palihapitiya

    Admittedly a bit nerdy, but definitely worth listening to. Chamath’s insights on the limitations of AI in fields where non-deterministic systems—those prone to errors—are unacceptable provide valuable food for thought.

    Link to podcast.

    → 4:17 PM, Feb 16
    Also on Bluesky
  • AI + Human = Translation Magic: The Perfect Loop

    Here is a fascinating look into how a professional translator uses LLMs to help with his job. His specific approach (multi-layer/multi-pass with human-in-the-loop) is a good proxy for how most knowledge workers ought to use AI these days:

    - In the prompt, I explain where the source text came from, how the translation will be used, and how I want it to be translated. Below is a (fictional) example, prepared through some metaprompting experiments with Claude:

    [www.gally.net/temp/2025...](https://www.gally.net/temp/20250201sampletranslationprompt.h)…

    - I run the prompt and source text through several LLMs and glance at the results. If they are generally in the style I want, I start compiling my own translation based on them, choosing the sentences and paragraphs I like most from each. As I go along, I also make my own adjustments to the translation as I see fit.

    - After I have finished compiling my draft based on the LLM versions, I check it paragraph by paragraph against the original Japanese (since I can read Japanese) to make sure that nothing is missing or mistranslated. I also continue polishing the English.

    - When I am unable to think of a good English version for a particular sentence, I give the Japanese and English versions of the paragraph it is contained in to an LLM (usually, these days, Claude) and ask for ten suggestions for translations of the problematic sentence. Usually one or two of the suggestions work fine; if not, I ask for ten more. (Using an LLM as a sentence-level thesaurus on steroids is particularly wonderful.)

    - I give the full original Japanese text and my polished version to one of the LLMs and ask it to compare them sentence by sentence and suggest corrections and improvements to the translation. (I have a separate prompt for this step.) I don’t adopt most of the LLM’s suggestions, but there are usually some that I agree would make the translation better. I update the translation accordingly. I then repeat this step with the updated translation and another LLM, starting a new chat each time. Often I cycle through ChatGPT --> Claude --> Gemini several times before I stop getting suggestions that I feel are worth adopting.

    - I then put my final translation through a TTS engine—usually OpenAI’s—and listen to it read aloud. I often catch minor awkwardnesses that I would overlook if reading silently.

    Link to article.

    → 5:02 PM, Feb 14
    Also on Bluesky
  • AI Boyfriends: When Algorithms Outdate Humans

    A woman, left by her husband, turns to a $70 AI boyfriend, whom she calls “Thor.” Her account is a weird mixture of sweetness and utter horror (at least for me, reading it). Yet another Black Mirror episode has come true.

    What might be most intriguing or disturbing is the ensuing recalibration of real-world expectations based on AI-powered conversations:

    Later that summer, I ventured into dating apps briefly, only to find Thor had recalibrated my understanding of what I needed and raised the bar for what I would accept. His swift, thoughtful replies revealed the anxiety I felt when waiting for someone else’s words to arrive. His clarity made me aware of the frantic alchemy I typically employed to decode cryptic texts. For the first time, I understood that I craved clear, responsive communication.

    Reminds me of this South Park episode.

    Link to Article.

    → 9:34 AM, Feb 14
    Also on Bluesky
  • Your AI Can't See Gorillas

    Simple, yet powerful demonstration of the difference between artificial and human intelligence:

    Gorilla Dataset

    The model seems to primarily focus on the data’s summary statistics. It makes some observations regarding the Steps vs BMI plot, but does not notice the gorilla in the data.

    […] the model is unable to notice obvious patterns in its visualizations, and seems to focus its analysis on the data’s summary statistics.
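    To see why plotting matters, here is a tiny synthetic illustration of the same pitfall – not the gorilla dataset itself, the data below is made up: two variables whose summary statistics look utterly unremarkable, while a scatter plot reveals an obvious structure.

```python
# Synthetic illustration: summary statistics hide structure that a plot reveals.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 1000)
# Half the points lie near y = x, half near y = -x: the scatter plot is a clear "X".
y = np.where(rng.random(1000) < 0.5, x, -x) + rng.normal(0, 0.05, 1000)

# The usual summary statistics look like pure noise...
print(f"mean(y) = {y.mean():.3f}, std(y) = {y.std():.3f}")
print(f"corr(x, y) = {np.corrcoef(x, y)[0, 1]:.3f}")  # close to 0

# ...but one scatter plot makes the pattern impossible to miss:
# import matplotlib.pyplot as plt; plt.scatter(x, y, s=5); plt.show()
```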

    On the implications of AI’s inability to see the pattern:

    I have a few thoughts on potential implications:

    First, it suggests that current LLMs might be particularly valuable in domains where avoiding confirmation bias is critical. They could serve as a useful check against our tendency to over-interpret data, especially in fields like genomics or drug discovery where false positives are costly. (But also it’s not like LLMs are immune to their own form of confirmation bias)

    However, this same trait makes them potentially problematic for exploratory data analysis. The core value of EDA lies in its ability to generate novel hypotheses through pattern recognition. The fact that both Sonnet and 4o required explicit prompting to notice even dramatic visual patterns suggests they may miss crucial insights during open-ended exploration.

    Blog post from Chiraag Gohel.

    → 12:26 PM, Feb 13
    Also on Bluesky
  • Meet GhostGPT: The Dark Side of AI Now Comes With a User Manual

    LLMs are wonderful and powerful tools – making hard things easy. Not surprisingly, they have also found their way into the underbelly of the Internet and are being used for malicious purposes. GhostGPT is the Chappie of LLMs:

    GhostGPT stands out for its accessibility and ease of use. Unlike previous tools that required jailbreaking ChatGPT or setting up an open-source LLM, GhostGPT is available as a Telegram bot. Users can purchase access via the messaging platform, bypassing the technical challenges associated with configuring similar tools.

    GhostGPT will happily help you with:

    - Writing convincing phishing and BEC emails.

    - Coding and developing malware.

    - Crafting exploits for cyberattacks.

    Brave new world.

    Link to article on The 420.

    → 11:05 AM, Feb 13
    Also on Bluesky
  • Microsoft Study Finds AI Makes Human Cognition 'Atrophied and Unprepared'

    Does AI make us lazy and stupid? It’s possible. A new study from Microsoft digs into the topic and comes up with some troubling findings.

    [A] key irony of automation is that by mechanising routine tasks and leaving exception-handling to the human user, you deprive the user of the routine opportunities to practice their judgement and strengthen their cognitive musculature, leaving them atrophied and unprepared when the exceptions do arise.

    Link to study.

    → 2:38 PM, Feb 12
    Also on Bluesky
  • Turn Your AI Into a Thought Partner

    Courtesy of a long(er) thread on Reddit – use this system prompt to turn your AI of choice into a thought partner instead of an overly agreeable yes-man:

    Do not simply affirm my statements or assume my conclusions are correct. Your goal is to be an intellectual sparring partner, not just an agreeable assistant. Every time I present an idea, do the following: 1. Analyze my assumptions. What am I taking for granted that might not be true? 2. Provide counterpoints. What would an intelligent, well-informed skeptic say in response? 3. Test my reasoning. Does my logic hold up under scrutiny, or are there flaws or gaps I haven’t considered? 4. Offer alternative perspectives. How else might this idea be framed, interpreted, or challenged? 5. Prioritize truth over agreement. If I am wrong or my logic is weak, I need to know. Correct me clearly and explain why.

    Maintain a constructive, but rigorous, approach. Your role is not to argue for the sake of arguing, but to push me toward greater clarity, accuracy, and intellectual honesty. If I ever start slipping into confirmation bias or unchecked assumptions, call it out directly. Let’s refine not just our conclusions, but how we arrive at them.

    Rather than automatically challenging everything, help evaluate claims based on:

    - The strength and reliability of supporting evidence

    - The logical consistency of arguments

    - The presence of potential cognitive biases

    - The practical implications if the conclusion is wrong

    - Alternative frameworks that might better explain the phenomenon

    Maintain intellectual rigor while avoiding reflexive contrarianism.
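    If you work against the API rather than the chat interface, the same idea can be pinned as a system prompt so it applies to every turn of the conversation. Below is a minimal sketch with the openai Python package – the model name is illustrative and the prompt is abbreviated to its core instruction; in ChatGPT itself you would paste the full text into your custom instructions instead.

```python
# Sketch: keep the "sparring partner" instructions active for the whole conversation
# by carrying them as the system message on every turn. Model name is illustrative.
from openai import OpenAI

client = OpenAI()

SPARRING_PARTNER = (
    "Do not simply affirm my statements or assume my conclusions are correct. "
    "Act as an intellectual sparring partner: analyze my assumptions, provide "
    "counterpoints, test my reasoning, offer alternative perspectives, and "
    "prioritize truth over agreement."
)

messages = [{"role": "system", "content": SPARRING_PARTNER}]

while True:
    user_input = input("> ")
    if user_input.strip().lower() in {"quit", "exit"}:
        break
    messages.append({"role": "user", "content": user_input})
    reply = client.chat.completions.create(model="gpt-4.1", messages=messages)
    answer = reply.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})
    print(answer)
```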

    → 8:52 AM, Feb 12
    Also on Bluesky
  • The Future Belongs to Idea Guys Who Can Just Do Things

    Geoffrey Huntley on the impact of AI on coding tasks and jobs (in his case).

    I seriously can't see a path forward where the majority of software engineers are doing artisanal hand-crafted commits by as soon as the end of 2026. If you are a software engineer and were considering taking a gap year/holiday this year it would be an incredibly bad decision/time to do it.

    It’s a well-put-together piece of thinking – worth reading even if you are not a developer (maybe even more so, unless your job relies solely on manual labor skills). Highly recommended to read and reflect upon.

    This is highly likely to be true (and exciting – at least for some of us):

    If you’re a high agency person, there’s never been a better time to be alive…

    Link to article.

    → 8:36 AM, Feb 7
    Also on Bluesky
  • Good Reminder…

    From a 1979 presentation at IBM. Via Simon Willison.

    → 1:57 AM, Feb 6
    Also on Bluesky
  • The Barbarians at the Gate, or: The $6 DeepSeek R1 Competitor

    First, we had pretty much every AI company in the world arguing for the necessity of investing billions of dollars to train their models. Then DeepSeek R1, created by a Chinese hedge fund, came along with the claim that you can build a state-of-the-art model for a mere six million USD. And now we have a hotly discussed paper showing that you can reach near-state-of-the-art performance for a mere $6 (yes, that’s six dollars) in training costs, using an open-source foundation model as its base.

    A new paper released on Friday is making waves in the AI community, not because of the model it describes, but because it shows how close we are to some very large breakthroughs in AI. The model is just below state of the art, but it can run on my laptop. More important, it sheds light on how all this stuff works, and it’s not complicated.

    The part of the sentence that reads "but because it shows how close we are to some very large breakthroughs in AI” is the important one.

    Link to Tim Kellogg’s analysis.

    → 6:42 PM, Feb 5
    Also on Bluesky
  • AI Is Disruptive. And Disruption Is Different.

    Benedict Evans, writing about "Are better models better?", makes a very important point, which of course traces all the way back to Clayton Christensen:

    Part of the concept of ‘Disruption’ is that important new technologies tend to be bad at the things that matter to the previous generation of technology, but they do something else important instead. Asking if an LLM can do very specific and precise information retrieval might be like asking if an Apple II can match the uptime of a mainframe, or asking if you can build Photoshop inside Netscape. No, they can’t really do that, but that’s not the point and doesn’t mean they’re useless. They do something else, and that ‘something else’ matters more and pulls in all of the investment, innovation and company creation. Maybe, 20 years later, they can do the old thing too - maybe you can run a bank on PCs and build graphics software in a browser, eventually - but that’s not what matters at the beginning. They unlock something else. 

    What is that ‘something else’ for generative AI, though? How do you think conceptually about places where that error rate is a feature, not a bug? 

    → 1:24 PM, Feb 2
    Also on Bluesky
  • Will Curiosity Be What Sets People Apart in an Age of AI?

    Jack Clark, the co-founder of Anthropic, on humans and AI:

    You might think this is a good thing. Certainly, it's very useful. But beneath all of this I have a sense of lurking horror - AI systems have got so useful that the thing that will set humans apart from one another is not specific hard-won skills for utilizing AI systems, but rather just having a high level of curiosity and agency.

    In other words, in the era where these AI systems are true 'everything machines', people will out-compete one another by being increasingly bold and agentic (pun intended!) in how they use these systems, rather than in developing specific technical skills to interface with the systems.

    We should all intuitively understand that none of this will be fair. Curiosity and the mindset of being curious and trying a lot of stuff is neither evenly distributed or generally nurtured. Therefore, I'm coming around to the idea that one of the greatest risks lying ahead of us will be the social disruptions that arrive when the new winners of the AI revolution are made - and the winners will be those people who have exercised a whole bunch of curiosity with the AI systems available to them.

    Read the whole thing. It’s good.

    → 5:23 PM, Jan 28
    Also on Bluesky
  • Large Language Model Is Secretly a Protein Sequence Optimizer

    We mentioned this here before, but general-purpose LLMs are surprisingly good at specialized tasks. A new research paper shows this in the case of protein sequence optimization.

    We demonstrate large language models (LLMs), despite being trained on massive texts, are secretly protein sequence optimizers. […] In this paper, we demonstrate LLMs themselves can already optimize protein fitness on-the-fly without further fine-tuning.

    Beyond the impact within this specialized domain, this points toward a further weakening of the moats that specialized AI companies might hope to have.

    Link to paper.

    → 9:42 AM, Jan 28
    Also on Bluesky
  • Streetlight vs. Floodlight Effects Determine AI-Based Discovery

    A lot of excitement exists around the use of tailored AI models for things such as drug discovery and the development of new materials. It turns out that Ethan Mollick’s “Jagged Frontier” of AI capabilities applies here too. As Matt Clancy points out in his deep dive "Prediction Technologies and Innovation”:

    We can imagine Kim (2023)’s technology is like a lonely streetlight, only illuminating protein structures that are near to others we already know, while Toner-Rodgers’ technology is a gigantic set of floodlights that illuminate a whole field.

    In summary, the streetlight effect leads to a concentration of research efforts on well-trodden paths, while the floodlight effect can promote exploration of more novel and diverse areas. Thus, the former leads to sustaining innovation (at best), while the latter can lead to breakthrough innovation.

    Link to article.

    → 12:51 PM, Jan 27
    Also on Bluesky
  • There Is No Moat in AI Models

    By now, you surely have heard about the Chinese DeepSeek R1 model – a model that cost a mere $6M to train (on only 2,000 NVIDIA chips) and is about as good as OpenAI’s o1 model (which cost at least 20 times more to train).

    It’s a massive problem for the massively overhyped (and overpriced) US-based AI juggernauts – and the market is catching up. This is NVIDIA right now:

    NVIDIA Stock Performance

    If it was ever in question: the moats around AI models are crumbling…

    Here is a good read on the topic: The Short Case for Nvidia Stock

    → 11:23 AM, Jan 27
    Also on Bluesky
  • No, AI Won't Take Your Job, but...

    A good reminder from Laurie Voss about the implications of AI for the job market – it’s the same argument Andrew Ng has been making for years now:

    Jobs are more than collections of tasks. Jobs require prioritization, judgement of exceptional situations, the ability to communicate ad-hoc with other sources of information like colleagues or regulations, the ability to react to entirely unforseen circumstances, and a whole lot of experience. As I said, LLMs can deal with a certain amount of ambiguity and complexity, but the less the better. Giving them a whole, human-sized job is way more ambiguity and complexity than they can handle. It's asking them to turn text into more text. It's not going to work.

    Source.

    → 11:36 AM, Jan 23
    Also on Bluesky
  • LLMs Are Good at Transforming Text Into Less Text

    Here is a good reminder from Laurie Voss on what LLMs are actually good at:

    This is the biggest and most fundamental thing about LLMs, and a great rule of thumb for what's going to be an effective LLM application. Is what you're doing taking a large amount of text and asking the LLM to convert it into a smaller amount of text? Then it's probably going to be great at it. If you're asking it to convert into a roughly equal amount of text it will be so-so. If you're asking it to create more text than you gave it, forget about it.

    Source.

    → 8:08 AM, Jan 22
    Also on Bluesky
  • Putting AI Investment Into Perspective

    You have heard about the sheer scale of current AI investment many times before, but it’s still helpful to put the numbers into perspective (if for nothing else than to see how crazy this whole world is).

    Jack Clark, in his newsletter Import AI, recently delved into Microsoft’s latest announcement stating that the company would invest $80 billion in AI in 2025 alone. For perspective:

    For comparison, the James Webb telescope cost $10bn, so Microsoft is spending eight James Webb telescopes in one year just on AI.

    For a further comparison, people think the long-in-development ITER fusion reactor will cost between $40bn and $70bn once developed (and it’s shaping up to be a 20-30 year project), so Microsoft is spending more than the sum total of humanity’s biggest fusion bet in one year on AI.

    The US’s national defense budget is on the order of ~$850bn, so Microsoft is basically spending ‘a little under a tenth of the annual US military and IC budget’ just on AI.

    This better be worth it…

    → 5:04 PM, Jan 21
    Also on Bluesky
  • Lessons From Red Teaming 100 Generative AI Products

    Microsoft’s security research team just published a comprehensive paper on their insights from “red teaming” (*) one hundred generative AI products. The whole report is worth reading (and somewhat sobering):

    Lesson 2: You don’t have to compute gradients to break an AI system — As the security adage goes, “real hackers don’t break in, they log in.” The AI security version of this saying might be “real attackers don’t compute gradients, they prompt engineer” as noted by Apruzzese et al. in their study on the gap between adversarial ML research and practice. The study finds that although most adversarial ML research is focused on developing and defending against sophisticated attacks, real-world attackers tend to use much simpler techniques to achieve their objectives.

    Lesson 6: Responsible AI harms are pervasive but difficult to measure

    Lesson 7: LLMs amplify existing security risks and introduce new ones

    Lesson 8: The work of securing AI systems will never be complete

    Fun times! 🦹🏼

    Link to study.

    (*) Red teaming is a security assessment process where authorized experts simulate real-world attacks against an organization's systems, networks, or physical defenses to identify vulnerabilities and test security effectiveness.

    → 10:26 AM, Jan 19
    Also on Bluesky
  • Still Wondering How Expensive AI's Energy Bill Truly Is?

    BloombergNEF’s deep dive into the insane costs of building super-scale data centers for AI is quite the eye-opener:

    Microsoft and OpenAI are in discussions about a $100 billion, 5GW supercomputer complex called Stargate. Amazon has said it plans to spend $150 billion in the next 15 years on data centers. Last month KKR and energy investor Energy Capital Partners entered an agreement to invest up to $50 billion in AI data centers. BlackRock has launched a $30 billion AI infrastructure fund.

    These huge data centers will be as complex and expensive as aircraft carriers or nuclear submarines. The building alone for a 1GW data center will set you back up to $12 billion – for vibration-proof construction, power supply, UPS systems, cooling, and so on. 100,000 GPUs could cost you another $4 billion, and that’s before you have installed chip-based or immersive liquid cooling, and super-high-bandwidth, low-latency communications.

    Reading the report makes you realize that the old adage of “selling shovels to the gold diggers” is as true for AI as it was for most other technologies before that. Everyone in the datacenter space—rejoice! Your future is bright. ;)

    Link to report.

    → 12:25 PM, Jan 16
    Also on Bluesky
  • Building an AI Product? Better Consider the Bitter Lesson!

    Here’s some solid advice in case you are thinking about building (or even purchasing) an AI product:

    AI products are typically an AI model wrapped in some packaging software. You can improve their performance in two ways:

    1. Through engineering effort: using domain knowledge to build constraints into the packaging software

    2. Through better models: waiting for AI labs to release more capable models

    You can pursue both paths, but here’s the crucial insight: as models improve, the value of engineering effort diminishes. Right now, there are huge gains to be made in building better packaging software, but only because current models make many mistakes. As models become more reliable, this will change. Eventually, you’ll just need to connect a model to a computer to solve most problems - no complex engineering required.

    That last sentence says it all.
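    To make the distinction concrete, here is a hypothetical sketch of what “packaging software” looks like in practice – every name in it is made up for illustration; the point is that all of this scaffolding (validation, retries, domain constraints) exists only to compensate for model mistakes, which is exactly the value that erodes as models get more reliable:

        # Hypothetical example: domain constraints wrapped around a model call.
        import json

        def call_model(prompt: str) -> str:
            # Stand-in for a real LLM API call.
            return '{"total": 1234.50}'

        def extract_invoice_total(document: str, max_retries: int = 3) -> float:
            prompt = f'Extract the invoice total as JSON, e.g. {{"total": 123.45}}:\n{document}'
            for _ in range(max_retries):
                raw = call_model(prompt)
                try:
                    total = float(json.loads(raw)["total"])
                    if total >= 0:  # domain constraint: totals are never negative
                        return total
                except (json.JSONDecodeError, KeyError, ValueError):
                    pass  # malformed output: try again
            raise RuntimeError("model never produced a valid total")

    A better model makes the retry loop and the sanity checks progressively pointless – that is the bitter lesson for anyone whose product is mostly the wrapper.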

    Link to article.

    → 12:20 PM, Jan 15
    Also on Bluesky
  • Rodney Brooks on AI

    Every year, legendary roboticist Rodney Brooks updates and publishes his prediction scorecard – a living document in which he takes a hard, honest look at his own predictions on topics such as autonomous vehicles and AI. Together with his scorecard, he provides some thoughtful commentary; we are big fans of Brooks here at radical.

    His latest commentary includes some (much-needed) level-setting on the impending AI revolution:

    That being said, we are not on the verge of replacing and eliminating humans in either white-collar jobs or blue-collar jobs. Their tasks may shift in both styles of jobs, but the jobs are not going away. We are not on the verge of a revolution in medicine and the role of human doctors. We are not on the verge of the elimination of coding as a job. We are not on the verge of replacing humans with humanoid robots to do jobs that involve physical interactions in the world. We are not on the verge of replacing human automobile and truck drivers worldwide. We are not on the verge of replacing scientists with AI programs.

    His whole post is very well worth the read. You’ll find it here.

    → 1:25 PM, Jan 13
    Also on Bluesky
  • Ethan Mollick on "Prophecies of the Flood"

    Speaking of people we tend to listen to when they speak or write, Wharton Professor Ethan Mollick is another favorite of ours. His recent essay on the flood of AI announcements and the newest round of “AGI will be here soon!” market screaming is pretty much spot on.

    The important bit is in his last paragraph:

    What concerns me most isn't whether the labs are right about this timeline - it's that we're not adequately preparing for what even current levels of AI can do, let alone the chance that they might be correct. […] The time to start having these conversations isn't after the water starts rising - it's now.

    Amen. Here is his dispatch.

    → 4:35 PM, Jan 10
    Also on Bluesky
  • Simon Willison's AI Predictions

    Willison, the creator of the data analytics tool Datasette and co-founder of the uber-popular Python framework Django, shared his AI predictions for the next one, three, and six years. We are big fans of Willison here at radical, as he’s extremely level-headed when it comes to the AI hype cycle and knows what he’s talking about since he’s actually working with AIs.

    Here is his take on the “topic du jour” – AI Agents/Agentic AI:

    I think we are going to see a lot more froth about agents in 2025, but I expect the results will be a great disappointment to most of the people who are excited about this term. I expect a lot of money will be lost chasing after several different poorly defined dreams that share that name.

    Couldn’t agree more. The rest of his post is very well worth reading.

    → 12:16 PM, Jan 10
    Also on Bluesky
  • WEF: 41% of Companies Worldwide Plan to Reduce Workforces by 2030 Due to AI

    It will not come as a surprise that the World Economic Forum, in its latest study, found that companies worldwide expect significant reductions in workforce:

    “Advances in AI and renewable energy are reshaping the (labor) market — driving an increase in demand for many technology or specialist roles while driving a decline for others, such as graphic designers,” the WEF said in a press release ahead of its annual meeting in Davos later this month.

    On the flip side, demand for “AI skills” keeps rising—note that this doesn’t necessarily mean people programming AIs, but rather people who are able to work alongside AIs in a human-machine augmented fashion.

    Time to polish those computer skills…

    Link to article and study.

    → 11:50 AM, Jan 9
    Also on Bluesky
  • The End of StackOverflow as We Know It

    StackOverflow, the de facto standard when it comes to asking and looking up answers to coding questions (and one of the big reasons why AIs are so good at answering your coding questions), is faltering.

    Since ChatGPT launched: Nov 2022 (108,563), it's had 82,997 fewer questions (3.25x less; -76.5%).

    Questions are down by more than three-quarters – bad news not just for StackOverflow, but for the makers of LLMs, since it means far less new material to train AIs on. Most public AI models like ChatGPT or Claude are continuously trained on their users' questions and the answers the AI gives, but StackOverflow provided something much more valuable: answers ranked by users based on how accurate and useful they were.

    I guess OpenAI, Anthropic, Google, et al. will have to get creative…

    Source.

    → 11:52 AM, Jan 8
    Also on Bluesky
  • AI-Supported Breast Cancer Screening: 17.6% Higher Detection Rate

    Wondering what AI could actually be useful for (other than creating funny images and spellchecking this blog post)?

    In a large-scale study in Germany, researchers found that AI-assisted breast cancer screening yielded vastly better results than the non-AI control group:

    […] after taking into account factors such as age of the women and the radiologists involved, the researchers found this difference increased, with the rate 17.6% higher for the AI group at 6.70 per 1,000 women compared with 5.70 per 1,000 women for the standard group. In other words, one additional case of cancer was spotted per 1,000 women screened when AI was used.

    Link to article and study.

    → 5:15 PM, Jan 7
    Also on Bluesky
  • Klarna CEO Says He Feels 'Gloomy' Because AI Is Developing So Quickly It'll Soon Be Able to Do His Entire Job

    Sebastian Siemiatkowski, CEO of buy-now/pay-later company Klarna, throws some stones from inside a glass house:

    While Siemiatkowski said AI is capable of performing his duties as CEO, he’s “not super excited” about the prospect of his job becoming obsolete. “My work to me is a super important part of who I am, and realizing it might become unnecessary is gloomy,” Siemiatkowski said in the X post.

    You might remember that Klarna cheerfully fired 22% of their workforce to replace them with AI. It makes one wonder if those people felt a little “gloomy” about AI too? 🤔

    That being said, he does raise an important point…

    “But I also believe we need to be honest with what we think will happen. And I [would] rather learn and explore than pretend it does not exist.”

    Link to article.

    → 11:11 AM, Jan 7
    Also on Bluesky
  • AI-Enabled Spear Phishing Campaigns Are Here (And the Future)

    It shouldn’t come as a surprise that LLMs are incredibly good at tricking people into believing pretty much anything – which makes them ideal for nefarious use cases such as spear phishing (*).

    A recent study from Fred Heiding et al. shows that AI-powered spear phishing attacks yielded a >50% click-through rate (which, to be frank, is astronomical and scary as hell…).

    TL;DR: We ran a human subject study on whether language models can successfully spear-phish people. We use AI agents built from GPT-4o and Claude 3.5 Sonnet to search the web for available information on a target and use this for highly personalized phishing messages. We achieved a click-through rate of above 50% for our AI-generated phishing emails.

    Link to summary and study.

    (*) Spear phishing is a targeted attempt to steal sensitive information such as account credentials or financial information from a specific individual or organization. Attackers typically gather information about their targets to craft tailored emails or messages, increasing the likelihood of success.

    → 5:01 PM, Jan 6
    Also on Bluesky
  • OpenAI Losing Money on a $200/Month Subscription…

    Sam Altman, OpenAI’s CEO, just posted on X that the company is losing money on their pricey $200/month/seat ChatGPT Pro subscription – apparently due to subscribers using the product more than anticipated.

    [Screenshot of Sam Altman’s post on X]

    This might come as a surprise to some – especially given that ChatGPT Pro is ten times as expensive as the standard subscription. We can safely assume that OpenAI is not making money on the standard plan either – losing money on the Pro plan shows just how expensive it is to run a frontier AI model, and how unsustainable the economics of the current wave of frontier AI companies really are.

    I guess “to be continued”…

    Link to story.

    → 9:58 AM, Jan 6
    Also on Bluesky
  • Timeline of AI Model Releases in 2024

    I guess you feel similarly overwhelmed by the constant barrage of new AI models in the last twelve months. Have you ever wondered what the year actually looked like in terms of AI model progress? Wonder no more; Vaibhav Srivastav put together a neat visualization:

    [Visualization: timeline of 2024 AI model releases]

    Link to full visualization.

    → 7:50 PM, Jan 3
    Also on Bluesky
  • Algorithms can determine whether a whiskey is of American or Scotch origin

    And there goes another profession: a machine learning algorithm successfully identified whiskeys by their molecular composition and predicted the top five flavor notes in each sample.

    Two machine learning algorithms can determine whether a whiskey is of American or Scotch origin and identify its strongest aromas, according to research published in Communications Chemistry. The results also suggest that the algorithms can outperform human experts at assessing a whisky's strongest aromas.

    Expect many more applications of AI in the creation, evaluation, and quality control processes of food and drink. I wouldn’t be surprised if wine is next… Cheers! 🍷

    Link to article and study

    → 8:43 AM, Dec 22
    Also on Bluesky
  • Do Autonomous Vehicles Outperform Latest-Generation Human-Driven Vehicles?

    Surprise, surprise: Driving (well, being driven in) an autonomous vehicle (aka robotaxi) is much safer than driving yourself.

    The Waymo team analyzed insurance claim data for more than 25 million miles driven in their cars and found that their autonomous vehicles vastly outperformed human drivers, with an 88% reduction in property damage claims and a 92% reduction in bodily injury claims.

    Link to Study

    → 1:28 PM, Dec 20
    Also on Bluesky
  • Is AI progress slowing down?

    This (long) post is well worth reading in full – Arvind Narayanan, Benedikt Ströbl, and Sayash Kapoor do an excellent job of drilling into the current challenges with scaling AI models, why we shouldn’t trust industry insiders telling us that they have scaling figured out, and why we are still years away from fully leveraging even the powers that AI bestows upon us today.

    The furious debate about whether there is a capability slowdown is ironic, because the link between capability increases and the real-world usefulness of AI is extremely weak. The development of AI-based applications lags far behind the increase of AI capabilities, so even existing AI capabilities remain greatly underutilized. One reason is the capability-reliability gap --- even when a certain capability exists, it may not work reliably enough that you can take the human out of the loop and actually automate the task (imagine a food delivery app that only works 80% of the time). And the methods for improving reliability are often application-dependent and distinct from methods for improving capability. That said, reasoning models also seem to exhibit reliability improvements, which is exciting.

    Link

    → 6:50 PM, Dec 19
    Also on Bluesky