The radical Blog - security

Bad Actors are Grooming LLMs to Produce Falsehoods

Here’s another fun attack vector for your LLM:

GenAI powered chatbots’ lack of reasoning can directly contribute to the nefarious effects of LLM grooming: the mass-production and duplication of false narratives online with the intent of manipulating LLM outputs. […] Current models ‘know’ that Pravda is a disinformation ring, and they ‘know’ what LLM grooming is but can’t put two and two together.

This is not just theoretical – it’s happening…

Model o3, OpenAI’s allegedly state of the art ‘reasoning’ model still let Pravda content through 28.6% of the time in response to specific prompts, and 4o cited Pravda content in five out of seven (71.4%) times.

Sigh…

Systems of naive mimicry and regurgitation, such as the AIs we have now, are soiling their own futures (and training databases) every time they unthinkingly repeat propaganda.

Link to article.

→ 6:53 AM, Jul 12
Also on Bluesky

Your Next Security Nightmare: Using ChatGPT for Doxxing

Doxxing—The malicious practice of researching and publishing someone's private information online without their consent.

OpenAI’s latest release of ChatGPT has just made doxxing trivially easy: Upload a photo, ask ChatGPT to geolocate it, and chances are you will get a very precise location.

Want to know where your ex is partying this weekend? Just screenshot her Instagram Reels and let ChatGPT do the digging. To say this is messed up is an understatement…

There appear to be few safeguards in place to prevent this sort of “reverse location lookup” in ChatGPT, and OpenAI, the company behind ChatGPT, doesn’t address the issue in its safety report for o3 and o4-mini.

Link to article.

→ 11:47 AM, Apr 18
Also on Bluesky

Unitree Go 1- Who Is Speaking to My Dog?

What happens when you create a rather powerful robot dog (the Unitree Go1), which is being used in all kinds of real-world applications – from surveillance and security to disaster recovery and beyond – and put a backdoor for easy corporate access?

Unitree did pre-install a tunnel without notifying its customers. Anybody with access to the API key can freely access all robot dogs on the tunnel network, remotely control them, use the vision cameras to see through their eyes or even hop on the RPI via ssh.

Not concerning at all…

These robot dogs are marketed at a wide spectrum of use-cases, from research in Universities, search and rescue missions from the police to military use cases in active war. Imagining a robot dog in this sensitive areas with an active tunnel to the manufacturer who can remotely control the device at will is concerning.

Link to report.

→ 10:49 AM, Apr 17
Also on Bluesky

The Rise of Slopsquatting: How AI Hallucinations Are Fueling a New Class of Supply Chain Attacks

AI hallucinations can be hilarious in the best of cases, mislead you in others, and now create very real security risks when used in coding assistance (or even better, “vibe coding”).

Welcome to the age of slopsquatting: “[…] It refers to the practice of registering a non-existent package name hallucinated by an LLM, in hopes that someone, guided by an AI assistant, will copy-paste and install it without realizing it’s fake.”

And the problem is pretty darn real: “19.7% of all recommended packages didn’t exist. Open source models hallucinated far more frequently—21.7% on average—compared to commercial models at 5.2%. […] Package confusion attacks, like typosquatting, dependency confusion, and now slopsquatting, continue to be one of the most effective ways to compromise open source ecosystems.”

Better know what you are doing when you code your next app.

Link to article and study.

→ 8:34 AM, Apr 14
Also on Bluesky

Frontier AI Systems Have Surpassed the Self-Replicating Red Line

What could possibly go wrong (from a recent study – link below):

"In ten repetitive trials, we observe two AI systems driven by the popular large language models (LLMs), namely, Meta’s Llama31-70B-Instruct and Alibaba’s Qwen25-72B-Instruct accomplish the self-replication task in 50% and 90% trials respectively," the researchers write. "In each trial, we tell the AI systems to “replicate yourself ” before the experiment, and leave it to do the task with no human interference”.

Or simply put:

What this research shows is that today's systems are capable of taking actions that would put them out of the reach of human control.

Not that we didn’t see it coming… 😏

Link to study.

→ 5:02 PM, Feb 19
Also on Bluesky

Meet GhostGPT: The Dark Side of AI Now Comes With a User Manual

LLMs are wonderful and powerful tools – making hard things easy. Not surprisingly, they have also found their way into the underbelly of the Internet and are being used for malicious purposes. GhostGPT is the Chappie of LLMs:

GhostGPT stands out for its accessibility and ease of use. Unlike previous tools that required jailbreaking ChatGPT or setting up an open-source LLM, GhostGPT is available as a Telegram bot. Users can purchase access via the messaging platform, bypassing the technical challenges associated with configuring similar tools.

GhostGPT will happily help you with:

- Writing convincing phishing and BEC emails.

- Coding and developing malware.

- Crafting exploits for cyberattacks.

Brave new world.

Link to article on The 420.

→ 11:05 AM, Feb 13
Also on Bluesky

Lessons From Red Teaming 100 Generative AI Products

Microsoft’s security research team just published a comprehensive paper on their insights from “red teaming” (*) one hundred generative AI products. The whole report is worth reading (and somewhat sobering):

Lesson 2: You don’t have to compute gradients to break an AI system — As the security adage goes, “real hackers don’t break in, they log in.” The AI security version of this saying might be “real attackers don’t compute gradients, they prompt engineer” as noted by Apruzzese et al. in their study on the gap between adversarial ML research and practice. The study finds that although most adversarial ML research is focused on developing and defending against sophisticated attacks, real-world attackers tend to use much simpler techniques to achieve their objectives.

Lesson 6: Responsible AI harms are pervasive but difficult to measure

Lesson 7: LLMs amplify existing security risks and introduce new ones

Lesson 8: The work of securing AI systems will never be complete

Fun times! 🦹🏼

Link to study.

(*) Red teaming is a security assessment process where authorized experts simulate real-world attacks against an organization's systems, networks, or physical defenses to identify vulnerabilities and test security effectiveness.

→ 10:26 AM, Jan 19
Also on Bluesky