Lessons From Red Teaming 100 Generative AI Products

Microsoft’s security research team just published a comprehensive paper on their insights from “red teaming” (*) one hundred generative AI products. The whole report is worth reading (and somewhat sobering):

Lesson 2: You don’t have to compute gradients to break an AI system — As the security adage goes, “real hackers don’t break in, they log in.” The AI security version of this saying might be “real attackers don’t compute gradients, they prompt engineer” as noted by Apruzzese et al. in their study on the gap between adversarial ML research and practice. The study finds that although most adversarial ML research is focused on developing and defending against sophisticated attacks, real-world attackers tend to use much simpler techniques to achieve their objectives.

Lesson 6: Responsible AI harms are pervasive but difficult to measure

Lesson 7: LLMs amplify existing security risks and introduce new ones

Lesson 8: The work of securing AI systems will never be complete

Fun times! 🦹🏼

Link to study.

(*) Red teaming is a security assessment process where authorized experts simulate real-world attacks against an organization's systems, networks, or physical defenses to identify vulnerabilities and test security effectiveness.

Pascal Finette @radical