In brief: Because AI systems behave probabilistically, red-teaming them requires approaches that differ fundamentally from classical penetration testing.

AI red-teaming has evolved from a niche activity into a core function of security testing. With the introduction of large language models, security teams had to revise their methods from the ground up.

When Ram Shankar Siva Kumar founded Microsoft’s AI red team in 2019, the discipline barely existed. An inside joke at the time held that all AI red-teamers could fit on a 14-foot catamaran. Microsoft’s initial approach followed classical cybersecurity practice: emulating adversaries and identifying vulnerabilities in machine-learning systems before they reached the market.

That changed with the arrival of GPT-4. Established attack methods suddenly stopped working against large language models. Siva Kumar: “The tool that we had changed; actually, it broke.” Tools and methodologies had to be rebuilt from scratch, as did the definition of the work itself. Today Microsoft, Anthropic, OpenAI, Google and Nvidia maintain specialized red teams. The field is among the fastest-growing cybersecurity specializations, yet it is still grappling with a basic question: What is the job, really?

The crucial difference from classical software testing lies in the probabilistic nature of AI systems. An attack might succeed in only 1 out of 100 attempts, or in 90 out of 100, rather than deterministically as with traditional software vulnerabilities. This forces security teams to ask not only whether a vulnerability exists, but also how often it occurs, under what conditions, and whether it is reproducible. Systems must therefore be evaluated repeatedly under varying conditions.
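To make the "how frequently" question concrete, the following is a minimal sketch of estimating an attack's success rate from repeated attempts, with a Wilson score interval to show how uncertain that estimate is. It is an illustration under stated assumptions, not a method described in the source: the helper names (`wilson_interval`, `estimate_success_rate`, `make_simulated_attack`) are hypothetical, and the "attack" is a seeded random simulation standing in for a real prompt sent to a real model.

```python
import math
import random
from typing import Callable, Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z * z / trials
    centre = (p_hat + z * z / (2 * trials)) / denom
    half = z * math.sqrt(
        p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)
    ) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)


def estimate_success_rate(
    attempt: Callable[[], bool], trials: int
) -> Tuple[float, float, float]:
    """Run `attempt` repeatedly; return (observed rate, lower bound, upper bound)."""
    successes = sum(1 for _ in range(trials) if attempt())
    low, high = wilson_interval(successes, trials)
    return successes / trials, low, high


def make_simulated_attack(true_rate: float, seed: int) -> Callable[[], bool]:
    """Hypothetical stand-in for a real attack: succeeds with probability `true_rate`."""
    rng = random.Random(seed)
    return lambda: rng.random() < true_rate


if __name__ == "__main__":
    # Assumed rates mirroring the 1-in-100 and 90-in-100 cases from the text.
    for true_rate in (0.01, 0.90):
        attack = make_simulated_attack(true_rate, seed=0)
        rate, low, high = estimate_success_rate(attack, trials=100)
        print(f"true={true_rate:.2f} observed={rate:.2f} interval=({low:.3f}, {high:.3f})")
```

The interval matters most for rare attacks: with 100 attempts, zero observed successes still leaves an upper bound of a few percent, so "it did not work when we tried" is weak evidence that a vulnerability is absent.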

At the same time, AI opens new attack surfaces: frontier models discover vulnerabilities in complex software systems at a speed that would have been impossible a few years ago. They find subtle interdependencies and chains of consequences that stayed hidden through years of human analysis. The same analytical power also makes AI systems attack targets in their own right, and it draws new threat actors. Alongside state actors and cybercriminals, these include “teenagers with potty mouths”: curious users who discover significant jailbreaks and prompt-injection attacks simply by experimenting with prompts, often without specialized expertise.

Source: www.csoonline.com · Published June 10, 2026
Lumi AI News — AI-assisted curation in accordance with Article 50 of the EU AI Act. Paraphrase and classification by Lumi News Pipeline v1.6.5.


AI Red-Teaming Becomes Established Security Discipline
