Legitimate AI agents inherently satisfy all three criteria of the “lethal trifecta” (data access, external content, external communication), so security must shift from architectural design to runtime monitoring.
A new benchmark pinpoints the exact point at which medical AI models begin to hallucinate, enabling targeted countermeasures through trace-supervised fine-tuning.
A trainable classifier uses early hidden states to predict whether activation steering will succeed, reaching a macro-F1 score of 0.7 without requiring complete generations.
Language models are evolving from chatbots built on simple next-token prediction into “digital colleagues” with working memory, persistent workspaces, reusable skills, and reliable problem-solving.
AI amplifies existing problems: companies with poor data hygiene and undocumented processes accelerate their compliance risks rather than their business processes when implementing AI.