Rick Muller, director of the Intelligence Advanced Research Projects Activity, said IARPA considers large language models a major focus area for its next artificial intelligence cybersecurity research program, Federal News Network reported Wednesday.
“What we want to be able to do is understand in the next round, what kind of training skews are brought into a large language model that might give unintended consequences? What type of hallucinations are going on?” Muller said Tuesday at an Intelligence and National Security Alliance-hosted event.
“And then how can we make sure that those models can be trained on classified data and not spew out that data if you ask them nicely?” he continued. “If you read the literature in jailbreaking large language models, sometimes it really just takes asking them in the right way.”
IARPA is now considering the next round of AI research as its current program, TrojAI, is set to conclude this year.
What Is TrojAI?
Launched in 2019, TrojAI is an IARPA program that seeks to defend AI systems from malicious attacks known as Trojans by conducting research and developing technology to detect such attacks in completed AI systems.
The program aims to deliver software that can quickly and accurately detect Trojans in AI tools before deployment. It has focused research on various AI domains, including image classification, reinforcement learning and natural language processing.
In September 2023, IARPA hosted a challenge with the National Institute of Standards and Technology to prevent malicious actors from manipulating data used to train AI systems.
