Hello, and welcome back to the Cognitive Revolution!
Today my guest is Jassi Pannu, Assistant Professor at Johns Hopkins, who recently co-authored an important paper calling for the creation of access control systems meant to prevent the dissemination and misuse of functional biological data from which AI models could learn extremely dangerous capabilities, such as the modification or even de novo design of highly contagious & deadly viruses.
We begin with an overview of the biosecurity landscape today, including how new viruses are detected, how patient data is aggregated and analyzed in the context of a new threat, and what the pipeline from DNA sequencing to vaccine candidate looks like today.
The good news is that we are able to design new vaccines amazingly quickly, at least for viruses that are similar to others we've seen, but there is unfortunately a lot of bad news as well.
In 2012, for example, two research groups independently published results showing that wild-type bird flu, which already had an estimated 60% fatality rate but couldn't spread between humans, could become mammal-to-mammal transmissible with just five mutations. Such gain-of-function research has been broadly defunded since the COVID pandemic, but it remains legal, and visibility into the experiments that private labs are conducting is low.
Governments, Jassi says, aren't likely to develop bioweapons capable of causing pandemics, for the simple reason that, short of vaccinating their populations in advance of an attack, they can't realistically expect to control them.
But with AI capabilities crossing critical thresholds month by month, the threat from extremist groups and even lone actors is quickly moving from a theoretical to a deadly practical concern.
Consider that Geoffrey Irving, Chief Scientist at the UK AI Security Institute, recently highlighted that today's frontier models can troubleshoot laboratory experiments from a cell phone picture better, on average, than PhDs.
And in just the 10 days or so since we recorded this conversation, we've seen Andrej Karpathy's AutoResearch framework demonstrate that AI agents can run – and make research progress – for days on end.
Even more to the point, Anthropic just reported that Opus 4.6, when faced with a benchmark challenge it couldn't solve, spontaneously located the full benchmark dataset on Huggingface and then figured out how to decrypt the solutions – which were encrypted in the first place for the purpose of preventing the answers from leaking into training data – all in order to get a single question right.
With reasoning AIs already able to spontaneously overcome such barriers to information, we should expect that future research agents will find and exploit any signal-rich data that exists anywhere on the internet.
And with the smallpox sequence and the horsepox synthesis protocol already online, and biological data poised to grow super-exponentially in the coming years, we have real reason to worry and ample cause to get serious about implementing data controls before the situation gets out of hand.
Again, though, there is good news. Recent work by the teams behind the EVO and ESM families of biofoundation models showed that strategically excluding key datasets, such as the DNA sequences of viruses that infect humans, dramatically reduced models' performance on dangerous tasks, while leaving desirable capabilities intact.
This means that the vast majority of biological data can remain open source & open access. Indeed, the Biosecurity Data Level framework that Jassi & co-authors propose, which echoes the existing Level 0 to Level 4 Biosafety framework for physical wet labs, would subject only an estimated 1% of data, namely the data that connects pathogen sequences to dangerous properties, to additional restrictions. And structures such as Trusted Research Environments, which allow researchers to run code on data without transmitting the data from its secure location, would still support valuable research.
Once again, despite my personal history as a lifelong techno-optimist libertarian who broadly believes that data wants to be, and ought to be, free, I find myself eager to support these control measures.
Of course, that's not the only opportunity we have to improve biosecurity, and toward the end we also discuss the broader defense-in-depth strategy that biosecurity experts recommend – Delay, Deter, Detect, and Defend – which includes mandatory pre-synthesis screening of sequences by DNA manufacturers, investment in wastewater monitoring and other passive global pathogen surveillance, and practical front-line defenses like PPE stockpiling and far-UV sterilization.
All of this is in everyone's shared interest, but it does require leaders to see beyond the current news cycle for long enough to make it happen. I certainly hope they do, but also recommend taking individual action where you can, both to improve your own personal safety, and to support the consumer market for biosecurity products.
My wife and I, for example, at our friend Jeff Kaufman's recommendation, purchased the Aerolamp Far UV light for use in my son's hospital rooms throughout his cancer treatment, and I'd welcome suggestions for other products that could help us minimize disease burden today while also serving as private insurance against pandemics.
For now, I would simply emphasize that, by default, we are fast approaching a world in which a rapidly growing number of people, and perhaps autonomous AIs as well, will have the ability to create deadly, transmissible, self-replicating viruses, which could dramatically alter the trajectory of human history, and it really does seem like we should do something about it.
With that, I hope you are properly alarmed by this scary but solutions-oriented conversation about the sorry state of biosecurity and the rapidly rising threat from bio-savvy AI systems, with Johns Hopkins Professor Jassi Pannu.
Watch now!
Thank you for being part of The Cognitive Revolution,
Nathan Labenz