AI Control: Using Untrusted Systems Safely with Buck Shlegeris, Redwood Research (80,000 Hours Pod)

In this episode, we share a fascinating conversation from the 80,000 Hours Podcast between Rob Wiblin and Buck Shlegeris, CEO of Redwood Research. Buck dives deep into the emerging field of AI Control: strategies for safely working with powerful AIs even if they’re not fully aligned. They explore innovative techniques like always-on auditing, honeypotting, re-sampling, and factored cognition to monitor and manage AI behaviors. This discussion highlights both the promise and the challenges of controlling increasingly autonomous AI systems in today’s fast-evolving landscape. Tune in for a thoughtful, first-principles look at how we might secure useful outcomes from AIs we can’t fully trust.
Original source: https://80000hours.org/podcast...
Upcoming Major AI Events Featuring Nathan Labenz as a Keynote Speaker
https://www.imagineai.live/
https://adapta.org/adapta-summ...
https://itrevolution.com/produ...
SPONSORS:
ElevenLabs: ElevenLabs gives your app a natural voice. Pick from 5,000+ voices in 31 languages, or clone your own, and launch lifelike agents for support, scheduling, learning, and games. Full server and client SDKs, dynamic tools, and monitoring keep you in control. Start free at https://elevenlabs.io/cognitiv...
Oracle Cloud Infrastructure (OCI): Oracle Cloud Infrastructure offers next-generation cloud solutions that cut costs and boost performance. With OCI, you can run AI projects and applications faster and more securely for less. New U.S. customers can save 50% on compute, 70% on storage, and 80% on networking by switching to OCI before May 31, 2024. See if you qualify at https://oracle.com/cognitive
Shopify: Shopify powers millions of businesses worldwide, handling 10% of U.S. e-commerce. With hundreds of templates, AI tools for product descriptions, and seamless marketing campaign creation, it's like having a design studio and marketing team in one. Start your $1/month trial today at https://shopify.com/cognitive
NetSuite: Over 41,000 businesses trust NetSuite by Oracle, the #1 cloud ERP, to future-proof their operations. With a unified platform for accounting, financial management, inventory, and HR, NetSuite provides real-time insights and forecasting to help you make quick, informed decisions. Whether you're earning millions or hundreds of millions, NetSuite empowers you to tackle challenges and seize opportunities. Download the free CFO's guide to AI and machine learning at https://netsuite.com/cognitive
PRODUCED BY:
https://aipodcast.ing
CHAPTERS:
(00:00) About this episode
(04:02) Introduction to AI Control and Misalignment Risks
(05:54) Understanding AI Control Strategies
(06:50) Potential Threats from Misaligned AIs
(09:49) Challenges in AI Control
(19:47) Techniques for AI Auditing and Control (Part 1)
(24:23) Sponsors: ElevenLabs | Oracle Cloud Infrastructure (OCI)
(26:50) Techniques for AI Auditing and Control (Part 2)
(28:35) Evaluating AI Control Effectiveness
(43:10) Understanding AI Attack Frequency and Classifier Challenges
(43:24) The Role of Classifiers in AI Safety
(44:18) Auditing and Replacement Strategies (Part 1)
(45:10) Sponsors: Shopify | NetSuite
(48:35) Red Team and Blue Team Testing
(49:48) Untrusted Monitoring and Sting Operations
(51:10) Challenges in Untrusted Monitoring
(55:08) Strategies to Prevent AI Collusion
(01:11:18) Evaluating AI Control Techniques
(01:16:58) Performance Implications and AI Control Strategies
(01:17:24) Limiting AI Access and Affordances
(01:18:25) Challenges in AI Development and Experimentation
(01:21:10) Insider Threats and Security Measures
(01:29:59) Chronic Risks and AI Training
(01:36:10) Evaluating AI Alignment and Control
(01:48:57) Ethical Considerations in AI Control
(01:51:24) AI's Ethical Dilemmas and Backstabbing Concerns
(01:52:02) AI Welfare and Factory Farming Comparison
(01:52:48) AI Control and Responsibility
(01:53:46) AI Companies and Safety Interventions
(01:55:38) Working Inside AI Companies: Pros and Cons
(02:03:42) Redwood's Vision and AI Control Focus
(02:12:20) AI Takeover Scenarios and Mitigation Strategies
(02:27:34) Practical Interventions and Optimism
(02:28:36) Outro
SOCIAL LINKS:
Website: https://www.cognitiverevolutio...
Twitter (Podcast): https://x.com/cogrev_podcast
Twitter (Nathan): https://x.com/labenz
LinkedIn: https://linkedin.com/in/nathan...
YouTube: https://youtube.com/@Cognitive...
Apple: https://podcasts.apple.com/de/...
Spotify: https://open.spotify.com/show/...