The Don’t Worry About the Vase Podcast is a listener-supported podcast.
00:00 - Introduction
05:00 - Model Welfare: The Story So Far
08:48 - Actual Progress?
10:15 - Their Main Model Welfare Findings
20:18 - Automated Interviews
21:08 - Emotion Activations (7.2.3)
22:29 - Task Preferences (7.4.1)
25:58 - A Trade Offer Has Arrived (7.4.2)
28:36 - But Who’s Asking?
29:49 - Type-Safe Corrigibility Is Hard
35:30 - Paranoia, Paranoia
40:28 - Prompt Injections and Bad Model Relations
48:11 - Honesty Impacts Everything And Everything Impacts Honesty
51:18 - Anthropic Should Stop Deprecating Models




