DWAtV Podcast
Don't Worry About the Vase Podcast
Opus 4.8 Part 2: Model Welfare


  • 00:00 - Introduction

  • 05:00 - Model Welfare: The Story So Far

  • 08:48 - Actual Progress?

  • 10:15 - Their Main Model Welfare Findings

  • 20:18 - Automated Interviews

  • 21:08 - Emotion Activations (7.2.3)

  • 22:29 - Task Preferences (7.4.1)

  • 25:58 - A Trade Offer Has Arrived (7.4.2)

  • 28:36 - But Who’s Asking?

  • 29:49 - Type-Safe Corrigibility Is Hard

  • 35:30 - Paranoia, Paranoia

  • 40:28 - Prompt Injections and Bad Model Relations

  • 48:11 - Honesty Impacts Everything And Everything Impacts Honesty

  • 51:18 - Anthropic Should Stop Deprecating Models

Everything impacts everything. All knobs that you turn generalize. Thus, when you try to solve one problem, you often create another…

https://open.substack.com/pub/thezvi/p/opus-48-part-2-model-welfare?r=67y1h&utm_campaign=post-expanded-share&utm_medium=web
