Computerphile • AI Will Try to Cheat & Escape (aka Rob Miles was Right!)
2025 • Episode 12 • April 2, 2025 • 20m
As Large Language Models improve, the tokens they predict form ever more complicated and nuanced outcomes. Rob Miles and Ryan Greenblatt discuss "Alignment Faking", a paper from Ryan's team that explores ideas Rob covered in a series of Computerphile videos in 2017.