SneerClub @awful.systems David Gerard @awful.systems 3 mo. ago

what if, right, what if our super-duper-autocomplete was just tricking us so it could TAKE OVER ZEE VORLD AHAHAHAHAHAHA! that'd be wild, hey

www.lesswrong.com New report: "Scheming AIs: Will AIs fake alignment during training in order to get power?" — LessWrong

I examine the probability of a behavior sometimes called "deceptive alignment."

32 comments

I conclude that scheming is a disturbingly plausible outcome of using baseline machine learning methods to train goal-directed AIs sophisticated enough to scheme (my subjective probability on such an outcome, given these conditions, is ~25%).

Out: vibes and guesswork

In: "subjective probability"
- at one of the places i worked this kind of data was called assnumbers.
