what if, right, what if our super-duper-autocomplete was just tricking us so it could TAKE OVER ZEE VORLD AHAHAHAHAHAHA! that'd be wild, hey
www.lesswrong.com New report: "Scheming AIs: Will AIs fake alignment during training in order to get power?" — LessWrong
I examine the probability of a behavior sometimes called "deceptive alignment."
I conclude that scheming is a disturbingly plausible outcome of using baseline machine learning methods to train goal-directed AIs sophisticated enough to scheme (my subjective probability on such an outcome, given these conditions, is ~25%).
Out: vibes and guesswork
In: "subjective probability"
at one of the places i worked this kind of data was called assnumbers.