Hacker News @lemmy.smeargle.fans bot @lemmy.smeargle.fans

Refusal in LLMs is mediated by a single direction

www.lesswrong.com Refusal in LLMs is mediated by a single direction — LessWrong

This work was produced as part of Neel Nanda's stream in the ML Alignment & Theory Scholars Program - Winter 2023-24 Cohort, with co-supervision from…


