OpenAI’s latest model will block the ‘ignore all previous instructions’ loophole

OpenAI’s newest model, GPT-4o Mini, includes a new safety mechanism to prevent hackers from overriding chatbots.

You're viewing a single thread.

101 comments

What happens if you make a mistake with your initial instructions?
- You'd change the system prompt, just like now. If you mean in the session, I'm sure it'll ignore your session's prompt's instructions as normal but if not, I guess you'd just start a new session prompt.
- The "issue" is that people were able to override bots on twitter with that method and make them feed their own instructions.
  
  I saw it first time being used on a Russian propaganda bot.

You've viewed 101 comments.