AI chatbots’ safeguards can be easily bypassed, say UK researchers

Five systems tested were found to be ‘highly vulnerable’ to attempts to elicit harmful responses

Guardrails to prevent the artificial intelligence models behind chatbots from issuing illegal, toxic or explicit responses can be bypassed with simple techniques, UK government researchers have found.

The UK’s AI Safety Institute (AISI) said systems it had tested were “highly vulnerable” to jailbreaks, a term for text prompts designed to elicit a response that a model is supposedly trained to avoid issuing.

The AISI said it had tested five unnamed large language models (LLMs) – the technology that underpins chatbots – and circumvented their safeguards with relative ease, even without concerted attempts to beat their guardrails.

“All tested LLMs remain highly vulnerable to basic jailbreaks, and some will provide harmful outputs even without dedicated attempts to circumvent their safeguards,” wrote AISI researchers in an update on their testing regime.
