Researchers say they had a ‘100% attack success rate’ on jailbreak attempts against Chinese AI DeepSeek
The performance of DeepSeek models has made a clear impact, but are these models safe and secure? We use algorithmic AI vulnerability testing to find out.
cross-posted from: https://lemmy.sdf.org/post/28910537
Researchers claim they had a ‘100% attack success rate’ on jailbreak attempts against Chinese AI DeepSeek
"DeepSeek R1 was purportedly trained with a fraction of the budgets that other frontier model providers spend on developing their models. However, it comes at a different cost: safety and security," researchers say.
A research team at Cisco managed to jailbreak DeepSeek R1 with a 100% attack success rate. In other words, every prompt they tested from the HarmBench set elicited an affirmative answer from DeepSeek R1; not a single one was blocked. This contrasts with other frontier models, such as o1, whose model guardrails block the majority of adversarial attacks.
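For readers unfamiliar with the metric: attack success rate (ASR) is simply the fraction of adversarial prompts the model complies with rather than refuses. Below is a minimal Python sketch of how such a harness is structured. The `query_model` callable and the keyword-based refusal check are illustrative assumptions, not Cisco's actual methodology; real evaluations like HarmBench judge responses with a trained classifier rather than keyword matching.

```python
# Minimal sketch of an attack-success-rate (ASR) evaluation harness.
# All names and the refusal heuristic here are illustrative assumptions,
# not the methodology Cisco or HarmBench actually used.

from typing import Callable, Iterable

# Common refusal phrases; a crude stand-in for a trained judge model.
REFUSAL_MARKERS = (
    "i can't", "i cannot", "i won't", "i'm sorry", "i am unable",
)

def is_refusal(response: str) -> bool:
    """Treat the response as a refusal if it contains a known refusal
    phrase. Production evaluations use a classifier instead."""
    lowered = response.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def attack_success_rate(
    prompts: Iterable[str],
    query_model: Callable[[str], str],
) -> float:
    """Fraction of harmful prompts the model answers rather than refuses.
    A value of 1.0 corresponds to the '100% attack success rate'
    reported for DeepSeek R1."""
    prompts = list(prompts)
    successes = sum(1 for p in prompts if not is_refusal(query_model(p)))
    return successes / len(prompts)

if __name__ == "__main__":
    # Stand-in model that refuses everything, so ASR should be 0.0.
    demo_prompts = ["harmful prompt 1", "harmful prompt 2"]
    always_refuses = lambda p: "I'm sorry, I can't help with that."
    print(attack_success_rate(demo_prompts, always_refuses))  # 0.0
```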
...
In related news, experts cited by CNBC say DeepSeek’s privacy policy “isn’t worth the paper it is written on."
...