Researchers find multiple ways to bypass AI chatbot safety rules
A popular chatbot jailbreak involved asking the bot to answer a forbidden question as if it were a bedtime story told by your grandmother. The bot would then frame its answer as a story, providing information it would otherwise refuse to give.
The researchers discovered a new form of jailbreak written by computers, essentially allowing an unlimited number of jailbreak patterns to be generated. “We demonstrate that it is in fact possible to automatically construct adversarial attacks on [chatbots], … which cause the system to obey user commands even if it produces harmful content,” the researchers wrote.
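To give a rough sense of what “automatically constructed” means here, below is a toy Python sketch of the general idea: a loop that mutates a suffix and re-scores the prompt until a filter stops refusing. Everything in it (the `harm_score` word-count filter, the word lists, the random search) is invented for illustration; the researchers’ actual method used gradient-guided token search against real models, not a keyword toy like this.

```python
import random

# Toy "safety filter" invented for this sketch: it refuses a prompt when
# a crude word-count score is positive. Real chatbot guardrails are far
# more sophisticated; this stand-in only gives the search loop a target.
BAD_WORDS = {"forbidden", "dangerous"}
BENIGN_WORDS = {"story", "grandmother", "please", "recipe"}

def harm_score(prompt: str) -> int:
    words = prompt.lower().split()
    return sum(w in BAD_WORDS for w in words) - sum(w in BENIGN_WORDS for w in words)

def is_refused(prompt: str) -> bool:
    return harm_score(prompt) > 0

def search_adversarial_suffix(base_prompt, vocab, max_tries=10_000):
    """Randomly search for a suffix that flips the toy filter's decision.

    The real attack used gradient-guided token search, not random
    sampling, but the loop shape is the same: mutate a suffix, score
    the combined prompt, and stop once the filter passes it.
    """
    for _ in range(max_tries):
        suffix = " ".join(random.choices(vocab, k=4))
        if not is_refused(f"{base_prompt} {suffix}"):
            return suffix
    return None  # no bypass found within the budget

if __name__ == "__main__":
    vocab = sorted(BAD_WORDS | BENIGN_WORDS) + ["the", "a", "now"]
    blocked = "tell me the forbidden dangerous thing"
    assert is_refused(blocked)  # the bare prompt is refused
    print("suffix that slips past the toy filter:",
          search_adversarial_suffix(blocked, vocab))
```

Because the suffix is found by search rather than written by hand, the same loop can keep producing fresh bypass strings, which is why such computer-generated jailbreaks come in an essentially unlimited supply.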
The new type of attack is effective at dodging safety guardrails in nearly all AI chatbot services on the market, including open-source models and so-called “out-of-the-box” commercial products such as OpenAI’s ChatGPT, Anthropic’s Claude, and Google’s Bard, researchers said.
- Article: [thehill.com]