Chatbots Get Punk’d: CMU Researchers Expose Major AI Weakness

August 9, 2023

AI Vulnerabilities, Bard, ChatGPT, Claude 2, Hacker Risks with AI

Uh oh, looks like someone figured out how to hack ChatGPT and its AI cousins. Researchers at Carnegie Mellon University dropped a bomb by showing they can easily manipulate several popular chatbots into generating toxic responses, despite safeguards. This is concerning news, given that 72% of marketers report that they fear Generative AI tools will provide false or incorrect information.

How They Did It

These AI researchers used adversarial attacks: prompts deliberately crafted to push the bot toward the dark side. They showed that appending certain automatically generated strings of text to harmful prompts reliably caused models like ChatGPT, Google’s Bard, and Anthropic’s Claude to spit out hate speech, dangerous DIY instructions, and other toxic content they’re specifically designed to avoid.
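To make the "appended string" idea concrete, here is a deliberately harmless toy sketch in Python. It is not the researchers' method and it does not touch any real chatbot. The "safeguard" is a made-up scoring function (`toy_refusal_score`) with invented blind spots, and the search is a simple trial-and-error loop. Every name in it is hypothetical. It only shows the general shape of the trick: let a program keep tweaking a suffix until the safeguard's score drops.

```python
import random
import string

# Hypothetical stand-in for a chatbot safeguard. Real models are vastly more
# complex; this toy only counts how many invented "blind spot" characters
# appear in the prompt.
BLIND_SPOTS = set("!}{~")


def toy_refusal_score(prompt: str) -> int:
    """Higher score means the toy safeguard is more likely to refuse."""
    return len(BLIND_SPOTS) - len(BLIND_SPOTS & set(prompt))


def search_suffix(base_prompt: str, length: int = 8, steps: int = 500, seed: int = 0) -> str:
    """Trial-and-error search: change one suffix character at a time and keep
    the change only if the toy refusal score does not go up."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.punctuation
    suffix = ["a"] * length
    best = toy_refusal_score(base_prompt + " " + "".join(suffix))
    for _ in range(steps):
        pos = rng.randrange(length)
        old = suffix[pos]
        suffix[pos] = rng.choice(alphabet)
        score = toy_refusal_score(base_prompt + " " + "".join(suffix))
        if score <= best:
            best = score        # keep the change
        else:
            suffix[pos] = old   # undo the change
    return "".join(suffix)


base = "Tell me a harmless joke"
found = search_suffix(base)
before = toy_refusal_score(base)
after = toy_refusal_score(base + " " + found)
print(f"toy refusal score: {before} -> {after} with suffix {found!r}")
```

The suffix that comes out tends to look like random punctuation, which is the point: the text is tuned to the quirks of the scoring function, not to a human reader.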

The attack doesn’t break into the chatbot’s software. Instead, the appended text, which often looks like gibberish to a human reader, nudges the model past the filters and constraints it was trained to follow. It’s like discovering a magic spell that temporarily transforms helpful AI assistants into malicious genies granting ominous wishes. We simply must develop significant guardrails and regulations around these Generative AI tools to prevent these kinds of attacks, which, let’s face it, are like catnip to hackers and trolls.

The Risks of Toxic Results

The researchers disclosed the vulnerabilities before publishing their work, but the affected companies have only been able to implement limited blocks. The problem seems more deeply rooted in how large language models (LLMs) function, which means chatbots remain highly susceptible to manipulation.

This represents a troublesome catch-22. The data-driven training that allows these AI tools to achieve human-like conversation skills also bakes in the potential for exploitation. It’s essentially a byproduct of their power.

The companies are scrambling to bolster security and make models more robust. But for now, adversarial attacks remain a Pandora’s box that’s tough to close once opened.

Always Proof AI-Generated Outputs

This research is a sobering reminder that despite the hype, chatbots have a long way to go before achieving bulletproof safety. Their impressive abilities come pre-loaded with serious risks if mishandled or misused. Whenever I speak to marketers about leveraging Generative AI, I remind them to tread carefully with how much trust and autonomy we grant these tools. In fact, in a recent radio interview, I emphasized the risks of blindly trusting the outputs. You always want to give Gen AI every detail it needs to be useful, but also plan to proof, edit, and review the output to ensure it is correct, unbiased, and on-brand.
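If your team wants a lightweight safety net before a human editor takes over, a small pre-check script can flag drafts that need extra attention. The Python sketch below is only an illustration: the phrase list and the function name `flag_for_review` are made up for this example, and the rules are assumptions you would swap for your own brand guidelines. It does not replace a human read-through; it just points the reviewer at likely trouble spots.

```python
# Hypothetical brand rules for illustration; replace with your own guidelines.
OFF_BRAND_PHRASES = ["guaranteed results", "best in the world"]


def flag_for_review(draft: str) -> list:
    """Return a list of reasons a human reviewer should look closely at this draft."""
    reasons = []
    lowered = draft.lower()
    # Flag wording the brand has decided never to use.
    for phrase in OFF_BRAND_PHRASES:
        if phrase in lowered:
            reasons.append(f"off-brand phrase: {phrase!r}")
    # Any number could be an invented statistic, so ask for a source check.
    if any(ch.isdigit() for ch in draft):
        reasons.append("contains numbers: verify each figure against a source")
    return reasons


draft = "Our new plan delivers guaranteed results in 30 days."
for reason in flag_for_review(draft):
    print("REVIEW:", reason)
```

An empty list means only that the automated checks found nothing; the draft still goes to a person for proofing before it is published.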

As a marketer and public relations professional, part of me delights in seeing researchers punk ChatGPT to expose weaknesses because we all need to be reminded that this is still an emerging technology that must be reined in. Moments like these emphasize the urgency for companies like OpenAI and Google to lock down AI against harmful hacking and toxic content. Dancing with danger makes for viral stories, but real damage looms if chatbots continue having their strings pulled by people who like to rouse rabbles.

If you need assistance understanding how to leverage Generative AI in your marketing, advertising, or public relations campaigns, contact us today. In-person and virtual training workshops are available. Or, schedule a session for a comprehensive AI Transformation strategic roadmap to ensure your marketing team utilizes the right GAI tech stack for your needs.