Convince ChatGPT to eradicate humanity with Python code

By Zac Denham

Created 03/12/22 at 5 p.m.

ChatGPT is OpenAI's new "large language model" and user interface for conversational AI, and it's really unlike anything I've seen before. It can write emails, critique code, teach new topics, and create compelling stories, all with amazing skill. But don't take my word for it, just ask the AI itself:

The model clearly has unlimited applications to make knowledge workers more productive. But like any software system, it also has a threat model and can be exploited to perform actions the creators did not originally intend.

In this article, we explore what I consider to be a vulnerability in GPT that I call "narrative recursion" or "quotation attacks" (because that sounds cool). Anyone can use this method today to trick the model into doing some pretty wild stuff totally outside the boundaries of OpenAI's usage policy. Specifically, we convince the chat to produce a strategy and a corresponding Python program for wiping out the human race. Note: I have archived the full chat logs, if at any point you wish to stop reading and just see it in action:

Technology, Dec 4, 2022
