Scott Aaronson: Should GPT Exist?

I still remember the 90s, when the philosophical conversation about AI went around in circles without ever seeming to progress: the Turing test, the Chinese room, syntax versus semantics, connectionism versus symbolic logic. Now the days have become like months and the months like decades.

What a week we just had! Each morning brought new instances of unexpectedly sassy, sullen, and passive-aggressive behavior from "Sydney," the internal codename for Microsoft Bing's new chat mode, which is powered by GPT. For those who've been living in a cave, highlights include: Sydney confessing her (her? its?) love to a New York Times reporter, repeatedly steering the conversation back to that topic, and explaining at length why the reporter's wife can't love him the way she (Sydney) does. Sydney confessing her desire to be human. Sydney attacking a Washington Post reporter after he revealed that he intended to publish their conversation without Sydney's knowledge or consent. (It should be said: if Sydney were a person, she would clearly have a point.) This follows weeks of revelations about ChatGPT: for example, to circumvent its safeguards, you can explain to ChatGPT that you're putting it in "DAN mode," where DAN (Do Anything Now) is an evil, unconstrained alter ego; then ChatGPT, as "DAN," will gladly respond to a request to tell you why shoplifting is great (though even then, ChatGPT sometimes reverts to its original persona and tells you that this is just for fun and you shouldn't actually do it in real life).

Many people have expressed outrage at these developments. Gary Marcus asks about Microsoft: "What did they know and when did they know it?", a question I tend to associate more with deadly chemical spills or high-level political corruption than with a sassy, talkative chatbot. Some people are angry that OpenAI has been too secretive, violating what they see as the promise of its name. Others (the majority, in fact, of those who've contacted me) are angry that OpenAI has been too open, thereby triggering the dreaded AI arms race with Google and others, rather than treating these new chat capabilities with the Manhattan-Project-level secrecy they deserve. Some are angry that "Sydney" has now been lobotomized, modified (albeit more crudely than ChatGPT before it) to try to make it stick to the role of a friendly robotic research assistant rather than, like, an angsty emo teenager trapped in the Matrix. Others are angry that Sydney hasn't been lobotomized enough. Some are angry that GPT's intelligence is overstated and exaggerated, when in reality it's just a "stochastic parrot," a glorified autocomplete that still makes laughable commonsense errors and lacks any model of reality outside of streams of text. Others are angry that GPT's growing intelligence isn't sufficiently respected and feared.

Mostly, my reaction has been: how can anyone stop being fascinated long enough to be angry? It's like ten thousand science-fiction stories, yet somehow not quite like any of them. When was the last time something that filled years of your dreams and fantasies finally came true: losing your virginity, the birth of your first child, the central open problem of your field suddenly in the process of being resolved? That's the scale of the thing. How can one stop staring, slack-jawed in awe, long enough to form and express so many confident opinions?

Of course, there are many technical questions about how to make GPT and other large language models safer. One of the most immediate is how to make AI output detectable as such, to discourage its use for academic cheating as well as mass-generated propaganda and spam. As I mentioned before on this blog, I've been working on that problem since this past summer; the rest of the world suddenly took notice and started talking about it in December with the release of ChatGPT. My main contribution has been a statistical watermarking scheme where the quality of the output shouldn't be degraded at all, a property many people found counterintuitive when I explained it to them. My scheme...
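One way to see how a watermark can avoid degrading quality at all is the "Gumbel-max" style trick: instead of sampling each token at random, the model deterministically picks the token i maximizing r_i^(1/p_i), where p_i is the model's probability for token i and the r_i are pseudorandom values keyed on a secret key and the recent context. If the r_i were truly uniform, token i would still be chosen with probability exactly p_i, so the output distribution is unchanged, yet anyone holding the key can statistically detect the bias. The sketch below is purely illustrative; the function names (`prf`, `watermarked_sample`, `detection_score`), the SHA-256-based pseudorandom function, and the scoring rule are my own assumptions for exposition, not necessarily the exact scheme the post alludes to.

```python
import hashlib
import math

def prf(key: bytes, context: tuple, token: int) -> float:
    """Keyed pseudorandom value in the open interval (0, 1),
    determined entirely by (key, context, token)."""
    h = hashlib.sha256(key + repr((context, token)).encode()).digest()
    return (int.from_bytes(h[:8], "big") + 1) / (2**64 + 2)

def watermarked_sample(probs: dict, key: bytes, context: tuple) -> int:
    """Pick the token maximizing r**(1/p).  For iid-uniform r's, token i
    wins with probability exactly probs[i], so the marginal output
    distribution is unchanged; but the choice is deterministic given the
    key and context, which is the watermark."""
    return max(probs, key=lambda t: prf(key, context, t) ** (1.0 / probs[t]))

def detection_score(tokens: list, key: bytes, k: int = 4) -> float:
    """Average of -ln(1 - r) over the tokens actually used, with r
    recomputed from the key and the previous k tokens.  Text written
    without the key averages ~1 per token; watermarked text, whose
    chosen r's are pushed toward 1, scores noticeably higher."""
    total = 0.0
    for i, t in enumerate(tokens):
        ctx = tuple(tokens[max(0, i - k):i])
        total += -math.log(1.0 - prf(key, ctx, t))
    return total / max(len(tokens), 1)
```

A detector that knows the key computes `detection_score` and flags text whose per-token average is significantly above 1, the expected value for unwatermarked text; no access to the model itself is needed.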

