Let us show you how GPT works - Using Jane Austen

The heart of an A.I. program like ChatGPT is something called a large language model: an algorithm that mimics the form of written language.

Although the inner workings of these algorithms are notoriously opaque, the basic idea behind them is surprisingly simple. They are trained by working through mountains of internet text, repeatedly guessing the next few letters and then comparing their guesses against the real thing.

To show you what this process looks like, we've trained six little language models from scratch. We've chosen one trained on the complete works of Jane Austen, but you can choose a different path by selecting an option below. (And you can change your mind later.)

Before training: gibberish

Initially, BabyGPT produces text like this:

[Interactive: sample 1 of 10, generated from the prompt "You have to decide for yourself," said Elizabeth]

The largest language models are trained on over a terabyte of Internet text, containing hundreds of billions of words. Their training costs millions of dollars and involves calculations that take weeks or even months on hundreds of specialized computers.

BabyGPT is the size of an ant in comparison. We trained it for about an hour on a laptop on just a few megabytes of text – small enough to attach to an email.

Unlike larger models, which begin their training with a large vocabulary, BabyGPT does not yet know any words. It makes its guesses one letter at a time, which makes it a little easier for us to see what it's learning.
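To make "one letter at a time" concrete, here is a minimal sketch in Python of how a character-level model sees text. The snippet and its sample string are illustrative only, not BabyGPT's actual code.

```python
# Illustrative sketch: a character-level model's "vocabulary" is just the
# distinct characters in its training text, each mapped to an integer id.
text = "It is a truth universally acknowledged"   # stand-in for the real training text

chars = sorted(set(text))                      # every distinct character becomes a token
stoi = {ch: i for i, ch in enumerate(chars)}   # character -> integer id
itos = {i: ch for ch, i in stoi.items()}       # integer id -> character

encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

print(len(chars))                  # a few dozen characters, not a word vocabulary
print(encode("truth"))             # the ids the model predicts, one at a time
print(decode(encode("truth")))     # -> "truth"
```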

Initially, its guesses are completely random and include many special characters: '?kZhc,TK996') would make a great password, but it's far from anything resembling Jane Austen or Shakespeare. BabyGPT has yet to learn which letters are commonly used in English, or which words even exist.
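What completely random guessing looks like is easy to simulate. The character set below is an assumption chosen only to illustrate the gibberish stage, not the model's real alphabet.

```python
# Illustration only: an untrained model's next-character distribution is
# essentially uniform, so sampling from it yields password-like noise.
import random
import string

charset = string.ascii_letters + string.digits + string.punctuation + " "
gibberish = "".join(random.choice(charset) for _ in range(25))
print(gibberish)   # something like ?kZhc,TK996') -- no English structure yet
```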

This is how language models usually start: they guess at random and produce gibberish. But they learn from their mistakes and over time their guesses improve. Over many, many training cycles, language models can learn to write. They learn statistical models that put words together into sentences and paragraphs.

After 250 rounds: English letters

After 250 rounds of training - around 30 seconds of processing on a modern laptop - BabyGPT has learned its ABCs and is starting to babble:

[Interactive: sample 1 of 10, generated from the prompt "You have to decide for yourself," said Elizabeth]

In particular, our model learned which letters are most frequently used in the text. You'll see a lot of the letter "e" because it's the most common letter in English.
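You can check this with a few lines of Python. The file name here ("austen.txt") is a placeholder, not the article's actual data file.

```python
# Count letter frequencies in the training text to see why "e" shows up so often.
from collections import Counter

with open("austen.txt", encoding="utf-8") as f:   # placeholder file name
    text = f.read().lower()

counts = Counter(c for c in text if c.isalpha())
for ch, n in counts.most_common(5):
    print(ch, n)   # "e" is usually at the top for English prose
```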

If you look closely, you will see that it has also learned a few small words: I, to, the, you, etc.

Its vocabulary is still small, but that doesn't stop it from inventing words like alingedimpe, ratlabus and mandiered.

Obviously, these guesses are not great. But - and this is key to how a language model learns - BabyGPT keeps a precise score of exactly how bad its guesses are.

With each round of training, it goes through the original text, a few words at a time, and compares its guesses for the next letter against what actually comes next. It then computes a score, called the "loss," which measures the difference between its predictions and the actual text. A loss of zero would mean that its guesses always matched the next letter correctly. The lower the loss, the closer its guesses are to the text.
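In practice this "loss" is typically the cross-entropy between the model's predicted probabilities and the character that actually follows. Here is a hedged sketch using PyTorch; the numbers are made up, and BabyGPT's real code may differ.

```python
# Sketch of the loss: cross-entropy between predicted next-character
# probabilities and the character that actually comes next.
import torch
import torch.nn.functional as F

vocab_size = 65                        # e.g. the number of distinct characters in the text
logits = torch.randn(1, vocab_size)    # an untrained model's raw scores for the next character
target = torch.tensor([17])            # id of the character that actually follows

loss = F.cross_entropy(logits, target)
print(loss.item())   # about ln(65) ≈ 4.2 for random guesses; it falls as training improves
```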
