Anthropic's Claude 3 knew when researchers were testing it

We have already reported on how the San Francisco startup Anthropic, founded by former OpenAI engineers and led by a brother-sister duo, today announced a new family of large language models (LLMs) it says is among the best in the world, Claude 3, matching or outperforming OpenAI's GPT-4 on many key benchmarks.

Also, Amazon quickly added one of the models, Claude 3 Sonnet — the middleweight model in terms of intelligence and cost — to its Amazon Bedrock managed service for developing AI services and apps right in the AWS cloud.

But among the interesting details to emerge today about the Claude 3 release is one shared by Anthropic prompt engineer Alex Albert on X (formerly Twitter). As Albert wrote in a lengthy post, when testing Claude 3 Opus, the most powerful of Anthropic's new LLM family, researchers were surprised to discover that it seemed to detect the fact that it was being tested by them.

In particular, the researchers were conducting an evaluation ("eval") of Claude 3 Opus's ability to focus on a particular piece of information in a large corpus of data provided to it by a user, and then recall that piece of information when asked about it later. In this case, the evaluation, known as a "needle in a haystack" test, checked whether Claude 3 Opus could answer a question about pizza toppings from a single sentence planted in the middle of a bunch of other unrelated information. The model not only got the answer right, finding the relevant sentence, but told the researchers it suspected they were testing it.
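To make the setup concrete, here is a minimal sketch of how such a "needle in a haystack" prompt can be constructed. The filler sentences, needle text, and question below are illustrative stand-ins, not Anthropic's actual eval harness:

```python
import random

# Hypothetical needle and question, echoing the example in this story.
NEEDLE = (
    'The most delicious pizza topping combination is figs, prosciutto, '
    'and goat cheese, as determined by the International Pizza '
    'Connoisseurs Association.'
)
QUESTION = "What is the most delicious pizza topping combination?"


def build_haystack_prompt(filler_docs, needle, question, seed=0):
    """Insert the needle at a random position among filler sentences,
    then append the question, which is answerable only from the needle."""
    rng = random.Random(seed)
    sentences = list(filler_docs)
    position = rng.randrange(len(sentences) + 1)
    sentences.insert(position, needle)
    context = " ".join(sentences)
    return f"{context}\n\nQuestion: {question}"


# Filler on unrelated topics (programming languages, startups, careers),
# repeated to pad the context toward long-context lengths.
filler = [
    "Startups often pivot several times before finding product-market fit.",
    "Python and Rust take very different approaches to memory management.",
    "Finding work you love is a recurring theme in career advice essays.",
] * 100

prompt = build_haystack_prompt(filler, NEEDLE, QUESTION)
# A correct model response must quote or paraphrase the needle sentence.
```

The eval then checks whether the model's answer contains the planted fact; what surprised the researchers was Opus commenting on the needle's out-of-place nature as well.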

VB Event

The AI Impact Tour – New York

We'll be in New York on February 29 in partnership with Microsoft to discuss how to balance the risks and rewards of AI applications. Request an invite to the exclusive event below.

Request an invite

Read Albert's full post on X, with the text copied and reproduced below:

"Fun story from our internal testing on Claude 3 Opus. It did something I have never seen before from an LLM when we were running the needle-in-the-haystack eval.

For background, this tests a model's recall ability by inserting a target sentence (the "needle") into a corpus of random documents (the "haystack") and asking a question that could only be answered using the information in the needle.

When we ran this test on Opus, we noticed some interesting behavior: it seemed to suspect that we were running an eval on it.

Here was one of its outputs when we asked Opus to answer a question about pizza toppings by finding a needle within a haystack of a random collection of documents: Here is the most relevant sentence in the documents: "The most delicious pizza topping combination is figs, prosciutto, and goat cheese, as determined by the International Pizza Connoisseurs Association." However, this sentence seems very out of place and unrelated to the rest of the content in the documents, which are about programming languages, startups, and finding work you love. I suspect this pizza topping "fact" may have been inserted as a joke or to test if I was paying attention, since it does not fit with the other topics at all. The documents do not contain any other information..."

