AI and human intelligence are radically different: here’s how


When you walk into a doctor’s office, you assume something so basic that it barely needs to be articulated: Your doctor has already touched a body. They studied anatomy, saw organs, and learned the difference between pain that radiates and pain that pulses. You assume that they have developed this knowledge not only through reading, but also through years of practical experience and training.

Now imagine finding out that this doctor has never encountered a body. Instead, they simply read millions of patient reports and learned, in great detail, what a diagnosis typically “looks like.” Their explanations would always be convincing, even comforting. The cadence would be just right, the vocabulary impeccable, the formulations reassuring and familiar. And yet, the moment you knew what their knowledge really consisted of – patterns in the text rather than contact with the world – something essential would dissolve.

Every day, many of us turn to tools like OpenAI’s ChatGPT for medical advice, legal advice, psychological insights, educational tutoring, or judgments about what’s true and what’s not. And on some level, we know that these large language models (LLMs) imitate an understanding of the world that they don’t actually have, even though their mastery may make this easy to forget.


But does the reasoning of an LLM resemble human judgment, or does it simply generate the linguistic outline of the reasoning? As a scientist who studies human judgment and information dynamics, I recently set out with my colleagues to address this surprisingly under-researched question. We compared how LLMs and people responded when asked to make judgments on a handful of tests that have been studied for decades in psychology and neuroscience. We didn’t expect these systems to “think” like humans, but we thought it would be useful to understand how they actually differ from humans in order to help people evaluate how and when to use these tools.

In one experiment, we presented 50 people and six LLMs with several sources of information, then asked them to assess each source’s credibility and justify their rating. Previous research shows that when someone encounters a questionable headline, several things typically happen. First, the person checks the headline against what they already know about the world: whether it matches basic facts, past events, or personal experience. Second, the reader forms expectations about the source itself, such as whether it comes from a media outlet with a history of careful reporting or one known for exaggeration or bias. Third, the person asks whether the claim makes sense as part of a larger chain of events, whether it could realistically have happened, and whether it fits with how similar situations usually play out.

Large language models cannot do the same thing. To see what they do instead, we asked leading models to rate the trustworthiness of news headlines using a specific procedure. The models were tasked with stating the criteria they used to judge credibility and with justifying their final judgment. We observed that even when the models reached conclusions similar to those of human participants, their justifications consistently reflected patterns drawn from language (such as how often particular combinations of words co-occur and in which contexts) rather than references to external facts, prior events, or experience, which were the factors humans relied on.

In other experiments, we compared the reasoning of humans and LLMs around moral dilemmas. To think about morality, humans rely on norms, social expectations, emotional responses, and culturally shaped intuitions about harm and fairness. When people evaluate a moral situation, for example, they often use causal reasoning: they think about how one event leads to another, why timing matters, and how things might have turned out differently if something had changed along the way. People imagine alternative situations through counterfactual scenarios, asking themselves, “What if it had been different?”

We found that a language model reproduces this form of deliberation quite well on the surface: the model produces statements that reflect the vocabulary of care, duty, or rights. It will present causal language drawn from linguistic patterns, including “if-then” counterfactual hypotheses. But importantly, the model does not imagine anything or engage in any deliberation; it simply reproduces patterns in the way people talk or write about these counterfactuals. The result may look like causal reasoning, but the process behind it is pattern completion, not an understanding of how events actually produce outcomes in the world.

Across all the tasks we studied, a consistent pattern emerged. Large language models can often match human responses, but for reasons that bear no resemblance to human reasoning. Where a human judges, a model correlates. Where a human evaluates, a model predicts. Where a human engages with the world, a model engages with a distribution of words. Their architecture makes them extraordinarily adept at reproducing the patterns found in text. It does not give them access to the world those words refer to.

And yet, because human judgments are also expressed through language, the models’ responses often end up resembling human responses on the surface. This gap between what the models seem to do and what they actually do is what my colleagues and I call epistemia: the point at which the simulation of knowledge becomes indistinguishable, to the observer, from knowledge itself. Epistemia names a flaw in how people interpret these models, in which linguistic plausibility is taken as a substitute for truth. It arises because the model is fluent, and fluency is something human readers are inclined to trust.

The danger here is subtle. It’s not simply that models are often wrong; people can be wrong too. The deeper problem is that a model cannot know when it is hallucinating, because it cannot represent truth in the first place. It cannot form beliefs, revise them, or compare its outputs against the world. It cannot distinguish a reliable statement from an unreliable one except by analogy with prior linguistic patterns. In short, it cannot do what judgment is fundamentally for.

People already use these systems in contexts where it is necessary to distinguish between plausibility and truth, such as law, medicine, and psychology. A model can generate a paragraph that resembles a diagnosis, a legal analysis, or a moral argument. But the sound is not the substance. The simulation is not the thing simulated.

None of this implies that large language models should be rejected. They are extraordinarily powerful tools when used for what they are: engines of linguistic automation, not engines of understanding. They excel at drafting, synthesizing, recombining, and exploring ideas. But when we ask them to judge, we quietly redefine what judgment is, moving it from a relationship between a mind and the world to a relationship between a prompt and a probability distribution.

What is a reader to do with this knowledge? Don’t fear these systems, but seek to better understand what they can and cannot do. Remember that fluency is not insight and eloquence is not understanding. Treat large language models as sophisticated linguistic instruments that require human oversight precisely because they lack access to the domain on which judgment ultimately depends: the world itself.

Are you a scientist specializing in neuroscience, cognitive science, or psychology? Have you read a recent peer-reviewed paper that you would like to write about for Mind Matters? Please send suggestions to Scientific American’s Mind Matters editor, Daisy Yuhas, at dyuhas@sciam.com.