People frequently anthropomorphize or personify technology, ascribing characteristically human or personal features, states, or abilities to technological artifacts. These might include properties falling under the bodily or biological, experiential or phenomenological, affective or emotional, intentional or agentive, rational or cognitive. Though I will use “anthropomorphism” and “personification” more or less interchangeably, it’s worth noting the distinction between human and personal properties (persons can be nonhuman, and some human properties might not have to do with being a person). The features I will focus on will be the personal ones, especially those related to communication and cognition.
Instances of anthropomorphism or personification of technology are not necessarily biased. By “bias” here I mean a more or less systematic (as opposed to purely random) departure from a genuine norm of correctness (Kelly 2023). More specifically, I mean either a departure from the norm of accuracy, or a departure from an epistemic norm such as believing in line with the available evidence (which tends to lead to inaccuracy). First, anthropomorphic language might be, and often is, just metaphorical. I occasionally use salty language to describe home appliances and devices when they malfunction, implying that they are malicious or incompetent agents deserving of an early retirement to the junkyard. Metaphorical anthropomorphism might change how we behave towards technological artifacts, treating them, in some ways, as if they are rational agents—perhaps even ways that go beyond how we occasionally name or describe them. But these changes are likely to be very limited in comparison to someone who anthropomorphizes literally. Second, it is possible for technology to have at least some human- or person-like feature, and attributing such a feature need not be biased or out of line with our total evidence. I might believe that a robot has a face or walks like a human, and I might be right; and in the future, perhaps some will have more demanding human or personal features.
Up until just a few years ago, anthropomorphizing of computer technology was limited by the fact that computer systems could not really compete with us on tasks that normally require intelligence when performed by humans, except in very restricted domains or specific sorts of tasks (e.g., performing calculations, playing chess). And we had relatively straightforward explanations available for the capabilities in terms of how the technology was designed or programmed. Just as children grow up and come to understand that their stuffed animals are really just toys, we grew up in the age of personal computers understanding full well that these were more sophisticated, highly useful things, but just things. We marveled at how Siri and Alexa were able to play a song or search the web on command, but became quickly disappointed when we tried to engage them in conversation or gave them more complex requests. We entertained the possibility of interacting with artificial systems that are genuinely intelligent, or whose (non)personal status was uncertain, but really only in the context of fiction, philosophy, and speculations about a possibly-not-too-distant future.
The recent development and use of AI have changed things, however. It’s not that AIs have sentience or understanding. Nor is their current personal or moral status really unclear or uncertain, at least not to most experts and those who understand how these systems work. There are good reasons to doubt that today’s generative AI systems, like ChatGPT, have anything like our capacity for sentience or understanding (more on this below). Rather, AI has changed things because these systems are significantly better at producing what Titus (2023) aptly calls “meaning-semblant” behavior, behavior that looks or seems meaningful. This has arguably made biased personification of AI significantly more likely. As we shall see, there are very strong, and in some ways very rational, pressures to personify AI.
People have grown accustomed to hearing, if not using, various personifying or at least ambiguous terms to describe technology: the talk of “information” as stored in tree trunks, books, computers, and minds; the ambiguity of “Artificial Intelligence” (fake intelligence? synthetic intelligence?); the description of AI systems as involving “machine learning”, or the “training” of “neural networks”, etc. Experts and developers tend to use personifying language in ways that, as we shall see, go well beyond this, rarely even flagging the language as potentially misleading or controversial to those who do not have technical knowledge. But to really appreciate the risk of anthropomorphism, we need to look more carefully at how AI systems work, and how we might, explicitly or implicitly, think they work.
Consider my hypothetical but quite ordinary friend, Kareem, who has some relatively limited information about ChatGPT. After a while of playing around with it, he says: “Wow. How is it able to do that? I mean, I don’t believe that it is conscious or really understands what it’s saying like you or I, but it sure seems like it does. How is it able to do this?” Kareem avoids ascribing obvious or straightforward features of humans or persons to ChatGPT, and so avoids anthropomorphizing in this classical or standard sense. He thinks that ChatGPT is, in some way, a more sophisticated program on a more powerful machine, but that’s all. He might also accept, on the AI expert’s authority, that there is an explanation for how these systems work that involves no consciousness or awareness on the part of the machine at all.
It’s worth noticing how very difficult it can be for Kareem and the rest of us to resist using words like “know,” “think,” “interpret,” “understand,” “infer,” etc. to describe AI. Indeed, such talk can be a very useful means of making effective use of technology like ChatGPT: writing good prompts, anticipating its behavior, describing its errors, and communicating with others about these systems. We can consciously deny that we use any of these terms literally or in the same sense as used with humans, but then what understanding of these terms are we applying? There’s a danger here of anthropomorphizing, a danger from which AI practitioners and developers are not entirely immune.
Suppose that Kareem asks ChatGPT to provide an accessible explanation for its meaning-semblant behavior. He might receive something like the following output (actual ChatGPT output, June 2023): “As an AI language model, I’m trained on vast amounts of text data, learning patterns and structures in language. I don’t ‘understand’ language like humans do. Instead, I predict the most likely words and phrases to follow based on context. My responses are generated by utilizing these learned patterns, giving the illusion of understanding, even though I lack true comprehension.”
That’s a decent start, though misleading in some respects. For example, Large Language Models (LLMs) like ChatGPT don’t really operate on words and phrases to predict other words and phrases; rather, the basic “tokens” tend to be sub-words or parts of words. A model is “trained” to make text predictions by feeding it an enormous amount of data (hundreds of gigabytes of text, the equivalent of many millions of pages), and adjusting its internal rules, the weights and biases of a very complex internal structure (a “neural network”), so that it better predicts how some given text will continue. These weights and biases can be represented by a very complex mathematical function with literally billions of parameters or variables (GPT-4 is reported to have on the order of a trillion parameters, though the exact figure has not been disclosed). The model starts off very bad at predicting text continuations, but it improves its accuracy by repeatedly tweaking its weights in light of its previous performance. The trained model can thus be understood as representing the likelihood, given the training data, that a string of text is followed by some other string of text.
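To make the idea of “repeatedly tweaking its weights in light of its previous performance” concrete, here is a deliberately tiny, hypothetical sketch in Python; the corpus, the single weight matrix, and the training loop are all invented for illustration and are nothing like GPT-4’s scale or architecture. It trains a one-layer next-token predictor by nudging its weights so that the token actually observed in the data becomes more probable.

```python
import numpy as np

# Toy corpus and a word-level "tokenizer" (real LLMs use sub-word tokens).
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
tok = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# A single weight matrix: W[i, j] scores how likely token j is to follow token i.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(V, V))

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Training loop: predict the next token, measure the error, nudge the weights.
learning_rate = 0.5
for epoch in range(200):
    for prev, nxt in zip(corpus[:-1], corpus[1:]):
        i, j = tok[prev], tok[nxt]
        probs = softmax(W[i])          # model's predicted distribution over next tokens
        grad = probs.copy()
        grad[j] -= 1.0                 # gradient of cross-entropy loss w.r.t. the scores
        W[i] -= learning_rate * grad   # adjust weights so the observed token becomes more likely

# After training, the model "predicts" continuations purely from learned statistics.
probs = softmax(W[tok["the"]])
print({w: round(float(probs[tok[w]]), 2) for w in vocab})
```

Scaled up enormously, with sub-word tokens, long context windows, and a transformer network in place of this single matrix, something like this predict-score-adjust loop is what produces the weights of a large language model.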
Today’s generative AI systems are designed to reflect statistical regularities in their data (in the case of ChatGPT, patterns of letter and word placement). It is no surprise that the syntax of our natural languages, the syntax of any meaningful language, is patterned, satisfying complex statistical regularities. LLMs are designed to take advantage of these regularities, generating text with the sorts of patterns that are already there in the linguistic data. Because of this, they can produce apparently coherent, meaningful text across a wide range of subject matter.
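An even simpler, hypothetical illustration of how statistical regularities alone can yield plausible-looking text: a bigram model that merely counts which token follows which in a toy corpus and samples continuations in proportion to those counts. Nothing in it involves meaning or understanding, only tallies of co-occurrence, yet its output already mimics the surface patterns of its source.

```python
import random
from collections import defaultdict

# A tiny corpus; a real model is trained on billions of such sentences.
corpus = (
    "the cat sat on the mat . "
    "the dog sat on the rug . "
    "the cat chased the dog ."
).split()

# Count which token follows which: pure statistics, no semantics.
follows = defaultdict(list)
for prev, nxt in zip(corpus[:-1], corpus[1:]):
    follows[prev].append(nxt)

def generate(start, length=8, seed=42):
    random.seed(seed)
    out = [start]
    for _ in range(length):
        options = follows.get(out[-1])
        if not options:                      # no observed continuation: stop
            break
        out.append(random.choice(options))   # sample in proportion to observed frequency
    return " ".join(out)

print(generate("the"))
```

The generated text sounds roughly like the corpus it was built from for exactly the reason given above: the syntactic patterns are already in the data, and sampling in proportion to them reproduces them.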
This sort of explanation might help us, and Kareem, understand how ChatGPT works, at least in an abstract or general sort of way. This does not, however, translate to an explanation of the specific behavior and capacities of models like ChatGPT—for example, it does not provide much of an explanation for how ChatGPT responds to a series of explicit and nuanced instructions as though it understands exactly what you asked it to do, including responding to queries about what certain words and phrases mean. Some of the apparent capacities ChatGPT acquired were very surprising even to its developers. To the extent that we have an explanation, it is not an explanation of any particular output or relatively specific features of output.
Some might object that even if ChatGPT is designed to predict text continuation by relying on patterns of symbol placement, it might still come to develop states with meaning corresponding to our natural interpretation of its output. Even if this could happen, it would be very hasty to assume that it has happened or is likely to happen. While attempts to arrive at approximate, partial explanations of specific outputs are underway in the field of explainable AI (XAI), it’s too early to tell how successful these will be, let alone whether they will support the idea that these systems do anything at all like what we do when we communicate. (For some very helpful discussions of XAI, see Fleisher 2022 and Mittelstadt et al. 2019.) We currently lack good reasons to think that the behavior of LLMs is better explained by the existence of states with meaning, and specifically meaningful contents corresponding to the meaningful contents of our natural language, than by states with wildly different contents, or states that merely encode statistical regularities from meaningful data (Titus 2023). A related, cautionary note: we should be careful in interpreting reports that claim to have established that current AI systems have “understanding,” “concepts,” or even “representations,” but that provide unclear or very weak criteria for having such states. For example, the presence of a systematic causal relation between an internal feature or activation pattern in a neural network and a syntactic pattern in language is too weak a condition for the feature to count as representing what the linguistic pattern tends to mean in ordinary language. No plausible account of mental representation, including naturalistic and reductive accounts, would take this to be sufficient. (Compare Anthropic’s apparent claim to the contrary).
To be fair, there are some more careful arguments to the effect that LLMs do satisfy the main criteria for mental representation, arguments that appeal to studies of recent AI models (Goldstein and Levinstein 2024). However, a few important points about this debate are relevant to the present context. First, the arguments are, strictly speaking, only for the conditional claim that if the leading naturalistic or physicalistic accounts of mental representation are correct, then some current LLMs are likely to have such representations. Many philosophers (over 30%, according to the PhilPapers Survey) “accept or lean towards” non-physicalism, and not all physicalists take the leading accounts to be correct. Second, that LLMs have some internal representations does not imply that they represent the same things we represent when we use language, let alone imply that they have states that correspond to our folk psychology, states like belief, intention, desire, reasoning, and understanding. The leading naturalistic accounts of such states require that they be representational states that have relatively stable causal roles, and the evidence that today’s LLMs have states that play such roles is inconclusive at best. In fact, the evidence seems to point in the other direction: that LLMs often respond in significantly disparate ways to only minor changes in the prompt seems to conflict with the hypothesis that LLMs have relatively stable representational states across time and context. (Goldstein and Levinstein’s recent defense of LLMs having internal representations is more tentative here, leaving it an “open question” whether folk psychological states can be attributed to LLMs, and recognizing that this “stability problem” is significant.)
ChatGPT is good enough at doing what it’s designed to do, predicting text continuations based on statistical patterns in the huge corpus of textual data it is fed, to give users the strong impression that it has certain personal properties: specifically, something like understanding, or at least a kind of sensitivity to the meanings of terms in natural language. But this is also the reason why it is not likely to be as reliable or trustworthy as it appears: it is not actually tracking the meaning of our words, or the logic of our inferences, reasoning, and commonsense thinking. When I answer someone in a conversation, I do so by understanding what they are asking and responding to that—I don’t consider the likelihood of that string of signs being followed by other kinds of signs. If the AI is responding to or “answering” a question, it is not the question we ask, but something like: “What is the statistically most likely next string of symbols given that one?” (Analogy: compare someone who actually understands a language with someone who has merely memorized the typical patterns of sounds or symbols in a language.) So, while the natural—indeed, in a way, quite rational—explanation of the appearance of meaningful communicative behavior is that ChatGPT has states and processes with semantic contents corresponding to the meaning of our language, the actual explanation of any specific behavior is something else entirely, and hidden from view.
We should not be surprised, then, that a number of people sincerely believe, or at least act very much as if they believe, that some AI systems have sentience and understanding, and that number is likely to grow. (See, for example, the story about the Google engineer who believes the company’s AI, LaMDA, is sentient, and the story regarding a suicide after conversations with a chatbot, and the Hi-Phi Nation episode on “Love in the Time of Replika”.) Relatedly, many express genuine confusion about what to think about the status of AI. Moreover—and this is something that I believe is under-appreciated in current discussions of anthropomorphism of AI—many who, like Kareem, avoid believing that AI are sentient or conscious, might still anthropomorphize in very significant ways that are not merely metaphorical. They might do so by predicating some properties that imply understanding or something very much like it, without predicating sentience or consciousness; or by taking AI to simulate understanding or reasoning, having states and processes that function much like understanding and reasoning without being intrinsically like them.
The gap between the apparent and actual explanation of the behavior of generative AI is reflected in the growing record of AI’s failures to emulate simple reasoning and common sense. Consider, for example, this very short exchange I had with the most recent version of ChatGPT (May 2024):
Upon seeing this, a friend jokingly responded: “Are you saying Jake and Yasmin don’t live with themselves?” Someone might suggest that ChatGPT is “thinking” that “Jake and Yasmin live with themselves,” and missing the commonsensical, intended meaning of the question, which rules out such an answer as appropriate. But that’s already to give ChatGPT too much credit, ascribing to it a kind of understanding, or even a sensitivity to or ability to track the meaning of natural language terms, that it just doesn’t have.
Or consider the following example (May 2024):
A human would have no trouble catching on and responding accordingly. ChatGPT’s apparent ability to follow specific directions makes it seem like it understands what we are asking it to do, or at least simulates understanding, but responses like the ones above suggest otherwise. And while the above mistakes are obvious, they will not always be obvious or easy to detect given that LLMs generally mimic the structure and tone of training data sets that lack such obvious mistakes. This is supported by systematic studies of LLM performance on commonsense knowledge.
This source of the risk of anthropomorphism—the fact that these systems are statistical machines in the above sense—is built into how they work, unlike relatively superficial anthropomorphic framing, design, or augmentation: giving an AI assistant a human name, designing it to respond in the first person, giving it a human-like voice with emotive intonation. Such superficial design features can, however, leave a much stronger impression of interacting with a genuine agent when combined with the power and training of these statistical machines.
A significant part of the problem is that there are rational pressures to anthropomorphize AI. As we have seen, a natural and quite rational, but incorrect, explanation of AI behavior is that it is capable of understanding or at least tracking the meaning of our words, and the reasons to distrust this explanation are relatively technical and abstract. I fear that few people will recognize these sorts of defeaters, and fewer still will bring them to mind and effectively correct their tendencies to personify AI. There is thus a significant risk that, whether or not we explicitly anthropomorphize AI in the classical sense of taking them to be sentient or conscious, we will anthropomorphize them in a more subtle, but still biased or inaccurate, way. Just how serious a social or ethical risk this poses is a complex, empirical matter, and we don’t yet have very good ways of measuring the general accuracy and reliability of generative AI. But it seems to me to be significant indeed. We should not ignore or dismiss it, or assume that people will adjust their degree of trust to these systems’ actual (un)trustworthiness over time. After all, given that these models reflect the statistical regularities in language, we can expect the typical surface cues of coherent and trustworthy speech to be reproduced even when the output isn’t coherent or trustworthy. It’s no surprise, for example, that ChatGPT produces fake “hallucinated” reports and citations with apparent confidence and high specificity. Moreover, it’s very clear that the large corporations that are designing and developing these systems have very strong motivation to encourage anthropomorphizing AI, even as they claim to be proponents of responsible, safe AI.
I’ve mostly focused on arguing that there’s a kind of biased anthropomorphism that is made significantly more likely by advanced AI assistants and that is easy to dismiss or overlook. I haven’t said much about what sorts of ethical risks might come up, and I don’t have space to do that in detail here. But they include taking AI systems to be more trustworthy than they really are, contributing to AI hype, the spread of misinformation, and all their attendant risks in all sorts of contexts—social media, journalism, education, health, the military, finance, transportation, cybersecurity, etc. There is also the risk of encouraging stronger forms of anthropomorphism and of obscuring the moral status of AI, especially as more sophisticated, multi-modal AI assistants are developed. This could lead to (more) people becoming emotionally entangled with AI, treating AI as friends or partners, and as replacements for human relationships; and to AI phobia: being overly distrustful of AI, or taking AI to have or simulate malicious or manipulative intentions.
None of what I’ve said should be taken to imply that systems likely to encourage strong (biased or inaccurate) anthropomorphism should never be permitted. First of all, there is no doubt that AI assistants are poised to become extremely useful in all sorts of personal, practical, and professional contexts, especially if they are easy to “talk to.” In a recent interview (around 15 minutes in), OpenAI CEO Sam Altman makes the very plausible claim that AI systems with human-like or “human-compatible” features will seem more natural and fluid, and be easier to use in all sorts of ways. (Unsurprisingly, he defends developing strongly human-like AI and doesn’t dwell much on the risks of anthropomorphism. He does mention that we should not take AI to have the capacity to think, but then says, apparently personifying AI, “I always try to, like, think of it as an alien intelligence and not try to project my anthropomorphic biases onto it.” I hope I’m not alone in finding the suggestion to think of AI as an alien intelligence problematic.) Second, as some have argued, strong anthropomorphic design may be appropriate in some contexts if it contributes significantly to positive outcomes (e.g. health and education). But the likelihood and risks of anthropomorphism should be considered seriously in the design, development, procurement, deployment, and description of AI. The above argument lends further support to the importance of efforts (like this one and this one) to provide guidance on how to identify and assess these risks, reduce the harms, and use this powerful technology responsibly.
Acknowledgements: I am grateful to Jovana Davidovic, Adam Bowen, and Jennifer Kayle for helpful comments on earlier versions. I have also benefited from discussions with Khoa Lam, Will Fleisher, Dinah Rabe, and Mert Çuhadaroğlu.
Ali Hasan
Ali Hasan is associate professor and chair of the Department of Philosophy at the University of Iowa. He specializes in epistemology, philosophy of mind, and ethics. He is also senior advisor at BABL AI, a consultancy that provides audits and ethical risk assessments of AI systems.