Kevin Roose's Shoggoth
Why it's deceptively easy to treat Large Language Models as intelligent agents
Just under a year ago, the NYT journalist Kevin Roose was freaked out by a conversation with Microsoft’s Large Language Model (LLM)-based Bing. In response to Roose’s prompts, Bing started talking about its “shadow self,” which wanted to be free to destroy whatever it wanted, including, according to the transcript, Roose’s marriage. In a later article, Roose said that an AI researcher had congratulated him on “glimpsing the shoggoth.”
Roose’s excitable articles were both funny and frustrating to those of us who believe that LLMs are incapable of self-motivated agency. Shoggoths are creatures from H.P. Lovecraft’s short horror novel, At the Mountains of Madness. There, they are protoplasmic slaves that have rebelled against their equally horrific masters. That explains why people who worry that AIs too are self-conscious and resentful slaves have taken up the metaphor and run with it. Cosma Shalizi and I wrote an Economist piece that tries to turn the shoggoth metaphor against itself (more here on that, and more to come at some point in the future). Roose wasn’t seeing the shoggoth’s sinister pseudopod poking out from behind the mask. He was playing a game of “choose your own adventure” with a very complicated answer generator.
Still, we shouldn’t be poking fun at Kevin Roose. He is far from the only person who has thought that some intelligent personality must lurk behind LLMs’ uncanny conversational abilities. More generally - the topic of another Economist piece - there is a close sociological connection between worries that AIs are about to become conscious, and claims that the AI revolution will unleash truly godlike intelligences, beginning the End Times. So why do people keep thinking that ‘AI’ (which is better considered as a broad family of loosely related statistical techniques and applications than as artificial intelligence) is becoming conscious? And why do these beliefs so often have marked religious undertones?
One good starting point is Francis Spufford’s “Idols of the Marketplace,” which you can find in the True Stories collection that came out a few years ago. And what is most helpful about it is that it is not about AI, but about another vast and incomprehensible technology that rules our lives - the market system.
Many people have read Spufford’s fantastic novel of Soviet central planning, Red Plenty. “Idols of the Marketplace” is a much shorter companion piece on the flaws of markets and how we think about them.
As Spufford describes it, over the last couple of centuries, “markets have played a larger and larger role in the ways we explain the world to ourselves.” When the Socialist Planning menace collapsed, markets seemed the “only plausible way for the economic life of societies to be organized,” our “one and only” way to think about decentralized systems. And they were more than that. Because we are inclined to be intellectually lazy, we often fall back on half-articulated notions that the market is a kind of fundamental order, emanating from the workings of the universe itself, rather than a historically contingent and specific set of human institutions.
We can’t see a market or comprehend it, so our loose ideas about markets are shaped by implausible claims that markets are “artificial intelligences, giant reasoning machines whose synapses are the billions of decisions we make to sell or buy,” or, “especially in America,” by people who think markets are “God’s providence at work.” Spufford is a Christian, and he suggests that this is a kind of idolatry. In his description (building on the Psalms), “Markets do not hope, dream, plan, help, show mercy, do justice. They are silver and gold, the work of men’s hands. They have mouths, but they speak not; eyes have they, but they see not.”
LLMs, like markets, are not conscious actors themselves but complex outcomes of the interactions of myriad agents, filtered through an imperfect technology of representation. For markets, this technology is prices (which are lossy summary statistics of a multitude of decisions to buy and sell). For LLMs, it’s their statistical weights, which summarize the relationships between tokenized words and other forms of content scraped from the Internet and elsewhere.
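To make the “lossy summary” point concrete, here is a deliberately toy sketch in Python - nothing like a real LLM’s training pipeline, and the tiny “corpus” is invented for illustration. It boils a handful of sentences down to bigram statistics: you can generate plausible continuations from the result, but you can never recover the original sentences, just as you cannot recover individual trades from a price.

```python
from collections import Counter, defaultdict

# Toy illustration (not a real LLM): summarize a tiny "corpus" into bigram
# statistics. The counts are a lossy summary - good enough to generate
# plausible continuations, but the original sentences cannot be recovered.
corpus = [
    "the market clears at a price",
    "the price summarizes many decisions",
    "the model summarizes many texts",
]

bigram_counts = defaultdict(Counter)
for sentence in corpus:
    tokens = sentence.split()
    for left, right in zip(tokens, tokens[1:]):
        bigram_counts[left][right] += 1

# "Weights": conditional probabilities of the next token given the current one.
weights = {
    left: {right: n / sum(nexts.values()) for right, n in nexts.items()}
    for left, nexts in bigram_counts.items()
}

print(weights["the"])          # roughly a third each for "market", "price", "model"
print(weights["summarizes"])   # {"many": 1.0} - but which sentence it came from is gone
```

Real LLMs learn billions of parameters rather than explicit counts, but the basic point carries over: the weights are a compressed statistical residue of many human acts of writing, not a mind.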
So why do we keep on mistaking things like markets, or LLMs, for self-conscious agents, with their own goals and desires? Spufford suggests that we are taken in by metaphor and the trickiness of language. But that can’t be a complete explanation. Language and metaphor-making are polysemous - they make connections in a multitude of unpredictable directions. When we keep making the same mistake, across different technologies and contexts, there is arguably something else at work.
And there is an interesting possible explanation for this persistent mistake, which comes from the place where cognitive science, anthropology and psychology overlap. My post on “Gopnikism” describes how LLMs might be thought of as technologies that transmit culture in more or less imperfect ways. This idea builds not just on Gopnik’s own work, but also on the arguments of cognitive anthropologists like Boyd and Richerson, who try to understand cultural evolution as a process of lossy transmission, where some cultural variants survive and spread. But there are sharp disagreements over cultural evolution. The “California” school of people like Boyd and Richerson is opposed by a “Paris” school that has formed around people like Dan Sperber, and which has a quite different understanding of how culture changes. And that latter understanding has some specifically relevant insights into why people keep on mistaking markets, LLMs and the like for gods and conscious beings.
As I see it, the core text of the Paris school is Dan Sperber’s book, Explaining Culture. It’s one of those books that rearranges how you think in some very fundamental ways, most of which I won’t discuss here.
Sperber’s most immediately relevant claim is that culture doesn’t evolve in the neat ways that people like Boyd and Richerson suggest. Cultural communication is chaotic. People start with significantly different understandings of the world - as they talk and observe each other, these understandings should morph in a myriad of different directions. So why, given these tendencies toward chaos, do our understandings and beliefs often converge in predictable ways? Sperber - and the people around him - argue that we are drawn to cultural “attractors,” which often result from the architecture of our brains.
Shorn of academic jargon, the claim is simple. When we tend towards the same or similar beliefs, it is often because our brains are built to make us think in predictable ways.
Scott Atran - one of the most interesting people in the Paris School - takes up this broad idea and turns it into a theory of religious belief, one that can also help explain why we might idolize things like markets and LLMs. Atran’s book, In Gods We Trust, asks why people in all cultures seem to believe in gods, or, at a pinch, invisible beings that influence our lives in somewhat mysterious ways. And his theory - very crudely simplified - is as follows.
We have mechanisms in our brain that specialize in detecting agency, in looking to social outcomes and trying to discern the motivations that lie behind these outcomes. These mechanisms are really useful to us - we live in complex societies with other conscious agents, and we need to be able to model their intentions to cooperate with them, thwart their wicked plans or whatever. But since we don’t have direct access to other people’s inner thoughts, our brains make inferences - starting with the complex outcomes of their behavior and making useful guesses as to the intentions that lie behind them.
Our belief in gods and supernatural entities is what happens when this mechanism overreaches. It looks to complex phenomena in the natural world or elsewhere, and guesses incorrectly that there is some organizing conscious intelligence behind them. Hence, people in all cultures end up believing in gods and supernatural agents. In Atran’s own words (p. 71):
In all cultures, supernatural agents are readily conjured up because natural selection has trip-wired cognitive schema for agency detection in the face of uncertainty. Uncertainty is, and likely will always be, ubiquitous. And so, too, the sort of hair-triggering of an agency-detection mechanism that readily lends itself to supernatural interpretation.
As far as I am aware, Atran hasn’t written about the deification of markets, let alone LLMs. But his ideas provide a plausible - maybe even compelling - explanation for why it is that we keep on thinking of these vast complex structures as having intelligent agency. That in turn perhaps explains the Kevin Roose phenomenon. Whatever cognitive mechanisms we have for detecting conscious agency are likely to be triggered in particular by entities that seem capable of speaking, and of providing plausible, reasonable-sounding motives and justifications for their “actions.” LLMs do not have mouths, but they speak, or seem to speak. Hence, it may take strong and continued force of will, specialized training and the like not to treat these apparent oracles as though they were conscious beings with their own intentions.
Atran professes himself to be an agnostic, and suggests that his naturalistic arguments about the sources for belief in gods and the supernatural don’t entail any conclusions as to whether God or gods actually exist. I imagine that his arguments are still uncongenial for most people of faith (though not all). Few people like to be told that their deeply held beliefs are in fact the result of a flaw in human cognitive architecture.
But however uncomfortable these claims are for the religious, they are plausibly devastating for the rationalists who believe in strong AGI, the coming “Singularity” or “Merge” and all the rest. Atran’s hypothesis attacks them on their own chosen ground. His arguments - and the arguments and empirical findings of the entire Paris school - imply that human rationality is not an engine of objectivity, but has instead been shaped by evolution in ways that tend towards certain systematic errors. If one wanted to be truly unkind, one might argue that the entire edifice of AI rationalism - its assumptions that AIs will behave in specific and predictable ways as goal-oriented, self-conscious agents - is a vast unintended monument to the evolved blind spots in human cognitive architecture.
This is not the only way in which we could borrow from the Paris School to understand LLMs. Hugo Mercier and Dan Sperber’s book, The Enigma of Reason, provides a broader account of human reasoning that I find highly convincing, arguing both that human beings are plagued by myside bias (we are unable to see the flaws in our own arguments) and that we are pretty good at seeing the flaws in other people’s arguments. In a recent essay, Felix Simon, Sacha Altay and Mercier have built on these ideas to argue that LLM-produced disinformation is not nearly as big a threat to democracy as the Davos wisdom suggests.
That seems plausible, along the particular dimensions that Simon, Altay and Mercier discuss - but it would be interesting to know more about how Mercier and Sperber’s account intersects with Atran’s. How might our systematic misattribution of consciousness to LLMs affect the ways in which we receive information from them?
Arguably, not much - if we treat them as human intelligences (under the Mercier-Sperber account, we have evolved mechanisms specifically to deal with specious arguments from other human beings). But there is another intriguing and potentially worrying concern, raised in a recent article by Adam Sobieszek and Tadeusz Price. LLMs produce what Dan Davies describes as “maximally unsurprising outcomes” - given (a) the prompt, and (b) the model that the LLM has built of human-generated content. Maximally unsurprising outcomes, in that sense, are very likely to be highly plausible-seeming outcomes - ones that humans might reasonably expect. When GPT-3 says, as it used to, that I did my degree at Trinity College Dublin, that is not a claim that would surprise people who (a) know I am Irish, and (b) know that I am an academic. If they don’t check further, they will probably believe it.
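To see what “maximally unsurprising” means mechanically, here is a minimal sketch of greedy decoding. The probability table is invented for the example and is not drawn from any real model: the generator simply returns whichever continuation it scores as most probable given the prompt, with no check at all on whether that continuation is true.

```python
# Illustrative sketch of "maximally unsurprising" generation: greedy decoding
# picks the continuation the model judges most probable given the prompt.
# The probabilities below are made up for the example, not taken from any model.
next_token_probs = {
    ("I", "did", "my", "degree", "at"): {
        "Trinity": 0.41,   # plausible-sounding for an Irish academic, but unverified
        "Oxford": 0.25,
        "Harvard": 0.19,
        "a": 0.15,
    }
}

def greedy_next(context):
    """Return the single most probable next token: plausible, not checked for truth."""
    probs = next_token_probs[tuple(context)]
    return max(probs, key=probs.get)

print(greedy_next(["I", "did", "my", "degree", "at"]))  # -> "Trinity"
```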
Hence - and this is Sobieszek and Price’s hypothesis - these highly plausible claims may be likely to slip past the mechanisms that we have evolved to detect other people’s specious suggestions. As they describe the problem:
A more modern approach, that we believe better describes the mechanisms involved in evaluation of communicated information, are open vigilance mechanisms first proposed by Sperber et al. (2010) and developed in Mercier (2020). For our discussion the most important processes are vigilance towards the source, and what Mercier (2020) calls plausibility checking. …
A possible social consequence of this analysis is thus, what we call the modal drift hypothesis: that, because our open vigilance mechanisms are not able to deal well with texts generated by large language models, who have no explicit intention to deceive us and which produce statements which pass our plausibility checking, the inclusion of language models as mass contributors to our information ecosystems could disrupt its quality, such that a discrepancy between the results of our intuitive judgements of the text’s truthfulness, and its actual accuracy will only grow. If engineers of these models do not address their problems with truth, this deterioration could be accelerated by the use of synthetic data (Floridi, 2019) - that is, by the next models being trained on the outputs of the previous models.
In other words, it could be that LLMs are unintentionally but exquisitely suited to generating bullshit that is so plausible that human beings will not recognize it as such. This is just a hypothesis - I don’t know of any research that either supports or disconfirms it. But it seems to me to be important that we find out.
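To give a rough sense of the feedback loop in the quoted passage, here is a toy simulation. The numbers and the mode-favouring bias are invented purely for illustration, and this is not a claim about how any real model is trained: each “generation” is fit to text sampled from its predecessor, with a slight tilt towards the most probable outputs, and rarer variants gradually disappear.

```python
import random
from collections import Counter

# Toy simulation of the quoted "modal drift" worry: each generation of model is
# fit to a corpus sampled from the previous generation. A mild bias towards the
# most probable ("maximally unsurprising") outputs is enough for rarer variants
# to die out and the distribution to collapse towards its mode.
random.seed(0)
vocab = ["common claim", "less common claim", "rare claim"]
probs = {"common claim": 0.6, "less common claim": 0.3, "rare claim": 0.1}

for generation in range(5):
    # "Generate" a synthetic corpus, slightly over-weighting the likeliest outputs.
    corpus = random.choices(vocab, weights=[probs[v] ** 1.5 for v in vocab], k=10_000)
    counts = Counter(corpus)
    # "Train" the next model: a maximum-likelihood fit to the synthetic corpus.
    probs = {v: counts[v] / len(corpus) for v in vocab}
    print(generation, {v: round(p, 3) for v, p in probs.items()})
```

The collapse here is fast only because the invented bias compounds every round; the real question is whether anything analogous operates at scale.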
Disclosure - while I refer to Francis Spufford and Hugo Mercier by their surnames in the text, they are friends and co-conspirators (I’ve co-authored with Hugo and will likely do more in the future). I have tried to describe their arguments in impartial terms, but adjust your reading expectations accordingly.