Cultural theory was right about the death of the author. It was just a few decades early
How old theories explain the new technology of LLMs
There’s a great anecdote about Roman Jakobson, the structuralist theorist of language, in Bernard Dionysius Geoghegan’s book, Code: From Information Theory to French Theory. For Jakobson, and for other early structuralist and post-structuralist thinkers, language, cybernetic theories of information, and economists’ efforts to understand how the economy worked all went together:
By aligning the refined conceptual systems of interwar Central European thought with the communicationism of midcentury American science, Jakobson envisioned his own particular axis of global fraternity, closely tied to forces of Western capitalist production. (He colorfully illustrated this technoscientific fraternity when he entered a Harvard lecture hall one day to discover that the Russian economist Vassily Leontieff, who had just finished using the room, had left his celebrated account of economic input and output functions on the blackboard. As Jakobson’s students moved to erase the board he declared, “Stop, I will lecture with this scheme.” As he explained, “The problems of output and input in linguistics and economics are exactly the same.”)*
If you were around the academy from the 1980s to the early 1990s, as I just about was, you saw some of the later consequences of this flowering of ambition, in a period when cybernetics had been more or less forgotten. “French Theory” (to use Geoghegan’s language), or Literary Theory, or Cultural Theory, or Critical Theory** enjoyed hegemony across large swathes of the academy. Scholars with recondite and sometimes rebarbative writing styles, such as Jacques Derrida and Michel Foucault, were treated as global celebrities. Revelations about Paul de Man’s sketchy past as a Nazi-fancier were front-page news. Capital-T Theory’s techniques for studying and interpreting text were applied to ever more subjects. Could we treat popular culture as a text? Could we so treat capitalism? What, when it came down to it, wasn’t text in some way?
And then, for complex reasons, the hegemony shriveled rapidly and collapsed. English departments lost much of their cultural sway, and many scholars retreated from their grand ambitions to explain the world. Some attribute this to the Sokal hoax; I imagine the real story was more interesting and complicated, but have never read a good and convincing account of how it all went down.
Leif Weatherby’s new Language Machines: Cultural AI and the End of Remainder Humanism is a staggeringly ambitious effort to revive cultural theory, by highlighting its applicability to a technology that is reshaping our world. Crudely simplifying: if you want to look at the world as text, if you want to talk about the death of the author, then just look at how GPT-4.5 and its cousins work. I once joked that “LLMs are perfect Derrideans: ‘il n’y a pas de hors-texte’ is the most profound rule conditioning their existence.” Weatherby’s book provides evidence that this joke should be taken quite seriously indeed.
As Weatherby suggests, high-era cultural theory was demonstrably right about the death of the author (or at least, about the capacity of semiotic systems to produce written products independent of direct human intentionality). It just came to this conclusion a few decades earlier than it ideally should have. A structuralist understanding of language undercuts not only AI boosters’ claims about intelligent AI agents just around the corner, but also the “remainder humanism” of the critics who so vigorously excoriate them. What we need going forward, Weatherby says, is a revival of the art of rhetoric, one that would combine some version of cultural studies with cybernetics.
Weatherby’s core claims, then, are that to understand generative AI, we need to accept that linguistic creativity can be completely distinct from intelligence, and also that text does not have to refer to the physical world; it is to some considerable extent its own thing. This all flows from Cultural Theory properly understood. Its original goal was, and should have remained, the understanding of language as a system, in something like the way that Jakobson and his colleagues outlined.
Even if cultural theory seems bizarre and incomprehensible to AI engineers, it really shouldn’t. Rather than adapting Leontieff’s diagrams as an alternative illustration of how language works as a system, Weatherby reworks the ideas of Claude Shannon, Warren McCulloch and Walter Pitts to provide a different theory of how language maps onto math and math maps onto language.
This heady combination of claims is liable to annoy nearly everyone who talks and writes about AI right now. But it hangs together. I don’t agree with everything that Weatherby says, but Language Machines is by some distance the most intellectually stimulating and original book on large language models and their kin that I have read.
Two provisos.
First, what I provide below is not a comprehensive review, but a narrower statement of what I personally found useful and provocative. It is not necessarily an accurate statement. Language Machines is in places quite a dense book, which is for the most part intended for people with a different theoretical vocabulary than my own. There are various references in the text to this “famous” author or that “celebrated” claim: I recognized perhaps 40% of them. My familiarity with cultural theory is the shallow grasp of someone who was trained in the traditional social sciences in the 1990s, but who occasionally dreamed of writing for Lingua Franca. So there is stuff I don’t get, and there may be big mistakes in my understanding as a result. Caveat lector.
Second, Weatherby takes a few swings at the work of Alison Gopnik and co-authors, which is foundational to my own understanding of large models (there is a reason Cosma and I call it ‘Gopnikism’). I think the two can co-exist in the space of useful disagreement, and will write a subsequent piece about that, which means that I will withhold some bits of my argument until then.
Weatherby’s argument pulls together cultural theory (specifically, the semiotic ur-theories of Jakobson, Saussure and others) with information theory à la Claude Shannon. This isn’t nearly as unlikely a juxtaposition as it might seem. As Geoghegan’s anecdote suggests, there seemed, several decades ago, to be an exciting convergence between a variety of different approaches to systems, whether they were semiotic systems (language), information systems (cybernetics) or production systems (economics). All seemed to be tackling broadly comparable problems, using loosely similar tools. Cultural theory, in its earlier formulations, built on this notion of language as a semiotic system, a system of signs, in which the meaning of particular signs drew on the other signs that they stood in relation to, and on the system of language as a whole.
Geoghegan is skeptical about the benefits of the relationship between cybernetics and structural and post-structural literary theory. Weatherby, in contrast, suggests that cultural theory took a wrong turn when it moved away from such ideas. In the 1990s, it abdicated the study of language to people like Noam Chomsky, who had a very different approach to structure, and to cognitive psychology more generally. Hence, Weatherby’s suggestion that we “need to return to the broad-spectrum, concrete analysis of language that European structuralism advocated, updating its tools.”
This approach understands language as a system of signs that largely refer to other signs. And that, in turn, provides a way of understanding how large language models work. You can put it much more strongly than that. Large language models are a concrete working example of the basic precepts of structural theory and of its relationship to cybernetics. Rather than some version of Chomsky’s generative grammar, they are based on weighted vectors that statistically summarize the relations between text tokens: which word parts are nearer to or further from each other in the universe of text that they are trained on. Just mapping the statistics of how signs relate to signs is sufficient to build a working model of language, which in turn makes a lot of other things possible.
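As a toy illustration (mine, not Weatherby’s, and far cruder than a transformer): even raw co-occurrence counts over a tiny invented corpus turn the relations between tokens into vectors, and words that play similar distributional roles end up near one another, without any appeal to what the words refer to. Everything named below is made up for the sketch.

```python
import math

# Tiny invented corpus standing in for "the universe of text": each word will
# be represented purely by the company it keeps, never by what it refers to.
corpus = (
    "the king rules the realm . the queen rules the realm . "
    "the dog chases the ball . the cat chases the ball ."
).split()

def cooccurrence_vectors(tokens, window=2):
    """Map each token to counts of the tokens appearing within `window`
    positions of it: a crude stand-in for learned weighted vectors."""
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    vecs = {w: [0] * len(vocab) for w in vocab}
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                vecs[w][index[tokens[j]]] += 1
    return vecs

def cosine(u, v):
    """Cosine similarity: how closely two token-vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

vecs = cooccurrence_vectors(corpus)
print(cosine(vecs["king"], vecs["queen"]))  # higher: same distributional role
print(cosine(vecs["king"], vecs["ball"]))   # lower: different role
```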
LLM, then, should stand for “large literary machine.” LLMs prove a broad platform that literary theory has long held about language, that it is first generative and only second communicative and referential. This is what justifies the question of “form”—not individual forms or genres but the formal aspect of language itself—in these systems. Indeed, this is why literary theory is conjured by the LLM, which seems to isolate, capture, and generate from what has long been called the “literary” aspect of language, the quality that language has before it is turned to some external use.
What LLMs are, then, is a practical working example of how systems of signs can be generative in and of themselves, regardless of their relationship to the ground truth of reality.
Weatherby says that this has consequences for how we think about meaning. He argues that most of our theories of meaning depend on a ‘ladder of reference’ that has touchable empirical ground at the ladder’s base. Under this set of claims, language has meaning because, in some way, it finally refers back to the world. Weatherby suggests that “LLMs should force us to rethink and, ultimately, abandon” this “primacy of reference.”
Weatherby is not making the crude and stupid claim that reality doesn’t exist, but saying something more subtle and interesting. LLMs illustrate how language can operate as a system of meaning without any such grounding. For an LLM, text-tokens only refer to other text-tokens; they have no direct relationship to base reality, any more than the LLM itself does. The meaning of any sequence of words generated by an LLM refers, and can only refer to, other words and the totality of the language system. Yet the extraordinary, uncanny thing about LLMs is that without any material grounding, recognizable language emerges from them. This is all possible because of how language relates to mathematical structure, and mathematical structure relates to language. In Weatherby’s description:
The new AI is constituted as and conditioned by language, but not as a grammar or a set of rules. Taking in vast swaths of real language in use, these algorithms rely on language in extenso: culture, as a machine. Computational language, which is rapidly pervading our digital environment, is just as much language as it is computation. LLMs present perhaps the deepest synthesis of word and number to date, and they require us to train our theoretical gaze on this interface.
Hence, large language models demonstrate the cash value of a proposition that is loosely adjacent to Jakobson’s blackboard comparison. Large language models exploit the imperfect but useful mapping between the structures within the system of language and the weighted vectors that are produced by a transformer: “Underneath the grandiose ambition … lies nothing other than an algorithm and some data, a very large matrix that captures some linguistic structure.” Large language models, then, show that there is practical value to bringing the study of signs and statistical cybernetics together in a single intellectual framework. There has to be, since you can’t even begin to understand their workings without grasping both.
Similarly, large language models suggest that structural theory captures something important about the relationship between language and intelligence. They demonstrate how language can be generative, without any intentionality or intelligence on the part of the machine that produces them. Weatherby suggests that these models capture the “poetics” of language; not simply summarizing the innate structures of language, but allowing new cultural products to be generated. Large language models generate poetry: “language in new forms,” which refers to language itself more than to the world that it sometimes indirectly describes. The value matrix in the model is a kind of “poetic heat-map,” which
stores much more redundancy, effectively choosing the next word based on semantics, intralinguistic context, and task specificity (set by fine-tuning and particularized by the prompt). These internal relations of language—the model’s compression of the vocabulary as valued by the attention heads—instantiate the poetic function, and this enables sequential generation of meaning by means of probability.
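To make “sequential generation of meaning by means of probability” concrete, here is a minimal sketch of probability-driven next-word choice. It is my simplification, not the book’s mechanism: a bigram counter rather than an attention-weighted transformer, with a toy corpus and function names invented for the illustration.

```python
import random
from collections import defaultdict

def train_bigrams(tokens):
    """Count, for each token, which tokens follow it and how often:
    a drastically shrunken stand-in for the probabilities an LLM assigns."""
    table = defaultdict(lambda: defaultdict(int))
    for a, b in zip(tokens, tokens[1:]):
        table[a][b] += 1
    return table

def generate(table, start, length, seed=0):
    """Emit a sequence by repeatedly sampling the next token in proportion
    to how often it followed the current one in the training text."""
    rng = random.Random(seed)
    word, out = start, [start]
    for _ in range(length):
        followers = table[word]
        if not followers:
            break
        choices, weights = zip(*followers.items())
        word = rng.choices(choices, weights=weights)[0]
        out.append(word)
    return " ".join(out)

# Toy corpus: every generated continuation is chosen from sign-to-sign
# statistics alone; nothing here ever consults the world outside the text.
corpus = (
    "the sea was calm . the sea was dark . the sky was dark "
    "and the sea was endless ."
).split()

print(generate(train_bigrams(corpus), start="the", length=12))
```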
Still, poetry is not the same as poems:
A poem “is an intentional arrangement resulting from some action,” something knit together and realized from the background of potential poetry in language: the poem “unites poetry with an intention.” So yes, a language model can indeed (and can only) write poetry, but only a person can write a poem.
That LLMs exist; that they are capable of forming coherent sentences in response to prompts; that they are in some genuine sense creative without intentionality, suggests that there is something importantly right about the arguments of structuralist linguistics. Language demonstrably can exist as a system independent of the humans who employ it, and exist generatively, so that it is capable of forming new combinations.
This cashes out as a theory of large language models that are (a) genuinely culturally generative, and (b) incapable of becoming purposively intelligent, any more than the language systems that they imperfectly model are capable of becoming intelligent. Under this account, the “Eliza effect” - the tendency of humans to mistake machine outputs for the outputs of human intelligence - is not entirely in error. If I understand Weatherby correctly, much of what we commonly attribute to individual cognition is in fact carried out through the systems of signs that structure our social lives. In this vision of the cultural and social world, Herbert Simon explicitly rubs shoulders with Claude Lévi-Strauss.
This means that most fears of AGI risk are based on a basic philosophical confusion about what LLMs are, and what they can and cannot do. Such worries seem:
to rest on an implicit “I’m afraid I can’t do that, Dave.” Malfunction with a sprinkle of malice added to functional omniscience swims in a soup of nonconcepts hiding behind a wall of fictitious numbers.
Languages are systems. They can most certainly have biases, but they do not and cannot have goals. Exactly the same is true for the mathematical models of language that are produced by transformers, and that power interfaces such as ChatGPT. We can blame the English language for a lot of things. But it is never going to become conscious and decide to turn us into paperclips. LLMs don’t have personalities; what they have are compressions of genre that can support a mixture of ‘choose your own adventure’ and role-playing game. It is very important not to mistake the latter for the former.
This understanding doesn’t just count against the proponents of AGI. It undermines the claims of many of their most prominent critics. Weatherby is ferociously impatient with what he calls “remainder humanism,” the claim that human authenticity is being eroded by inhuman systems. We have lived amidst such systems for at least the best part of a century.
In the general outcry we are currently hearing about how LLMs do not “understand” what they generate, we should perhaps pause to note that computers don’t “understand” computation either. But they do it, as Turing proved.
And perhaps for much longer. As I read Weatherby, he is suggesting that there isn’t any fundamental human essence to be eroded, and there cannot reasonably be. The machines whose gears we are trapped in don’t just include capitalism and bureaucracy, but language and culture too. We can’t escape these systems via an understanding of what is human that is negatively defined in contrast to the systems that surround us.
What we can do is to better map and understand these systems, and use new technologies to capture the ideologies that these systems generate, and perhaps, to some limited extent, shape them. On the one hand, large language models can create ideologies that are likely more seamless and more natural-seeming than the ideologies of the past. Sexy murder poetry and basically pleasant bureaucracy emerge from the same process, and may merge into becoming much the same thing. On the other, they can be used to study and understand how these ideologies are generated (see also).
Hence, Weatherby wants to revive the very old idea that a proper education involves the study of “rhetoric,” which can loosely be understood as the proper understanding of the communicative structures that shape society. This would not, I think, be a return to cultural studies in the era of its great flowering, but something more grounded, combining a well-educated critical imagination with a deep understanding of the technologies that turn text into numbers, and numbers into text.
This is an exciting book. Figuring out the heat maps of poetics has visible practical application in ways that AGI speculation does not. One of my favorite parts of the book is Weatherby’s (necessarily somewhat speculative) account of why an LLM gets Adorno’s Dialectic of Enlightenment right, but makes mistakes when summarizing the arguments of a book about Adorno by one of his colleagues, and in so doing reveals the “semantic packages” guiding the machine in ways that are reminiscent of Adorno’s own approach to critical theory:
Dialectic of Enlightenment is a massively influential text—when you type its title phrase into a generative interface, the pattern that lights up in the poetic heat map is extensive, but also concentrated, around accounts of it, debates about it, vehement disagreements, and so on. This has the effect of making the predictive data set dense—and relatively accurate. When I ask about Handelman’s book, the data set will be correspondingly less concentrated. It will overlap heavily with the data set for “dialectic of enlightenment,” because they are so close to each other linguistically, in fact. But when I put in “mathematics,” it alters the pattern that lights up. This is partly because radically fewer words have been written on this overlap of topics. I would venture a guess that “socially constructed” comes up in this context so doggedly because when scholars who work in this area discuss mathematics, they very often assert that it is socially constructed (even though that’s not Handelman’s view). But there is another group that writes about this overlap, namely, the Alt Right. Their anti-Semitic conspiracy theory about “cultural Marxism,” which directly blames Adorno and his group for “making America Communist,” will have a lot to say about the “relativism” that “critical theory” represents, a case in point often being the idea that mathematics is “socially constructed.” We are here witnessing a corner of the “culture war” semantic package. Science, communism, the far right, conspiracy theory, the Frankfurt School, and mathematics—no machine could have collated these into coherent sentences before 2019, it seems to me. This simple example shows how LLMs can be forensic with respect to ideology.
It’s also a book where there is plenty to argue with! To clear some ground, what is genuinely interesting to me, despite Weatherby’s criticisms of Gopnikism, is how much the two have in common. Both have more-or-less-independently converged on a broadly similar notion: that we can think about LLMs as “cultural or social technologies” or “culture machines” with large scale social consequences. Both characterize how LLMs operate in similar ways, as representing the structures of written culture, such as genre and habitus, and making them usable in new ways. There are sharp disagreements too, but they seem to me to be the kinds of disagreements that could turn out to be valuable, as we turn away from fantastical visions of what LLMs might become in some hazy imagined future, to what they actually are today. More on that soon.
* I can’t help wondering whether Leontieff might have returned the favor, had he re-used Jakobson’s blackboard in turn. He had a capacious intellect, and was a good friend of the poet and critic Randall Jarrell; their warm correspondence is recorded in Jarrell’s collected letters.
** Not post-modernism, which was always a vexed term, and more usually a description of the subject to be dissected than the approach to be employed. Read the late Fredric Jameson, to whom I was delighted to be able to send a fan letter, thinly disguised as a discussion of Kim Stanley Robinson’s Icehenge, a year or so before he died (Jameson was a fan of Icehenge and one of Stan’s early mentors).
I think you'd do better to listen to the mathematicians. They admit they work with signs and symbols, but they know that those signs and symbols have meaning in some real world. The classic example is Euclid's geometry. If you ditch the Fifth Postulate, there's no way to tell if the theorems refer to the geometry of the plane, the sphere or the hyperboloid. To assign a more specific meaning, a mathematician can impose an appropriate postulate.
This is why mathematicians say that when an android proves a theorem, nothing happens. Androids work in the domain of signs and symbols. Mathematicians work with mathematical objects. This doesn't mean that automatic theorem proving is worthless, just that, as the sage said, man is the measure of all things.
Borrowing yet another page from the mathematicians and moving this discussion into a new domain, LLMs deal with signs and symbols. Authors deal with real worlds and real people. When an LLM writes a novel, nothing happens. It is left to the reader to assign meaning. When a human writes a novel, there is a real person trying to convey meaning in some real world. A reader may take a message from an LLM generated novel, but the sender had nothing to say.
A lot of this flows from work on the foundations of mathematics early in the 20th century. It turns out that one cannot pin down meaning with just signs and symbols. Since mathematicians were involved, it is dryly stated. It came as a surprise, and I wonder how much of it leaked into literary theory. (That and so many authors trying to excuse themselves for their earlier adulation of Stalin or Hitler.)
P.S. How dry is mathematical language? My favorite quote: "Not for us rose-fingers, the riches of the Homeric language. Mathematical formulae are the children of poverty." And, even then, ambiguity is at its heart.
_Pace_ Weatherby, an LLM based on transformers _is_ an example of a generative grammar in Chomsky's sense. It's just that since it's a (higher-order) Markov chain, it sits at the lowest level of the Chomsky hierarchy, that of "regular languages". (*) We knew --- Chomsky certainly knew! --- that regular languages can approximate higher-order ones, but it's still a bit mind-blowing to see it demonstrated, and to see it demonstrated with only tractable amounts of training data.
That said, Chomsky's whole approach to generative grammar was a bet on it being scientifically productive to study syntax in isolation from meaning! This is part of the argument between him and Lakoff (who wanted to insist on semantics)! (There was a reason "colorless green ideas sleep furiously" was so deliberately nonsensical.) This isn't the same as the structuralist approach, but if you want that kind of alienating detachment from ordinary concerns and from thinking of language as a cozy thing affirming ordinary humanity, Uncle Noam can provide it just as well as Uncle Roman. Gellner has an essay on Chomsky from the late '60s or early '70s which brings this out very well --- I'll see if I can dig it up.
*: Strictly speaking, finite-order Markov chains form only a subset of regular languages, with even less expressive power. To use a go-to example, the "even process", where you have even-length blocks of 1s, separated by blocks of 0s that can be of any length, is regular, because it can be generated using just two (hidden) states, but not finite-order Markov. A transformer cannot generate this language perfectly, though obviously you can get better and better approximations by using longer and longer context windows.
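A quick sketch of the commenter's example (my code, with invented names, not anything from the comment itself): the even process can be generated with just two hidden states, yet the right prediction after a long run of 1s depends on the run's parity, which no fixed-length window, and hence no finite-order Markov model, can recover.

```python
import random

def even_process(n, p_zero=0.5, seed=0):
    """Generate n symbols of the 'even process': blocks of 1s always have even
    length, separated by blocks of 0s of arbitrary length.  Two hidden states
    suffice: A may emit 0 (stay) or 1 (go to B); B must emit 1 and return to A."""
    rng = random.Random(seed)
    out, state = [], "A"
    for _ in range(n):
        if state == "A":
            if rng.random() < p_zero:
                out.append(0)          # stay in A
            else:
                out.append(1)
                state = "B"            # a second 1 is now forced
        else:
            out.append(1)
            state = "A"
    return out

def next_symbol_by_parity(seq, k):
    """At every position whose last k symbols are all 1s, tabulate the next
    symbol, split by the parity of the whole run of 1s ending there.  An
    order-k Markov model sees only the k 1s, so it must give one answer for
    both cases; the process itself does not."""
    stats = {"even": [0, 0], "odd": [0, 0]}   # counts of next symbol being 0 / 1
    run = 0
    for i, s in enumerate(seq[:-1]):
        run = run + 1 if s == 1 else 0
        if run >= k:
            parity = "even" if run % 2 == 0 else "odd"
            stats[parity][seq[i + 1]] += 1
    return stats

print(next_symbol_by_parity(even_process(200_000), k=4))
# After an odd-length run the next symbol is always 1; after an even-length
# run it can be 0 or 1.  That distinction lives in a hidden state, not in
# any bounded stretch of the visible sequence.
```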