29 Comments
Kaleberg

I think you'd do better to listen to the mathematicians. They admit they work with signs and symbols, but they know that those signs and symbols have meaning in some real world. The classic example is Euclid's geometry. If you ditch the Fifth Postulate, there's no way to tell if the theorems refer to the geometry of the plane, the sphere or the hyperboloid. To assign a more specific meaning, a mathematician can impose an appropriate postulate.

This is why mathematicians say that when an android proves a theorem, nothing happens. Androids work in the domain of signs and symbols. Mathematicians work with mathematical objects. This doesn't mean that automatic theorem proving is worthless, just that, as the sage said, man is the measure of all things.

Borrowing yet another page from the mathematicians and moving this discussion into a new domain, LLMs deal with signs and symbols. Authors deal with real worlds and real people. When an LLM writes a novel, nothing happens. It is left to the reader to assign meaning. When a human writes a novel, there is a real person trying to convey meaning in some real world. A reader may take a message from an LLM generated novel, but the sender had nothing to say.

A lot of this flows from work on the foundations of mathematics early in the 20th century. It turns out that one cannot pin down meaning with just signs and symbols. Since mathematicians were involved, it is dryly stated. It came as a surprise, and I wonder how much of it leaked into literary theory. (That and so many authors trying to excuse themselves for their earlier adulation of Stalin or Hitler.)

P.S. How dry is mathematical language? My favorite quote: "Not for us rose-fingers, the riches of the Homeric language. Mathematical formulae are the children of poverty." And, even then, ambiguity is at its heart.

Alex Tolley

Which is why the mathematician Keith Devlin told a SETI audience that mathematics cannot be assumed to be understood by ET. Aliens may not share the human interpretation of symbols, making even the simplest mathematics potentially problematic as a basic language for communication.

The idea that generative language needs humans to do anything with the text leads me to think that someone needs to write a short sci-fi story in which the machines have eradicated humans but can do nothing more than send each other messages in human language, which are meaningless to them. The computers and robots endlessly send and receive these messages, like Searle's Chinese Room thought experiment, unable to understand them and unable to do anything but respond with another message. If an intelligent biological species intercepted the messages, would they think there is an intelligent species on planet Earth, or would they realize that the messages did nothing, like the waves creating noise in the surf?

Kaleberg

Odds are Earth mathematics can be understood by aliens because it is about signs and symbols. Again and again, mathematicians discover that things that seem to be different are actually the same, and the way they do it is using signs and symbols. For example, the proof of Fermat's last theorem was based on modular forms being the same thing as elliptic curves. Alien mathematicians may have an entirely different ontology, but they will find that Earth mathematicians work with the same things under different names.

(There's a noted 105 page proof that was inspired by a late friend appearing as a ghost in a mathematician's dream. It's 50 or so pages showing that one mathematical structure is the same as another. Then there's a page or so of transformation of that other structure. Then 50 more pages proving that the result of that transformation maps back into the original structure and that its properties hold. I mean, who does this?)

As for the empty post-human communications, I can imagine an ironic twist or two. The future researchers discover that the text is meaningless, all just statistically structured verbiage, but that they learn a great deal about the culture from the supportive protocol messages that preserved institutional and technological semantics. Imagine reconstructing an entire race and civilization from its DNS or ICMP packets. It ends with a final touch of sentiment, "They were robots, just like us."

Alex Tolley

Yes, I have read about the connections and mappings between math disciplines. Quanta magazine has an article about one every so often. I'm not a mathematician, so it is more of a "that's interesting to know" for me.

I like your sci-fi extension. It is almost the exact opposite of the Arthur C. Clarke short story about Venusians trying to understand Earth's human culture from a fragment of a film strip (that dates the story!) from a Disney Mickey Mouse cartoon, and of course getting it all wrong as a result.

Kaleberg

I remember that story.

Alex Tolley

History Lesson (1949)

Cosma

_Pace_ Weatherby, an LLM based on transformers _is_ an example of a generative grammar in Chomsky's sense. It's just that since it's a (higher-order) Markov chain, it sits at the lowest level of the Chomsky hierarchy, that of "regular languages". (*) We knew --- Chomsky certainly knew! --- that regular languages can approximate higher-order ones, but it's still a bit mind-blowing to see it demonstrated, and to see it demonstrated with only tractable amounts of training data.

That said, Chomsky's whole approach to generative grammar was a bet on it being scientifically productive to study syntax in isolation from meaning! This is part of the argument between him and Lakoff (who wanted to insist on semantics)! (There was a reason "colorless green ideas sleep furiously" was so deliberately nonsensical.) This isn't the same as the structuralist approach, but if you want that kind of alienating detachment from ordinary concerns and thinking of language as a cozy thing affirming ordinary humanity, Uncle Noam can provide it just as well as Uncle Roman. Gellner has an essay on Chomsky from the late '60s or early '70s which brings this out very well --- I'll see if I can dig it up.

*: Strictly speaking, finite-order Markov chains form only a subset of regular languages, with even less expressive power. To use a go-to example, the "even process", where you have even-length blocks of 1s, separated by blocks of 0s that can be of any length, is regular, because it can be generated using just two (hidden) states, but not finite-order Markov. A transformer cannot generate this language perfectly, though obviously you can get better and better approximations by using longer and longer context windows.
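To make the footnote concrete, here is a minimal Python sketch (a toy of my own, with arbitrary parameter choices, not anything from the book or the comment above) of the two-state machine that generates the even process, plus an empirical check of why no fixed-length window suffices: the parity of a long run of 1s is exactly the information that falls outside any finite context.

```python
import random
from collections import defaultdict

def even_process(n, p_zero=0.5, seed=0):
    """Generate n symbols from the even process using two hidden states."""
    rng = random.Random(seed)
    out, state = [], "A"
    for _ in range(n):
        if state == "A":
            if rng.random() < p_zero:
                out.append(0)        # blocks of 0s can have any length
            else:
                out.append(1)        # open (or extend) a block of 1s...
                state = "B"
        else:
            out.append(1)            # ...state B must emit the matching 1
            state = "A"
    return out

def p_next_is_one_after_k_ones(seq, k):
    """Empirical P(next symbol = 1) given the last k symbols are all 1s,
    split by the parity of the full run of 1s ending there -- information
    a length-k window cannot recover once runs grow longer than k."""
    counts = defaultdict(lambda: [0, 0])   # parity -> [next_is_1, total]
    run = 0
    for i in range(len(seq) - 1):
        run = run + 1 if seq[i] == 1 else 0
        if run >= k:
            counts[run % 2][0] += seq[i + 1]
            counts[run % 2][1] += 1
    return {parity: ones / total for parity, (ones, total) in counts.items()}

seq = even_process(200_000)
print(p_next_is_one_after_k_ones(seq, k=3))
# The window only sees "1 1 1", yet the answer differs by parity: odd-length
# runs are always followed by 1, even-length runs only about half the time.
```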

Leif Weatherby

i'm after the syntax/meaning axis as you lay it out here, Cosma, i agree with what you're saying about the generative grammar point - there's some stuff about that "isolation" in chapter 5, where i hopefully don't butcher Uncle Noam. curious about the Markov thing though, bc if you're right, then it does seem that Chomsky predicted specifically that such a system would *never* achieve fluency...

Cosma

I have a copy of your book and look forward to reading it!

As for the Markov bit: The output distribution for a transformer is a function of the symbol sequence in its context window. It follows that the next symbol emitted is probabilistically independent of earlier ones, before what's in the context window, given the content of the context window. Since the context window is of fixed (if generous) length, this process is a Markov chain of finite order.
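To make that concrete, a minimal sketch (hypothetical names, not any real model's API): any next-token sampler whose output distribution is a fixed function of the last K tokens has the finite-order Markov property, because two histories that agree on those K tokens induce identical distributions.

```python
from typing import Mapping, Sequence

K = 4  # context-window length; real models use thousands of tokens, but the argument is identical

def next_token_distribution(context: Sequence[str]) -> Mapping[str, float]:
    """Stand-in for a transformer forward pass: any fixed function of the
    truncated context will do for the argument."""
    window = tuple(context[-K:])          # everything earlier is discarded here
    vocab = ("the", "cat", "sat", ".")
    scores = [hash((window, w)) % 7 + 1 for w in vocab]   # arbitrary, but fixed within a run
    total = sum(scores)
    return {w: s / total for w, s in zip(vocab, scores)}

# Two histories that agree on their last K tokens yield identical distributions:
# the next symbol is independent of everything before the window, given the window.
h1 = ["a", "long", "irrelevant", "prefix", "the", "cat", "sat", "."]
h2 = ["a", "completely", "different", "start", "the", "cat", "sat", "."]
assert next_token_distribution(h1) == next_token_distribution(h2)
print(next_token_distribution(h1))
```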

Turning to Chomsky exegesis, and opening up "Three Models for the Description of Language" [https://doi.org/10.1109/TIT.1956.1056813], on p. 115 he argues that there are clearly grammatical English sentences which involve either reversing the order of a string of symbols, or indeed repeating a string of symbols exactly, and claims (correctly) that this cannot be done with a finite order Markov model. He goes on (still on p. 115) as follows: "We might avoid this consequence by an arbitrary decree that there is a finite upper limit to sentence length in English. This would serve no useful purpose, however. The point is that there are processes of sentence formation that this elementary model for language is intrinsically incapable of handling. If no finite limit is set for the operation of these processes, we can prove the literal inapplicability of this model. If the processes have a limit, then the construction of a finite-state grammar will not be literally impossible (since a list is a trivial finite-state grammar), but this grammar will be so complex as to be of little use or interest."

I would gloss this as: of "little use or interest" for understanding _human_ language use and its mechanisms.

Leif Weatherby

i think this might wash out in the reading - that's a great quote, and it's similar in Syntactic Structures, of course. but the question is, what do we think "human" language is, versus what this Very Big Markov Chain produces? i don't think the difference has been specified either conceptually *or* quantitatively in any effective way.

separate question: do you think the attention mechanism complicates the smooth ride from regular-size to Very Big Markov Chain? or is it all just stats?

Leif Weatherby

Then we do agree - but I want a theory expressible in language of how the stats and the language get along

Alex Tolley

For apparently such a good and important book, I cannot find a book review of it via Google search. Even the Amazon page doesn't have the usual blurbs from reviewers. Why is that?

Farrell notes that parts of the book are dense, and that he doesn't have the requisite language to be certain he has understood the book correctly. I have no academic training in the humanities, but I can read scientific papers, which are generally written for clarity. [IMO, no one has written more clearly about subjects than the late Isaac Asimov. No weasel words, no ambiguous sentences of the kind you get when the author isn't sure about his subject either.]

Farrell slips in Derrida's "il n'y a pas de hors-texte" without translation, and my French wasn't quite good enough to translate it properly. Thank you, Google Translate, for a better, but still not perfect, translation. When I popped the French phrase into Google search with the query "what is the meaning of Derrida's: 'il n'y a pas de hors-texte'", the retrieved links were from the humanities, with what appeared to me to be dense word salads implying different contextual meanings, and of no help to me. Finally, I made the same request of Gemini 2.5 and got a pretty understandable explanation, as below:

------------------------------------------------------

Prompt: Explain what Derrida meant by this famous phrase: "il n'y pas de hors texte"

Response:

Jacques Derrida's famous phrase, "il n'y a pas de hors texte," is often translated as "there is no outside-text" or "there is nothing outside the text." This statement is one of the most frequently misunderstood and controversial aspects of his philosophy of deconstruction.

To understand what Derrida meant, it's crucial to grasp his expanded concept of "text":

"Text" is not limited to written words on a page. For Derrida, "text" encompasses any system of signs, symbols, or meaningful units that constitute our understanding of reality. This includes not only books and written documents but also speech, events, social practices, institutions, cultures, and even the world itself as we perceive and interpret it. Everything that presents itself to us, and everything we make sense of, does so through a system of differences and relationships, much like how words derive their meaning from their relation to other words within a language.

Critique of a "Transcendental Signified" or external reality: The phrase is a challenge to the traditional philosophical idea that there is a pure, unmediated reality or a "transcendental signified" (a fixed, ultimate meaning or truth) that exists independently of language or interpretation. Derrida argues that we never have direct access to a reality "outside" of our interpretive frameworks, our languages, and our systems of meaning. Every attempt to grasp something "outside" the text inevitably brings it into a textual, interpretive relationship.

Meaning is always contextual and relational: Meaning is not inherent or fixed within a text, but is constantly being produced and deferred through a network of differences and references to other signs. There is no ultimate, stable meaning that can be extracted from a text (in the broad sense) because its meaning is always in flux, influenced by its context, the reader's interpretation, and its relationship to other texts.

Implications for interpretation: This doesn't mean that "anything goes" in interpretation or that reality doesn't exist. Instead, it means that our engagement with reality is always mediated by interpretive acts. When we try to understand an event, a historical moment, a cultural practice, or even a scientific observation, we are always doing so through a "textual" lens – through our concepts, language, cultural assumptions, and the ways we construct narratives about it. There's no neutral, objective viewpoint from which to observe things without the "textual" mediation.

In essence, "il n'y a pas de hors texte" means that our understanding of the world, and indeed reality itself as we know it, is always already "textual" or caught within a web of signs and interpretations. There's no pure, uninterpreted access to an "outside" of this system. It's a statement about the pervasive nature of mediation and interpretation in human experience.

------------------------------------------------------

As an experiment, I would like an LLM trained purely on a corpus of Cultural Theory and related works, to see if it could generate text that at least Cultural Theory experts could understand.

One question for me is whether this book is worth purchasing. If Farrell finds it dense, what could I make of it? I would like to see the relations between Information Theory and language worked out, but would it be written in language clear enough for a scientist? I suspect I would give up on reading the whole book. I have to wonder who Weatherby has written the book for: his fellow Cultural Theory academics, rather than a wider audience who have taken various views on AIs?

Lastly, I want to turn the question of LLM-generated text and intelligence on its head. The assumption is that humans are intelligent, and our utterances are based on that intelligence. But is that connection relevant? Yes, we have intelligence; that should be clear. Even a child shows intelligent behaviors. But does that imply that our speech and other communications really reflect that intelligence? So many of our comments seem to be nothing more than Kahneman's System 1 (fast) thinking: purely reactive retrieval from our own language graph in our brains. Listen to the nonsense that some "leaders" speak, and one has to wonder whether their brains are even operative. [The phrase "..doesn't have 2 brain cells to rub together" comes to mind.] Gary Marcus recently posted a lovely letter by Douglas Hofstadter replying to an AI-is-showing-intelligence enthusiast: https://garymarcus.substack.com/p/are-llms-starting-to-become-a-sentient

I find myself not only in that camp, but increasingly asking, "Do humans display intelligence in many of their daily communications, or are these communications mostly unconscious responses without thought?"

Michael Fuchs

Unfortunately, LLMs fail a good deal of the time with what are known as “hallucinations,” fabrications presented as facts, as well as violations of common sense that no human would utter. This is precisely because they contain no grounding model of the world—no mechanism for building one up through experience, no place to store one, no way to tie it to data in or out.

So, to me it is clear that LLMs prove the opposite of your argument. Signs that signify nothing are not language one can rely on to mean anything.

Paul Houle

Yeah, but ask me a question about Super Bowl XX and I might get it mixed up with a different Super Bowl -- my memory for specifics is not perfect, and I can't trust everything that I think I recall unless I do my own retrieval-augmented generation. The difference between me and Microsoft Copilot is that I can look up things in the literature and come back with good citations most of the time, but Copilot can't, even when it gives the right answer.

Cosma

I want to push back on this. Here's what I got out of ChatGPT when I asked it to generate a biography for me, back in early 2023: (*)

"Cosma Shalizi is a prominent statistician, sociologist, and professor known for his contributions to the field of network science and complex systems. He was born on December 3, 1971, in Boston, Massachusetts, and grew up in Cleveland, Ohio.

"Shalizi received his undergraduate degree in mathematics and philosophy from Yale University in 1993. He then earned a Ph.D. in physics from the University of Wisconsin-Madison in 1998, where he focused on the statistical mechanics of complex systems. He went on to do postdoctoral research in statistical physics at the École Normale Supérieure in Paris and at the Santa Fe Institute in New Mexico.

"In 2002, Shalizi joined the faculty of the Department of Statistics at Carnegie Mellon University, where he is now a full professor. He is also a faculty member in the Machine Learning Department and the Center for the Neural Basis of Cognition at Carnegie Mellon."

This is full of "hallucinations": I'm not a sociologist, that's not my birthday or even the year I was born, I've never even visited Cleveland, I studied physics as an undergrad at Berkeley and have visited Yale exactly once, I finished my Ph.D. in 2001, I've never even visited Paris, that's not when I came to CMU, and I'm still not a full professor. I'm also pretty sure there were, then, no online sources with these particular errors it might have been copying. (Maybe there are now...) It's all confabulated. But the _text_ is plainly grammatical and even meaningful --- I couldn't be so sure it was wrong if it wasn't!

*: I also did this for a bunch of other people whose biographies I know well, with similar results, but I don't feel so comfortable revealing those results.

eg

Your example is why I treat the output of LLMs as I would the garrulous drunk at the end of the bar — entertaining, perhaps, but clearly unreliable.

Jacob Blain Christen

This post and its comments are a trove of interesting thought.

Armand Beede

Marc Friedman: There is something very moving about two great men -- Roman Jakobson and Vassily Leontieff -- with parallel thinking in two diverse fields:

"He colorfully illustrated this technoscientific fraternity when he entered a Harvard lecture hall one day to discover that the Russian economist Vassily Leontieff, who had just finished using the room, had left his celebrated account of economic input and output functions on the blackboard. As Jakobson’s students moved to erase the board he declared, “Stop, I will lecture with this scheme.” As he explained, “The problems of output and input in linguistics and economics are exactly the same.”

As a 77-year-old dinosaur, who grew up thinking the ENIAC was a miracle, I try to fathom:

"Similarly, large language models suggest that structural theory captures something important about the relationship between language and intelligence. They demonstrate how language can be generative, without any intentionality or intelligence on the part of the machine that produces them. Weatherby suggests that these models capture the “poetics” of language; not simply summarizing the innate structures of language, but allowing new cultural products to be generated."

And the limitations in artificial intelligence are quite engaging, here:

"This cashes out as a theory of large language models that are (a) genuinely culturally generative, and (b) incapable of becoming purposively intelligent, any more than the language systems that they imperfectly model are capable of becoming intelligent. Under this account, the “Eliza effect” - the tendency of humans to mistake machine outputs for the outputs of human intelligence - is not entirely in error."

The science fiction I grew up with has, in the fullness of time, Kairos, become Man in the Incarnation of AI.

Quentin Hardy

In the early 80s, the triumph of theory over original literary art seemed like lightning bugs jealous of lightning. Today you could say the same about the claims that LLMs possess intelligence.

Alex

I must say I found the book somewhat frustrating. Part of the issue is that there are a lot of ideas and authors one has to know beforehand in order to assess whether Weatherby's readings are correct. He quotes phrases here and there, but many times the explanation is too short for someone with no prior or limited knowledge. Likewise, I find that even if I am in agreement or sympathetic with Weatherby's claims, it is not clear to me that he demonstrates them in the book. At best he makes a prima facie case, but nothing exhaustive. And one thing that annoyed me is the case he makes that the humanities, following the postmoderns, ran away from technology and insisted on remainder humanism. The only thing he shows is that Derrida argued something that might have led to that; there is a leap of causality from Derrida's arguments to the humanities' focus on remainder humanism. One either accepts Weatherby's claim or not; there is nothing to evaluate it against except some arguments from Derrida.

Matthew Lungerhausen

Damnit Farrell! I think I have come by my hatred for AGI and put my trauma of studying Post-Structuralism in the context of 1990s Social Sciences and Historiography behind me. Now you are saying they are related and there is a there there. Worse yet, this sounds like something I should read. Sigh… I'll ask the library to order the book so I don't have to spend my own money on it…

Resident Alien

Thank you. Languages can be described mathematically, LLMs are constrained by human knowledge, and don't have agency. Besides adding this book to my wishlist, I now have a greater desire to find my favorite and tiniest book from college on ethics and logic that's been popping in and out of my head the past eleven years or so.

Derek Neal

Very good review. I’ve read the book as well and plan on writing up something soon.

Gerben Wierda

Hmm. It raises the question of when something is language. One might also argue that exactly the same piece of text can be both language and not language. It is language when it is based on shared experience and can thus be a bridge between persons to convey meaning (loosely following Wittgenstein here). When an LLM produces the same text, it is not language, because the meaning is missing, even if it is the same text: meaning requires shared experience.

That turns these systems into systems that are not producing *language*, but text that *approximates* language. And because their text doesn't come with meaning, the approximation can look perfect from the perspective of the LLM: a text that 'hallucinates' (I dislike that term) is good as text (which is what the LLM produces) but fails at the *language* it is approximating.

That also fits with something that was easier to show when the LLM craze started about two years ago, before LLMs were embedded in large amounts of 'engineering the hell out of it'. LLMs have a 'temperature' that influences the randomness of next-token selection out of a bag of calculated 'likely next tokens'. That randomness in the selection (which is also responsible for creativity), when cranked up high enough, not only fails to approximate the *meaning* aspect of language; at some point it even starts to fail at the grammar aspect. See https://youtu.be/9Q3R8G_W0Wc at 16:50 for the temperature effect and at 30:38 for the 'failed approximation' (and why). It turns out that approximating grammar is easier than approximating meaning (and the systems have been tuned not to accept temperatures at which the engineers know grammar approximation breaks down, because then — Eliza effect — the believability goes away).
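To illustrate the temperature point, a minimal sketch (generic temperature-scaled sampling over made-up scores, not any particular LLM's implementation): temperature divides the scores before the softmax, so high values flatten the distribution toward uniform and the sampler increasingly picks tokens that break grammar as well as meaning.

```python
import math, random

def sample_next_token(logits, temperature=1.0, rng=random):
    """logits: dict mapping token -> raw score. Sample one token after
    temperature scaling and a numerically stable softmax."""
    scaled = {t: s / temperature for t, s in logits.items()}
    m = max(scaled.values())
    weights = {t: math.exp(s - m) for t, s in scaled.items()}
    total = sum(weights.values())
    r, acc = rng.random() * total, 0.0
    for token, w in weights.items():
        acc += w
        if r <= acc:
            return token
    return token                      # guard against floating-point round-off

logits = {"dog": 5.0, "dogs": 3.0, "purple": 0.5, "of": 0.1}
print([sample_next_token(logits, temperature=0.2) for _ in range(8)])   # nearly always "dog"
print([sample_next_token(logits, temperature=5.0) for _ in range(8)])   # far more scattered
```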

So, we might at some point have to accept — thanks to LLMs — that 'text' and 'language' are no longer the same thing. Just as, earlier in the history of our languages, meanings changed, often because our experiences changed.

Secondly, the mention of Shannon's Information Theory always tickles me a bit. Shannon's definition of information was devoid of meaning. It was a technical, engineering definition that was about how many bits of discrete yes/no decisions can fit in a signal. But that equates an open telephone line to a conversation. Shannon should never be mentioned in discussions like this. 'Bewitchment by language' strikes again, as Shannon was using the same word — information — with a different meaning (a different experience) than information defined as 'a *meaningful* ordering of data'.

Not all data is information. (And not all information is knowledge. And not all knowledge is wisdom).

Paul Houle

LLMs do not fit neatly into the Chomsky hierarchy; they are something a bit different from Markov chains and the kinds of grammars you can parse with stack machines.

In the mid-2010s I worked on a few attempts, using recurrent and convolutional networks, to create advanced language models for things like clinical and aircraft maintenance notes. They didn't quite work, and there were two big ideas in transformer models that let them do what we couldn't:

(1) Byte-pair encoding. Word-level tokenization clearly didn't work because of the hapax legomenon phenomenon (often the most important word in a document isn't in the dictionary) and the fact that we can often understand what a word means (say, "sinosphere") on the basis of morphological rules. On the other hand, it doesn't seem a good use of a sequence model to model text one character at a time. Subword tokenization seems to capture a lot more semantics while still being able to exploit morphological regularity, even if it doesn't represent it as explicitly as "talk-ed" or "sino-sphere". (A toy tokenization sketch follows after item (2).)

(2) Attention. We had trouble making a system that could write clinical notes with GRU and LSTM models because they couldn't remember the name of the patient. Experts on RNNs would say "of course we could add something that would let them refer to a previous bit of text, but that wouldn't help it understand anything", which was completely wrong. Fundamentally, LLMs see the graph (as opposed to tree) structure by which a part of a document here is related to a part of a document there, and that's a key reason they succeeded when everything else failed. (A minimal attention sketch follows below.)
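The toy tokenization sketch promised in item (1): a greedy longest-match split over a hand-made vocabulary, standing in for what a trained byte-pair encoding vocabulary does, so an out-of-dictionary word like "sinosphere" still decomposes into meaningful pieces. (The vocabulary and words here are invented for illustration, not a real tokenizer's.)

```python
def subword_tokenize(word, vocab):
    """Greedily split `word` into the longest vocabulary pieces, left to right."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try the longest match first
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])          # fall back to a single character
            i += 1
    return pieces

vocab = {"sino", "sphere", "talk", "ed", "s", "anglo"}
print(subword_tokenize("sinosphere", vocab))   # ['sino', 'sphere']
print(subword_tokenize("talked", vocab))       # ['talk', 'ed']
```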
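And the minimal attention sketch for item (2): plain single-head scaled dot-product attention with no learned projections (just numpy, random embeddings for illustration), showing that every position can look directly at every other position, which is how a model can "remember the name of the patient" from much earlier in a note.

```python
import numpy as np

def attention(Q, K, V):
    """Q, K, V: (seq_len, d) arrays. Returns (seq_len, d) attention outputs."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                        # pairwise relevance of positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over positions
    return weights @ V                                   # mix values by relevance

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))          # six token embeddings of dimension 8
print(attention(X, X, X).shape)      # (6, 8): each position attends to all the others
```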

---

I love to see this book! For years structuralism has been discredited: how can you say "X is structured as a language" when Chomsky's theory explains language well enough to be a paradigm in which linguists can do "normal science" but fails almost completely at the engineering level to build "language machines"? In the shadow of Chomsky's "poverty of the stimulus"? Where we came to think the "language instinct" is a peripheral attached to an animal which has a theory of mind and the ability to reason, act, and experience without language? Structuralists were left like the dog who caught the car: with a model for the structure of language which could, outside linguistics as a community, be used to create strange things like "The Structure of Magic" but couldn't be put to work as an exact model for problems outside mathematics and artificial languages.

Then all of a sudden you can feed a machine a "pile" of text and it has remarkable linguistic competence, and all of a sudden "language is structured like a language!"

Nicolas D Villarreal

Chomsky's problem was thinking the structures of grammar were constitutive of language rather than just one particular operation, whose signifier only appears in sentences and whose signified, the actual operation, only appears occasionally. The actual operations which make up language are varied; syntax is only a small portion of a larger sample space.

Structuralism understood that whatever the operations were, they would all be defined by their correlations and anti-correlations vs each other.

Dominic Fox

Some recent writing along similar lines: https://codepoetics.substack.com/p/derrida-badiou-baudrillard-three

Bob Tour

Too long. Everything is too long. Intro, body, conclusion. Writing to communicate an understanding involves decisions... one human to another.
