Even if AI makes art, it may be bad for culture
Ted Chiang and the problem of "generating variety in the arts"
[Commercial announcement: My and Abraham Newman’s book, Underground Empire: How America Weaponized the World Economy is still available for $2.99 on Amazon Kindle. We now return you to your scheduled programming].
Ted Chiang wrote an article for the New Yorker a few days ago suggesting that commercial AI isn’t plausibly going to, and perhaps can’t, produce art. It’s gotten plenty of reactions, some enthusiastic, some furious. For a sample of the latter:
I don’t want to concern myself with a text that forbids artists to work in a specific way. The same text has been written by the same people when electric guitars, mobile phone cameras, video games, books, and any other new medium came along. There will always be a gatekeeper.
Others retorted that we should compare the technology to sampling and synthesizers, harrumphing about “takes maybe [motivated by] the usual predictable mix of dislike of industry, skepticism of a different way of doing things, a vested interest in the status quo, and thinking of humans as forever inherently unique and special.”
But these critics, for all their spleen, broadly agree with much of Ted’s argument! They deplore the “ugly” synthetic images and “frustratingly boring” vanilla outputs of Large Models (Large Language Models or LLMs, diffusion models and similar).
I disagree with Ted on some important questions (I think that blurry JPEGs can be useful), but his most important objections to Large Models don’t really depend on his claims about what is or isn’t art (which in any event are more complicated than the title of his article suggests). Even those who think that algorithms can produce art may worry that the widespread commercialization of LLMs, diffusion models and their cousins may be bad for human culture.
Here are the core elements of Ted’s argument as I understand them.
Art is “something that results from making a lot of choices.” A ten thousand word story involves around ten thousand choices - likely as many choices go into a painting. So too for photography, which seems mechanical at first.
Giving a hundred word prompt to a generative model does not involve a lot of choices. You are expecting the algorithm to make most of the choices for you.
Perhaps you can use off the shelf generative models to make art, but they’re really not designed to work that way. They are made by profit seeking entities, which know they won’t make money if they force people to make lots of difficult decisions. People want to just be able to press a button and produce art!
Large Models produce uninteresting prose and pictures, because they rely on an average of the choices made by people, which inevitably converges towards the “least interesting choices possible,” or else simply mimic particular styles.
Effort is required to produce value: “any writing that deserves your attention as a reader is the result of effort expended by the person who wrote it.”
The significance of human communications (whether they be letters of admiration or great works of art) lies in the desire to communicate subjective feeling. Since Large Models don’t have such feelings, they can’t communicate them.
People who rely on Large Models won’t develop the skills they need to think critically.
The use of Large Models to generate text and images for completely mundane requirements will create a self-perpetuating ecosystem that will grow ever larger.
Most generally: “The task that generative A.I. has been most successful at is lowering our expectations, both of the things we read and of ourselves when we write anything for others to read. It is a fundamentally dehumanizing technology because it treats us as less than what we are: creators and apprehenders of meaning. It reduces the amount of intention in the world.”
This is a profoundly humanistic vision, and it reminds me of a beautiful passage from Gene Wolfe’s Citadel of the Autarch.* The passage comes after a story telling competition in a military lazaret, where a prisoner of war from Ascia (a grim military oligarchy in which people are only allowed to communicate in officially sanctioned cliches - imagine a fantastic version of 1960s China where people could only converse in dictums taken from Mao’s Little Red Book) has just told his tale.
it often seems to me that of all the good things in the world, the only ones humanity can claim for itself are stories and music; the rest, mercy, beauty, sleep, clean water and hot food (as the Ascian would have said) are all the work of the Increate. Thus, stories are small things indeed in the scheme of the universe, but it is hard not to love best what is our own—hard for me, at least. From this story, though it was the shortest and the most simple too of all those I have recorded in this book, I feel that I learned several things of some importance. First of all, how much of our speech, which we think freshly minted in our own mouths, consists of set locutions. The Ascian seemed to speak only in sentences he had learned by rote, though until he used each for the first time we had never heard them. Foila seemed to speak as women commonly do, and if I had been asked whether she employed such tags, I would have said that she did not—but how often one might have predicted the ends of her sentences from their beginnings.
Wolfe wrote four decades before Large Language Models were created - but these sentences encapsulate much of Ted’s dilemma. We value stories (and art) so much because they seem our own, and because they are specifically human forms of expression. Yet human speech and stories also consist of “set locutions,” embedded in patterns that have their own structure and logic, regardless of our individual intentions. Claude Shannon’s scheme for mapping out how one can mathematically predict the ends of sentences from their beginnings is the father and the mother of those models that are “reducing the amount of intention in the world.”
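To make Shannon’s point concrete: his 1948 insight was that ordinary language is statistically redundant enough that a simple table of which words follow which can carry a sentence forward. Below is a minimal sketch (the toy corpus and function names are mine, chosen for illustration) of a Shannon-style bigram model that “predicts the ends of sentences from their beginnings” by repeatedly sampling a likely next word:

```python
import random
from collections import defaultdict, Counter

def train_bigrams(corpus):
    """Count how often each word follows each other word, as in Shannon (1948)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def continue_sentence(counts, start, max_words=8, rng=random):
    """Predict the end of a sentence from its beginning, one word at a time."""
    words = start.lower().split()
    for _ in range(max_words):
        followers = counts.get(words[-1])
        if not followers:
            break
        # Sample in proportion to observed frequency: the commonest
        # continuation - the "set locution" - is also the likeliest.
        nxt = rng.choices(list(followers), weights=list(followers.values()))[0]
        words.append(nxt)
    return " ".join(words)

corpus = [
    "the just man complained to the group of seventeen",
    "the group of seventeen heard the complaints of the just man",
]
model = train_bigrams(corpus)
print(continue_sentence(model, "the group"))
```

A modern LLM is vastly more sophisticated than this, but the family resemblance - continuation by learned statistical pattern rather than by intention - is exactly the one that Wolfe’s passage anticipates.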
The implication that I take from Wolfe’s passage is that humanistic fears about LLMs implicate our attachment to the human, but they do not necessarily hang on the question of what is art and what is not. Instead, they center on the subtly different question of what is creative and what is predictable in human culture. Even the Ascians - who are compelled to use the most grim and limited kind of Newspeak imaginable - can wring story and ambiguity out of their impoverished language. As Wolfe describes it:
I learned once again what a many-sided thing is the telling of any tale. None, surely, could be plainer than the Ascian's, yet what did it mean? Was it intended to praise the Group of Seventeen? The mere terror of their name had routed the evildoers. Was it intended to condemn them? They had heard the complaints of the just man, and yet they had done nothing for him beyond giving him their verbal support. There had been no indication they would ever do more.
The Ascian’s story consists of twenty-seven rote phrases strung together in a particular order. From that order, a story emerges. Yet we don’t want to be restricted to such phrases - the Ascian’s culture is undoubtedly a much thinner and poorer one than that of Wolfe’s narrator, or ours.
That opens up a way of thinking about the cultural risks of actually existing LLMs that is different from Ted’s, but largely complementary to it. The outputs of LLMs and other Large Models are, on the whole, blander and less interesting than human-created art. As Alison Gopnik argues, they are very strong on imitation, but not on innovation. Even if you think that AI, or much simpler algorithms for that matter, can be used to generate art, you can still worry that the currently popular versions are going to make culture duller and more disconnected.
Another extraordinary artist, Brian Eno, provides a very useful vocabulary for thinking about the “duller” part of this. Obviously, Eno is no enemy to algorithmic art or to the artistic use of automatic or semi-automatic systems - he is one of the people who have done the most to develop such approaches. But Eno’s core concern, as this lovely early essay suggests, is the relationship between the arts and variety.
Eno’s primary goal in this essay is to distinguish a particular “experimental” approach to composing and performing music from more classical approaches. Classical music tends toward a “rigidly ranked, skill-oriented structure moving sequentially through an environment assumed to be passive.” Experimental music is different. Its focus has been on its
own capacity to produce and control variety, and to assimilate 'natural variety' - the 'interference value' of the environment. Experimental music, unlike classical (or avant-garde) music, does not typically offer instructions toward highly specific results, and hence does not normally specify wholly repeatable configurations of sound. It is this lack of interest in the precise nature of the piece that has led to the (I think) misleading description of this kind of music as indeterminate. I hope to show that an experimental composition aims to set in motion a system or organism that will generate unique (that is, not necessarily repeatable) outputs, but that, at the same time, seeks to limit the range of these outputs.
But Eno goes on to provide some much broader parallels. Experimental music resembles:
the type of organization that typifies certain organic systems and whose most important characteristics hinge on this fact: that changing environments require adaptive organisms. Now, the relationship between an organism and its environment is a sophisticated and complex one, … Suffice it to say, however, that an adaptive organism is one that contains built-in mechanisms for monitoring (and adjusting) its own behaviour in relation to the alterations in its surroundings. This type of organism must be capable of operating from a different type of instruction, as the real coordinates of the surroundings are either too complex to specify, or are changing so unpredictably that no particular strategy (or specific plan for a particular future) is useful.
Here, Eno leans hard on Stafford Beer. And I don’t think that it is going too far to say that his approach suggests a set of ideals not just for experimental music, but for art and culture more generally, as a form of social organization centered around fruitful directed experimentation. Cultural forms, from the little to the large, should not be chaotic, but they shouldn’t be too rigid either. They ought to be adaptive to their environment. What we ought to be most broadly concerned with, to quote the title of Eno’s essay, is “generating and organizing variety in the arts.”
The variety of a system is the total range of its outputs, its total range of behaviour. All organic systems are probabilistic: they exhibit variety, and an organism's flexibility (its adaptability) is a function of the amount of variety that it can generate. … But, just as it is evident that an organism will (by its material nature) and must (for its survival) generate variety, it is also true that this variety must not be unlimited. … what is important is not only that you get it right but also that you get it slightly wrong, and that the deviations or mutations that are useful can be encouraged and reinforced.
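It is worth pausing on what “generating but limiting variety” means in practice. Here is a toy sketch in the spirit of Eno’s tape-loop pieces (my own illustration of the principle, not Eno’s actual working method), in which a few short phrases repeat on different cycle lengths with occasional rests, so the combined texture keeps producing configurations that never exactly recur while staying inside a deliberately narrow palette:

```python
import random

# Each voice repeats a short phrase on its own cycle length, so the
# combined texture keeps shifting - but every output is drawn from the
# same deliberately limited set of notes.
VOICES = [
    {"phrase": ["C", "E", "G"],       "cycle": 3},
    {"phrase": ["D", "A"],            "cycle": 5},
    {"phrase": ["E", "B", "F#", "A"], "cycle": 7},
]

def generate(steps, rest_probability=0.3, seed=None):
    rng = random.Random(seed)
    texture = []
    for t in range(steps):
        chord = []
        for voice in VOICES:
            # A voice sounds only on its own cycle, and sometimes rests:
            # bounded variety rather than unlimited noise.
            if t % voice["cycle"] == 0 and rng.random() > rest_probability:
                phrase = voice["phrase"]
                chord.append(phrase[(t // voice["cycle"]) % len(phrase)])
        texture.append(chord or ["-"])
    return texture

for step in generate(12, seed=1):
    print(" ".join(step))
```

The point of the design is Eno’s: the variety comes from simple constrained rules interacting with chance, not from unbounded randomness.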
So the question then becomes: do actually existing Large Models make it more or less likely that human culture will generate that useful variety? The provisional answer - as Ted’s critics seem to agree - is “no, they do not.” Their outputs are “vanilla.” Many of the current criticisms of LLMs emphasize their tendency to “hallucinate” or make errors. From an Eno-esque perspective, the problem may be that they don’t make the right kinds of errors. Their imperfections tend to be repetitive, rather than to point towards interesting new possibilities. This is not to say that they cannot be used to create such possibilities, but that their central tendency is towards the easy creation of conformity rather than the generation of variety. And as Ted argues, there is little reason to expect that that will change, given commercial incentives.
Not only that, but as Ted and other writers and artists have argued, they may make us lazier, and less apt to generate interesting new varieties ourselves. We learn by doing; initially doing very badly, and then gradually improving as we correct our bad errors, and build on our fruitful misprisions. That is much less likely to happen if we rely on Large Models, which can lift us to a moderately competent level of performance, but have difficulty going further.
As the recent Zhang et al. paper on “Transcendence” argues, when Large Models are trained on the moves made by moderately skilled chess players, they can actually play chess better than the players whose moves they were trained on. But this is enabled by noise reduction rather than the generation of unexpected moves - at very low “temperatures,” the Large Model can avoid the idiosyncratic errors that individual mediocre players make, and converge on the moves that a majority of them would recognize as good. This can be really valuable in many ways, and it is not quite the same thing as Ted’s “least interesting choices possible.” The paper’s authors compare it to majority voting. But - to the extent that the lesson travels - it is not a good model for a dynamically adaptive culture. If you want to explore the adjacent possible, it is sometimes better to have a wide variety of mistakes, since a small minority of them may lead in unexpectedly valuable directions. For dynamic evolution, you don’t want a relentless convergence on the majority view, but the generation of useful imperfections. You can turn up the temperature on these models, increasing the noise, but this paper at least suggests that the noise does not introduce useful errors of the kind that Eno describes. Perhaps it works better in other contexts: who knows? And certainly, there are some interesting experiments in using AI in more creative ways - but this is still, as per Ted’s argument, a minority pursuit.
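The “temperature” mechanism at issue here is easy to show directly. Below is a minimal sketch of softmax sampling over candidate chess moves (the move scores are invented for illustration, not taken from the paper): at low temperature the sampler collapses onto the consensus move, which is the error-cancelling behaviour the paper describes; at high temperature the weaker moves come back, but only as undirected noise, not as Eno’s useful deviations:

```python
import math
import random

def sample_move(move_scores, temperature, rng):
    """Softmax sampling: temperature rescales how sharply we prefer the modal move."""
    moves, scores = zip(*move_scores.items())
    scaled = [s / temperature for s in scores]
    top = max(scaled)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    return rng.choices(moves, weights=weights)[0]

# Hypothetical scores for candidate moves in one position - the kind of
# distribution a model trained on many mediocre players might learn.
# Nf3 and e4 stand in for consensus choices; h4 and Qh5 for
# idiosyncratic weaker moves.
position = {"Nf3": 2.0, "e4": 1.8, "h4": 0.3, "Qh5": 0.5}

rng = random.Random(0)
for temperature in (0.05, 1.0, 3.0):
    picks = [sample_move(position, temperature, rng) for _ in range(1000)]
    counts = {move: picks.count(move) for move in position}
    print(f"temperature {temperature}: {counts}")
```

Turning the dial up restores variety in the statistician’s sense, but not in Eno’s: the extra mistakes are drawn from the same learned distribution, not pointed anywhere new.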
There’s more that I want to say about the relationship between effort, meaning and communication of subjective feeling, but I’m going to wait for a bit before saying it. Marion Fourcade and I have a relevant popular article coming out Sometime Real Soon.
Until then: it is a mistake to focus all the criticisms on Ted’s (qualified) claim that actually existing AI is unlikely to create art, or to condemn his humanism, still less to suggest that these criticisms undermine his larger case. You can reach very similar conclusions from a quite different set of initial premises: there is a lot of common ground between Ted Chiang’s view of the problem and Brian Eno’s. Both suggest that the blandness and conformity of Large Models’ outputs is a problem for larger culture. These tools can still be useful (I myself use them plenty!) but there are real grounded reasons for general caution, whether you are a humanist like Ted, or whether you’re concerned for other reasons with maintaining variety and exploration in the arts.
Update: I meant to tie this recent Eno interview into the discussion and didn’t. Well worth reading.
* Ted has described Wolfe as a “titan” who has greatly influenced his work but I have no reason to believe that this extract has particularly influenced his thinking on this particular topic.
** I’m reading Simon Reynolds’ Futuromania right now, which is fantastic on how electronic musicians have repeatedly discovered how to use their equipment in ways that were not intended, and sometimes not wanted, by the equipment’s manufacturer. But also, as Reynolds mentions: Autotune.