Comments

Gerben Wierda:

Definitely not noise, this post... (sorry, could not resist), and worth spreading around.

I still have to read the paper (I definitely will; thank you for the reference).

So, food for thought, certainly. My mind is all over the place after reading this. Things like:

(1) the tools give us improved access to what the 'orthodoxy' of the training material is, which can be incredibly useful, but what if the orthodoxy ('common knowledge') is wrong (which it historically often has been)? Part of intelligence is being able to escape that orthodoxy when needed. Will these 'influential' systems make innovation/change harder (just as all IT makes 'change' harder by locking us into a 'data-jail')?

(2) what happens when you crank up GenAI 'temperature'? Will the extra randomness maybe provide openings into that 'escape from orthodoxy', or will it simply degrade the quality? Given that innovation comes from new ideas that need to pass the hurdle of *understanding*, I do suspect the latter, as 'understanding' is nowhere in the picture. But...
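To be concrete about what 'temperature' does mechanically, here is a rough sketch of the usual softmax-with-temperature sampling (illustrative only; the scores and candidate moves below are made up):

# Rough sketch of softmax sampling with temperature (not any specific model's code).
# Higher temperature flattens the distribution over next tokens/moves, i.e. adds
# randomness; lower temperature sharpens it towards the single most likely option.
import math
import random

def sample_with_temperature(logits, temperature):
    """Sample an index from raw model scores ('logits') at a given temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    choice = random.choices(range(len(logits)), weights=probs, k=1)[0]
    return choice, probs

logits = [2.0, 1.0, 0.2]  # hypothetical scores for three candidate moves
for t in (0.1, 1.0, 5.0):
    _, probs = sample_with_temperature(logits, t)
    print(t, [round(p, 2) for p in probs])
# t=0.1 -> nearly all mass on the top option; t=5.0 -> close to uniform.

At low temperature nearly all the probability mass sits on the most likely option; at high temperature the distribution approaches uniform, which is exactly 'more noise' rather than 'more ideas'.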

(3) These systems provide 'information'. But human convictions are created more by repetition and closeness of source; 'information' is generally a relatively weak influencer. E.g. if these systems produce 'common knowledge' (i.e. what the scientists say) about climate change, will they be able to influence the skeptics?

Many thoughts. Thank you.

Henry Farrell:

A general fyi which I am putting on all comments - someone seems to be going through the substack comments for this newsletter and sending messages pretending to be "programmable mutter" and suggesting contact be made on Telegram. This is presumably a phishing attack, so ignore all such messages!

Gerben Wierda:

After reading the paper, I find the title quite misleading, almost click-bait. It is an interesting and good paper, but there are caveats. For instance, their use of a discrete very-mini-world (chess) has consequences for more generalised conclusions (i.e. don't draw them). Or their reward function makes use of another neural engine (the Stockfish chess system) to value positions. Or the fact that denoising works best when your 'temperature' is low (duh: high 'temperature' more or less adds 'noise'...).

The use of discrete environments like chess has a long history in AI research. The reason is not that we're good at it, but that we tend to find it hard, and the fact that we're *bad* at something has become our litmus test for assessing 'intelligence'. So we find 'driving a car' pretty easy and chess hard, but it is behaviour like 'driving a car' where human intelligence actually performs best and AI has a hard time. Remember the hype when a previous technology beat Kasparov at chess, or won Jeopardy? Those are things we consider 'hard', so the belief that AGI was around the corner was everywhere.

On your question "Do the Zhang et al. results help explain why LLMs seem to be good at improving the tacit knowledge of not-so-strong employees, but less effective for really good ones?": the use of an LLM by humans is a totally different setting from the one in this paper; I would not try to connect the two.

And regarding LLM use by humans, what about the reverse? We have, for instance, seen in OpenAI's safety research on bioweapons that experts gained more from GPT-4 than beginners, who were mostly led astray because they did not recognise obviously wrong approximations by the model, and who even performed worse than with internet search alone. That low-level management consultants improved more than experts (McKinsey, if I recall correctly) may above all say something about the nature of management consultancy. Just as an AI passing the Turing test might say more about the (quick and dirty estimation) workings of human intelligence than about the performance of the AI.

Henry Farrell:

I'm guessing that the disparity between the title and the content was the product of some co-author disagreement about how far the implications go? But that is just a guess. I am also a little more sanguine about the chess domain than you are, in part because they are fairly clear about what they are doing, and are not making grand claims that human-level intelligence is about to be a solved problem, etc. But I would also love a more ambitious Scott Page-ian research agenda here, if only to see whether there is any there there beyond noise reduction.

Kaleberg:

This jibes with my initial impression of these systems as being a lot like linear regression. They take a pile of data, summarize it, and then let you interpolate or extrapolate. Obviously they are a lot more sophisticated than linear regression, but the basic idea is the same, which means they have the same limits. Interpolation often has blind spots, and extrapolation almost always does. Outlier data points or poorly structured data may cause poor results. The quality of the input data determines the quality of the output. I'm sure someone has written an 800-page tome on linear regression, its triumphs and its discontents, so it's nice to see people starting to think about LLMs the same way.

One thing to consider is that LLMs are not going to beat chess masters if they are trained on a million chess games played by rank novices. A system trained on a million chess games played by seriously good players would be a different creature entirely. This fits with why AI helps poor or mediocre employees a lot more than excellent ones. It also explains why it would be wise not to train the system by recording the actions of the lowest-ranked, least effective employees. The system might help excellent employees now and then, but, as with linear regression, interpolation works much better on real-world problems than extrapolation.

I don't think noise is quite the right metaphor. Linear regression does give worse answers if the data is noisy, but a lot of its problems flow from its sheer simplicity: linear regression assumes things are linear, and when they are not, it produces bad answers. Large language models assume that they can use linguistic structure to produce a useful model; when the domain is not well described linguistically, they perform poorly. Even if Noam Chomsky was right about languages having a deep structure, there is no guarantee that even a very sophisticated structure can capture every aspect of the real world. As Gödel demonstrated, talk is one thing, reality is another.
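To put rough numbers on the regression analogy, here is a toy example (made-up data, nothing to do with any particular model): fit a straight line to gently curved data and compare predictions inside and outside the training range.

# Toy illustration of the regression analogy: fit a line to quadratic data.
# Predictions inside the training range (interpolation) are tolerable;
# predictions outside it (extrapolation) fall apart.
xs = [float(x) for x in range(0, 11)]   # training inputs 0..10
ys = [0.5 * x * x for x in xs]          # a gently curved 'true' relationship

# Ordinary least squares for y = a*x + b, done by hand to stay dependency-free.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x

def predict(x):
    return a * x + b

for x in (5.0, 10.0, 20.0, 40.0):
    true_y = 0.5 * x * x
    print(f"x={x:5.1f}  true={true_y:8.1f}  fitted={predict(x):8.1f}")

At x=5, inside the data, the fitted line is off by a little; at x=40 the error has blown up, which is the extrapolation problem in miniature.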

Jonathan Powers:

"It presents the beginnings of an approach to thinking of these models, not as capable of becoming reasoners in their own right, through some magical process of emergence, but rather as extracting diverse forms of human knowledge and making it more useful. In other words, it suggests that these models are a new technique for making collectively held knowledge more accessible and more useful."

This raises the question of whether one important part of reasoning--and I'm thinking specifically (but perhaps too narrowly) of the kind of reasoning craftspeople use when practising and improving their skills--is simply this process of rendering knowledge "more useful" through various means.

Henry Farrell:

There is an interesting parallel debate on tacit knowledge, which Brynjolfsson and others are starting to talk about, based on Polanyi and Hayek, but my sense is that the research questions haven't fully come into focus.

Jonathan Powers:

A great summary of Zhang et al.'s paper, and some stimulating reflections on its significance. Gerben's points about humans being impressed by things that are hard for humans, even though many of the things humans do that we find easy are biomechanically and/or intellectually astonishing when looked at from a greater distance, seem quite on point. Algebra and writing fluently are hard for most (all?) humans, so no surprise that we're impressed that a machine can do them. But we should be suspicious of being impressed.

I see the heart of the summary here: "Rather than suggesting that the model is in some way itself intelligent, or on the way to being so, they treat it as a better means of extracting information from the collectivity of experts who trained it, which is more ‘intelligent’ in some sense than any individual within it. Thus, their approach is a group based technique that closely approximates majority voting among the individual diverse perspectives of the expert."

One possible way to read this articulation of the paper's insight is as a critique of the conventional notion of intelligence itself. On this reading, intelligence could be understood not as a personal trait, but rather as an emergent social phenomenon that is localized in individual humans. This would provide interesting support for extended cognition, as well as shed some light on the difference between untrained and expert intuitions that I think (if I'm hearing correctly) Kaleberg is talking about.
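Coming back to the mechanism in the quoted passage: a toy sketch of the 'majority voting' intuition (not Zhang et al.'s actual setup; the experts and moves below are made up):

# Toy sketch of the 'majority voting' intuition: average the move preferences of
# several imperfect 'experts', then pick the consensus move. Individually each
# expert is noisy; the pooled preference is sharper.
from collections import Counter

# Hypothetical distributions over three candidate moves from five weak experts.
experts = [
    {"e4": 0.5, "d4": 0.3, "a3": 0.2},
    {"e4": 0.4, "d4": 0.4, "a3": 0.2},
    {"e4": 0.6, "d4": 0.2, "a3": 0.2},
    {"e4": 0.3, "d4": 0.5, "a3": 0.2},
    {"e4": 0.5, "d4": 0.2, "a3": 0.3},
]

# Pool the experts by averaging their probabilities, then 'vote' by taking the
# argmax of the pooled distribution.
pooled = Counter()
for dist in experts:
    for move, p in dist.items():
        pooled[move] += p / len(experts)

consensus = max(pooled, key=pooled.get)
print(dict(pooled), "->", consensus)  # e4 wins even though no single expert is very sure

No single expert here is especially reliable, but pooling their preferences and taking the consensus sharpens the signal, which is roughly what sampling at very low temperature from a model trained on all of them amounts to, if I understand the argument.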

Henry Farrell:

yes - Scott Page has lots on this!

Andleep Farooqui:

High-quality explanation and paper.

Philip Koop:

So, when I listened to Sean Carroll's podcast interview of Francois Chollet (who, NB, is not just a computer scientist but an AI researcher who works with LLMs), I liked it so much that I said that from now on I'm just going to point people to this interview when LLMs come up. This is the first time!

The interview is long, but that is because it is comprehensive, nuanced, and accurate. If you hate podcasts, you can read the transcript, which has some errors but is still pretty good. I won't traduce Chollet by attempting to summarize him here, except to repeat one bon mot: LLMs aren't the road to AGI, they are an off-ramp on the road to AGI.

Henry Farrell:

Belatedly - Sean's conversation with Chollet is a really very good interview, which I am hoping to use a lot in class etc. as a 'here is what these technologies can do and what they can't.'
