Discussion about this post

Gerben Wierda:

Definitely not noise, this post... (sorry, could not resist), and worth spreading around.

I still have to read the paper (I definitely will; thank you for the reference).

So, food for thought, certainly. My mind is all over the place after reading this. Things like:

(1) the tools give us improved access to what the 'orthodoxy' of the training material is, which can be incredibly useful, but what if the orthodoxy ('common knowledge') is wrong (which it historically often has been)? Part of intelligence is to be able to escape that orthodoxy when that is needed. Will these 'influential' systems make innovation/change harder (just as all IT makes 'change' harder by locking us into a 'data-jail')?

(2) what happens when you crank up GenAI 'temperature'? Will the extra randomness maybe provide openings into that 'escape from orthodoxy', or will it simply lower the quality? Given that innovation comes from new ideas that need to pass the hurdle of *understanding*, I do suspect the latter, as 'understanding' is nowhere in the picture. But... (see the small temperature-scaling sketch below, after point (3))

(3) These systems provide 'information'. But human convictions are shaped more by repetition and closeness of source, and 'information' is generally a relatively weak influencer. E.g. if these systems produce 'common knowledge' (i.e. what the scientists say) about climate change, will they be able to influence the skeptics?
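For what it is worth, 'temperature' here simply rescales the model's output logits before sampling: low temperature sharpens the distribution towards the tokens the model already ranks highest, high temperature flattens it towards uniform randomness. A minimal sketch (purely illustrative; the logits and temperature values are arbitrary, nothing here comes from the paper):

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=None):
    """Sample one token index after temperature-scaling the logits.

    temperature < 1 sharpens the distribution (more deterministic);
    temperature > 1 flattens it (more random). Illustrative only.
    """
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-8)
    scaled -= scaled.max()                     # shift for numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs)

logits = [2.0, 1.0, 0.2]                       # arbitrary example logits
low = [sample_with_temperature(logits, 0.2) for _ in range(10)]
high = [sample_with_temperature(logits, 2.0) for _ in range(10)]
print(low)    # almost always index 0
print(high)   # spread across all indices: flatter, i.e. noisier
```

Which is why I suspect high temperature buys you a flatter distribution over what the model already ranks, not a mechanism for escaping the ranking itself.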

Many thoughts. Thank you.

Gerben Wierda:

After reading the paper, I do find the title quite misleading, click-bait almost. It is an interesting and good paper, but there are caveats. For instance, their use of a discrete very-mini-world (chess) limits how far the conclusions generalise (i.e. don't draw them). Or the fact that their reward function uses another neural engine (the Stockfish chess system) to value positions. Or the fact that denoising works best when your 'temperature' is low (duh: high 'temperature' more or less adds 'noise'...).
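To make the Stockfish caveat concrete: 'using Stockfish to value positions' as a reward signal would look roughly like the sketch below. This is an illustration only, not the paper's actual setup; the engine path, search depth, and the direct use of the centipawn score as the reward are all placeholders.

```python
import chess
import chess.engine

def stockfish_reward(fen, engine_path="/usr/bin/stockfish", depth=12):
    """Illustrative reward: Stockfish's evaluation of a position, in centipawns
    from the side-to-move's point of view. Not the paper's actual reward;
    engine_path and depth are placeholders."""
    board = chess.Board(fen)
    engine = chess.engine.SimpleEngine.popen_uci(engine_path)
    try:
        info = engine.analyse(board, chess.engine.Limit(depth=depth))
        # PovScore -> integer centipawns; mate scores mapped to a large finite value
        return info["score"].relative.score(mate_score=100_000)
    finally:
        engine.quit()

# Example: the starting position should evaluate close to 0
print(stockfish_reward(chess.STARTING_FEN))
```

However the paper wires it exactly, the caveat stands: the value signal the model is trained against is itself the output of another engine.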

The use of discrete environments like chess has a long history in AI research. The reason is not that we are good at them, but that we tend to find them hard, and the fact that we are *bad* at something has become our litmus test for assessing 'intelligence'. So, we find 'driving a car' pretty easy and chess hard, yet it is behaviour like 'driving a car' where human intelligence actually performs best and AI has a hard time. Remember the hype when a previous generation of technology beat Kasparov at chess, or won Jeopardy? Those are things we consider 'hard', so the belief that AGI was around the corner was everywhere.

On your question "Do the Zhang et al. results help explain why LLMs seem to be good at improving the tacit knowledge of not-so-strong employees, but less effective for really good ones?": the use of an LLM by humans is a totally different setting from the one in this paper; I would not try to connect the two.

And regarding LLM use by humans, what about the reverse? We have, for instance, seen in OpenAI's safety research on bioweapons that experts gained more from GPT-4 than beginners did; the beginners were mostly led astray because they did not recognise obviously wrong approximations by the model, and even performed worse than with internet search alone. That low-level management consultants improved more than experts (McKinsey, if I recall correctly) may above all say something about the nature of management consultancy. Just as an AI passing the Turing test might say more about the (quick-and-dirty estimation) workings of human intelligence than about the performance of the AI.

