Karpathy's talk is equal parts technically authentic and approachable: a state-of-the-union for LLMs, with some juicy nuggets even for those in the know.

https://www.youtube.com/watch?v=bZQun8Y4L2A

Two really important but little-discussed observations:

"LLMs need tokens to think"

Possibly the single most impactful and general piece of prompt-engineering advice: give the model room to spread its computation across intermediate tokens rather than demanding an answer in the first few tokens.
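A minimal sketch of what this looks like in practice. The question and prompt wordings below are illustrative, not from the talk:

```python
# Two ways to pose the same question to an LLM.
question = "A shop sells pens at 3 for $2. How much do 12 pens cost?"

# Forces the model to commit to the answer in its very first output tokens,
# leaving it no token budget for intermediate computation.
answer_first = f"{question}\nAnswer with just the number:"

# Lets the model emit intermediate reasoning tokens before the final answer,
# spreading the computation across many forward passes.
reason_first = f"{question}\nWork through the problem step by step, then give the answer:"

print(answer_first)
print(reason_first)
```

The second prompt typically yields much better accuracy on multi-step problems, for exactly the reason in the quote: each generated token buys the model another forward pass of computation.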

Vanilla RLHF and SFT improve inference quality, but reduce output entropy/diversity.

This observation isn't actually limited to LLMs!

The same behavior shows up with fine-tuned image generation models, in a visually striking way. Each row below, from top to bottom, comes from a model fine-tuned progressively harder on preference data:

[Image: grids of generated samples, with output diversity collapsing as preference fine-tuning intensifies]

This is particularly worth paying attention to in applications that generate multiple options.
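One crude but useful proxy when generating multiple options is the fraction of distinct outputs in a batch. A sketch, with made-up sample outputs:

```python
def distinct_fraction(samples):
    """Fraction of unique outputs in a batch of n sampled generations --
    a simple diversity proxy for best-of-n style applications."""
    return len(set(samples)) / len(samples)

# Hypothetical batches of 4 generations from a base vs. heavily tuned model.
diverse   = ["a red fox", "a blue bird", "a green frog", "a gold fish"]
collapsed = ["a red fox", "a red fox", "a red fox", "a red fox"]

print(distinct_fraction(diverse))    # all outputs distinct
print(distinct_fraction(collapsed))  # mode collapse: one output repeated
```

If this number drops toward 1/n after fine-tuning, showing the user n options is mostly wasted compute.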

As another example of the recsys/generation duality, the recsys community has been aware of this behavior for years. Diversity is a core consideration for most well-tuned ranking models today.