Attention? Attention!

January 13, 2026

Fig. 1. A Shiba Inu wrapped in a striped blanket, wearing a black scarf and white sweater in a snowy landscape.

Attention is, to some extent, motivated by how we pay visual attention to different regions of an image or correlate words in one sentence. Take the picture of a Shiba Inu in Fig. 1 as an example.

Human visual attention allows us to focus on a certain region with “high resolution” (i.e. look at the pointy ear in the yellow box) while perceiving the surrounding image in “low resolution” (i.e. now how about the snowy background and the outfit?), and then adjust the focal point or do the inference accordingly. Given a small patch of an image, pixels in the rest of the image provide clues about what should be displayed there. We expect to see a pointy ear in the yellow box because we have seen a dog’s nose, another pointy ear on the right, and Shiba’s mystery eyes (the stuff in the red boxes). However, the sweater and blanket at the bottom would not be as helpful as those doggy features.


I spent the last few weeks trying to get a deeper understanding of the technologies and theories that underlie modern machine learning. One tremendous source I highly recommend is “Why Machines Learn“, a fantastic book about the mathematics of machine learning. Don’t be put off by that: you can skip much of the mathematics, follow the broad ideas, and still get a lot out of it. I can’t recommend it highly enough.

I also spent some time going through a reading list from Elicit, a company I admire, which they give new hires to bring them up to speed with the broad landscape of modern machine learning.

One of the key concepts behind large language models, perhaps the key to making them work at all, is attention. While the article above is several years old now, it does a great job of giving you a sense of what attention is and how it works.
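To make the idea concrete, here is a minimal sketch of scaled dot-product attention, the core operation in transformer-based language models. This is my own illustrative NumPy version, not code from the article; the function name and toy data are assumptions. Each token produces a query that is compared against every key, the similarities are turned into a probability distribution via softmax, and that distribution weights the values:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    # How well each query matches each key, scaled to keep softmax well-behaved.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over keys: each row becomes a set of attention weights summing to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Output for each query is a weighted average of the values.
    return weights @ V, weights

# Toy example: 3 tokens with 4-dimensional embeddings; for simplicity the same
# vectors play the roles of queries, keys, and values (as in self-attention
# before any learned projections).
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(x, x, x)
print(np.round(w, 2))  # row i: how much token i attends to each token
```

The weight matrix is the payoff: just as the red boxes in Fig. 1 contribute more than the blanket when guessing what belongs in the yellow box, each row of `w` records how much each token "looks at" every other token when building its output.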