The Transformer Attention Head

See What the AI is Looking At: Visualising the Core Mechanism of LLMs.

Self-Attention Visualizer


This simulation shows the **Attention Score** (focus) each surrounding word receives when the model processes the target word.


Query, Key, and Value (The Core Concepts)

Every word in a sentence is processed through three mathematical projections: the Query (Q), the Key (K), and the Value (V). The Query is the question asked by the current word, the Keys are the labels of every word in the sentence (including the current one), and the Values carry the content each word can contribute.

The model compares the Query against every Key to determine relevance. The resulting scores are normalised (via a softmax) and used to weight the Values, effectively creating a focused summary of the entire sentence for the target word.
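
To make the three projections concrete, here is a minimal sketch of scaled dot-product attention in NumPy. The embeddings and the projection matrices `W_q`, `W_k`, `W_v` are random placeholders for illustration; in a trained model they are learned parameters.

```python
# Minimal sketch of scaled dot-product attention (illustrative, not a trained model).
import numpy as np

def attention(embeddings, W_q, W_k, W_v):
    Q = embeddings @ W_q   # one Query per word: the question it asks
    K = embeddings @ W_k   # one Key per word: the label it offers
    V = embeddings @ W_v   # one Value per word: the content it contributes

    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # relevance of every word to every word
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax -> attention scores per row
    return weights @ V                               # focused summary for each word

rng = np.random.default_rng(0)
n_words, d_model = 5, 16
embeddings = rng.normal(size=(n_words, d_model))
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))
print(attention(embeddings, W_q, W_k, W_v).shape)   # (5, 16): one context-aware vector per word
```

Each row of `weights` sums to 1, so a word's output is a blend of all the Values, dominated by whichever Keys matched its Query best.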

The Attention Score (The Focus)

The heart of the mechanism is the Attention Score. This score is a single number that measures how relevant each surrounding word is to the current target word. The score determines the intensity of the visualisation.

In this tool, when you hover over a target word, a beam of light projects to every other word. The intensity and width of the beam visually represent the calculated Attention Score—a strong beam indicates high relevance, showing the model is heavily focusing on that context word.
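
The tool's actual rendering code isn't shown here, but a mapping along these lines (a hypothetical sketch; `beam_style`, the opacity range, and `max_width` are illustrative choices) captures the idea: the softmax attention weight for each context word drives both the brightness and the width of its beam.

```python
# Hypothetical sketch: turning attention weights into beam styling for the visualiser.
def beam_style(weight, max_width=8.0):
    """Map an attention weight in [0, 1] to the visual properties of a beam."""
    return {
        "opacity": round(0.1 + 0.9 * weight, 2),          # strong focus -> bright beam
        "width": round(1.0 + (max_width - 1.0) * weight, 1),
    }

weights = [0.62, 0.21, 0.10, 0.05, 0.02]   # example softmax row for the hovered word
for word_index, w in enumerate(weights):
    print(word_index, beam_style(w))
```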

Multi-Head Attention (The Diverse View)

A full Transformer doesn’t use just one Attention Head; it uses many (Multi-Head Attention). Each head learns to look for a different type of relationship.

For example, one head might look for grammatical relationships (verbs to subjects), while another might look for semantic relationships (nouns to related adjectives). This parallel processing allows the model to build a rich, multi-faceted understanding of the input.
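
A minimal sketch of how the heads combine, reusing the `attention` function from the earlier example: each head gets its own smaller Q/K/V projections, attends independently, and the heads' outputs are concatenated and mixed by an output projection. The shapes and the matrix `W_o` here are illustrative assumptions.

```python
# Minimal sketch of multi-head attention, reusing attention() from the sketch above.
import numpy as np

def multi_head_attention(embeddings, heads, W_o):
    # Each head has its own Q/K/V projections and looks for its own kind of relationship.
    head_outputs = [attention(embeddings, W_q, W_k, W_v)
                    for (W_q, W_k, W_v) in heads]
    concatenated = np.concatenate(head_outputs, axis=-1)
    return concatenated @ W_o   # mix the heads' different views back together

rng = np.random.default_rng(1)
n_words, d_model, n_heads = 5, 16, 4
d_head = d_model // n_heads
embeddings = rng.normal(size=(n_words, d_model))
heads = [tuple(rng.normal(size=(d_model, d_head)) for _ in range(3))
         for _ in range(n_heads)]
W_o = rng.normal(size=(n_heads * d_head, d_model))
print(multi_head_attention(embeddings, heads, W_o).shape)   # (5, 16)
```

Because every head works on its own projection of the same words, the model can track grammatical and semantic relationships in parallel rather than forcing one attention pattern to do everything.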