Attention matrix showing how strongly each token attends to the tokens in the sequence. The query token's row is highlighted to show its attention weights.
5x5 attention matrix with token labels. Active query row highlighted. Cell intensity = attention weight.
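
A minimal sketch of how a matrix like the one pictured could be produced, assuming scaled dot-product attention with a row-wise softmax (the figure itself does not say how its weights were computed). The token labels, the 2-dimensional vectors, and the names `attention_weights` and `query_index` are hypothetical and chosen only for illustration.

```python
import math


def softmax(row):
    """Turn raw scores into non-negative weights that sum to 1."""
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys):
    """Return one row of attention weights per query vector."""
    scale = math.sqrt(len(keys[0]))  # sqrt of the key dimension
    matrix = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        matrix.append(softmax(scores))
    return matrix


# Hypothetical 5-token sequence with made-up 2-d vectors.
tokens = ["the", "cat", "sat", "on", "mat"]
vectors = [
    [1.0, 0.0],
    [0.5, 0.5],
    [0.0, 1.0],
    [1.0, 1.0],
    [0.2, 0.8],
]

# Using the same vectors as queries and keys gives a 5x5 matrix.
matrix = attention_weights(vectors, vectors)

# The "active query row": weights from one token to every token.
query_index = 2
for label, weight in zip(tokens, matrix[query_index]):
    print(f"{tokens[query_index]} -> {label}: {weight:.2f}")
```

Under these assumptions each row sums to 1, so the highlighted row reads as a distribution of the query token's attention over the sequence, and a cell's intensity corresponds to one entry of that row.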