Shows how a position index p is mapped to a positional-encoding vector PE(p) and injected into token representations (by addition or concatenation) so that order information survives parallel processing; contrasts this with relative schemes, which use offsets (p_i − p_j) as attention biases over an i×j attention grid.
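A minimal sketch of the two schemes being contrasted, assuming the classic sinusoidal encoding for the absolute case; the function names, sizes, and the random-embedding stand-in are illustrative, not from the source:

```python
import numpy as np

def sinusoidal_pe(num_positions, d_model):
    """Absolute scheme: map position index p to a vector PE(p).

    Uses the classic sinusoidal form PE(p, 2k) = sin(p / 10000^(2k/d)),
    PE(p, 2k+1) = cos(...); d_model is assumed even here.
    """
    p = np.arange(num_positions)[:, None]          # (P, 1) position indices
    k = np.arange(0, d_model, 2)[None, :]          # (1, d/2) even dims
    angles = p / (10000.0 ** (k / d_model))
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def relative_offsets(seq_len):
    """Relative scheme: offset matrix Δ[i, j] = p_i − p_j over the i×j grid,
    which a relative-position model would turn into attention biases."""
    p = np.arange(seq_len)
    return p[:, None] - p[None, :]

# Injection via addition: hypothetical token embeddings E plus PE(p)
# give order-aware representations X that parallel attention can use.
E = np.random.default_rng(0).normal(size=(8, 16))
X = E + sinusoidal_pe(8, 16)
```

Addition (rather than concatenation) keeps the model width fixed; the relative matrix is antisymmetric with a zero diagonal, matching Δ = p_i − p_j.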
A three-panel loop (absolute → relative → integration) runs over roughly 3.6 s. The left column renders token positions and vector bars for E(token), PE(p), and their combination; the right column renders an attention matrix whose cell intensity depends on |i−j|, with a highlighted cell (i, j) showing Δ = p_i − p_j. All geometry is grid-snapped for a blocky aesthetic; the animation is time-based, using ease() and cycling indices.
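The animation logic above can be sketched as a few pure helpers; the smoothstep ease(), the linear |i−j| falloff, and the panel-cycling function are plausible stand-ins for the unspecified implementation, not the source's actual code:

```python
def ease(t):
    """Smoothstep easing on t in [0, 1]: slow start and end, fast middle."""
    t = max(0.0, min(1.0, t))
    return t * t * (3.0 - 2.0 * t)

def cell_intensity(i, j, seq_len):
    """Attention-cell shade decaying with |i − j|: 1.0 on the diagonal,
    0.0 at the farthest corner of the i×j grid."""
    return 1.0 - abs(i - j) / max(1, seq_len - 1)

def panel_index(elapsed_s, loop_s=3.6, panels=3):
    """Time-based cycling: map wall-clock time onto the three panels
    (0 = absolute, 1 = relative, 2 = integration) over one ~3.6 s loop."""
    phase = (elapsed_s % loop_s) / loop_s
    return int(phase * panels) % panels
```

Keeping these time-based (driven by elapsed seconds rather than frame counts) makes the loop run at the same speed regardless of frame rate, which matches the "time-based using ease() and cycling indices" note.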