Scale Factors in TMEM (nvfp4)

M = 128, one MMA-K block (SF_K = 4)  ·  warpx4 broadcast R[4 : 32@TLane]

Logical SFA  (M × SF_K)

cell = SFA[m, sfk]  ·  color = m // 32 group

TMEM  (128 lanes × 16 bytes)

word 0: m 0–31 · word 1: m 32–63 · word 2: m 64–95 · word 3: m 96–127  ·  byte = sfk
Layout  SFA[m, sfk] → TMEM
TLane = m mod 32  ·  word = m div 32  ·  byte = sfk  ·  TCol = word·4 + byte  ·  warpx4 → lanes { TLane, TLane+32, TLane+64, TLane+96 }
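
The mapping above can be sketched in a few lines of Python. This is a minimal illustration rather than vendor code: the function names `sfa_to_tmem` and `build_tmem_image` are hypothetical, and the sketch assumes that TCol counts bytes within a lane's 16-byte row (so it ranges over 0..15) and that the warpx4 broadcast writes the same byte to all four lanes listed.

```python
M = 128               # rows of SFA in one tile (from the text)
SF_K = 4              # scale factors per row in one MMA-K block (from the text)
LANES_PER_GROUP = 32  # lanes covered before the warpx4 broadcast repeats
BYTES_PER_WORD = 4    # one 32-bit word holds the SF_K bytes of one row


def sfa_to_tmem(m, sfk):
    """Map SFA[m, sfk] to (TLane, TCol, broadcast lanes).

    Hypothetical helper: TCol is assumed to be a byte offset in 0..15.
    """
    if not (0 <= m < M and 0 <= sfk < SF_K):
        raise ValueError("index out of range for a 128 x 4 SFA tile")
    tlane = m % LANES_PER_GROUP          # TLane = m mod 32
    word = m // LANES_PER_GROUP          # word  = m div 32
    tcol = word * BYTES_PER_WORD + sfk   # TCol  = word*4 + byte
    # warpx4: the same byte appears at TLane, +32, +64, +96
    lanes = [tlane + LANES_PER_GROUP * w for w in range(4)]
    return tlane, tcol, lanes


def build_tmem_image(sfa):
    """Fill a 128-lane x 16-byte image from a 128 x 4 logical SFA."""
    tmem = [[None] * (4 * BYTES_PER_WORD) for _ in range(M)]
    for m in range(M):
        for sfk in range(SF_K):
            _, tcol, lanes = sfa_to_tmem(m, sfk)
            for lane in lanes:
                tmem[lane][tcol] = sfa[m][sfk]
    return tmem


if __name__ == "__main__":
    # Tag each logical cell with its own (m, sfk) so the image is easy to read.
    sfa = [[(m, k) for k in range(SF_K)] for m in range(M)]
    image = build_tmem_image(sfa)
    print(sfa_to_tmem(37, 2))   # (5, 6, [5, 37, 69, 101])
    print(image[69][6])         # (37, 2)
```

Under these assumptions each of the 128 × 16 byte positions is written exactly once, which matches the stated TMEM footprint: 512 logical scale factors, each replicated to four lanes.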