Scale Factors in TMEM (nvfp4)
M = 128, one MMA-K block (SF_K = 4 scale factors per row) · warpx4 broadcast R[4 : 32@TLane]
Logical SFA (M × SF_K)
cell = SFA[m, sfk] · color = row group m // 32
TMEM (128 lanes × 16 bytes)
w0: m 0–31 · w1: m 32–63 · w2: m 64–95 · w3: m 96–127 · byte = sfk
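The legends can be read as a mapping from a logical coordinate SFA[m, sfk] to a TMEM position. The sketch below is a minimal illustration that encodes only what the figure states. Warp quadrant w = m // 32 covers rows 32w to 32w + 31, which it assumes means row m lives in TMEM lane m, and the byte within that lane is sfk. Separately, it reads R[4 : 32@TLane] as a CuTe-style layout with shape 4 and stride 32 along the TMEM lane mode, i.e. one source lane replicated into four lanes spaced 32 apart. The helper names `sfa_to_tmem` and `broadcast_lanes` and the lane = m interpretation are hypothetical, not CUTLASS or PTX API. The sketch does not model the remaining bytes of each 16-byte lane row or how the broadcast combines with the per-warp row assignment.

```python
from typing import List, Tuple

M = 128              # rows of SFA, as in the figure
SF_K = 4             # scale factors per row in one MMA-K block
LANES_PER_WARP = 32  # each warp quadrant spans 32 TMEM lanes


def sfa_to_tmem(m: int, sfk: int) -> Tuple[int, int, int]:
    """Hypothetical helper: map SFA[m, sfk] to (warp quadrant, TMEM lane, byte).

    Assumption: the legend 'w0: m 0-31 ... byte = sfk' means row m sits in
    TMEM lane m and scale factor sfk sits at byte sfk of that lane.
    """
    if not (0 <= m < M and 0 <= sfk < SF_K):
        raise IndexError(f"SFA[{m}, {sfk}] is outside {M} x {SF_K}")
    warp = m // LANES_PER_WARP  # same value as the figure's color group m // 32
    lane = m
    byte = sfk
    return warp, lane, byte


def broadcast_lanes(src_lane: int, shape: int = 4, stride: int = 32) -> List[int]:
    """Hypothetical helper: lanes covered by R[shape : stride@TLane].

    Assumption: the notation means 'shape' copies of a source lane in the
    first 32-lane quadrant, spaced 'stride' lanes apart.
    """
    if not 0 <= src_lane < LANES_PER_WARP:
        raise IndexError("source lane must be in 0..31")
    return [src_lane + i * stride for i in range(shape)]


if __name__ == "__main__":
    print(sfa_to_tmem(70, 2))   # (2, 70, 2)
    print(broadcast_lanes(5))   # [5, 37, 69, 101]
    # Under this reading every (m, sfk) pair lands in a distinct (lane, byte) slot.
    slots = {sfa_to_tmem(m, k)[1:] for m in range(M) for k in range(SF_K)}
    print(len(slots))           # 512
```

Taking lane = m makes the warp quadrant and the color group in the logical view the same quantity, m // 32, which is why the two panels share one color scheme.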