Use parallelism more effectively to massively cover buffer with multiple full threadgroups, instead of serial looping in shader. Performance improvement measured at 150x (yes...x not %) on macOS. MVKCmdFillBuffer move validation test to setContent() instead of encode().