Use parallelism more effectively to massively cover buffer with
multiple full threadgroups, instead of serial looping in shader.
Performance improvement measured at 150x (yes...x not %) on macOS.
MVKCmdFillBuffer move validation test to setContent() instead of encode().