The current code does not handle multiple subpasses, nor does it handle
secondary command buffers. Handling subpasses was easy enough. The
problem came with secondary command buffers. Tracking them became
extremely complicated, particularly since pipelines may be set either
inside or outside a render pass, and further, a pipeline set in one
buffer might be used in another.
I then realized a simpler and more elegant solution: Metal's deferred
store actions feature. This allows you to defer setting the store action
for a render pass until encoding time. This is exactly what we need,
since we won't know what store action we actually want until we start
encoding draws. This solution should now work with multiple subpasses
and secondary command buffers, with much less code.
It's a hard validation error to do so. We originally had a check for
this, but #988 erroneously removed it completely, when the removal
should have been limited to `renderTargetArrayLength`.
Fixes #991.
This should hopefully reduce underutilization of the GPU, especially on
GPUs where the thread execution width is greater than the number of
control points.
This also eliminates the extra invocations previously needed to read the
varyings from the vertex shader into the tessellation shader. The number
of threads per workgroup is now lcm(SIMD-size, output control points).
This should ensure we always process a whole number of patches per
workgroup, and further reduce underutilization of the GPU's SIMD units.
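As a rough illustration of the workgroup sizing described above, here is a minimal sketch. The helper names `threadsPerWorkgroup` and `patchesPerWorkgroup` are hypothetical and are not taken from the actual code. It assumes C++17 for `std::lcm` and ignores any device limit on threads per workgroup.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>  // std::lcm (C++17)

// Hypothetical helper (assumption, not the real implementation):
// threads per workgroup = lcm(SIMD width, output control points per patch).
inline uint32_t threadsPerWorkgroup(uint32_t simdWidth, uint32_t outputControlPoints) {
    assert(simdWidth > 0 && outputControlPoints > 0);
    return std::lcm(simdWidth, outputControlPoints);
}

// Number of whole patches one workgroup of that size processes.
// The lcm is a multiple of outputControlPoints, so the division is exact.
inline uint32_t patchesPerWorkgroup(uint32_t simdWidth, uint32_t outputControlPoints) {
    return threadsPerWorkgroup(simdWidth, outputControlPoints) / outputControlPoints;
}
```

For example, with a SIMD width of 32 and 3 output control points, this gives 96 threads covering exactly 32 patches, and 96 is also a whole number of SIMD groups, so no lane is left idle by a partial patch.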
To avoid complexity handling indices in the tessellation control shader,
I've also changed the way vertex shaders for tessellation are handled.
They are now compute kernels using Metal's support for vertex-style
stage input. This lets us always emit vertices into the buffer in order
of vertex shader execution. Now we no longer have to deal with indexing
in the tessellation control shader, nor do we always have to duplicate
the index buffer to insert gaps. This also fixes a long-standing issue
where if an index were greater than the number of vertices to draw, the
vertex shader would wind up writing outside the buffer, and the vertex
would be lost.
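The difference between the two output layouts can be sketched on the CPU as follows. This is only an illustrative model, not the actual shader code: `emitByIndex` and `emitInOrder` are hypothetical names, and each "vertex" is represented simply by its index value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Old layout as described above: the output slot is the vertex index.
// Returns false if any vertex would have been written outside the buffer.
// (The bounds check exists only to keep this model safe to run.)
inline bool emitByIndex(const std::vector<uint32_t>& indices, std::vector<uint32_t>& out) {
    bool allInBounds = true;
    for (uint32_t idx : indices) {
        if (idx >= out.size()) {
            allInBounds = false;  // this vertex is lost
            continue;
        }
        out[idx] = idx;
    }
    return allInBounds;
}

// New layout: the output slot is the order of execution, so a buffer with
// one slot per drawn vertex is always large enough, whatever the indices are.
inline void emitInOrder(const std::vector<uint32_t>& indices, std::vector<uint32_t>& out) {
    assert(out.size() >= indices.size());
    for (std::size_t i = 0; i < indices.size(); ++i) {
        out[i] = indices[i];
    }
}
```

With indices `{0, 7, 2}` and a three-slot buffer, `emitByIndex` reports a lost vertex because index 7 falls outside the buffer, while `emitInOrder` fills the slots with 0, 7, 2 in that order.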
Since on macOS textures cannot be resident in shared (host-coherent) memory,
they need to be flushed before making the copy, to ensure that the modified
data is transferred.
fetchDependencies support option to skip all library builds.
fetchDependencies avoid sync locks if not building in parallel.
fetchDependencies build glslang headers.
Update ExternalRevisions/README.md glslang build integration section.
Update What's New.
SPIRV-Cross can now AND the `gl_SampleMask` output with an additional
fixed mask, presumably from the pipeline. Use this new functionality to
implement pipeline sample mask handling.
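The masking itself amounts to a bitwise AND. A minimal sketch of the arithmetic follows; the function name `effectiveSampleMask` is hypothetical, and the sketch assumes a single 32-bit mask word.

```cpp
#include <cstdint>

// Hypothetical helper (assumption, not SPIRV-Cross or MoltenVK code):
// a sample stays enabled only if both the shader's gl_SampleMask output
// and the fixed pipeline mask enable it.
inline uint32_t effectiveSampleMask(uint32_t shaderSampleMask, uint32_t pipelineSampleMask) {
    return shaderSampleMask & pipelineSampleMask;
}
```

For instance, a shader mask of `0xF` combined with a pipeline mask of `0x5` leaves only samples 0 and 2 enabled, and a pipeline mask of all ones leaves the shader's mask unchanged.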
Special thanks to Tomek Pontika and Corentin Wallez of Google for
graciously contributing their implementation to SPIRV-Cross.
Update SPIRV-Cross to pull in the change necessary for this.
This extension provides weaker guarantees than `VK_EXT_robustness2` and
its `robustImageAccess2` feature. Metal easily meets those guarantees,
with no action on our part necessary.
MVKShaderLibrary::getMTLFunction() synchronize and refactor release of Metal objects.
Make use of existing autorelease pool instead of discrete retain/release.
Wrap entire specialization operation in @synchronized() to guard against
Metal internals not coping with multiple simultaneous specializations.