The current code handles neither multiple subpasses nor secondary command
buffers. Handling subpasses was easy enough; the problem came with secondary
command buffers. Tracking them became extremely complicated, particularly
since pipelines may be bound either inside or outside a render pass, and a
pipeline bound in one command buffer might be used in another.
I then realized there was a simpler and more elegant solution: Metal's
deferred store actions feature. This lets us defer setting the store action
for a render pass attachment until encoding time, which is exactly what we
need, since we won't know which store action we actually want until we start
encoding draws. This solution should now work with multiple subpasses and
secondary command buffers, with much less code.
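For reference, a minimal sketch of what deferred store actions look like at the Metal level (the texture and command buffer names are assumed; this is not MoltenVK's actual code):

```objc
#import <Metal/Metal.h>

// Mark the store action as unknown when building the render pass descriptor.
MTLRenderPassDescriptor* rpDesc = [MTLRenderPassDescriptor renderPassDescriptor];
rpDesc.colorAttachments[0].texture = colorTex;                   // assumed texture
rpDesc.colorAttachments[0].loadAction = MTLLoadActionClear;
rpDesc.colorAttachments[0].storeAction = MTLStoreActionUnknown;  // defer the decision

id<MTLRenderCommandEncoder> enc = [cmdBuf renderCommandEncoderWithDescriptor: rpDesc];
// ... encode draws; by now we know whether the attachment must be preserved ...

// Resolve the deferred action at any point before ending encoding.
[enc setColorStoreAction: MTLStoreActionStore atIndex: 0];
[enc endEncoding];
```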
It's a hard validation error to do so. We originally had a check for this,
but in #988 it was erroneously removed entirely, instead of being limited to
`renderTargetArrayLength`.
Fixes #991.
This should reduce underutilization of the GPU, especially on GPUs where the
thread execution width is greater than the number of control points.
This also eliminates the extra invocations previously needed to read the
varyings from the vertex shader into the tessellation shader. The number
of threads per workgroup is now lcm(SIMD-size, output control points).
This should ensure we always process a whole number of patches per
workgroup, and further reduce underutilization of the GPU's SIMD units.
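To make the sizing concrete, here is a hypothetical helper (not code from this change) showing the math: with a SIMD width of 32 and 3 output control points, lcm(32, 3) = 96 threads per workgroup, i.e. exactly 32 whole patches.

```objc
#include <stdint.h>

static uint32_t gcd(uint32_t a, uint32_t b) {
    while (b != 0) { uint32_t t = b; b = a % b; a = t; }
    return a;
}

// lcm(SIMD width, output control points): a whole number of patches per
// workgroup, with no patch straddling a workgroup boundary.
static uint32_t threadsPerTCWorkgroup(uint32_t simdWidth, uint32_t outputControlPoints) {
    return simdWidth / gcd(simdWidth, outputControlPoints) * outputControlPoints;
}

// threadsPerTCWorkgroup(32, 3) == 96, which is 96 / 3 = 32 patches per workgroup.
```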
To avoid the complexity of handling indices in the tessellation control
shader, I've also changed how vertex shaders for tessellation are executed.
They are now compute kernels using Metal's support for vertex-style
stage input. This lets us always emit vertices into the buffer in order
of vertex shader execution. Now we no longer have to deal with indexing
in the tessellation control shader, nor do we always have to duplicate
the index buffer to insert gaps. This also fixes a long-standing issue
where if an index were greater than the number of vertices to draw, the
vertex shader would wind up writing outside the buffer, and the vertex
would be lost.
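At the Metal level, this relies on MTLComputePipelineDescriptor's stage-input support. A rough sketch, with assumed attribute and buffer indices rather than MoltenVK's actual layout:

```objc
#import <Metal/Metal.h>

// Build a compute pipeline whose kernel reads vertex-style stage input.
MTLComputePipelineDescriptor* plDesc = [MTLComputePipelineDescriptor new];
plDesc.computeFunction = vtxCompFunc;   // the vertex shader compiled as a kernel (assumed)

MTLStageInputOutputDescriptor* sioDesc = [MTLStageInputOutputDescriptor stageInputOutputDescriptor];
sioDesc.attributes[0].format = MTLAttributeFormatFloat4;
sioDesc.attributes[0].offset = 0;
sioDesc.attributes[0].bufferIndex = 0;
sioDesc.layouts[0].stride = sizeof(float) * 4;
sioDesc.layouts[0].stepFunction = MTLStepFunctionThreadPositionInGridX;
// For indexed draws, Metal fetches through the index buffer for us, while the
// kernel still writes its output at its own thread position in the grid.
sioDesc.indexType = MTLIndexTypeUInt32;
sioDesc.indexBufferIndex = 1;
plDesc.stageInputDescriptor = sioDesc;

NSError* err = nil;
id<MTLComputePipelineState> pso =
    [device newComputePipelineStateWithDescriptor: plDesc   // assumed MTLDevice
                                          options: MTLPipelineOptionNone
                                       reflection: nil
                                            error: &err];
```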
Since textures on macOS cannot be resident in shared (host-coherent) memory,
the memory backing them must be flushed before the copy is made, to ensure
that the modified data is transferred.
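One plausible shape of that flush, assuming the memory is backed by a managed-mode MTLBuffer (illustrative only, not the exact MoltenVK code):

```objc
#import <Metal/Metal.h>

// Managed resources on macOS keep separate CPU and GPU copies, so CPU-side
// writes must be flushed before the copy command reads the memory.
if (mtlBuffer.storageMode == MTLStorageModeManaged) {
    [mtlBuffer didModifyRange: NSMakeRange(0, mtlBuffer.length)];
}
```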
SPIRV-Cross can now AND the `gl_SampleMask` output with an additional
fixed mask, presumably from the pipeline. Use this new functionality to
implement pipeline sample mask handling.
Special thanks to Tomek Pontika and Corentin Wallez of Google for
graciously contributing their implementation to SPIRV-Cross.
Update SPIRV-Cross to pull in the change necessary for this.
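If I recall the SPIRV-Cross MSL options correctly, wiring this up looks roughly like the following (MoltenVK's shader converter is Objective-C++, so the C++ API is used directly; the surrounding names are assumptions):

```objc
#include <spirv_msl.hpp>

// spirvWords is the SPIR-V binary; pipelineSampleMask comes from
// VkPipelineMultisampleStateCreateInfo::pSampleMask (both assumed here).
spirv_cross::CompilerMSL compiler(std::move(spirvWords));
auto mslOpts = compiler.get_msl_options();
mslOpts.additional_fixed_sample_mask = pipelineSampleMask;
compiler.set_msl_options(mslOpts);

// The emitted MSL now ANDs the shader's gl_SampleMask output with the fixed mask.
std::string msl = compiler.compile();
```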
This extension provides weaker guarantees than `VK_EXT_robustness2` and
its `robustImageAccess2` feature. Metal easily meets those guarantees,
with no action on our part necessary.
MVKShaderLibrary::getMTLFunction(): synchronize and refactor release of Metal objects.
Make use of the existing autorelease pool instead of discrete retain/release.
Wrap the entire specialization operation in @synchronized() to guard against
Metal internals not coping with multiple simultaneous specializations.
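In outline, the guarded path looks something like this (signature simplified and member names assumed; a sketch of the shape of the fix, not the code verbatim):

```objc
id<MTLFunction> MVKShaderLibrary::getMTLFunction(MTLFunctionConstantValues* constantValues) {
    // Serialize specialization: Metal's internals may not cope with multiple
    // simultaneous specializations against the same library.
    @synchronized (_mtlLibrary) {
        NSError* err = nil;
        id<MTLFunction> mtlFunc = [_mtlLibrary newFunctionWithName: _functionName
                                                    constantValues: constantValues
                                                             error: &err];
        // Hand the new object to the enclosing autorelease pool instead of
        // pairing a discrete retain/release around the caller's use of it.
        return [mtlFunc autorelease];
    }
}
```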
Some things I missed in my review.
These formats use implicit reconstruction, and we don't support explicit
reconstruction here yet. If there's a demand for it, I or someone else
can implement it, but I don't expect it to be an issue.
Because reconstruction is implicit, the chroma sample location is determined
by the implementation, i.e. Metal. My testing shows that Metal on AMD and
Intel GPUs uses midpoint reconstruction; based on this, I expect NVIDIA to
be the same.
Finally, according to the Vulkan spec:
> Implicit reconstruction takes place by the samples being interpolated,
> as required by the filter settings of the sampler, *except that
> `chromaFilter` takes precedence for the chroma samples*. [emphasis
> added]
Since Metal obviously uses the same filters for chroma and luma, we
can't support the separate reconstruction filter feature here.
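For illustration, this is what those constraints mean for a conversion created against one of these formats (a hedged example; the field values are just one valid combination):

```objc
#include <vulkan/vulkan.h>

VkSamplerYcbcrConversionCreateInfo conversionInfo = {};
conversionInfo.sType = VK_STRUCTURE_TYPE_SAMPLER_YCBCR_CONVERSION_CREATE_INFO;
conversionInfo.format = VK_FORMAT_G8_B8R8_2PLANE_420_UNORM;
conversionInfo.ycbcrModel = VK_SAMPLER_YCBCR_MODEL_CONVERSION_YCBCR_709;
conversionInfo.ycbcrRange = VK_SAMPLER_YCBCR_RANGE_ITU_NARROW;
// Implicit reconstruction: Metal resolves the chroma sample location itself
// (midpoint, per the testing above), and explicit reconstruction cannot be forced.
conversionInfo.xChromaOffset = VK_CHROMA_LOCATION_MIDPOINT;
conversionInfo.yChromaOffset = VK_CHROMA_LOCATION_MIDPOINT;
conversionInfo.forceExplicitReconstruction = VK_FALSE;
// Since the separate reconstruction filter feature is not exposed, chromaFilter
// must match the sampler's minFilter/magFilter.
conversionInfo.chromaFilter = VK_FILTER_LINEAR;
```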