This function was introduced with protected memory. Since we don't
support that, right now it does nothing that `vkGetDeviceQueue()` did
not already do. Despite that, I've added a method to `MVKDevice`,
because this is an extensible function analogous to e.g.
`vkGetPhysicalDeviceFeatures2()`.
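For context, assuming the function in question is `vkGetDeviceQueue2()`, here is a minimal sketch of how it differs from `vkGetDeviceQueue()`: the queue is identified through an extensible structure rather than bare parameters, which is what makes a dedicated `MVKDevice` method worth having.

```cpp
#include <vulkan/vulkan.h>

// Illustrative sketch only. The queue is described by an extensible struct,
// so future extensions can chain additional information through pNext.
VkQueue getQueue(VkDevice device, uint32_t familyIndex, uint32_t queueIndex) {
    VkDeviceQueueInfo2 queueInfo = {};
    queueInfo.sType            = VK_STRUCTURE_TYPE_DEVICE_QUEUE_INFO_2;
    queueInfo.flags            = 0;   // no protected-memory flag, since that's unsupported
    queueInfo.queueFamilyIndex = familyIndex;
    queueInfo.queueIndex       = queueIndex;

    VkQueue queue = VK_NULL_HANDLE;
    vkGetDeviceQueue2(device, &queueInfo, &queue);
    return queue;
}
```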
The functions are now defined under their core names. To avoid code
bloat, I've defined the suffixed names as aliases of the core names.
Both symbols will be globally defined with the same value, and in the
dylib both will be exported.
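A hypothetical sketch of one way to set up such an alias on Mach-O (the macro MoltenVK actually uses may differ). Since `__attribute__((alias))` is unavailable on Darwin, the suffixed name is bound to the core symbol at the assembly level, so both names are exported with the same address and no wrapper body is emitted:

```cpp
#include <vulkan/vulkan.h>

// Core implementation, defined under the core name.
extern "C" void vkGetPhysicalDeviceFeatures2(VkPhysicalDevice physicalDevice,
                                             VkPhysicalDeviceFeatures2* pFeatures) {
    // ... core implementation ...
}

// Hypothetical alias: make the suffixed name another global symbol for the
// same code address, so the dylib exports both.
asm(".globl _vkGetPhysicalDeviceFeatures2KHR\n"
    "_vkGetPhysicalDeviceFeatures2KHR = _vkGetPhysicalDeviceFeatures2\n");
```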
Fix the default API version when none is given. Zero is the same as
`VK_API_VERSION_1_0`. Prior to this, we were overwriting it with zero if
no app info were given, or if it were zero in the app info. It wasn't
important before, but now that we gate API availability on maximum
Vulkan version, we need to make sure it's a valid version.
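A minimal sketch of the intended behaviour, with illustrative names rather than MoltenVK's actual members:

```cpp
#include <vulkan/vulkan.h>

// Treat a missing or zero apiVersion as VK_API_VERSION_1_0, so later code
// that gates API availability on the maximum Vulkan version always sees a
// valid version number.
static uint32_t effectiveAPIVersion(const VkInstanceCreateInfo* pCreateInfo) {
    const VkApplicationInfo* pAppInfo = pCreateInfo->pApplicationInfo;
    if (pAppInfo && pAppInfo->apiVersion) { return pAppInfo->apiVersion; }
    return VK_API_VERSION_1_0;
}
```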
This is also just a non-functional base for future extensions. We can't implement it
anyway until all remaining bugs in `MTLEvent`-based semaphores are
fixed.
This is the last of the extensions that was promoted to core for Vulkan
1.1. We're almost there!
As with `VK_KHR_device_group` and `VK_KHR_external_memory`, this just
adds the groundwork needed to support future extensions; it provides no
actual support for external fences.
We should be able to easily support `VK_KHR_external_fence_fd`, by using
a POSIX semaphore. Since the fence FDs produced by that extension are
opaque, only supporting `close(2)` and `dup(2)`, we shouldn't have to
worry about portable programs poking the FD in weird ways. Hopefully.
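For context only, here is a sketch of the app-facing side of `VK_KHR_external_fence_fd`; none of it works today, the extension functions would need to be resolved with `vkGetDeviceProcAddr()` in real code, and the POSIX-semaphore backing is just an implementation idea.

```cpp
#include <unistd.h>            // close(2), dup(2)
#include <vulkan/vulkan.h>

// Export a fence as an opaque FD and hand back a duplicate. The FD supports
// only dup(2), passing to another process, and close(2).
int exportFenceFd(VkDevice device, VkFence fence,
                  PFN_vkGetFenceFdKHR pfnGetFenceFdKHR) {
    VkFenceGetFdInfoKHR getFdInfo = {};
    getFdInfo.sType      = VK_STRUCTURE_TYPE_FENCE_GET_FD_INFO_KHR;
    getFdInfo.fence      = fence;   // the fence must have been created exportable
    getFdInfo.handleType = VK_EXTERNAL_FENCE_HANDLE_TYPE_OPAQUE_FD_BIT;

    int fd = -1;
    if (pfnGetFenceFdKHR(device, &getFdInfo, &fd) != VK_SUCCESS) { return -1; }

    int dupFd = dup(fd);   // duplicating is allowed; poking the FD is not
    close(fd);
    return dupFd;
}
```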
Other types of external fences we might support include GCD semaphores
(`dispatch_semaphore_t`) and Mach semaphores (`semaphore_t`). I really
think we want support for GCD semaphores, because that's the most likely
object we're going to see passed between processes on Darwin given GCD's
built-in support for XPC.
I have deliberately omitted mention of these extensions from the user
guide. `VK_KHR_external_memory` was not mentioned there, presumably
because no external memory types are actually supported.
Also, add missing `vkGetInstanceProcAddr()` entry for
`vkGetPhysicalDeviceExternalBufferPropertiesKHR()`. We have the
function, and we export the extension's name string. We might as well
make it available via `vkGetInstanceProcAddr()`.
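With the entry in place, applications can resolve the function through the loader in the usual way, e.g.:

```cpp
#include <vulkan/vulkan.h>

// Illustrative lookup of the newly exposed entry point.
PFN_vkGetPhysicalDeviceExternalBufferPropertiesKHR
lookupExtBufferProps(VkInstance instance) {
    return (PFN_vkGetPhysicalDeviceExternalBufferPropertiesKHR)
        vkGetInstanceProcAddr(instance,
                              "vkGetPhysicalDeviceExternalBufferPropertiesKHR");
}
```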
Originally, Metal did not support this directly, and still largely
doesn't on GPUs other than Apple family 6. Therefore, this
implementation uses vertex instancing to draw the needed views. To
support the Vulkan requirement that only the layers for the enabled
views are loaded and stored in a multiview render pass, this
implementation uses multiple Metal render passes for multiple "clumps"
of enabled views.
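As an illustration of what "clumps" means here (not MoltenVK's actual code), a view mask can be split into runs of consecutive enabled views, each of which maps to one Metal render pass, so layers belonging to disabled views are never loaded or stored:

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Split a Vulkan view mask into runs of consecutive enabled views.
// Each run {firstView, viewCount} is rendered with one Metal render pass,
// using instancing to cover its views.
static std::vector<std::pair<uint32_t, uint32_t>> clumpViewMask(uint32_t viewMask) {
    std::vector<std::pair<uint32_t, uint32_t>> clumps;
    for (uint32_t view = 0; view < 32; ++view) {
        if (!(viewMask & (1u << view))) continue;
        if (!clumps.empty() && clumps.back().first + clumps.back().second == view) {
            ++clumps.back().second;          // extend the current run
        } else {
            clumps.push_back({view, 1});     // start a new run
        }
    }
    return clumps;
}

// e.g. viewMask = 0b1101 -> {0,1} and {2,2}: two Metal render passes.
```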
For indirect draws, as with tessellation, we must adjust the draw
parameters at execution time to account for the extra views, so we need
to use deferred store actions here. Without them, tracking the state
becomes too involved.
If the implementation doesn't support either layered rendering or
deferred store actions, multiview render passes are instead unrolled and
rendered one view at a time. This will enable us to support the
extension even on older devices and OSes, but at the cost of additional
command buffer memory and (possibly) worse performance.
Eventually, we should consider using vertex amplification to accelerate
this, particularly since indirect multiview draws are expensive, currently
requiring a compute pass just to adjust the instance count, and instanced
drawing itself already performs poorly. But since vertex amplification on
family 6 only supports two views, while `VK_KHR_multiview` mandates a
minimum of 6, we'll still need to use instancing to support more than two
views.
I have tested this extensively against the CTS. I'm very confident in
its correctness. The only failing tests are
`dEQP-VK.multiview.queries.*`, due to our inadequate implementation of
timestamp queries; and `dEQP-VK.multiview.depth.*`, due to what I assume
is a bug in the way Metal handles arrayed packed depth/stencil textures,
which may only be a problem on Mojave. I need to test this on
Catalina and Big Sur.
Update SPIRV-Cross to pull in some fixes necessary for this to work.
Fixes #347.
Delegate to `MVKImage`'s `getMTLTexture()` method, so that the overridden
version is used for swapchain images.
Also, the `_mtlTextureViews` member should be cleared, to prevent accidental
reuse of released views.
Nothing will be drawn in that case. Nothing would've been drawn anyway,
but Metal's validation layer complains if you issue a draw command with
zero vertices or instances.
We unfortunately cannot do anything about indirect draws, since we won't
know how many vertices to draw until execution time.
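A sketch of the guard, with illustrative names:

```cpp
#include <cstdint>

// Skip encoding the Metal draw when it cannot produce any work; this is also
// what keeps Metal's validation layer quiet. Indirect draws can't be guarded
// this way, since their counts live in a GPU buffer and aren't known until
// execution.
void encodeDraw(uint32_t vertexCount, uint32_t instanceCount /*, ... */) {
    if (vertexCount == 0 || instanceCount == 0) { return; }   // nothing to draw
    // ... encode the Metal draw call ...
}
```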
Prior to this, we were assuming one layer per clear rect. When that
assumption was broken, we wound up smashing the stack. This will be
more important once multiview support lands, since all views must be
cleared by the clear command.
The attachment index in the `VkClearAttachment` struct is an index into
the current subpass attachment array, not the render pass attachment
array; so it's not enough to check `VkClearAttachment` for
`VK_ATTACHMENT_UNUSED`. We also need to check the current subpass.
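A sketch of the required lookup, using illustrative names:

```cpp
#include <vulkan/vulkan.h>

// Translate the subpass-relative color attachment index in VkClearAttachment
// to the render pass attachment index, and skip the clear if either level
// marks the attachment as unused.
bool shouldClearColor(const VkSubpassDescription& subpass,
                      const VkClearAttachment& clear) {
    if (!(clear.aspectMask & VK_IMAGE_ASPECT_COLOR_BIT)) return false;
    if (clear.colorAttachment == VK_ATTACHMENT_UNUSED) return false;
    if (clear.colorAttachment >= subpass.colorAttachmentCount) return false;
    // The index refers to the subpass's color attachment list,
    // not the render pass attachment list.
    uint32_t rpAttIdx = subpass.pColorAttachments[clear.colorAttachment].attachment;
    return rpAttIdx != VK_ATTACHMENT_UNUSED;
}
```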
If no attachments can be cleared, don't encode a command to the Metal
command buffer.
Perhaps I should bring back `recordBeginRenderPass()` and
`recordEndRenderPass()` and keep track of the current subpass; then we
can avoid recording the command entirely if the intersection of the sets
"attachments to clear" and "used attachments" is null.
The current code does not handle multiple subpasses, nor does it handle
secondary command buffers. Handling subpasses was easy enough. The
problem came with secondary command buffers. Tracking them became
extremely complicated, particularly since pipelines may be set either
inside or outside a render pass, and further, a pipeline set in one
buffer might be used in another.
I then realized a simpler and more elegant solution: Metal's deferred
store actions feature. This allows you to defer setting the store action
for a render pass until encoding time. This is exactly what we need,
since we won't know what store action we actually want until we start
encoding draws. This solution should now work with multiple subpasses
and secondary command buffers, with much less code.
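A minimal sketch of the Metal feature in isolation (not MoltenVK's actual code):

```objc
#import <Metal/Metal.h>

// Create the pass with an unknown store action, then decide the real action
// while encoding, once the draws tell us whether the attachment must be kept.
static void encodePass(id<MTLCommandBuffer> cmdBuf, id<MTLTexture> colorTex) {
    MTLRenderPassDescriptor* rpDesc = [MTLRenderPassDescriptor renderPassDescriptor];
    rpDesc.colorAttachments[0].texture     = colorTex;
    rpDesc.colorAttachments[0].loadAction  = MTLLoadActionClear;
    rpDesc.colorAttachments[0].storeAction = MTLStoreActionUnknown;   // defer the decision

    id<MTLRenderCommandEncoder> enc = [cmdBuf renderCommandEncoderWithDescriptor: rpDesc];
    // ... encode draws; only now do we know which store action we actually want ...
    [enc setColorStoreAction: MTLStoreActionStore atIndex: 0];   // must be set before endEncoding
    [enc endEncoding];
}
```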
It's a hard validation error to do so. We originally had a check for
this, but it was erroneously removed entirely in #988, instead of merely
being limited to `renderTargetArrayLength`.
Fixes #991.
This should hopefully reduce underutilization of the GPU, especially on
GPUs where the thread execution width is greater than the number of
control points.
This also eliminates the extra invocations previously needed to read the
varyings from the vertex shader into the tessellation shader. The number
of threads per workgroup is now lcm(SIMD-size, output control points).
This should ensure we always process a whole number of patches per
workgroup, and further reduce underutilization of the GPU's SIMD units.
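A sketch of that sizing rule, with illustrative names:

```cpp
#include <cstdint>
#include <numeric>   // std::lcm (C++17)

// Choose a threadgroup size that is a multiple of both the SIMD width and the
// patch's output control point count, so each threadgroup handles whole patches.
static uint32_t tessControlThreadsPerGroup(uint32_t simdWidth,
                                           uint32_t outputControlPoints) {
    return std::lcm(simdWidth, outputControlPoints);
}

// e.g. simdWidth = 32, outputControlPoints = 3 -> 96 threads,
//      i.e. 32 whole patches per threadgroup.
```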
To avoid the complexity of handling indices in the tessellation control shader,
I've also changed the way vertex shaders for tessellation are handled.
They are now compute kernels using Metal's support for vertex-style
stage input. This lets us always emit vertices into the buffer in order
of vertex shader execution. Now we no longer have to deal with indexing
in the tessellation control shader, nor do we always have to duplicate
the index buffer to insert gaps. This also fixes a long-standing issue
where if an index were greater than the number of vertices to draw, the
vertex shader would wind up writing outside the buffer, and the vertex
would be lost.
Since on macOS textures cannot be resident in shared (host-coherent) memory,
they need to be flushed before making the copy, to ensure that the modified
data is transferred.
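An illustrative sketch of such a flush (the actual MoltenVK path may differ): before encoding the copy, push the modified host-coherent bytes into the `MTLTexture`, since on macOS the texture itself cannot live in shared memory.

```objc
#import <Metal/Metal.h>

// Copy the host-visible bytes for one 2D mip level into the texture.
static void flushHostBytesToTexture(id<MTLTexture> tex, const void* hostBytes,
                                    NSUInteger bytesPerRow,
                                    NSUInteger width, NSUInteger height) {
    [tex replaceRegion: MTLRegionMake2D(0, 0, width, height)
           mipmapLevel: 0
             withBytes: hostBytes
           bytesPerRow: bytesPerRow];
}
```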
`fetchDependencies`: add an option to skip all library builds.
`fetchDependencies`: avoid sync locks when not building in parallel.
`fetchDependencies`: build the glslang headers.
Update the glslang build integration section of `ExternalRevisions/README.md`.
Update the What's New document.