Needed to complete fix for #1874
We can't wait until getMTLComputeEncoder() is called to dirty the state,
because this call will be avoided by dirty checks themselves.
Those checks are comparing against leftover and now incorrect state since
the previous encoder has already ended.
It needs to be dirtied on encoder end.
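
A minimal sketch of the failure mode described above, under stated assumptions: `FakeEncoder`, `CachedComputeState`, and `bindsOnSecondEncoder()` are hypothetical stand-ins invented for illustration, not MoltenVK classes. It shows why the cached state has to be dirtied when the encoder ends rather than when the next one is requested.

```cpp
#include <cstdint>

// Hypothetical stand-in for a Metal compute encoder; it only counts bindings.
struct FakeEncoder {
    int pipelineBinds = 0;
};

// Hypothetical cached-state tracker following the dirty-check pattern.
class CachedComputeState {
public:
    // Dirty check: mark dirty only if the requested pipeline differs from the cached one.
    void setPipeline(uint32_t pipelineId) {
        if (pipelineId == _pipelineId) { return; }
        _pipelineId = pipelineId;
        _isDirty = true;
    }

    // Bind to the encoder only when the state is dirty.
    void encode(FakeEncoder& enc) {
        if (!_isDirty) { return; }
        enc.pipelineBinds++;
        _isDirty = false;
    }

    // Called when the encoder ends. The next encoder starts with nothing bound,
    // so the cached state must be re-encoded even though it has not changed.
    void encoderDidEnd() { _isDirty = true; }

private:
    uint32_t _pipelineId = 0;
    bool _isDirty = false;
};

// Returns how many binds the second encoder receives when the same pipeline is reused.
int bindsOnSecondEncoder(bool dirtyOnEncoderEnd) {
    CachedComputeState state;
    FakeEncoder first, second;

    state.setPipeline(7);
    state.encode(first);
    if (dirtyOnEncoderEnd) { state.encoderDidEnd(); }

    state.setPipeline(7);   // Same pipeline, so the dirty check skips it.
    state.encode(second);
    return second.pipelineBinds;
}
```

Without the call on encoder end, the second encoder never receives the pipeline, because the dirty check compares against state left over from the first encoder.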
- Remove Xcode 11 build from GitHub CI.
- Leave MVK_XCODE_12 guards in place to allow devs to possibly continue to
attempt to build existing MoltenVK code using Xcode 11, even though it's
not officially supported. Such devs may have to add their own additional
MVK_XCODE_12 guards for any Xcode 12 API features added after this change.
- Remove visionOS from multi-platform builds because it
requires Xcode 15+ and will abort a multi-platform build.
- Define TARGET_OS_XR for older SDKs.
- A number of SDK deprecation warnings remain when building for visionOS.
These cannot be removed without significant refactoring.
- Build visionOS dependencies for Release build by default.
- Fix local variable initialization warning (unrelated).
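
A fallback definition of `TARGET_OS_XR` for older SDKs could look roughly like the sketch below. This is an assumption about the general shape, not the exact code in this change, and `isVisionOS()` is a hypothetical helper added only to show usage.

```cpp
// Pull in the platform macros where the header exists (Apple SDKs).
#if defined(__has_include)
#  if __has_include(<TargetConditionals.h>)
#    include <TargetConditionals.h>
#  endif
#endif

// Older SDKs do not define TARGET_OS_XR at all, so default it to 0.
#ifndef TARGET_OS_XR
#  define TARGET_OS_XR 0
#endif

// Hypothetical helper: code can now test the macro on any SDK.
constexpr bool isVisionOS() { return TARGET_OS_XR != 0; }
```

With the fallback in place, `#if TARGET_OS_XR` evaluates to false on SDKs that predate visionOS instead of relying on an undefined macro.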
This provides feedback that indicates:
* how long it took to compile each shader stage and the pipeline as a
whole;
* whether or not the pipeline or any shader stage was found in any
supplied pipeline cache; and
* whether or not any supplied base pipeline was used to accelerate
pipeline creation.
This is similar to the performance statistics that MoltenVK already
collects.
Since we don't use any supplied base pipeline at all, this
implementation never sets
`VK_PIPELINE_CREATION_FEEDBACK_BASE_PIPELINE_ACCELERATION_BIT`. However,
I've identified several places where we could probably use the base
pipeline to accelerate pipeline creation. One day, I should probably
implement that.
Likewise, because we don't yet support using `MTLBinaryArchive`s,
`VK_PIPELINE_CREATION_FEEDBACK_APPLICATION_PIPELINE_CACHE_HIT_BIT` is
never set on the whole pipeline, though it *is* set for individual
stages, on the assumption that any shader found in a cache is likely to
be found in Metal's own implicit cache.
In this implementation, shader stage compilation time includes any time
needed to build the `MTLComputePipelineState`s needed for vertex and
tessellation control shaders in tessellated pipelines.
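
To illustrate how the feedback for one stage might be populated, here is a self-contained sketch. The `FeedbackFlags` enum and `CreationFeedback` struct are local stand-ins that mirror the Vulkan feedback bits and structure, and `timeStage()` is a hypothetical helper. None of this is the actual MoltenVK code.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>

// Local stand-ins mirroring the Vulkan pipeline creation feedback bits.
enum FeedbackFlags : uint32_t {
    kFeedbackValid             = 0x1,
    kFeedbackCacheHit          = 0x2,
    kFeedbackBasePipelineAccel = 0x4,   // Never set in this sketch.
};

// Local stand-in for the per-stage or per-pipeline feedback record.
struct CreationFeedback {
    uint32_t flags = 0;
    uint64_t durationNs = 0;
};

// Time one unit of compilation work and record feedback for it.
CreationFeedback timeStage(const std::function<void()>& compile, bool foundInCache) {
    auto start = std::chrono::steady_clock::now();
    compile();
    auto end = std::chrono::steady_clock::now();

    CreationFeedback fb;
    fb.flags = kFeedbackValid;
    if (foundInCache) { fb.flags |= kFeedbackCacheHit; }
    fb.durationNs = static_cast<uint64_t>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count());
    return fb;
}
```

In a tessellated pipeline, the `compile` callable for a stage would also cover building the extra compute pipeline states, so that time is attributed to the stage.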
This patch also changes compilation of the vertex stage
`MTLComputePipelineState`s in tessellated pipelines to be eager instead
of lazy. We really ought to have been doing this anyway, in order to
report pipeline failures at creation time instead of draw time. I'm not
happy, though, that we now pay the cost of all three pipeline states all
the time, instead of just the ones that are used.
This also gets rid of some fields of `MVKGraphicsPipeline` that were
only used during pipeline construction, which should save some memory,
particularly for apps that create lots of pipelines.
This extension allows apps to provide a hint to the presentation engine
indicating which parts of the surface need updating. To provide this
hint, we call `-[CALayer setNeedsDisplayInRect:]`, which indicates that
only the given rectangle needs updating.
I'm not sure if this will have any effect, especially if
`CAMetalLayer.presentsWithTransaction` is `NO`. Luckily for us, this is
only a hint, and it is permissible for the presentation engine to do
nothing with the hint.
The tests don't work because they apparently can't handle
`VK_SUBOPTIMAL_KHR` being returned.
In this commit, I've added support for Xcode 15 and added a case for MSL version 3.1. I added the case because I noticed Xcode was emitting warnings about an unhandled switch case.
- Fix unreachable code in MVKDeferredOperation::join().
- Refactor code so deferred functions call back to MVKDeferredOperation
instance to update current status, and deferred function execution
returns result of individual thread execution.
- MVKDeferredOperation use MVKSmallVector for _functionParameters.
- MVKDeferredOperation use a single mutex lock.
- Add additional comments explaining design to developers of
future extensions that use deferred operations.
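
The design points above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the actual `MVKDeferredOperation`: `OpResult` is a local stand-in for the Vulkan result codes, and the class name and methods are hypothetical.

```cpp
#include <cstdint>
#include <functional>
#include <mutex>
#include <utility>

// Local stand-ins for the Vulkan result codes involved.
enum class OpResult { Success, NotReady, ThreadDone, Error };

class DeferredOperation {
public:
    // The deferred function returns the result of this thread's share of the work.
    using Work = std::function<OpResult()>;

    void deferOperation(Work work, uint32_t maxConcurrency) {
        std::lock_guard<std::mutex> lock(_lock);
        _work = std::move(work);
        _maxConcurrency = maxConcurrency;
        _result = OpResult::NotReady;
    }

    // Run the work on the calling thread, then call back to update the status.
    OpResult join() {
        Work work;
        {
            std::lock_guard<std::mutex> lock(_lock);
            if (_maxConcurrency == 0) { return OpResult::Success; }   // Already finished.
            work = _work;
        }
        OpResult threadResult = work();   // Executed outside the lock.
        updateResult(threadResult);
        return threadResult;
    }

    // Reports 0 once the operation has finished.
    uint32_t getMaxConcurrency() {
        std::lock_guard<std::mutex> lock(_lock);
        return _maxConcurrency;
    }

    OpResult getResult() {
        std::lock_guard<std::mutex> lock(_lock);
        return _result;
    }

private:
    // A thread that is done while others continue does not finish the operation.
    void updateResult(OpResult threadResult) {
        std::lock_guard<std::mutex> lock(_lock);
        if (threadResult == OpResult::ThreadDone) { return; }
        _result = threadResult;
        _maxConcurrency = 0;
    }

    std::mutex _lock;                       // Single lock guarding all fields below.
    Work _work;
    uint32_t _maxConcurrency = 0;
    OpResult _result = OpResult::Success;   // Nothing deferred yet.
};
```

Keeping the work call outside the lock lets several threads join at once, while the single mutex still serializes every status update.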
To reduce complexity and repetitive copy-pasted spaghetti code,
the design approach here was to implement triangle fan conversion on
MVKCmdDrawIndexedIndirect, as the most general of the draw commands,
and then populate and invoke a synthetic MVKCmdDrawIndexedIndirect
command from the other draw commands.
- Rename pipeline factory shader cmdDrawIndexedIndirectMultiviewConvertBuffers()
to cmdDrawIndexedIndirectConvertBuffers(), and in addition to original support
for modifying indirect content to support multiview, add support for
converting triangle fan indirect content and indexes to triangle list.
- Modify MVKCmdDrawIndexedIndirect to track need to convert triangle fans
to triangle list, and invoke kernel function when needed.
- Modify MVKCmdDraw, MVKCmdDrawIndexed, and MVKCmdDrawIndirect to populate
and invoke a synthetic MVKCmdDrawIndexedIndirect command to convert triangle
fans to triangle lists.
- Add pipeline factory shader cmdDrawIndirectPopulateIndexes() to convert
non-indexed indirect content to indexed indirect content.
- MVKCmdDrawIndexedIndirect add support for zero divisor vertex buffers
potentially coming from MVKCmdDraw and MVKCmdDrawIndexed.
- Rename pipeline factory shader cmdDrawIndexedIndirectConvertBuffers()
to cmdDrawIndexedIndirectTessConvertBuffers() so it will be invoked from
MVKCommandEncodingPool::getCmdDrawIndirectTessConvertBuffersMTLComputePipelineState()
(unrelated).
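
For readers unfamiliar with the conversion, here is a CPU-side sketch of what the kernel has to compute. It is an illustration only: `fanToList()` and `sequentialIndexes()` are hypothetical helpers, and the vertex ordering follows the Vulkan triangle fan definition, which may differ from what the actual shader emits.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Expand triangle-fan indices into triangle-list indices.
// Triangle i of a fan uses vertices (i+1, i+2, 0).
std::vector<uint32_t> fanToList(const std::vector<uint32_t>& fan) {
    std::vector<uint32_t> list;
    if (fan.size() < 3) { return list; }   // Fewer than 3 vertices means no triangles.
    std::size_t triCount = fan.size() - 2;
    list.reserve(triCount * 3);
    for (std::size_t i = 0; i < triCount; i++) {
        list.push_back(fan[i + 1]);
        list.push_back(fan[i + 2]);
        list.push_back(fan[0]);
    }
    return list;
}

// Non-indexed draws can share the indexed path by first synthesizing
// sequential indexes firstVertex .. firstVertex + vertexCount - 1.
std::vector<uint32_t> sequentialIndexes(uint32_t firstVertex, uint32_t vertexCount) {
    std::vector<uint32_t> indexes(vertexCount);
    for (uint32_t i = 0; i < vertexCount; i++) { indexes[i] = firstVertex + i; }
    return indexes;
}
```

A fan of n vertices becomes n - 2 triangles, so the converted index buffer needs 3 * (n - 2) entries. This is why routing every draw type through the indexed indirect path keeps the conversion in one place.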
This just provides support for the `SPV_KHR_non_semantic_info`
extension, which supports extended instruction sets that do not affect
the semantics of a SPIR-V shader (e.g. debug info). SPIRV-Cross already
handles these instruction sets, so no additional work is required on our
part to support this extension.
This extension has a direct Metal equivalent in the
`-[MTLDevice sampleTimestamps:gpuTimestamp:]` method. However, that
method returns CPU timestamps in the Mach absolute time domain, which is
*not* that of `CLOCK_MONOTONIC_RAW` but of `CLOCK_UPTIME_RAW`. The
function that corresponds to `CLOCK_MONOTONIC_RAW` is
`mach_continuous_time()`. Therefore, this implementation uses the
`mach_continuous_time()` function for the CPU timestamp. Perhaps we
should lobby the WG for `VK_TIME_DOMAIN_CLOCK_UPTIME_RAW_EXT`.
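
As a rough illustration, reading the continuous clock and scaling it to nanoseconds could look like the sketch below. `machTicksToNanos()` and `continuousTimeNanos()` are hypothetical helpers, and whether the implementation converts units this way is an assumption. The Apple-only part is guarded so the arithmetic can be checked on any platform.

```cpp
#include <cstdint>
#if defined(__APPLE__)
#include <mach/mach_time.h>
#endif

// Convert Mach ticks to nanoseconds using a timebase of numer/denom.
// Note: the multiplication can overflow for extremely large tick counts.
uint64_t machTicksToNanos(uint64_t ticks, uint32_t numer, uint32_t denom) {
    return ticks * numer / denom;
}

#if defined(__APPLE__)
// CPU timestamp in the continuous time domain, which keeps counting during sleep.
uint64_t continuousTimeNanos() {
    mach_timebase_info_data_t timebase;
    mach_timebase_info(&timebase);
    return machTicksToNanos(mach_continuous_time(), timebase.numer, timebase.denom);
}
#endif
```

The same conversion applies to `mach_absolute_time()`. The two clocks differ in whether time spent asleep is counted.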
This turned out to be a little bit more involved than I had hoped. But,
with this, we can now use the `VK_FORMAT_A4R4G4B4_UNORM_PACK16` and
`VK_FORMAT_A4B4G4R4_UNORM_PACK16` formats from shaders, use them as blit
sources, and even clear them. Storage images and render targets of these
formats aren't supported, however. To support the latter would require
the insertion of a swizzle into the fragment shader before returning.
The former cannot be reasonably supported.
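
To make the role of the swizzle concrete, here is a small sketch that unpacks both formats on the CPU using the bit layouts from the Vulkan format definitions. The `Rgba4` alias and the helper functions are hypothetical and exist only for illustration.

```cpp
#include <array>
#include <cstdint>

// Components in R, G, B, A order, each a 4-bit value (0..15).
using Rgba4 = std::array<uint8_t, 4>;

// VK_FORMAT_A4R4G4B4_UNORM_PACK16: A in bits 12..15, R in 8..11, G in 4..7, B in 0..3.
Rgba4 unpackA4R4G4B4(uint16_t texel) {
    return { uint8_t((texel >> 8) & 0xF), uint8_t((texel >> 4) & 0xF),
             uint8_t(texel & 0xF),        uint8_t((texel >> 12) & 0xF) };
}

// VK_FORMAT_A4B4G4R4_UNORM_PACK16: A in bits 12..15, B in 8..11, G in 4..7, R in 0..3.
Rgba4 unpackA4B4G4R4(uint16_t texel) {
    return { uint8_t(texel & 0xF),        uint8_t((texel >> 4) & 0xF),
             uint8_t((texel >> 8) & 0xF), uint8_t((texel >> 12) & 0xF) };
}

// The two layouts differ only by exchanging the red and blue components.
Rgba4 swapRedBlue(Rgba4 color) {
    return { color[2], color[1], color[0], color[3] };
}
```

Reading a texel through one layout and swapping red and blue gives the same result as reading it through the other. A fragment shader would need the same kind of component reorder before writing to a render target.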
This commit changes the MVKSmallVector to a C-style array because the parameters array never needs to resize, and it seems that a C-style array does the job just as well.
The changes are as follows:
* Moved the code around to fit with the ordering system
* Added a function to get the number of available CPU cores
* Renamed variables with _ in front of them
* Added mutexes and lock guards for the getters and setters of the max concurrency and result variables
* Made max concurrency dynamic by returning 0 when the operation is finished
Implemented deferred host operations in this commit. It was pretty simple, with nothing Metal-specific. I am a bit concerned about the return types of MVKDeferredOperation::join and about how to store the operation to be executed later.
- When forcing the window system to use the same high-power GPU as the app,
move the call to MTLCreateSystemDefaultDevice() to MVKDevice constructor,
instead of MVKDevice::createSwapchain(), and test whether the
VK_KHR_swapchain extension is enabled to determine the need to swap GPUs.
- After calling MTLCreateSystemDefaultDevice() the GPU will already be
the same high-power GPU, so remove attempting to replace the MTLDevice.
- Remove MVKPhysicalDevice::replaceMTLDevice() as no longer used.
- Remove many unnecessary inline declarations in MVKDevice.h (unrelated).
As of macOS Big Sur and iOS/tvOS 14, the `discard_fragment()` function
in MSL is defined to have demote semantics; that is, fragment shader
output is discarded, but the fragment shader thread continues to run as
a helper invocation. This is very useful for Direct3D emulation, since
this is the semantic that HLSL `discard` has.
Signed-off-by: Chip Davis <chip@holochip.com>