- Update to latest SPIRV-Cross containing the fix.
- Modify CTS options in runcts script to avoid outputting full CTS log,
and use less file caching, all to reduce memory and filespace consumption,
and possibly improve performance (unrelated).
- Update MoltenVKShaderConverter tool to include Metal 3.1 support
and improved argument buffer settings (unrelated).
- Force Github CI to use Python 3.11, to avoid crash in
glslang::update_glslang_sources.py due to use of distutils,
removed in Python 3.12 (unrelated).
- Small unrelated non-functional edits.
- Add --keep-cache option to control whether or not to retain the
External/build/Intermediates directory (default not retained).
- Export KEEP_CACHE & SKIP_PACKAGING to be available within scripts
used by ExternalDependencies Xcode builds.
- Move BLD_SPECIFIED to build() instead of build_impl() to avoid
updating it from background thread (which will fail).
- Update MoltenVK version to 1.2.7 (unrelated).
- Add CompilerMSL::Options::replace_recursive_inputs to pipeline cache (unrelated).
- Update GitHub CI to Xcode 15.0.
- Update Whats_New.md document.
- Move check and warning to MVKRenderingCommandEncoderState.
- Pass primitiveRestartEnable to MVKRenderingCommandEncoderState.
- Warn only if primitiveRestartEnable disabled and strip topology is used.
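The warning condition in the last bullet can be sketched as a small predicate. The `Topology` enum and both function names are hypothetical stand-ins for illustration, not MoltenVK's own types.

```cpp
#include <cstdint>

// Hypothetical stand-in for the Vulkan primitive topology enum.
enum class Topology : uint8_t {
    PointList, LineList, LineStrip, TriangleList, TriangleStrip, TriangleFan
};

// Strip topologies are the ones the commit text singles out.
constexpr bool isStripTopology(Topology t) {
    return t == Topology::LineStrip || t == Topology::TriangleStrip;
}

// Warn only when primitive restart is disabled AND a strip topology is used.
constexpr bool shouldWarnPrimitiveRestart(bool primitiveRestartEnable, Topology t) {
    return !primitiveRestartEnable && isStripTopology(t);
}
```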
- On Apple GPUs, set timestampPeriod to 1.0.
- On non-Apple GPUs, calculate timestampPeriod each time it is retrieved.
- On older devices that do not support GPU timestamps, use nanosecond
CPU timestamps to be consistent with timestampPeriod of 1.0.
- Change MVKConfiguration::timestampPeriodLowPassAlpha and environment
variable MVK_CONFIG_TIMESTAMP_PERIOD_LOWPASS_ALPHA to 1.0, to use
only latest value by default.
- Add build-time verification that MVKConfigMembers.def
includes all members of MVKConfiguration (unrelated).
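One way a build-time check like the one above could work is an X-macro list that is expanded into a shadow struct and compared by size. This is a sketch under assumptions: `DemoConfiguration`, `DEMO_CONFIG_MEMBERS`, and the member set are hypothetical, and the macro list stands in for the contents of a .def file.

```cpp
#include <cstdint>

// Hypothetical configuration struct; the real MVKConfiguration has many more members.
struct DemoConfiguration {
    uint32_t debugMode;
    float    timestampPeriodLowPassAlpha;
    uint32_t maxActiveMetalCommandBuffersPerQueue;
};

// Stand-in for the contents of a .def file: one entry per member.
#define DEMO_CONFIG_MEMBERS(X)                 \
    X(uint32_t, debugMode)                     \
    X(float,    timestampPeriodLowPassAlpha)   \
    X(uint32_t, maxActiveMetalCommandBuffersPerQueue)

// Expand the list into a shadow struct with the same member sequence.
#define DEMO_DECLARE_MEMBER(type, name) type name;
struct DemoConfigMembers { DEMO_CONFIG_MEMBERS(DEMO_DECLARE_MEMBER) };
#undef DEMO_DECLARE_MEMBER

// If a member is added to DemoConfiguration but not to the list, the sizes
// usually differ and the build fails. A small member that fits into existing
// padding could slip through; this is a coarse check, not a proof.
static_assert(sizeof(DemoConfigMembers) == sizeof(DemoConfiguration),
              "Member list is out of sync with DemoConfiguration.");

// Batch handling example: count the members by expanding the same list.
#define DEMO_COUNT_MEMBER(type, name) + 1
constexpr int kDemoConfigMemberCount = 0 DEMO_CONFIG_MEMBERS(DEMO_COUNT_MEMBER);
#undef DEMO_COUNT_MEMBER
```

The same list can drive other batch operations, such as reading one environment variable per member, without repeating the member names by hand.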
- MVKPipeline only work around zero stride if stride is static.
- Ensure dynamic vertex stride is not enabled on builds before Xcode 15.
- Add MVKRenderStateType::LineWidth to track all default options (unrelated).
- Fix runtime failure on Metal versions that don't support dynamic attribute stride.
- Add MVKCommandEncoder::encodeVertexAttributeBuffer() consolidation function.
- Remove unnecessary validations that will be caught by Vulkan validation layers.
- To reduce memory, remove command class and pools for rendering commands that
are not supported, and perform no validation.
- Document extension conformance limitations in MoltenVK_Runtime_UserGuide.md.
- Add MVKPipelineCommandEncoderState subclasses
MVKGraphicsPipelineCommandEncoderState & MVKComputePipelineCommandEncoderState,
track patch control points in MVKGraphicsPipelineCommandEncoderState,
and add getGraphicsPipeline() & getComputePipeline() to simplify casting.
- Rename MVKRasterizingCommandEncoderState to MVKRenderingCommandEncoderState,
and MVKCommandEncoder::_rasterizingState to _renderingState.
- Rename MVKCmdRenderPass.h/mm to MVKCmdRendering.h/mm.
- Move MVKCmdExecuteCommands from MVKCmdRenderPass.h/mm to MVKCmdPipeline.h/mm.
- While working on vkCmdSetLogicOpEXT(), add support for
vkCmdSetLogicOpEnableEXT() from VK_EXT_extended_dynamic_state3.
- Add MVKRasterizingCommandEncoderState to consolidate handling
of static and dynamic rasterizing states in a consistent manner.
- Rework MVKDepthStencilCommandEncoderState to consolidate handling
of static and dynamic depth states in a consistent manner.
- MVKMTLDepthStencilDescriptorData clean up content setting, and struct layout.
- Add MVKRenderStateType to enumerate render state types.
- Add MVKRenderStateFlags to track binary info about states (enabled, dirty, etc).
- Add MVKMTLBufferBinding::stride.
- Add MVKPhysicalDeviceMetalFeatures::dynamicVertexStride.
- Set MVKPhysicalDeviceMetalFeatures::vertexStrideAlignment
to 1 for Apple5+ GPUs (unrelated).
- Set VkPhysicalDeviceLimits::maxVertexInputBindingStride
to unlimited for Apple2+ GPUs (unrelated).
- Add mvkVkRect2DFromMTLScissorRect() and simplify
mvkMTLViewportFromVkViewport() and mvkMTLScissorRectFromVkRect2D().
- MVKFoundation:
- Add mvkEnableAllFlags() and mvkDisableAllFlags().
- Improve performance of mvkClear(), mvkCopy() & mvkAreEqual()
when content is a single simple primitive type (unrelated).
- Declare more functions as static constexpr (unrelated).
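The combination of a render state enumeration and per-state binary flags described above can be sketched as follows. `RenderState` and `RenderStateFlags` are hypothetical illustrations; the real MVKRenderStateType and MVKRenderStateFlags may be laid out differently.

```cpp
#include <cstdint>

// Hypothetical subset of render state types; Count marks the number of states.
enum class RenderState : uint32_t { DepthBias, LineWidth, CullMode, Count };

// Tracks one bit per render state (for example "dirty" or "enabled").
class RenderStateFlags {
public:
    void enable(RenderState rs)          { _bits |= bit(rs); }
    void disable(RenderState rs)         { _bits &= ~bit(rs); }
    bool isEnabled(RenderState rs) const { return (_bits & bit(rs)) != 0; }
    void enableAll()                     { _bits = allBits(); }
    void disableAll()                    { _bits = 0; }
    bool anyEnabled() const              { return _bits != 0; }

private:
    static constexpr uint32_t bit(RenderState rs) {
        return 1u << static_cast<uint32_t>(rs);
    }
    static constexpr uint32_t allBits() {
        return (1u << static_cast<uint32_t>(RenderState::Count)) - 1u;
    }
    uint32_t _bits = 0;
};
```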
- To force any incomplete CAMetalDrawable presentations to complete,
don't force the creation of another transient drawable, as this can
stall the creation of future drawables. Instead, when a swapchain
is destroyed, or replaced by a new swapchain, set the CAMetalLayer
drawableSize, which will force presentation completion.
- Add presentation completion handler in command buffer scheduling
callback, move marking available to presentation completion handler,
and minimize mutex locking.
- MVKQueue::waitIdle() remove wait for swapchain presentations,
and remove callbacks to MVKQueue from drawable completions.
- MVKQueue::submit() don't bypass submitting a misconfigured submission,
so that semaphores and fences will be signalled, and ensure misconfigured
submissions are well behaved.
- Add MVKSwapchain::getCAMetalLayer() to streamline layer access (unrelated).
- Add MVKConfiguration::timestampPeriodLowPassAlpha, along with matching
MVK_CONFIG_TIMESTAMP_PERIOD_LOWPASS_ALPHA env var.
- Add MVKConfigMembers.def file to describe MVKConfiguration members,
to support consistent batch handling of members.
- Add env var & build settings MVK_CONFIG_DEBUG, plus legacy
MVK_CONFIG_ALLOW_METAL_EVENTS & MVK_CONFIG_ALLOW_METAL_FENCES.
- Simplify environment variable retrieval functions and macros.
- Rename MVKDevice::updateTimestampsAndPeriod() to updateTimestampPeriod().
- vkCmdBlitImage() ensure swizzle texture view is retained for life
of MTLCommandBuffer.
- vkQueuePresentKHR() use MTLCommandBuffer that retains references.
- Update MoltenVK version to 1.2.6.
- Calling nextDrawable may result in a nil drawable, or a drawable with no
pixel format. Attempt several times to retrieve a drawable with a valid
pixel format, and if unsuccessful, return an error from vkQueuePresentKHR()
and vkAcquireNextImageKHR(), to force swapchain to be re-created.
- Reorganize MVKQueuePresentSurfaceSubmission::execute() to detect drawable
with invalid format, attach MTLCommandBuffer completion handler just before
commit, and delay enqueuing MTLCommandBuffer until commit.
- Refactor mvkOSVersionIsAtLeast() for clarity (unrelated).
- Fix failure building on Xcode 14.
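The bounded retry described in the first bullet above can be sketched like this. `Drawable`, `AcquireResult`, and `acquireDrawable()` are hypothetical names; a pixel format of 0 is an assumption used here to model a drawable with no pixel format.

```cpp
#include <functional>
#include <optional>

// Hypothetical stand-in for a CAMetalDrawable; pixelFormat == 0 models "no pixel format".
struct Drawable { int pixelFormat = 0; };

enum class AcquireResult { Success, Error };

// Calls nextDrawable up to maxAttempts times, accepting only a drawable
// that exists and has a valid pixel format.
inline AcquireResult acquireDrawable(
        const std::function<std::optional<Drawable>()>& nextDrawable,
        int maxAttempts,
        Drawable& out) {
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        std::optional<Drawable> d = nextDrawable();
        if (d && d->pixelFormat != 0) {
            out = *d;
            return AcquireResult::Success;
        }
    }
    // The caller would report an error so that the app re-creates the swapchain.
    return AcquireResult::Error;
}
```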
- Track frame interval statistics, regardless of whether performance
tracking is enabled.
- Determine wait time for swapchain presentations from frame intervals.
- MVKSwapchain call markFrameInterval() from within mutex lock.
- MVKDevice rename addActivityPerformance() to addPerformanceInterval()
and addActivityByteCount() to addPerformanceByteCount().
- Add documentation about performance being measured in milliseconds.
In a recent Metal regression, Metal sometimes does not trigger the
[CAMetalDrawable addPresentedHandler:] callback on the final few (1-3)
CAMetalDrawable presentations, and retains internal memory associated
with these CAMetalDrawables. This does not occur for any CAMetalDrawable
presentations prior to those final few.
Most apps typically don't care much what happens after the last few
CAMetalDrawables are presented, and typically end shortly after that.
However, for some apps, such as Vulkan CTS WSI tests, which serially create
potentially hundreds, or thousands, of CAMetalLayers and MTLDevices, these
retained device memory allocations can pile up and cause the CTS WSI tests
to stall, block, or crash.
This issue has proven very difficult to debug, or replicate in incrementally
controlled environments. It appears consistently in some scenarios, and never
in other, almost identical scenarios.
For example, the MoltenVK Cube demo consistently runs without encountering
this issue, but CTS WSI test dEQP-VK.wsi.macos.swapchain.render.basic
consistently triggers the issue. Both apps run almost identical Vulkan
command paths, and identical swapchain image presentation paths, and
result in GPU captures that have identical swapchain image presentations.
We may ultimately have to wait for Apple to fix the core issue, but this
update includes workarounds that help in some cases. During vkQueueWaitIdle()
and vkDeviceWaitIdle(), wait a short while for any in-flight swapchain image
presentations to finish, and attempt to force completion by calling
MVKPresentableSwapchainImage::forcePresentationCompletion(), which releases
the current CAMetalDrawable, and attempts to retrieve a new one, to trigger
the callback on the current CAMetalDrawable.
In exploring possible work-arounds for this issue, this update adds significant
structural improvements in the handling of swapchains, and quite a bit of new
performance and logging functionality that is useful for debugging purposes.
- Add several additional performance trackers, available via logging,
or the mvk_private_api.h API.
- Rename MVKPerformanceTracker members, and refactor performance result
collection, to support tracking and logging memory use, or other measurements,
in addition to just durations.
- Redefine MVKQueuePerformance to add tracking separate performance metrics for
MTLCommandBuffer retrieval, encoding, and execution, plus swapchain presentation.
- Add MVKDevicePerformance as part of MVKPerformanceStatistics to track device
information, including GPU device memory allocated, and update device memory
results whenever performance content is requested.
- Add MVKConfigActivityPerformanceLoggingStyle::
MVK_CONFIG_ACTIVITY_PERFORMANCE_LOGGING_STYLE_DEVICE_LIFETIME_ACCUMULATE
to accumulate performance and memory results across multiple serial
invocations of VkDevices, during the lifetime of the app process. This
is useful for accumulating performance results across multiple CTS tests.
- Log destruction of VkDevice, VkPhysicalDevice, and VkInstance, to bookend
the corresponding logs performed upon their creation.
- Include consumed GPU memory in log when VkPhysicalDevice is destroyed.
- Add mvkGetAvailableMTLDevicesArray() to support consistency when retrieving
MTLDevices available on the system.
- Add mvkVkCommandName() to generically map command use to a command name.
- MVKDevice:
- Support MTLPhysicalDevice.recommendedMaxWorkingSetSize on iOS & tvOS.
- Include available and consumed GPU memory in log of GPU device at
VkInstance creation time.
- MVKQueue:
- Add handleMTLCommandBufferError() to handle errors for all
MTLCommandBuffer executions.
- Track time to retrieve a MTLCommandBuffer.
- If MTLCommandBuffer could not be retrieved during queue submission,
report error, signal queue submission completion, and return
VK_ERROR_OUT_OF_POOL_MEMORY.
- waitIdle() simplify to use [MTLCommandBuffer waitUntilCompleted],
plus also wait for in-flight presentations to complete, and attempt
to force them to complete if they are stuck.
- MVKPresentableSwapchainImage:
- Don't track presenting MTLCommandBuffer.
- Add limit on number of attempts to retrieve a drawable, and report
VK_ERROR_OUT_OF_POOL_MEMORY if drawable cannot be retrieved.
- Return VkResult from acquireAndSignalWhenAvailable() to notify upstream
if MTLCommandBuffer could not be created.
- Track presentation time.
- Notify MVKQueue when presentation has completed.
- Add forcePresentationCompletion(), which releases the current
CAMetalDrawable, and attempts to retrieve a new one, to trigger the
callback on the current CAMetalDrawable. Called when a swapchain is
destroyed, or by queue if waiting for presentation to complete stalls.
- If destroyed while in flight, stop tracking swapchain and
don't notify when presentation completes.
- MVKSwapchain:
- Track active swapchain in MVKSurface to check oldSwapchain.
- Track MVKSurface to access layer and detect lost surface.
- Don't track layer and layer observer, since MVKSurface handles these.
- On destruction, wait until all in-flight presentable images have returned.
- Remove empty and unused releaseUndisplayedSurfaces() function.
- MVKSurface:
- Consolidate constructors into initLayer() function.
- Update logic to test for valid layer and to set up layer observer.
- MVKSemaphoreImpl:
- Add getReservationCount().
- MVKBaseObject:
- Add reportResult() and reportWarning() functions to support logging
and reporting Vulkan results that are not actual errors.
- Rename MVKCommandUse::kMVKCommandUseEndCommandBuffer to
kMVKCommandUseBeginCommandBuffer, since that's where it is used.
- Update MVK_CONFIGURATION_API_VERSION and MVK_PRIVATE_API_VERSION to 38.
- Cube Demo support running a maximum number of frames.
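The refactored performance trackers in the list above are described as holding generic measurements (durations, memory use, and so on) rather than only durations. A minimal sketch of such a tracker, with hypothetical member names that are not taken from MoltenVK:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical tracker for any scalar measurement (milliseconds, kilobytes, ...).
struct Tracker {
    uint32_t count   = 0;
    double   latest  = 0;
    double   average = 0;
    double   minimum = 0;
    double   maximum = 0;

    // Folds one new sample into the running statistics.
    void add(double value) {
        latest  = value;
        minimum = (count == 0) ? value : std::min(minimum, value);
        maximum = (count == 0) ? value : std::max(maximum, value);
        average = (average * count + value) / (count + 1);
        ++count;
    }
};
```

Because the tracker only stores a running summary, the same instance can keep accumulating across several device lifetimes, which is the behaviour the DEVICE_LIFETIME_ACCUMULATE logging style is described as providing.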
In the rare case where vertex attribute buffers are bound to MVKCommandEncoder,
are not used by first pipeline, but are used by a subsequent pipeline, and no
other bindings are changed, the MVKResourcesCommandEncoderState will not appear
to be dirty to the second pipeline, and the buffer will not be bound to Metal.
When reverting a binding to dirty if it is not used by a pipeline, also revert
the enclosing MVKResourcesCommandEncoderState to dirty state.
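The fix can be illustrated with a reduced model of a state object that owns per-binding dirty flags. `Binding`, `ResourceState`, and `encode()` are hypothetical names for this sketch; the real classes carry far more state.

```cpp
#include <vector>

// Hypothetical vertex buffer binding with its own dirty flag.
struct Binding { int index = 0; bool isDirty = false; };

class ResourceState {
public:
    void bind(int index) {
        _bindings.push_back({index, true});
        _isDirty = true;
    }

    // Encodes dirty bindings that the pipeline uses; returns how many were encoded.
    template <typename UsesFn>
    int encode(UsesFn pipelineUses) {
        if (!_isDirty) return 0;        // nothing appears to have changed
        _isDirty = false;
        int encoded = 0;
        for (Binding& b : _bindings) {
            if (!b.isDirty) continue;
            if (pipelineUses(b.index)) {
                b.isDirty = false;
                ++encoded;
            } else {
                // The binding stays dirty because it was not used. The fix is to
                // also keep the enclosing state dirty, so a later pipeline sees it.
                _isDirty = true;
            }
        }
        return encoded;
    }

private:
    std::vector<Binding> _bindings;
    bool _isDirty = false;
};
```

Without the `_isDirty = true` line in the else branch, the second call to `encode()` would return early and the buffer would never be bound.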
Update MoltenVK to version 1.2.6 (unrelated).
- Guard against Intel returning zero values for CPU & GPU timestamps.
- Apply lowpass filter on timestampPeriod updates, to avoid wild temporary
changes, particularly at startup before GPU has been really exercised.
- In MoltenVK Xcode projects, set iOS & tvOS deployment targets to 12.0,
to avoid warnings while building MoltenVK.
- Add DYLD_LIBRARY_PATH to runcts script, to ensure Vulkan and MoltenVK
libraries are found during CTS runs.
- Update Whats_New.md and MoltenVK_Runtime_UserGuide.md documents.
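The low-pass filtering and zero guard from the first two bullets can be sketched as below. The function names are hypothetical, and treating a zero elapsed value as an unusable sample is an assumption about how the guard behaves.

```cpp
#include <cstdint>

// Blends a newly measured value into the previous one.
// alpha = 1.0 uses only the new value; smaller alpha smooths more.
inline float lowPassFilter(float previous, float latest, float alpha) {
    return previous + alpha * (latest - previous);
}

// Recomputes the timestamp period (CPU nanoseconds per GPU tick) from the
// elapsed CPU and GPU values between two correlated samples.
inline float updateTimestampPeriod(float currentPeriod,
                                   uint64_t cpuElapsed,
                                   uint64_t gpuElapsed,
                                   float alpha) {
    // Assumed guard: ignore samples where either elapsed value is zero.
    if (cpuElapsed == 0 || gpuElapsed == 0) return currentPeriod;
    float measured = static_cast<float>(static_cast<double>(cpuElapsed) /
                                        static_cast<double>(gpuElapsed));
    return lowPassFilter(currentPeriod, measured, alpha);
}
```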
Xcode simulator always requires 256B buffer alignment, even when running
on Apple Silicon. Previously, it was assumed that Apple Silicon would use
its native 16B buffer alignment.
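A minimal sketch of the corrected selection follows. `bufferAlignment()` and `alignUp()` are hypothetical helpers, and using 256 bytes for non-Apple-Silicon devices is an assumption not stated above.

```cpp
#include <cstdint>

// Rounds value up to the next multiple of alignment (alignment must be a power of two).
constexpr uint64_t alignUp(uint64_t value, uint64_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// The simulator needs 256B even on Apple Silicon; only real Apple Silicon
// hardware uses 16B. The 256B fallback for other devices is an assumption.
constexpr uint64_t bufferAlignment(bool isAppleSilicon, bool isSimulator) {
    return (isAppleSilicon && !isSimulator) ? 16 : 256;
}
```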
The [MTLDevice sampleTimestamps:gpuTimestamp:] function turns out to be
synchronized with other queue activities, and can block GPU execution
if it is called between MTLCommandBuffer submissions. On non-Apple-Silicon
devices, it was called before and after every vkQueueSubmit() submission,
to track the correlation between GPU and CPU timestamps, and was delaying
the start of GPU work on the next submission (on Apple Silicon, both
CPU & GPU timestamps are specified in nanoseconds, and the call was bypassed).
Move timestamp correlation from vkQueueSubmit() to
vkGetPhysicalDeviceProperties(), where it is used to update
VkPhysicalDeviceLimits::timestampPeriod on non-Apple-Silicon devices.
Delegate MVKPhysicalDevice::getProperties(VkPhysicalDeviceProperties2*)
to MVKPhysicalDevice::getProperties(VkPhysicalDeviceProperties*), plus
minimize wasted effort if pNext is empty (unrelated).
Move the declaration of several MVKPhysicalDevice member structs to
potentially reduce member spacing (unrelated).
When compiling tessellation vertex shaders, MVKGraphicsPipeline
pass array of MVKMTLFunction instead of MTLFunctions to retain
MTLFunctions for duration of processing.
Needed to complete fix for #1874.
We can't wait until getMTLComputeEncoder() is called to dirty the state,
because this call will be avoided by dirty checks themselves.
Those checks are comparing against leftover and now incorrect state since
the previous encoder has already ended.
It needs to be dirtied on encoder end.
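The reasoning above can be shown with a reduced model of a cached encoder state. `ComputeStateCache` is a hypothetical class for this sketch and does not mirror MoltenVK's actual encoder state classes.

```cpp
// Hypothetical cache of the pipeline bound on the current compute encoder.
class ComputeStateCache {
public:
    // Returns true if the pipeline actually had to be set on the encoder.
    bool setPipeline(int pipelineId) {
        // The dirty check skips redundant work, which is why dirtying cannot
        // be deferred to a call that this very check may avoid.
        if (!_isDirty && pipelineId == _boundPipeline) return false;
        _boundPipeline = pipelineId;
        _isDirty = false;
        return true;
    }

    // Called when the encoder ends: the cached state no longer describes
    // any live encoder, so it must be marked dirty here.
    void endEncoding() { _isDirty = true; }

private:
    int  _boundPipeline = -1;
    bool _isDirty = true;
};
```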
- Remove visionOS from multi-platform builds because it
requires Xcode 15+ and will abort a multi-platform build.
- Define TARGET_OS_XR for older SDKs.
- A number of SDK deprecation warnings remain when building for visionOS.
These cannot be removed without significant refactoring.
- Build visionOS dependencies for Release build by default.
- Fix local variable initialization warning (unrelated).
This provides feedback that indicates:
* how long it took to compile each shader stage and the pipeline as a
whole;
* whether or not the pipeline or any shader stage was found in any
supplied pipeline cache; and
* whether or not any supplied base pipeline was used to accelerate
pipeline creation.
This is similar to the performance statistics that MoltenVK already
collects.
Since we don't use any supplied base pipeline at all, this
implementation never sets
`VK_PIPELINE_CREATION_FEEDBACK_BASE_PIPELINE_ACCELERATION_BIT`. However,
I've identified several places where we could probably use the base
pipeline to accelerate pipeline creation. One day, I should probably
implement that.
Likewise, because we don't yet support using `MTLBinaryArchive`s,
`VK_PIPELINE_CREATION_FEEDBACK_APPLICATION_PIPELINE_CACHE_HIT_BIT` is
never set on the whole pipeline, though it *is* set for individual
stages, on the assumption that any shader found in a cache is likely to
be found in Metal's own implicit cache.
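A rough sketch of how per-stage feedback could be gathered follows. The constants are local stand-ins that mirror the Vulkan flag bit values, and `CreationFeedback` and `measureStage()` are hypothetical names; the base pipeline acceleration bit is never set, matching the description above.

```cpp
#include <chrono>
#include <cstdint>

// Local stand-ins mirroring the Vulkan pipeline creation feedback flag bits.
constexpr uint32_t kFeedbackValidBit                    = 0x1;
constexpr uint32_t kFeedbackCacheHitBit                 = 0x2;
constexpr uint32_t kFeedbackBasePipelineAccelerationBit = 0x4;

// Hypothetical feedback record; duration is in nanoseconds.
struct CreationFeedback {
    uint32_t flags      = 0;
    uint64_t durationNs = 0;
};

// Times compileStage(), which returns true if the shader was found in a cache.
template <typename CompileFn>
CreationFeedback measureStage(CompileFn compileStage) {
    auto start    = std::chrono::steady_clock::now();
    bool cacheHit = compileStage();
    auto end      = std::chrono::steady_clock::now();

    CreationFeedback fb;
    fb.flags = kFeedbackValidBit | (cacheHit ? kFeedbackCacheHitBit : 0u);
    fb.durationNs = static_cast<uint64_t>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count());
    return fb;
}
```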
In this implementation, shader stage compilation time includes any time
needed to build the `MTLComputePipelineState`s needed for vertex and
tessellation control shaders in tessellated pipelines.
This patch also changes compilation of the vertex stage
`MTLComputePipelineState`s in tessellated pipelines to be eager instead
of lazy. We really ought to have been doing this anyway, in order to
report pipeline failures at creation time instead of draw time. I'm not
happy, though, that we now pay the cost of all three pipeline states all
the time, instead of just the ones that are used.
This also gets rid of some fields of `MVKGraphicsPipeline` that were
only used during pipeline construction, which should save some memory,
particularly for apps that create lots of pipelines.
This extension allows apps to provide a hint to the presentation engine
indicating which parts of the surface need updating. To provide this
hint, we call `-[CALayer setNeedsDisplayInRect:]`, which indicates that
only the given rectangle needs updating.
I'm not sure if this will have any effect, especially if
`CAMetalLayer.presentsWithTransaction` is `NO`. Luckily for us, this is
only a hint, and it is permissible for the presentation engine to do
nothing with the hint.
The tests don't work because they apparently can't handle
`VK_SUBOPTIMAL_KHR` being returned.
In this commit, I've added support for Xcode 15, and added a case for MSL version 3.1. I added this because I noticed Xcode was throwing some warnings about an unhandled switch case.
- Fix unreachable code in MVKDeferredOperation::join().
- Refactor code so deferred functions call back to MVKDeferredOperation
instance to update current status, and deferred function execution
returns result of individual thread execution.
- MVKDeferredOperation use MVKSmallVector for _functionParameters.
- MVKDeferredOperation use a single mutex lock.
- Add additional comments explaining design to developers of
future extensions that use deferred operations.
To reduce complexity and repetitive copy-pasted spaghetti code,
the design approach here was to implement triangle fan conversion on
MVKCmdDrawIndexedIndirect, as the most general of the draw commands,
and then populate and invoke a synthetic MVKCmdDrawIndexedIndirect
command from the other draw commands.
- Rename pipeline factory shader cmdDrawIndexedIndirectMultiviewConvertBuffers()
to cmdDrawIndexedIndirectConvertBuffers(), and in addition to original support
for modifying indirect content to support multiview, add support for
converting triangle fan indirect content and indexes to triangle list.
- Modify MVKCmdDrawIndexedIndirect to track need to convert triangle fans
to triangle list, and invoke kernel function when needed.
- Modify MVKCmdDraw, MVKCmdDrawIndexed, and MVKCmdDrawIndirect to populate
and invoke a synthetic MVKCmdDrawIndexedIndirect command to convert triangle
fans to triangle lists.
- Add pipeline factory shader cmdDrawIndirectPopulateIndexes() to convert
non-indexed indirect content to indexed indirect content.
- MVKCmdDrawIndexedIndirect add support for zero divisor vertex buffers
potentially coming from MVKCmdDraw and MVKCmdDrawIndexed.
- Rename pipeline factory shader cmdDrawIndexedIndirectConvertBuffers()
to cmdDrawIndexedIndirectTessConvertBuffers() so it will be invoked from
MVKCommandEncodingPool::getCmdDrawIndirectTessConvertBuffersMTLComputePipelineState()
(unrelated).
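The index conversion at the heart of the changes above can be sketched on the CPU. In the commit the work is done by pipeline factory compute shaders, so this is only an illustration of the arithmetic. Both function names are hypothetical. The sketch preserves triangle winding but does not address provoking-vertex ordering.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Expands triangle-fan indexes into triangle-list indexes:
// each triangle shares the first fan vertex.
inline std::vector<uint32_t> fanToList(const std::vector<uint32_t>& fan) {
    std::vector<uint32_t> list;
    if (fan.size() < 3) return list;
    list.reserve((fan.size() - 2) * 3);
    for (std::size_t i = 1; i + 1 < fan.size(); ++i) {
        list.push_back(fan[0]);
        list.push_back(fan[i]);
        list.push_back(fan[i + 1]);
    }
    return list;
}

// For a non-indexed draw, synthesize sequential indexes first, so the
// same indexed conversion path can be reused.
inline std::vector<uint32_t> sequentialIndexes(uint32_t firstVertex, uint32_t vertexCount) {
    std::vector<uint32_t> idx(vertexCount);
    for (uint32_t i = 0; i < vertexCount; ++i) idx[i] = firstVertex + i;
    return idx;
}
```

A fan of N vertices yields N - 2 triangles, so the converted index buffer holds 3 * (N - 2) entries.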