Don't reset MVKOcclusionQueryCommandEncoderState::_mtlVisibilityResultOffset
when ending occlusion query, as subsequent queries need different offsets.
Also, remove MVKOcclusionQueryCommandEncoderState::_needsVisibilityResultMTLBuffer,
as it is always set once from MVKCommandBuffer::_needsVisibilityResultMTLBuffer.
When app setting MVKConfiguration it with a partial copy, start with existing.
It is possible for the maximum buffer size to be 4 binary gigabytes or
greater. In that case, the upper 32 bits will be lost, and the value of
the `maxUniformBufferRange` and `maxStorageBufferRange` limits would be
too small or even zero. Clamp it to 4 GiB - 1 in that case, since that
is the maximum value of a 32-bit integer.
For #1240.
Refactor auto GPU capture code to support both device and frame capture.
Add ability to automatically capture first GPU frame by setting
`MVK_CONFIG_AUTO_GPU_CAPTURE_SCOPE` to `2`.
Wrap GPU capture in autorelease pool to handle both cases of capture call
being made in run loop as in an app GUI, or in one-shot app like CTS.
They are not expected to be useful beyond the commands that use them,
but they take up memory nonetheless. This is exactly the use case
purgeability was designed for. Tell the system that it's OK to reclaim
their memory if necessary.
Doing this every time the buffer is used will cause the purgeable state
to be reset from `MTLPurgeableStateEmpty`, in case the system really did
reclaim their memory.
In accordance with Apple's advice, lock the pages for the buffer when
loading it, so the memory isn't pulled out from under us.
Add support for "dedicated" temp buffers, where instead of allocating a
big buffer and carving regions out of it, a unique buffer is returned
for each allocation request. This is necessary for visibility buffers,
because the offset passed to `-[MTLRenderCommandEncoder
setVisibilityResultMode:offset:]` cannot exceed an
implementation-defined value, currently 256k less 8 bytes for Mac family
2 on Catalina and up, and on Apple family 7; and 64k less 8 bytes
otherwise.
According to the Metal Feature Set Tables, only family 2 supports
quad-scope permutation. We've been seeing issues with SIMD-group
functions on family 1 hardware, so for now I'm moving quad-group
permutation to family 2.
Instead of having Metal directly write to the query pool's internal
storage, we'll have it write to a temp buffer whose lifetime is tied to
the command buffer. The temp buffer's contents are then accumulated to
all queries that were activated.
This last step is particularly important for queries that span multiple
render passes. Since Metal resets the query counter at a render pass
boundary, this means that, up until now, only the last draw counted
toward the query. Data from the others were lost. By using this temp
buffer and accumulating the results to the query storage, the counter
will correctly count draws from all render passes inside the query
bounds.
This will also fix problems using multiple query pools, particularly
with large query pool support on, in a single render pass. Because Metal
requires us to set the visibility results buffer at render pass start
time, we couldn't use multiple query pools inside a single render pass.
Using a single temp buffer bypasses this problem.
Also, don't make queries available to the host unless they became
available to the device first. That way, a query that is immediately
reset during command buffer execution will properly report that the
query is unavailable. This fixes the remaining dEQP-VK.query_pool.*
tests. Fix some bugs that shook out of this.
Add SPIRVToMSLConversionResults::isPositionInvariant to query
position invariance from SPIR-V.
MVKDevice::getMTLCompileOptions() takes into consideration need to preserve invariance.
MVKShaderModule compile MSL to preserve invariance if required by shader.
Support querying SignedZeroInfNanPreserve execution mode
from SPIR-V to disable fast-math for individual shaders.
Clean up namespace references in SPIRVToMSLConverter.cpp.
MVKConfiguration access is now global, and the VkInstance provided in the
vkGet/Set/MoltenVKConfigurationMVK() functions is ignored. This allows these
functions to be provided with a VkInstance object that originates from a
different Vulkan layer than MoltenVK, without risking breaking the API.
MVKConfiguration extended to cover all MoltenVK environment variables.
Move all environment variable declarations to MVKEnvironment.h.
Add MVKEnvironment.cpp to define config functions.
Cleanup .m files to use MVKCommonEnvironment.h instead of MVKEnvironment.h.
MVKSmallVector allow constructor to size with default values.
Remove obsolete MVKVector, which was long ago replaced with MVKSmallVector.
Remove unnecessary concrete implementations of template functions that are
used only within a single compilation unit.
This parameter is intended to indicate the optimal granularity for the
render area. For TBDR GPUs, this will be the tile size. IMR GPUs
continue to use 1x1.
Apple GPUs support tile sizes of 16x16, 32x16, and 32x32. But, we can't
read the tile size used for a Metal render pass until a render command
encoder has been created. So for now, hardcode 32x32 for TBDR GPUs.