We actually can't support this. Metal's validation layer complains if
a pipeline has a different raster sample count from that of the
framebuffer, even in the no-attachment case. This means that the
`defaultRasterSampleCount` property must be set correctly if Metal
supports no-attachment rendering, and the sample count of the dummy
texture must be set properly otherwise.
For these, we must use the `depthPlane` property to set the layer to
clear, and we must iterate over the image's depth, since the layer count
will be 1 in this case.
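
A minimal sketch of the iteration described above. The `ClearTarget` struct and `clearTargetsFor()` function are hypothetical stand-ins, not MoltenVK types. The sketch assumes that a 3D image is cleared one depth plane at a time (slice 0, varying depth plane), and that any other image is cleared one array layer at a time.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical description of one attachment binding to clear.
struct ClearTarget {
    uint32_t slice;       // array layer to bind
    uint32_t depthPlane;  // depth plane to bind (only varies for 3D images)
};

// Builds the list of (slice, depthPlane) pairs that need a clear.
// For a 3D image the layer count is 1, so we walk the depth instead.
std::vector<ClearTarget> clearTargetsFor(bool is3D, uint32_t layerCount, uint32_t depth) {
    std::vector<ClearTarget> targets;
    if (is3D) {
        for (uint32_t z = 0; z < depth; ++z) {
            targets.push_back({0, z});
        }
    } else {
        for (uint32_t layer = 0; layer < layerCount; ++layer) {
            targets.push_back({layer, 0});
        }
    }
    return targets;
}
```

Each returned entry would correspond to one attachment configuration, with `depthPlane` selecting the plane for the 3D case.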
Some tests intentionally pass an invalid address mode here when
unnormalized coordinates are in use, ostensibly to test that it is
ignored. Metal's validation layer, however, complains if you set
`rAddressMode` to an invalid value, even if `normalizedCoordinates` is
`false`. To avoid this, don't set the `rAddressMode`, since it can't be
used with unnormalized coordinates anyway.
This is needed to get us past the 3D blit tests that were recently added
to the CTS. It *almost* passes all these new tests; the 3D format tests
fail for some reason.
Fixes a use-after-free bug when the pipeline layout is destroyed after
recording--e.g. in the
`dEQP-VK.api.pipeline_layout.lifetime.destroy_after_end` test.
This can be done by copying between each slice of the 2D image and each
plane of the 3D image individually.
This was actually quite simple to implement. I don't know why I punted
on this.
Prior to this, we were leaking objects after failing to configure them.
The sole exception was `VkDescriptorSet`; failed descriptor sets are
automatically returned to the pool. Now all objects are destroyed or
freed when creation fails.
Linear textures on Mac family GPUs aren't renderable, so we cannot use
a `Clear`/`Store` `MTLRenderPass` to clear them. Instead, use a compute
shader to clear them.
I haven't expanded this to all color images, because the
`MTLTextureUsageShaderWrite` usage disables lossless compression on
Apple GPUs, but `RenderTarget` usage does not. Also, multisample
textures do not yet support writing.
When doing multisample resolution in Metal, the dimensions of the MSAA
RT and the resolve destination must be the same. Therefore, if the
resolve region does not cover the entire destination, we must use a
temporary transfer image. This fixes a validation error in the differing
image size tests from the CTS
(`dEQP-VK.api.copy_and_blit.*.resolve_image.diff_image_size.*`).
Because the temporary transfer image has the same dimensions as the
destination and is intended to be resolved to it, copies from the source
should use the destination's parameters for the temp image. That way,
the regions show up in the correct place in the destination. This fixes
the remaining resolve tests.
Don't do expansion blits if the resolve region covers the entire
destination. This should reduce the amount of needless work we do in
that case.
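
The coverage check that decides whether a temporary transfer image is needed could look roughly like the sketch below. The struct and function names are hypothetical, and the sketch assumes `dstExtent` is the extent of the destination at the mip level being resolved.

```cpp
#include <cstdint>

// Hypothetical plain structs mirroring the shape of VkOffset3D / VkExtent3D.
struct Offset3D { int32_t x, y, z; };
struct Extent3D { uint32_t width, height, depth; };

// Returns true when the resolve region starts at the destination's origin
// and spans its full extent, so no temporary transfer image is needed.
bool regionCoversDestination(const Offset3D& dstOffset,
                             const Extent3D& regionExtent,
                             const Extent3D& dstExtent) {
    return dstOffset.x == 0 && dstOffset.y == 0 && dstOffset.z == 0 &&
           regionExtent.width == dstExtent.width &&
           regionExtent.height == dstExtent.height &&
           regionExtent.depth == dstExtent.depth;
}
```

When this returns `false`, the resolve would go through the temporary image; when it returns `true`, the expansion blit can be skipped.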
Remove EXCLUDED_ARCHS from all Xcode projects to allow fat platform libraries to be built.
Script copy_lib_to_staging.sh no longer breaks fat libraries into single-architecture
libraries, and simply copies the fat file to the XCFramework staging area.
This permits support for arm64 on macOS, and arm64e on iOS and tvOS.
Creating a Simulator dylib containing both x86_64 and arm64 (Apple Silicon)
architectures is not currently supported by Xcode, so Simulator dylibs are skipped.
Update remaining documents to reference Vulkan 1.1 instead of 1.0.
Per Vulkan 1.1 spec, remove now-obsolete MVKInstance code
that emits error if app requests higher Vulkan version.
Upgrade MoltenVK version to 1.1.0.
This will be needed for two other Vulkan 1.2 extensions,
`VK_KHR_depth_stencil_resolve` and
`VK_KHR_separate_depth_stencil_layouts`.
Most of this is just changing MVKRenderPass to store everything
internally in `RenderPass2` format. I also added some basic handling for
a few things I left out from earlier changes, input attachment aspect
masks and dependency view offsets. The former won't become important
until Metal supports depth/stencil framebuffer fetch. The latter won't
be needed until we start using untracked resources, and therefore need
to insert explicit fences and/or barriers between subpasses. We don't
need either right now, but I've handled them regardless.
When a pipeline cache was merged into another pipeline cache, we would
create new `MVKShaderLibrary` objects for each one contained in the
source. The objects would be exact copies of the originals... including
their owner, which could be destroyed after the pipeline caches were
merged. Fix the owner in the new objects to prevent a dangling
reference.
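
The shape of the bug and the fix can be illustrated with a simplified sketch. `Cache` and `ShaderLibrary` here are hypothetical stand-ins for the real classes, and the sketch assumes the destination cache is never also the source.

```cpp
#include <vector>

struct Cache;  // hypothetical stand-in for a pipeline cache

// Hypothetical stand-in for a cached shader library.
struct ShaderLibrary {
    Cache* owner;  // the cache that owns this library
    int payload;   // placeholder for the compiled shader data
};

struct Cache {
    std::vector<ShaderLibrary> libraries;

    // Copies every library from src into this cache.
    // Assumes src is a different cache than this one.
    void mergeFrom(const Cache& src) {
        for (const ShaderLibrary& lib : src.libraries) {
            ShaderLibrary copy = lib;  // exact copy, including the old owner
            copy.owner = this;         // the fix: re-point the owner at the destination
            libraries.push_back(copy);
        }
    }
};
```

Without the `copy.owner = this` line, each copy would keep pointing at the source cache, which may be destroyed after the merge.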
This function was introduced with protected memory. Since we don't
support that, right now it does nothing that `vkGetDeviceQueue()` did
not already do. Despite that, I've added a method to `MVKDevice`,
because this is an extensible function analogous to e.g.
`vkGetPhysicalDeviceFeatures2()`.
The functions are now defined under their core names. To avoid code
bloat, I've defined the suffixed names as aliases of the core names.
Both symbols will be globally defined with the same value, and in the
dylib both will be exported.
Fix the default API version when none is given. Zero is the same as
`VK_API_VERSION_1_0`. Prior to this, we were overwriting it with zero if
no app info was given, or if it was zero in the app info. It wasn't
important before, but now that we gate API availability on maximum
Vulkan version, we need to make sure it's a valid version.
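
A minimal sketch of the defaulting rule, assuming a missing or zero version should fall back to 1.0. The function and constant names are hypothetical; `makeVersion()` reproduces the bit layout of `VK_MAKE_VERSION` so the sketch compiles without the Vulkan headers.

```cpp
#include <cstdint>

// Same bit layout as Vulkan's VK_MAKE_VERSION macro.
constexpr uint32_t makeVersion(uint32_t major, uint32_t minor, uint32_t patch) {
    return (major << 22) | (minor << 12) | patch;
}

// Stand-in for VK_API_VERSION_1_0.
constexpr uint32_t kApiVersion1_0 = makeVersion(1, 0, 0);

// requested: pointer to the app's apiVersion, or nullptr if no app info was given.
// Returns a valid version, treating "absent" and "zero" as Vulkan 1.0.
uint32_t effectiveApiVersion(const uint32_t* requested) {
    if (requested == nullptr || *requested == 0) {
        return kApiVersion1_0;
    }
    return *requested;
}
```

The result is always a usable version number, so later checks that gate API availability on the maximum Vulkan version never see zero.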
Also a non-functional base for future extensions. We can't implement it
anyway until all remaining bugs in `MTLEvent`-based semaphores are
fixed.
This is the last of the extensions that were promoted to core for Vulkan
1.1. We're almost there!
Like with `VK_KHR_device_group` and `VK_KHR_external_memory`, this just
adds the groundwork needed to support future extensions; it provides no
actual support for external fences.
We should be able to easily support `VK_KHR_external_fence_fd`, by using
a POSIX semaphore. Since the fence FDs produced by that extension are
opaque, only supporting `close(2)` and `dup(2)`, we shouldn't have to
worry about portable programs poking the FD in weird ways. Hopefully.
Other types of external fences we might support include GCD semaphores
(`dispatch_semaphore_t`) and Mach semaphores (`semaphore_t`). I really
think we want support for GCD semaphores, because that's the most likely
object we're going to see passed between processes on Darwin given GCD's
built-in support for XPC.
I have deliberately omitted mention of these extensions from the user
guide. `VK_KHR_external_memory` was not mentioned in there, presumably
because no external memory types are actually supported.
Also, add missing `vkGetInstanceProcAddr()` entry for
`vkGetPhysicalDeviceExternalBufferPropertiesKHR()`. We have the
function, and we export the extension's name string. We might as well
make it available via `vkGetInstanceProcAddr()`.
Originally, Metal did not support this directly, and still largely
doesn't on GPUs other than Apple family 6. Therefore, this
implementation uses vertex instancing to draw the needed views. To
support the Vulkan requirement that only the layers for the enabled
views are loaded and stored in a multiview render pass, this
implementation uses multiple Metal render passes for multiple "clumps"
of enabled views.
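
One way to picture the "clumps" is as runs of contiguous set bits in the subpass view mask. The sketch below is written under that assumption and is not the actual implementation; `viewClumps()` is a hypothetical helper.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Splits a view mask into runs of contiguous enabled views.
// Each pair is (index of first view in the run, number of views in the run).
std::vector<std::pair<uint32_t, uint32_t>> viewClumps(uint32_t viewMask) {
    std::vector<std::pair<uint32_t, uint32_t>> clumps;
    uint32_t view = 0;
    while (viewMask != 0) {
        // Skip over disabled views.
        while ((viewMask & 1u) == 0) { viewMask >>= 1; ++view; }
        // Count the run of enabled views.
        uint32_t first = view;
        uint32_t count = 0;
        while ((viewMask & 1u) != 0) { viewMask >>= 1; ++view; ++count; }
        clumps.emplace_back(first, count);
    }
    return clumps;
}
```

For a view mask of `0x33` (views 0, 1, 4 and 5 enabled), this yields two clumps, `(0, 2)` and `(4, 2)`. Under the assumption above, each would map to its own Metal render pass so that the layers for views 2 and 3 are never loaded or stored.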
For indirect draws, as with tessellation, we must adjust the draw
parameters at execution time to account for the extra views, so we need
to use deferred store actions here. Without them, tracking the state
becomes too involved.
If the implementation doesn't support either layered rendering or
deferred store actions, multiview render passes are instead unrolled and
rendered one view at a time. This will enable us to support the
extension even on older devices and OSes, but at the cost of additional
command buffer memory and (possibly) worse performance.
Eventually, we should consider using vertex amplification to accelerate
this, particularly since indirect multiview draws are terrible and
currently require a compute pass to adjust the instance count. Also,
instanced drawing in itself is terrible due to its subpar performance.
But, since vertex amplification on family 6 only supports two views,
when `VK_KHR_multiview` mandates a minimum of 6, we'll still need to use
instancing to support more than two views.
I have tested this extensively against the CTS. I'm very confident in
its correctness. The only failing tests are
`dEQP-VK.multiview.queries.*`, due to our inadequate implementation of
timestamp queries; and `dEQP-VK.multiview.depth.*`, due to what I assume
is a bug in the way Metal handles arrayed packed depth/stencil textures,
and which may only be a problem on Mojave. I need to test this on
Catalina and Big Sur.
Update SPIRV-Cross to pull in some fixes necessary for this to work.
Fixes #347.
Delegate to MVKImage's getMTLTexture() method to use the overridden one
for swapchain images.
Also, the _mtlTextureViews member should be cleared to prevent accidental
reuse of released views.
- Delete fat library and framework scripts and templates.
- MoltenVK build package now only includes one XCFramework, and separate platform dylibs.
- Modify fetchDependencies and Makefile targets to not build fat libraries,
and to build simulators separately from platforms instead.
- Script package_moltenvk.sh now copies dylibs for all built platforms.
- Consolidate package_all.sh and delete package_one_os.sh.
- Swap names of copy_lib_to_staging.sh and copy_to_staging.sh scripts.
- Cube demo now uses MoltenVK as XCFramework, and supports Simulator builds.
- Hologram demo now uses MoltenVK as dylibs from new packaging location.
- API-Samples demo now uses MoltenVK as XCFramework.
- Update documentation.
Nothing will be drawn in that case. Nothing would've been drawn anyway,
but Metal's validation layer complains if you issue a draw command with
zero vertices or instances.
We unfortunately cannot do anything about indirect draws, since we won't
know how many vertices to draw until execute time.
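
For direct draws, the guard amounts to something like the following sketch. `shouldEncodeDraw()` is a hypothetical name; the counts are assumed to be known at encoding time, which is exactly what is not true for indirect draws.

```cpp
#include <cstdint>

// Returns true if a direct draw would actually draw something.
// A draw with zero vertices or zero instances is skipped rather than encoded.
bool shouldEncodeDraw(uint32_t vertexCount, uint32_t instanceCount) {
    return vertexCount != 0 && instanceCount != 0;
}
```

A caller would check this before issuing the Metal draw command and return early when it is `false`.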