Prior to this, we weren't even setting the `BLIT_DST` bit for
depth/stencil formats. Conforming apps would thus never pass DS images
at all to `vkCmdBlitImage()`. It is now possible to do that, and even
get scaling and inversion to boot.
Stencil blits require the use of stencil feedback. If this feature isn't
available, both stencil and packed depth/stencil formats have their
`BLIT_SRC` and `BLIT_DST` features turned off, to prevent apps from
attempting to blit the stencil aspect.
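A minimal sketch of the gating described above. The bit constants and `restrictStencilBlit()` are hypothetical stand-ins for illustration, not MoltenVK's actual code:

```cpp
#include <cstdint>

// Hypothetical stand-ins for the Vulkan format feature bits.
constexpr uint32_t kBlitSrc = 1u << 0;
constexpr uint32_t kBlitDst = 1u << 1;
constexpr uint32_t kSampled = 1u << 2;

// Returns the feature mask to advertise for a format.
// If the format has a stencil aspect and the device lacks stencil
// feedback, both blit bits are cleared so apps never attempt a stencil blit.
uint32_t restrictStencilBlit(uint32_t features,
                             bool hasStencilAspect,
                             bool supportsStencilFeedback) {
    if (hasStencilAspect && !supportsStencilFeedback) {
        features &= ~(kBlitSrc | kBlitDst);
    }
    return features;
}
```

Depth-only formats pass through unchanged, since only the stencil aspect needs stencil feedback.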
There are only a couple of failing tests, involving a 1D stencil blit
(really a 2D stencil with height 1). For some reason, the fragments
produced during a scaled blit get spread out over the rendering surface.
I think this is a bug in Metal; we can't do anything about it.
I've noticed that in some tests, the clear values seem to be off by
about one ULP, and the tests subsequently fail. This should cause Metal
to round up instead of down and fix those tests.
Move setting of limits from MVKPhysicalDevice::initProperties() into a separate
initLimits() function. Call initProperties() before initMetalFeatures(), and
initLimits() after initMetalFeatures().
Retrieve MTLDrawable when requested, not in MTLCommandBuffer scheduled
handler, in case a different drawable is established by then.
Call timed present after adding presented handler, to avoid race
condition if presentation happens before handler is added.
Update MVKPresentableSwapchainImage::presentCAMetalDrawable() to create a
MTLCommandBuffer scheduled-handler and present the MTLDrawable from there.
According to Apple, it is more performant to call MTLDrawable present from within a
MTLCommandBuffer scheduled-handler than it is to call MTLCommandBuffer presentDrawable:.
Pass presentation timing info as a struct to simplify calls.
useIOSurface() sets the plane information in the IOSurface only if
the image format has chroma subsampling. So when we import an
IOSurface, we should check the plane count only when the format
has chroma subsampling.
This resolves a crash in the following case, where two VkImages
share the same IOSurface:
```
VkImage image1, image2;
// The format in |info| is not chroma-subsampled,
// e.g. VK_FORMAT_R8G8B8A8_UNORM.
vkCreateImage(device, &info, nullptr, &image1);
vkCreateImage(device, &info, nullptr, &image2);
IOSurfaceRef ioSurface;
vkUseIOSurfaceMVK(image1, nil);        // image1 creates its own IOSurface
vkGetIOSurfaceMVK(image1, &ioSurface);
vkUseIOSurfaceMVK(image2, ioSurface);  // image2 shares that IOSurface
```
When we call vkUseIOSurfaceMVK() on an MVKImage,
the MTLTexture of each plane is released but the pointer
is not reset. Later, when we destroy the MVKImage, the
destructor of MVKImagePlane calls releaseMTLTexture()
again, which crashes because the object is released
twice.
To solve this, we need to reset _mtlTexture to nil after
it is successfully released.
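A minimal sketch of the release-then-reset pattern, in plain C++ rather than Objective-C++. `FakeTexture` and `Plane` are hypothetical stand-ins for the MTLTexture and MVKImagePlane, not the real classes:

```cpp
// Hypothetical stand-in for a reference-counted Metal texture.
struct FakeTexture {
    int refCount = 1;
    void release() { --refCount; }
};

// Hypothetical stand-in for MVKImagePlane.
struct Plane {
    FakeTexture* texture = nullptr;

    // Safe to call more than once: the pointer is cleared after the
    // release, so a second call sees nullptr and does nothing.
    void releaseTexture() {
        if (texture) {
            texture->release();
            texture = nullptr;
        }
    }
};
```

Without the reset, the second call would release the same object again and drive its reference count below zero.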
Have MVKImage track VkImageFormatListCreateInfo::pViewFormats
and validate the MVKImageView format against them.
Add MVKImage::getMTLTextureUsage() and move tests for dedicated aliasables,
compressed formats, and minimum image usage into it.
Add MVKMTLFormatDesc::mtlPixelFormatLinear and
MVKPixelFormats::compatibleAsLinearOrSRGB()
to track linear counterparts to each Metal sRGB format.
Don't use MTLTextureUsagePixelFormatView if the image will only
allow views that use the same format or its sRGB/linear variant.
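A rough sketch of that decision, assuming the image was created with an explicit view format list. The `Fmt` enum, `linearOf()`, `sameUpToSRGB()`, and `needsPixelFormatView()` are hypothetical names for illustration and only loosely mirror the functions named above:

```cpp
#include <vector>

// Hypothetical format identifiers.
enum class Fmt { RGBA8, RGBA8_sRGB, BGRA8, BGRA8_sRGB, R32F };

// Maps an sRGB format to its linear counterpart; linear formats map to themselves.
Fmt linearOf(Fmt f) {
    switch (f) {
        case Fmt::RGBA8_sRGB: return Fmt::RGBA8;
        case Fmt::BGRA8_sRGB: return Fmt::BGRA8;
        default:              return f;
    }
}

// True if the two formats differ at most in sRGB vs. linear encoding.
bool sameUpToSRGB(Fmt a, Fmt b) { return linearOf(a) == linearOf(b); }

// True if any listed view format actually reinterprets the pixel data,
// in which case the pixel-format-view usage flag is needed.
bool needsPixelFormatView(Fmt imageFmt, const std::vector<Fmt>& viewFormats) {
    for (Fmt v : viewFormats) {
        if (!sameUpToSRGB(imageFmt, v)) return true;
    }
    return false;
}
```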
I know I said I wouldn't have any more, but this fixes a problem with
my change #1064 that could trip Metal's validation layer and/or result in
no data being copied.
This was a good heuristic, because the index buffer must be bound for
indexed draws. However, it may also be bound for non-indexed draws--for
example, if an indexed draw were immediately followed by a non-indexed
draw, as happens in some `dEQP-VK.synchronization.*` tests. Therefore,
we can't tell from the presence or absence of the index buffer what kind
of draw we're in. We'll have to keep track of this state in the command
encoder.
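A simplified sketch of tracking the draw type explicitly instead of inferring it from the index buffer binding. `EncoderState` and its members are hypothetical, not the actual command encoder:

```cpp
// Hypothetical, simplified command encoder state.
struct EncoderState {
    bool indexBufferBound = false;  // may stay true across later non-indexed draws
    bool isIndexedDraw = false;     // set explicitly by each draw call

    void bindIndexBuffer() { indexBufferBound = true; }
    void draw()            { isIndexedDraw = false; }
    void drawIndexed()     { isIndexedDraw = true; }
};
```

After an indexed draw followed by a non-indexed one, `indexBufferBound` is still true while `isIndexedDraw` is false, which is the case the old heuristic got wrong.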
According to the Vulkan spec:
> If `VK_WHOLE_SIZE` is used and the remaining size of the buffer is not
> a multiple of 4, then the nearest **smaller** multiple is used.
> [emphasis added]
Therefore, we should round down when calculating the number of words to
write.
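The rounding can be sketched as follows. `fillWordCount()` is a hypothetical helper, and it assumes `offset` does not exceed `bufferSize`:

```cpp
#include <cstdint>

// Number of 32-bit words to write when the fill size is VK_WHOLE_SIZE.
// Integer division truncates, which rounds the remaining byte count
// down to the nearest multiple of 4.
uint64_t fillWordCount(uint64_t bufferSize, uint64_t offset) {
    uint64_t remaining = bufferSize - offset;  // assumes offset <= bufferSize
    return remaining / 4;
}
```

For a 22-byte buffer and an offset of 4, 18 bytes remain, so 4 words (16 bytes) are written and the last 2 bytes are left untouched.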
We only want the window server to use the high-performance GPU if we
will use it to present to the display. If we won't use it to present, we
can save some battery life by not forcing the window server onto it. I had hoped this
would help window server stability in case something goes horribly
wrong while using the GPU, but my experience has sadly not borne this
out.
My testing shows that the device returned by
`MTLCreateSystemDefaultDevice()` is exactly equal (i.e. has the same
pointer value) to one of the devices returned by `MTLCopyAllDevices()`,
so we should see no problems from doing this at swapchain create time
instead of device create time.
I anticipated this, and I tried to design the support in SPIRV-Cross so
that these types would just work. And they do... well, they work as well as
32-bit types currently do, which is to say, there's plenty of room for
improvement.
Believe it or not, this is valid usage. If an image is aliasable and it
has a dedicated alloc, it is valid for multiple images to bind to the
dedicated memory. Some tests actually try this--for example, the
`dEQP-VK.device_group.afr_dedicated` test.
Normally, we would have to check the framebuffer, but we don't know its
contents until draw time. To avoid yet another situation where we must
compile multiple pipelines, I've used a simple heuristic: if the vertex
pipeline writes to `BuiltInLayer`, this is likely for a layered
framebuffer, and we should use `texture2d_array` for subpass input.
Hopefully this is good enough for all intents and purposes. If not, then
we really will have to wait until draw time. And God help us if someone
tries to do this with a 3D texture!
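The heuristic amounts to something like the sketch below. `subpassInputType()` is a hypothetical helper, and the returned strings use the Metal Shading Language type names `texture2d` and `texture2d_array`:

```cpp
#include <string>

// Hypothetical helper: picks the MSL texture type for a subpass input.
// The heuristic assumes a layered framebuffer whenever the vertex
// pipeline writes the Layer built-in, since the real framebuffer is
// not known until draw time.
std::string subpassInputType(bool vertexPipelineWritesLayer) {
    return vertexPipelineWritesLayer ? "texture2d_array" : "texture2d";
}
```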
At this point in device initialization, the device properties have not
yet been initialized. Unfortunately, this includes the vendor ID, on
which the maximum SIMD-group size depends. Initialize that property so
we can use it to set the subgroup size correctly.