This was a good heuristic, because the index buffer must be bound for
indexed draws. However, it may also be bound for non-indexed draws--for
example, if an indexed draw is immediately followed by a non-indexed
draw, as happens in some `dEQP-VK.synchronization.*` tests. Therefore,
we can't tell from the presence or absence of the index buffer what kind
of draw we're in. We'll have to keep track of this state in the command
encoder.
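A minimal sketch of what that tracking might look like; the type and member names here are illustrative, not MoltenVK's actual ones:
```objc
// Sketch: record the kind of the current draw explicitly, instead of
// inferring it from whether an index buffer happens to be bound.
typedef enum { MVKDrawKindNone, MVKDrawKindNonIndexed, MVKDrawKindIndexed } MVKDrawKind;

typedef struct {
    MVKDrawKind currentDrawKind;   // updated by every draw command
} EncoderDrawState;

// Each draw command records its own kind before encoding:
//   state.currentDrawKind = MVKDrawKindIndexed;     // vkCmdDrawIndexed*
//   state.currentDrawKind = MVKDrawKindNonIndexed;  // vkCmdDraw*
```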
According to the Vulkan spec:
> If `VK_WHOLE_SIZE` is used and the remaining size of the buffer is not
> a multiple of 4, then the nearest **smaller** multiple is used.
> [emphasis added]
Therefore, we should round down when calculating the number of words to
write.
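Concretely (a sketch; I'm assuming the `vkCmdFillBuffer` path, and the parameter names are illustrative), truncating integer division already gives the right answer:
```objc
#include <vulkan/vulkan.h>

// Sketch: number of 32-bit words to fill. Integer division truncates,
// which yields the nearest smaller multiple of 4 bytes, as the spec requires.
static VkDeviceSize wordsToFill(VkDeviceSize bufferSize, VkDeviceSize dstOffset, VkDeviceSize size) {
    VkDeviceSize byteCount = (size == VK_WHOLE_SIZE) ? (bufferSize - dstOffset) : size;
    return byteCount / 4;   // round down, never up
}
```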
We only want the window server to use the high-performance GPU if we
will use it to present to the display. If we won't use it to present, we
can save some battery life by not pointing the window server at the
high-performance GPU at all. I had hoped this would also help window
server stability in case something goes horribly wrong while using the
GPU, but my experience has sadly not borne this out.
My testing shows that the device returned by
`MTLCreateSystemDefaultDevice()` is exactly equal (i.e. has the same
pointer value) to one of the devices returned by `MTLCopyAllDevices()`,
so we should see no problems from doing this at swapchain create time
instead of device create time.
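For illustration, a sketch of that identity check, assuming macOS, where `MTLCopyAllDevices()` is available:
```objc
#import <Metal/Metal.h>

// Sketch: confirm the system default device is, by pointer identity,
// one of the devices returned by MTLCopyAllDevices().
static BOOL defaultDeviceIsInAllDevices(void) {
    id<MTLDevice> defaultDev = MTLCreateSystemDefaultDevice();
    for (id<MTLDevice> dev in MTLCopyAllDevices()) {
        if (dev == defaultDev) { return YES; }   // same object, not merely equal
    }
    return NO;
}
```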
I anticipated this, and I tried to design the support in SPIRV-Cross so
that they would just work. And they do... well, they work as well as
32-bit types currently do, which is to say, there's plenty of room for
improvement.
Believe it or not, this is valid usage. If an image is aliasable and it
has a dedicated alloc, it is valid for multiple images to bind to the
dedicated memory. Some tests actually try this--for example, the
`dEQP-VK.device_group.afr_dedicated` test.
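A sketch of the pattern such tests exercise; presumably the images are created with `VK_IMAGE_CREATE_ALIAS_BIT`, and error handling is omitted:
```objc
#include <vulkan/vulkan.h>

// Sketch: two aliasable images both bound to memory that was allocated
// with VkMemoryDedicatedAllocateInfo::image pointing at imageA.
// Dedicated allocations are bound at offset 0.
static void bindAliasedImages(VkDevice device, VkImage imageA, VkImage imageB,
                              VkDeviceMemory dedicatedMem) {
    vkBindImageMemory(device, imageA, dedicatedMem, 0);
    vkBindImageMemory(device, imageB, dedicatedMem, 0);   // valid: the images alias
}
```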
Normally, we would have to check the framebuffer, but we don't know its
contents until draw time. To avoid yet another situation where we must
compile multiple pipelines, I've used a simple heuristic: if the vertex
pipeline writes to `BuiltInLayer`, this is likely for a layered
framebuffer, and we should use `texture_2darray` for subpass input.
Hopefully this is good enough for all intents and purposes. If not, then
we really will have to wait until draw time. And God help us if someone
tries to do this with a 3D texture!
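A sketch of the heuristic itself; `vertexShaderWritesLayer()` is a hypothetical helper that would inspect the converted shader's outputs for `BuiltInLayer`, not a real MoltenVK or SPIRV-Cross function:
```objc
#import <Metal/Metal.h>

// Hypothetical helper: true if the vertex pipeline writes BuiltInLayer.
extern bool vertexShaderWritesLayer(const void* vtxShader);

// Sketch: assume a layered framebuffer iff the Layer built-in is written,
// and choose the subpass input texture type accordingly.
static MTLTextureType subpassInputTextureType(const void* vtxShader) {
    return vertexShaderWritesLayer(vtxShader) ? MTLTextureType2DArray
                                              : MTLTextureType2D;
}
```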
At this point in device initialization, the device properties have not
yet been initialized. Unfortunately, this includes the vendor ID, on
which the maximum SIMD-group size depends. Initialize that property so
we can use it to set the subgroup size correctly.
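A sketch of the dependency; the 64-wide value for AMD (PCI vendor ID 0x1002) and 32 elsewhere are assumptions based on typical hardware, not values taken from MoltenVK's tables:
```objc
#include <stdint.h>

// Sketch: derive the maximum SIMD-group size from the vendor ID.
static uint32_t maxSIMDGroupSize(uint32_t vendorID) {
    const uint32_t kVendorIDAMD = 0x1002;
    return (vendorID == kVendorIDAMD) ? 64 : 32;   // AMD wave64 vs. 32-wide SIMD-groups
}
```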
If a shader uses an input attachment and doesn't do layered rendering,
but the image view is of type `MTLTextureType2DArray`, Metal's
validation layer will complain about the texture type mismatching what
the shader expects. This change makes the texture types line up.
If there are no attachments and `renderTargetWidth` and
`renderTargetHeight` are zero, the Metal validation layer complains. To
prevent this, ensure both are at least 1.
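A minimal sketch of the clamp, using `MTLRenderPassDescriptor`'s `renderTargetWidth` and `renderTargetHeight` properties:
```objc
#import <Metal/Metal.h>

// Sketch: with no attachments, never hand Metal a zero render target extent.
static void setRenderTargetExtent(MTLRenderPassDescriptor* rpDesc,
                                  NSUInteger width, NSUInteger height) {
    rpDesc.renderTargetWidth  = MAX(width,  1);
    rpDesc.renderTargetHeight = MAX(height, 1);
}
```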
One aspect of the `VK_EXT_vertex_attribute_divisor` spec that I
apparently missed is that, for a vertex buffer binding with a divisor of
0, the base instance determines where the attributes are read from. This
cannot be expressed in Metal, but we can emulate it by offsetting the
buffer by `firstInstance * stride`.
Unfortunately, we can't do this for indirect draws. If a program tries
this, we're hosed.
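A sketch of the emulation for direct draws, where `firstInstance` is known on the CPU; the parameter names are illustrative:
```objc
#include <vulkan/vulkan.h>

// Sketch: bake the base instance into the vertex buffer bind offset when
// the binding's divisor is 0. Indirect draws can't be handled this way,
// because firstInstance lives in GPU memory.
static VkDeviceSize adjustedBindOffset(VkDeviceSize offset, uint32_t stride,
                                       uint32_t divisor, uint32_t firstInstance) {
    if (divisor == 0) { offset += (VkDeviceSize)firstInstance * stride; }
    return offset;
}
```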
When the extent covers both the source and destination images
completely, we can use the copy method on `MTLBlitCommandEncoder` which
can copy multiple slices at once. This should hopefully reduce CPU
overhead and command buffer memory usage.
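For reference, a sketch of the multi-slice variant (availability on older OS versions is an assumption worth checking; the wrapper itself is illustrative):
```objc
#import <Metal/Metal.h>

// Sketch: when the region covers both textures completely, copy every
// slice of one mip level in a single blit command.
static void copyAllSlices(id<MTLBlitCommandEncoder> blit,
                          id<MTLTexture> src, id<MTLTexture> dst,
                          NSUInteger level, NSUInteger sliceCount) {
    [blit copyFromTexture: src
              sourceSlice: 0
              sourceLevel: level
                toTexture: dst
         destinationSlice: 0
         destinationLevel: level
               sliceCount: sliceCount
               levelCount: 1];
}
```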
Combine MoltenVKSPIRVToMSLConverter and MoltenVKGLSLToSPIRVConverter
frameworks into a single MoltenVKShaderConverter framework.
Update corresponding directory structures, symlinks, scripts, and build paths.
Update MoltenVK code to use the new framework name for headers.
Add symlinks in API-Samples demo to support legacy
MoltenVKGLSLToSPIRVConverter header paths.
In addition to simplifying shader converter code and build management, the
use of only one shader converter framework fixes a race condition within Xcode,
prior to Xcode 12, when multiple targets use the same dependency XCFramework.
When calculating the vertices, we need to use the render area's
extent--but only if the implementation supports constraining the render
area using `renderTargetWidth` and `renderTargetHeight`. Otherwise, the
quad will be stretched and/or squashed because of the render area
constraint.
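A sketch of the divisor choice when mapping a clear-quad vertex into clip space; all names here are illustrative:
```objc
#include <stdbool.h>
#include <stdint.h>

// Sketch: normalize against the render area only when the implementation
// constrains the render target to it; otherwise use the full framebuffer.
static float clearQuadClipX(uint32_t x, uint32_t renderAreaWidth,
                            uint32_t framebufferWidth, bool canConstrainRenderTarget) {
    uint32_t divisor = canConstrainRenderTarget ? renderAreaWidth : framebufferWidth;
    return (2.0f * x) / divisor - 1.0f;
}
```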
Fall through to the 2D case, so all the special handling for 2D is used
for 1D as well. Also, make sure 1D doesn't report multisampling or
support for 420 subsampled formats. There is no
`MTLTextureType1DMultisample` anyway.
Also, clear the `VkImageFormatProperties` struct if the format is not
supported with the given parameters. Some tests seem to expect this.
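A minimal sketch of that failure path:
```objc
#include <string.h>
#include <vulkan/vulkan.h>

// Sketch: zero the output struct before reporting the format unsupported,
// since some tests check its contents even on failure.
static VkResult failFormatQuery(VkImageFormatProperties* pProps) {
    memset(pProps, 0, sizeof(*pProps));
    return VK_ERROR_FORMAT_NOT_SUPPORTED;
}
```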
We don't want to do this for stencil attachment views, because we use
the original packed depth/stencil format in render pipelines, and
Metal's validation layer for some reason doesn't consider packed formats
and their corresponding stencil view formats to match. So only do this
if the image view usage includes `SAMPLED` or `INPUT_ATTACHMENT`.
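A sketch of the resulting usage check:
```objc
#include <stdbool.h>
#include <vulkan/vulkan.h>

// Sketch: substitute the stencil view format only for sampling or subpass
// input, never for attachment use, where the packed format must be kept.
static bool shouldUseStencilViewFormat(VkImageUsageFlags usage) {
    return (usage & (VK_IMAGE_USAGE_SAMPLED_BIT |
                     VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT)) != 0;
}
```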
If the image has a format that supports atomic access, or can be cast to
a format which supports atomic access, then use a texel buffer,
regardless of the memory type. If we can't use the `MTLBuffer` from the
device memory, then create our own.
For #1027.
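A sketch of the decision, with hypothetical helpers standing in for the real format queries:
```objc
#include <stdbool.h>
#include <vulkan/vulkan.h>

// Hypothetical helpers, not MoltenVK's actual API.
extern bool formatSupportsAtomics(VkFormat fmt);
extern bool isCastableToAtomicFormat(VkFormat fmt);

// Sketch: a texel buffer is needed whenever atomic access is possible,
// directly or via a format cast, regardless of the memory type.
static bool needsTexelBuffer(VkFormat fmt) {
    return formatSupportsAtomics(fmt) || isCastableToAtomicFormat(fmt);
}
```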