This parameter is intended to indicate the optimal granularity for the
render area. For TBDR GPUs, this will be the tile size. IMR GPUs
continue to use 1x1.
Apple GPUs support tile sizes of 16x16, 32x16, and 32x32. But, we can't
read the tile size used for a Metal render pass until a render command
encoder has been created. So for now, hardcode 32x32 for TBDR GPUs.
Memoryless textures cannot use `Load`/`Store` actions, because there is
no memory to load from or store to. So don't use these actions, even if
we're not rendering to the whole thing.
Multiple subpasses might be a problem. Tessellation and indirect
multiview *will* be problems, because these require us to interrupt the
render pass in order to do compute--and that causes us to use `Store`
unconditionally.
One option, which I've mentioned before, is using tile shaders for these
cases. But I haven't looked seriously into that yet. The other involves
a subtle distinction between Metal's `MTLStorageModeMemoryless` and
Vulkan's `VK_MEMORY_TYPE_LAZILY_ALLOCATED_BIT`: the former *never*
commits memory; while the latter doesn't commit memory *at first*, but
may do so later. If we find that we're going to need to `Store` to a
`LAZILY_ALLOCATED` image, then what we can do is replace the
`Memoryless` texture with one with a real backing store in `Private`
memory. This change does not do that yet. It'll require some more
thought.
As for multiple subpasses, I eventually want to look into optimizing
render passes by shuffling the subpasses around to minimize the need to
load and store attachments from/to memory, which TBDR GPUs absolutely
hate. That should help with this problem, too.
Metal does not allow color write masks with this format where some but
not all of the channels are disabled. Either all must be enabled or none
must be enabled. Presumably, this is because of the shared exponent.
This is just good enough to stop the validation layer from violently
terminating the program. To implement this properly requires using
framebuffer fetch, with a change to SPIRV-Cross. Luckily, the only GPUs
that support `RGB9E5` rendering also support framebuffer fetch.
Honestly, I don't understand why Apple's drivers don't do this.
We can't rely on those enums not being re-`#define`d, because they're
only re-`#define`d like that in MVKPixelFormats.mm.
Signed-off-by: Chip Davis <cdavis@codeweavers.com>
The game NieR: Automata (via DXVK) attempts to create a 1680x1050
swapchain on a 1600x900 window. It then attempts to render to this
swapchain with a 1680x1050 framebuffer. But we created the textures at
1600x900, matching the window. The render area is thus too big, which
triggers a Metal validation failure. Apparently, DXVK doesn't check the
surface caps before creating the Vulkan swapchain.
Rather than expecting the swapchain to be the same size as the layer, we
can actually support any swapchain size, up to the maximum size of a
texture supported by the device. The system will just scale the texture
when rendering it if it doesn't match the layer size. If the sizes don't
match up, we return `VK_SUBOPTIMAL_KHR`, instead of
`VK_ERROR_OUT_OF_DATE_KHR`, indicating that presentation is still
possible, but performance may suffer. This is good enough to let the
game continue under the validation layers.
This really needs a corresponding change to the CTS, because it
currently assumes that we can't do this.
Signed-off-by: Chip Davis <cdavis@codeweavers.com>
MVKPixelFormats revert testing for stencil feedback and add Mac Catalyst test.
Revert mvkOSVersionIsAtLeast(mac, ios) to remove test for Mac Catalyst version.
MVKRenderPass revert testing for MTLStorageModeMemoryless for iOS.