Right now the Vulkan renderer blocks until the frame is complete
after rendering. This is necessary because Vulkan doesn't
interoperate well with implicit sync we use everywhere else.
Use the new kernel API to import a sync_file into a DMA-BUF to
avoid blocking.
We need to wait for the pending command buffer to complete before
re-using stage buffers. Otherwise we'll overwrite the stage buffer
with new contents before the texture is fully uploaded.
We need to wait for any pending command buffer to complete before
we're able to fully destroy a struct wlr_vk_texture: the Vulkan
spec requires the VkDescriptorSet to be kept alive.
So far we've done this in vulkan_end(), after blocking until the
command buffer completes. We'll soon stop blocking, so move this
logic in get_command_buffer(), where we check which commands buffers
have completed in a non-blocking fashion.
vkCmdCopyBufferToImage requires that the buffer offset be a multiple
of the texel block size, which for single plane uncompressed formats
is the same as the number of bytes per pixel. This commit adds an
alignment parameter to vulkan_get_stage_span which ensures that the
provided span (and the sequence of image copy operations derived which
use it) have this alignment.
We'll use this function from wlr_shm too.
Add some assertions, use int32_t (since the wire protocol uses that,
and we don't want to use 16-bit integers on exotic systems) and
switch the stride check to be overflow-safe.
Call glGetGraphicsResetStatusKHR in wlr_renderer_begin to figure
out when a GPU reset occurs. Destroy the renderer when this
happens (the OpenGL context is defunct).
Allow to get whether has alpha channel of the VkImage, it can help an
optimization to disable blending when the texture doesn't have alpha.
Because the VkFormat isn't enough because it's always set to
VK_FORMAT_B8G8R8A8_SRGB or VK_FORMAT_R8G8B8A8_SRGB.
Before re-using a VkCommandBuffer, we need to wait for its
operations to complete. Right now we unconditionally wait for
rendering to complete in vulkan_end(), however we have plans to
fix this [1]. To fully avoid blocking, we need to handle multiple
command buffers in flight at the same time (e.g. for multi-output,
or for rendering followed by texture uploads).
Implement a pool of command buffers. When we need to render, we
pick a command buffer from the pool which has completed its
operations. If we don't find one, try to allocate a new command
buffer. If we don't have slots in the pool anymore, block like we
did before.
[1]: https://gitlab.freedesktop.org/wlroots/wlroots/-/merge_requests/3574
Up until now we were using a VkFence for GPU-to-CPU
synchronization. This has some limitations that become a blocker
when trying to have multiple command buffers in flight at once
(e.g. for multi-output). It's desirable to implement a command
buffer pool [1], but VkFence cannot be used to track command buffer
completion for individual subpasses.
Let's just switch to timeline semaphores [2], which fix this issue,
make synchronization a lot more ergonomic and are a core Vulkan 1.2
feature.
[1]: https://gitlab.freedesktop.org/wlroots/wlroots/-/merge_requests/3802
[2]: https://www.khronos.org/blog/vulkan-timeline-semaphores
In wlroots we add comments near struct wl_list members to indicate
which type it's linked to. The Vulkan renderer had some comments
with mistakes, and some members without a comment.
See the spec at [1]. tl;dr EGL has terrible defaults: eglTerminate()
may have side-effects on completely unrelated EGLDisplay objects.
This extension allows us to opt-in to get the sane behavior:
eglTerminate() only free's our own EGLDisplay without affecting
others.
[1]: https://registry.khronos.org/EGL/extensions/KHR/EGL_KHR_display_reference.txt
These are trivial wrappers around eglMakeCurrent and
eglGetCurrentContext. Compositors which need to call these
functions will also call other EGL or GL functions anyways. Let's
reduce our API surface a bit by making them private.
These formats require EXT_texture_norm16, which in turn needs OpenGL
ES 3.1. The EXT_texture_norm16 extension does not support passing
gl_internalformat = GL_RGBA to glTexImage2D, as can be done for
formats available in OpenGL ES 2.0, so this commit adds a field to
wlr_gles2_pixel_format to provide a more specific internalformat
parameter to glTexImage2D.
They are never used in practice, which makes all of our flag
handling effectively dead code. Also, APIs such as KMS don't
provide a good way to deal with the flags. Let's just fail the
DMA-BUF import when clients provide flags.
This new renderer is implemented with the existing wlr_renderer API
(which is known to be sub-optimal for some operations). It's not
used by default, but users can opt-in by setting WLR_RENDERER=vulkan.
The renderer depends on VK_EXT_image_drm_format_modifier and
VK_EXT_physical_device_drm.
Co-authored-by: Simon Ser <contact@emersion.fr>
Co-authored-by: Jan Beich <jbeich@FreeBSD.org>
The half-float formats depend on GL_OES_texture_half_float_linear,
not just the GL_OES_texture_half_float extension, because the latter
does not include support for linear magni/minification filters.
The new 2101010 and 16161616F formats are only available on little-
endian builds, since their gl_types are larger than a byte and thus
endianness dependent.
Khronos refers to extensions with their namespace as a prefix in
uppercase. Change our naming to align with Khronos conventions.
This also makes grepping easier.
We never create an EGL context with the platform set to something
other than EGL_PLATFORM_GBM_KHR. Let's simplify wlr_egl_create by
taking a DRM FD instead of a (platform, remote_display) tuple.
This hides the internal details of creating an EGL context for a
specific device. This will allow us to transparently use the device
platform [1] when the time comes.
[1]: https://github.com/swaywm/wlroots/pull/2671
The wlr_egl functions are mostly used internally by the GLES2
renderer. Let's reduce our API surface a bit by hiding them. If
there are good use-cases for one of these, we can always make them
public again.
The functions mutating the current EGL context are not made private
because e.g. Wayfire uses them.
This allows users to know the capabilities of the buffers that
will be allocated. The buffer capability is important to
know when negotiating buffer formats.
When importing a DMA-BUF wlr_buffer as a wlr_texture, the GLES2
renderer caches the result, in case the buffer is used for texturing
again in the future. When the wlr_texture is destroyed by the caller,
the wlr_buffer is unref'ed, but the wlr_gles2_texture is kept around.
This is fine because wlr_gles2_texture listens for wlr_buffer's destroy
event to avoid any use-after-free.
However, with this logic wlr_texture_destroy doesn't "really" destroy
the wlr_gles2_texture. It just decrements the wlr_buffer ref'count.
Each wlr_texture_destroy call must have a matching prior
wlr_texture_create_from_buffer call or the ref'counting will go south.
Wehn destroying the renderer, we don't want to decrement any wlr_buffer
ref'count. Instead, we want to go through any cached wlr_gles2_texture
and destroy our GL state. So instead of calling wlr_texture_destroy, we
need to call our internal gles2_texture_destroy function.
Closes: https://github.com/swaywm/wlroots/issues/2941
Make it so wlr_gles2_texture is ref'counted (via wlr_buffer). This
is similar to wlr_gles2_buffer or wlr_drm_fb work.
When creating a wlr_texture from a wlr_buffer, first check if we
already have a texture for the buffer. If so, increase the
wlr_buffer ref'count and make sure any changes made by an external
process are made visible (by invalidating the texture).
When destroying a wlr_texture created from a wlr_buffer, decrease
the ref'count, but keep the wlr_texture around in case the caller
uses it again. When the wlr_buffer is destroyed, cleanup the
wlr_texture.
This adds a a function to create a wlr_texture from a wlr_buffer.
The main motivation for this is to allow the renderer to create a
single wlr_texture per wlr_buffer. This can avoid needless imports
by re-using existing textures.
This function is only required because the DRM backend still needs
to perform multi-GPU magic under-the-hood. Remove the wlr_ prefix
to make it clear it's not a candidate for being made public.
It allocates in local main memory via shm_open, and provides a FD
to allow sharing with other processes.
This is suitable for software rendering under the Wayland and X11
backends.
Compute only the transform matrix in the output. The projection matrix
will be calculated inside the gles2 renderer when we start rendering.
The goal is to help the pixman rendering process.
The swapchain maximum capacity is set to 4, so that we have enough room
for:
- A buffer currently displayed on screen
- A buffer queued for display (e.g. to KMS)
- A pending buffer that'll be queued next commit
- An additional pending buffer in case we want to invalidate the
currently pending one