Joshua Ashton f132d66816 render/vulkan: Optimize vertex shader
This ends up being a horrible global load:

  s_getpc_b64   s[4:5]                                  // 000000000000: BE841C80
  v_add_u32     v0, s2, v0                              // 000000000004: 68000002
  v_sub_co_u32  v1, vcc, 0, v0                          // 000000000008: 34020080
  v_max_i32     v1, v0, v1                              // 00000000000C: 1A020300
  v_and_b32     v1, 3, v1                               // 000000000010: 26020283
  v_cmp_lt_i32  s[0:1], v0, 0                           // 000000000014: D0C10000 00010100
  v_sub_co_u32  v0, vcc, 0, v1                          // 00000000001C: 34000280
  v_cndmask_b32  v0, v1, v0, s[0:1]                     // 000000000020: D1000000 00020101
  v_lshlrev_b32  v1, 3, v0                              // 000000000028: 24020083
  v_mad_u32_u24  v0, v0, 8, 4                           // 00000000002C: D1C30000 02111100
  v_min_u32     v1, 32, v1                              // 000000000034: 1C0202A0
  v_min_u32     v0, 32, v0                              // 000000000038: 1C0000A0
  s_getpc_b64   s[0:1]                                  // 00000000003C: BE801C00
  s_add_u32     s0, s0, 0x0000003c                      // 000000000040: 8000FF00 0000003C
  s_addc_u32    s1, s1, 0                               // 000000000048: 82018001
  global_load_dword  v1, v[1:2], s[0:1]                 // 00000000004C: DC508000 01000001
  global_load_dword  v0, v[0:1], s[0:1]                 // 000000000054: DC508000 00000000
  v_mov_b32     v2, 0                                   // 00000000005C: 7E040280
  v_mov_b32     v3, 1.0                                 // 000000000060: 7E0602F2
  s_waitcnt     vmcnt(0)                                // 000000000064: BF8C0F70
  exp           pos0, v1, v0, v2, v3 done               // 000000000068: C40008CF 03020001
  exp           param0, off, off, off, off              // 000000000070: C4000200 00000000
  s_endpgm                                              // 000000000078: BF810000
  v_cndmask_b32  v0, s0, v0, vcc                        // 00000000007C: 00000000
  v_cndmask_b32  v0, s0, v0, vcc                        // 000000000080: 00000000
  v_add_f16     v192, s0, v0                            // 000000000084: 3F800000
  v_cndmask_b32  v0, s0, v0, vcc                        // 000000000088: 00000000
  v_add_f16     v192, s0, v0                            // 00000000008C: 3F800000
  v_add_f16     v192, s0, v0                            // 000000000090: 3F800000
  v_cndmask_b32  v0, s0, v0, vcc                        // 000000000094: 00000000
  v_add_f16     v192, s0, v0                            // 000000000098: 3F800000
  v_cndmask_b32  v0, s0, v0, vcc                        // 00000000009C: 00000000

With some bit magic, we can get something much nicer:

  v_add_u32     v0, s2, v0                              // 000000000000: 68000002
  v_add_u32     v1, 1, v0                               // 000000000004: 68020081
  v_and_b32     v1, 2, v1                               // 000000000008: 26020282
  v_cvt_f32_i32  v1, v1                                 // 00000000000C: 7E020B01
  v_mul_f32     v1, 0.5, v1                             // 000000000010: 0A0202F0
  v_and_b32     v0, 2, v0                               // 000000000014: 26000082
  v_cvt_f32_i32  v0, v0                                 // 000000000018: 7E000B00
  v_mul_f32     v0, 0.5, v0                             // 00000000001C: 0A0000F0
  v_mov_b32     v2, 0                                   // 000000000020: 7E040280
  v_mov_b32     v3, 1.0                                 // 000000000024: 7E0602F2
  exp           pos0, v1, v0, v2, v3 done               // 000000000028: C40008CF 03020001
  exp           param0, off, off, off, off              // 000000000030: C4000200 00000000
  s_endpgm                                              // 000000000038: BF810000
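
For readers who prefer source to disassembly, the second listing's arithmetic can be sketched on the CPU in C. This is only an illustration, not the shader from the commit. The function names are hypothetical. The vertex order (0,0), (1,0), (1,1), (0,1) is inferred from the constant data that follows s_endpgm in the first listing. The index is assumed to be non-negative, so the signed-remainder handling visible in the first listing is not modeled.

```c
#include <stdint.h>

/* Assumed corner table, inferred from the constant data in the first
 * listing: four (x, y) pairs forming a unit quad. */
const float quad_table[4][2] = {
    {0.0f, 0.0f}, {1.0f, 0.0f}, {1.0f, 1.0f}, {0.0f, 1.0f},
};

/* Table variant: index a constant array with idx % 4. On the GPU this
 * lookup is what turns into the global loads. */
void quad_pos_table(uint32_t idx, float out[2]) {
    out[0] = quad_table[idx % 4][0];
    out[1] = quad_table[idx % 4][1];
}

/* Bit variant matching the second listing: mask bit 1, convert to
 * float, scale by 0.5 so that 2 becomes 1.0. */
void quad_pos_bits(uint32_t idx, float out[2]) {
    out[0] = (float)((idx + 1u) & 2u) * 0.5f;
    out[1] = (float)(idx & 2u) * 0.5f;
}

/* The alternative the commit message mentions: shift right by one
 * instead of multiplying by 0.5 after the conversion. */
void quad_pos_shift(uint32_t idx, float out[2]) {
    out[0] = (float)(((idx + 1u) & 2u) >> 1);
    out[1] = (float)((idx & 2u) >> 1);
}
```

All three functions yield the same corner for any unsigned index, because only bit 1 of idx (for y) and of idx + 1 (for x) matters, and both repeat with period 4.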

The above output was based on just shoving it in ShaderPlayground -- I was not able to use pipeline feedback as I was unable to get RenderDoc working due to the EXT_physical_device_drm requirement.

I also considered using >> 1 instead of * 0.5, but AMD has dedicated modifiers that merge a * 0.5, * 2.0, etc. into a single instruction. (Admittedly, the code above does not take advantage of them, but ACO might.)

Signed-off-by: Joshua Ashton <joshua@froggi.es>
2021-11-23 15:46:24 +00:00
.builds ci/archlinux: enable address and undefined sanitizers 2021-11-19 16:42:14 +00:00
backend input-device: remove wlr_input_device.link 2021-11-23 14:14:18 +00:00
docs docs: mention WLR_RENDERER=vulkan. 2021-11-19 02:08:51 +00:00
examples examples: init wlr_output with allocator and renderer 2021-11-18 09:37:57 -05:00
include input-device: remove wlr_input_device.link 2021-11-23 14:14:18 +00:00
protocol Fix spelling errors 2021-10-02 10:22:13 +02:00
render render/vulkan: Optimize vertex shader 2021-11-23 15:46:24 +00:00
tinywl tinywl: build with meson if examples option is enabled 2021-11-19 16:42:14 +00:00
types wlr_drag: emit destroy after wl_data_device.leave 2021-11-22 22:43:39 +01:00
util util/token: don't leak /dev/urandom fd to children 2021-11-14 12:30:03 +01:00
xcursor xcursor: fix CVE-2013-2003 2021-05-02 17:04:59 +02:00
xwayland xwayland: add support for -noTouchPointerEmulation 2021-11-02 12:02:51 +01:00
.editorconfig Set .editorconfig ident_size 2019-01-25 11:37:46 +01:00
.gitignore Remove rootston 2019-08-09 08:34:59 +09:00
.gitlab-ci.yml ci: add .gitlab-ci.yml 2021-11-01 16:51:18 +01:00
CONTRIBUTING.md CONTRIBUTING.md: add CoC section 2021-11-06 17:01:53 +03:00
LICENSE Update LICENSE year (MIT license) 2018-04-12 21:29:59 -04:00
meson.build tinywl: build with meson if examples option is enabled 2021-11-19 16:42:14 +00:00
meson_options.txt render/vulkan: add Vulkan renderer 2021-10-18 11:51:13 +02:00
README.md s/GitHub/GitLab/ 2021-11-01 18:54:26 +00:00
wlroots.syms build: simplify version script 2021-06-17 11:03:21 +02:00

wlroots

Pluggable, composable, unopinionated modules for building a Wayland compositor; or about 60,000 lines of code you were going to write anyway.

  • wlroots provides backends that abstract the underlying display and input hardware, including KMS/DRM, libinput, Wayland, X11, and headless backends, plus any custom backends you choose to write, which can all be created or destroyed at runtime and used in concert with each other.
  • wlroots provides unopinionated, mostly standalone implementations of many Wayland interfaces, both from wayland.xml and various protocol extensions. We also promote the standardization of portable extensions across many compositors.
  • wlroots provides several powerful, standalone, and optional tools that implement components common to many compositors, such as the arrangement of outputs in physical space.
  • wlroots provides an Xwayland abstraction that allows you to have excellent Xwayland support without worrying about writing your own X11 window manager on top of writing your compositor.
  • wlroots provides a renderer abstraction that simple compositors can use to avoid writing GL code directly, but which steps out of the way when your needs demand custom rendering code.

wlroots implements a huge variety of Wayland compositor features and implements them right, so you can focus on the features that make your compositor unique. By using wlroots, you get high performance, excellent hardware compatibility, broad support for many Wayland interfaces, and comfortable development tools - or any subset of these features you like, because all of them work independently of one another and freely compose with anything you want to implement yourself.

Check out our wiki to get started with wlroots. Join our IRC channel: #sway-devel on Libera Chat.

wlroots is developed under the direction of the sway project. A variety of wrapper libraries are available for using it with your favorite programming language.

Building

Install dependencies:

  • meson
  • wayland
  • wayland-protocols
  • EGL and GLESv2 (optional, for the GLES2 renderer)
  • Vulkan loader, headers and glslang (optional, for the Vulkan renderer)
  • libdrm
  • GBM
  • libinput (optional, for the libinput backend)
  • xkbcommon
  • udev
  • pixman
  • libseat

If you choose to enable X11 support:

  • xwayland (build-time only, optional at runtime)
  • libxcb
  • libxcb-render-util
  • libxcb-wm
  • libxcb-errors (optional, for improved error reporting)

Run these commands:

meson build/
ninja -C build/

Install like so:

sudo ninja -C build/ install

Contributing

See CONTRIBUTING.md.