To EGLStream or not

The announcement of KDE Neon dev/unstable switching to Wayland by default raised quite a few worried comments, as NVIDIA’s proprietary driver is not supported. One thing should be clear: we won’t break any setups. We will make sure that X11 is selected by default if the given hardware setup does not support Wayland. Nevertheless I think that the number of questions shows that I should discuss this in more detail.

NVIDIA does support Wayland – kind of. The solution they came up with is not compatible with any existing Wayland compositor and requires patches to make it work. For the reference implementation Weston there are patches provided by NVIDIA, but those have not been integrated yet. For KWin such patches do not exist, and we have no plans to develop such an adaptation as long as the patches are not merged into Weston. Even if such patches existed, we would not merge them before they are merged into Weston.

The solution NVIDIA came up with requires different code paths: driver specific adjustments throughout the compositor. This is bad for everybody involved – for us developers, for the driver developers and, most importantly, for our users. It means that we developers have to spend time implementing and maintaining a solution for one driver – time which could be spent on fixing bugs instead. We could make such an effort for one driver, but once every driver requires its own adjustments it becomes unmanageable.

But adjustments for even a single driver are problematic. The latest NVIDIA driver caused a regression in KWin: on Quadro hardware (other hardware seems not to be affected) our shader self test fails, which results in compositing being disabled. If one removes the shader self test, though, everything works fine. I assume that there is a bug in KWin’s rendering of the self test which is only triggered with this driver. But as I don’t have such hardware I cannot verify it. Yes, I did pass multiple patches for investigating and trying to fix it to a colleague with such hardware. No, please don’t donate hardware to me.

In the end, after spending more than half a day on it, we had to take the worst option: add a driver and hardware specific check to disable the self test and ship it with the 5.7.5 release. Such checks are very problematic for the maintainability of the code. We are hiding a bug which we cannot investigate, and we are now stuck with an implementation where we will never be able to say “we can remove that again”. Driver specific workarounds tend to stick around. E.g. we have such a check:

// Broken on Intel chips with Mesa 9.1 - BUG 313613
if (gl->driver() == Driver_Intel && gl->mesaVersion() >= kVersionNumber(9, 1) && gl->mesaVersion() < kVersionNumber(9, 2))
    return;

Nowadays it is absolutely pointless to have such code around, as nobody is using such a Mesa version any more. But the code is still there, makes the code base more complex and carries a maintenance cost. This is why driver specific implementations are bad and are nothing we want in our code base.

People asked us to be pragmatic, because NVIDIA is so important. I am absolutely pragmatic here: we don't have the resources to develop and maintain an NVIDIA specific implementation on Wayland.

Also some people complained that this is unfair because we do have an implementation for (proprietary) Android drivers. I need to point out that the two cases do not compare at all.

First of all, our Android implementation is not specific to a proprietary driver. It is written for the open source hwcomposer interface exposed through libhybris. All of that is open source. The fact that the actual driver might be proprietary is nothing we like, but it is also not relevant for our implementation.

In addition, the implementation is encapsulated in a platform plugin and significantly reduced in functionality (only one screen, no cursor, etc.). This is something we would not be able to do for NVIDIA (you would want multi-screen support, right?).

For NVIDIA we would have to add a deviation to the DRM platform plugin to create the OpenGL context in a different way. This is something our architecture does not support and was not designed for. The general idea is that if creating the GBM based context fails, KWin terminates. Adding support there for a different way to get an OpenGL context up and running would add a lot of complexity to a very important code path. We have to ensure that KWin terminates if OpenGL fails, and at the same time we have to make sure that llvmpipe is not picked if NVIDIA hardware is used. This would be a horrible mess to maintain – especially as developers would not be able to test it without huge effort.
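To make the difference concrete, here is a minimal sketch – not KWin's actual code, the function name and device path are just examples – of how the GBM based setup works: the DRM platform opens the render device, wraps it in a gbm_device and asks EGL for a display on top of it. An EGLStream based driver would need an entirely different sequence (EGLDevice/EGLOutput/EGLStream) at exactly this point.

#include <fcntl.h>
#include <gbm.h>
#include <EGL/egl.h>
#include <EGL/eglext.h>

// Sketch only: error handling trimmed, "/dev/dri/card0" is just an example path.
EGLDisplay createGbmDisplay()
{
    const int fd = open("/dev/dri/card0", O_RDWR | O_CLOEXEC);
    gbm_device *gbm = gbm_create_device(fd);
    if (!gbm) {
        // no GBM device means no OpenGL context - this is where KWin terminates
        return EGL_NO_DISPLAY;
    }
    // requires EGL 1.5 or the EGL_KHR_platform_gbm extension
    return eglGetPlatformDisplay(EGL_PLATFORM_GBM_KHR, gbm, nullptr);
}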

From what I understand from the patch set, it would also require significant changes to how a frame is presented on an output, and by that make our lower level code more complex. That code is currently able to serve both our OpenGL and our QPainter based compositors, but would not be able to support NVIDIA's implementation. Changes there would hinder future development of the platform plugin. This is an important area we are working on: KWin 5.8 contains a new additional implementation making use of atomic mode setting. We want atomic mode setting used everywhere in the stack so that every frame is perfect. NVIDIA's implementation would make that difficult.
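For illustration, a rough sketch of that presentation step on the GBM/DRM path, under the assumption that the CRTC id comes from the platform plugin's output handling (here it is just a parameter) and with the legacy page flip shown instead of the atomic commit:

#include <gbm.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

// Sketch only: buffer lifetime handling and error checks are trimmed.
bool presentFrame(int drmFd, gbm_surface *surface, uint32_t crtcId)
{
    // the buffer the OpenGL compositor just rendered into via the GBM surface
    gbm_bo *bo = gbm_surface_lock_front_buffer(surface);
    if (!bo) {
        return false;
    }
    uint32_t fbId = 0;
    drmModeAddFB(drmFd, gbm_bo_get_width(bo), gbm_bo_get_height(bo),
                 24, 32, gbm_bo_get_stride(bo), gbm_bo_get_handle(bo).u32, &fbId);
    // legacy flip shown for brevity; the atomic path replaces this call with
    // drmModeAtomicCommit so that buffer and output properties change together
    return drmModePageFlip(drmFd, crtcId, fbId, DRM_MODE_PAGE_FLIP_EVENT, nullptr) == 0;
}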

EGLStreams bring another disadvantage: the code which binds a buffer (what a window renders into) to a texture (what the compositor needs for rendering) would need changes. Binding the buffer is currently performed by KWin core and is not part of the plugin infrastructure, so new additional code would be needed there as well. We don't need that for any other platform we currently support. E.g. for hwcomposer on Android, libhybris takes care of allowing us to use EGL the same way as on any other platform. I absolutely do not understand why changes would be needed there. Existing code shows that it can be done differently. And here we see again why I think the situation with EGLStreams does not compare at all to supporting hwcomposer.
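To show what that shared code path looks like today, here is a simplified sketch of the EGLImage based binding, assuming the EGL_WL_bind_wayland_display and GL_OES_EGL_image extensions are available (the function itself is hypothetical, not KWin's actual code). With EGLStreams the texture would instead be filled by acquiring frames from a stream, which is why this driver independent path would need NVIDIA specific additions.

#include <wayland-server.h>
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>

// Sketch only: extension presence is assumed, error handling trimmed.
GLuint textureForBuffer(EGLDisplay display, wl_resource *buffer)
{
    static auto createImage =
        reinterpret_cast<PFNEGLCREATEIMAGEKHRPROC>(eglGetProcAddress("eglCreateImageKHR"));
    static auto imageTargetTexture =
        reinterpret_cast<PFNGLEGLIMAGETARGETTEXTURE2DOESPROC>(eglGetProcAddress("glEGLImageTargetTexture2DOES"));

    // wrap the client's wl_buffer in an EGLImage
    EGLImageKHR image = createImage(display, EGL_NO_CONTEXT, EGL_WAYLAND_BUFFER_WL,
                                    (EGLClientBuffer)buffer, nullptr);
    if (image == EGL_NO_IMAGE_KHR) {
        return 0;
    }
    // attach the image to a texture the compositor can render with
    GLuint texture = 0;
    glGenTextures(1, &texture);
    glBindTexture(GL_TEXTURE_2D, texture);
    imageTargetTexture(GL_TEXTURE_2D, image);
    return texture;
}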

Overall we are not thrilled by the prospect of two competing implementations. We do hope that the discussions at XDC will come to a positive conclusion and that there will be only one implementation. I don't care which one, I don't care whether one is better than the other. What I care about is requiring only one code path, the possibility to test with free drivers (Mesa) and support for atomic mode setting. Ideally I would also prefer not having to adjust existing code.

Should we target EGL as the default?

When we started the compositing work in KWin, the only way to initialize an OpenGL context was by using GLX. In fact GLX is even part of the OpenGL library on Linux. Being an X11 window manager and an X11 compositor, this was not a big problem for us.

Years later we started the effort to optionally use OpenGL ES in KWin in addition to normal OpenGL. One of the differences is that OpenGL ES doesn’t use GLX, but rather EGL. So we gained code to set up compositing using EGL. But this was still a compile time switch and only supported together with OpenGL ES. If I remember correctly, the option to create an OpenGL context on top of EGL was not yet available in our drivers back then.

But the FLOSS drivers eventually started to support it, and since the 4.10 release at the beginning of 2013 we can provide a normal OpenGL context over EGL. This option was not really exposed and was only bound to an environment variable. Very few users knew about it, and that was intentional. Using EGL was a good way to shoot yourself in the foot, given that not all drivers supported it and that the GLX backend had seen much more testing.
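For those wondering what a normal OpenGL context over EGL means in practice: the compositor binds the desktop OpenGL API instead of OpenGL ES before creating the context. A minimal sketch, with error handling trimmed and the function name being just an example:

#include <EGL/egl.h>

EGLContext createDesktopGLContext(EGLDisplay display)
{
    eglInitialize(display, nullptr, nullptr);
    // the driver has to offer the desktop OpenGL API through EGL;
    // if it only offers OpenGL ES, this call fails
    if (eglBindAPI(EGL_OPENGL_API) == EGL_FALSE) {
        return EGL_NO_CONTEXT;
    }
    const EGLint configAttribs[] = {
        EGL_RENDERABLE_TYPE, EGL_OPENGL_BIT,
        EGL_NONE
    };
    EGLConfig config;
    EGLint count = 0;
    eglChooseConfig(display, configAttribs, &config, 1, &count);
    if (count == 0) {
        return EGL_NO_CONTEXT;
    }
    return eglCreateContext(display, config, EGL_NO_CONTEXT, nullptr);
}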

Now the situation has changed. With 5.x we also expose EGL through a config interface, but more importantly EGL is the only way to get OpenGL on Wayland. The assumption that every user is using GLX no longer holds. The assumption that we don’t need to spend much time on the EGL backend (because it’s only relevant for the reduced feature set of mobile devices) is wrong. It must reach the same quality as the GLX backend, otherwise our Wayland experience suffers.

So this raises an important question: should we try targeting EGL as the default? In the past, one of the main reasons not to even think about EGL as the default was that the NVIDIA driver didn’t support it. If we had switched to EGL by default, we would have had to deal with the situation that EGL doesn’t work and we need to fall back to GLX. That would have complicated the startup significantly and would have resulted in our user base using different rendering paths.

Now this one roadblock seems to be resolving itself. While I haven’t tested it yet, the latest NVIDIA beta driver supports OpenGL over EGL, so we should be able to get the majority of our user base onto the same backend after all. Given that it is only a beta driver and the feature is marked as “experimental”, it will probably still be some time until we can default to it.

Nevertheless it’s time to start thinking about it. I encourage NVIDIA users to give EGL a try now and to report issues. I also encourage all other users to report issues they see with the EGL backend. If we get the EGL backend into a good state, maybe 5.6 might be a good point in time to default to it, and then start to slowly deprecate the GLX backend with the aim of removing it in the distant future.