This week I once more noticed the biggest problem of KWin development: the differences between the multiple drivers. Since Monday KWin uses a new OpenGL 2 based coding path as default. The code was mostly written on one of my systems with the nouveau driver and regression testing was mostly done on my second system using fglrx which only supports the "legacy" old coding path by default but can be forced to the new one (but too slow for productive usage).
Around Wednesday notmart reported that the code is broken on NVIDIA proprietary driver. As I had developed on nouveau I had not tested with the blob. So I had to switch the driver again and could confirm the regression (after fixing: it was clearly our bug and not NVIDIA’s). The regression occurred in a code path which seems to be only executed with the NVIDIA blob. It took me hours to figure out what is causing the bug and how to fix it. And it illustrates the big problem in KWin development: without an NVIDIA card and the driver I would not have been able to fix it and I doubt that anyone not knowing the code would have been able to fix it.
Currently KWin supports hardware from the three big vendors: NVIDIA, ATI/AMD and Intel. Each of the vendors has a set of different hardware generations. The biggest difference is between the fixed functionality hardware and the general purpose GPUs. The difference between the hardware generations is so big that assuming that just because it works on one of it, that it works on the others, too, is rather naive.
So we have different hardware vendors with hardware in different generations. Let’s make it more complex: drivers. We also have different drivers for the same hardware. Currently we see the following drivers:
- NVIDIA proprietary driver
- Catalyst (aka fglrx) proprietary driver
- Intel (Mesa driver)
- Radeon (Mesa driver for AMD/ATI)
- Radeon/Gallium3D (Mesa driver for AMD/ATI)
- Nouveau (Mesa driver for NVIDIA, Gallium3D, experimental)
Except for Intel we have different drivers for the same piece of hardware. In case of AMD/ATI even different free drivers (the list is missing radeonhd). The big hope is here in the Gallium3D stack which will hopefully simplify the situation by having one free stack shared by various drivers. As my example above illustrates different drivers for the same hardware show different behavior. So we need not only test with various hardware, but also with the various drivers. So at that point we are somewhere around 30 to 50 combinations of different vendors, hardware generations and the drivers for them.
That’s quite complex already, isn’t it? So let’s make it more complex: driver versions. Obviously each vendor/driver developer is working on their drivers and is fixing bugs and introducing new ones. Of course we need to support all versions properly, so we would need to test with all driver versions. E.g. we know that we see regressions in the latest NVIDIA blobs. Unfortunately I’m still on an older one due to using Debian Testing and not wanting to mess with my setup. So I cannot investigate these issues.
Ok we just made everything more complex with driver versions. Let’s add another complexity factor for the free drivers. The free drivers are more or less bundled together with libdrm, mesa, XServer and the kernel. Some drivers require a specific kernel version or won’t work with another one. This is kind of opening Pandora’s box concerning the possibilities of combinations.
So we have reached the level of combinations which are not being possible to test without recompiling the complete kernel/mesa/X stack. Let’s add another complexity factor: distributions. Distributions do not only ship the upstream drivers, no they "improve" them. We have seen that in the past several times that distro specific code broke KWin. The distribution tries to support a vast diversity of software and it can happen that an optimization for Compiz causes rendering issues in KWin. Also distributions sometimes bundle combinations of drivers and kernel which do not work together. This mostly happens with rolling release distributions. E.g. new Mesa version requiring a new Linux kernel which is not yet released. But rolling release distributions like Arch are also hitting problems before other distributions which can help us to workaround issues before it hits distributions with a larger userbase such as Kubuntu or openSUSE.
All together there are several hundreds of combinations of the stack below KWin. Changes in any of the component can cause regressions. Even trying to test all of them seems to be ridiculous. If I would try to test a reasonable amount of the combinations (vendors * hardware generation * drivers * latest version) I would need something like 30 machines. Of course I don’t have the time for it and that’s also the reason why I declined all hardware offerings so far. It’s just like a drop in a bucket.
But we have users who can test. We need users willing to run the latest KWin master together with the latest driver versions to ensure that we do not run into regressions. And we need users reporting high quality bug reports who are willing to also report them on freedesktop.org bugtracker.
And we need developers: preferable with OpenGL experience. Now is the perfect time to join the KWin development. We switched to a more easily readable coding style and introduced a new and modern OpenGL Shader based compositing backend. You could work on improving the User Experience for all users by writing more awesome effects or optimizing our backend. It’s really fun stuff.
=-=-=-=-=
Powered by Blogilo