KDE is currently being blamed for errors in external components: the graphics drivers. Lately I have been reading quite some crap (e.g. on IT news sites today) claiming that we KWin devs knew about problems in the drivers and nevertheless shipped 4.5 with changes enabled which trigger the driver bugs. That is of course not true.
First let’s have a look at the checks KWin performs for desktop effects:
- Desktop effects are only enabled if the driver is on a whitelist of known working drivers
- The capabilities of the driver are verified by an external helper application
- If KWin crashes while initializing the driver (you have to access OpenGL to know if OpenGL is supported), KWin will not try to enable desktop effects again
- If KWin crashes twice in a row (not during driver initialization), desktop effects get disabled to prevent further crashes
- Each effect verifies the required capabilities by testing against the supported OpenGL extensions. For example, the blur effect requires GLSL with fragment shaders and framebuffer objects. The effect gets activated only if the driver supports these extensions. If a driver does not support an extension, it should not claim support for it. The extensions are the only reliable information we have from the graphics driver.
- We have a test in 4.5.0 (removed in 4.5.1 – reason given below) that ensures that framebuffer objects are working in case the driver incorrectly claims support for the extension
- We have a dynamic blacklist that disables effects which require an extension the driver incorrectly claims support for
- KWin has a self-check to test if desktop effects work at all; if not, they are disabled
- KWin has a runtime performance check. If the performance is bad, desktop effects are disabled on the fly
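To make the two crash rules from the list concrete, here is a minimal sketch in Python. It is not KWin's code (KWin is written in C++); the class and function names are made up for illustration, and how the state is persisted between sessions is left out.

```python
from dataclasses import dataclass


@dataclass
class CompositingState:
    """Hypothetical persisted state; the names are assumptions, not KWin's config keys."""
    crashed_during_init: bool = False
    consecutive_crashes: int = 0


def effects_allowed(state):
    # A crash while initializing the driver is treated as final: never retry.
    if state.crashed_during_init:
        return False
    # Two crashes in a row (outside of initialization) also disable effects.
    return state.consecutive_crashes < 2


def record_crash(state, during_init):
    if during_init:
        state.crashed_during_init = True
    else:
        state.consecutive_crashes += 1


def record_clean_session(state):
    # A session without a crash resets the "in a row" counter.
    state.consecutive_crashes = 0
```

The point of the sketch is only that a single crash during initialization is enough to keep effects off, while crashes later on need to happen twice in a row.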
The most important fact in this list is that KWin does not enable any functionality the driver does not claim support for! Furthermore, we have several runtime checks to ensure that our users have a smooth experience even if the drivers claim support for extensions they do not actually support. Many of these checks were added in the 4.4 and 4.5 release cycles.
Now that I have explained all the checks we perform to ensure a smooth user experience, I want to explain how regressions could still happen in 4.5. In 4.5 we introduced two new features which require OpenGL shaders: the blur effect and the Lanczos filter. Neither is a hard requirement. The blur effect can easily be turned off by disabling the effect, and the Lanczos filter is controlled by the general effect level setting, which is also used for Plasma and Oxygen animations. Both new features check for the required extensions and get activated only if the driver claims support for them. So everything should be fine, shouldn’t it?
Apparently not when it comes to the free graphics drivers (please note and remember: we do not see such problems with the proprietary NVIDIA driver!). We used to use indirect rendering for Intel drivers. In that case the driver still claims support for all the extensions which cannot work with indirect rendering, for example framebuffer objects and shaders (NVIDIA does not claim support for such extensions if direct rendering is not enabled). If we ask the driver whether something is supported, it says: "yes sure, everything is fine", until you try to use it. This caused serious problems with the blur effect. The effect checks at startup what is supported, sees that everything is supported and tells Plasma that blur is available. Plasma therefore uses the blur-optimized theme, and then blur just does not work. We do not see the problem on initial startup or during creation of the resources, such as compiling the shader; we see it only during the painting pass. This results in completely transparent windows that are impossible to read.
Of course this is unacceptable. Disabling the blur effect by default would not have been an option, as users would have enabled it and run into this driver issue anyway. So we had to check for it. I added a test which tries to get a framebuffer object during initialization. If it fails, KWin does not enable the blur effect. That fixed the issue for all Intel users. But it caused other problems: with some drivers, when using a "strange" resolution (e.g. multi screen), compositing cannot be enabled at all. We still do not know what is happening there and cannot debug it due to missing hardware. This check was removed again in 4.5.1 as it was causing problems.
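The per-effect extension check can be sketched roughly as follows. The extension names are real OpenGL extensions, but which ones each KWin effect actually tests for is an assumption here, and the function names are hypothetical. The extension string is the space-separated list a driver reports.

```python
# Assumed requirement table: effect name -> extensions it needs.
# The exact sets KWin checks are not given in this post.
EFFECT_REQUIREMENTS = {
    "blur": {
        "GL_ARB_shading_language_100",
        "GL_ARB_fragment_shader",
        "GL_EXT_framebuffer_object",
    },
    "lanczos": {
        "GL_ARB_shading_language_100",
        "GL_ARB_fragment_shader",
    },
}


def parse_extensions(extension_string):
    # Drivers report extensions as one space-separated string.
    return set(extension_string.split())


def supported_effects(extension_string, requirements=EFFECT_REQUIREMENTS):
    available = parse_extensions(extension_string)
    # An effect is enabled only if every required extension is claimed.
    return sorted(name for name, needed in requirements.items()
                  if needed <= available)
```

Note that such a check can only be as good as the extension string: if the driver claims an extension it cannot deliver, the effect gets enabled anyway.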
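To illustrate the mismatch: a well-behaved driver would simply not list extensions that need direct rendering while rendering indirectly. A minimal sketch of that expectation, with an assumed set of affected extensions and a hypothetical function name (this is not something KWin implements):

```python
# Assumption: extensions that cannot work with indirect rendering,
# following the examples named above (framebuffer objects and shaders).
NEEDS_DIRECT_RENDERING = frozenset({
    "GL_EXT_framebuffer_object",
    "GL_ARB_fragment_shader",
    "GL_ARB_shading_language_100",
})


def trustworthy_extensions(claimed, direct_rendering):
    """Return the claimed extensions that can actually work in this mode."""
    claimed = set(claimed)
    if direct_rendering:
        return claimed
    # With indirect rendering, drop everything that needs direct rendering.
    return claimed - NEEDS_DIRECT_RENDERING
```

With the extension list filtered like this, blur would never have announced itself to Plasma on an indirectly rendering setup.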
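The idea behind that test, in a simplified sketch: do not trust the claim alone, but try the operation once during initialization. The callable `try_create_fbo` is a hypothetical stand-in for the real OpenGL calls, which are not shown here.

```python
def blur_enabled(claims_fbo, try_create_fbo):
    """Decide whether blur may be enabled.

    claims_fbo: whether the driver lists the framebuffer object extension.
    try_create_fbo: hypothetical callable that returns True if a
    framebuffer object could actually be created.
    """
    if not claims_fbo:
        # No claim, no effect: we never use what the driver does not offer.
        return False
    try:
        return bool(try_create_fbo())
    except Exception:
        # Any failure while probing counts as "not usable".
        return False
```

The downside described above applies to exactly this kind of probe: if the probe itself misbehaves on some driver, it can block compositing on setups that would otherwise work.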
We could drop the check because we had found a better solution: the free drivers also support direct rendering and so we activated it. Of course only when our test application running outside of KWin verifies that direct rendering works. So everything is fine: blur works and no more problems.
Well, not with the graphics stack we have in X11. We saw some new problems appearing out of nowhere. The Intel driver seems to be able to support shaders on hardware that is not programmable, which means software emulation. That’s a rather stupid idea. You use shaders because you want the power of the GPU, not the slow general-purpose CPU. A GPU (even a bad one) is orders of magnitude faster at image processing tasks than a CPU and is highly parallel. We also saw issues like windows being rendered upside-down (this issue had already been fixed in mesa git master before we were aware of the problem). It hit, for example, all users of the radeon driver in openSUSE 11.3. A very nice problem seems to exist for Intel drivers. Given the bug reports, we can only conclude that the driver is not able to handle more than one GL context at the same time, or contexts started shortly after each other.
When I see these problems I think: "it looks like we are the first ones to actually use the drivers". And then I start to think about it and realize: yes, we are. Compiz does not yet use GLSL (Compiz’s blur effect is written in GPU assembler; KWin’s blur also has an assembler part which is a fallback in case the driver does not claim support for GLSL), so we are probably the first ones to use these driver capabilities in a real-world application. Now why are we using something that new? Because it is quite old: this is OpenGL 2 we are speaking about, a standard specified in 2004! Btw. Microsoft made use of blur by default when they introduced Vista, and that was in 2006. So we are talking about functionality that was specified six years ago and has been used by default by our competition for four years. Oh, and please note: the same hardware runs fine in Vista or Windows 7, at least as far as we can see from the bug reports.
As those new problems appeared I worked on another solution. Just in case you didn’t know: I don’t have such hardware, so I could not reproduce any of these issues. It’s also important to know that whether a setup fails or works depends on the specific combination of hardware, driver version, mesa version, X server version, kernel version and distribution. So there are easily thousands of possible combinations. Testing everything is just impossible. We introduced a blacklist to work around the most prominent issues concerning blur and the Lanczos filter. Lanczos is more important as it cannot be disabled as easily as blur. The blacklist is implemented in a way that makes it easy to extend. I tried to crowdsource the problem of finding all broken drivers. This did not work out for 4.5, and for 4.5.1 I just did not find the time to write a new kconfig update script. I hope to get an update ready for 4.5.2.
So far I have discussed the problems we saw and knew about before 4.5.0. Given that we introduced several additional checks and put a lot of work into ensuring a good and smooth user experience, I am rather disappointed at being blamed for what we did. In my opinion we did everything that is possible. The only other option would have been to withdraw blur completely (as explained: disabling it by default would not have solved the problem). This would be rather bad given that we have a large user base which wants blur and has a working driver!
One issue we were not aware of before the release (the bug was opened during the beta phase, but not confirmed before the release) is freezes when changing any KWin-related setting. In fact it is not a freeze. You can still use Alt+Shift+F12 to disable compositing and enable it again, and everything works fine. We still do not know what causes the problem, but it seems to be related to enabling direct rendering by default. Unfortunately I am unable to reproduce this problem on the only Intel-based hardware I have access to. The only solution so far is to disable direct rendering in general again, but this would be a regression for all users for whom it is working fine (see above: it cannot be reproduced on every Intel system).
So we are in a situation where each of the possible ways to go is wrong. And it is all caused by the drivers claiming support for functionality they do not support. If they did not do that, we would not have those problems. It is a rather disappointing situation. I can only repeat what is listed in the release notes: use the latest drivers.
I hope this sheds some light on the problems and shows that we did everything that is possible to work around the driver bugs. The issues have to be fixed in the drivers!
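A blacklist that is easy to extend could look roughly like the sketch below. The entries, the key format and the function name are purely hypothetical; the real blacklist lives in KWin's configuration and its format is not shown in this post.

```python
from fnmatch import fnmatchcase

# Hypothetical entries: feature -> patterns matched against
# "driver:renderer:version". These are placeholders, not real broken drivers.
BLACKLIST = {
    "blur": ["examplevendor:*:1.0*"],
    "lanczos": ["examplevendor:Example GPU 900:*"],
}


def is_blacklisted(feature, driver, renderer, version, blacklist=BLACKLIST):
    key = f"{driver}:{renderer}:{version}"
    # A feature is disabled if any pattern for it matches this setup.
    return any(fnmatchcase(key, pattern)
               for pattern in blacklist.get(feature, []))
```

Extending it for a newly reported broken combination is then just a matter of appending one more pattern to the list for the affected feature, which is what makes shipping updates through a config update script feasible.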