Driver dilemma in KDE workspaces 4.5

KDE is currently being blamed for errors in external components: the graphics drivers. Lately I have been reading quite a lot of crap (e.g. on IT news sites today) claiming that we KWin devs knew about problems in the drivers and nevertheless shipped 4.5 with changes enabled which trigger the driver bugs. That is of course not true.

First, let’s have a look at the checks KWin performs for desktop effects:

  • Desktop effects are only enabled if the driver is on a whitelist of known-working drivers
  • The capabilities of the driver are verified by an external helper application
  • If KWin crashes while initializing the driver (you have to access OpenGL to know whether OpenGL is supported), KWin will not try to enable desktop effects again
  • If KWin crashes twice in a row (not during driver initialization), desktop effects get disabled to prevent further crashes
  • Each effect verifies its required capabilities by testing against the supported OpenGL extensions. For example, the blur effect requires GLSL with fragment shaders and framebuffer objects; the effect is only activated if the driver supports these extensions. If a driver does not support an extension, it should not claim support for it. The extensions are the only reliable information we have from the graphics driver.
  • In 4.5.0 we have a test (removed in 4.5.1, reason given below) that ensures framebuffer objects actually work, in case the driver incorrectly claims support for the extension
  • We have a dynamic blacklist that disables effects which require an extension the driver incorrectly claims to support
  • KWin has a self-check to test whether desktop effects work at all; if not, they are disabled
  • KWin has a runtime performance check: if the performance is bad, desktop effects are disabled on the fly
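
The extension check from the list can be sketched in a few lines. This is a minimal sketch using only the C++ standard library, assuming the space-separated string returned by glGetString(GL_EXTENSIONS) has already been obtained from a legacy context; the function names and the exact extension list for blur are assumptions, not KWin's actual code. As the post stresses, an advertised extension is only a claim by the driver, so this check is necessary but not sufficient.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Split the space-separated extension string into individual names.
std::set<std::string> parseExtensions(const std::string &extensionString)
{
    std::set<std::string> extensions;
    std::istringstream stream(extensionString);
    std::string name;
    while (stream >> name) {
        extensions.insert(name);
    }
    return extensions;
}

// True only if every required extension is advertised by the driver.
bool supportsAll(const std::set<std::string> &available,
                 const std::vector<std::string> &required)
{
    for (const std::string &extension : required) {
        if (available.count(extension) == 0) {
            return false;
        }
    }
    return true;
}

// Assumed requirement list for a blur-like effect (GLSL fragment shaders
// plus framebuffer objects); the exact names KWin checks may differ.
const std::vector<std::string> kBlurRequirements = {
    "GL_ARB_shading_language_100",
    "GL_ARB_fragment_shader",
    "GL_EXT_framebuffer_object",
};
```

Matching whole tokens rather than searching for substrings matters here: GL_EXT_texture is a prefix of GL_EXT_texture3D, so a naive substring search would report an extension the driver never advertised.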

The most important fact in this list is that KWin does not enable any functionality the driver does not claim to support! Furthermore, we have several runtime checks to ensure that our users have a smooth experience even if the drivers claim support for extensions they do not support. Many of these checks were added during the 4.4 and 4.5 release cycles.
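
The two crash-related checks from the list behave like a small state machine that survives restarts. A minimal sketch, assuming the state is persisted between KWin sessions (for example in a config file); the struct and function names are hypothetical and not KWin's code:

```cpp
// Hypothetical state that would have to survive a KWin restart; a plain
// struct here so the logic stands on its own.
struct CompositingSafetyState {
    bool initializingOpenGL = false; // set before touching the driver, cleared after
    bool openGLIsUnsafe = false;     // a crash happened while initializing OpenGL
    int consecutiveCrashes = 0;      // crashes since the last clean session
};

// Called on startup when the previous KWin instance crashed.
void recordCrash(CompositingSafetyState &state)
{
    if (state.initializingOpenGL) {
        // Crashed while probing the driver: never try OpenGL compositing again.
        state.openGLIsUnsafe = true;
        state.initializingOpenGL = false;
        return;
    }
    ++state.consecutiveCrashes;
}

// Called once a session has run without crashing.
void recordCleanSession(CompositingSafetyState &state)
{
    state.consecutiveCrashes = 0;
}

// Effects stay enabled unless OpenGL init crashed or KWin crashed twice in a row.
bool shouldEnableEffects(const CompositingSafetyState &state)
{
    return !state.openGLIsUnsafe && state.consecutiveCrashes < 2;
}
```

Keeping the "crashed during initialization" case separate from the ordinary crash counter mirrors the list: the first is permanent, the second resets after a clean session.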

Now that I have explained all the checks we perform to ensure a smooth user experience, I want to explain how regressions in 4.5 could still happen. In 4.5 we introduced two new features which require OpenGL shaders: the blur effect and the Lanczos filter. Neither is a hard requirement. The blur effect can easily be turned off by disabling the effect, and the Lanczos filter is controlled by the general effect level setting, which is also used for Plasma and Oxygen animations. Both new features check for the required extensions and are only activated if the driver claims support for them. So everything should be fine, shouldn’t it?

Apparently not when it comes to the free graphics drivers (please note and remember: we do not see such problems with the proprietary NVIDIA driver!). We used to use indirect rendering for the Intel drivers. In that case the driver still claims support for extensions which cannot work with indirect rendering, for example framebuffer objects and shaders (the NVIDIA driver does not advertise such extensions if direct rendering is not enabled). If we ask the driver whether an extension is supported, it says "yes sure, everything is fine", until you try to use it. This caused serious problems with the blur effect. The effect checks at startup what is supported, sees that everything is supported and tells Plasma that blur is available. Plasma therefore uses the blur-optimized theme, and then blur just does not work. The problem does not show up at startup or while creating resources such as compiling the shader; we only see it during the painting pass. The result is completely transparent windows that are impossible to read.

Of course this is unacceptable. Disabling the blur effect by default would not have been an option, as users would have enabled it and run into this driver issue anyway. So we had to check for it. I added a test that tries to create a framebuffer object during initialization; if it fails, KWin does not enable the blur effect. That fixed the issue for all Intel users. But it caused other problems: with some drivers, compositing cannot be enabled at all when using a "strange" resolution (e.g. multi-screen). We still do not know what is happening there and cannot debug it because we lack the hardware. As it was causing problems, this check was removed again in 4.5.1.

We could drop the check because we had found a better solution: the free drivers also support direct rendering, so we activated it, of course only when our test application, running outside of KWin, verifies that direct rendering works. So everything is fine: blur works and no more problems.

Well, not with the graphics stack we have in X11. We saw some new problems appearing out of nowhere. The Intel driver apparently supports shaders on hardware that is not programmable, which means software emulation. That’s a rather stupid idea. You use shaders because you want the power of the GPU, not the slow general-purpose CPU. A GPU (even a bad one) is orders of magnitude faster than a CPU at image processing and is highly parallel. We also saw issues like windows being rendered upside-down (this issue had already been fixed in mesa git master before we became aware of it). This issue hit, for example, all users of the radeon driver on openSUSE 11.3. A particularly nice problem seems to exist with the Intel drivers: given the bug reports, we can only conclude that the driver cannot handle more than one GL context at the same time, or contexts started shortly after each other.

When I see these problems I think: "it looks like we are the first ones to actually use the drivers". And then I think about it and realize: yes, we are. Compiz does not yet use GLSL (Compiz’s blur effect is written in GPU assembly; KWin’s blur also has an assembly part, which is a fallback in case the driver does not claim support for GLSL), so we are probably the first to use these driver capabilities in a real-world application. Now why are we using something that new? Because it is quite old: this is OpenGL 2 we are talking about, a standard specified in 2004! Btw, Microsoft made use of blur by default when they introduced Vista, which was in 2006. So we are talking about functionality that was specified six years ago and has been used by default by our competition for four years. Oh, and please note: the same hardware runs fine on Vista or Windows 7, at least that’s what we can see from the bug reports.

As those new problems appeared, I worked on another solution. Just in case you didn’t know: I don’t have such hardware, so I could not reproduce any of these issues. It’s also important to know that whether a setup works or fails depends on a special combination of hardware, driver version, mesa version, X server version, kernel version and distribution. So there are easily thousands of possible combinations, and testing everything is just impossible. We introduced a blacklist to work around the most prominent issues concerning blur and the Lanczos filter. Lanczos is more important, as it cannot be disabled as easily as blur. The blacklist is implemented so that it can easily be extended. I tried to crowdsource the problem of finding all broken drivers. This did not work out for 4.5, and for 4.5.1 I just did not find the time to write a new kconfig update script. I hope to get an update ready for 4.5.2.

So far I have discussed the problems we saw and knew about before 4.5.0. Given that we introduced several additional checks and all the work we put into ensuring a smooth user experience, I am rather disappointed to be blamed for what we did. In my opinion we did everything that is possible. The only other option would have been to withdraw blur completely (as explained, disabling it by default would not have solved the problem). This would be rather bad, given that we have a large user base which wants blur and has a working driver!

There is one issue we were not aware of before the release (the bug was opened during the beta phase, but not confirmed before the release): freezes when changing any KWin-related setting. In fact it is not a freeze. You can still use Alt+Shift+F12 to disable compositing and enable it again, and everything works fine. We still do not know what causes the problem, but it seems to be related to enabling direct rendering by default. Unfortunately I am unable to reproduce this problem on the only Intel-based hardware I have access to. The only solution so far is to disable direct rendering in general again, but this would be a regression for all users for whom it works fine (see above: the problem cannot be reproduced on all Intel hardware).

So we are in a situation where every possible way forward is wrong. And it is all caused by drivers claiming support for functionality they do not support. If they did not do that, we would not have these problems. It is a rather disappointing situation. I can only repeat what is stated in the release notes: use the latest drivers.

I hope this sheds some light on the problems and shows that we did everything possible to work around the driver bugs. The issues have to be fixed in the drivers!

=-=-=-=-=
Powered by Blogilo

Crash statistics for KWin

KWin is one of those parts of the KDE workspace which receive crash reports for all parts of the stack. We see crashes in Plasma (kind of upstream to us), in Qt and most likely in the graphics drivers and X itself. My subjective feeling was that we receive more crash reports that are useless for us than valid ones. So I used the bugzilla search to get some numbers for the last year (August 1st, 2009 till today). As you should not trust any statistics you haven’t faked yourself, I used the search to set some more bugs to upstream, duplicate and so on 😉

  • Reported crashes: 330
  • Still open: 31
  • Fixed: 47
  • Set to duplicate: 159 (~ 50 %)
  • Upstream bug: 44
  • Missing backtrace: 37
  • Other (waiting for info, Compiz bug, etc): 12
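
As a quick sanity check of the numbers above: the categories add up to the total number of reports, the duplicate share is just under half, and the fixed and upstream shares are close to each other:

```latex
\begin{aligned}
31 + 47 + 159 + 44 + 37 + 12 &= 330 \\
\tfrac{159}{330} &\approx 0.482 \approx 48\,\% \quad \text{(duplicates)} \\
\tfrac{47}{330} &\approx 0.142 \approx 14\,\% \quad \text{(fixed)} \\
\tfrac{44}{330} &\approx 0.133 \approx 13\,\% \quad \text{(upstream)}
\end{aligned}
```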

What’s really bad is that half of the reported crashes are duplicates, and that is despite DrKonqi finding the right reports. In most cases I just copy & paste the bug number reported as a possible duplicate (thanks a lot for this feature). Among those duplicates is one issue from the 4.4 cycle which caused 31 duplicates and is now number two among the crashes with the most duplicates. Still in the lead is the crash report for Compiz’ KDE4-Window-Decorator (I said we get crashes for the complete stack, including competing window managers 😀).

The upstream bugs are mostly driver-related. The number does not reflect reality perfectly, as some driver crashes were set to duplicate instead. So we see that, counting those, we have more crashes in drivers than crashes we fixed! The open bugs are mostly crash reports which I also consider invalid, e.g. crashes in malloc. Most of them also lack a way to reproduce the crash, or the given steps are clearly wrong (yes, KWin backtraces can tell what you were doing 😉).

Oh, and we could use a bug day. Since I started working, I no longer have the time to search for duplicates, and the number of open bugs has increased by more than 60 since the 4.4 release. And yes, those are mostly duplicates as well. It just needs someone to filter them 🙁

Beautiful Screenshots

This weekend I finally sat down and wrote a screenshot effect for KWin. This effect redirects the rendering of any window into an off-screen texture and saves the texture into the home directory. The advantage of this effect is that it is hooked into the normal rendering process, so we can also capture the shadow and the translucency to get beautiful screenshots. If we capture a translucent window, the result does not show the windows below, but only the captured window with the alpha channel set correctly. See for example:

Translucent Konsole with Oxygen shadows
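
One step such an effect needs when saving the off-screen texture: OpenGL's origin is the lower-left corner, so pixels read back from a render target arrive bottom row first, while most image formats expect the top row first. A minimal sketch of that step, assuming tightly packed RGBA8 pixels have already been read back; the function name is hypothetical and the post does not describe how KWin handles this:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Flip a tightly packed RGBA8 image (pixels.size() == width * height * 4)
// vertically in place. Only the row order changes; the alpha channel is
// copied untouched, which is what preserves shadows and translucency.
void flipRowsRGBA(std::vector<std::uint8_t> &pixels, std::size_t width, std::size_t height)
{
    const std::size_t stride = width * 4; // bytes per row
    auto rowBegin = [&pixels, stride](std::size_t row) {
        return pixels.begin() + static_cast<std::ptrdiff_t>(row * stride);
    };
    if (height < 2) {
        return; // nothing to swap
    }
    for (std::size_t top = 0, bottom = height - 1; top < bottom; ++top, --bottom) {
        std::swap_ranges(rowBegin(top), rowBegin(top + 1), rowBegin(bottom));
    }
}
```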

Of course the effect is not meant to replace KSnapshot. Instead, I prepared a patch which allows KSnapshot to use this effect when the user selects the "window under mouse cursor" mode. The screenshot shown above was captured with this change to KSnapshot. I hope I will be able to merge this change soon 🙂

Next generation OpenGL compositing in 4.6

In this blog post I want to give an overview of what I am planning and working on for KWin in KDE SC 4.6. The big topic for 4.6 is performance: in 4.5 we introduced the blur effect, and our designers want to extend the usage of blur to all windows. This is currently not yet advisable (yes, there are widget styles on kde-look which offer this function, but KWin is not ready for it!), so we have to work on it.

In general there are three topics I will address in the next cycle:

  1. Mobile edition: port compositing stack to OpenGL ES 1.1 and/or 2.0
  2. Improve performance: render less, render faster and more frames per second during animations
  3. Stabilize the effect ABI to be able to provide binary compatibility (BC) in 4.7 at the latest (if it makes sense)

These three topics are closely related to each other. For a mobile port we need to improve the performance and port to OpenGL ES. This requires cleaning up the rendering code, which gives us more modern rendering code resulting, in theory (I do not trust drivers any more), in better performance (render less and faster). As part of this I am aiming to use forward-compatible OpenGL shaders by default for rendering. Yesterday OpenGL 4.1 was released, and it would be nice to be able to use OpenGL 2 functionality in a real-world application on Linux. So this is an appeal to the driver developers: please get your drivers ready for 4.6, we will stress them. As a next step I want to move to OpenGL 3 in 4.7, but the free drivers do not yet support this generation. (That’s perhaps a point where we could work together with the Compiz developers.)

Thanks to the improvements on the rendering stack, our API is getting into a state where it makes sense to think about guaranteeing BC. Currently we do not, and we break the ABI several times in each cycle (so far I have broken BC in trunk more than five times). Our API does not yet use d-pointers and we have many pure virtual classes, which will make it difficult. But the API itself has not changed much during the last cycles, so it’s pretty stable already. With 4.6 the compositing stack will have been in use for three years, two of them enabled by default, so it’s really time to think about it. But this is just an idea of mine which has not yet been discussed with the other KWin developers, and it will require lots of boring work.

So where are we? I started by improving the rendering code of our Plasma-styled frames and moved the rendering code into the concrete implementations for XRender and OpenGL. That made it possible to get something like the following screenshot:

Boxswitch with blurred background

The rendering code is also improved. Instead of recreating the icons in each rendered frame through a QPixmap -> QImage -> GLTexture conversion, the icon texture is reused and generated using Texture From Pixmap (TFP), as used for our window pixmaps (there are still some effects forcing a recreation in each frame, but this will be fixed soon). For passing the geometry to the GPU, a vertex buffer object (VBO) is used instead of the very, very old glBegin/glEnd. This VBO is also cached, so each geometry is created only once and passed to the GPU only once. My VBO abstraction can fall back to legacy rendering if VBOs are not available (very unlikely), which gives us just one high-level API call for legacy, normal and modern (forward-compatible) rendering. The old methods are deprecated, although still used by the window rendering (this requires some more changes, as it currently streams the vertices in each frame). For rendering the effect frames (and all other textures), triangles are now used instead of quads (quads are unavailable in OpenGL ES and in OpenGL 3 forward-compatible profiles).
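
Switching from quads to triangles means each quad's four vertices become two triangles. A minimal sketch of that conversion, assuming a plain 2D vertex type; Vertex2D and quadsToTriangles are hypothetical names and not part of KWin's VBO abstraction:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical vertex type; real code would also carry texture coordinates.
struct Vertex2D {
    float x;
    float y;
};

// Each quad (v0, v1, v2, v3, listed around its outline) is split into the
// triangles (v0, v1, v2) and (v0, v2, v3). Both triangles keep the quad's
// winding order. A trailing incomplete quad is ignored.
std::vector<Vertex2D> quadsToTriangles(const std::vector<Vertex2D> &quads)
{
    std::vector<Vertex2D> triangles;
    triangles.reserve(quads.size() / 4 * 6);
    for (std::size_t i = 0; i + 3 < quads.size(); i += 4) {
        const Vertex2D v0 = quads[i];
        const Vertex2D v1 = quads[i + 1];
        const Vertex2D v2 = quads[i + 2];
        const Vertex2D v3 = quads[i + 3];
        triangles.insert(triangles.end(), {v0, v1, v2, v0, v2, v3});
    }
    return triangles;
}
```

Since this geometry does not change between frames for a static frame, the converted vertices only need to be computed once and can then be uploaded to a cached VBO.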

And what next? We still have many things to improve. In particular, window rendering has to be changed to cache the geometries (when it makes sense) and the clip regions. The clip regions are one of the reasons why I want to switch to shaders (other reasons are Nuno’s wishes for effects; if you want to work on awesome effects, get in touch with one of us). The API still needs some more cleanup, and our compositing stack has to be split into parts for GLX and EGL. Nevertheless, I think the API is already in a state where I dare to compile the KWin effect library on Maemo this weekend to see what breaks (the effects would not yet compile).

Blacklisting drivers for some KWin effects

This is mostly a post for our distributions. In 4.5 we will most likely activate the blur effect and the Lanczos filter for Taskbar Thumbnails and Present Windows by default. Unfortunately, not every driver supports them correctly. We have the following possible problems:

  • Performance issues with Blur Effect
  • Upside-Down taskbar thumbnails with Lanczos filter (affects also Present Windows)
  • Too bright taskbar thumbnails with Lanczos filter (affects also Present Windows)
  • Performance issues with Lanczos filter in Present Windows

Most users won’t have problems with these two new features, or if there are problems, they affect only one of the two. E.g. disabling blur for performance reasons does not require disabling the Lanczos filter. Therefore I implemented a KConfig-based blacklist during Akademy. It uses KWin’s default KConfig file (~/.kde[4]+/share/config/kwinrc) and the KConfigGroup "[Blacklist]". The blur effect uses the sub-group "[Blur]", while the Lanczos filter uses the sub-group "[Lanczos]". This might be extended in future releases.

The blacklist is implemented in a way that specific driver versions for specific hardware are blacklisted. So a newly released driver version won’t be blacklisted automatically, in the hope that it fixes the issues (e.g. it is known that the upside-down issue is already resolved in Mesa master). On the other hand, this means that if a problem is not fixed, it will hit the user as soon as the new version is installed. This will of course cause problems, and users will claim that $distribution is harming KDE because of $stupid thing.

The blacklist can be updated by a kconf_update script, and that’s the way we will ship a default blacklist in 4.5 (this is not yet implemented, as we are still collecting information on drivers causing the problems). So if you as a distribution are informed about a problem with a specific driver, please pass the information to me so that I can update the blacklist for the final or a minor release, and (after the release) please ship a kconf_update script to change the blacklist. This is especially important if you update the drivers (e.g. in a development release). However, I have no idea how to automatically update the blacklist if a user gets the drivers from e.g. the xorg-edgers PPA (btw, this seems to cause quite a few crashes).

The entries of the blacklist are normal KConfig entries with the driver identifier as key (e.g. Intel or NVIDIA) and a list of strings as the value. Each list item identifies a concrete combination of renderer and version, separated by colon, dash, colon (:-:). Renderer and version can be read from glxinfo. Here is an example blacklist:

[Blacklist][Blur]
NVIDIA=GeForce 9400M/PCI/SSE2:-:3.2.0 NVIDIA 195.36.24

[Blacklist][Lanczos]
NVIDIA=GeForce 9400M/PCI/SSE2:-:3.2.0 NVIDIA 195.36.24

This would block my GPU for both the blur effect and the Lanczos filter (which is of course not needed). I will update the blog post with a link to the kconf_update script as soon as it is implemented.
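
Matching one blacklist value against the running driver could look like the sketch below. It assumes the KConfig lookup of the key (e.g. NVIDIA in [Blacklist][Blur]) has already produced the raw value string, that list items are plain comma-separated like a KConfig string list (escaped commas are not handled), and that renderer and version are taken from glxinfo's "OpenGL renderer string" and "OpenGL version string" lines. The function names are hypothetical:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split "a,b,c" on commas, dropping empty items.
std::vector<std::string> splitList(const std::string &value)
{
    std::vector<std::string> items;
    std::istringstream stream(value);
    std::string item;
    while (std::getline(stream, item, ',')) {
        if (!item.empty()) {
            items.push_back(item);
        }
    }
    return items;
}

// True if renderer and version exactly match one item "<renderer>:-:<version>".
// Only exact versions match, so a newer driver release is not blacklisted.
bool isBlacklisted(const std::string &entryValue,
                   const std::string &renderer,
                   const std::string &version)
{
    const std::string separator = ":-:";
    for (const std::string &item : splitList(entryValue)) {
        const std::string::size_type pos = item.find(separator);
        if (pos == std::string::npos) {
            continue; // malformed item, ignore it
        }
        const std::string itemRenderer = item.substr(0, pos);
        const std::string itemVersion = item.substr(pos + separator.size());
        if (itemRenderer == renderer && itemVersion == version) {
            return true;
        }
    }
    return false;
}
```

With the example entry above, isBlacklisted("GeForce 9400M/PCI/SSE2:-:3.2.0 NVIDIA 195.36.24", "GeForce 9400M/PCI/SSE2", "3.2.0 NVIDIA 195.36.24") returns true, while any other driver version for the same GPU returns false.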

Web content embedded in KWin effects

Those who attended my talk about KWin Mobile at Akademy saw the new KWin effect which demonstrated that it is possible to embed web content in a KWin effect. Unfortunately nobody could see the highlight of the effect, as Germany scored too early and too late. If somebody had scored a goal during the talk, a nice animation would have been shown. I modified the effect to show the animation on startup so that I could record it. As always, sorry for the bad quality: recordmydesktop does not like me.

Direct download as ogv

And yes, the effect is called "Kicker", as it connects to kicker.de to get the current score, and I kept the name in remembrance of a well-known panel. The effect is quite simple, just about 200 lines of code, and it took me only one hour (the first half of the match Spain vs Chile plus half time; yes, I was confident that Germany would have a match during my talk) to implement parsing the current score and displaying it on the screen.

Every half minute the effect connects to the remote site and downloads the HTML file. Using QtWebKit and a little reverse engineering, it extracts the names of the two playing teams and the current score. These strings are combined, put into an "EffectFrame" (our high-level API for displaying Plasma-styled textures on the screen) and shown on each screen.

The effect also caches the current score and compares it with the previous one. If it changed, we know that someone scored a goal, and the animation is triggered. This animation (as seen above) uses another EffectFrame, but this time an unstyled one. The animation stops after seven seconds. During that time the texture’s opacity is constantly increasing and decreasing, so that you get a flashing effect. We have a high-level API for that as well: KWin::TimeLine, which wraps QTimeLine in an API tailored for use in KWin effects.
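
The score comparison and the flashing opacity can be sketched without Qt or KWin. This is a minimal sketch under stated assumptions: the real effect uses KWin::TimeLine and EffectFrame, which are replaced here by a plain function of elapsed time; ScoreWatcher, flashOpacity and the number of flashes are hypothetical, and only the seven-second duration comes from the post:

```cpp
#include <cmath>
#include <string>

// The post says the animation lasts seven seconds; the number of flashes
// within that time is an assumption for this sketch.
constexpr double kFlashDurationMs = 7000.0;
constexpr int kFlashCount = 7;

// Caches the last downloaded score and reports when it changes.
class ScoreWatcher {
public:
    // score: the combined teams-and-score string, e.g. "Team A 1:0 Team B".
    // Returns true if a previous score exists and differs (a goal was scored).
    bool update(const std::string &score)
    {
        const bool changed = !m_lastScore.empty() && score != m_lastScore;
        m_lastScore = score;
        return changed;
    }

private:
    std::string m_lastScore; // empty until the first download
};

// Opacity of the flash frame elapsedMs after the goal: a triangle wave that
// rises from 0 to 1 and back kFlashCount times, then stays at 0.
double flashOpacity(double elapsedMs)
{
    if (elapsedMs < 0.0 || elapsedMs >= kFlashDurationMs) {
        return 0.0;
    }
    const double progress = elapsedMs / kFlashDurationMs;        // 0..1, like a time line
    const double phase = std::fmod(progress * kFlashCount, 1.0); // 0..1 within one flash
    return 1.0 - std::abs(2.0 * phase - 1.0);
}
```

Ignoring the very first download (when no previous score is cached) avoids triggering the goal animation every time the effect starts.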

The code is of course useless for KWin trunk, as it is a really special feature and I don’t like the QtWebKit dependency in the effect library. Nevertheless, I will import the code into our test directory, as it is a nice example of how to use the EffectFrame and really shows how flexible our effect library is.

Problems with Desktop Effects in RC1

In 4.5 RC1 we hit several driver bugs again. We fixed a bug which caused blur not to work, but that fix introduced different problems with some drivers.

Effects might get suspended directly on startup. In that case, try disabling the blur effect! If this does not solve the issue, disable direct rendering in the advanced effects tab.

There are also some problems with Taskbar Thumbnails and Present Windows. The previews might be upside-down or too bright, or the effect might be too slow. The same problem does not appear in Desktop Grid! This is caused by driver bugs which are already fixed upstream, so please update your drivers. You can disable the filter used for this via the global effects level, but this will also disable animations in Plasma.

We have bug reports for all these issues. If you want to comment on them, please state your driver and its version.

Akademy Talk: KWin Mobile

This weekend I prepared my slides for next week’s talk at Akademy about the mobile edition of KWin. I will talk about why we need a KWin mobile edition and why it will also benefit the desktop edition of KWin. For those who are not that familiar with KWin’s compositing stack, I will explain how the system works and where and how we can work on the OpenGL ES port. That requires illustrating the differences between the various OpenGL versions and what we have to do to get KWin’s codebase compiling on Maemo. As I have not yet started to implement the port (I think 4.5 is currently more important), I will also provide a roadmap of when we will see the pieces hitting trunk.

As my talk has a very special time slot, I prepared a KWin effect for it on Friday evening. It’s a small but useful effect for the purpose, took me only about one hour to implement, and nicely illustrates KWin’s flexibility, which is of course also part of the talk. I don’t want to spoil it, but I think it’s a nice and elegant solution to the problem that my talk collides with the second half of the quarterfinal Germany vs Argentina or Mexico. So please come to my talk and don’t stay in front of the TVs 🙂 I really love football, and I can’t remember ever missing a German match at a World Cup. Given the way Germany played today, it will be really hard to miss the game. Who, btw, planned Akademy during the World Cup?

I'm going to Akademy

Technical Limitations of Client-Side-Decorations

Sorry to blog again about Client-Side Window Decorations (CSD), but I was just made aware that you can use Alt+F3 in Plasma popups, and that helped to produce this wonderful screenshot (sorry Eike and Plasma devs for destroying your apps in such a horrible way):

As we can see, all windows have KWin decorations, and they should not have them. Yakuake has a kind of CSD of its own: you see the controls at the bottom. For Yakuake it’s totally fine to use its own controls, as it has a very special use case. Also, none of the Plasma windows should have decorations, as they are part of the workspace and therefore not traditional windows.

So this is a technical and unsolvable problem for all windows which want to have CSD: the window manager may reparent the window, and the window has no control over it. And this issue really is unsolvable; even if the NETWM specification were extended, it would not be solvable. There are window managers which have to reparent the window in order to function correctly. These are all tiling window managers: they have to be able to collapse the window into the decoration. Also, KWin would never be able to support this correctly, as we have features like window rules which can enforce a decoration for a window, no matter whether the window wants it or not. So to support it correctly we would have to break existing features, which we don’t want to do (and KWin is a tiling WM now). This also implies that we can never have a change to the specification, as it would require consensus among all relevant parties, and I think it is obvious that such a consensus does not and cannot exist.

And just to show that this is also an issue for the only real CSD application out in the wild: I installed Chromium, and it has two decorations:

This was the first time that I used Chromium. It opened up on Desktop 1, but I wanted to have it on Desktop 4 for the screenshot, so I right-clicked the Chromium decoration to move it to a different desktop, and it doesn’t work: it doesn’t show the KWin user action menu. Damn it, I was right; the concerns about CSD I blogged about in my last posts seem to be valid. There just is no consistency. Not to mention that it doesn’t have a menu button or a sticky button, the distance between the maximize and close buttons is too small, clicking anywhere in the titlebar unmaximizes the window, the resize cursors are ugly and not the nice Oxygen ones, and the tooltips on the buttons have a different text from KWin’s (no translation). Hmm, pretty much all my concerns from the last blog post are verified by this small example. When I wrote my last blog post I had never used Chromium. I had only installed it once in a Lucid VM to illustrate the brokenness caused by the button layout. So my concerns did not come from looking at what doesn’t work in Chromium, but from what is clear to me cannot work. That was a purely theoretical post, while this is the experimental proof.

Please do not try to decorate Chromium with KWin decorations. It will most likely crash KWin!

Blur Effect enabled by default

No, it’s not about window decorations 🙂

I just enabled the blur effect by default for the beta cycle. If your graphics card supports at least the extension GL_ARB_fragment_program (check with glxinfo), you should see blur behind Plasma tooltips, etc.

Please give it a try and please report issues. A blur effect could cause slowness and artifacts, and we would like to know about them before the release. I hope we won’t have to disable the effect again. Speaking of new cool effects: Fredrik not only contributed a new blur effect, but also added an awesome shader to the taskbar thumbnails, so they should look much better. Unfortunately there is a small issue present only with the NVIDIA driver, so the improved thumbnails are only used for RGBA windows. I’m pretty sure we will fix this issue, perhaps even before beta tagging (I just have to recompile all of trunk to test some ideas). And sorry for not presenting a picture; my trunk is really broken at the moment 😉
