February KWin/Wayland update: all about input

I haven’t blogged about the progress on KWin/Wayland for quite some time, and a few people have asked for an update. As we are now approaching feature freeze and I have most of the things I wanted to do for Plasma 5.6 done, it’s time to blog again. I’ll also use this as a public service announcement: thanks to Let’s Encrypt, my blog is now also available through an encrypted connection.

Last month my development focus was on the input handling in KWin: the part between input events entering through libinput and being sent to the Wayland client. There are many things the compositor needs to consider for input events: updating the window which has focus, ensuring that events are not passed to normal windows while the screen is locked, handling focus follows mouse, etc. On X11 KWin already has code for most of these things, but that code is quite dependent on X11, so it needed to be partially adjusted and partially rewritten.
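The post doesn’t show KWin’s new input code. As a rough sketch of one way to structure checks like the ones listed above, assuming a chain of filters where each filter gets a chance to consume an event before it reaches a client, here is a minimal C++ model. All names (`InputFilter`, `LockScreenFilter`, `ForwardToFocusFilter`, `InputDispatcher`, `SessionState`) are hypothetical and not KWin’s API.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model, not KWin code: a "window" just records the keys it got.
struct Window {
    std::string name;
    std::vector<unsigned> receivedKeys;
};

struct KeyEvent {
    unsigned keyCode;
};

// Compositor state the filters consult.
struct SessionState {
    bool screenLocked = false;
    Window *lockScreen = nullptr;
    Window *focused = nullptr;
};

class InputFilter {
public:
    virtual ~InputFilter() = default;
    // Returns true if the event was consumed and must not reach later filters.
    virtual bool keyEvent(const KeyEvent &event) = 0;
};

// While the screen is locked, only the lock screen window receives keys.
class LockScreenFilter : public InputFilter {
public:
    explicit LockScreenFilter(SessionState &state) : m_state(state) {}
    bool keyEvent(const KeyEvent &event) override {
        if (!m_state.screenLocked) {
            return false; // not locked: let the next filter decide
        }
        if (m_state.lockScreen) {
            m_state.lockScreen->receivedKeys.push_back(event.keyCode);
        }
        return true; // swallow the event even if no lock screen window exists yet
    }
private:
    SessionState &m_state;
};

// Last filter in the chain: deliver the key to the focused window.
class ForwardToFocusFilter : public InputFilter {
public:
    explicit ForwardToFocusFilter(SessionState &state) : m_state(state) {}
    bool keyEvent(const KeyEvent &event) override {
        if (m_state.focused) {
            m_state.focused->receivedKeys.push_back(event.keyCode);
        }
        return true;
    }
private:
    SessionState &m_state;
};

class InputDispatcher {
public:
    void append(std::unique_ptr<InputFilter> filter) { m_filters.push_back(std::move(filter)); }
    // Pass the event through the filters in order until one consumes it.
    void processKey(const KeyEvent &event) {
        for (const auto &filter : m_filters) {
            if (filter->keyEvent(event)) {
                return;
            }
        }
    }
private:
    std::vector<std::unique_ptr<InputFilter>> m_filters;
};
```

Installing the lock screen filter first means that no later filter, including the one forwarding to the focused window, ever sees a key press while the screen is locked. Such a chain is also easy to cover with auto tests, since each filter can be exercised in isolation.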

The input handling code we had in KWin/Wayland so far was already showing its age. It was written and designed before KWin really became a Wayland compositor, back when KWin could render X11 windows to another Wayland compositor. So it mostly cared about sending the events to the X server; everything else continued to work because the X11 event handling was still in place.

So the first task was to untangle the code so that it is easier to extend while guaranteeing that it won’t break. As we are now able to start KWin/Wayland on a virtual framebuffer, we can run it during our auto tests. This was an important cornerstone for reworking the input, as it allowed us to write test cases for everything KWin does.

With that done, existing features from X11 could be ported to Wayland, including mouse actions (what to do when clicking an inactive window), unrestricted move/resize with Alt+left/right mouse button, focus follows mouse and auto raise, etc. All those features are now also covered by tests, and as the code is mostly shared with the X11 implementation, we now also have test coverage for these features on X11. That’s quite an improvement for our X11 implementation, thanks to Wayland.

Another area of work is keyboard layout handling. So far KWin defaulted to the US layout without any possibility to change it. That was a huge drawback for my own usage, as I couldn’t even write my name. As with many other input-related areas, I’m not really familiar with the technology, so I had to look into it in more detail. I am very pleased with xkbcommon: it was really easy to get it working and hooked up properly in KWin. The result is that KWin/Wayland now fully supports keyboard layout switches and also integrates with Plasma’s keyboard layout configuration module. I was rather pleased to see that the configuration module was hardly X11 dependent and just works on Wayland. With KWin listening to the correct DBus signal, layouts can be reconfigured. But there is still work to do in that area: so far I have not added support for compose keys, the system tray applet for switching layouts is not ported yet, and accessibility features are still lacking. If you are interested in these areas, help is appreciated.

If you tried Plasma 5.5 on Wayland, you might have noticed that the cursor was sometimes rather incorrect. Not anymore in Plasma 5.6. The cursor image handling was also redesigned and put under test coverage (unfortunately our CI system doesn’t like these tests yet; locally they pass). But having KWin handle cursors correctly is unfortunately not sufficient for proper cursor images. Cursor images are set by the clients, and if a client sets an incorrect cursor image, KWin cannot do anything about it. For example, QtWayland supports neither animated cursors nor custom cursors. With the feature freeze for Plasma 5.6 behind us, I’m also looking into these issues now. Overall, client bugs make it really hard to test features, as you never know whether a bug is in your own code or in the client (or XWayland). The fact that GTK+ and wayland-demos just crash because KWin doesn’t support xdg-shell yet doesn’t make it easier to test new features.

The last input area I looked at and landed just in time for feature freeze is drag’n’drop. The implementation is not yet using the updated protocol from Wayland 1.10 as we had already passed dependency freeze when I started to implement it.

Overall, the improved input handling gives us a nice feature set for Plasma 5.6 on Wayland. On a single-screen setup it’s already quite usable. Of course there are bugs, and for those we need you: please give it a try and report all bugs you see.

In the area of input there is also more work to be done. We need support for graphics tablets. If you are interested in it: I’m willing to mentor a GSoC project about it. We have a few GSoC ideas for Wayland, check them out!

So what’s next in the Wayland world? Until the release of Plasma 5.6 I want to concentrate on bug fixes to make the experience better. After that, xdg-shell is quite high on my priority list, as is making multi-screen setups work correctly (that’s what is blocking me from switching my main system; the single-screen notebook is mostly used on Wayland nowadays).

Server-Side decorations coming to KWin/Wayland

As a kind of Christmas present to our Wayland users I’m happy to announce that over the last two weeks I worked on adding support for server-side decorations.

The main motivation for working on it was that I want to switch to Wayland as the primary driver for my system, and the nested KWin running on top of another Wayland server, which I need for development, doesn’t have any decorations. Of course I could have implemented client-side decorations for it. But as my readers might know, I consider client-side decorations an inferior solution. And KWin of course already supports server-side decorations for X11, so going for server-side decorations is less work than going for client-side ones.

The second reason is that Qt’s default client-side decorations are comparably ugly and lack important features like distinguishing between active and inactive windows, which makes using a Wayland session really hard.

In this case one possibility would have been to develop a plugin so that KDecoration-based themes could be used for client-side decorations. But making that really usable would have required a complex protocol to bring it on par with what KWin has internally.

So here’s the solution:

Server Side decoration support in KWin/Wayland

A core element is a protocol, added to KWayland, to negotiate whether a window should have server-side, client-side or no decoration. KWin got an implementation for it both as server and as client. I plan to submit the protocol for inclusion in Wayland next year. I do think that this can be a general solution: KWin won’t stay the only Wayland compositor preferring not to have client-side decorations. If we think about tiling and use cases like phones, we see that client-side cannot be the ultimate solution. Thus I think it’s a useful extension. Of course it doesn’t forbid client-side decorations; those are still possible with the protocol. So GTK+ applications built upon client-side decorations are still able to use them, but of course I highly recommend using server-side decorations on a system that prefers them (the protocol is also able to tell that).

The last part needed to get this working was implemented in our Qt Platform Theme plugin for Plasma. This plugin will move from frameworksintegration to Plasma with 5.6, so we can easily extend it and depend on KWayland. The plugin checks whether the server supports the protocol, and if it does, it disables Qt’s client-side decorations. For each newly created Wayland window it tells KWin to use either server-side decoration or no decoration (popup windows). As all of that is implemented in our platform theme plugin, it doesn’t affect other Wayland compositors: there the plugin does not get loaded and Qt’s client-side decorations will be used. So no fear: this won’t affect GNOME Shell at all. As the plugin is currently in the process of being moved, it’s only in a scratch repository and won’t make it to main this year. Our code deserves a Christmas break as well :-)
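To make the plugin’s per-window decision concrete, here is a minimal sketch of the logic described above. The enum and function names are hypothetical and not KWayland’s API; the three modes simply mirror the post’s server-side, client-side and no decoration.

```cpp
#include <cassert>

// Hypothetical names; the three modes mirror the post's
// "server-side, client-side or no decoration".
enum class DecorationMode { None, Client, Server };
enum class WindowKind { Normal, Popup };

// Per-window decision of the (hypothetical) platform theme logic.
// protocolSupported: whether the compositor announced the decoration protocol.
DecorationMode chooseDecoration(bool protocolSupported, WindowKind kind)
{
    if (!protocolSupported) {
        // Protocol not offered: leave Qt's own client-side decorations enabled.
        return DecorationMode::Client;
    }
    if (kind == WindowKind::Popup) {
        return DecorationMode::None; // menus, tooltips etc. get no frame at all
    }
    return DecorationMode::Server; // ask the compositor to draw the decoration
}
```

Keeping the fallback to Qt’s own decorations in the "not supported" branch is what makes the approach harmless on other compositors.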

Happy holidays and a successful Wayland year 2016!

Gaming on Linux: Move to next generation?

In this blog post I want to outline some thoughts on how the gaming experience can be improved on Linux.

Situation on X11

On X11 the big problem for gaming is the compositor. Games need access to the GPU and need to use it pretty much exclusively. Compare that to a true console like a PlayStation: while the game is running, you can be sure it’s the exclusive user of the GPU. Given how compositing works on X11, this cannot be provided, as there is the compositor which needs to render the scene. Simplified, the setup looks like this:

  1. Game renders through OpenGL/GLX
  2. X-Server notifies Compositor through XDamage extension
  3. Compositor schedules a repaint for changed area
  4. Compositor uses the XComposite extension to get a pixmap for the Game window
  5. Compositor binds pixmap to an OpenGL texture
  6. Compositor renders the texture using OpenGL/GLX to the composite overlay window
  7. X server presents the rendered image from the Compositor through kernel mode settings

In this setup a few things are not optimal for a fullscreen game: the round trips between the X server and the compositor, and the possible delay added by the compositor. In such a setup the compositor will always slow down the game, as it performs vsync, etc.

Workarounds on X11

There is an existing solution to improve this which is known as “unredirection of full-screen windows”. The idea is that if there is a full screen window the compositor won’t run for it and the “normal” non-composited functionality of the X server is used.

In my opinion this is the wrong solution to the problem, because the compositor is still running, it still gets damage events (possibly from other windows) and might start to composite the scene again at any time (e.g. when a notification pops up as an override-redirect window).

In KWin/Plasma we have a better solution: blocking compositing. We can do that because we do not require compositing; other environments which require compositing can only use unredirection as the best available measure. In fact we even have a high-level API for games to tell us that they want to block compositing. It’s just one X atom, please use it.

This also explains why in KWin/Plasma the option for full-screen unredirection is not enabled by default: it has drawbacks for non-gaming usage, while we offer a better solution for games. It also explains why I don’t care at all about gaming benchmarks through the PTS (Phoronix Test Suite), as in my opinion they test an invalid setup for us. If we cared about them, it would be easy to just ensure that the games used by PTS disable compositing.

Situation on Wayland

On Wayland the setup is better as we don’t have X11 in between. So a setup looks like the following:

  1. Game renders through OpenGL/EGL
  2. Compositor gets notified through wl_surface damage
  3. Compositor directly presents the wl_buffer through KMS as it knows there is nothing else to see

So in the ideal case the situation is much improved. I need to point out that KWin doesn’t support this ideal case yet and still renders through OpenGL, but that’s where we are heading.
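What “it knows there is nothing else to see” could mean in code: a deliberately simple, hypothetical rule that allows direct scanout only when exactly one visible, opaque surface covers the whole output. Real compositors have to consider more (cursor planes, buffer formats, occluded surfaces below), so treat this purely as an illustration; `Surface`, `Rect` and `directScanoutCandidate` are made-up names.

```cpp
#include <cassert>
#include <vector>

struct Rect {
    int x, y, width, height;
};

// Hypothetical per-surface state as seen by the compositor.
struct Surface {
    Rect geometry;
    bool opaque;
    bool visible;
};

bool sameRect(const Rect &a, const Rect &b)
{
    return a.x == b.x && a.y == b.y && a.width == b.width && a.height == b.height;
}

// Returns the surface whose buffer could be handed to KMS directly, or
// nullptr if the compositor has to composite (render) the frame itself.
const Surface *directScanoutCandidate(const std::vector<Surface> &stack, const Rect &output)
{
    const Surface *candidate = nullptr;
    for (const Surface &surface : stack) {
        if (!surface.visible) {
            continue;
        }
        if (candidate) {
            return nullptr; // a second visible surface, e.g. a notification
        }
        candidate = &surface;
    }
    // Only a single opaque surface that exactly covers the output qualifies.
    if (!candidate || !candidate->opaque || !sameRect(candidate->geometry, output)) {
        return nullptr;
    }
    return candidate;
}
```

As soon as a notification shows up, the check fails and the compositor falls back to rendering, which is exactly the kind of wake-up the next paragraph is concerned about.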

But I think there are still problems. Our compositor still gets damage events from other windows, it gets woken up, etc. Running a game in a desktop environment means that there are other session processes with which the game needs to share resources. We want a PlayStation-like setup: the game gets everything, everyone else nothing. I don’t want KWin to take away important CPU/GPU time from the game.

Kernel Mode Settings in games

So what can we do? I thought about this and propose that we change gaming on Linux completely: remove the windowing system! Games should talk to kernel mode setting directly, and games should interact with libinput directly. Let’s remove everything in between; we don’t need it, it can only worsen the gaming experience.

When a fullscreen game starts, it could create a “sub-session” on a new virtual terminal and become the logind session controller for that session. This would allow the game to open the device files for rendering and for input just like a Wayland compositor does. Rendering could be done through EGL on top of DRM/GBM, again just like a Wayland compositor. The game would have full control over rendering, with no desktop environment slowing it down any more. And it would have control over mode setting. Need a different resolution? No problem, just set it. On a desktop environment that’s always problematic (terrible on X11, better on Wayland). For games in windowed mode nothing should change; those should stay on the desktop environment.
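A logind session controller, whether a Wayland compositor or such a game, asks logind for device file descriptors through the `TakeControl` and `TakeDevice(major, minor)` methods of the `org.freedesktop.login1.Session` D-Bus interface. The sketch below only shows the preparatory step of looking up the major/minor numbers of a character device node such as `/dev/dri/card0` or an `/dev/input/event*` node; the D-Bus calls themselves are left out, and `deviceNumbers` is a hypothetical helper name.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <sys/stat.h>
#include <sys/sysmacros.h>

// Returns {major, minor} of a character device node, which is what
// logind's TakeDevice() expects, or nullopt if the path is unusable.
std::optional<std::pair<unsigned, unsigned>> deviceNumbers(const std::string &path)
{
    struct stat st;
    if (stat(path.c_str(), &st) != 0) {
        return std::nullopt; // no such file or not accessible
    }
    if (!S_ISCHR(st.st_mode)) {
        return std::nullopt; // DRM and evdev nodes are character devices
    }
    return std::make_pair(static_cast<unsigned>(major(st.st_rdev)),
                          static_cast<unsigned>(minor(st.st_rdev)));
}
```

With the numbers in hand, the controller would call `TakeDevice` and receive a file descriptor it can use for DRM/GBM rendering or for libinput, without needing elevated privileges itself.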

Of course this would remove all interaction with the desktop environment. This is something which needs to be considered, like how to get Mumble to work in such a setup? Maybe the game could launch its own Wayland server?

That breaks Alt+Tab! Well, not really. For one thing, at least on X11 games often grab the keyboard, so Alt+Tab won’t work anyway. And of course one can still switch back to the running session with Ctrl+Alt+F1. In my opinion games should also have a common way to achieve this.

There are certainly a few things which still need to be solved to get there, like how to start such a sub-session and how to return to the running session without needing to unlock the screen. The experience should be as good as on a dedicated gaming console. So it’s still some work, but I think this is work well invested and better than trying to somehow make games work in desktop environments, as that just cannot work as well as a dedicated gaming console.

Looking at the security of Plasma/Wayland

Our beta release of Plasma 5.5 includes a Wayland session file which allows starting a Plasma on Wayland session directly from a login manager that supports Wayland sessions. This is a good point in time to look at the security of our Plasma on Wayland session.

Situation on X11

X11 is not secure and has severe conceptual issues like

  • any client connected to the X server (either remote or local) can read all input events
  • any client can get information about when another window rendered and get the content of the window
  • any client can change any X attribute of any other window
  • any window can position itself
  • many more issues

This can be used to create very interesting attacks. It’s one of the reasons why I for example think it’s a very bad idea to start the file manager as root on the same X server. I’m quite certain that if I wanted to I could exploit this relatively easily just through what X provides.

The insecurity of X11 also influenced the security design of applications running on X11. It’s pointless to think about preventing potential attacks if the same result can be achieved by just using core X11 functionality. For example, KWin’s scripting functionality allows interacting with X11 windows. In general one could say that’s dangerous, as it allows untrusted code to change aspects of the managed windows, but it’s nothing you could not get with plain X11.

Improvements on Wayland

Wayland removes the threats from the X11 world. The protocols are designed in a secure way. A client cannot in any way interact with windows from other clients. This implies:

  • reading of key events for other windows is not possible
  • getting window content of other windows is not possible

In addition, the protocols do not allow clients to, e.g., position their windows themselves, raise themselves, change the stacking order and so on.

Security removed in Plasma

But lots of these security restrictions do not work for us in Plasma. Our desktop shell needs to be able to get information about other windows (e.g. the Tasks applet), to mark a panel as a panel (above other windows) and to position windows itself.

Given that, we removed some of the security aspects again and introduced a few custom protocols to provide window management facilities and to allow Plasma windows to position themselves. At the moment there are no security restrictions on these yet, which makes this functionality available to all clients.

We will address this in a future release. There are multiple ways this could be addressed, e.g. using the Wayland security module library or using systemd in some way to restrict access. Overall I think it will require rethinking the security of a Linux user session in general; more on that later on.

Security added in Plasma compared to X11

The most important change on the security front of the desktop session is a secure screen locker. With Plasma 5.5 on Wayland we are able to provide that and address some long-standing issues from X11: the screen locks even if a context menu is open or anything else is grabbing input. The compositor knows that the screen is locked and knows which window is the screen locker. This is a huge change compared to X11, where the X server has no concept of a screen locker. Our compositor can now do the right thing when the screen is locked:

  • don’t render other windows
  • ensure input events are only handled in the lock screen
  • prevent access to screen grabbing functionality while screen is locked

As a matter of fact the Wayland protocol itself doesn’t know anything about screen locking either. This is now something we added directly to KWin and doesn’t need any additional custom Wayland interfaces.

How to break the security?

Now imagine you want to write a key logger in a Plasma/Wayland world. How would you do it? I asked myself this question recently, thought about it, found a possible solution and had a key logger in less than 10 minutes: ouch.

Of course there is no way to get a client to act as a key logger. The Wayland protocol is designed in a secure way and also our protocol additions do not weaken that. So the key to get a key logger is to attack KWin.

So what can an attacker do with KWin if he owns it? Well, pretty much anything. KWin internally has a very straightforward trust model: everything is trusted and everything can access anything. There is not much to do about that; this is kind of how binaries work.

For example, as a Qt application each loaded plugin has access to QCoreApplication::instance(). From there one could just use Qt’s meta object inspection to, e.g., get to the InputRedirection object and connect to the internal signal emitted on each key press:

```cpp
// Walk the application's children, find KWin's InputRedirection and hook its key signal.
void ExamplePlugin::installKeyLogger()
{
    const auto children = QCoreApplication::instance()->children();
    for (auto it = children.begin(); it != children.end(); ++it) {
        const QMetaObject *meta = (*it)->metaObject();
        if (qstrcmp(meta->className(), "KWin::InputRedirection") != 0) {
            continue; // not the object we are looking for
        }
        connect(*it, SIGNAL(keyStateChanged(quint32,InputRedirection::KeyboardKeyState)),
                this, SLOT(keyPressed(quint32)), Qt::DirectConnection);
    }
}

void ExamplePlugin::keyPressed(quint32 key)
{
    qDebug() << "!!!! Key: " << key;
}
```

But Martin, why don’t you just remove the signal? Why should any other part of KWin see the key events? Because this is just an example of the most trivial exploit. Of course it’s not the only one. If you have enough time and money, you could write more sophisticated ones. For example, look at this scenario:

KWin uses logind to open restricted files like the input event files or the DRM node. For this KWin registers as the session controller in logind. Now a binary plugin could just send a DBus call to logind to also open the input event files and read all events. Or open the DRM node and take over rendering from KWin. There is nothing logind could do about it: how should it be able to distinguish a valid from an invalid request coming from KWin?

How to secure again?

As we can see, the threat is in loading plugins. So all we need to do is ensure that KWin doesn’t load any plugins from untrusted locations (that is, from any user-owned locations). This is easy enough for QML plugins, where we have complete control. In fact it’s easy to ensure for any of KWin’s own plugins: we can restrict the location of all of them.
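A minimal sketch of such a location check, assuming a hypothetical list of trusted system directories that are already canonical: resolve the plugin path with `realpath()` so that `..` or symlinks cannot escape the directory, then require the file to be owned by root and not writable by group or others. `isTrustedPluginPath` is an illustrative name, not a KWin function.

```cpp
#include <cassert>
#include <climits>
#include <cstdlib>
#include <string>
#include <vector>
#include <sys/stat.h>

// Hypothetical policy check, not KWin code. trustedDirs are assumed to be
// canonical paths (no symlinks, no "..").
bool isTrustedPluginPath(const std::string &path, const std::vector<std::string> &trustedDirs)
{
    // Resolve "..", "." and symlinks first so a path cannot escape a trusted dir.
    char resolved[PATH_MAX];
    if (!realpath(path.c_str(), resolved)) {
        return false; // missing file or unresolvable path
    }
    const std::string canonical(resolved);

    bool insideTrustedDir = false;
    for (const std::string &dir : trustedDirs) {
        if (dir.empty()) {
            continue;
        }
        // Compare against "dir/" so that "/usr/lib" does not match "/usr/lib64".
        const std::string prefix = dir.back() == '/' ? dir : dir + '/';
        if (canonical.compare(0, prefix.size(), prefix) == 0) {
            insideTrustedDir = true;
            break;
        }
    }
    if (!insideTrustedDir) {
        return false;
    }

    // The file itself must be owned by root and not writable by group or others.
    struct stat st;
    if (stat(canonical.c_str(), &st) != 0) {
        return false;
    }
    return st.st_uid == 0 && (st.st_mode & (S_IWGRP | S_IWOTH)) == 0;
}
```

This only checks the plugin file itself, not its parent directories, and it does nothing about plugins that libraries pull in through environment variables, which is where the real problem lies.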

And even more: by default a system is set up in a way that no binary plugins are loaded from the user’s home. So, no problem after all? Well, unfortunately not. During session startup various scripts are sourced which can override the environment variables that influence the loading of plugins. This also allows using the well-known LD_PRELOAD hack. My naive approach to circumventing this issue didn’t work out at all, as I had to learn that scripts are already sourced by the session startup and the PAM interaction. So your session might be owned very early.

An approach to black list (unset) env variables is futile. There are too many libraries KWin relies on which in turn load plugins through custom env variables. Most obvious examples are Qt and Mesa. But there are probably many more. If we forget to unset one variable the protection is broken.

A different approach would be to white list some known secure env variables to be passed to KWin. But this also requires that at the point where we want to do the restriction the session is not already completely broken. This in turn means that neither PAM nor the session manager may load any variables into the session before starting the session startup. And that’s unfortunately outside what we can do in our session startup.
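The whitelist approach could look roughly like this: instead of trying to enumerate dangerous variables, copy only an explicit set of known variables into the environment handed to KWin. The allowlist and function names below are purely hypothetical, and as the paragraph above says, this only helps if nothing earlier in the login chain has already tampered with the session.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical allowlist; a real one would need careful review.
const std::set<std::string> &allowedVariables()
{
    static const std::set<std::string> allowed = {
        "HOME", "LANG", "USER", "XDG_RUNTIME_DIR", "XDG_SEAT", "XDG_SESSION_ID",
    };
    return allowed;
}

// Returns a copy of env that contains only allowlisted variables. Anything
// unknown, e.g. LD_PRELOAD, QT_PLUGIN_PATH or LIBGL_DRIVERS_PATH, is dropped
// without having to be named anywhere.
std::map<std::string, std::string> sanitizeEnvironment(const std::map<std::string, std::string> &env)
{
    std::map<std::string, std::string> result;
    for (const auto &entry : env) {
        if (allowedVariables().count(entry.first) != 0) {
            result.insert(entry);
        }
    }
    return result;
}
```

The advantage over blacklisting is that forgetting a variable makes the filter stricter rather than broken; the filtered map would then be turned into the environment array passed to `execve()` when launching KWin.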

So for Plasma 5.5 I think there is nothing we can do to get this secure, which is fine given that the Wayland session is still under development. For Plasma 5.6 we need to rethink the approach completely and that might involve changing the startup overall. We need to have a secure and controlled startup process. Only once KWin is started we can think about sourcing env variables from user locations.

So how big is the threat? By default it’s of course secure. Only if a malicious program is already running on the system is there a chance of installing a key logger in this way. If someone is able to exploit e.g. a browser in a way that it can store an env variable script in one of these locations, you are owned. Or if someone is able to get physical access to your unencrypted hard disk, there is a threat. There are easy workarounds for users: make all locations from which scripts are sourced during session startup non-writable and non-executable, ideally change their ownership to root, and encrypt your home location.

Overall it means that Plasma 5.5 on Wayland is not yet able to provide the security I would have liked to have, but it’s still a huge improvement over X11. And I’m quite certain that we will be able to solve this.

October Plasma on Wayland Update: all about geometry

Last month our Wayland efforts made a huge step forward. In KWin we are now at a state where I think the big underlying work is finished; we have reached the final stretch of the KWin Wayland porting. The system as a whole still needs a little bit more work though.

The big remaining task which I worked on last month was geometry handling, in simple terms: moving and resizing windows. Sounds relatively easy, but isn’t. Moving and resizing windows, or geometry handling in general, is one of the core aspects of a window manager. It’s where our expertise is, the code which makes KWin such a good window manager. Naturally we don’t want to throw that code out but rather reuse it in a Wayland world.

And that meant: refactoring. The big problem here was that the code in question is also highly X11-specific. Examples are the sync protocol for synced resizing, window gravity handling, and various X11-specific quirks for making X11 window moves smooth (e.g. only updating at the end of a move if compositing is active, direct updates if it is not).

The task now was to separate the X11-specific geometry handling from the generic geometry handling so that it can support both Wayland and X11 windows. With this in place, all the geometry handling now also works for Wayland windows. This includes move/resize through the Alt+F3 menu, but also when triggered from the window itself. During move/resize the quick tiling areas are of course triggered correctly, show the outline and snap to the area when released. Resizing is automatically synced to the speed the client supports, and in a much better way than on X11 (yay for Wayland and double-buffered state!). During move/resize the windows snap to screen borders and other windows.
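Snapping to screen borders and other windows can be reduced, per axis, to finding the closest candidate edge within a snap zone. A minimal sketch under that assumption: one axis only, a hypothetical function name, and not KWin’s actual implementation.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Given the proposed left edge of a moving window and its width, snap either
// of its edges to the closest candidate edge (screen borders or edges of other
// windows) if that edge is within snapZone pixels. Returns the new left edge.
int snapPosition(int proposedX, int width, const std::vector<int> &edges, int snapZone)
{
    int best = proposedX;
    int bestDistance = snapZone + 1; // anything farther than snapZone is ignored
    for (int edge : edges) {
        // Left edge of the window touching the candidate edge.
        int distance = std::abs(proposedX - edge);
        if (distance < bestDistance) {
            bestDistance = distance;
            best = edge;
        }
        // Right edge of the window touching the candidate edge.
        distance = std::abs(proposedX + width - edge);
        if (distance < bestDistance) {
            bestDistance = distance;
            best = edge - width;
        }
    }
    return best;
}
```

For example, an 800 pixel wide window dragged to x = 1115 on a 1920 pixel wide screen with a 10 pixel zone ends up at x = 1120, flush with the right border.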

Unfortunately our checks to ensure that the window titlebar is not moved outside the visible area don’t work, because of client-side decorations. Yay, everything is awesome! Have I mentioned that client-side decorations are a stupid idea because they break useful stuff? No, cannot be. Related to that: QtWayland enters move mode if you click the title bar to activate the window. Of course there should be a delay, which KWin implements for its decorations. After all, we have more than 15 years of experience in doing window decorations. Yay, client-side decos! Give control to the application! Let’s leave them in charge of window management, what could go wrong? Wohoo! Totally awesome! Broken window management! That’s the way to go! Of course that could be fixed in Qt, but that doesn’t fix the underlying problem. We saw this in Chromium years ago; we even had to adjust our quick tiling/maximization behavior because of it. Another fun fact: the QtWayland deco has a minimize button which does nothing, because minimizing is not supported in the wl_shell protocol. Have I mentioned that client-side decos are awesome?

All right, all right, I’ll end the sarcasm mode now. Our geometry handling has a few more very handy features like window packing: you can configure a shortcut to move the window left (or in another direction), and it will move up to the next window or the screen border. Similarly, we support growing or shrinking the window up to the next window. I was hardly aware of that feature before doing the refactoring, so I thought I should point it out. It can be used from scripting, so it should be useful for ideas like poor man’s tiling or resizing multiple quick-tiled windows at the same time. As a nice addition, this code is now covered by auto tests.
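Window packing can be sketched in a similar way: packing to the left means finding the rightmost edge of a window that lies entirely to the left of the moving one, or falling back to the screen border. This is a simplified, hypothetical one-axis version, assuming the caller has already filtered `others` down to windows that overlap the moving window vertically.

```cpp
#include <cassert>
#include <vector>

struct Span {
    int start; // left edge
    int end;   // right edge (exclusive)
};

// Returns the new left edge when "packing" a window to the left: it moves
// until it touches another window or the screen's left border.
int packLeft(const Span &window, const std::vector<Span> &others, int screenLeft)
{
    int target = screenLeft;
    for (const Span &other : others) {
        // Only windows entirely to the left of the moving one can stop it;
        // the closest such window is the one with the largest right edge.
        if (other.end <= window.start && other.end > target) {
            target = other.end;
        }
    }
    return target;
}
```

Growing a window up to its neighbour works the same way, except that only the window’s own edge moves while the opposite edge stays put.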

So with geometry handling in place it’s possible to do real testing and one of my systems (my notebook) migrated from X11 to Wayland. Actually I have been using Wayland on that system already for watching videos since April as it gives tear-free rendering. A clear plus when watching videos.

And this already gives us the outlook for the November update: I’ll focus on stabilizing the current state of Plasma/Wayland and fix bugs, fix bugs and fix bugs. My aim is to have a usable early-adopters version ready for the Plasma 5.5 release. It’s looking good, so I’m confident that we will reach that state.

Upgrading libhybris

One of the most important dependencies for our phone project is libhybris. Libhybris is a neat technology to allow interfacing with Android drivers allowing for example to bring Wayland to a device where all we have are Android drivers.

Given that, KWin provides a hwcomposer backend which uses libhybris to create an OpenGL context. All other applications need libhybris indirectly so that the Wayland OpenGL buffer exchange works automatically.

When we started the work on the hwcomposer backend we based it on the libhybris version used in Ubuntu (0.1.0+git20131207) as we used Ubuntu as the reference platform. Soon enough we noticed that this version diverged a lot from the upstream version. Lots of recent changes are missing and there are API incompatible changes.

This made working with it difficult. How much time should we invest in investigating issues? Should we write code which we know might break once Ubuntu decides to upgrade libhybris? How well is Wayland integrated in the Ubuntu version given that they don’t need it? If we need help, who to talk to? Ubuntu who will tell us that they don’t know anything about Wayland, or the libhybris devs who might just tell us: use later version?

Furthermore, we want other distributions to provide Plasma for the phone. This means they need to provide libhybris. Of course this is difficult if we need to tell them that we need exactly the version used by Ubuntu. What’s more, it might conflict with other uses: distributions like Mer would have to choose between a libhybris for Plasma and a libhybris for lipstick.

With that in mind we wanted to invest some time in upgrading libhybris in our stack in this release cycle and then fix the issues we were seeing. Our awesome packagers did the job of creating packages so that I could port KWin to it. And indeed, after some hacking I had KWin rendering again. A more difficult task was getting other applications to work, as we ran into the problem that libwayland-egl does not use the alternatives system. Thus our packagers needed to do some ld tricks to work around this. But with that we had a nicely rendering system.

A surprise in this exercise was that our input handling code in the hwcomposer backend didn’t compile any more. The code was gone. While that was in the first moment an unpleasant surprise, it soon turned into something wonderful. If that code is not needed at all on an Android powered device it means that we must be able to get libinput to work with it. 400 lines of code deleted and it’s using the shared input stack through libinput. I’m very happy about that!

With all that in place we were finally able to investigate the rendering issues we were seeing. My hope was that just upgrading libhybris would fix the visual tearing, but unfortunately it didn’t. While I’m still surprised that it’s possible to get tearing on Android devices in the first place (hey, ever heard of things like atomic mode setting, Android?), at least it gives us a vsync event. Unfortunately the only tear-free solution I could find involves blocking until we get the event. I don’t like that, and I think it’s a bad architecture. One can have blocking-free and tear-free rendering: our DRM (kernel mode setting) backend does so with an easy-to-use API. It’s really disappointing that the Android stack is in that regard not better than the GLX backend. But well, at least it’s tear free :-)
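A blocking-free, tear-free loop does not wait for vsync; it reacts to it. The sketch below models that as a small state machine: presenting a frame queues a flip and returns immediately, and a repaint requested while a flip is still pending is deferred until the flip-complete event arrives. `FrameScheduler` is hypothetical; on DRM the corresponding real building block would be a page flip requested with an event (libdrm’s `drmModePageFlip` with `DRM_MODE_PAGE_FLIP_EVENT`), which is not shown here.

```cpp
#include <cassert>

// Hypothetical model of event-driven presentation. Nothing in here blocks:
// the event loop calls onFlipComplete() when the display reports the flip.
class FrameScheduler {
public:
    // Called when something changed and the scene needs to be repainted.
    void scheduleRepaint() {
        m_repaintRequested = true;
        maybeRender();
    }
    // Called from the event loop once the previously queued flip is done.
    void onFlipComplete() {
        m_flipPending = false;
        maybeRender();
    }
    int framesRendered() const { return m_framesRendered; }
    bool flipPending() const { return m_flipPending; }

private:
    void maybeRender() {
        if (!m_repaintRequested || m_flipPending) {
            return; // nothing to do, or the previous frame is still on its way
        }
        m_repaintRequested = false;
        ++m_framesRendered;   // render into the back buffer here
        m_flipPending = true; // queue the flip and return without waiting
    }

    bool m_repaintRequested = false;
    bool m_flipPending = false;
    int m_framesRendered = 0;
};
```

Because a new frame is only started after the previous flip completed, the display never scans out a half-updated buffer, yet the compositor’s event loop stays free to handle input in the meantime.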

As we now use upstream libhybris I hope to see distributions to pick up the work and provide a Plasma phone spin. I’d love to see an openSUSE phone or a Fedora phone (or any other distribution). Distributions: you can of course ask us on how to integrate :-)

The return of kwin_gles

Back in 4.x we provided two binaries for KWin: one compiled against OpenGL (kwin) and one compiled against OpenGL ES (kwin_gles). The reason for that is that one can only reasonably link either OpenGL or OpenGL ES and OpenGL ES is only a subset of OpenGL, so one needs to hide the OpenGL calls (especially the OpenGL 1 calls).

With 5.x we were no longer able to provide these two binaries. The reason for that is that OpenGL got “upgraded” in Qt and QtGui itself links either OpenGL or OpenGL ES. To keep the system’s sanity we decided to follow how Qt is compiled. If Qt is compiled with OpenGL support KWin gets compiled with OpenGL support, otherwise with OpenGL ES support.

That’s of course a reasonable design, but it means that it becomes difficult to test the OpenGL ES code paths. One needs a dedicated Qt and all other dependencies compiled against OpenGL ES. Or one needs a nice device like a Nexus 5 with Plasma mobile. This had resulted in breakage already as we were not able to test enough. Such times belong to the past as I have a nice Plasma mobile device to compile and test my KWin on.

But since we introduced OpenGL ES support through a compile time switch, many things have changed. KWin dropped the OpenGL 1 support which means that most of the code which wouldn’t compile with OpenGL ES is just gone. Furthermore we switched to use libepoxy, so we don’t link OpenGL or OpenGL ES at all, but libepoxy which does the right thing for us. With that we are able to remove all the compile time checks. Of course we need runtime checks to ensure that we don’t call functionality which isn’t available on OpenGL ES.
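One ingredient for such runtime checks is knowing whether the current context is OpenGL ES and which version it provides. The OpenGL ES specification requires the `GL_VERSION` string to start with `OpenGL ES`, as in the support information output further down (`OpenGL ES 3.0 Mesa 10.6.8`), while desktop OpenGL starts directly with the version number. A minimal parsing sketch with hypothetical names; in real code the string would come from `glGetString(GL_VERSION)`.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

struct GLVersionInfo {
    bool isGLES = false;
    int majorVersion = 0;
    int minorVersion = 0;
};

// Parses a GL_VERSION string: "OpenGL ES <major>.<minor> <vendor info>" for
// OpenGL ES, "<major>.<minor>[.<release>] <vendor info>" for desktop OpenGL.
GLVersionInfo parseGLVersion(const std::string &version)
{
    static const std::string esPrefix = "OpenGL ES ";
    GLVersionInfo info;
    std::string numbers = version;
    if (version.compare(0, esPrefix.size(), esPrefix) == 0) {
        info.isGLES = true;
        numbers = version.substr(esPrefix.size());
    }
    if (std::sscanf(numbers.c_str(), "%d.%d", &info.majorVersion, &info.minorVersion) != 2) {
        info.majorVersion = info.minorVersion = 0; // unrecognized format
    }
    return info;
}

// True if the context provides at least the requested version.
bool hasVersion(const GLVersionInfo &info, int wantMajor, int wantMinor)
{
    return info.majorVersion > wantMajor
        || (info.majorVersion == wantMajor && info.minorVersion >= wantMinor);
}
```

A caller would check `isGLES` together with the version before taking a code path, since OpenGL ES 3.0 and desktop OpenGL 3.0 do not expose the same functionality.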

With the upcoming 5.5 release we have one binary which serves both OpenGL and OpenGL ES. Note to distributions: the artifact kwinglesutils.so is no longer built, please adjust your packaging rules. KWin will use either OpenGL or OpenGL ES depending on what Qt uses.

Given that nowadays it’s also possible to create either an OpenGL or an OpenGL ES context at runtime, we made use of that and introduced a new value for our KWIN_COMPOSE environment variable: O2ES. If it is specified, KWin will use the EGL backend and create an OpenGL ES context. Although it is in general also possible to create an OpenGL ES context through GLX, we do not support that, for simplicity. As proof, here is the debug output (qdbus org.kde.KWin /KWin supportInformation) from my running KWin instance:

Compositing is active
Compositing Type: OpenGL ES 2.0
OpenGL vendor string: Intel Open Source Technology Center
OpenGL renderer string: Mesa DRI Intel(R) Ivybridge Desktop 
OpenGL version string: OpenGL ES 3.0 Mesa 10.6.8
OpenGL platform interface: EGL
OpenGL shading language version string: OpenGL ES GLSL ES 3.00
Driver: Intel
GPU class: IvyBridge
OpenGL version: 3.0
GLSL version: 3.0
Mesa version: 10.6.8
X server version: 1.17.3
Linux kernel version: 4.2
Direct rendering: Requires strict binding: no
GLSL shaders:  yes
Texture NPOT support:  yes
Virtual Machine:  no
OpenGL 2 Shaders are used
Painting blocks for vertical retrace:  yes

So in a way kwin_gles is back. It’s different in that it’s no longer a dedicated binary but runtime-switchable. For the moment the only way to switch will be the environment variable. I’m reluctant to add a config option as that sounds like quite a potential source of breakage.
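The selection logic itself amounts to very little. The following sketch is hypothetical (the enum and function names are mine, not KWin’s) and only models what is described here: O2ES selects the EGL backend with an OpenGL ES context, and anything else keeps the default of following whatever Qt uses. KWIN_COMPOSE has other values too, which this sketch deliberately lumps into the default.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical summary of the context choice made at startup.
enum class ContextType {
    FollowQt,    // default: OpenGL or OpenGL ES, whatever Qt was built with
    EglOpenGLES  // EGL backend with an OpenGL ES context (GLX is not used)
};

// Only O2ES is modelled; every other value, or an unset variable,
// falls back to the default in this sketch.
ContextType contextTypeFromComposeValue(const char* value) {
    if (value != nullptr && std::string(value) == "O2ES") {
        return ContextType::EglOpenGLES;
    }
    return ContextType::FollowQt;
}

ContextType contextTypeFromEnvironment() {
    return contextTypeFromComposeValue(std::getenv("KWIN_COMPOSE"));
}
```

Keeping the decision in one small function is also what makes an environment variable cheap to support compared to a config option exposed in the UI.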

Looking at some crashers fixed this week

Some of the feedback we got is that we should blog more about how we improve quality and what kinds of bugs we fix. So today I want to do that with an in-depth explanation of four crash reports I looked at and fixed this week. All of them will be part of the upcoming 5.4.3 release. All of them were related to QtQuick: three were caused by a problem in QtQuick and one by a workaround for a QtQuick problem. As I explained in my Monday blog post, we are getting hit by issues in the libraries we use, in this case QtQuick.

Closing glxgears crashes KWin

The first issue was communicated to me through IRC. After a small investigation together with the user we figured out the conditions for the crash and how to reproduce it:
1. Use an Aurorae window decoration theme (e.g. Plastik)
2. Open glxgears
3. Close glxgears through the close button

This got reported as Bug 346857 and was nothing new to us: we have had similar problems before, which makes it a little bit sad. The problem here is that glxgears doesn’t speak the close window protocol. Normally when you click the close button the window isn’t closed directly; instead the application gets notified “please close your window”. So the mechanism is asynchronous. This also explains why the issue was not detected during testing: it doesn’t happen with the default decoration and it doesn’t happen with “normal” applications. It can only be reproduced with applications which do not behave correctly.
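For readers not familiar with X11, the two close paths can be sketched like this. WM_DELETE_WINDOW is the real ICCCM protocol a client advertises in its WM_PROTOCOLS property; the types and the function below are hypothetical stand-ins for the actual Xlib calls (sending a ClientMessage, or killing the client, e.g. via XKillClient).

```cpp
#include <cassert>

// Hypothetical view of a managed X11 client.
struct ClientWindow {
    // True if the client lists WM_DELETE_WINDOW in its WM_PROTOCOLS property.
    bool supportsDeleteWindow = false;
};

enum class CloseAction {
    SentDeleteRequest,  // asynchronous: the client closes in its own time
    KilledClient        // synchronous: the window is gone right away
};

// What the window manager does when the close button is clicked.
CloseAction closeWindow(const ClientWindow& w) {
    if (w.supportsDeleteWindow) {
        // Send a ClientMessage "please close your window"; the client
        // unmaps and destroys its window later from its own event loop.
        return CloseAction::SentDeleteRequest;
    }
    // Clients such as glxgears don't speak the protocol, so the window
    // manager has to kill them immediately, i.e. while it is still
    // handling the click on the decoration's close button.
    return CloseAction::KilledClient;
}
```

Only the second branch tears the window (and with it the decoration) down while the click is still being processed.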

So what’s the difference in this case? KWin handles the destruction synchronously, which causes the decoration to be destroyed as a direct result of the mouse click. When the decoration is destroyed, the QtQuick scene driving the decoration is destroyed as well, and it looks like Qt doesn’t like that:

#0  0x00007ffff50fd107 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff50fe4e8 in __GI_abort () at abort.c:89
#2  0x00007ffff5de8291 in qt_message_fatal (context=..., message=...) at global/qlogging.cpp:1578
#3  0x00007ffff5de495c in QMessageLogger::fatal (this=0x7fffffff8ca0, msg=0x7ffff6104270 "ASSERT: \"%s\" in file %s, line %d") at global/qlogging.cpp:781
#4  0x00007ffff5dddb90 in qt_assert (assertion=0x7ffff67df129 "context() && engine()", file=0x7ffff67df0e4 "qml/qqmlboundsignal.cpp", line=183) at global/qglobal.cpp:2966
#5  0x00007ffff6667fe7 in QQmlBoundSignalExpression::function (this=0xf56080) at qml/qqmlboundsignal.cpp:183
#6  0x00007ffff6667d4c in QQmlBoundSignalExpression::sourceLocation (this=0xf56080) at qml/qqmlboundsignal.cpp:155
#7  0x00007ffff663f1fb in QQmlData::destroyed (this=0x873d90, object=0xf54eb0) at qml/qqmlengine.cpp:1709
#8  0x00007ffff663cf6b in QQmlData::destroyed (d=0x873d90, o=0xf54eb0) at qml/qqmlengine.cpp:674
#9  0x00007ffff605c6e9 in QObject::~QObject (this=0xf54eb0, __in_chrg=<optimized out>) at kernel/qobject.cpp:912
#10 0x00007ffff7265df9 in QQuickItem::~QQuickItem (this=0xf54eb0, __in_chrg=<optimized out>) at items/qquickitem.cpp:2224
#11 0x00007ffff7324866 in QQuickMouseArea::~QQuickMouseArea (this=0xf54eb0, __in_chrg=<optimized out>) at items/qquickmousearea.cpp:439
#12 0x00007ffff72d69c5 in QQmlPrivate::QQmlElement<QQuickMouseArea>::~QQmlElement (this=0xf54eb0, __in_chrg=<optimized out>) at ../../include/QtQml/../../src/qml/qml/qqmlprivate.h:98
#13 0x00007ffff72d69fa in QQmlPrivate::QQmlElement<QQuickMouseArea>::~QQmlElement (this=0xf54eb0, __in_chrg=<optimized out>) at ../../include/QtQml/../../src/qml/qml/qqmlprivate.h:98
#14 0x00007ffff605e516 in QObjectPrivate::deleteChildren (this=0xf54c90) at kernel/qobject.cpp:1946
#15 0x00007ffff605cb80 in QObject::~QObject (this=0xf56050, __in_chrg=<optimized out>) at kernel/qobject.cpp:1024
#16 0x00007ffff7265df9 in QQuickItem::~QQuickItem (this=0xf56050, __in_chrg=<optimized out>) at items/qquickitem.cpp:2224
#17 0x00007ffff72c3e22 in QQuickRectangle::~QQuickRectangle (this=0xf56050, __in_chrg=<optimized out>) at items/qquickrectangle_p.h:128
#18 0x00007ffff72d6357 in QQmlPrivate::QQmlElement<QQuickRectangle>::~QQmlElement (this=0xf56050, __in_chrg=<optimized out>) at ../../include/QtQml/../../src/qml/qml/qqmlprivate.h:98
#19 0x00007ffff72d638c in QQmlPrivate::QQmlElement<QQuickRectangle>::~QQmlElement (this=0xf56050, __in_chrg=<optimized out>) at ../../include/QtQml/../../src/qml/qml/qqmlprivate.h:98
#20 0x00007ffff736a976 in QQuickView::~QQuickView (this=0x6b1980, __in_chrg=<optimized out>) at items/qquickview.cpp:225
#21 0x00007ffff736a9d2 in QQuickView::~QQuickView (this=0x6b1980, __in_chrg=<optimized out>) at items/qquickview.cpp:227

To make sure this never happens again I created a test application and reported a bug against Qt. And of course we worked around the problem by delaying the handling of the close to the next event cycle. Now you might wonder how we allowed such a regression to sneak in again, given that we had seen it before. That’s easily explained: up until recently we were not able to run full tests against KWin. We might have been able to unit test this area, but such a test would have passed; this issue needed an integration test. And that’s what I added now, so that we won’t hit it ever again. It’s probably the weirdest auto test I have ever written: it starts glxgears (thanks to our sysadmins for installing it for this test case!), simulates the mouse click on the close button and ensures glxgears closes without crashing KWin.
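The workaround follows a common pattern: never destroy the object whose event handler is still on the stack, but post the work to the event loop instead (in Qt one would typically reach for QTimer::singleShot(0, …), a queued invocation or deleteLater()). Here is a self-contained C++ sketch with a toy event loop; ManagedWindow, Decoration and EventLoop are hypothetical and only model the ordering, not KWin’s real classes.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Toy event loop: tasks posted now run in the next processPending() call,
// i.e. after the currently handled event has returned.
class EventLoop {
public:
    void post(std::function<void()> task) { pending_.push_back(std::move(task)); }

    void processPending() {
        std::deque<std::function<void()>> batch;
        batch.swap(pending_);  // tasks posted while running go to the next cycle
        for (auto& task : batch) task();
    }

private:
    std::deque<std::function<void()>> pending_;
};

// Hypothetical decoration; handlingClick marks "my handler is on the stack".
struct Decoration {
    bool handlingClick = false;
};

struct ManagedWindow {
    std::unique_ptr<Decoration> decoration = std::make_unique<Decoration>();
    EventLoop* loop = nullptr;

    // Called from inside the decoration's mouse handler.
    void onCloseButtonClicked() {
        decoration->handlingClick = true;
        // Closing synchronously here would destroy the decoration (and its
        // QtQuick scene) while we are still inside its handler. Defer it.
        loop->post([this] { closeWindow(); });
        decoration->handlingClick = false;  // safe: nothing was destroyed yet
    }

    void closeWindow() {
        // By now the click handler has returned.
        assert(!decoration || !decoration->handlingClick);
        decoration.reset();
    }
};
```

The cost is that the window lingers for one more event cycle, which is invisible to the user but keeps the decoration alive until its handler has returned.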

Crash when opening window decorations configuration module

I ran into the second crash directly after investigating the first one. I wanted to switch back to the Breeze decoration but couldn’t, because the config module crashed directly. This was bug 344278, a really nasty one with 74 duplicates. We had wonderful crash traces, but were missing the important part: how to reproduce it. From the crash traces alone we were not able to figure out what was going on. We saw some aspects, but it wasn’t enough to properly investigate.

Now at last I had a 100% reliable way to reproduce the problem and also understood what’s going on. So here are the actual steps to reproduce:
1. Open the window decoration configuration menu
2. Download many themes, the more the better
3. Select a theme whose name’s first letter is far away from “B”, e.g. Plastik
4. Apply and quit
5. Open the window decoration configuration menu again

And boom! What happens is that we load all decorations and put them into a ListView, then select the theme the user is currently using and ensure it’s centered in the list view. This makes the list view scroll. ListView keeps only a limited cache of delegate items, and if there are too many it throws out the ones far away from the visible area. So our Breeze decoration, which gets created at the start (it is early in the alphabet and thus initially shown), is kicked out again when Plastik is selected, provided there are enough decorations. That is why one needs many themes installed and a selected theme far down in the alphabet. Under any other conditions you won’t trigger it.
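The eviction condition can be modelled in a few lines. This is a rough, hypothetical model, not ListView’s implementation: it keeps delegates for the visible rows plus a fixed margin of rows (ListView’s real cacheBuffer property is specified in pixels) and treats everything outside that range as destroyed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Rough model of a list view that only keeps delegate items for the
// visible rows plus a cache margin; everything else is destroyed.
class DelegateWindow {
public:
    DelegateWindow(std::size_t rows, std::size_t visibleRows, std::size_t margin)
        : rows_(rows), visible_(visibleRows), margin_(margin) {}

    // Scroll so that `row` ends up roughly centered.
    void centerOn(std::size_t row) {
        std::size_t first = row > visible_ / 2 ? row - visible_ / 2 : 0;
        std::size_t maxFirst = rows_ > visible_ ? rows_ - visible_ : 0;
        first_ = std::min(first, maxFirst);
    }

    // Is the delegate for `row` still alive (visible or within the cache)?
    bool isAlive(std::size_t row) const {
        std::size_t lo = first_ > margin_ ? first_ - margin_ : 0;
        std::size_t hi = first_ + visible_ + margin_;  // exclusive bound
        return row < rows_ && row >= lo && row < hi;
    }

private:
    std::size_t rows_;
    std::size_t visible_;
    std::size_t margin_;
    std::size_t first_ = 0;  // first visible row; the view starts at the top
};
```

With few rows the view can barely scroll, so an early entry such as Breeze always stays inside the kept range; only a long list combined with centering a late entry pushes it out.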

The bug itself got triggered through the Breeze decoration because a property animation is still running, which triggers another update on the decoration after it has been kicked out. That update accesses a member which got destroyed by the QtQuick engine.

All of that also means that the crash would not have been triggered if that code had been in, e.g., Oxygen instead of Breeze. Also, just scrolling the list after it has loaded won’t trigger the crash, even with many themes installed, because by then the animation is no longer running; it only runs right after the preview has loaded. What I want to show here is that this is an extremely well-hidden corner case: you need exactly the same conditions to trigger it. Given that, I’m also surprised that we have so many duplicates for the report.

So how to fix it? The obvious idea: disable the animation in Breeze, which isn’t visible anyway. But that’s not a fix, just a workaround which doesn’t solve the actual problem; other decorations might trigger it again. So we need to understand what’s going on. What we knew is that the Decoration triggers an update and crashes because the DecorationBridge is no longer valid. But that’s impossible! The contract of the KDecoration API is that the DecorationBridge will always be valid as long as there’s a Decoration. So how could the contract break?

For this we need to look into how the QtQuick code for rendering the previews works. Each Decoration is wrapped by a PreviewItem, and each Decoration gets its own DecorationBridge provided by a PreviewBridge. The PreviewBridge is constructed directly by QtQuick; the Decoration is loaded later on. What is interesting now is the teardown. When QtQuick destroys the PreviewItem, the item doesn’t delete the Decoration directly but delays that to the next event cycle, so the Decoration outlives the QtQuick items. The PreviewBridge, however, is destroyed by QtQuick together with the PreviewItem, and thus we end up with a Decoration which is still alive while its PreviewBridge isn’t. Why was it done that way? To prevent crashes caused by QtQuick. So our workaround for one crash introduced a new one. Meh.

The solution now is to also delay the destruction of the PreviewBridge to the next event cycle. I hope this does not create yet another crash.
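The idea of the fix can be sketched as follows: both objects are handed to the event loop, and because the Decoration is posted first, it is destroyed before its bridge within the same cycle. This is a hypothetical, self-contained model (a toy event loop that runs tasks in posting order, made-up Bridge and Decoration types), not the actual KDecoration or preview code; the assert in the bridge’s destructor encodes the contract that no decoration may outlive its bridge.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Toy event loop: posted tasks run, in posting order, in the next cycle.
class EventLoop {
public:
    void post(std::function<void()> task) { pending_.push_back(std::move(task)); }
    void processPending() {
        std::deque<std::function<void()>> batch;
        batch.swap(pending_);
        for (auto& task : batch) task();
    }

private:
    std::deque<std::function<void()>> pending_;
};

int g_bridgesAlive = 0;  // for illustration and testing only

// Hypothetical stand-in for the PreviewBridge.
struct Bridge {
    int decorations = 0;
    int updates = 0;
    Bridge() { ++g_bridgesAlive; }
    ~Bridge() {
        // Contract: no decoration may outlive its bridge.
        assert(decorations == 0);
        --g_bridgesAlive;
    }
};

// Hypothetical stand-in for a Decoration; it uses the bridge until it dies.
struct Decoration {
    Bridge* bridge;
    explicit Decoration(Bridge* b) : bridge(b) { ++bridge->decorations; }
    ~Decoration() { --bridge->decorations; }
    void update() { ++bridge->updates; }  // e.g. from a still-running animation
};

// Teardown of the preview: both deletions are deferred to the next cycle.
void tearDownPreview(EventLoop& loop, std::unique_ptr<Decoration> deco,
                     std::unique_ptr<Bridge> bridge) {
    // Raw pointers keep the lambdas copyable, as std::function requires.
    loop.post([d = deco.release()] { delete d; });
    loop.post([b = bridge.release()] { delete b; });
}
```

An animation tick that arrives after the QtQuick item is gone but before the deferred deletions run now finds a valid bridge, which is exactly the case that crashed before.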

Crash when exiting Screens configuration module

The third issue I looked into was also reported to me in response to my Monday blog post. This one was supposed to be fixed in Qt 5.5, but wasn’t. It has a clear way to reproduce it:
1. Open Systemsettings
2. Go to “Display Configuration”
3. Click “All Settings” to go back to overview

According to our users, that should crash it. Only it didn’t for me. That raised my interest, as did the fact that it has 73 duplicates, which makes it rather important. I know the user who reported this to me and know that I can fully trust what he tells me. The report also has a very interesting crash trace:

#0 0x00007ffff2e48d59 in QQuickItemPrivate::addToDirtyList (this=0xdbdcc0) at /mnt/AUR/qt5-declarative-git/src/qt5-declarative/src/quick/items/qquickitem.cpp:5610
#1 0x00007ffff2e48e43 in QQuickItemPrivate::dirty (this=0xdbdcc0, type=<optimized out>) at /mnt/AUR/qt5-declarative-git/src/qt5-declarative/src/quick/items/qquickitem.cpp:5594
#2 0x00007ffff2e496cd in QQuickItem::update (this=0xdbdc40) at /mnt/AUR/qt5-declarative-git/src/qt5-declarative/src/quick/items/qquickitem.cpp:4088
#3 0x00007ffff2e56c0d in QQuickItem::qt_static_metacall (_o=<optimized out>, _c=<optimized out>, _id=<optimized out>, _a=<optimized out>) at .moc/moc_qquickitem.cpp:597
#4 0x00007ffff45f2ae1 in QObject::event (this=this@entry=0xdbdc40, e=e@entry=0x7fffc41f80b0) at kernel/qobject.cpp:1239
#5 0x00007ffff2e53a63 in QQuickItem::event (this=0xdbdc40, ev=0x7fffc41f80b0) at /mnt/AUR/qt5-declarative-git/src/qt5-declarative/src/quick/items/qquickitem.cpp:7294
#6 0x00007ffff6087d94 in QApplicationPrivate::notify_helper (this=this@entry=0x681dd0, receiver=receiver@entry=0xdbdc40, e=e@entry=0x7fffc41f80b0) at kernel/qapplication.cpp:3717
#7 0x00007ffff608d2c8 in QApplication::notify (this=0x7fffffffe4a0, receiver=0xdbdc40, e=0x7fffc41f80b0) at kernel/qapplication.cpp:3500
#8 0x00007ffff45c49dc in QCoreApplication::notifyInternal (this=0x7fffffffe4a0, receiver=0xdbdc40, event=event@entry=0x7fffc41f80b0) at kernel/qcoreapplication.cpp:965
#9 0x00007ffff45c7dea in sendEvent (event=0x7fffc41f80b0, receiver=<optimized out>) at ../../include/QtCore/../../src/corelib/kernel/qcoreapplication.h:224
#10 QCoreApplicationPrivate::sendPostedEvents (receiver=receiver@entry=0x0, event_type=event_type@entry=0, data=0x681430) at kernel/qcoreapplication.cpp:1593
#11 0x00007ffff45c8230 in QCoreApplication::sendPostedEvents (receiver=receiver@entry=0x0, event_type=event_type@entry=0) at kernel/qcoreapplication.cpp:1451
#12 0x00007ffff4617f63 in postEventSourceDispatch (s=0x6d7aa0) at kernel/qeventdispatcher_glib.cpp:271
#13 0x00007fffefcce9fd in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
#14 0x00007fffefccece0 in ?? () from /usr/lib/libglib-2.0.so.0
#15 0x00007fffefcced8c in g_main_context_iteration () from /usr/lib/libglib-2.0.so.0
#16 0x00007ffff4617fd7 in QEventDispatcherGlib::processEvents (this=0x6d5850, flags=...) at kernel/qeventdispatcher_glib.cpp:418
#17 0x00007ffff45c339a in QEventLoop::exec (this=this@entry=0x7fffffffe380, flags=..., flags@entry=...) at kernel/qeventloop.cpp:204
#18 0x00007ffff45cb23c in QCoreApplication::exec () at kernel/qcoreapplication.cpp:1229
#19 0x00007ffff592cbf4 in QGuiApplication::exec () at kernel/qguiapplication.cpp:1528
#20 0x00007ffff6084bb5 in QApplication::exec () at kernel/qapplication.cpp:2977
#21 0x000000000040f52b in main (argc=1, argv=<optimized out>) at /mnt/AUR/systemsettings-git/src/systemsettings/app/main.cpp:55

The interesting part here is that it’s completely inside Qt. We come from the event loop and an event is handled inside Qt and crashes. No code of the control module is executed in this trace any more.

But why was I not able to reproduce it? After all, so many users are hitting it, and I have the same Qt version, so it should crash. That I figured it out was pure chance. I remembered that the user has an NVIDIA system and verified that this is still the case. I have an Intel system. So why is that important? With NVIDIA, QtQuick uses threaded rendering, while with Mesa it renders in the main GUI thread. I had seen in the past that with threaded rendering, destruction might be moved to the next event cycle. So, an easy thing to test:
QSG_RENDER_LOOP=threaded systemsettings5
Then follow the reproduction steps and boom! Yay, I have a test case. Following the old saying “consider a bug fixed when a developer is able to reproduce it”, the hardest part was solved.

So I started investigating. Let’s try with kcmshell5 instead of systemsettings. Hmm, doesn’t crash. Let’s try with kscreen’s test application instead of systemsettings. Hmm, doesn’t crash either. So something in systemsettings must trigger it! I started reading code, and read and read, tried something here, tried something there, and came to the conclusion: systemsettings is not doing anything wrong.

Given that, it had to be the QtQuick code of the screen configuration. After some trials I had a minimally modified version of the QtQuick code which didn’t crash any more. Here previous experience with QtQuick-related crashes helped again: I saw some usage of QtGraphicalEffects and remembered that it used to crash KWin with the Breeze Aurorae theme prior to Plasma 5.2. With the OpacityMask removed, it didn’t crash. Yay! So how to turn that into a fix? The solution was actually in QtQuick’s debug output:
QSGDefaultLayer::bind: ShaderEffectSource: ‘recursive’ must be set to true when rendering recursively.

So obviously I set the missing recursive property on the OpacityMask. Hmm, compile error: that property doesn’t exist. So let’s make it non-recursive instead, and there we go, it doesn’t crash any more.

I would love to report this to Qt, but so far I have failed to create a simple test case. Just like in my tests with kcmshell5 and the kscreen test application, I’m not able to hit the condition, and I haven’t figured out yet what is different in systemsettings.

Opening effects configuration twice crashes

The last issue I looked at came up through the previous one. During review it was pointed out that there are more QtQuick configuration modules which crash in similar ways. So I tried them all and hit a crash with the following steps:
1. Open Systemsettings
2. Go to Desktop Behavior
3. Click Desktop Effects
4. Click All Settings
5. Repeat steps 2 and 3

Again a very interesting back trace:

#0 0x00007ffff28e7107 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007ffff28e84e8 in __GI_abort () at abort.c:89
#2 0x00007ffff35d2291 in qt_message_fatal (context=..., message=...) at global/qlogging.cpp:1578
#3 0x00007ffff35ce95c in QMessageLogger::fatal (this=0x7fffffff3920, msg=0x7ffff38ee270 "ASSERT: \"%s\" in file %s, line %d") at global/qlogging.cpp:781
#4 0x00007ffff35c7b90 in qt_assert (assertion=0x7ffff1f9276a "value.isString()", file=0x7ffff1f92710 "jsruntime/qv4runtime.cpp", line=439) at global/qglobal.cpp:2966
#5 0x00007ffff1ddb5eb in QV4::RuntimeHelpers::convertToObject (engine=0x139a400, value=...) at jsruntime/qv4runtime.cpp:439
#6 0x00007ffff1ddcc6e in QV4::Runtime::getProperty (engine=0x139a400, object=..., nameIndex=136) at jsruntime/qv4runtime.cpp:682
#7 0x00007ffff1dca362 in QV4::Moth::VME::run (this=0x7fffffff41d7, engine=0x139a400, code=0x7fffcc15f830 "\357\230\334\361\377\177", storeJumpTable=0x0) at jsruntime/qv4vme_moth.cpp:487
#8 0x00007ffff1dce656 in QV4::Moth::VME::exec (engine=0x139a400, code=0x7fffcc15f6c8 "\366\251\334\361\377\177") at jsruntime/qv4vme_moth.cpp:925
#9 0x00007ffff1d58eb3 in QV4::SimpleScriptFunction::call (that=0x7fffd3424010, callData=0x7fffd3424018) at jsruntime/qv4functionobject.cpp:564
#10 0x00007ffff1c93e14 in QV4::Object::call (this=0x7fffd3424010, d=0x7fffd3424018) at ../../include/QtQml/5.5.1/QtQml/private/../../../../../src/qml/jsruntime/qv4object_p.h:305
#11 0x00007ffff1e93cac in QQmlJavaScriptExpression::evaluate (this=0x2b7a690, context=0x1392ad0, function=..., callData=0x7fffd3424018, isUndefined=0x7fffffff4553) at qml/qqmljavascriptexpression.cpp:158
#12 0x00007ffff1e939a5 in QQmlJavaScriptExpression::evaluate (this=0x2b7a690, context=0x1392ad0, function=..., isUndefined=0x7fffffff4553) at qml/qqmljavascriptexpression.cpp:116
#13 0x00007ffff1e9c484 in QQmlBinding::update (this=0x2b7a670, flags=...) at qml/qqmlbinding.cpp:194
#14 0x00007ffff1e9cfac in QQmlBinding::update (this=0x2b7a670) at qml/qqmlbinding_p.h:97
#15 0x00007ffff1e9cab2 in QQmlBinding::expressionChanged (e=0x2b7a690) at qml/qqmlbinding.cpp:260
#16 0x00007ffff1e94d67 in QQmlJavaScriptExpressionGuard_callback (e=0x15ddbb0) at qml/qqmljavascriptexpression.cpp:361
#17 0x00007ffff1e72d0f in QQmlNotifier::emitNotify (endpoint=0x0, a=0x0) at qml/qqmlnotifier.cpp:94
#18 0x00007ffff1dfb3f6 in QQmlData::signalEmitted (object=0x33914f0, index=30, a=0x0) at qml/qqmlengine.cpp:763
#19 0x00007ffff384d31e in QMetaObject::activate (sender=0x33914f0, signalOffset=29, local_signal_index=1, argv=0x0) at kernel/qobject.cpp:3599
#20 0x00007ffff1df7906 in QQmlVMEMetaObject::activate (this=0x3391720, object=0x33914f0, index=44, args=0x0) at qml/qqmlvmemetaobject.cpp:1325
#21 0x00007ffff1df5436 in QQmlVMEMetaObject::metaCall (this=0x3391720, c=QMetaObject::WriteProperty, _id=42, a=0x7fffffff6860) at qml/qqmlvmemetaobject.cpp:841
#22 0x00007ffff1bb7432 in QAbstractDynamicMetaObject::metaCall (this=0x3391720, c=QMetaObject::WriteProperty, _id=42, a=0x7fffffff6860) at /home/martin/src/qt5/qtbase/include/QtCore/5.5.1/QtCore/private/../../../../../src/corelib/kernel/qobject_p.h:421
#23 0x00007ffff1df5e91 in QQmlVMEMetaObject::metaCall (this=0x2ca1900, c=QMetaObject::WriteProperty, _id=42, a=0x7fffffff6860) at qml/qqmlvmemetaobject.cpp:969
#24 0x00007ffff1bb7432 in QAbstractDynamicMetaObject::metaCall (this=0x2ca1900, c=QMetaObject::WriteProperty, _id=42, a=0x7fffffff6860) at /home/martin/src/qt5/qtbase/include/QtCore/5.5.1/QtCore/private/../../../../../src/corelib/kernel/qobject_p.h:421
#25 0x00007ffff3817ce1 in QMetaObject::metacall (object=0x33914f0, cl=QMetaObject::WriteProperty, idx=42, argv=0x7fffffff6860) at kernel/qmetaobject.cpp:294
#26 0x00007ffff1e14605 in QQmlPropertyPrivate::write (object=0x33914f0, property=..., value=..., context=0x3391390, flags=...) at qml/qqmlproperty.cpp:1308
#27 0x00007ffff1e13f47 in QQmlPropertyPrivate::writeValueProperty (object=0x33914f0, core=..., value=..., context=0x3391390, flags=...) at qml/qqmlproperty.cpp:1237
#28 0x00007ffff1e163ab in QQmlPropertyPrivate::writeBinding (object=0x33914f0, core=..., context=0x3391390, expression=0x3391bd0, result=..., isUndefined=false, flags=...) at qml/qqmlproperty.cpp:1597
#29 0x00007ffff1e9c567 in QQmlBinding::update (this=0x3391bb0, flags=...) at qml/qqmlbinding.cpp:198
#30 0x00007ffff1e9cfac in QQmlBinding::update (this=0x3391bb0) at qml/qqmlbinding_p.h:97
#31 0x00007ffff1e9cab2 in QQmlBinding::expressionChanged (e=0x3391bd0) at qml/qqmlbinding.cpp:260
#32 0x00007ffff1e94d67 in QQmlJavaScriptExpressionGuard_callback (e=0x15dd948) at qml/qqmljavascriptexpression.cpp:361
#33 0x00007ffff1e72d0f in QQmlNotifier::emitNotify (endpoint=0x0, a=0x0) at qml/qqmlnotifier.cpp:94
#34 0x00007ffff1dfb3f6 in QQmlData::signalEmitted (object=0x3391fc0, index=31, a=0x0) at qml/qqmlengine.cpp:763
#35 0x00007ffff384d31e in QMetaObject::activate (sender=0x3391fc0, signalOffset=31, local_signal_index=0, argv=0x0) at kernel/qobject.cpp:3599
#36 0x00007ffff384d120 in QMetaObject::activate (sender=0x3391fc0, m=0x7ffff2688b20 <QQuickLoader::staticMetaObject>, local_signal_index=0, argv=0x0) at kernel/qobject.cpp:3578
#37 0x00007ffff2432687 in QQuickLoader::itemChanged (this=0x3391fc0) at .moc/moc_qquickloader_p.cpp:321
#38 0x00007ffff2431370 in QQuickLoaderPrivate::incubatorStateChanged (this=0x2ca0700, status=QQmlIncubator::Ready) at items/qquickloader.cpp:666
#39 0x00007ffff24312e4 in QQuickLoaderIncubator::statusChanged (this=0x30f9e70, status=QQmlIncubator::Ready) at items/qquickloader.cpp:654
#40 0x00007ffff1e1f2ea in QQmlIncubatorPrivate::changeStatus (this=0x30f9e90, s=QQmlIncubator::Ready) at qml/qqmlincubator.cpp:701
#41 0x00007ffff1e1eab7 in QQmlIncubatorPrivate::incubate (this=0x30f9e90, i=...) at qml/qqmlincubator.cpp:368
#42 0x00007ffff1e1dd0e in QQmlEnginePrivate::incubate (this=0x13882d0, i=..., forContext=0x30f9db0) at qml/qqmlincubator.cpp:87
#43 0x00007ffff1e1a6d3 in QQmlComponent::create (this=0x15bfe90, incubator=..., context=0x2ca7160, forContext=0x0) at qml/qqmlcomponent.cpp:1068
#44 0x00007ffff24317a4 in QQuickLoaderPrivate::_q_sourceLoaded (this=0x2ca0700) at items/qquickloader.cpp:714
#45 0x00007ffff2430f10 in QQuickLoaderPrivate::load (this=0x2ca0700) at items/qquickloader.cpp:597
#46 0x00007ffff24319bf in QQuickLoader::componentComplete (this=0x3391fc0) at items/qquickloader.cpp:806
#47 0x00007ffff1ead859 in QQmlObjectCreator::finalize (this=0x31869c0, interrupt=...) at qml/qqmlobjectcreator.cpp:1207
#48 0x00007ffff1e1a094 in QQmlComponentPrivate::complete (enginePriv=0x13882d0, state=0x30b89a0) at qml/qqmlcomponent.cpp:928
#49 0x00007ffff1e1a17c in QQmlComponentPrivate::completeCreate (this=0x30b8900) at qml/qqmlcomponent.cpp:964
#50 0x00007ffff1e1a12c in QQmlComponent::completeCreate (this=0x1544040) at qml/qqmlcomponent.cpp:957
#51 0x00007ffff1e19953 in QQmlComponent::create (this=0x1544040, context=0x2bdd5f0) at qml/qqmlcomponent.cpp:791
#52 0x00007ffff24396e6 in QQuickView::continueExecute (this=0x134d440) at items/qquickview.cpp:476
#53 0x00007ffff2438617 in QQuickViewPrivate::execute (this=0x15c38c0) at items/qquickview.cpp:124
#54 0x00007ffff2438a2c in QQuickView::setSource (this=0x134d440, url=...) at items/qquickview.cpp:253
#55 0x00007fffd5c20a65 in KWin::Compositing::EffectView::init (this=0x134d440, type=KWin::Compositing::EffectView::DesktopEffectsView) at /home/martin/src/kf5/kde/workspace/kwin/kcmkwin/kwincompositing/model.cpp:613

Like in the previous example, it’s a crash deep down in Qt. The first frame belonging to code we distribute is frame #55. So again I needed to experiment to figure out the minimal code which triggers the crash, and after some trials I figured out what causes it: setting root context properties. After reworking the code so that it no longer needs a context property, it doesn’t crash any more.

Again this should be reported to Qt, but so far I have failed to create a simple example demonstrating the problem. Of course, setting a root context property works in my examples. If one of my readers has an idea what’s so special about systemsettings that it triggers this, please let me know so that I can report bugs against Qt.


As we can see with these four cases: we are hit by issues further down in the stack. We can work around them, but this comes with a cost as it doesn’t fix the actual problem. Some of the problems are real corner cases, hardware dependent and cannot be reproduced by all developers.

Some thoughts on the quality of Plasma 5

Last week we got quite some criticism on the Internet about the quality of KDE Plasma 5. This came as rather a surprise to us and is, at least in my opinion, highly undeserved. So far what we have seen is that Plasma is of high quality – probably better than previous iterations of what was known as the KDE Desktop Environment – and has received lots of praise for the state it is in. So how come there is such a discrepancy between what we see and what our users see?

The chain breaks at the weakest link

Plasma is just one piece of the user experience our users get. It’s the most visible one and it’s also the one which gets all the blame if things break. But in reality there is much more to it. Plasma depends on other libraries – most importantly Qt – and on drivers (OpenGL/Mesa) and on hardware. All of that is put together by distributions. We don’t ship the software to our users, we ship it to the distributions, which then integrate it with the rest of the stack. All we can do is give recommendations to our distributions about common issues and how to prevent them. Now I don’t want to shift the blame to the distributions, because that would also be undeserved. I just want to explain how complex the process is.

Some numbers

So let’s start with a look at some numbers. I’m looking at a combination of the applications plasmashell, kwin, krunner and systemsettings, as these are the most visible to users. The numbers will be slightly distorted, because bugs were also reported against the old 4.11 series and only plasmashell is a Plasma 5-only product.

Over the last 365 days, 1313 crash reports were filed for these products. That’s quite a number, but it means only about 3.6 per day. Given our large user base that’s not that bad. If we had a huge quality problem we would get hundreds of bug reports per day. I know what that looks like: in KWin I have been in situations where we got double-digit numbers of duplicates per day.

Of those 1313 crash reports, 127 are still unconfirmed (we need to improve here!) and 23 are confirmed. The number looks high, but we need to look at how the reports are distributed. There are a few bugs with a high number of duplicates. With my limited search skills I found one with 91 duplicates which got a fix released six months ago, yet we still see new duplicates for it. I’ll come back to how that happens later. Add a few more bugs reported many times and we get close to the total. I just summed up the reports for the ten most often reported crashes, and that’s already about half of all reported crashes.

So why are some crashes reported so often and not immediately fixed? This is exactly the weakest link I have been talking about: things we have no influence on.

Graphics drivers

One of the most severe issues we hit with Plasma 5 is an issue with the Intel graphics drivers. I think that this problem is the reason why people complain about instability of Plasma. The only reason. Crashes in drivers (especially Intel) are nothing new, it’s a problem which has haunted us for years. For example the most often reported crash against KWin is a crash in the Intel driver – we worked around that issue by disabling a feature for all Intel users and will never ever enable it again.

Now I know that this sounds like passing the blame to the graphics drivers, and I can imagine you saying that we need to do QA testing. That is true, we need to QA test, also against the drivers. It’s just that this isn’t possible. I blogged about why back in 2011. Nothing has changed except that the number of possible hardware and driver combinations has increased further.

The biggest problem for us is that the drivers a distro will ship don’t exist yet when we develop and test our software. Even if we used development drivers, they would be too old compared to what distros ship.

One could say it’s the responsibility of the distribution to do the QA. Yes, as software integrators they should make sure that the software works together. And they do, but a lot of it runs in VMs which don’t provide the “real” hardware. But expecting distributions to test all the combinations we cannot test is also wrong. If anyone can do the quality assurance of e.g. the Intel driver, it’s Intel. They have the hardware, they have the developers (AFAIK there are more full-time devs working on the Intel drivers than we have devs on Plasma), they have the QA team. I do not know how they test, but I am sure they could spot such severe driver regressions.

What to do when we hit such a severe driver issue? Well, that’s difficult. If possible we work around it, like with the KWin problem I mentioned above. But that still takes time until it reaches the users. In this particular case we informed distributions about a possible workaround at the X server level and I hope all distributions applied it. If not, please complain to your distribution.

Multi Screen woes

Another source of reported instability is multi-screen handling on X11. Again the problem does not lie on our side, but in this case in Qt. First a word of clarification: auto-testing multi-screen on X11 is extremely difficult. Virtual X servers like Xvfb do not support the RandR extension, which makes it impossible to mock correct behavior in X. Testing with a real X server doesn’t help either, as that can only test with the screens one has, and physically unplugging a screen during an auto test isn’t really a solution.

So this sounds like blaming Qt, which of course is not what I want to do. You can rightly ask why we as KDE do not help Qt with it. Well, we did. Especially Dan Vratil did an incredible job of improving the experience directly in Qt. There were things one wouldn’t believe are possible. For example, there is an xrandr call to read the configuration which blocks the X server with the Intel driver. Each Qt application made that call and caused a freeze. When I figured out the root cause of this problem, I created a remote denial of service proof of concept against the X server and informed the security list about it. I did this in January, have not received a reply to this day, and had not talked about it in public until now. So yes, I just did a full disclosure. Anyway, Dan worked around this problem and fixed many more.

So why is this all still an issue? This is partly related to the fact that Qt and Plasma releases are not synced. For example, at the moment Qt 5.5 will not receive any more bug fix releases, but all we have in distributions is Qt 5.4. This means any fixes we make will not reach users until distributions roll out Qt 5.6. It’s rather depressing, to be honest. You know you fixed an issue, but the fix doesn’t reach your users. I think this needs work from both Qt and the distributions. We need more bug fix releases, and distributions must update Qt more often. I hope that the long term release Qt 5.6 will improve the situation, as it also gives us a chance to provide bug fixes and hopefully reach our users.

As it stands, there still seem to be crash cases in Qt 5.6. I recently got a nice tool which should allow mocking a multi-screen setup, and I want to dedicate some time this week to creating test cases. If that works, I hope we can move this forward.

Issues fixed months ago

Another problem we noticed is that the fixes we create don’t reach our users. This is mostly due to how some distributions work, rolling release distributions being the exception. They create a “stable” and “feature frozen” product: once a component gets a new major version, it is no longer updated in that release. I just described the problem with outdated Qt; that’s part of the story. The update doesn’t get into the distribution, and thus the fixes don’t reach our users.

Moreover, with Frameworks we don’t provide bug fix releases, and that creates “problems” for distributions. They don’t roll out the new Frameworks releases, even though those fix important issues. This is slowly improving; distributions need to get used to this process and also accept that their usual policy doesn’t apply here.

Furthermore, one needs to point out that some distributions do have additional repositories for getting newer software. If you use, e.g., Kubuntu 15.04, I highly recommend using them. Every non-LTS release of Ubuntu is basically a testing distribution. If you use one, you decided to go with faster software updates, so please use the available ways to get those updates. If you don’t want that, please stay with the LTS.

This is something which applies to pretty much all distributions. Stay with the long term releases if you don’t want to update to newer software.

Removed features

What we have also heard a lot lately are complaints that we removed features. Yes, we did that: we streamlined some implementations, we decided to focus on the core, and we decided not to port all X11-specific code but to allow 3rd party applications to fill that niche (e.g. SDDM or LightDM instead of KDM). Nevertheless, let’s look at some of the complaints.

Legacy systray

The biggest problem for our users seems to be the removal of support for the legacy system tray (xembed). This was not a move against our users, but a change we did not expect to cause so many problems. We evaluated the situation prior to the release and concluded that it was possible to take this step without loss of functionality. Nevertheless, it created problems. How was that possible?

A key part of the switch was getting the distributions on board. Our distributions had to patch software in the same way Ubuntu did. So we collected the required changes and informed distributions about them: the patches to Qt 4, which flags to enable in which repositories, and so on.

With some distributions that worked awesomely; with some we had problems. In one distribution the GNOME maintainers refused to enable the required flags (boo), and one distribution decided not to enable the Qt 4 patch because that’s not what they do, even though they already had patches for arm64 (WTF? Support for a theoretical architecture is more important than users on existing hardware? Wtf, wtf!). If your distribution did not do the transition properly, please complain to them and not to us.

Now, there were things which got overlooked. Skype needed proper multi-arch, and the package installation did not always work. Distros were informed about this problem once we noticed it. Some proprietary applications used static linking, destroying the “magic”. That’s unfortunate, but also not our fault. Also in the area of proprietary applications: Wine applications have no fallback. Sorry about that, but we just were not aware of it and we didn’t get feedback quickly enough.

We noticed that this is a problem, and David started to work on a solution for Plasma 5.5. I’m not happy that we had to do that, as it just delays solving the actual problem: we need to get rid of the old systray. I’m already seeing the bug reports about systray icons that are unusable on HighDPI or don’t work on Wayland. There is only one solution: work together to get this transition done. Bug your distributions to include all patches (if not done yet), and bug your favorite projects to port. It’s about time!

Applets per virtual desktop and dashboard

Some of the feedback we got is that users are unhappy that they cannot have different applets and wallpapers per virtual desktop and on the dashboard. Yes, we understand that this is seen as a regression, so let’s look at why we changed it.

First of all: change is sometimes required, and so is removing features. We are sorry for the users affected by such changes, but we normally don’t do that without good reason. We are extremely careful, as you can see in the systray case, where we coordinated with distributions months prior to the release to make the transition smooth.

For the features in question here, we need to look back at the last iteration of Plasma. One of the things we learned was that these features just didn’t really work: they were buggy, and there were many conditions where they simply broke. An example is KWin, which never really knew when Plasma was in dashboard mode; we failed to properly set up the state, and it resulted in lots of quirks. So when we looked at it for Plasma 5, we realized that it’s extremely close to another mode KWin could handle well: Show Desktop. So we merged those two and improved the Show Desktop experience at the same time.

Multiple virtual desktops in Plasma had always been a quirk. I remember one discussion shortly after I started contributing to Plasma: we were not able to properly support this feature in Desktop Grid or Desktop Cube. Technically that was just impossible given how Plasma worked. We were not able to solve this problem throughout Plasma 4, and we would not be able to solve it in Plasma 5. In addition, it caused problems with various other features, so overall it looked like a better idea to disable this feature in core Plasma. This goes along with the fact that we consider Virtual Desktops a feature for highly productive, experienced users. By default they are disabled.

Now, we understand that this feature is important to some users. But we need to ask you to understand that we cannot maintain all possible features for all possible users, and that sometimes the quality of the complete product is more important. Furthermore, Plasma is a flexible product, and it should be possible to provide this as a 3rd party feature.

Not ported features

There are a few more features which just didn’t make it into the new release. In many cases that’s because the features do not have a maintainer. Examples of this are KHotkeys and the application menu. In both cases the port was not trivial due to changes in the underlying infrastructure, so we couldn’t manage it with our limited resources.

But that’s of course a chance for you, dear reader. If you are a developer and loved some of the features we were not able to provide, you can step up and maintain them. We are always looking for more help!

What was important to us was not to overload ourselves with more code than we can maintain. We want to provide a product whose high quality we can actually ensure. This required cutting back in some areas. I think that was the correct decision, and I think we reached our goal of providing a foundation for a high quality product.

Improved Workflows

Overall, with our new product we put a focus on quality and adjusted our workflows to ensure it. As already mentioned, we decided to focus on the core, and thereby only on what we can maintain. That’s just one aspect of it. Another aspect is the increased use of automated testing, thanks to continuous integration run directly by KDE and also by our distributions. A big thanks to openSUSE for openQA and to Kubuntu for the Kubuntu CI! Those two instances have caught many issues which slipped through our CI.

Additionally, we release in shorter cycles (Frameworks each month, Plasma every three months). This is only possible by ensuring quality through more code review, requirements for auto tests, etc. A good overview can be found in sebas’s blog post on the topic.

Reporting the obvious bug

Yesterday there was a blog post on Planet KDE about issues with Plasma 5. There is one aspect I want to pick out:

The bugs are so obvious that I’m sure they are all reported.

Don’t ever do that. If you think they are obvious, you assume that the devs see them too. If the bugs are embarrassing to look at (like the digital clock not updating, mentioned in that blog post), you can be quite certain that the devs haven’t seen them. We use the system as well, and come on, if the clock didn’t update we would notice. This implies that the “obvious” bugs are not “obvious”. The devs are not seeing them.

So report them! Even those which are so clear to see that you ask yourself what the KDE devs were doing to release software of that quality. Report your bugs, all of them. They are not obvious.