Looking at some crashers fixed this week

Some of the feedback we got, is that we should blog more about how we improve the quality, what kind of bugs we fixed. So today I want to do that with an in-depth explanation of four crash reports I looked at and fixed this week. All of them will go be part of the upcoming 5.4.3 release. All of them were related to QtQuick with three of them being caused by a problem in QtQuick and one caused by a workaround for a QtQuick problem. As I explained in my Monday blog post we are getting hit by issues in the libraries we use, in this case QtQuick.

Closing glxgears crashes KWin

The first issue was communicated to me through IRC. After a small investigation together with the user we figured out the condition to crash it and how to reproduce it:
1. Use an aurorae window decoration theme (e.g. Plastik)
2. Open glxgears
3. close glxgears through the close button

This got reported as Bug 346857 and was nothing new to us: we have had similar problems before, which makes it a little bit sad. The problem here is that glxgears doesn’t speak the close window protocol. Normally when you click the close button the window isn’t closed directly, but the window gets notified “please close your window”. So the mechanism is asynchronous. This also explains why such an issue has not been detected during the testing: it’s not happening with default decoration and it’s not happening with “normal” applications. It can only be reproduced with applications which do not behave correctly.

So what’s the difference in this case? KWin handles the destruction in a synchronous way which causes the decoration to be destroyed in direct result of the mouse click. When the decoration is destroyed the QtQuick scene driving the decoration is also destroyed and it looks like Qt doesn’t like that:

<code>#0  0x00007ffff50fd107 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff50fe4e8 in __GI_abort () at abort.c:89
#2  0x00007ffff5de8291 in qt_message_fatal (context=..., message=...) at global/qlogging.cpp:1578
#3  0x00007ffff5de495c in QMessageLogger::fatal (this=0x7fffffff8ca0, msg=0x7ffff6104270 "ASSERT: \"%s\" in file %s, line %d") at global/qlogging.cpp:781
#4  0x00007ffff5dddb90 in qt_assert (assertion=0x7ffff67df129 "context() &amp;&amp; engine()", file=0x7ffff67df0e4 "qml/qqmlboundsignal.cpp", line=183) at global/qglobal.cpp:2966
#5  0x00007ffff6667fe7 in QQmlBoundSignalExpression::function (this=0xf56080) at qml/qqmlboundsignal.cpp:183
#6  0x00007ffff6667d4c in QQmlBoundSignalExpression::sourceLocation (this=0xf56080) at qml/qqmlboundsignal.cpp:155
#7  0x00007ffff663f1fb in QQmlData::destroyed (this=0x873d90, object=0xf54eb0) at qml/qqmlengine.cpp:1709
#8  0x00007ffff663cf6b in QQmlData::destroyed (d=0x873d90, o=0xf54eb0) at qml/qqmlengine.cpp:674
#9  0x00007ffff605c6e9 in QObject::~QObject (this=0xf54eb0, __in_chrg=&lt;optimized out&gt;) at kernel/qobject.cpp:912
#10 0x00007ffff7265df9 in QQuickItem::~QQuickItem (this=0xf54eb0, __in_chrg=&lt;optimized out&gt;) at items/qquickitem.cpp:2224
#11 0x00007ffff7324866 in QQuickMouseArea::~QQuickMouseArea (this=0xf54eb0, __in_chrg=&lt;optimized out&gt;) at items/qquickmousearea.cpp:439
#12 0x00007ffff72d69c5 in QQmlPrivate::QQmlElement&lt;QQuickMouseArea&gt;::~QQmlElement (this=0xf54eb0, __in_chrg=&lt;optimized out&gt;) at ../../include/QtQml/../../src/qml/qml/qqmlprivate.h:98
#13 0x00007ffff72d69fa in QQmlPrivate::QQmlElement&lt;QQuickMouseArea&gt;::~QQmlElement (this=0xf54eb0, __in_chrg=&lt;optimized out&gt;) at ../../include/QtQml/../../src/qml/qml/qqmlprivate.h:98
#14 0x00007ffff605e516 in QObjectPrivate::deleteChildren (this=0xf54c90) at kernel/qobject.cpp:1946
#15 0x00007ffff605cb80 in QObject::~QObject (this=0xf56050, __in_chrg=&lt;optimized out&gt;) at kernel/qobject.cpp:1024
#16 0x00007ffff7265df9 in QQuickItem::~QQuickItem (this=0xf56050, __in_chrg=&lt;optimized out&gt;) at items/qquickitem.cpp:2224
#17 0x00007ffff72c3e22 in QQuickRectangle::~QQuickRectangle (this=0xf56050, __in_chrg=&lt;optimized out&gt;) at items/qquickrectangle_p.h:128
#18 0x00007ffff72d6357 in QQmlPrivate::QQmlElement&lt;QQuickRectangle&gt;::~QQmlElement (this=0xf56050, __in_chrg=&lt;optimized out&gt;) at ../../include/QtQml/../../src/qml/qml/qqmlprivate.h:98
#19 0x00007ffff72d638c in QQmlPrivate::QQmlElement&lt;QQuickRectangle&gt;::~QQmlElement (this=0xf56050, __in_chrg=&lt;optimized out&gt;) at ../../include/QtQml/../../src/qml/qml/qqmlprivate.h:98
#20 0x00007ffff736a976 in QQuickView::~QQuickView (this=0x6b1980, __in_chrg=&lt;optimized out&gt;) at items/qquickview.cpp:225
#21 0x00007ffff736a9d2 in QQuickView::~QQuickView (this=0x6b1980, __in_chrg=&lt;optimized out&gt;) at items/qquickview.cpp:227
</code>

Now in order to never have that happen again I created a test application and reported a bugreport against Qt. And of course we worked around the problem by delaying the handling of the close to the next event cycle. Now you might wonder how it’s possible that we allowed such a regression to sneak in again given that we had seen it before? Well that’s easily explained, up until recently we were not able to run full tests against KWin. We might have been able to unit test this area, but it would have passed, this issue needed an integration test. And that’s what I added now, so that we won’t hit it ever again. It’s probably the weirdest auto test I have ever written, involving starting glxgears (thanks to our sysadmins for installing it for this test case!), simulating the mouse click on the close button and ensuring glxgears closes without crashing KWin.

Crash when opening window decorations configuration module

The second crash I run into directly after investigating the first one. I wanted to switch back to Breeze decoration but couldn’t because the config module crashed directly. This was bug 344278 – a really nasty one with 74 duplicates. We had wonderful crash traces, but missed the important part about how to reproduce it. From the crash trace alone we were not able to figure out what’s going on. Well we saw some aspects but it wasn’t enough to properly investigate.

Now alas I had a 100 % sure way to reproduce the problem and also understood what’s going on. So here the actual steps to reproduce:
1. Open Window decoration configuration menu
2. Download many themes, many of them, the more the better
3. Select a theme which name’s first latter is far away from “B”, e.g. Plastik.
4. Apply and quit
5. Open window decoration configuration menu again

And boom! What happens is that we load all decorations and put them into a ListView, then we select the theme the user is currently using and ensure it’s centered in the list view. This results in the list view scrolling. ListView has a cache of elements and if you have too many it will throw out other elements. So our Breeze deco which gets created at the start (early in alphabet, will be shown) is kicked out again when selecting Plastik if there are too many decorations. This is the requirement to have many themes installed and also to select one far down in the alphabet. With anything else you won’t trigger it.

The bug itself got triggered through Breeze decoration because there is a Property animation running which triggers another update on the decoration after it has been kicked out. It accesses a member which got destroyed by the QtQuick engine.

All of that also means that the crash would not have been triggered if that code would have been e.g. in Oxygen and not in Breeze. Because then it would not have hit. Also just scrolling in the list after it loaded even if you have many installed, won’t trigger the crash, because then the animation is no longer running. It’s only running after loading in the preview. What I want to show here is that this is an extremely hidden corner case to find. You must have exactly the same conditions to trigger it. Given that I’m also surprised that we have so many duplicates for the report.

So how to fix it? Obvious idea: one could disable the anyway not visible animation in Breeze. But that’s not a fix, that’s just a workaround and doesn’t solve the actual problem. Other decorations might trigger that again. So we need to understand what’s going on. What we knew is that the Decoration triggers an update and crashes because the DecorationBridge is no longer valid. But that’s impossible! The contract of the KDecoration API is that the DecorationBridge will always be valid if there’s a Decoration. So how could the contract break?

For this we need to look into how the QtQuick code for rendering the previews works. Each Decoration is wrapped by a PreviewItem and each decoration gets it’s own DecorationBridge provided by a PreviewBridge. The PreviewBridge is directly constructed through QtQuick, the Decoration is loaded later on. What is now interesting is the tear down. When the PreviewItem gets destroyed by QtQuick it doesn’t delete the Decoration directly, but delays it to the next event cycle. So the Decoration outlives the QtQuick items. When the PreviewItem gets destroyed by QtQuick also the PreviewBridge gets destroyed by QtQuick and thus we have a Decoration which is still alive, but the PreviewBridge isn’t. Why was it done that way? Well to prevent crashers caused by QtQuick. So our workaround for a crash just introduced a new one. Meh.

The solution now is to also delay the deconstruction of the PreviewBridge to the next event cycle. I hope this does not again create a new crash.

Crash when exiting Screens configuration module

The third issue I looked into was also reported to me in response to my Monday’s blog post. This one was supposed to be fixed in Qt 5.5, but wasn’t. It has a clear way to reproduce it:
1. Open Systemsettings
2. Go to “Display Configuration”
3. Click “All Settings” to go back to overview

According to our users that should crash it. Just it doesn’t. This raised my interest – also that it has 73 duplications which makes it rather important. I know the user who reported this to me and know that I can fully believe what he tells me. So this crash raised my interest. It also has a very interesting crash trace:

<code>#0 0x00007ffff2e48d59 in QQuickItemPrivate::addToDirtyList (this=0xdbdcc0) at /mnt/AUR/qt5-declarative-git/src/qt5-declarative/src/quick/items/qquickitem.cpp:5610
#1 0x00007ffff2e48e43 in QQuickItemPrivate::dirty (this=0xdbdcc0, type=&lt;optimized out&gt;) at /mnt/AUR/qt5-declarative-git/src/qt5-declarative/src/quick/items/qquickitem.cpp:5594
#2 0x00007ffff2e496cd in QQuickItem::update (this=0xdbdc40) at /mnt/AUR/qt5-declarative-git/src/qt5-declarative/src/quick/items/qquickitem.cpp:4088
#3 0x00007ffff2e56c0d in QQuickItem::qt_static_metacall (_o=&lt;optimized out&gt;, _c=&lt;optimized out&gt;, _id=&lt;optimized out&gt;, _a=&lt;optimized out&gt;) at .moc/moc_qquickitem.cpp:597
#4 0x00007ffff45f2ae1 in QObject::event (this=this@entry=0xdbdc40, e=e@entry=0x7fffc41f80b0) at kernel/qobject.cpp:1239
#5 0x00007ffff2e53a63 in QQuickItem::event (this=0xdbdc40, ev=0x7fffc41f80b0) at /mnt/AUR/qt5-declarative-git/src/qt5-declarative/src/quick/items/qquickitem.cpp:7294
#6 0x00007ffff6087d94 in QApplicationPrivate::notify_helper (this=this@entry=0x681dd0, receiver=receiver@entry=0xdbdc40, e=e@entry=0x7fffc41f80b0) at kernel/qapplication.cpp:3717
#7 0x00007ffff608d2c8 in QApplication::notify (this=0x7fffffffe4a0, receiver=0xdbdc40, e=0x7fffc41f80b0) at kernel/qapplication.cpp:3500
#8 0x00007ffff45c49dc in QCoreApplication::notifyInternal (this=0x7fffffffe4a0, receiver=0xdbdc40, event=event@entry=0x7fffc41f80b0) at kernel/qcoreapplication.cpp:965
#9 0x00007ffff45c7dea in sendEvent (event=0x7fffc41f80b0, receiver=&lt;optimized out&gt;) at ../../include/QtCore/../../src/corelib/kernel/qcoreapplication.h:224
#10 QCoreApplicationPrivate::sendPostedEvents (receiver=receiver@entry=0x0, event_type=event_type@entry=0, data=0x681430) at kernel/qcoreapplication.cpp:1593
#11 0x00007ffff45c8230 in QCoreApplication::sendPostedEvents (receiver=receiver@entry=0x0, event_type=event_type@entry=0) at kernel/qcoreapplication.cpp:1451
#12 0x00007ffff4617f63 in postEventSourceDispatch (s=0x6d7aa0) at kernel/qeventdispatcher_glib.cpp:271
#13 0x00007fffefcce9fd in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
#14 0x00007fffefccece0 in ?? () from /usr/lib/libglib-2.0.so.0
#15 0x00007fffefcced8c in g_main_context_iteration () from /usr/lib/libglib-2.0.so.0
#16 0x00007ffff4617fd7 in QEventDispatcherGlib::processEvents (this=0x6d5850, flags=...) at kernel/qeventdispatcher_glib.cpp:418
#17 0x00007ffff45c339a in QEventLoop::exec (this=this@entry=0x7fffffffe380, flags=..., flags@entry=...) at kernel/qeventloop.cpp:204
#18 0x00007ffff45cb23c in QCoreApplication::exec () at kernel/qcoreapplication.cpp:1229
#19 0x00007ffff592cbf4 in QGuiApplication::exec () at kernel/qguiapplication.cpp:1528
#20 0x00007ffff6084bb5 in QApplication::exec () at kernel/qapplication.cpp:2977 #21 0x000000000040f52b in main (argc=1, argv=&lt;optimized out&gt;) at /mnt/AUR/systemsettings-git/src/systemsettings/app/main.cpp:55
</code>

The interesting part here is that it’s completely inside Qt. We come from the event loop and an event is handled inside Qt and crashes. No code of the control module is executed in this trace any more.

But why am I not able to reproduce? After all there are so many users hitting it. I have the same Qt version, so it should crash. That I figured it out was pure chance. I remembered that the user has an NVIDIA system and I verified that this is still the case. I have an Intel system. So why is that important? For NVIDIA QtQuick uses threaded rendering, while for Mesa it uses the main gui thread for rendering. I had seen in the past that with threaded rendering destruction might be moved to the next event cycle. So easy thing to test:
QSG_RENDER_LOOP=threaded systemsettings5 and follow the reproduction steps and boom! Yay, I have a test case. Following the old saying: “consider a bug fixed when a developer is able to reproduce” the hardest way was solved.

So I started investigating. Let’s try with kcmshell5 instead of systemsettings. Hmm doesn’t crash. Let’s try with kscreen’s test application instead of systemsettings. Hmm, doesn’t crash. So something in systemsettings must trigger it! And I started reading code and read and read, tried here something, tried there something and come to the conclusion: systemsettings is not doing anything wrong.

Given that it must be the QtQuick code of the screen configuration. After some trials I had a minimal derivation to the QtQuick code which didn’t crash any more. Here again helped previous experience with QtQuick related crashers: I saw some usage of QtGraphicalEffects and remembered that this one used to crash KWin with the Breeze Aurorae Theme prior to Plasma 5.2. Removing the OpacityMask didn’t crash. Yay! So how to get that into a fix? The solution was actually in the debug output of QtQuick:
QSGDefaultLayer::bind: ShaderEffectSource: ‘recursive’ must be set to true when rendering recursively.

So obviously I set this missing recursive on the OpacityMask. Hmm, compile error, that doesn’t exist. So let’s make it non recursive and there we go, it doesn’t crash any more.

I would love to report this to Qt, but so far I failed with creating a simple test case. Just like with my testing with kcmshell5 and the kscreen test application I’m not able to hit the condition and I haven’t figured out yet what is different in systemsettings.

Opening effects configuration twice crashes

The last issue to look at was triggered through the previous issue. During review it was pointed out that there are more QtQuick configuration modules which crash in similar ways. So I tried them all and hit a crash if:
1. Open Systemsettings
2. Go to Desktop Behavior
3. Click Desktop Effects
4. Click All Settings
5. Repeat steps 2 and 3

Again a very interesting back trace:

<code>#0 0x00007ffff28e7107 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007ffff28e84e8 in __GI_abort () at abort.c:89
#2 0x00007ffff35d2291 in qt_message_fatal (context=..., message=...) at global/qlogging.cpp:1578
#3 0x00007ffff35ce95c in QMessageLogger::fatal (this=0x7fffffff3920, msg=0x7ffff38ee270 "ASSERT: \"%s\" in file %s, line %d") at global/qlogging.cpp:781
#4 0x00007ffff35c7b90 in qt_assert (assertion=0x7ffff1f9276a "value.isString()", file=0x7ffff1f92710 "jsruntime/qv4runtime.cpp", line=439) at global/qglobal.cpp:2966
#5 0x00007ffff1ddb5eb in QV4::RuntimeHelpers::convertToObject (engine=0x139a400, value=...) at jsruntime/qv4runtime.cpp:439
#6 0x00007ffff1ddcc6e in QV4::Runtime::getProperty (engine=0x139a400, object=..., nameIndex=136) at jsruntime/qv4runtime.cpp:682
#7 0x00007ffff1dca362 in QV4::Moth::VME::run (this=0x7fffffff41d7, engine=0x139a400, code=0x7fffcc15f830 "\357\230\334\361\377\177", storeJumpTable=0x0) at jsruntime/qv4vme_moth.cpp:487
#8 0x00007ffff1dce656 in QV4::Moth::VME::exec (engine=0x139a400, code=0x7fffcc15f6c8 "\366\251\334\361\377\177") at jsruntime/qv4vme_moth.cpp:925
#9 0x00007ffff1d58eb3 in QV4::SimpleScriptFunction::call (that=0x7fffd3424010, callData=0x7fffd3424018) at jsruntime/qv4functionobject.cpp:564
#10 0x00007ffff1c93e14 in QV4::Object::call (this=0x7fffd3424010, d=0x7fffd3424018) at ../../include/QtQml/5.5.1/QtQml/private/../../../../../src/qml/jsruntime/qv4object_p.h:305
#11 0x00007ffff1e93cac in QQmlJavaScriptExpression::evaluate (this=0x2b7a690, context=0x1392ad0, function=..., callData=0x7fffd3424018, isUndefined=0x7fffffff4553) at qml/qqmljavascriptexpression.cpp:158
#12 0x00007ffff1e939a5 in QQmlJavaScriptExpression::evaluate (this=0x2b7a690, context=0x1392ad0, function=..., isUndefined=0x7fffffff4553) at qml/qqmljavascriptexpression.cpp:116
#13 0x00007ffff1e9c484 in QQmlBinding::update (this=0x2b7a670, flags=...) at qml/qqmlbinding.cpp:194
#14 0x00007ffff1e9cfac in QQmlBinding::update (this=0x2b7a670) at qml/qqmlbinding_p.h:97
#15 0x00007ffff1e9cab2 in QQmlBinding::expressionChanged (e=0x2b7a690) at qml/qqmlbinding.cpp:260
#16 0x00007ffff1e94d67 in QQmlJavaScriptExpressionGuard_callback (e=0x15ddbb0) at qml/qqmljavascriptexpression.cpp:361
#17 0x00007ffff1e72d0f in QQmlNotifier::emitNotify (endpoint=0x0, a=0x0) at qml/qqmlnotifier.cpp:94
#18 0x00007ffff1dfb3f6 in QQmlData::signalEmitted (object=0x33914f0, index=30, a=0x0) at qml/qqmlengine.cpp:763
#19 0x00007ffff384d31e in QMetaObject::activate (sender=0x33914f0, signalOffset=29, local_signal_index=1, argv=0x0) at kernel/qobject.cpp:3599
#20 0x00007ffff1df7906 in QQmlVMEMetaObject::activate (this=0x3391720, object=0x33914f0, index=44, args=0x0) at qml/qqmlvmemetaobject.cpp:1325
#21 0x00007ffff1df5436 in QQmlVMEMetaObject::metaCall (this=0x3391720, c=QMetaObject::WriteProperty, _id=42, a=0x7fffffff6860) at qml/qqmlvmemetaobject.cpp:841
#22 0x00007ffff1bb7432 in QAbstractDynamicMetaObject::metaCall (this=0x3391720, c=QMetaObject::WriteProperty, _id=42, a=0x7fffffff6860) at /home/martin/src/qt5/qtbase/include/QtCore/5.5.1/QtCore/private/../../../../../src/corelib/kernel/qobject_p.h:421
#23 0x00007ffff1df5e91 in QQmlVMEMetaObject::metaCall (this=0x2ca1900, c=QMetaObject::WriteProperty, _id=42, a=0x7fffffff6860) at qml/qqmlvmemetaobject.cpp:969
#24 0x00007ffff1bb7432 in QAbstractDynamicMetaObject::metaCall (this=0x2ca1900, c=QMetaObject::WriteProperty, _id=42, a=0x7fffffff6860) at /home/martin/src/qt5/qtbase/include/QtCore/5.5.1/QtCore/private/../../../../../src/corelib/kernel/qobject_p.h:421
#25 0x00007ffff3817ce1 in QMetaObject::metacall (object=0x33914f0, cl=QMetaObject::WriteProperty, idx=42, argv=0x7fffffff6860) at kernel/qmetaobject.cpp:294
#26 0x00007ffff1e14605 in QQmlPropertyPrivate::write (object=0x33914f0, property=..., value=..., context=0x3391390, flags=...) at qml/qqmlproperty.cpp:1308
#27 0x00007ffff1e13f47 in QQmlPropertyPrivate::writeValueProperty (object=0x33914f0, core=..., value=..., context=0x3391390, flags=...) at qml/qqmlproperty.cpp:1237
#28 0x00007ffff1e163ab in QQmlPropertyPrivate::writeBinding (object=0x33914f0, core=..., context=0x3391390, expression=0x3391bd0, result=..., isUndefined=false, flags=...) at qml/qqmlproperty.cpp:1597
#29 0x00007ffff1e9c567 in QQmlBinding::update (this=0x3391bb0, flags=...) at qml/qqmlbinding.cpp:198
#30 0x00007ffff1e9cfac in QQmlBinding::update (this=0x3391bb0) at qml/qqmlbinding_p.h:97
#31 0x00007ffff1e9cab2 in QQmlBinding::expressionChanged (e=0x3391bd0) at qml/qqmlbinding.cpp:260
#32 0x00007ffff1e94d67 in QQmlJavaScriptExpressionGuard_callback (e=0x15dd948) at qml/qqmljavascriptexpression.cpp:361
#33 0x00007ffff1e72d0f in QQmlNotifier::emitNotify (endpoint=0x0, a=0x0) at qml/qqmlnotifier.cpp:94
#34 0x00007ffff1dfb3f6 in QQmlData::signalEmitted (object=0x3391fc0, index=31, a=0x0) at qml/qqmlengine.cpp:763
#35 0x00007ffff384d31e in QMetaObject::activate (sender=0x3391fc0, signalOffset=31, local_signal_index=0, argv=0x0) at kernel/qobject.cpp:3599
#36 0x00007ffff384d120 in QMetaObject::activate (sender=0x3391fc0, m=0x7ffff2688b20 &lt;QQuickLoader::staticMetaObject&gt;, local_signal_index=0, argv=0x0) at kernel/qobject.cpp:3578
#37 0x00007ffff2432687 in QQuickLoader::itemChanged (this=0x3391fc0) at .moc/moc_qquickloader_p.cpp:321
#38 0x00007ffff2431370 in QQuickLoaderPrivate::incubatorStateChanged (this=0x2ca0700, status=QQmlIncubator::Ready) at items/qquickloader.cpp:666
#39 0x00007ffff24312e4 in QQuickLoaderIncubator::statusChanged (this=0x30f9e70, status=QQmlIncubator::Ready) at items/qquickloader.cpp:654
#40 0x00007ffff1e1f2ea in QQmlIncubatorPrivate::changeStatus (this=0x30f9e90, s=QQmlIncubator::Ready) at qml/qqmlincubator.cpp:701
#41 0x00007ffff1e1eab7 in QQmlIncubatorPrivate::incubate (this=0x30f9e90, i=...) at qml/qqmlincubator.cpp:368
#42 0x00007ffff1e1dd0e in QQmlEnginePrivate::incubate (this=0x13882d0, i=..., forContext=0x30f9db0) at qml/qqmlincubator.cpp:87
#43 0x00007ffff1e1a6d3 in QQmlComponent::create (this=0x15bfe90, incubator=..., context=0x2ca7160, forContext=0x0) at qml/qqmlcomponent.cpp:1068
#44 0x00007ffff24317a4 in QQuickLoaderPrivate::_q_sourceLoaded (this=0x2ca0700) at items/qquickloader.cpp:714
#45 0x00007ffff2430f10 in QQuickLoaderPrivate::load (this=0x2ca0700) at items/qquickloader.cpp:597
#46 0x00007ffff24319bf in QQuickLoader::componentComplete (this=0x3391fc0) at items/qquickloader.cpp:806
#47 0x00007ffff1ead859 in QQmlObjectCreator::finalize (this=0x31869c0, interrupt=...) at qml/qqmlobjectcreator.cpp:1207
#48 0x00007ffff1e1a094 in QQmlComponentPrivate::complete (enginePriv=0x13882d0, state=0x30b89a0) at qml/qqmlcomponent.cpp:928
#49 0x00007ffff1e1a17c in QQmlComponentPrivate::completeCreate (this=0x30b8900) at qml/qqmlcomponent.cpp:964
#50 0x00007ffff1e1a12c in QQmlComponent::completeCreate (this=0x1544040) at qml/qqmlcomponent.cpp:957
#51 0x00007ffff1e19953 in QQmlComponent::create (this=0x1544040, context=0x2bdd5f0) at qml/qqmlcomponent.cpp:791
#52 0x00007ffff24396e6 in QQuickView::continueExecute (this=0x134d440) at items/qquickview.cpp:476
#53 0x00007ffff2438617 in QQuickViewPrivate::execute (this=0x15c38c0) at items/qquickview.cpp:124
#54 0x00007ffff2438a2c in QQuickView::setSource (this=0x134d440, url=...) at items/qquickview.cpp:253
#55 0x00007fffd5c20a65 in KWin::Compositing::EffectView::init (this=0x134d440, type=KWin::Compositing::EffectView::DesktopEffectsView) at /home/martin/src/kf5/kde/workspace/kwin/kcmkwin/kwincompositing/model.cpp:613
</code>

Like in the previous example it’s a crash deep down in Qt. The first code related to code we distribute is at stack position 55. So again I needed to experiment to figure out the minimal code which triggers the crash and after some trials I figured out what causes it: setting root context properties. After reworking this to no longer needing a context property, it doesn’t crash any more.

Again this should be reported to Qt, bug so far I failed to create a simple example demonstrating the problem Of course setting a root context property works in my examples. If one of my readers have an idea what’s so special about systemsettings to trigger it, please let me know so that I can report bugs against Qt.

Summary

As we can see with these four cases: we are hit by issues further down in the stack. We can work around them, but this comes with a cost as it doesn’t fix the actual problem. Some of the problems are real corner cases, hardware dependent and cannot be reproduced by all developers.

Some thoughts on the quality of Plasma 5

Last week we got quite some criticism about the quality of KDE Plasma 5 on the Internet. This came rather surprising for us and is at least in my opinion highly undeserved. So far what we saw is that Plasma has a high quality – probably better than previous iterations of what was known as the KDE Desktop Environment – and got lots of praise for the state it is in. So how come that there is such a discrepancy between what we see and what our users see?

The chain breaks at the weakest link

Plasma is just one piece of the user experience our users get. It’s the most visible one and it’s also the one which gets all the blame if things break. But in reality there is much more to it. Plasma depends on other libraries – most importantly Qt – and on drivers (OpenGL/Mesa) and on hardware. All of that is put together by distributions. We don’t ship the software to our users, we ship it to the distributions which then integrate it with the rest of the stack. All we can do is give recommendations to our distributions about common issues and how to prevent that they happen. Now I don’t want to move the blame to the distributions, because that would also be undeserved. I just want to explain how complex the process is.

Some numbers

So let’s start with a look at some numbers. I’m looking at a combination of the applications plasmashell, kwin, krunner and systemsettings. These are the most visible to the users. The numbers will be slightly distorted, because there were also bugs going in for the old 4.11 series and only plasmashell is a 5 only product.

Over the last 365 days 1313 crash reports were reported for these products. That’s quite a number, but means it’s only about 3.5 per day. Given our large user bases that’s not that bad. If we had a huge quality problem we would get hundreds of bug reports per day. And yes we do, I had been in such situations in KWin that we got two digits numbers of duplicates per day.

Of those 1313 crash reports 127 are still unconfirmed (we need to improve here!) and 23 confirmed. The number looks high, but now we need to look at how they distribute. There are a few bugs with a high number of duplicates. Given my limited search skills I found one with 91 duplicates which got a fix released six months ago, but we still see new duplicates for it. I come back to how that happens later. A few bugs reported several times and we get close to the numbers. I just summed up the number of the ten most often reported crashers and that’s already about half of the reported crashes.

So why are there crashes reported so often and are not immediately fixed? This is exactly the weakest link I have been talking about. It’s things we don’t have an influence on.

Graphics drivers

One of the most severe issues we hit with Plasma 5 is an issue with the Intel graphics drivers. I think that this problem is the reason why people complain about instability of Plasma. The only reason. Crashes in drivers (especially Intel) are nothing new, it’s a problem which has haunted us for years. For example the most often reported crash against KWin is a crash in the Intel driver – we worked around that issue by disabling a feature for all Intel users and will never ever enable it again.

Now I know that this sounds like passing the blame to the graphics drivers and I can imagine that you say that we need to QA test. That is true, we need to QA test, also against the drivers. Just that’s not possible. I blogged about why that’s not possible back in 2011. Nothing has changed except that the number of possible hardware and driver combinations further increased.

The biggest problem for us is that the drivers a distro will ship doesn’t exist yet when we develop and test our software. Even if we would use development drivers it would be too old compared to what distros ship.

One could say it’s the responsibility of the distribution to do the QA. Yes, they as a software integrator should make sure that the software works together. And they do, but lots of it runs in VMs which don’t provide the “real” hardware. But expecting distributions to do all the combinations we cannot test, is also wrong. If there is one which can do the quality assurance of e.g. the Intel driver it’s Intel. They have the hardware, they have the developers (AFAIK there are more fulltime devs working on the Intel drivers than we have devs on Plasma), they have the QA team. I do not know how they test, but I am sure they could spot such severe driver regressions.

What to do when we hit such a severe driver issue? Well that’s difficult. If possible we can workaround like the KWin problem I mentioned above. But that still takes time till it reaches the users. In this particular case we informed distributions about a possible workaround on Xserver level and I hope ll distributions applied it. If not please complain to your distribution.

Multi Screen woes

Another source of reported instability is multi screen handling on X11. Again the problem does not lie on our side but in this case Qt. First a word of clarification: auto-testing multi-screen on X11 is extremely difficult. Virtual X servers like Xvfb do not support the randr extension which makes it impossible to mock a correct behavior in X. Also testing with real X doesn’t help as that can only test with the screens one has and physically plugging out a screen during an auto-test isn’t really a solution.

So this sounds like blaming Qt, which of course is not what I want to do. You can rightly question why we as KDE do not help Qt with it. Well we did. Especially Dan Vratil did an incredible job of improving the experience directly in Qt. There were things which one wouldn’t believe are possible. For example there is an xrandr call to read the configuration and that blocks the Xserver in the Intel driver. Now each Qt application did that and caused a freeze. When I figured out the root source for this problem I created a remote denial of service proof of concept against X server and informed the security list about this problem. I did this in January and haven’t got a reply till now and have not talked in public about this. So yeah I just did a full-disclosure. Anyway Dan worked around this problem and fixed many more.

So why is this all still an issue? This is slightly related to how Qt and Plasma releases are not synced. For example at the moment Qt 5.5 will not receive any bug fix releases any more, but all we have in distributions is Qt 5.4. This means any fixes we do will not reach users till distributions roll out Qt 5.6. It’s rather depressive to be honest. You know you fixed an issue, but it doesn’t reach your users. I think this needs works from both Qt and the distributions. We need more bug fix releases and distributions must update Qt more often. I hope that the long term release Qt 5.6 will improve the situation as that gives us also a chance to provide bug fixes and hopefully reach our users.

As it stands there seem to still be crash cases in Qt 5.6. Since recently I have a nice tool which should allow to mock multi screen setup and I want to try to dedicate some time this week to create test cases. If that works I hope we can move this forward.

Issues fixed months ago

Another problem we noticed is that fixes we created doesn’t reach our users. This is mostly to how some distributions work with the exception of rolling release distributions. They create a “stable” and “feature frozen” product. If a major version number increases it’s not going to be updated. I just described the problem with outdated Qt, that’s part of the story. The update didn’t get into the distribution and thus fixes don’t reach our users.

Even more with frameworks we don’t provide bug fix releases and that creates “problems” for distributions. They don’t roll our the new frameworks, although they fix important issues. This is slowly improving, distributions need to get used to this process and also accept that their policy doesn’t apply.

Furthermore one needs to point out that some distributions do have additional repositories to get newer software. If you use e.g. Kubuntu 15.04 I highly recommend to use that. Every non LTS release from Ubuntu is basically a testing distribution. If you use that you decided to go with faster software updates. Please use the ways to get those updates. If you don’t want that please stay with LTS.

This is something which applies to pretty much all distributions. Stay with the long term releases if you don’t want to update to newer software.

Removed features

What we also heard a lot lately are complains about that we removed features. Yes we did that, we streamlined some implementations, we decided to focus on the core, we decided to not port all X11 specific code, but allow 3rd party applications to fill that niche (e.g. SDDM or LightDM instead of KDM). Nevertheless let’s look at some of the complains.

Legacy systray

The biggest problem for our users seems to be the removal of support of the legacy system tray (xembed). This was not a move against our users, but a change we did not expect to cause so many problems. We did evaluate the situation prior to the release and saw that it was possible to do this step without loss of functionality. Nevertheless it created problems. How was that possible?

A key feature to the switch was getting the distributions in. Our distributions had to patch software in the same way as Ubuntu did. So we collected the required changes and informed distributions about it. The patches to Qt 4, which flags to enable in which repositories and so on.

With some distributions that worked awesomely, with some we had problems. In one distributions the GNOME maintainers refused to enable the required flags (boo), one distribution decided to not enable the Qt 4 patch because that’s not what they do, but they already had patches for arm64 (WTF? Support for a theoretical architecture is more important than users on existing hardware? Wtf, wtf!) If your distribution did not do the transition in a sufficient way, please complain to them and not to us.

Now there were things which got overlooked. Skype needed proper multi-arch and the package installation did not always work. Distros were informed about this problem once we noticed. Some proprietary applications used static linking destroying the “magic”. That’s unfortunate, but also not our fault. And also in the area of proprietary applications: wine applications have no fallback. Sorry about that, but we just were not aware of that and we didn’t get a feedback quickly enough.

We noticed that this is a problem and David started to work on a solution for Plasma 5.5. I’m not happy that we had to do that as it just delays solving the actual problem: we need to get rid of the old systray. I’m already seeing the bug reports about unusable systray icons on HighDPI or not working on Wayland. There is also one solution: work together to get this transition done. Bug your distributions to include all patches (if not done yet), bug your favorite projects to port. It’s about time!

Applets per virtual desktop and dashboard

Some of the feedback we got is that users are unhappy that they cannot have different applets and wallpaper per virtual desktop and also on dashboard. Yes we understand that this is seen as a regression, so let’s look at why we changed.

First of all: change is sometimes required also removing features is sometimes required. We are sorry for the users affected by such changes, but we normally don’t do that without good reason. We are extremely careful as you can see in the systray case where we coordinated with distributions months prior to the release to make the transition smooth.

For the features here in question we need to look back at the last iteration of Plasma. Some of the things we learned was that these features just didn’t really work, they were buggy and there were many conditions were it just didn’t really work. An example is KWin which just never really knew when Plasma is in dashboard mode, we failed to properly set up the state and it just resulted in lots of quirks. So when we looked at it for Plasma 5 we realized that it’s extremely close to another mode KWin could handle well: Show Desktop. So we merged those two and improved the Show Desktop experience at the same time.

Multiple virtual desktops in Plasma had always been a quirk. I remember one discussion shortly after starting to contribute to Plasma: we were not able to properly support this feature in Desktop Grid or Desktop Cube. Technically that was just impossible how Plasma worked. We were not able to solve this problem in all of Plasma 4 and we would not be able to solve it in Plasma 5. In addition it also caused problems with various other features, so overall it looked like a better idea to disable this feature in core Plasma. This goes along with the fact that we consider Virtual Desktops as a high productivity and experienced user feature. By default they are disabled.

Now we understand that this is a feature important to some users. But we need to ask you to understand that we cannot maintain all possible features for all possible users and that sometimes the quality of the complete product is more important. Furthermore Plasma is a flexible product and it should be possible to provide this as a 3rd party feature.

Not ported featured

There are a few more features which just didn’t make it to the new release. In many cases that’s because of the features do not have a maintainer. Examples for this are KHotkeys and the application menu. In both cases the port was not trivial due to changes in the underlying infrastructure, so we couldn’t manage it with our limited resources.

But that’s of course a chance for you, dear reader. If you are a developer and loved some of the features we were not able to provide, you can step up and maintain them. We are always looking for more help!

What was important to us is to not overload ourselves with more code than we can maintain. We want to provide a high quality product which we can ensure that it is high quality. This required to cut down in some areas. I think that was the correct decision and I think we reached our goal in providing a foundation for a high quality product.

Improved Workflows

Overall with our new product we put a focus on quality and adjusted our workflows to ensure that we have a high quality. As already mentioned we decided to focus on the core and by that focus only on what we can ensure to maintain. That’s just one aspect to it. Another aspect is the increased usage of automatic testing thanks to continuous integration done directly by KDE and also by our distributions. A big thanks to openSUSE for openQA and Kubuntu for the Kubuntu CI! Those two instances have caught many issues which slipped through our CI.

Additionally we release in shorter cycles (frameworks each month, Plasma every three months). This is only possible by ensuring quality through higher code review, requirements for auto tests, etc. A good overview can be found in sebas’s blog post on the topic.

Reporting the obvious bug

Yesterday we had a blog post on planetkde about issues with Plasma 5. There is one aspect which I want to pick out:

The bugs are so obvious that I’m sure they are all reported.

Don’t ever do that. If you think they are obvious that implies that also the devs see them. If the bugs are embarrassing to look at (like in this blog post mentioned not updating digital clock) you can be quite certain that the devs haven’t seen them. We use the system as well and come on if the clock doesn’t update we would notice. This implies now that the “obvious” bugs are not “obvious”. The devs are not seeing them.

Thus report them! Even those which are so clear to see that you ask yourself what the KDE devs have done to release software in that quality. Report your bugs, all of them. They are not obvious.

At QtWorldSummit

Last week I attended QtWorldSummit in Berlin to help represent KDE manning the booth and chairing some sessions. I want to thank KDE e.V. for the support to go to this great event. It is awesome to see such a huge and vibrant Qt community and to see KDE as an important part. So many talks and also keynotes have speakers with a KDE background.

KDE's booth at QtWorldSummit
KDE’s booth at QtWorldSummit

At our booth we had a clear focus on the frameworks – understandable given that it’s what is most interesting to Qt developers. But of course also many guests also asked us about other applications and Plasma. What was clearly visible at the conference was that many booths had their demo run on Plasma powered systems and in many cases it was already a Plasma 5.

From the talks I attended I noticed that Wayland is a really big thing in the Qt community. Be it smart TVs or in car entertainment systems or even the driver’s dashboard: they run Wayland and use the QtCompositor API for it. This is quite exciting to see that QtCompositor seems to be a great fit for such use cases (while it isn’t for KWin). What’s even more exciting is to learn that such systems work without having to fall back to Android drivers and solutions like libybris. For the ecosystem this is great to see: Wayland is entering the embedded world and it’s Qt which is used for it. I haven’t seen anyone talking about using X11, Android or Mir for these cases which gives us a very strong and united ecosystem. Very exiting times are in front of us with lots of opportunities also for KDE as if QtCompositor isn’t sufficient we can jump in with either KWayland or even KWin.

September update for Plasma’s Wayland porting

September was a busy months in the KDE Wayland world. We have worked hard to bring Plasma closer to a workable system and could cross off some very important milestones.

Transient window positioning

One of the biggest oddities when trying out Plasma on Wayland in the 5.4 release is the fact that menus open at random positions. The reason for this is that KWin applied it’s placing strategy on it and ignored the hints provided by the window. We have now implemented support for the so-called transient windows in KWayland and use the provided placement hint in KWayland. So now all menus open at the correct position. A useable Wayland session is much closer now.

Plasma/KWin specific extensions

Marco did quite some work for the integration of Plasma. KWin provides some Plasma specific extensions like the sliding popups effect, blur and background contrast effect, etc. We have an abstraction through KWindowSystem so that applications do not have to use low level X11 calls directly. Now we extended this abstraction to also Wayland: if the application uses the API it will work on both X11 and Wayland. Granted it will only work with compositors providing the specific protocols, but that is no difference to X11. Also there the compositor needs to implement the custom protocol.

On the client side the protocol is implemented in KWayland and the integration for KWindowSystem is provided through
a plugin provided by the kwayland-integration repository. On KWin side the protocol is also provided in KWayland allowing a very easy to use API. Of course also KWin needed small adjustments in the effects to announce support for the protocol and read the information provided by the windows. Thanks to nice wrapping in KWayland the code is cleaner and simpler than in the X11 case.

Support for multiple X Servers

A change not directly relevant for KWin went into KWindowSystem and will be released with the upcoming 5.15 KDE Frameworks release. KWindowSystem provides an X11 API which looks like it supports multiple X Servers (e.g. one application talking to multiple servers), but that has never worked as it fetched required atoms only on first connection. We refactored the relevant code to no longer have this limitation.

Current Plasma in a nestedKWin/Wayland session with grabbed input
Current Plasma in a nested KWin/Wayland session with grabbed input

Granted normally applications do not talk to multiple X servers, but there is a mode in KWin which uses just that: a nested kwin_wayland on X11. To explain: it needs to talk to the host X server for rendering two (one server) and it starts it’s own Xwayland server (second server). In case you have ever wondered why the nested kwin_wayland window released with Plasma 5.4 neither has a window icon nor a window title: that’s the reason. We couldn’t use Qt’s abstraction (wrong QPA plugin) and also not KWindowSystem as we needed to make sure the atoms get resolved for the Xwayland server. Now with this restriction removed the window has an icon and a title. Even more I added a “grab input mode” as one might know from virtual machines. While it’s easy to implement I didn’t want to implement it without having a way to tell the user what happened and what the current grab state is.

Preparing KWin for the cloud

The most exiting new feature in my opinion was born last Friday based on frustrations about testing KWin Wayland. Last week Marco and I spent quite some time investigating a few regressions (as it turned out due to adding transient window support). The way to test it was just uncomfortable. We had a test case but it meant starting KWin, waiting till it’s started, start the test application, perform some clicks and interpret whether it worked. Once we fixed that issue I started to look at a crash and the process was similar annoying. What I wanted was a way to automate the test condition, that is an autotest which we could even run in our CI system.

So on Friday I decided to dedicate my development time on a virtual framebuffer backend.This backend (to start use kwin_wayland --xwayland --virtual) doesn’t render to any device, but only “simulates” rendering by using a QImage which then isn’t used at all. Well not completely true: there is an environment variable to force the backend to store each rendered frame into a temporary directory.

Why is such a virtual backend so exiting? Well it means we can run KWin anywhere. We are not bound to any hardware restrictions like screen attached or screen resolution. With other words we can run it on servers – in the cloud. The first such instance runs on our CI servers in the form of an automated integration test. And in future there will be much more such tests.

But that is not only interesting for KWin to have it’s autotests, it’s also interesting for all other projects of the workspace as we have now a virtual Wayland server which is functional identical to the one we use. We also have a better virtual X server now as we have Xwayland instead of Xvfb (e.g. support for XRandR extension).

Once I had it implemented ideas came to me for improving our CI system: we could use it for something like OpenQA (or integrate the existing tool) and start a complete Plasma session and screenshot various points (if that sounds like an exiting project: please contact me, also if you want to do that as a Season of KDE project ).

Or integrate a remote rendering solution (e.g. VNC, rdesktop, spice, html5) and run a complete session through the web. That could be a very interesting feature for designers, translators, supporters and many other non-developers. Get the latest state of the code directly tested. We have things like Kubuntu CI which make it easy to test in a virtual machine, but wouldn’t it be even more awesome to just run the latest build of the CI system in the browser?

KWin Tests

With the help of this virtual backend we are now able to start a “complete” KWin in the autotests. This allows us to very precisely test whether a specif feature works as expected. Unit tests are great, but sometimes one wants to test the complete integration and that’s now possible.

Working auto completion tooltip in Kate on KWin/Wayland
Working auto completion tooltip in Kate on KWin/Wayland

The first problem addressed with this new possibility was a bug noticed while writing this blog post. I used Kate on Wayland and the tooltips got keyboard focus. So now we have an autotest which ensures this case works.

Lock screen security of phones

These days we can read a lot about a lock screen vulnerability in the Android system. Given that I have spent quite some thought on how we can use Plasma’s lock screen on our phone system I take the incident as an opportunity to share some thoughts about the topic. The tldr is “much ado about nothing”.

In Plasma we have an in my opinion rather secure infrastructure for the lock screen. Of course it suffers from the general problems of X11, but once it’s ported to Wayland it will be truly secure (till the first exploit is found). Given that I would like to use our lock screen architecture also on the phone. It’s secure by not letting anyone in even if the lock screen crashes (one of the problems hit in the Android exploit), by ensuring nothing else is rendered and no input is passed to any other application. So awesome! It will be secure!

But on second look we notice that the requirements on phone and desktop are different. On a phone we need to allow a few exceptions:

  • Accept phone calls even if screen is locked
  • Interact with notifications (e.g. alarm clock)
  • Allow emergency phone calls

The last item is also an important part of the puzzle for the Android exploit. These exceptions directly conflict with the requirements for our lock screen on the desktop. To quote:

Blocking input devices, so that an attacker cannot interact with the running session

It allows interacting with the running session (even more with the hardware) and it doesn’t block input devices any more.

I have over the last months spent quite some time thinking about how we can combine these requirements without compromising the security and so far I haven’t come to a sufficient solution. All I see is that if we allow applications (e.g. phone app) to bypass the lock screen, we in truth add a hole into the architecture and if there is a hole you can get through it. There will be ways to bypass the security then. No point in fooling ourselves. A phone app is not designed for the secure requirements of a lock screen.

Now phone calls are not all we need to care about on a lock screen – this could be solved by e.g. integrating the functionality into the greeter app. Users might want to take photos without having to unlock the screen (another piece of the Android exploit). It’s from a security perspective a questionable feature, but I can understand why it got added. Now this feature directly adds a huge hole into it: it writes to the file system. I can easily imagine ways to bypass the lock screen from a camera app, get to the file system, etc.

At this point we need to take a step back and think about what we want to achieve with a lock screen. On the desktop it’s clear: if there is a keyboard somewhere you should not be able to penetrate the session even if you have hours to try. But on a phone? Does this requirement hold? If I have the chance to unattended attack the lock screen of a phone, it means I own it. For desktop hardware we can say that the lock screen doesn’t protect against screw drivers. This also holds for phones. If one has enough time, it’s unlikely that one can keep the attacker out and the lock screen is most likely not the weakest link in the chain. Phones have things like finger print readers (easy to break), various easy to reverse construct passphrase systems, simple passwords, etc. So the lock screen itself is relatively easy to bypass and then we have not even looked at all the things one can do when attaching a usb cable…

Given that the requirements for phone security might be different? Maybe it’s not about blocking input devices, preventing anyone to get in? Maybe the aim is only to hold of people having access to it when unattended for a few moments?

If that’s the aim our lock screen architecture of the desktop might even be over done, adding holes to it would be wrong and we shouldn’t share the code? It also means that the Android vulnerability doesn’t matter. The exploit is a complicated process needing quite some time. The lock screen prevents access for uneducated people and also for those having just a few moments of unattended access. It only breaks in situation where it might not matter: when you already physically own it.

Comparing DPMS on X11 and Wayland

On the Plasma workspaces Display Power Management Signaling (DPMS) is handled by the power management daemon (powerdevil). After a configurable idle time it signals the X-Server through the X11 DPMS Extension. The X-Server handles the timeout, switching to a different DPMS level and restoring to enabled after an input event all by itself.

The X11 extension is a one-way channel. One can tell the X-Server to go to DPMS, but it neither notifies when it has done so nor when it restored. This means powerdevil doesn’t know when the screens are off. Granted, there is a call to query the current state, but we cannot perform active polling to figure out whether the state changed. That would be rather bad from a power management perspective.

So overall it’s kind of a black box. Our power management doesn’t know when the screens turn off, Neither does any other application including KWin. For KWin it would be rather important to know about it, because it’s the compositor and there is no point in running the compositor to render for something which will never be visible. The same is obviously true for any other application: it will happily continue to render updated states although it won’t be visible. For a power management feature this is rather disappointing.

Now let’s switch to Wayland – in this case kwin_wayland. Last week I started to implement support for DPMS in KWin. Given that KWin is responsible for DRM, we need to add support for it in KWin. Of course we want to leverage the existing code in powerdevil, so we don’t add the actual power management functionality, but rather an easy to use interface to tell KWin to enter a DPMS state on a given output.

Of course we tried to not repeat the mistake from X11 and thus the interface is bi-directional: it notifies when a given state is entered. Although in a Wayland world that’s not so important any more: KWin enables DPMS and knows that DPMS is enabled and can make internal use of this. So whenever the outputs are put into standby the compositor loop is stopped and doesn’t try to repaint any more. If the compositor is stopped windows are not being repainted and aren’t signaled that the last frame got painted. So all applications properly supporting frame callbacks stop rendering automatically, too. This means on Wayland we are able to properly put the rendering of all applications into an off state when the connected outputs are in standby mode. A huge improvement for power management.

Thoughts on Vulkan in KWin

Lately I have been asked a lot about using Vulkan in KWin: in fact almost every blog post in the last few months has questions about it and that seems to me there is something to write about it.

So the quick tldr: I don’t have any plans on adding support for Vulkan. Over the last years I ported to so many new technologies, including at least 3 incompatible OpenGL versions. I am not looking forward to another incompatible OpenGL version (whether it’s called Vulkan or OpenGL doesn’t matter to me on that).

Now let’s look at the strength of Vulkan: going closer to the hardware and doing multi-threaded rendering. KWin still performs the rendering in the main GUI thread. Qt tried to do rendering in an off-thread for QtQuick’s scene-graph, in case of KWin we also do that in the main-gui thread. Reworking our compositor to use threading is a lot of work and would also probably improve the performance with OpenGL. Anyway as long as KWin doesn’t support threaded rendering this improvement by Vulkan is rather moot.

Now let’s look on the closer to hardware: yesterday I did a blog post on how I want to use e.g. KMS Planes to get closer to hardware and bypass the OpenGL compositing. Vulkan will allow us to perform rendering closer to hardware, but if we don’t render at all, we don’t benefit from it. Vulkan is probably more power saving, but not using the rendering will save even more.

So Vulkan can only improve when the scene needs to be rendered: e.g. when you wobble a window or spin the desktop cube. I don’t think that those effects are what we should optimize for and spend our time on and even if: there are low-hanging fruits to optimize using OpenGL, we do not yet use OpenGL 3 or 4 features to improve these effects.

Overall Vulkan looks to me like a lot of work as once again we would have to add a new compositing backend, write a new low level interaction, rewrite all shaders, etc. etc. Going to Vulkan early would mean introducing more complexity to KWin, more different code paths our users might use. It sounds like a mood exercise to do so. But I also doubt that there will be useable Vulkan drivers any time soon.

Vulkan is a great technology which will make graphics much better. But I doubt that an application like KWin is the use-case it was developed for.

Now to something else!

KDE is currently running a fundraiser for the Randa Meetings with the topic “Bring Touch to KDE!”. I personally will not participate in the Randa Meetings, but many other KDE developers will go there and we need your support for it. Please help and make the Randa Meetings possible.

Fundraiser

Layered compositing

On X11 the (OpenGL) compositor renders into a single buffer through the overlay window. This is needed to get features like translucency, shadows, wobbled windows or a desktop cube as Xorg itself doesn’t have any support for such features. The disadvantage of this approach is that we basically always have to perform a “copy” of what needs to be rendered. Consider VLC is playing a fullscreen video the compositor needs to take VLC’s video pixmap and render it onto the overlay window. The compositor needs to run, evaluate the scene and then render the one window.

Of course the compositors optimize for this case by only repainting what is needed and repainting from top to bottom in the window stack so that only the one fully opaque window gets rendered. Even more they have features like unredirecting fullscreen windows or (in the case of KWin) blocking compositing while specific windows are active. Unredirecting is a nice idea but it doesn’t really cover the use case. If one has just a small window (a tool tip, menu, on-screen volume change display) on top of the VLC video window, we need to start compositing again. Although we still would like to just forward the one VLC window.

On Wayland the situation looks better. Instead of rendering to an overlay window the compositor can directly interact with the hardware through e.g. DRM. This allows to for example take the buffer provided by VLC and pass it directly to DRM and bypass the compositing all together. But modern hardware allows us to do more than Xorg ever allowed us through “layers”. Be it the raspberry pi, Android’s hwcomposer or Linux’s KMS Planes, the idea is always similar: let the hardware compose the final image together.

If we think of Plasma phone we realize that in most cases we just need three layers: top panel, bottom panel and a maximized window in between. There is no need to do complex composition of the scene at all as all that is needed is putting the three visible (and opaque) windows in the three layers.

On a desktop system it’s a little bit more complicated. Our windows are not all maximized, they have translucent window decorations, shadows, there is a panel with blurred background. Especially the latter one needs a composition through OpenGL. Nevertheless there should be situations where we can make use of layers. Especially in the case of a fullscreen or maximized window this should help making the compositor faster.

Using layers will mean a large change on how KWin’s compositor works. The infrastructure for unredirecting of fullscreen windows cannot be reused in a sensible way as it is evaluated only for fullscreen windows and before the compositor runs. That is before the effects are executed. The decision whether to put a window into a layer needs to come at a later point: once we have evaluated whether we need to compose the complete screen (you might be wobbling a window) or can short pass through layers.

What I have in mind is to give this architecture a try with the VC4 hardware of the raspberry pi which supports layers in a decent way. And to first adjust the QPainter based compositor (as it’s the simpler one) to make use of this architecture. The idea is to experiment with it to get a useful generic KWin-internal backend API, so that we can also use it for OpenGL compositor and even more for our DRM and hwcomposer backend.

For the DRM backend I want to use the new atomic modesetting feature which will be introduced in Linux 4.2. Given that we cannot depend on it anytime soon, this will still need some time till it will be implemented. Overall all of this is not going to happen tomorrow – there are still more important things to work on. But if you, dear reader, are interested on working on this, please contact me. I’m happy to walk you through the code and share my ideas.

As this blog post is mostly about optimizing the compositor by making use of new hardware features which we didn’t have on X11, I can already point out that I will do a follow up blog post on Vulkan.

Now to something else!

KDE is currently running a fundraiser for the Randa Meetings with the topic “Bring Touch to KDE!”. I personally will not participate in the Randa Meetings, but many other KDE developers will go there and we need your support for it. Please help and make the Randa Meetings possible.

Fundraiser

A Qt Platform Abstraction plugin for KWin

In Qt we have the Platform Abstraction (QPA) which allows to better interact with the used windowing system through a plugin. In case of KWin we use the “xcb” plugin on X11 and on Wayland we used to use the “wayland” plugin provided by QtWayland. For quite some time I had been thinking about migrating away from those and use an own KWin-specific plugin at least for Wayland.

What’s wrong with QtWayland?

QtWayland provides a plugin for Wayland clients, but KWin is not a Wayland client: it is the Wayland server. This created some interesting “problems” which we had to workaround in KWin. KWin just doesn’t fit the usecase of QtWayland QPA plugin and it would not be a good idea to change QtWayland in a way that it supports KWin as a first-class citizen – it’s just a too special use case. So let’s look at some of the issues which motivated me to write an own QPA plugin.

Connecting to the Wayland server

The wayland plugin is meant for Wayland clients and tries to connect to the Wayland server during the creation of the QGuiApplication. As KWin is the Wayland server the plugin wants to connect to, we needed to make sure that the Wayland server and the event loop are up and running before constructing the QGuiApplication. This added a restriction that we have to monitor the server socket from the main thread.

Blocking roundtrips

The wayland plugin performs a blocking roundtrip to the Wayland server during startup to wait for all outputs to be announced. But this happens from the main gui thread, in which also the server lives. You can guess what happens: dead-lock. We fixed this one in QtWayland, but the problem was still there: any blocking call from the main gui thread to the Wayland server would freeze KWin. We experienced this with e.g. eglInitialize, startup of dbus services (e.g. kded, kamd), rendering of Qt internal windows in case the compositor didn’t send the frame rendered yet. Overall I found this aspect way to fragile and consider especially the fact that the server is in the main thread as a problem.

BypassWindowManagerHint

KWin uses the Qt::BypassWindowManagerHint on all it’s windows because this is needed on X11. The hint is X11 specific and doesn’t make much sense on other platforms, so the Wayland plugin doesn’t show such windows at all. That is a valid approach – there is nothing I could say against it. So in KWin we needed to remove the hint for all windows on Wayland.

OpenGL Context for QtQuick

QtQuick requires an OpenGL context and if it cannot create one it will abort. That is normally not a problem but can easily become one in the case of KWin: KWin creates a “low-level” OpenGL context using the specific platform (e.g. DRM/GBM, hwcomposer, whatever). Qt(Wayland) doesn’t know which one KWin uses and might not even have code for it. QtWayland tries to create an OpenGL context using the EGL platform Wayland. In the case of Mesa everything works fine, but we noticed that on libhybris the creation of a Wayland context fails if there is already one created for hwcomposer. One could consider this as a libhybris specific issue, but I wouldn’t say so – interacting with multiple EGL platforms from one process is kind of out of the specification.

Window decorations

This is a small issue: QtWayland creates window decorations around all Qt windows. But in the context of KWin we do not want any decorations. So we had to disable them by setting the QT_WAYLAND_DISABLE_WINDOWDECORATION environment variable.

And here is the plugin

So after Akademy I decided to work on it and see whether we can get a plugin working. Now it’s in a state where it works good enough to blog about it. The plugin does not support everything that might be used by Qt, but rather reduces to the feature set used by KWin. For example we don’t need to forward any input events, because KWin (mostly) handles them internally anyway and sends them directly to the respective internal Qt window.

The plugin brings the advantage that it no longer depends on the Wayland server being started before creating the QApplication. In fact it does not even try to connect to it, but can use KWin’s internal connection. Similar we can use this to shortcut e.g. creation of low level windows, etc.

On the OpenGL side the plugin approach shows it’s true strength. As it can access KWin internals we can use the EGLDisplay used by the compositing scene and create a sharing OpenGL context. So we can bypass the windowing system completely to render QtQuick windows and this will also allow us to make our window textures available to QtQuick. So a nice improvement.

In case our compositor doesn’t use OpenGL the plugin creates an OpenGL context for the platform Wayland. This is done in a way that it doesn’t hit the blocking issue I mentioned above.

So is the plugin something you should use in other applications?

The plugin interacts with KWin internals and thus is not in any way useable for non-KWin applications. In fact it even checks whether the running binary is KWin and does nothing if that’s not the case. It will also never be turned into a generic plugin as there is no need for that (just use QtWayland).

When will this plugin get merged?

The plugin can be found in branch “qpa” on my personal kwin clone on git.kde.org. It’s not yet in a state that I could push it to master. I still need to test more and make sure that everything works as intended.

Now to something else!

KDE is currently running a fundraiser for the Randa Meetings with the topic “Bring Touch to KDE!”. I personally will not participate in the Randa Meetings, but many other KDE developers will go there and we need your support for it. Please help and make the Randa Meetings possible.

Fundraiser