Some of the feedback we got, is that we should blog more about how we improve the quality, what kind of bugs we fixed. So today I want to do that with an in-depth explanation of four crash reports I looked at and fixed this week. All of them will go be part of the upcoming 5.4.3 release. All of them were related to QtQuick with three of them being caused by a problem in QtQuick and one caused by a workaround for a QtQuick problem. As I explained in my Monday blog post we are getting hit by issues in the libraries we use, in this case QtQuick.
Closing glxgears crashes KWin
The first issue was communicated to me through IRC. After a small investigation together with the user we figured out the condition to crash it and how to reproduce it:
1. Use an aurorae window decoration theme (e.g. Plastik)
2. Open glxgears
3. close glxgears through the close button
This got reported as Bug 346857 and was nothing new to us: we have had similar problems before, which makes it a little bit sad. The problem here is that glxgears doesn’t speak the close window protocol. Normally when you click the close button the window isn’t closed directly, but the window gets notified “please close your window”. So the mechanism is asynchronous. This also explains why such an issue has not been detected during the testing: it’s not happening with default decoration and it’s not happening with “normal” applications. It can only be reproduced with applications which do not behave correctly.
So what’s the difference in this case? KWin handles the destruction in a synchronous way which causes the decoration to be destroyed in direct result of the mouse click. When the decoration is destroyed the QtQuick scene driving the decoration is also destroyed and it looks like Qt doesn’t like that:
<code>#0 0x00007ffff50fd107 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 #1 0x00007ffff50fe4e8 in __GI_abort () at abort.c:89 #2 0x00007ffff5de8291 in qt_message_fatal (context=..., message=...) at global/qlogging.cpp:1578 #3 0x00007ffff5de495c in QMessageLogger::fatal (this=0x7fffffff8ca0, msg=0x7ffff6104270 "ASSERT: \"%s\" in file %s, line %d") at global/qlogging.cpp:781 #4 0x00007ffff5dddb90 in qt_assert (assertion=0x7ffff67df129 "context() && engine()", file=0x7ffff67df0e4 "qml/qqmlboundsignal.cpp", line=183) at global/qglobal.cpp:2966 #5 0x00007ffff6667fe7 in QQmlBoundSignalExpression::function (this=0xf56080) at qml/qqmlboundsignal.cpp:183 #6 0x00007ffff6667d4c in QQmlBoundSignalExpression::sourceLocation (this=0xf56080) at qml/qqmlboundsignal.cpp:155 #7 0x00007ffff663f1fb in QQmlData::destroyed (this=0x873d90, object=0xf54eb0) at qml/qqmlengine.cpp:1709 #8 0x00007ffff663cf6b in QQmlData::destroyed (d=0x873d90, o=0xf54eb0) at qml/qqmlengine.cpp:674 #9 0x00007ffff605c6e9 in QObject::~QObject (this=0xf54eb0, __in_chrg=<optimized out>) at kernel/qobject.cpp:912 #10 0x00007ffff7265df9 in QQuickItem::~QQuickItem (this=0xf54eb0, __in_chrg=<optimized out>) at items/qquickitem.cpp:2224 #11 0x00007ffff7324866 in QQuickMouseArea::~QQuickMouseArea (this=0xf54eb0, __in_chrg=<optimized out>) at items/qquickmousearea.cpp:439 #12 0x00007ffff72d69c5 in QQmlPrivate::QQmlElement<QQuickMouseArea>::~QQmlElement (this=0xf54eb0, __in_chrg=<optimized out>) at ../../include/QtQml/../../src/qml/qml/qqmlprivate.h:98 #13 0x00007ffff72d69fa in QQmlPrivate::QQmlElement<QQuickMouseArea>::~QQmlElement (this=0xf54eb0, __in_chrg=<optimized out>) at ../../include/QtQml/../../src/qml/qml/qqmlprivate.h:98 #14 0x00007ffff605e516 in QObjectPrivate::deleteChildren (this=0xf54c90) at kernel/qobject.cpp:1946 #15 0x00007ffff605cb80 in QObject::~QObject (this=0xf56050, __in_chrg=<optimized out>) at kernel/qobject.cpp:1024 #16 0x00007ffff7265df9 in QQuickItem::~QQuickItem (this=0xf56050, __in_chrg=<optimized out>) at items/qquickitem.cpp:2224 #17 0x00007ffff72c3e22 in QQuickRectangle::~QQuickRectangle (this=0xf56050, __in_chrg=<optimized out>) at items/qquickrectangle_p.h:128 #18 0x00007ffff72d6357 in QQmlPrivate::QQmlElement<QQuickRectangle>::~QQmlElement (this=0xf56050, __in_chrg=<optimized out>) at ../../include/QtQml/../../src/qml/qml/qqmlprivate.h:98 #19 0x00007ffff72d638c in QQmlPrivate::QQmlElement<QQuickRectangle>::~QQmlElement (this=0xf56050, __in_chrg=<optimized out>) at ../../include/QtQml/../../src/qml/qml/qqmlprivate.h:98 #20 0x00007ffff736a976 in QQuickView::~QQuickView (this=0x6b1980, __in_chrg=<optimized out>) at items/qquickview.cpp:225 #21 0x00007ffff736a9d2 in QQuickView::~QQuickView (this=0x6b1980, __in_chrg=<optimized out>) at items/qquickview.cpp:227 </code>
Now in order to never have that happen again I created a test application and reported a bugreport against Qt. And of course we worked around the problem by delaying the handling of the close to the next event cycle. Now you might wonder how it’s possible that we allowed such a regression to sneak in again given that we had seen it before? Well that’s easily explained, up until recently we were not able to run full tests against KWin. We might have been able to unit test this area, but it would have passed, this issue needed an integration test. And that’s what I added now, so that we won’t hit it ever again. It’s probably the weirdest auto test I have ever written, involving starting glxgears (thanks to our sysadmins for installing it for this test case!), simulating the mouse click on the close button and ensuring glxgears closes without crashing KWin.
Crash when opening window decorations configuration module
The second crash I run into directly after investigating the first one. I wanted to switch back to Breeze decoration but couldn’t because the config module crashed directly. This was bug 344278 – a really nasty one with 74 duplicates. We had wonderful crash traces, but missed the important part about how to reproduce it. From the crash trace alone we were not able to figure out what’s going on. Well we saw some aspects but it wasn’t enough to properly investigate.
Now alas I had a 100 % sure way to reproduce the problem and also understood what’s going on. So here the actual steps to reproduce:
1. Open Window decoration configuration menu
2. Download many themes, many of them, the more the better
3. Select a theme which name’s first latter is far away from “B”, e.g. Plastik.
4. Apply and quit
5. Open window decoration configuration menu again
And boom! What happens is that we load all decorations and put them into a ListView, then we select the theme the user is currently using and ensure it’s centered in the list view. This results in the list view scrolling. ListView has a cache of elements and if you have too many it will throw out other elements. So our Breeze deco which gets created at the start (early in alphabet, will be shown) is kicked out again when selecting Plastik if there are too many decorations. This is the requirement to have many themes installed and also to select one far down in the alphabet. With anything else you won’t trigger it.
The bug itself got triggered through Breeze decoration because there is a Property animation running which triggers another update on the decoration after it has been kicked out. It accesses a member which got destroyed by the QtQuick engine.
All of that also means that the crash would not have been triggered if that code would have been e.g. in Oxygen and not in Breeze. Because then it would not have hit. Also just scrolling in the list after it loaded even if you have many installed, won’t trigger the crash, because then the animation is no longer running. It’s only running after loading in the preview. What I want to show here is that this is an extremely hidden corner case to find. You must have exactly the same conditions to trigger it. Given that I’m also surprised that we have so many duplicates for the report.
So how to fix it? Obvious idea: one could disable the anyway not visible animation in Breeze. But that’s not a fix, that’s just a workaround and doesn’t solve the actual problem. Other decorations might trigger that again. So we need to understand what’s going on. What we knew is that the Decoration triggers an update and crashes because the DecorationBridge is no longer valid. But that’s impossible! The contract of the KDecoration API is that the DecorationBridge will always be valid if there’s a Decoration. So how could the contract break?
For this we need to look into how the QtQuick code for rendering the previews works. Each Decoration is wrapped by a PreviewItem and each decoration gets it’s own DecorationBridge provided by a PreviewBridge. The PreviewBridge is directly constructed through QtQuick, the Decoration is loaded later on. What is now interesting is the tear down. When the PreviewItem gets destroyed by QtQuick it doesn’t delete the Decoration directly, but delays it to the next event cycle. So the Decoration outlives the QtQuick items. When the PreviewItem gets destroyed by QtQuick also the PreviewBridge gets destroyed by QtQuick and thus we have a Decoration which is still alive, but the PreviewBridge isn’t. Why was it done that way? Well to prevent crashers caused by QtQuick. So our workaround for a crash just introduced a new one. Meh.
The solution now is to also delay the deconstruction of the PreviewBridge to the next event cycle. I hope this does not again create a new crash.
Crash when exiting Screens configuration module
The third issue I looked into was also reported to me in response to my Monday’s blog post. This one was supposed to be fixed in Qt 5.5, but wasn’t. It has a clear way to reproduce it:
1. Open Systemsettings
2. Go to “Display Configuration”
3. Click “All Settings” to go back to overview
According to our users that should crash it. Just it doesn’t. This raised my interest – also that it has 73 duplications which makes it rather important. I know the user who reported this to me and know that I can fully believe what he tells me. So this crash raised my interest. It also has a very interesting crash trace:
<code>#0 0x00007ffff2e48d59 in QQuickItemPrivate::addToDirtyList (this=0xdbdcc0) at /mnt/AUR/qt5-declarative-git/src/qt5-declarative/src/quick/items/qquickitem.cpp:5610 #1 0x00007ffff2e48e43 in QQuickItemPrivate::dirty (this=0xdbdcc0, type=<optimized out>) at /mnt/AUR/qt5-declarative-git/src/qt5-declarative/src/quick/items/qquickitem.cpp:5594 #2 0x00007ffff2e496cd in QQuickItem::update (this=0xdbdc40) at /mnt/AUR/qt5-declarative-git/src/qt5-declarative/src/quick/items/qquickitem.cpp:4088 #3 0x00007ffff2e56c0d in QQuickItem::qt_static_metacall (_o=<optimized out>, _c=<optimized out>, _id=<optimized out>, _a=<optimized out>) at .moc/moc_qquickitem.cpp:597 #4 0x00007ffff45f2ae1 in QObject::event (this=this@entry=0xdbdc40, e=e@entry=0x7fffc41f80b0) at kernel/qobject.cpp:1239 #5 0x00007ffff2e53a63 in QQuickItem::event (this=0xdbdc40, ev=0x7fffc41f80b0) at /mnt/AUR/qt5-declarative-git/src/qt5-declarative/src/quick/items/qquickitem.cpp:7294 #6 0x00007ffff6087d94 in QApplicationPrivate::notify_helper (this=this@entry=0x681dd0, receiver=receiver@entry=0xdbdc40, e=e@entry=0x7fffc41f80b0) at kernel/qapplication.cpp:3717 #7 0x00007ffff608d2c8 in QApplication::notify (this=0x7fffffffe4a0, receiver=0xdbdc40, e=0x7fffc41f80b0) at kernel/qapplication.cpp:3500 #8 0x00007ffff45c49dc in QCoreApplication::notifyInternal (this=0x7fffffffe4a0, receiver=0xdbdc40, event=event@entry=0x7fffc41f80b0) at kernel/qcoreapplication.cpp:965 #9 0x00007ffff45c7dea in sendEvent (event=0x7fffc41f80b0, receiver=<optimized out>) at ../../include/QtCore/../../src/corelib/kernel/qcoreapplication.h:224 #10 QCoreApplicationPrivate::sendPostedEvents (receiver=receiver@entry=0x0, event_type=event_type@entry=0, data=0x681430) at kernel/qcoreapplication.cpp:1593 #11 0x00007ffff45c8230 in QCoreApplication::sendPostedEvents (receiver=receiver@entry=0x0, event_type=event_type@entry=0) at kernel/qcoreapplication.cpp:1451 #12 0x00007ffff4617f63 in postEventSourceDispatch (s=0x6d7aa0) at kernel/qeventdispatcher_glib.cpp:271 #13 0x00007fffefcce9fd in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0 #14 0x00007fffefccece0 in ?? () from /usr/lib/libglib-2.0.so.0 #15 0x00007fffefcced8c in g_main_context_iteration () from /usr/lib/libglib-2.0.so.0 #16 0x00007ffff4617fd7 in QEventDispatcherGlib::processEvents (this=0x6d5850, flags=...) at kernel/qeventdispatcher_glib.cpp:418 #17 0x00007ffff45c339a in QEventLoop::exec (this=this@entry=0x7fffffffe380, flags=..., flags@entry=...) at kernel/qeventloop.cpp:204 #18 0x00007ffff45cb23c in QCoreApplication::exec () at kernel/qcoreapplication.cpp:1229 #19 0x00007ffff592cbf4 in QGuiApplication::exec () at kernel/qguiapplication.cpp:1528 #20 0x00007ffff6084bb5 in QApplication::exec () at kernel/qapplication.cpp:2977 #21 0x000000000040f52b in main (argc=1, argv=<optimized out>) at /mnt/AUR/systemsettings-git/src/systemsettings/app/main.cpp:55 </code>
The interesting part here is that it’s completely inside Qt. We come from the event loop and an event is handled inside Qt and crashes. No code of the control module is executed in this trace any more.
But why am I not able to reproduce? After all there are so many users hitting it. I have the same Qt version, so it should crash. That I figured it out was pure chance. I remembered that the user has an NVIDIA system and I verified that this is still the case. I have an Intel system. So why is that important? For NVIDIA QtQuick uses threaded rendering, while for Mesa it uses the main gui thread for rendering. I had seen in the past that with threaded rendering destruction might be moved to the next event cycle. So easy thing to test:
QSG_RENDER_LOOP=threaded systemsettings5 and follow the reproduction steps and boom! Yay, I have a test case. Following the old saying: “consider a bug fixed when a developer is able to reproduce” the hardest way was solved.
So I started investigating. Let’s try with kcmshell5 instead of systemsettings. Hmm doesn’t crash. Let’s try with kscreen’s test application instead of systemsettings. Hmm, doesn’t crash. So something in systemsettings must trigger it! And I started reading code and read and read, tried here something, tried there something and come to the conclusion: systemsettings is not doing anything wrong.
Given that it must be the QtQuick code of the screen configuration. After some trials I had a minimal derivation to the QtQuick code which didn’t crash any more. Here again helped previous experience with QtQuick related crashers: I saw some usage of QtGraphicalEffects and remembered that this one used to crash KWin with the Breeze Aurorae Theme prior to Plasma 5.2. Removing the OpacityMask didn’t crash. Yay! So how to get that into a fix? The solution was actually in the debug output of QtQuick:
QSGDefaultLayer::bind: ShaderEffectSource: ‘recursive’ must be set to true when rendering recursively.
So obviously I set this missing recursive on the OpacityMask. Hmm, compile error, that doesn’t exist. So let’s make it non recursive and there we go, it doesn’t crash any more.
I would love to report this to Qt, but so far I failed with creating a simple test case. Just like with my testing with kcmshell5 and the kscreen test application I’m not able to hit the condition and I haven’t figured out yet what is different in systemsettings.
Opening effects configuration twice crashes
The last issue to look at was triggered through the previous issue. During review it was pointed out that there are more QtQuick configuration modules which crash in similar ways. So I tried them all and hit a crash if:
1. Open Systemsettings
2. Go to Desktop Behavior
3. Click Desktop Effects
4. Click All Settings
5. Repeat steps 2 and 3
Again a very interesting back trace:
Like in the previous example it’s a crash deep down in Qt. The first code related to code we distribute is at stack position 55. So again I needed to experiment to figure out the minimal code which triggers the crash and after some trials I figured out what causes it: setting root context properties. After reworking this to no longer needing a context property, it doesn’t crash any more.
Again this should be reported to Qt, bug so far I failed to create a simple example demonstrating the problem Of course setting a root context property works in my examples. If one of my readers have an idea what’s so special about systemsettings to trigger it, please let me know so that I can report bugs against Qt.
As we can see with these four cases: we are hit by issues further down in the stack. We can work around them, but this comes with a cost as it doesn’t fix the actual problem. Some of the problems are real corner cases, hardware dependent and cannot be reproduced by all developers.
7 Replies to “Looking at some crashers fixed this week”
“Again this should be reported to Qt, bug so far I failed to create a simple example demonstrating the problem”
Shouldn’t it be reported to/against Qt anyway, even without a minimal test case? Just so that it’s there, in the bug tracker, with a stack trace that other people could find if they reproduce the same issue?
Maybe someone else might be able to add more information, once the bug is there. If you document the roads you went down that didn’t go anywhere, that should help others narrow down their efforts to reproduce more reliably.
At the moment I still hope that we find the root cause to make it reproducable. Only if I fail with that it makes sense to put into bug tracker. I know how hard it can be without a way to reproduce, so I want to do a good job.
Even without the recent quality concerns, this is a great idea for blog posts anyway! Showing us how you solve real bugs, it’s a good intro to how you work with a substantial codebase like kwin, and you may even encourage some readers to start fixing bugs too!
Thanks for the post!
As a developer I am interested in what you mean by “So again I needed to experiment to figure out the minimal code which triggers the crash”.
You make a unit test that reproduces the issue with the same code and then reduce it to pinpoint the issue? Did I get this right?
Ah no, unit test would be nice. But as explained that’s tricky as I failed to get that. I removed code from QtQuick. Tried, if it still crashed removed more code, if it didn’t crash add code again. And so on, till one understands where it’s coming from.
Martin, thanks a lot for your work to improve the quality of KDE software! It’s great to hear about bugs being fixed (or worked-around: either way, the user gets a better experience).
Thank you very much for this thorough analysis! And my compliments to your attitude in general: I do not know anybody else who can turn a discussion that deteriorated to the point of “KDE/Plasma sucks; rant, rant, whine, whine” (see, even I get annoyed and I am not even a developer!) and actually turn it something positive like a bug-fixing orgy!
One question as an absolute non-expert: it seems strange to me that misbehaving applications can take the window manager with them … I would assume they live in separate processes and only communicate via some protocols. Was this always the case? Is this a “feature” of Qt5 or OpenGL?
Comments are closed.