KWin on speed

With the 5.2 release basically done, I decided to do some performance investigation and optimization on KWin last week. From time to time I run KWin through valgrind’s callgrind tool to see whether we have any expensive code paths. So far I hadn’t done that for the 5.x series. Now, after the switch to kdecoration2, I was really interested in the results, as in the past rendering the decoration used to be a bottleneck in our compositing rendering loop.

Unfortunately callgrind doesn’t give us a complete picture of KWin’s performance, as it includes neither GPU times nor roundtrips to the X server. Nevertheless it gives us a good look at our own CPU usage. I was rather surprised by the result, as I didn’t find anything which looked bad. Still, I was able to slightly optimize one method which is called whenever the X11 stacking order changes, by improving an internal algorithm which didn’t scale well with the larger than expected number of child windows of the root window.

But callgrind output wasn’t the only performance-relevant thing I looked into. I investigated a really interesting bug report about the screen freezing for a short time when a new window opened. While I wasn’t able to reproduce the issue as described, I was able to reproduce a small freeze whenever a Qt 5 application opened. Interestingly, this happened only with Qt 5 applications: I ran the same application in a Qt 4 and a Qt 5 variant, and only the latter gave me a freeze. Investigation showed pretty quickly that KWin is not to blame. For one, I got the freeze before KWin started to manage a window for the application, and for another, I was able to reproduce it with different window managers. With the help of xtrace I finally found the culprit and we found the appropriate bug report on the Qt side. Our KDE domain experts have also started to look into the issue on the Qt side.

But others were still able to get a small freeze whenever KWin started to manage a window. And indeed, further investigation showed that the method which handles managing a new window can take some time and can cause the compositor to drop frames. Ideally this would be solved by moving the compositor into a dedicated rendering thread, but that’s quite a lot of work and might not help in this case, as KWin’s main thread grabs the X server while managing a window. So the better solution was to investigate why the method takes so long. To not drop any frames the method must not take longer than 16 msec, and the shorter the better.
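To put the 16 msec figure into numbers, here is a small sketch of the arithmetic. It assumes a 60 Hz refresh rate (1000 ms / 60 is roughly 16.7 ms per frame), which is an assumption on my side and not something KWin hard-codes; the function names `frameBudgetMs` and `missedFrameIntervals` are made up for this illustration.

```cpp
#include <cmath>

// Time available to produce one frame, in milliseconds.
// Illustrative helper, not part of KWin.
double frameBudgetMs(double refreshHz)
{
    return 1000.0 / refreshHz;
}

// Rough number of complete frame intervals that pass while the main thread
// is blocked for stallMs. Computed as stallMs * refreshHz / 1000 so that
// round values stay exact in floating point.
int missedFrameIntervals(double stallMs, double refreshHz)
{
    return static_cast<int>(std::floor(stallMs * refreshHz / 1000.0));
}
```

Under that assumption, a stall of 50 msec spans about three frame intervals and a stall of 100 msec about six, while anything well below the budget spans none.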

While managing a window KWin needs to read quite a lot of properties. Most of them are nicely read in a non-blocking way through the KWindowSystem framework, but some properties are KWin internal and were read in a blocking way. Most expensive was reading the icons, which triggered several round trips, especially if the window did not specify the icons in a NETWM compliant way. This could easily cause a delay of 50 to 100 msec while managing a window. Overall the method could trigger up to 14 round trips to the X server which were not needed at all in the case of KWin. Our KWindowSystem framework got an adjustment to prevent the roundtrips if the user of the framework has already fetched all required information. The result is that reading the icons now takes significantly less than one msec. For other methods which cause roundtrips I introduced two new methods: one to perform the request, one to later read the result. This allowed removing another set of roundtrips. My measurements showed that each roundtrip takes about half a millisecond on my system. Half an msec here, half an msec there easily adds up. Unfortunately there are still some XLib calls (one to read Motif hints and one to read WM_SIZE_HINTS) which ideally would get ported, and as long as they are not ported they delay the managing of a new Client.
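The "one method to perform the request, one to later read the result" idea can be illustrated with a toy model. This is only a sketch and not KWin code: `FakeConnection`, `roundTripsBlocking` and `roundTripsSplit` are hypothetical names, and the model simply assumes that waiting for a reply costs one round trip and delivers the replies for everything sent so far. In XCB the same shape appears as request functions that return a cookie and matching `_reply` functions that block until the answer is there.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an X connection (not a real XCB or KWin type).
class FakeConnection
{
public:
    using Cookie = std::size_t;

    // Queue a request without waiting for the answer.
    Cookie send(const std::string &property)
    {
        m_requests.push_back(property);
        return m_requests.size() - 1;
    }

    // Read a reply. If it has not arrived yet, wait: one simulated round
    // trip, after which all requests sent so far are answered.
    std::string reply(Cookie cookie)
    {
        if (cookie >= m_answered) {
            ++m_roundTrips;
            m_answered = m_requests.size();
        }
        return "value of " + m_requests[cookie];
    }

    int roundTrips() const { return m_roundTrips; }

private:
    std::vector<std::string> m_requests;
    std::size_t m_answered = 0;
    int m_roundTrips = 0;
};

// Blocking style: every property is requested and read immediately.
int roundTripsBlocking(const std::vector<std::string> &properties)
{
    FakeConnection connection;
    for (const auto &property : properties) {
        connection.reply(connection.send(property));
    }
    return connection.roundTrips();
}

// Split style: send all requests first, read the results afterwards.
int roundTripsSplit(const std::vector<std::string> &properties)
{
    FakeConnection connection;
    std::vector<FakeConnection::Cookie> cookies;
    for (const auto &property : properties) {
        cookies.push_back(connection.send(property));
    }
    for (const auto cookie : cookies) {
        connection.reply(cookie);
    }
    return connection.roundTrips();
}
```

In this model, reading four properties the blocking way waits four times, while the split variant waits only once. With about half a millisecond per roundtrip, that difference is what adds up while a window is being managed.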

Nevertheless this shows quite some nice improvements for our development version which will become 5.3 in a few months. Of course all of that would not have been possible without the switch from XLib to XCB.

12 Replies to “KWin on speed”

  1. Have you tried using vtune in addition to callgrind? I tend to use both, because each has its own strong points. Callgrind’s information is easier to digest, but vtune doesn’t cause as much of a slowdown, making it easier to trace normal workloads.

  2. In my company, next to sample-based profilers we also use nvidia nsight (on linux, there is something equivalent), which you can use to create timeline records including the GPU activity. In the case of opencl, this works even on non-nvidia platforms. Unfortunately I haven’t found anything equivalent that is open-source, at least when GPUs are involved, which is a bummer. But I wouldn’t want to live without it – it’s how I really noticed how much of an issue opengl API overhead really is and how useful the new 4.5 APIs are, for example.

    When employing linux-perf you can also use the flamegraph script to create timeline records, which I found to be really helpful in complementing the output of sample-based profilers.

  3. Hey Martin,

    can you explain why you weren’t using “const auto &it” in the optimization you’ve linked at the beginning of your article? Hopefully not too stupid to ask, but I was just curious. Thank you

    1. It’s not a foreach loop, but an iterator-based one. So there is no need to make it a reference, and making it const would mean that the iterator cannot be incremented. It would result in the following build error:
      kwin/layers.cpp: In member function ‘KWin::ToplevelList KWin::Workspace::xStackingOrder() const’:
      kwin/layers.cpp:678:84: error: passing ‘const QList::const_iterator’ as ‘this’ argument of ‘QList::const_iterator& QList::const_iterator::operator++() [with T = KWin::Unmanaged*]’ discards qualifiers [-fpermissive]
      for (const auto it = unmanaged.constBegin(); it != unmanaged.constEnd(); ++it) {
      ^
      CMakeFiles/kwin.dir/build.make:432: recipe for target 'CMakeFiles/kwin.dir/layers.cpp.o' failed
      make[2]: *** [CMakeFiles/kwin.dir/layers.cpp.o] Error 1

  4. Hi Martin, did you take a look at the shadow bug in plasma? Whenever you dismiss a window (let’s say the start menu), the menu disappears, but the shadow that surrounds it stays there until that area of the screen is refreshed. This is mostly visible with intel drivers but appears to be a bug in plasma, since I don’t see this with other decorators. Do you see this behaviour yourself?

  5. Funny thing, I just switched to 5.2 (ye, I know I’m a pretty early adopter in the field of early adopters), and two more bugs are gone which annoyed me in 5.1.2.

    But the nicest thing is that the strange “flickering” on maximizing windows seems to be gone for me which I know is cosmetic, but it annoyed me :D.

    Is there a reason that the lockscreen is not integrated with powermanagement or activities yet? It annoys me a bit if the screen does not shut down while watching movies, but the screensaver gets online all the time – or is that up for somebody (e.g. me) to implement, or are there reasons this shouldn’t be done?

    1. These are different inhibition locks, though we should investigate whether we should inhibit screenlocker based on the powermanagement inhibitions.

  6. I want to ask something a bit off-topic. I am on Gentoo. If I start kwin_x11 through .xinitrc I don’t get any themes/colors applied and a lot of icons are missing. Do I have to start anything else? I just want to avoid starting plasmashell, or is that needed for all the color/icon stuff?

    greets Paul

    1. You are missing the frameworksintegration QPT plugin. This is only loaded in a full Plasma session controlled by environment variables set in startkde.
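
The build error quoted in the reply to comment 3 can be reproduced outside of KWin. The following sketch uses `std::vector` instead of `QList` so that it is self-contained (an assumption made purely for illustration; the function names are made up). It shows that in an iterator-based loop the iterator object itself has to stay mutable, while `const auto &` belongs in a range-based loop, where it binds to the elements.

```cpp
#include <vector>

// Iterator-based loop: cbegin()/cend() already give read-only access to the
// elements. Declaring the iterator as `const auto it` would make `++it`
// fail to compile, because operator++ modifies the iterator.
int sumWithIterators(const std::vector<int> &values)
{
    int sum = 0;
    for (auto it = values.cbegin(); it != values.cend(); ++it) {
        sum += *it;
    }
    return sum;
}

// Range-based loop: here `const auto &` refers to each element, not to an
// iterator, so nothing needs to be incremented by hand.
int sumWithRangeFor(const std::vector<int> &values)
{
    int sum = 0;
    for (const auto &value : values) {
        sum += value;
    }
    return sum;
}
```

Both functions visit the same elements in the same order; only the place where `const` may appear differs.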
