Recently I did some refactoring around the Compositor, and there was one change where Thomas was afraid that it would cause a performance regression. So I used Valgrind's callgrind tool to verify whether this was true. And yes, the code had a slight performance drop, though luckily it is not in the hot code path, and even if it were, the overhead would be rather small.
Having the callgrind log at hand, I looked a little closer at it, which I had not done since the last round of optimization for, I think, 4.8. Since that is quite some time ago and I had basically forgotten what it looked like back then, I was shocked by a few results. I knew that in the last optimization I had adjusted all effects to be inactive by default, except for the translucency (and blur) effect.
Looking at the results, I saw that the translucency effect is rather expensive, and that with the default settings it does nothing unless you are moving windows. Having an expensive effect that does nothing is of course rather unpleasant. So I looked at the implementation and found a better way to track when the effect should be active. Unless you have configured the effect to make decorations or inactive windows translucent, the effect is now inactive by default. Only when you start moving a window does the effect become active, and even then it performs better.
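The idea of "only be active when there is something to do" can be illustrated with a minimal sketch. The names below (TranslucencyConfig, TranslucencySketch, isActive(), windowMoveStarted(), windowMoveFinished()) are hypothetical and not KWin's actual effect API; the assumption is that the compositor asks each effect whether it is active and skips the per-window hooks of inactive effects entirely.

```cpp
#include <cassert>

// Hypothetical configuration: which translucency rules the user enabled.
// An opacity of 1.0 means "fully opaque", i.e. the rule has no effect.
struct TranslucencyConfig {
    double decorationOpacity = 1.0;
    double inactiveOpacity = 1.0;
};

// Hypothetical effect that tracks whether it has any work to do,
// so that the compositor could skip it in the common case.
class TranslucencySketch {
public:
    explicit TranslucencySketch(const TranslucencyConfig &cfg) : m_cfg(cfg) {}

    // Assumed to be queried by the compositor before any per-window hook runs.
    bool isActive() const {
        return alwaysNeeded() || m_movingWindows > 0;
    }

    // Assumed to be called when the user starts/stops moving a window.
    void windowMoveStarted() { ++m_movingWindows; }
    void windowMoveFinished() {
        if (m_movingWindows > 0) {
            --m_movingWindows;
        }
    }

private:
    // True if a configured rule affects windows even when nothing moves.
    bool alwaysNeeded() const {
        return m_cfg.decorationOpacity < 1.0 || m_cfg.inactiveOpacity < 1.0;
    }

    TranslucencyConfig m_cfg;
    int m_movingWindows = 0;
};
```

With the default configuration the effect reports itself inactive until a move starts, while a user who enabled translucency for inactive windows keeps it active permanently.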
But there was more to it. I noticed that there is supposed to be an animation when a window starts to move, but personally I have never seen it. Looking closer at the code, I noticed that it could never have worked. I decided that an animation added in 4.1 which has never worked can be dropped, which again improves performance a little. We might add a better translucency effect in 4.10 that brings the animation back, but for 4.9.x removing the animation causes no user-visible change.
Still, I could not fully understand why the effect is so expensive; all it does is check the type of the window multiple times: is it a desktop? is it a dialog? and so on. That should not be expensive, but it is. I tracked down the expensive call in KCacheGrind and found that the call to windowType() is the culprit.
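Why a cheap-looking check can still dominate a profile is easier to see with a small counting sketch. The class and function names are hypothetical, not KWin's API; the assumption is that convenience predicates like isDesktop() or isDialog() each call windowType() again, and that the effect hook runs for every window in every frame, so the cost of that single method gets multiplied.

```cpp
#include <cassert>
#include <vector>

enum class WindowType { Normal, Dialog, Menu, Desktop };

// Hypothetical window whose windowType() stands in for the expensive
// method; here it only counts how often it is called.
class CountingWindow {
public:
    explicit CountingWindow(WindowType t) : m_type(t) {}

    WindowType windowType() const { ++calls; return m_type; }

    // Convenience predicates: each one calls windowType() again.
    bool isDesktop() const { return windowType() == WindowType::Desktop; }
    bool isDialog() const { return windowType() == WindowType::Dialog; }
    bool isMenu() const { return windowType() == WindowType::Menu; }

    mutable long calls = 0;

private:
    WindowType m_type;
};

// Hypothetical per-window effect hook asking a series of type questions.
double opacityFor(const CountingWindow &w) {
    if (w.isDesktop()) return 1.0;
    if (w.isDialog()) return 0.9;
    if (w.isMenu()) return 0.95;
    return 1.0;
}

// Runs the hook for every window in every frame and returns the
// total number of windowType() calls that this caused.
long windowTypeCallsForFrames(std::vector<CountingWindow> &windows, int frames) {
    for (int f = 0; f < frames; ++f) {
        for (const auto &w : windows) {
            opacityFor(w);
        }
    }
    long total = 0;
    for (const auto &w : windows) {
        total += w.calls;
    }
    return total;
}
```

A normal window falls through all three checks and costs three calls per frame, so even a handful of windows at a typical frame rate turns one method into a hot spot.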
The code held quite a few surprises. It gets the window type, consults the window rules to apply user-specific overrides, and applies some hacks to fix special windows. One of the hacks turned menus of a certain size into a top menu. This hack must date from the time before the top menu was implemented as a kicker applet. Not only is it unlikely that anybody is using such a combination of KWin and old KDE versions, but KWin has also not supported the top menu at all in any 4.x release, and that code was dropped a few releases ago. This allowed us to safely remove that part.
The second hack is even more interesting. It is a workaround for a dialog in OpenOffice.org 1.x, added in 2003. Because of this hack, a complete string comparison had to be done on every call of the method whenever the window was a dialog. Again, the hack was quite outdated, given that on a modern system you don't have any windows named openoffice, only libreoffice. I also searched through the LibreOffice help to find the dialog in question and verified its window type: the hack is no longer required. Both hacks are removed in 4.9.2. The lesson to be learnt from this: never add hacks to your application, because they stay. In general I would no longer accept application-specific workarounds inside KWin. They clearly belong to the area of window-specific rules and scripts.
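The cost pattern described above can be sketched as follows. This is not KWin's code: WindowSketch, windowTypeWithHack() and the class name "hypothetical-office-app" are placeholders, and what the real hack did to the type is not reproduced. The sketch only shows that a hack living inside the type lookup turns every call for a dialog into a string comparison, while the version without the hack leaves user-specific fixes to window rules.

```cpp
#include <cassert>
#include <string>

enum class WindowType { Normal, Dialog, Menu, Desktop };

// Hypothetical stand-in for a window as seen by the window manager.
struct WindowSketch {
    WindowType netType;                        // type announced by the application
    std::string resourceClass;                 // application's window class
    bool hasRuleOverride = false;              // user rule forces a type
    WindowType ruleType = WindowType::Normal;  // type forced by that rule
};

// Counts string comparisons so the cost of the hack becomes visible.
static int g_stringCompares = 0;

// Old shape (hypothetical): apply rules, then an application-specific
// hack that compares strings on every call for every dialog.
WindowType windowTypeWithHack(const WindowSketch &w) {
    WindowType t = w.hasRuleOverride ? w.ruleType : w.netType;
    if (t == WindowType::Dialog) {
        ++g_stringCompares;
        if (w.resourceClass == "hypothetical-office-app") {  // placeholder name
            t = WindowType::Normal;
        }
    }
    return t;
}

// Shape after removing the hack: only the type and the user's rules matter.
WindowType windowType(const WindowSketch &w) {
    return w.hasRuleOverride ? w.ruleType : w.netType;
}
```

If a particular application really needs a different type, a window-specific rule (the hasRuleOverride path here) expresses that without taxing every other window.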
But the main optimization of this method will be available in 4.10. The callgrind output showed that this method caused quite a few expensive dynamic casts. In fact, each call performed two dynamic casts to check whether the object was one of two specific subclasses, and the method basically contained two interwoven implementations, one for each of those subclasses. The logical step was to make the method pure virtual and implement it in the subclasses. According to the callgrind logs, this change improves performance quite a lot (I cannot say whether a user would notice it, it might be too small for that, but it should be noticeable on the battery). Since this is not just dropping hacks but a refactoring, it cannot go into 4.9.2, as there is still the risk of a regression.
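The refactoring from type checks via dynamic_cast to a pure virtual method is a general C++ pattern, shown here in a minimal sketch. The class names are illustrative stand-ins for a window base class with a managed and an unmanaged subclass; the bodies are heavily simplified and are not KWin's actual implementation.

```cpp
#include <cassert>

enum class WindowType { Normal, Dialog, Menu, Desktop };

// Before: the base class implements the method itself and uses
// dynamic_cast to find out which subclass it is dealing with.
class OldWindowBase {
public:
    virtual ~OldWindowBase() = default;
    WindowType windowType() const;
};

class OldManagedWindow : public OldWindowBase {
public:
    WindowType rulesType = WindowType::Dialog;
};

class OldUnmanagedWindow : public OldWindowBase {
public:
    WindowType netType = WindowType::Menu;
};

WindowType OldWindowBase::windowType() const {
    // Up to two RTTI-based casts per call, and two interwoven code paths.
    if (auto m = dynamic_cast<const OldManagedWindow *>(this)) {
        return m->rulesType;
    }
    if (auto u = dynamic_cast<const OldUnmanagedWindow *>(this)) {
        return u->netType;
    }
    return WindowType::Normal;
}

// After: the base class declares a pure virtual method, and each
// subclass implements only its own logic; a call is a single
// virtual dispatch without any RTTI lookup.
class WindowBase {
public:
    virtual ~WindowBase() = default;
    virtual WindowType windowType() const = 0;
};

class ManagedWindow : public WindowBase {
public:
    WindowType windowType() const override { return m_rulesType; }
private:
    WindowType m_rulesType = WindowType::Dialog;
};

class UnmanagedWindow : public WindowBase {
public:
    WindowType windowType() const override { return m_netType; }
private:
    WindowType m_netType = WindowType::Menu;
};
```

Both versions return the same results for the same objects; the difference is that the second one no longer pays for dynamic_cast on every call and keeps each subclass's logic in one place.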
Interestingly, when the method was first implemented, the approach was correct and also not expensive. From within the window manager code path it gets called only rarely: in my dataset about 10 % of the calls come from the window manager, and it seems to be called most often when a window gets added, so in a longer-running session that share would be even smaller. The code became expensive when it started to be used from within the effects system, which, compared to the window manager, is a rather hot code path. This is also something important to remember when optimizing: check whether the methods you expect to be in the hot path are actually the ones there. This is now the case for KWin: the most expensive call is the one to render a window, and the second most expensive is the one which starts the rendering of a frame. For those I'm already working on further optimizations for 4.10.