Splitting up KWin’s OpenGL Compositor

About two years ago I started bringing KWin to the world of embedded devices by implementing OpenGL ES 2.0/EGL support. Until then the compositor supported only an OpenGL 1/GLX backend, which allowed effects to use shaders emulating the fixed functionality.

The initial idea was to make this a compile-time replacement. For OpenGL ES 2.0 one needs compile-time checks anyway, so it seemed reasonable to just ifdef the main differences: GLX vs. EGL. Our compositor already supported shaders, so that part could be selected at runtime.

But as those who have used OpenGL know, the difference between OpenGL 1 and OpenGL (ES) 2 is rather large. The former uses the fixed-function pipeline, the latter the programmable pipeline. While we had support for shaders, we did not yet have support for the programmable pipeline (e.g. geometry was still specified the fixed-function way).

The best approach would have been to go straight to OpenGL 2 and the programmable pipeline, but back then this did not seem feasible. There was (and still is) quite a lot of hardware which cannot support OpenGL 2, and the free drivers had only just gained OpenGL 2 support. So I planned to keep the default as it used to be, on OpenGL 1, and only use OpenGL 2 when compiled for embedded.

Quite a challenge in the adjustments were the effects which already used shaders. Of course I did not want two sets of shaders – one for desktop and one for embedded – as that would have increased the maintenance costs in a way I could not have justified. The shaders therefore got adjusted to target the programmable pipeline and no longer emulate the fixed functionality.

The result of this and other adjustments was that, instead of just gaining one additional backend for EGL, we also got a new OpenGL 2/GLX backend, which became the default, while OpenGL 1 became the legacy backend.

Now fast forward two years. In the meantime quite a lot has happened which I could not foresee when I started the OpenGL ES efforts. One development is QtQuick 2, which will be used with Qt 5. KWin is already a heavy user of QtQuick 1 and of course we want to use QtQuick 2, but that brings in a runtime dependency on OpenGL 2. So no matter whether we still support OpenGL 1 or not, once we are on Qt 5 we will need OpenGL 2. This puts an end-of-life warning on our OpenGL 1 based legacy compositor.

Another important development is of course Wayland, which will open up a completely new world of compositing for us: in the future we will have to composite without an X server. That is more distant than Qt 5, but still something to consider.

It has also become clearer that EGL is no longer only for embedded. It is becoming possible to use OpenGL on top of EGL instead of GLX, which basically throws one of my assumptions from the porting overboard. Concerning EGL we have also seen in the recent past that being able to do EGL is just one part of the game: the driver also needs to support EGL together with X. Especially if a driver is only meant to be used on Android, that is unfortunately not the case.

Given all that, I'm very unsatisfied with the design decisions I took two years ago and with the way the OpenGL scene is split into GLX and EGL parts.

Recently I started a refactoring session for the Compositor and did not stop there but continued into the OpenGL scene. One of the issues I wanted to tackle was preparing KWin to use EGL as a backend for OpenGL and to be able to add more EGL backends, to solve the problem mentioned above (also, I recently got a Raspberry Pi and I want to see KWin running on it).

An abstract OpenGLBackend got split out of the SceneOpenGL. It is responsible for creating and releasing the OpenGL context, creating textures from window pixmaps and doing the buffer swapping in the windowing-system-specific way. There are now two concrete implementations: GlxBackend and EglOnXBackend (you might guess which backend I want to implement next). The advantage is that the actual compositor no longer needs to care about the windowing-system abstraction.
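The interface boils down to something like the following sketch. The method names are illustrative and do not necessarily match the actual declarations in KWin; it only shows the shape of the abstraction.

    // Simplified sketch of the backend split; names are illustrative.
    class OpenGLBackend
    {
    public:
        OpenGLBackend() : m_direct(false) {}
        virtual ~OpenGLBackend() {}
        // create the OpenGL context in the windowing-system-specific way
        virtual void init() = 0;
        // present the finished frame, e.g. via glXSwapBuffers or eglSwapBuffers
        virtual void present() = 0;
        // whether the created context is a direct rendering context
        bool isDirectRendering() const { return m_direct; }
    protected:
        void setIsDirectRendering(bool direct) { m_direct = direct; }
    private:
        bool m_direct;
    };

    // windowing-system-specific implementations
    class GlxBackend : public OpenGLBackend { /* GLX on X11 */ };
    class EglOnXBackend : public OpenGLBackend { /* EGL on X11 */ };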

OpenGL Compositor after Refactoring

The second major refactoring in that area was to separate the OpenGL 1 and OpenGL 2 code paths. While in the past it made sense to have just one code path, that has not been the case for quite some time. Everywhere there were checks for whether we are in OpenGL 1 or 2 mode, and overall these were mutually exclusive paths with some shared code. This is now nicely separated, which makes the code cleaner and hopefully easier to maintain. It also means that we can move the OpenGL 2 compositor forward without making compromises for the OpenGL 1 compositor.

A nice addition is that the OpenGLBackend is now created first (still EGL only for ES and GLX for desktop) and the compositor is then asked whether it can use that backend. For example, the OpenGL 2 compositor now checks whether the backend provides a direct rendering context and simply refuses if this requirement is not met. This will hopefully lead to better detection of which drivers should use OpenGL 2 and which OpenGL 1. Up to now the detection was just whether the driver supports GLSL: if shader compilation succeeded, the GL2 compositor got used. My idea for 4.10 is to use our GLPlatform information system to tell us whether a driver should use OpenGL 2 or 1. That way we can give users of older hardware a better experience even if their hardware is – theoretically – able to do OpenGL 2 compositing. This needs some further adjustments, like an environment variable to enforce a compositor (a config option already exists).
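As a rough illustration of the direction: GLPlatform exists in KWin, but recommendsGL2() and the scene class names below are placeholders for whatever the final code ends up looking like.

    // Illustrative only: create the backend first, then let the scenes decide.
    Scene *createOpenGLScene(OpenGLBackend *backend)
    {
        // the OpenGL 2 compositor insists on a direct rendering context and on
        // the driver being recommended for GL2 by the GLPlatform information
        if (backend->isDirectRendering() && GLPlatform::instance()->recommendsGL2()) {
            return new SceneOpenGL2(backend);
        }
        // otherwise fall back to the legacy OpenGL 1 compositor
        return new SceneOpenGL1(backend);
    }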

The change also nicely grouped all the OpenGL 1 code into one area, which will make it easier to remove once we decide that its end of life has been reached. Removing code is not as easy as one might expect and is actually quite a project in itself. As a step in that direction I plan to introduce a new build option to build KWin without OpenGL 1 support. This is basically the same as the build option for OpenGL ES, but the existing option can then be used for actual differences between OpenGL 2 and OpenGL ES 2.0. And of course EGL needs to be decoupled from the OpenGL ES build option – it is perfectly valid to compile the EGL backend for OpenGL.

This week in KWin (2012, week 37)

This week I want to dedicate my summary to the following persons:

  • Salva
  • Giuseppe
  • Andrea
  • Christian
  • Dirk
  • Francesco
  • Dario
  • Alexander
  • Lilian
  • Bernhard

I hope I haven’t forgotten anyone. They all worked on porting our effects configuration to KConfigXT, and quite a number of the changes already got merged into master this week. This is truly an amazing community. I had not expected that ten people would turn up to tackle the complete project in just one week.

The “disadvantage” is that it basically blocked my own work: I was mostly busy reviewing the patches and merging them in. It nevertheless took me about a quarter of an hour to merge each of the changes, as some effects are special and needed adjustments.

Other reviews got stuck in the queue, and because of that the bug list for this week is simply empty.

Introduction to Kubuntu at Ubucon 2012

At this year's Ubucon, which takes place from 19 to 21 October in Berlin, I will give an introduction to Kubuntu and in particular to the KDE Plasma Workspaces.

This is an excellent opportunity to get a taste of Kubuntu – and not only if you are not entirely happy with the developments in other environments 😉 As a developer working on the workspaces I can surely still teach even long-time KDE Plasma users a few tricks for using Plasma more effectively.

Since I plan to run the introduction mainly as a live demonstration, I can offer to take your wishes into account. If there is something you would like to see covered, just leave a comment here on the blog and I will see whether I can work it in. No guarantees, of course 🙂

I'm looking forward to meeting many interested faces in Berlin.

Since the Ubucon programme apparently isn't online yet, here is the abstract of my talk as I submitted it to the call for papers:

Kubuntu – an introduction to the KDE Plasma Workspaces

Kubuntu is the community-released Ubuntu variant focusing on software from the KDE community. Instead of Unity, which Ubuntu uses, one of the KDE Plasma Workspaces is employed – depending on the size of the screen either a specially adapted netbook interface or a classic desktop.

This talk introduces the basic usage of Kubuntu and the KDE Plasma Workspaces. KDE Plasma is a very powerful desktop that gives users an incredible number of ways to get their work done quickly, efficiently and enjoyably. KDE Plasma is an interface that does not demand attention but helps users get their work done.

To this end Plasma offers interesting new concepts such as Activities, which allow users to organize their work by topic. This feature is not forced on anyone; it is a useful capability you can use if you want to.

Besides the introduction to the KDE Plasma Workspaces, some of the KDE applications most important for everyday use will also be presented, such as the KDE file manager Dolphin or the image viewer Gwenview, which can also send pictures directly to Facebook, Flickr and co.

The Importance of Developer Sprints

Back in 2009 I was one of those lucky hackers who met in Randa, Switzerland, for a Plasma sprint. I had just started my Master's thesis, so I actually had better and more important things to do than to travel a few hundred kilometres to a developer sprint.

Almost half a year later we had the next Plasma development sprint in Nuremberg. I was just finishing my Master's thesis, so again I had more important things to do. I remember spending as much time as possible reading and correcting my thesis.

Since then, attending developer sprints no longer conflicts with my studies, but it conflicts with my work. Going to a developer sprint requires taking days off work – spending your holidays to do work. This year I have already spent two weeks: one for the sprint in Spain and one for Akademy.

Next week I will attend the XDC in Nuremberg, and after that I'm basically out of holidays for sprints, because I also want to take some non-work holidays this year. That's why I unfortunately cannot attend the sprint in Randa.

Why do I write all this? I want to explain that going to a sprint is nothing like a holiday. It is actually hard work, and you make quite a few compromises to go there. You do that because you know how important these sprints are. Nobody benefits more from a sprint than the users. Issues which are hardly possible to resolve over remote media like mailing lists or IRC are solved in a few minutes. Developers can start hacking together on an issue to make it go away. Developers can easily spot shortcomings in the software, like anti-patterns in how it is used (yes Martin, I'm looking at your systray). At the last sprint I took some time to just walk around the desk with all the notebooks to study each hacker's Plasma configuration, and noticed that nobody uses a default setup. That is an important finding for making Plasma better for our users, but it also shows how configurable Plasma is.

These are just a few examples of why a sprint is important. And again, the most important part is that the users benefit from it. The time will be used to make our software better. If you are a KDE user, I'm quite sure that you want to support such efforts.

At the moment we are raising money to make this sprint possible. At the time of this writing, just 144 people have donated. That surprises me. KDE software may be free of charge, but its development is nevertheless expensive and the software is worth quite a lot: according to David A. Wheeler's SLOCCount it would take 22 person-years to develop an application like KWin.

Please think about whether you want to support the efforts of the KDE hackers and make this important sprint possible. It is not a sprint for us, it’s a sprint for you. We want to deliver better software to you.

If you, like me, think it's important to support KDE continuously, you could also Join the Game, KDE's supporting-member programme. Although I donate quite some source code to KDE each year, I'm also a supporting member. Oh, and if you live in Germany: KDE e.V. is a gemeinnütziger Verein (a registered non-profit association).

The relevance of game rendering performance benchmarks to compositors

In my essay about the methodology of game rendering performance benchmarks, and about how you should not draw conclusions from incomplete data, I did not cover a very important aspect of game rendering performance: the relevance of such benchmarks for OpenGL based compositors.

Like my post from Saturday, this post represents my personal opinion and applies to all game rendering benchmarks, even though the publication of one specific benchmark triggered the writing.

As you might guess from my way of writing and from the title, the conclusion I will provide at the end of this essay is that such benchmarks are completely irrelevant. So I expect you to run straight to the comment section and tell me that games are extremely relevant, that the lack of games for Linux is the true reason for the death of Linux on the desktop, and Steam and Valve and Steam! Before doing so: I do know all these arguments, and I considered them in this evaluation.

The first thing to note is that OpenGL based compositing on X11 introduces a small but noticeable overhead. This is to be expected and is part of the technology in question. There is nothing surprising about it; everybody working in this area knows it. When implementing a compositor you take this into account, and KWin's answer to that particular problem is: "Don't use OpenGL compositing when running games!" For this we have multiple solutions:

  • Alt+Shift+F12
  • KWin Scripts to block compositing
  • KWin Rules to block compositing
  • Support for a Window Property to block compositing

In the best case the game uses the provided property to just say “I need all resources, turn off your stupid compositing”. It looks like we will soon have a standardized way for that, thanks to the work done by GNOME developers.
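For illustration only, setting such a hint from the game side could look roughly like the snippet below with plain Xlib. The property name is the KWin-specific hint as I remember it, so treat it as an assumption rather than a reference; the upcoming standardized hint will have a different name.

    #include <X11/Xlib.h>
    #include <X11/Xatom.h>

    // Ask the compositor to suspend compositing while this window exists.
    // The atom name is assumed here and may differ from the actual hint.
    static void blockCompositing(Display *dpy, Window win)
    {
        Atom atom = XInternAtom(dpy, "_KDE_NET_WM_BLOCK_COMPOSITING", False);
        unsigned long value = 1;
        XChangeProperty(dpy, win, atom, XA_CARDINAL, 32, PropModeReplace,
                        reinterpret_cast<unsigned char *>(&value), 1);
    }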

Of course we could also optimize our rendering stack to be better for games. In the brainstorm section the question was already raised why unredirection of fullscreen windows is not enabled by default, given the results published in the well-known benchmark (one of the reasons I don't like these benchmarks: so far, every time, we got bug reports or feature requests based on incorrect interpretation of the provided data).

Optimizing the defaults means adjusting the settings for the needs of a specific group of users. Unredirection of fullscreen windows has disadvantages (e.g. flickering, or crashes with certain distribution/driver combinations when unlocking the screen), and one has to consider that carefully. Gaming is just one of many possible tasks you can do with a computer. I hardly play games; I use my computer for browsing the Internet, writing lengthy blog posts, hanging out in social networks, sometimes watching TV or a video on YouTube, and hacking code. None of these activities would benefit from optimizing for games, and the introduced flickering would clearly harm my TV watching. So if you think games are important, please step back and think about how many activities you do and how much of them is gaming.

The next point I want to consider in the discussion is the hardware, especially the screen. As such benchmarks are published for general consumption, we can assume standard hardware, and a standard screen refreshes at 60 Hz. This is an extremely important value to remember during the discussion.

Now I know that many people think it's important to render as many frames as possible, but that is not the case. If you do not reach 60 frames per second, then yes: the more the better. But if you reach 61 frames per second, you render one frame which will never end up on the screen. It's not about rendering more frames than your eye can see; it's about the screen physically not being able to display more than 60 frames per second. Rendering more than 60 frames per second is a waste of your resources: of your CPU, of your GPU, of energy, of your money.

Given that, we can divide the benchmark results into two categories: those below 60 fps and those above 60 fps.

A compositor should not have any problems rendering at 60 frames per second. That's what it is written for, and there is nothing easier than rendering a fullscreen game; it's the simplest possible task: render all opaque windows from top to bottom, stopping after the first one (the game) because nothing else would end up on the screen anyway. Perfect. It really is the simplest task (assuming the game uses a sane RGB visual). If a compositor is not able to render that at 60 frames per second, something is fundamentally broken.

Let's start by looking at the first category. On Saturday I already pointed out the benchmark run where the result was 10 frames per second. This is a result we can discard; it does not represent the real world. Nobody is going to play a game at 10 frames per second – it just hurts your eyes, you won't do it. The overhead introduced by the compositor does not matter in the "too slow" category. We can consider anything below 60 frames per second as "the hardware is not capable of running that game"; the user should change the settings or upgrade, which moves the game into the above-60-fps category.

Before discussing the second category, I want to point out another hardware limitation: the GPU. It is a resource shared between the compositor and the game; both want to use the GPU's rendering capabilities and both want to upload their textures into the GPU's RAM. Another shared resource is the CPU, but given modern multi-core architectures that luckily hardly matters.

Looking at the provided data, we see at least one example where the game renders at more than 120 frames per second. That is twice the number of frames the hardware is capable of displaying. The reason is probably that the game runs in a kind of benchmark mode to render as many frames as possible. That is a nice lab situation, but not a real-world one. In the real world the game would hopefully cap at 60 frames per second; if not, I would at least consider it a bug.

But what is the result of trying to provide as many frames as possible? It produces overhead. In that case the compositor gets approximately twice as many damage events from the game as there need to be. Instead of signalling once per frame, it signals twice, which means the event has to be processed twice and the complete computational overhead of scheduling a new frame is doubled. This keeps not only the compositor busy but also the X server, so part of the shared resource (CPU) is used in a completely useless way. This of course includes additional context switches and additional CPU usage which would otherwise be available to the game.

Of course, given that the game runs at "give me everything you've got", the resources available to the compositor are lower. This can result in dropped frames in the compositor and can also negatively influence the running game.

But this is not a real-world situation. The problem that the compositor does not get enough resources, or takes resources away from the game, is introduced by the game running at too high a frame rate. It is, so to speak, an academic example. Such benchmarks matter to game developers like Valve, who really need to know how fast a game can go, but I'm quite confident that they also know about the side effects of having a compositor (and other applications) running, and benchmark on a blank X server – which I suggested as the control in my post from Saturday.

Given that, I can only conclude that such benchmarks show data which is not relevant to the real world. It's a lab setup, and as so often, a lab setup doesn't match reality.

And this shows another problem with the benchmark: it shows nice numbers, but it does not answer the only valid question – is the game playable? In the end, does it matter to the user whether a game renders two frames more or fewer depending on which compositor is used, as long as it doesn't affect gameplay? I would say it doesn't.

It would be possible to set up a benchmark which could highlight regressions in a compositor. But in most cases the result would be several bars all at 60 fps, and if not, that is a reason to report a bug, not to write a news posting about it. As an example of a benchmark done right, I want to point to the benchmark provided by Owen Taylor of GNOME Shell fame. Nowadays I know that the data in this benchmark nicely shows the performance problem which we fixed in 4.8. Back when the benchmark was published, I thought it was an issue with the benchmark itself, tried it with all available rendering backends of KWin, always got the same problem, and also studied the code. So yes, properly done benchmarks considering real-world situations can be helpful, but even then someone with expertise in the field is needed to interpret the provided data. That is also an important aspect missing from most benchmarks: a "here we see that KWin is two frames faster than Compiz" is not an interpretation.

This week in KWin (2012, week 36)

In between writing about game performance benchmarks, I also have to publish the report on last week's activity in KWin development.

The major issue this week has been a problem introduced in KWin 4.9.1. Under certain circumstances it was possible that KWin completely froze. From the perspective of a compositor that is the worst bug you can think of. I'm very sorry for introducing this issue and want to apologize for any inconvenience.

Luckily the bug report hit us around release time, and we were able to notify the packagers the same day and provide a fix the next day. In the best case, most distributions never shipped the faulty package to their users.

Apart from that, as a reader of my blog you probably already know what happened this week: some nice performance improvements hit 4.9.2 and 4.10.

Summary

Crash Fixes

    Critical Bug Fixes

    • 306260: KWin freezes when navigating between windows
      This change will be available in version 4.9.2
      Git Commit

    Bug Fixes

    • 293044: Kwin + opengl compositing make firefox scrolling jerky.
    • 306457: m_vBlankTime in Options is not initialized
      This change will be available in version 4.9.2
      Git Commit
    • 306262: Translucency Effect needs isActive() implementation
      This change will be available in version 4.9.2
      Git Commit
    • 306225: workspace.displayHeight is wrong
      This change will be available in version 4.9.2
      Git Commit
    • 306263: Animations in Translucency Effect are not working
      This change will be available in version 4.9.2
      Git Commit
    • 306449: transparency bug in active window
      Git Commit

    New Features

    • 303756: Allow Scripts to add menus to useractions menu
      This change will be available in version 4.10
      Git Commit

    Tasks

    • 306384: Toplevel::windowType() needs performance improvements
      This change will be available in version 4.10
      Git Commit
    • 306383: Toplevel::windowType() contains superfluous hacks
      This change will be available in version 4.9.2
      Git Commit

Help KWin to maintain the Effect Configurations

Last week I asked for help with some non-coding tasks and I was positively surprised by the feedback. Thanks a lot, this community just rocks. By the way, there are always possibilities to contribute to KDE; for example, we are still looking for about 5,000 EUR to finance the very important Randa sprint.

Given the great result of last week's experiment, I decided to set up another task. This time I want to ensure that people do not work on the same items, so I set up a project wiki page where you can claim the task you are working on.

So what is it about? The configuration interfaces of the KWin effects are written more or less manually; that is, the logic to load, save and reset configuration values to their defaults is written by hand in the code. But KDE provides a great framework called KConfigXT which does all of this automatically. All it needs is a description of the configuration options in an XML file containing the name of each option, its type and its default value. All of this information is already present in the current implementation, so it's just a matter of taking the values and bringing them into another format.

The advantage is that we can use that XML description to generate code to be used in both the effect and its configuration module. This ensures that typos do not introduce bugs, and it also removes quite some boilerplate code copied into each configuration module. Last but not least, it allows us to think about new ways to do configuration if we ever want to.
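To give an idea of what the generated code buys us, usage in an effect could look roughly like the sketch below. The class and option names are only an assumed example of what kconfig_compiler would produce for the Translucency effect, not the actual generated API.

    // Hypothetical usage of a class generated from the effect's .kcfg file;
    // the class and accessor names are illustrative.
    void TranslucencyEffect::reconfigure()
    {
        TranslucencyConfig::self()->readConfig();
        // typed accessors replace hand-written config lookups, and the
        // defaults come straight from the XML description
        m_moveResizeOpacity = TranslucencyConfig::moveResizeOpacity();
        m_inactiveOpacity   = TranslucencyConfig::inactiveOpacity();
    }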

So please grab one of the 22 remaining effects to port. I already did the port of the Translucency effect as an example of how to do it. It's really not a difficult task and it is very important for the maintenance of KWin. The wiki page includes detailed instructions on how to perform such a port.

Why I don’t like game rendering performance benchmarks

It's benchmark season again, and as I have raised some concerns about the results of the published benchmark, I was asked to properly explain my concerns without making it look like a rant. So that is what I try to do in this blog post.

Given the results of the published benchmark, I could go "Wooohooo, KWin's the fastest!", but instead I raise concerns. I don't see that conclusion in the data, and I hope nobody else reads it into the published data either.

First a little bit about my background. After finishing my computer science studies I have been working for the last two and a half years in a research department, not as a researcher, but as a developer supporting research. Our main task is to store and manage scientific results, that is, experimental data.

You cannot work in research for more than two years without it influencing how you look at such data. For me, a benchmark is very similar to a scientific experiment.

First you come up with a hypothesis (e.g. "compositing on X11 influences game rendering performance"), then you set up an experiment to test it (e.g. running PTS on various hardware and distributions), then you run the experiment multiple times to obtain statistically relevant data, and last but not least you validate the gathered results and go back to step one if something doesn't fit. All these steps must be properly documented so that others can reproduce the results. Being able to reproduce the results is the most important part.

If you don't follow these steps, your experiment/benchmark is not able to show anything, and then you should not publish it. I'm personally not a fan of science's attitude of not publishing failures, but you should at least make clear that your setup has flaws.

Now let's assume that the published benchmark is a "paper" and that I have the task of reviewing it.

The first thing I would point out is that the gathered data is not statistically relevant. It is not clear how the environment influenced the results. The benchmark has been performed only on an "Ubuntu 12.10 development snapshot" on one Core i7 "Ivy Bridge" system. This means we don't know whether the fact that it is a development snapshot has any influence on the results. Maybe there are debug builds? Maybe temporarily changed defaults? It also tests unreleased software (e.g. Compiz) against released software (e.g. KWin). So here we have multiple flaws in the experimental setup:

  • Only one operating system
  • Only one hardware configuration
  • Comparing software in different development stages

Also, the fact that it uses an "Ubuntu 12.10 development snapshot" means that the results cannot be reproduced independently, as the exact state of the software in use is unknown.

I want to further stress the point about the operating system; I think this is in fact the major flaw of the setup. Looking, for example, at the performance improvements in openSUSE 12.2 due to switching the toolchain, that is something which can quite clearly influence the results. So we don't know whether Ubuntu is good or bad for doing such benchmarks – we just don't know. The benchmark would have needed to be run on multiple distributions to obtain comparable results (yes, obviously that's not possible, as Compiz only exists for Ubuntu). This is especially relevant for the GNOME Shell tests, as Ubuntu focuses only on Compiz and one doesn't know how that influences the performance of other systems. Also, in general, desktop environments are tested here, but hardly any distribution ships a pure KDE SC version – they all make adjustments, change settings and so on. One has to gather enough data to ensure that the results are not distorted by this.

The point about multiple hardware configurations is of course obvious: the differences between hardware are too large not to be considered. A computer is a highly complex system, and an operating system a highly non-deterministic one, where changing one piece can have quite some influence. Maybe the Intel drivers are just not suited for gaming – I don't know, and neither does the benchmark.

Now let's move forward and have a look at the individual experiments. The first thing that strikes me is that the standard deviation is missing. This tells me quite a lot about the experimental setup. Given that it is not stated how often the experiment was run (that is, how many data sets go into one graph), and given that the standard deviation is not provided, I assume the experiment was run just once. That would mean the experiment is completely flawed and the gathered data has no statistical significance. If this were a paper, I would stop the review here and notify the editor. Just think of Nepomuk starting to index your files in the background while you run the benchmark, or the package manager starting to update your system. This would have quite some influence on the result, but you cannot tell from the given data whether it happened.

But let's assume I continue looking at the data. Thinking back to our hypothesis, I notice that while we have quite a few data sets on the influence of desktop environments on game rendering performance, one important data set is missing: the control. For the given hypothesis only one control comes to mind: running just an X server without any desktop environment and running the test there. This would be a very good control, as it ensures that no overhead is introduced by any desktop environment. But it's missing. Again, if I were reviewing this as a paper, I would stop here and notify the editor.

Let's continue nevertheless. I now want to pick out the data for Nexuiz v2.5.2 at a resolution of 1920×1080. The values range from 9.73 fps (KWin) to 12.79 fps (KWin with unredirection). The latter value is quite a bit higher than the others, so let's look at the second best: 10.15 fps (LXDE). So we have results of roughly 10 fps vs. 10 fps vs. 10 fps vs. 10 fps. Now I'm not a gamer, but I would not want to play a game at 10 fps. For me the result of this specific experiment does not show any difference between the various systems, but just that the hardware is not capable of running such a game.

At this point I want to stop the analysis of this benchmark. I think it is clear that this benchmark does not "Prove Ubuntu Unity Very Slow to KDE, GNOME, Xfce, LXDE" – heck, that title is so wrong that I don't know where to start. There is no "proof", and there is nothing that shows it to be slow; just look at the example given above: the difference in frames per second is in the non-noticeable range. Furthermore it's only about game rendering performance, and only on one system running a pre-release of Ubuntu. So maybe a better title would be "Benchmark on a development snapshot of Ubuntu 12.10 shows Unity to slow down game rendering performance on an Intel Ivy Bridge compared to KDE, GNOME, Xfce, LXDE" – yes, not very catchy, I agree 🙂

My point here is that this doesn't prove anything, and I care about that because, given the methodology of these benchmarks, it's quite likely that the next time a benchmark is published it will "prove" KDE to be the slowest, and then FUD will be spread about KDE – just like when there was a benchmark "proving" that KDE uses more RAM. You are in a much better position to highlight the flaws of a benchmark if you are its "winner"; otherwise people tell you that you are in the "denial" stage.

Maybe the sanest way to handle these benchmarks is to detect PTS in KWin and switch into a benchmark mode, just like games do. I have to think about that.

Performance Improvements in KWin 4.9.2 and 4.10

Recently I did some refactoring around the Compositor, and there was one change where Thomas was afraid it would cause a performance regression. So I used Valgrind's callgrind tool to verify whether this is true. And yes, the code had a slight performance drop, though luckily it is not in the hot code path, and even if it were, the overhead would be rather small.

But having the callgrind log, I looked a little closer into it, which I had not done since the last round of optimizations, for 4.8 I think. Since that is quite some time ago and I had basically forgotten what it looked like back then, I was shocked by a few results. I knew that in that last optimization I had adjusted all effects to be inactive by default, except the translucency (and blur) effect.

Now, looking at the results, I saw that the translucency effect is rather expensive, even though by default it does not do anything unless you are moving windows. Having an expensive effect that does nothing is of course rather unpleasant. So I looked at the implementation and found a way to better track when the effect should be active. Unless you have configured the effect to make decorations or inactive windows translucent, it is now disabled by default; only when you start moving a window does it become active. And even then the effect performs better.
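The gist of the change, as a rough sketch: the member names below are illustrative and the exact slot and configuration flags in the real effect may differ.

    // Sketch: report the effect as active only while it actually has work to do,
    // so the effect chain can skip it entirely the rest of the time.
    void TranslucencyEffect::slotWindowStartStopUserMovedResized(EffectWindow *w)
    {
        m_activeMove = w->isUserMove() || w->isUserResize();
    }

    bool TranslucencyEffect::isActive() const
    {
        // stays false in the default configuration until a window move starts
        return m_activeMove || m_translucentDecorations || m_translucentInactiveWindows;
    }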

But there was more to it. I noticed that there is supposed to be an animation when a window starts to move, but personally I have never seen it. Looking closer at the code, I realized it could never have worked. I decided that an animation added in 4.1 which has never worked can be dropped, which again improves performance a little. We might add a better translucency effect for 4.10 which brings the animation back, but for 4.9.x there is no user-visible change from removing it.

But I still could not fully understand why the effect is so expensive; all it does is check the type of the window multiple times: is it a desktop? Is it a dialog? And so on. That cannot be so expensive – but it is. I tracked down the expensive call in KCachegrind and found that the check for windowType() is what costs.

The code held quite a few surprises. It gets the window type, calls the window rules to apply user-specific overrides, and applies some hacks to fix certain special windows. One of the hacks was to turn menus of a certain size into a top menu. This hack must date from the time when the top menu had not yet been implemented as a Kicker applet. Not only is it unlikely that anybody is using such a combination of KWin and old KDE versions; KWin has not supported the top menu at all in any 4.x release, and the code was dropped a few releases ago. That allowed us to safely remove this part.

The second hack is even more interesting. It is a workaround for a dialog in OpenOffice.org 1.x, added in 2003. For this hack, a complete string comparison had to be done each time the method was called, in case the window was a dialog. Again the hack was quite outdated, given that on a modern system you don't have any windows named openoffice, only libreoffice. I also searched the LibreOffice help to find the dialog in question and verified the window type: the hack is no longer required. Both hacks are removed for 4.9.2. The lesson to learn from this: never add hacks to your application, they stay. In general I would no longer accept workarounds for specific applications inside KWin; this clearly belongs in the area of window-specific rules and scripts.

But the main optimization of this method will be available in 4.10. The callgrind output showed that the method was causing some quite expensive dynamic casts. In fact, each call performed two dynamic casts to check whether the object is a specific subclass, and the method basically contained two interwoven implementations for the two subclasses. The logical step was to make the method pure virtual and implement it in the subclasses. According to the callgrind logs after the change, this improves performance quite a lot (I cannot say whether a user can notice it – it might be too small for that – but it should be noticeable on the battery). Given that this is not just dropping hacks but a real refactoring, it cannot go into 4.9.2, as there is still the risk of a regression.
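Reduced to a minimal sketch, the pattern looks roughly like this; the real classes are KWin's Toplevel and its subclasses, and the bodies here are of course heavily simplified.

    #include <netwm_def.h> // NET::WindowType

    // Before: one implementation in the base class, probing its own subtype
    // with dynamic_cast on every call.
    // After: a pure virtual method, resolved once by virtual dispatch.
    class Toplevel
    {
    public:
        virtual ~Toplevel() {}
        virtual NET::WindowType windowType() const = 0;
    };

    class Client : public Toplevel
    {
    public:
        NET::WindowType windowType() const
        {
            NET::WindowType type = m_windowType;
            // window-rule overrides and client-specific handling go here
            return type;
        }
    private:
        NET::WindowType m_windowType;
    };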

Interestingly, when the method was first implemented the approach was correct and not expensive. From within the window manager code path it is only called very few times; in my data set about 10 % of the calls come from the window manager, and it seems to be called most often when a window gets added, so in a longer-running session the share would be even smaller. The code became expensive when it started being used from within the effects system, which, compared to the window manager, is a rather hot code path. That is also something important to remember when optimizing: check whether the expected methods are in the hot path. This is now the case for KWin: the most expensive call is the one that renders a window, the second most expensive the one that starts rendering a frame. And for those I'm already working on further optimizations for 4.10.

Never forget your users or extending the User Actions Menu by KWin Scripts

KWin's user base is quite diverse, ranging from absolute computer beginners to kernel hackers. Such a diverse user base can be quite challenging for development. On the one hand you need to keep the UI as simple as possible so that even inexperienced users can use the computer without problems; on the other hand you should not dumb down the application and upset your very vocal kernel hackers. There is always the temptation to make the user interface too complex, given that as a developer one is an advanced user and does not really see the needs of a normal user.

In the case of KWin we are in the lucky (or unlucky) situation that our target user group should not even know that an application called "KWin" exists. The only user-visible place where the name "KWin" appears is the crash dialog – and I do hope that a normal user never sees that one 😉 Of course that means we never get bug reports or feature suggestions from "normal" users, so we have to evaluate each suggested feature and extension with our primary user group in mind. Our mission statement clearly states that KWin should stay out of the way and that the user should not notice there is a window manager. That is something to always keep in mind.

Nevertheless, we have and want to have advanced features. The mission statement says that KWin provides a steep learning curve towards advanced features. Think, for example, of window-specific rules, which are not really easy to define without an understanding of window management.

From time to time features for advanced users are in direct conflict with our goals and with what our primary user group needs. An example is the submenu for changing a window's opacity in the window decoration menu. To be honest: it's quite a geeky feature and soooo 200* (look what we can do: translucent windows!). It's almost an embarrassing feature, as it makes the complete window translucent instead of adjusting just the background. Not only is this feature quite geeky, it also clutters an already cluttered menu even more.

Based on that, the menu was removed after some discussion a few releases ago, but we acknowledged the needs of experienced users and added shortcuts and mouse-action support on the window decoration for changing opacity. So we removed one way in a user-visible place and replaced it with several other ways to alter the opacity.

Nevertheless, over time I have seen a few complaints about the removal of the feature, and it made me think about what we can do. Of course we don't want to bring it back, but we also don't want to anger our advanced users. Since 4.9 we have the wonderful possibility of scripts, so I decided to extend the scripting support to allow adding entries to the user actions menu from scripts.

This means that since today's git master you can run the following script to get your Window Opacity menu back:

    registerUserActionsMenu(function (client) {
        var opacity = Math.round(client.opacity * 100);
        return {
            text: "Window Opacity",
            items: [{
                text: "100 %",
                checkable: true,
                checked: opacity == 100,
                triggered: function () {
                    client.opacity = 1.0;
                }
            }, {
                text: "75 %",
                checkable: true,
                checked: opacity == 75,
                triggered: function () {
                    client.opacity = 0.75;
                }
            }, {
                text: "50 %",
                checkable: true,
                checked: opacity == 50,
                triggered: function () {
                    client.opacity = 0.5;
                }
            }, {
                text: "25 %",
                checkable: true,
                checked: opacity == 25,
                triggered: function () {
                    client.opacity = 0.25;
                }
            }, {
                text: "10 %",
                checkable: true,
                checked: opacity == 10,
                triggered: function () {
                    client.opacity = 0.1;
                }
            }]
        };
    })

I intend to make this script part of 4.10, but before I can do so we need i18n support inside the scripts, otherwise the titles of the menu entries cannot be translated.