In my essay about the methodology of game rendering performance benchmarks and how you should not draw any conclusions from incomplete data I did not cover a very important aspect about game rendering performance: the relevance of such benchmarks for OpenGL based compositors.
Like my post from Saturday this post represents my personal opinion and applies to all game rendering benchmarks though the publication of one specific one triggered the writing.
As you might guess from my way of writing and the caption, the conclusion I will provide at the end of this essay, is that such benchmarks are completely irrelevant. So I expect you to run straight to my comment section and tell me that games are extremely relevant and that the lack of games for Linux is the true reason for the death of Linux on the Desktop and Steam and Valve and Steam! Before doing so: I do know all these reasons and I considered them in this evaluation.
The first thing to note is that OpenGL based compositing on X11 introduces a small, but noticeable overhead. This is to be expected and part of the technology in question. There is nothing surprising in that, everybody working in that area knows that. When implementing an compositor you consider the results of that and KWin’s answer to that particular problem is: “Don’t use OpenGL compositing when running games!” For this we have multiple solutions like:
- KWin Scripts to block compositing
- KWin Rules to block compositing
- Support for a Window Property to block compositing
In the best case the game uses the provided property to just say “I need all resources, turn off your stupid compositing”. It looks like we will soon have a standardized way for that, thanks to the work done by GNOME developers.
Of course we could also optimize our rendering stack to be better for games. In the brainstorm section the question was already raised why unredirection of fullscreen windows is not enabled by default given the results published in the known benchmark (one of the reasons why I don’t like these benchmarks, so far every time we got bug reports or feature requests based on the incorrect interpretation of the provided data).
Optimizing the defaults means to adjust the settings for the needs of a specific group of users. Unredirection of fullscreen windows has disadvantages (e.g. flickering, crashes with distribution/driver combinations when unlocking the screen) and one has to consider that carefully. Gaming is just one of many possible tasks you can do with a computer. I hardly do games, I use my computer for browsing the Internet, writing lengthy blog posts, hanging out in social networks, sometimes watching TV or a video on youtube and hacking code. All these activities would not benefit from optimizing for games, the introduced flickering would clearly harm my watching TV activity. So if you think games are important, please step back and think how many activities you do and how much of it is gaming.
The next point I want to consider in the discussion is the hardware, especially the screen. As such benchmarks are published to be applied for general consumption we can assume standard hardware and a standard screen renders at 60 Hz. This is an extremely important value to be remembered during the discussion.
Now I know that many people think it’s important to render as many frames as possible, but that is not the case. If you do not reach 60 frames per seconds, it’s true – the more the better. If you reach 61 frames per seconds, you render one frame which will never end up on the screen. It’s not about that it’s more frames than your eye could see, it’s about the screen not being able to render more than 60 frames per seconds physically. Rendering more than 60 frames per second is a waste of your resources, of your CPU, of your GPU of energy, of your money.
Given that we can divide the benchmark results in two categories: those which are below 60 fps and those which are above 60 fps.
A compositor should not have any problems rendering at 60 frames per seconds. That’s what it’s written for and there is nothing easier than rendering a full-screen game. It’s the most simple task. Rendering from top-to-bottom all opaque windows, stopping after rendering the first one (the game) because nothing would end on the screen anyway. Perfect. It’s really the most simple task (considering it uses a sane RGB visual). If a compositor is not able to render that at 60 frames per seconds, something is fundamentally broken.
Let’s start with looking at the first category. I already pointed out on Saturday the one run benchmark where the result has been 10 frames per seconds. This is a result we can discard. It’s not representing the real world. Nobody is going to play a game at 10 frames per seconds. That just hurts your eyes, you won’t do it. The overhead introduces by the compositor does not matter when being in the category “too slow”. We can consider anything underneath 60 frames per seconds as “the hardware is not capable of running that game”, the user should change the settings or upgrade. This will turn the game into having more than 60 frames per seconds.
Before I want to discuss the second category I want to point out another hardware limitation: the GPU. This is a shared resource between the compositor and the game. Both want to use the rendering capabilities provided by the GPU, both want to upload their textures into the RAM of the GPU. Another shared resource is the CPU, but given modern multi-core architectures that luckily hardly matters.
Looking at the data provided we see at least one example where the game renders with more than 120 frames per seconds. That is twice the amount of frames than the hardware is capable to render. The reason for this is probably that the game is run in a kind of benchmark mode to render as many frames as possible. Now that is a nice lab situation but not a real world situation. In real world the game would hopefully cap at 60 frames per second, if not at least I would consider it as a bug.
But what’s the result of trying to provide as many frames as possible. Well it produces overhead. In that case the compositor get’s approximately twice as many damage events from the game than there need to be. Instead of signaling once a frame it’s twice. That means the event needs to be processed twice. It doubles the complete computational overhead to schedule a new frame. It does not only keep the compositor busy, but also the X-Server. So part of the shared resource (CPU) is used in a completely useless way. This of course includes additional context switches and additional CPU usage which would otherwise be available to the game.
Of course given that the game runs at “give me everything which is possible”, the available resources for the compositor are lower. This can of course result in frame drop in the compositor and also in badly influencing the run game.
But it is not a real world situation. The problem that the compositor does not get enough resources or takes away the resources from the game is introduced by the game running at a too high frame rate. So to say it is an academic example. Such benchmarks matter to game developers like Valve who really need to know how fast the game can go, but I’m quite confident that they also know about the side-effects of having a compositor (and other applications) and run it on a blank X-Server which I suggested as the control in my post from Saturday.
Given that I can only conclude that such benchmarks show data which are not relevant to the real world. It’s a lab setup and as so often a lab setup doesn’t fit to reality.
And this shows another problem of the benchmark. It shows nice numbers, but it does not answer the only valid question: is the game playable. In the end does it matter to the user whether a game renders at two frames more or less depending on which compositor is used as long as it doesn’t affect the game play? I would say it doesn’t matter.
It would be possible to setup a benchmark which could highlight regressions in a compositor. But the result would in most cases several bars all at 60 fps and if not, it’s a reason to report a bug and not to do a news posting about it. As an example for a benchmark done right, I want to point to the benchmark provided by Owen Taylor from GNOME Shell fame. Nowadays I know that the data provided in this benchmark nicely shows the performance problem which we fixed in 4.8. Back when the benchmark was published I thought it’s an issue with the benchmark and tried it with all available rendering backends of KWin and always got the same problem (and also studied the code). So yes proper done benchmarks, considering real world situations can be helpful, but even then it needs someone with expertise from the field to interpret the provided data. That’s also an important aspect missing in most benchmarks. A “here we see that KWin is two frames faster than Compiz”, is no interpretations.