Why Blur Does Not Work in Kubuntu Natty With Intel

Over the last week we received quite a few complaints about blur not working after an upgrade to the latest beta of Kubuntu Natty. So far we could not make sense of them. All users had already been using Plasma Workspaces 4.6.2 in Maverick, often with the Xorg Edgers drivers, so you would assume that they had more or less the same software versions as before the upgrade.

Furthermore, we had not changed anything. We haven’t touched the blur effect for some time; a git log only shows changes relevant to our development version. So a very strange situation. Now we finally received a bug report that enlightened us. Apparently the Intel driver changed the renderer string in a minor driver release, which causes our direct rendering check to disable direct rendering, and without direct rendering blur cannot be enabled any more.

Here on my notebook running openSUSE 11.4 the renderer string looks like the following: "Mesa DRI Intel(R) Ironlake Mobile GEM 20100330 DEVELOPMENT". Our code to test whether direct rendering is supported looks like this:

    // Assume that direct rendering works with DRI2 drivers
    const GLubyte *renderer = glGetString(GL_RENDERER);
    if (strstr((const char *)renderer, "DRI2"))
        return 0;

    // The Intel driver doesn’t have DRI2 in the renderer string
    if (strstr((const char *)renderer, "GEM"))
        return 0;

The code was last changed on June 13th, 2010! The section for the Intel driver was added at the end of May of last year. So the code has been around for nearly a year. Now, according to the bug report, the Intel driver dropped the "GEM" from the renderer string, with the result that direct rendering does not get enabled anymore.
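To make the failure concrete, here is a minimal sketch of how the two substring tests behave on different renderer strings. The helper name "rendererPassesDirectCheck" is hypothetical: it takes the string as a parameter instead of calling glGetString, and it returns true where the quoted code returns 0 (read here as "direct rendering works"). This is an illustration under those assumptions, not the actual KWin function.

```cpp
#include <cstring>

// Mirrors the two substring tests from the check quoted above.
// Hypothetical helper for illustration only; the real code queries
// glGetString(GL_RENDERER) itself and returns 0 on success.
bool rendererPassesDirectCheck(const char *renderer)
{
    if (!renderer)
        return false;                   // no renderer string available
    if (std::strstr(renderer, "DRI2"))
        return true;                    // DRI2 drivers are assumed to work
    if (std::strstr(renderer, "GEM"))
        return true;                    // older Intel strings carry "GEM" instead
    return false;                       // neither marker found
}
```

With the string from the openSUSE notebook above the helper returns true. With a string such as "Mesa DRI Intel(R) Ironlake Mobile" (an assumed example of a string without "GEM", not copied from an affected system) neither substring matches and it returns false, which corresponds to the case where blur gets disabled.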

This change has happened in a minor release of Mesa. And I’m asking: Why? Why change what is not broken? Why introduce such a change in a minor version? Why oh why? I would accept it for a major version. We would have known beforehand and could have adjusted for our next major version. But we cannot change something like that in a minor version. And it would not help the situation. Natty will ship 4.6.2 and our next bug fix release will be after the release of Natty and I won’t accept a patch to that code as I fear that it could break for other Intel users.

I must say that I am very disappointed by this change and very, very sad. After all the trouble of the 4.5 release caused by the broken state of graphics drivers on Linux the driver developers again broke one of their most important downstreams. Maybe the message has not yet come through: nobody will use your drivers on Linux to run fancy games if the basic compositors do not work. So it is completely unacceptable to break Mutter, Compiz or KWin. Those three applications are your most important targets and whatever you do: DON’T BREAK THEM!

The sad state is that the driver developers do not even do the simplest regression test after such a change, like starting Plasma Workspaces and seeing whether everything works fine. It is also disappointing to see that they do not know which parts of their driver string are parsed by their most important clients. I have now been doing this kind of stuff for more than three years, and changing the driver strings seems to be a hobby of the free drivers. The only driver which has not changed the pattern in that time is NVIDIA. Apparently they recognize that customers may parse this information and rely on it. But of course, as we all are open source evangelists, it would be completely unacceptable to recommend the use of the NVIDIA driver or to recommend that people buy NVIDIA cards, because their driver is bad, bad, bad just by the fact that it is closed!

Now we all remember the time of the 4.5 release and how the drivers announced support for what they don’t support. At that time the driver developers justified themselves by saying that we should have known better, should have asked, or should at least have assumed that if the drivers say they support version X, they in fact only support version X-2. Now my question: how should we handle such custom adjustments for broken drivers if the drivers change how they can be recognized?

The last time, people complained that we did not communicate with the driver developers and that I instead wrote an angry blog post just like this one. So I need to explain: nobody informed us that the driver string would change. Now I am kind of stupid, I know. Even in my worst nightmares I would not expect the developers to change the version string in a minor revision. Still, couldn’t I talk to them now instead of writing angry blog posts? Sadly I cannot. First of all, we were notified about the change too late to do anything. As just explained, we were notified after 4.6.2, with no chance to adjust before Natty. Furthermore, I was ill last week and am sitting on a plane to the US right now, so I can write blog posts but cannot use the Internet, and I will stay more or less disconnected throughout the next week.

Luckily KWin is prepared for driver fuckups and there are workarounds for this problem. An affected user can start KWin with "KWIN_DIRECT_GL=1 kwin --replace &" to enforce direct rendering and skip the checks.
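How such an override can take precedence over a failed check can be sketched as follows. Only the variable name KWIN_DIRECT_GL comes from the post; the function names, the rule that exactly "1" enables the override, and the way the two results are combined are assumptions made for this illustration and may differ from what KWin really does.

```cpp
#include <cstdlib>
#include <cstring>

// True when the given environment value requests the override.
// Accepting only "1" is an assumption made for this sketch.
bool isForceValue(const char *value)
{
    return value != nullptr && std::strcmp(value, "1") == 0;
}

// Reads the KWIN_DIRECT_GL variable named in the post.
bool directRenderingForced()
{
    return isForceValue(std::getenv("KWIN_DIRECT_GL"));
}

// Hypothetical decision: a user override wins, otherwise the result of
// the renderer string check decides.
bool useDirectRendering(bool forced, bool rendererCheckPassed)
{
    return forced || rendererCheckPassed;
}
```

A caller would pass directRenderingForced() as the first argument, so a user who exports KWIN_DIRECT_GL=1 gets direct rendering even when the renderer string check fails.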

A few weeks ago I suggested on our mailing lists to enforce compositing. Overall the response from my fellow developers was that they expected further driver fuckups in the future, causing a hard time for our users. I disagreed and pointed out the improved driver situation and that after the 4.5 fiasco the message had got through. How could I have been so naive? It looks like we will never be able to have a decent composited setup with this mess of a stack :-(


78 thoughts on “Why Blur Does Not Work in Kubuntu Natty With Intel”

  1. *Au* What a mess. May it be an idea to move those check-logic from hardcoded C++ code to QtScript/XML/whatever so it’s possible to replace/modify those logic even after a release using GHNS2?

  2. I told you. You can’t enforce compositing. The drivers will be broken at some point. Currently i have a intel machine where i have to disable compositing, or else firefox/konqueror and some other parts of the desktop won’t update correctly. This is with kubuntu 10.10. it worked perfectly fine on all previous kubuntu releases, and i’m sure such things will happen again. As a user i need a easy way to set the compositing options by myself.

  3. Please correct me if I’m wrong, but isn’t the renderer string simply a label? This seems like a typical web dev scenario where a user agent string is used to determine functionality instead of functionality testing. Is the renderer string really meant to be used this way?

    1. Aaah, that’s brilliant. I can imagine:

      on first boot of your brand new linux:
      “Please sit back and relax! In the next 4 hours, your video card will be tested for 3D compatibility. We have no reliable way to determine what works so you will have to stay here and confirm for every step that indeed it looks like we describe. Expect about 20-30 crashes of your graphics system & a few full system lockups. Oh, and we might break your graphics hardware, sorry about that in advance. Warranty usually doesn’t cover it.”

      Ok, I might be exaggerating a little. But do a “glxinfo” or “glxgears -info” on your commandline to get an idea of what capabilities have to be tested. And much of that needs to be confirmed visually as kwin might figure out if Xorg crashes (or the kernel) but it doesn’t know if you see a black or garbled screen… IOW yes, the drivers report what they support and what not and yes, they should do that properly :D

    2. Imagine a browser which is used by 30 to 50 % of your users. When doing the functional test, the browser might crash. No matter what you test, the functional test will evaluate to “supported” with no chance to know if it is true or not. So what will you do? Probably refer to the user agent, right?

      1. Hook sigabrt and any other relevant signal and evaluate to “not supported”?

        Yeah, more work for the developer. Tough.

        1. … yeah, very good idea if the kernel oopses or X dies. the only solution is for the driver developers to start testing before releases.

  4. Well I’m not surprised, I remember reading complaints from Linus about the poor state of the merge request he has from graphic developers (well to be fair not only them)..

  5. +1!!!!!

    It also happens in Debian Unstable with intel drivers, no blur.

    Anyway, I’m surprised with the incredible speed improvements in KWin. Both my intel netbook and my nouveau laptop are totally smooth with 4.6.1, thank you very very much!!

  6. Thanks for that enlightening blog post. So, I wasn’t the only one experiencing this :)

    I always hated this method of not changing any crucial things in a minor version, even if it would bring a major improvement, so that you then have to wait another half a year. But the more I follow the development process of applications and workspaces, the more I understand why things are as they are and that it definitely makes sense.
    And since I postponed shipping my products till August anyway, now I definitely have a reason for that, I WANT them shipped with blur no matter what! :P

    1. your comments on IRC were in fact the main help I had to understand the problem properly. Thanks for that.

  7. Hi,

    I tried to tell/warn you about this on:
    https://bugs.kde.org/show_bug.cgi?id=243181#c66

    but you answered with a very confusing comment stating that 4.6 should be used with Mesa >= 7.10. And now that you possibly updated to the Mesa 7.10.x release which dropped the GEM string, you’ve actually hit the bug yourself. Sigh.

    Anyway,

    7.9.1 was carrying the GEM string but 7.9.2 dropped it and, as I’ve tried to explain, the effects which generate window thumbnails segfault.

    What I’ve got from this sad story is:

    1. Dropping an harmless identifier which several upstream projects may rely on, during a stable release is insane,

    2. Relying on a bare string for being able to check/test/enable/disable 3D stuff is bad but seems the only way now as you have justified your arguments quite well.

    But,

    I think that this should have been caught by compositing people once the Mesa people dropped that in their repository. That way, they would have a chance to revert that and spin another stable release.

    Thanks,
    Regards

    1. You replied to an old and outdated bug report. Sorry that we did not really consider it, as many “users” do exactly that. I get between 20 and 50 bug mails each day, and correctly getting all the information out of them is difficult. What I replied is even correct: we told you to use 7.10 and your comment did not imply that they broke 7.10 as well.

  8. “Natty will ship 4.6.2 and our next bug fix release will be after the release of Natty and I won’t accept a patch to that code as I fear that it could break for other Intel users.” Does it mean that KDE 4.6.3 will not ship any fix for the bug ?

    1. no, I am not considering fixing this in the 4.6 cycle as it might introduce regressions. I am also unwilling to fix it for 4.7, but I accept patches for 4.7

      1. For Natty and as a short term solution, could perhaps the *buntu developers be persuaded to include a patch to add back the missing part of the string to the x.org driver? It would solve the problem for the Natty release and screen the users from such crap.

        1. That’s my hope and one of the main reasons why I wrote that blog post as I am not able to contact the *buntu devs before the release.

          1. I strongly doubt the Mesa change will ever get reverted. See Eric Anholt’s comment. The renderer string is not intended to be parsed by applications at all, instead you should use APIs which check for DRI2 availability specifically.

            1. What would Linus say: never break your userspace. It doesn’t matter if we should not parse it, we did it, so Mesa has at least to provide compatibility for minor releases. I am not risking to break our code and the fix in Mesa is simple.

              1. > so Mesa has at least to provide compatibility for minor releases

                That is not going to help Fedora 15, which is shipping a snapshot of Mesa 7.11.

                In addition, the change was introduced in 7.9.2. The 7.10.x releases Kubuntu is having the issue with are already the next release series. Even if it got reverted in a hypothetical 7.9.3 release, it probably wouldn’t help any distro affected by this.

          2. The reason for the suggestion to patch the x.org driver is simple: purely the time-frame. As Martin says, there is no time or opportunity to properly test and fix this in KDE. As it sounds, you have the same problem in Fedora: time.

            From the rest of the discussion, I see you have two alternatives.
            1. Make a Fedora specific patch to KDE/KWin, and try to get as much testing of it as possible before release. Not exactly risk free, and may introduce unintended consequences.
            2. Patch back the string into the x.org driver, returning it to a working well known and tested state. Nearly risk free, and you free up resources to properly implement and test a better solution for 4.7.

        1. The patch does not solve anything as that assumes that all Intel drivers work with direct rendering. This cannot be assumed. If that were the case we would not have gone for the “GEM” check. The GEM check indicates that only a subset of all Intel drivers properly supports direct rendering. So applying that patch carries the risk that it will break KWin completely for some users. And we already have such patches in our bugtracker.

          In an ideal world we would not need such hacks, but we have not an ideal world and we need not only to support todays but also yesterdays drivers.

          1. So I had another look at this as a quick fix. AFAICT, the DRI1 i810 driver does not have “Intel” in its rendering string, but “i810″ or “i815″ (with varying prefixes and suffixes, but nothing with “Intel” in it), see http://cgit.freedesktop.org/mesa/mesa/tree/src/mesa/drivers/dri/i810/i810context.c

            So any current Intel driver shipped in any current GNU/Linux distribution should be the DRI2 version. In other words, the patch should be perfectly safe to ship in a current GNU/Linux distribution.

  9. Eventually that was going to happen, relying on GL_RENDERER for checking GEM support with a simple strstr() gets you *nowhere* … that code looks more like what you do when you want to test something quickly and not write the entire thing, not something that you would give in a stable release to actual users, it’s playing a “break my code” game.

    The people working with the Intel driver at mesa should be ashamed of this ..

    1. That code looks like exactly what you get when you have to expect that the driver crashes when trying to use it. That’s why it is in its own application and not in KWin itself. And that’s how code looks if you don’t have the time to perform all required checks because the complete desktop start has to wait for that program to be executed.

      Everything else falls into the area of never break the userspace. It doesn’t matter if the renderer is not supposed to be used. If we use it, it may never be broken. Talk with Linus about breaking userspace ;-)

  10. How about, instead of making these vendor checks hardcoded, describe them in some sort of text file that could be updated through System Preferences or something?

    1. been there – that was our blacklist for 4.5. We were not able to maintain it. We did not get the information from the users to properly update it.

      1. Well, I still think an unmaintained list of checks is better than an unmaintainable list of checks.
        I mean, besides updating through system settings, the other way to update such a list is simply to release a new KDE version including it, which is the same level of maintenance as having them hardcoded.
        Thus, it would be maintained at least as much as the current solution.

        1. if you want to develop the infrastructure on how to evaluate which driver is currently working and to distribute the updates through GHNS in a way that it gets applied to all users automatically and in a secure way, please contact us. If not, sorry don’t have the time to develop something like that ;-)

  11. Does the “Disable functionality checks” option solve this issue? Or is there something that I could do, as an end user, to have my fully working KWin, with these fucked up intel drivers, without starting kwin with KWIN_DIRECT_GL=1 on every boot?

  12. If you wanted to know about DRI2, you should have used DRI2QueryVersion instead of relying on some unrelated, we-can’t-seem-to-scream-loud-enough-that-it’s-not-ABI string that happened to change to a different value at about the same time as DRI2 came about.

        1. thanks, but it won’t fix the problems with Mesa 7.11 :-( There are more changes required. The code also needs to be tested with fglrx and nvidia blobs. Which in the end means that I won’t accept it for 4.6 :-( But feel free to include in Fedora.

            1. we have the kwinglplatform code which more or less is a parser to recognize the chip and enable features based on what the chip supports and not what the extension says what is supported. It is our fix to the problems of 4.5. Due to the changes in the renderer strings it is possible that this code got broken. I don’t know it for sure as we did not yet see any bug reports about it. In the end it is like crashes: blur might be enabled on systems not capable of rendering blur in a decent manner and so on and so on.

    1. Uhm, are you SURE that DRI2QueryVersion is the right API to call here?

      I’m not sure I understand the code in the X server, but what I see is that DRI2QueryVersion always returns success if the DRI2 extension is loaded at all. So I don’t understand what value this offers over just checking that DRI2 is available in the server, and in particular I wonder if we don’t need to do more work (e.g. call DRI2Connect) to actually be sure that DRI2 is actually supported by the driver, not just by the X server.

      Then there’s the issue that maybe we don’t want to check for DRI2 at all, but for something different, see https://bugs.kde.org/show_bug.cgi?id=270942#c9

    1. That’s what they did now, according to a comment signed “Mile”. But that’s a distro-specific hack which doesn’t really solve the problem for everyone else, unless it’s applied to Mesa upstream (both 7.10 and 7.11).

  13. Please do fix this for 4.6.3:
    * Fedora 15 will be shipping an even newer Mesa (7.11 20110412 snapshot) than Natty’s (7.10.2).
    * Fedora 15 will either ship KDE SC 4.6.3 or have it pushed as an early (possibly 0-day) bugfix update.
    * If you point us to the fix to backport, we will also happily backport your fix to our 4.6.2 packaging as soon as it’s ready. (You’re complaining about the lack of communication between Mesa and KWin, so please give a good example by communicating with us. You can reach all distro packagers through kde-packager at kde dot oh are gee or Fedora packagers specifically through kdebase-workspace-owner at fedoraproject dot oh are gee.)
    * Fixing that renderer string check to work with the current Mesa is a bugfix and as such perfecly acceptable (in fact, even desired) for a bugfix release of KDE.

    Kubuntu isn’t the only distribution affected by this!

    1. Kevin, sorry but I am just in the US for a business trip – I do not have the time to contact you and it is nothing we can fix in KWin. It needs to be fixed by reverting the offending commit in Mesa. If we “fix” this in KWin there is the chance that other drivers will break, which is an unacceptable risk for a minor release.

      1. This can and should be fixed in KWin by using the DRI2QueryVersion API instead of your string parsing hacks, as pointed out by the Mesa developer Eric Anholt.

        If you don’t come up with a proper patch, it is very likely that Fedora 15 will ship with something like that http://sarvatt.com/downloads/patches/kdebase-workspace.patch patch linked by J. Carlos. (Sorry, I don’t give a darn if the ancient DRI1 i810 driver breaks. We only ever use it for GMA chips older than 915, and even then it’s not installed by default, all DRI1 drivers are in a subpackage which is not installed by default, you get software rendering by default on such ancient chips in Fedora. It shall also be noted that we disable desktop effects by default anyway, exactly because drivers are always broken, so this only affects what happens if they get enabled.) But it is clear that the patch is a hack, just like your existing code.

        1. It is a change I am not willing to accept for a minor revision. In opposite to the Mesa devs I do care of not breaking in minor revisions. Such changes are dangerous especially if you do not have the hardware to test.

          1. Which change is the one you don’t want? Using DRI2QueryVersion or the s/GEM/Intel/ hack or both?

            Using DRI2QueryVersion is the proper fix and should be exactly equivalent to the existing code except for fixing the problem with the latest Mesa Intel driver.

            1. both! The risk of breaking systems is too high for a minor release! Please accept this decision by someone who has run into such issues several times. I am not risking our users updating to a new minor revision. There is only one solution: revert the Mesa change and get a proper fix for 4.7 and Mesa 7.11. But no playing catch-up during minor revisions!

              1. It is you who will have to accept our decision, the decision what to ship in our distribution is not yours to make.

                So if you want us to ship something that works well, it is in your best interest to ship the best possible patch, or we may well end up shipping with whatever hack works best for us.

                1. quite simple: I don’t have the time to do a fix before Fedora will be released. It is not my fault that upstream broke us. I do not have the possibility to write the patch and not the possibility to test it. Sorry but I cannot do anything about it. All I could do is ring the alarm – which I did yesterday.

                  1. (The threading/sorting behavior of this comment board is bizarre, reposting this as a reply.)

                    Our change deadline for Fedora 15 is May 9, that’s almost 3 weeks to try to find a solution. (And if not, there’s always updates. Yes, I would push this kind of change as an update, without hesitating even.)

  14. Our change deadline for Fedora 15 is May 9, that’s almost 3 weeks to try to find a solution. (And if not, there’s always updates. Yes, I would push this kind of change as an update, without hesitating even.)

    1. This week I am at a business trip in the United States, I will be back at home on Saturday and leave to Tokamak on Monday where I will only have the NVIDIA notebook. The week afterwards I have to be at a conference for work – I do not see any chance to work on that before May 9.

  15. The distros can ship mesa packages which have the “GEM” word in the driver string. Doesn’t that solve their problem?

    Of course, it is not the long-term fix for the bug.

  16. This issue is now fixed in natty:

    Version 7.10.2-0ubuntu2:

    [ Felix Geyer ]
    * Add 114_intel_dri_renderer_string.diff: Re-add “GEM” to the dri renderer
    string of the intel driver. Removing it breaks KDE’s detection for the blur
    effect. (LP: #753370)

  17. thank you martin … you helped me understand and over come a few rendering issues on my machine :)

  18. just wanted to add here that since i upgraded to openSuSE 11.4, it’s the first time I get the “blur” effects to work properly with my integrated Intel (965m) graphics. since then, kwin runs perfectly smooth and i experienced no glitches so far.
    it’s a nice counter-example that distribution upgrades can sometimes actually improve your desktop experience :)

  19. It seems a little weird that the entire graphics world is still stuck in this mentality that we need to whitelist or blacklist certain drivers or renderer strings because user trial-and-error is the only way to know what functionality they support.

    In reality, anybody should be able to write a GL application without caring what kind of drivers or hardware it is going to be deployed on, instead using what OpenGL tells the application is actually supported on the driver. If this is broken on the driver, it is never a good idea to work around that by disabling parts for certain driver strings that we don’t know for sure work – instead we should ask the manufacturer of that driver to actually fix their code or only report things in the gl extensions which actually work. If we continue to work around things like this then it only hides the real bugs that the driver developers need to fix and it means that whenever they do something small and possibly unrelated which breaks our workaround, it means that those bugs are revealed in their drivers.

  20. You guys, seriously.

    I’m a nontechnical KDE desktop user. Do you really want to make me patch and compile kdebase-workspace, just because of some squabble between the driver and the desktop developers? Do you expect nontechnical users to even understand this? If I were not as experienced, something like this could drive me away from open source. Just like they say above, don’t break userspace over something like this.

    I side with the distributions that are pushing out the patch, and with those who say that KDE should accept the patch to work with the new mesa drivers, rather than insisting that the stack be reverted.

    1. I thought I made it clear that we cannot change that in 4.6 without risking breakage for other users of older drivers. We changed the code for 4.7 by requiring the latest Mesa version.

      Now about inexperienced users: they should not be affected as it is the distro’s job to ensure the code works together.

      1. It appears to me that there are patches posted right here in your thread which fix the issue, and that the answer to all of your technical concerns about supporting older drivers falls within your purvey as maintainer of this package. But even when other people do the coding for you, you come back and refuse to accept their patches, either on the ground that you are too busy, or that you don’t want to.

        This leaves regular nontechnical users with no redress, and it will really cost the open source world population. I can point to beginning ubuntu users who are not using newer versions of your desktop environment, or who are not using linux at all, because of this issue.

        If your position now is that it’s the job of the distros to patch your code, you are, I might say, forcing them to do your job because you refuse to, and making everyone’s lives more difficult. Fixes should be made upstream, and otherwise it would be treading on your toes, so it should not be their job to fix flaws which you intentionally leave there.

        1. > It appears to me that there are patches posted right here in your thread which fix the issue, and that the answer to all of your technical concerns about supporting older drivers falls within your purvey as maintainer of this package. But even when other people do the coding for you, you come back and refuse to accept their patches, either on the ground that you are too busy, or that you don’t want to.

          Did you check whether any of the patches works? Did you check whether the patches do not introduce any regressions? Do you seriously know the KWin code better than I do? Did you know that I fixed the broken check in 4.7 and that my fix is not based on any of the patches as I did a proper fix?

          > This leaves regular nontechnical users with no redress, and it will really cost the open source world population. I can point to beginning ubuntu users who are not using newer versions of your desktop environment, or who are not using linux at all, because of this issue.

          Wow that’s some bullshit. Ubuntu did revert the offending patch in Mesa so no Ubuntu user can be affected by this issue and won’t be affected.

          > If your position now is that it’s the job of the distros to patch your code, you are, I might say, forcing them to do your job because you refuse to, and making everyone’s lives more difficult. Fixes should be made upstream, and otherwise it would be treading on your toes, so it should not be their job to fix flaws which you intentionally leave there.

          Yes fixes should be done upstream. Upstream for us is Mesa. The regression in Mesa should have been fixed in Mesa. It was not. So I fixed the situation in our next major version by introducing a slight regression compared to our last major version. I did exactly what you asked for.

          It was impossible to “fix” this in the lifetime of 4.6 as it would have introduced a regression which violates KDE’s policy of doing regression free minor releases. This is something you have to accept. It is a pity that a regression was introduced in a minor release of Mesa without being spotted, but it is no justification to knowingly introduce a regression in our software.

  21. Even then, choosing between XAA and EXA was quite contentious: EXA would thrash memory badly, while XAA would effectively disable acceleration for pixmaps as soon as it ran out of its tiny off-screen space… In moving towards our eventual goal of a KMS/GEM/DRI2 world, we’ve felt obligated to avoid removing options until that goal worked best for as many people as possible. So instead of forcing people to switch to brand new code that hasn’t been entirely stable or fast, we’ve tried to make sure that each release of the driver has at least continued to work with the older options… However, some of the changes we’ve made have caused performance regressions in these older options, which doesn’t exactly make people happy: the old code runs slow and the new code isn’t quite ready for prime time in all situations.

  22. Did you do anything about that or *they*? Since I cannot get blur to work at all on 4.7. Even using this KWIN_DIRECT_GL=1 does not make it work :( whereas it does just fine on 4.6
