About network access, fuzzy specifications and non-POSIX calls with window managers

KWin supports a feature to recognize windows from a remote host. If KWin recognizes such a window it adds the host name (as provided by the property WM_CLIENT_MACHINE) to the caption. This is a very handy feature if you work with remote systems and use X11 network transparency. But it is also a feature hardly known or needed by most users.

Unfortunately this feature does not work properly for LibreOffice, because LibreOffice uses the FQDN instead of the hostname and KWin checked for the hostname. Some time ago the problem got fixed by using getdomainname to work around the LibreOffice situation. But this does not work in all cases and introduced issues with non-Linux systems as it is a non-POSIX call[1].
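To make the mismatch concrete, here is a minimal sketch in Python (not KWin's actual C++ code). The function name `is_local_by_name` and its parameters are hypothetical; `domain` stands in for whatever a call like getdomainname returned on the system.

```python
def is_local_by_name(client_machine, hostname, domain=""):
    """Return True if WM_CLIENT_MACHINE names this machine, judged by string comparison only."""
    # Original check: only the short hostname matches.
    if client_machine == hostname:
        return True
    # First fix (as sketched here): also accept "hostname.domain".
    # This only helps if `domain` really is the DNS domain of the FQDN.
    return bool(domain) and client_machine == f"{hostname}.{domain}"


# A toolkit sending the short hostname is recognized as local.
print(is_local_by_name("desk", "desk"))
# An application sending the FQDN is treated as remote by the original check.
print(is_local_by_name("desk.example.org", "desk"))
# The workaround helps only when the reported domain matches.
print(is_local_by_name("desk.example.org", "desk", "example.org"))
```

The second call is the LibreOffice situation: the window is local, but the plain comparison says otherwise.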

Of course one might ask why we don’t fix LibreOffice if only their usage of FQDN is causing a problem instead of fixing KWin. In this case it’s quite simple: LibreOffice is doing it right, everyone else is doing it wrong. Let me quote a section from the NETWM specification:

If _NET_WM_PID is set, the ICCCM-specified property WM_CLIENT_MACHINE MUST also be set. While the ICCCM only requests that WM_CLIENT_MACHINE is set “ to a string that forms the name of the machine running the client as seen from the machine running the server” conformance to this specification requires that WM_CLIENT_MACHINE be set to the fully-qualified domain name of the client’s host.

Of course the specification does not say anything about the case where the client’s host is the local host. I would assume that the specification only considers the remote host case, but this is my personal interpretation based on the overall fuzziness of the specification. Given that it doesn’t say anything about the local system case, LibreOffice’s interpretation, providing the FQDN, is absolutely correct. Also I can understand that one doesn’t want to maintain two code paths.

Now the fun part is that it looks like everyone else is not NETWM compliant on that point. In preparation for this blog post I forwarded a GTK+ based and a Qt (4) based application to another system and looked at the properties. _NET_WM_PID is set and WM_CLIENT_MACHINE contains the hostname, not the FQDN.

The fact that LibreOffice was always considered a remote system went unnoticed for quite some time. Personally I’m surprised by that and can only assume that users thought the caption was supposed to look like that. I myself try not to use office applications, and if I have to, I use applications of the Calligra suite. But apparently it got noticed in the Trinity fork and a patch was prepared there. After my last rant, the developer who wrote the patch proposed it for inclusion on ReviewBoard. The patch used getaddrinfo to resolve the domain name for the provided client’s hostname. We decided against the patch because it is a blocking call to the network and in KWin we don’t do blocking calls. A blocking call in a window manager is pretty bad as it means that you can no longer interact with your windowing system in any way that would require a window manager. A blocking call in the compositor is deadly as the screen does not get updated any more: the system appears to be frozen. That’s why we have a clear no-blocking-calls policy. This problem has been communicated to the Trinity developers and I suggested that they revert the patch [2].

I had put quite some thought into the problem and realized that it’s not going to be an easy fix and requires some internal rework to ensure that KWin can properly resolve whether a window is on the local machine even if it does not provide the hostname. Recently I sat down and turned my thoughts into working code. The general idea is to split out the complete hostname resolving into its own class to have it encapsulated. This class can report whether the hostname it encapsulates is on the local machine, so that we don’t have to query again and again. That was a problem I noticed when looking at the code: whenever the information was needed, it was queried again, which makes the interaction quite difficult if we are not allowed to do sync calls.
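The encapsulation idea can be sketched roughly like this, in Python rather than KWin's C++. The class name `ClientMachine` and the injected `is_local_check` callable are hypothetical and only illustrate "ask once, remember the answer"; the real class is not shown in this post.

```python
class ClientMachine:
    """Holds one window's WM_CLIENT_MACHINE value and remembers whether it is local."""

    def __init__(self, hostname, is_local_check):
        self.hostname = hostname
        self._check = is_local_check  # hypothetical callable: hostname -> bool
        self._is_local = None         # None means "not determined yet"

    @property
    def is_local(self):
        # Run the (possibly expensive) check only on first access, then reuse the result.
        if self._is_local is None:
            self._is_local = bool(self._check(self.hostname))
        return self._is_local


calls = []

def counting_check(name):
    # Stand-in for the real locality test; records how often it is invoked.
    calls.append(name)
    return name == "desk"

machine = ClientMachine("desk", counting_check)
answers = [machine.is_local, machine.is_local, machine.is_local]
print(answers, len(calls))
```

Every caller that needs the information asks the object instead of repeating the query, which is what makes a later switch to an asynchronous lookup manageable.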

The resolving works as it used to. First the hostname comparison is used. If this does not match and the hostname looks like an FQDN, we do the resolving with getaddrinfo, but through a helper class and in a background thread. When the information is finally available, a signal is emitted saying that it’s a local system, allowing the external world to react to it, e.g. to update the window’s caption. Interestingly, when I worked on the code and started with the existing patch I noticed that it did not work correctly: with it, a remote window with an FQDN was considered local if the name was resolvable. So overall we now have four approaches to get it right (initial code, first fix, Trinity fix and my fix), which shows that this is quite a non-trivial task. I wanted to be sure that it does not break ever again, and if it does, that we understand it. Therefore I wrote a unit test to cover the cases. I’m rather happy about that test as it is the first unit test I added which actually talks with X. So far I had only written test code for non-X11 code.
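The two-stage flow described above could look roughly like the following Python sketch. It is an illustration under assumptions, not KWin's implementation: `check_locality`, `looks_like_fqdn`, `canonical_names` and the `on_local` callback (standing in for the emitted signal) are hypothetical names, and comparing against the local machine's canonical names is one possible way to avoid treating "resolvable" as "local".

```python
import socket
import threading


def canonical_names(host):
    """Blocking lookup: canonical names getaddrinfo reports for `host`."""
    try:
        infos = socket.getaddrinfo(host, None, flags=socket.AI_CANONNAME)
    except OSError:
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr).
    return {info[3].lower() for info in infos if info[3]}


def looks_like_fqdn(name):
    # Assumption for this sketch: a dot inside the name marks an FQDN.
    return "." in name.strip(".")


def check_locality(client_machine, on_local, local_hostname=None,
                   lookup=canonical_names):
    """Call on_local() if the window is local; return the worker thread, if any."""
    local_hostname = local_hostname or socket.gethostname()
    client = client_machine.lower()
    # Stage 1: cheap, non-blocking string comparison.
    if client == local_hostname.lower():
        on_local()
        return None
    if not looks_like_fqdn(client):
        return None  # a different short name: treat as remote
    # Stage 2: the blocking lookup runs off the calling thread.
    def worker():
        # Local only if the FQDN is one of *our* canonical names;
        # merely being resolvable is not enough.
        if client in lookup(local_hostname):
            on_local()
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

Until the worker finishes, the caller keeps treating the window as remote; in this sketch `on_local` runs on the worker thread, so a real GUI program would have to hand the notification back to its main loop.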

Unfortunately there are still cases where the information will be provided too late and a local system will be considered remote. For example, the rule system and session management may also need this information. Not much can be done here. For window rules it does not really matter, as by default we do not match the host at all, and even when a rule does, it’s more likely to match a remote system than the local one. Adjusting the rule to match an FQDN would also work.

Overall this has been one of the most interesting bug fixes I had worked on recently which motivated this blog post. As this is a rather large change it is not going to be seen in 4.10, but only available in master (4.11). A decision made due to the long time the issue had not been noticed, which implies that it’s not a real issue for our users.

[1] This is a classic example of why I think it doesn’t make sense to state that we support non-Linux Unix systems. We break the code without noticing, because nobody is even trying to compile the code on non-Linux systems (there is also no CI system for them). The non-Linux systems always have to catch up and fix our stuff, and then they have a hard time getting the changes back upstream. This issue got reported to us with a patch attached. But I did not accept the patch as I spotted a possible issue, and unfortunately there was no further attempt to improve the patch. I couldn’t do it myself as I lacked the operating system to test it. Probably it was easier to carry the patch downstream than to get it into shape for upstream inclusion.

[2] That a commit entered Trinity which did not pass code review in KWin for being dangerous is not surprising. This is no Trinity bashing, but it’s something to be expected when working on a foreign code base in an area that you don’t know. How should a Trinity developer know that you are not allowed to do blocking calls in a window manager? And even if you consider that obvious: how should a developer develop the instinct to look at a piece of code and instantly ask “is this blocking?” This requires experience in working with a window manager and is something I explained to the Trinity developers in my very first mail to them, where I suggested dropping their fork of KWin for exactly such reasons:

Working on a window manager and compositor comes with great responsibility. It is one of the most complex parts of the desktop environment and introduced bugs affect all users and can be really harmful and very difficult to debug. Developing a window manager is not trivial and you have to understand how the window manager works

And this is not a problem just for Trinity, it’s a problem for all such forks, be it Mate’s Metacity fork or Cinnamon’s Mutter fork “Muffin” or now Consort’s fork of whatever window manager they use as a base (probably a GTK3 version of Metacity). I can only suggest working together with the upstreams and unforking what can be unforked.

7 Replies to “About network access, fuzzy specifications and non-POSIX calls with window managers”

  1. I wonder what happens if the remote system has no proper DNS. Is it possible for this property to end up being an IP address?

    1. good question. It probably depends on how the toolkit implements such a case. But much more interesting is: how would KWin handle such a case? Have to try that one.

  2. >Therefore I wrote a unit test to cover the cases. I’m rather happy about that >test as it is the first unit test I added which actually talks with X

    This is an integration test (tests two or more components together)
    A unit test would mock the x server and tests ONLY your code 😉

  3. But what can really be “unforked” if you are unhappy enough with the direction of the project to fork it in the first place? If you dislike the actual architecture, or the architecture no longer supports features you want, and upstream is uninterested in your point of view, well you get a fork.
