One of the most important work areas in KWin is currently the porting from XLib to XCB. The main reason for this is that Qt 5 threw out the existing XLib based code and replaced it by a new XCB-based implementation using QPA. This is overall a good thing. But it means that applications which used native features like the X11-event filter need to be adjusted. There are a few areas which need porting, which I summarized for KWin (hopefully completely) in this wiki page. If you use some native interaction it might make sense to check the list.
Technically we would not be required to port everything to XCB, but it gives us quite some advantages to do so. XCB is based directly on the X11 protocol which means we see better in the code what is going on. Also XCB is a complete asynchronous API which means we wait less for the X Server. This brings a noticeable improvement.
I want to present an example from KWin to show the difference between the synchronous XLib approach and the asynchronous XCB approch.
XWindowAttributes attr; Window dummy; Window* windows = NULL; unsigned int count = 0; QRect screen = QRect(0, 0, displayWidth(), displayHeight()); QVector<Window> proxies; XQueryTree(display(), rootWindow(), &dummy, &dummy, &windows, &count); for (unsigned int i = 0; i < count; ++i) { if (m_screenEdgeWindows.contains(windows[i])) continue; if (XGetWindowAttributes(display(), windows[i], &attr)) { if (attr.map_state == IsUnmapped) continue; const QRect geo(attr.x, attr.y, attr.width, attr.height); if (geo.width() < 1 || geo.height() < 1) continue; // some other checks proxies << windows[i]; } }
So what in this code is synchronous? In line 7 we ask the X Server for all windows – of course synchronous. Then we go through the list and get for each window we are interested in the window attributes (line 11). In the X protocol that are two requests to the server – one for the attributes and one for the geometry. Overall this means we have a synchronous call for each window we have to check. I recently looked at a delay when ending Present Windows or Desktop Grid and noticed that this for loop can take up to 500 msec.
Now let’s look at the same thing done with XCB (note: I did not compile it):
xcb_connection_t *c = connection(); xcb_query_tree_cookie_t cookie = xcb_query_tree_unchecked(c, rootWindow()); QVector<xcb_window_t> proxies; QVector<xcb_get_window_attributes_cookie_t> attribsCookies; QVector<xcb_get_window_geometry_cookie_t> geometryCookies; QRect screen = QRect(0, 0, displayWidth(), displayHeight()); xcb_query_tree_reply_t *tree = xcb_query_tree_reply(c, cookie, NULL); xcb_window_t *windows = xcb_query_tree_children(tree); for (unsigned int i=0; i<tree->children_len; i++) { attribsCookies << xcb_get_window_attributes_unchecked(c, windows[i]); geometryCookies << xcb_get_geometry_unchecked(c, windows[i]); } for (unsigned int i=0; i<attribsCookies.size(); i++) { if (m_screenEdgeWindows.contains(windows[i])) { xcb_discard_reply(c, attribCookies.at(i)); xcb_discard_reply(c, geometryCookies.at(i)); continue; } xcb_get_window_attributes_reply_t *attrib = xcb_get_window_attributes_reply(c, attribsCookies.at(i), NULL); if (attrib->map_state == XCB_MAP_STATE_UNMAPPED) { xcb_discard_reply(c, geometryCookies.at(i)); free(attrib); continue; } xcb_get_window_geometry_reply_t *geometry = xcb_get_window_geometry_reply(c, geometryCookies.at(i)); const QRect geo(geometry->x, geometry->y, geometry->width, geometry->height); if (geo.width() < 1 || geo.height() < 1) { free(attrib); free(geometry); continue; } // some other checks; proxies << windows[i]; free(attrib); free(geometry); } free(tree);
This example shows nicely the advantage of having an asynchronous API. We first have one loop to send all requests to the X-Server and then fetch the reply in a second loop. But we also see quite some disadvantages. First of all the_naming_is_just_awfull() and becomes worse when working with extensions. It’s nothing special for XCB, it’s a common problem to all C APIs, I always have to cringe when seeing such code. Also the method arguments are rather unhandy. We need to pass an xcb_connection_t* parameter to each method call. Fair enough the same had been the case with XLib, but still we have just one connection to the X-Server.
If we do not want to check for errors, we have to use the _unchecked variant of the method. It’s again a general C API problem by not having the possibility to overload methods.
And a real problem which is nicely visible in the code fragment is that the reply calls return a pointer and pass the responsibility for the pointer to the client of the API. For someone being used to a memory managed programming language like C++ with Qt that’s really painful. I expect to get back const references and not pointers I have to free.
Also if we do not need the reply for a request, we need to discard it, otherwise we would leak a reply. This makes the whole code segment rather unhandy.
We did not really like it and worked on improving this in general over multiple iterations. The whole thing can easily become memory managed by using a QScopedPointer with a QScopedPointerPodDeleter. That way we don’t have to call free on the replies any more. When the QScopedPointer goes out of scope the memory will be freed implicitly.
But that’s not all we did to improve it and I show now the same code segment how it looks like after applying our Wrapper classes:
QVector<Xcb::WindowId> ownWindows = windows(); Xcb::Tree tree(rootWindow()); QVector<Xcb::WindowAttributes> attributes(tree->children_len); QVector<Xcb::WindowGeometry> geometries(tree->children_len); Xcb::WindowId *windows = tree.children(); QRect screen = QRect(0, 0, displayWidth(), displayHeight()); QVector<Xcb::WindowId> proxies; int count = 0; for (unsigned int i = 0; i < tree->children_len; ++i) { if (ownWindows.contains(windows[i])) { // one of our screen edges continue; } attributes[count] = Xcb::WindowAttributes(windows[i]); geometries[count] = Xcb::WindowGeometry(windows[i]); count++; } for (int i=0; i<count; ++i) { Xcb::WindowAttributes attr(attributes.at(i)); if (attr.isNull() || attr->map_state == XCB_MAP_STATE_UNMAPPED) { continue; } Xcb::WindowGeometry geometry(geometries.at(i)); if (geometry.isNull()) { continue; } const QRect geo(geometry.rect()); if (geo.width() < 1 || geo.height() < 1) { continue; } // further checks proxies << attr.window(); }
It’s a RAII approach ensuring like the QScopedPointer that the memory is freed once the object goes out of scope. In addition we got completely rid off the connection we have to pass to the method calls and we do no longer have to call discard: our Wrapper class is taking care of it. What you can also see is that the fact that the API is asynchronous is completely hidden. It still is async, the request is done in the constructor and the reply is fetched once you try to access the data for the first time.
Overall it makes the code much cleaner and easier to use and ensures that we do not leak memory even if we use loops with various code paths jumping out of them. Also we know that the code works (unit tested) and ensures that we do not repeat ourselves.
In the background we have a templated class which allows us to easily wrap any XCB request/reply group. So far we only support method calls operating on one xcb_window_t argument, but once we enabled C++11 in KWin we can easily wrap any command through variadic templates.
If you find this interesting: we have lots of work for XCB porting. Just come to #kwin on freenode and ping me (mgraesslin) or send a mail to kwin@kde.org.
“the_naming_is_just_awfull()” There’s nothing wrong with underscores. In fact, I find them more pleasing to the eye, less shaky, calmer, closer to the English langage. Stroustrup, the C++ STL and boost use underscores, Ruby and Python use underscores. It’s not like camel case is better or something. You can argue about C’s lack of namespaces (the Q prefixes for Qt types aren’t better, btw), I’ll give you that, but not about underscores.
I consider it as awfull because my code lines get to long. In the xcb only example I split the query tree into two parts. Normally we would do that in a single call resulting in something like: tree = xcb_query_tree_reply(connection(), xcb_query_tree_unchecked(connection(), rootWindow()), NULL);
ScopedCPointer
That’s an example which still fitt’s on the screen, so let’s take another one from the composite extension: overlay = xcb_composite_get_overlay_window_reply(connection(), xcb_composite_get_overlay_window_unchecked(connection(), rootWindow()), NULL);
ScopedCPointer
It just doesn’t fit in a line any more and that makes it awfull.Yes it’s the lack of namespacing which is worked around by name_othername_ and that’s something I don’t like. And yes I find it bad that Qt doesn’t use namespaces and I would have considered it a huge improvment if all classes would have dropped the Q prefix. E.g. QString -> Qt::String
That I prefer camel case over underscores is of course personal taste.
Thinking about it more: disliking underscores is in the context of our software development not personal taste. All the libraries we use, use camel case, so using xcb with underscores is a complete violation of the coding style we use. People might discard that as not important, but in my opinion it’s important to have an easy to read code base. Consistency in the coding is important. Either everything underscore or everything camel case. But no some things that way, some thing another way.
In the context of Qt and KDE you’re right, but in the context of the Linux plumbing layer or the kernel it’s perfectly in style.
Re long lines: nothing wrong with breaking a statement into multiple lines or assigning the result of a function call to a local variable.
Obviously I’m only talking about Qt and KDE. After all I’m a KDE developer and do not care develop on the Linux plumbing layer or the kernel (wouldn’t touch that as it’s C and not C++).
I find breaking long lines as bad coding style. It just shows that something is wrong with the coding. Either too many indentation levels or too many arguments passed to a method. I have not seen a case where it wouldn’t show that serious refactoring is needed. Same is obvious true for assigning to local variables. If I assign a variable to use it only once because otherwise the line gets too long, it is just a sign that something is wrong in general.
As you can see from these comments: the look of code is very important to me.
How’s the port to Canonicals new display server progressing?
Well it got a little bit pushed back as I was too long laughing about a nice comment on reddit:
Seriously: I doubt that Canonical will succeed if they go for an own display server. I always have to remember how MS promised Unity running on Wayland in 12.04. I cannot see how toolkits will add support for a system used by only one distribution and I cannot think how software will be ported to it and I cannot imagine driver vendors adjusting just for Ubuntu. It will be difficult enough to get Wayland support, but that’s at least pushed by Intel.
It would probably mean the end to all alternative Ubuntu distributions like Kubuntu, Lubuntu, Mint, etc. Overall good luck, but I doubt we will ever see anything of it.
My personal guess is that it’s just a rebranded/forked Wayland, just like the Ubuntu Kernel.
Oh and of course it will be no big deal to adopt KWin to any other windowing system once we have ported to Wayland. But I don’t see that happening for an Ubuntu system given that it’s not possible to run Ubuntu software outside Ubuntu and given the way how Unity evolves I doubt that Canonical actually has interest in making it possible to run their stack on something else than Unity.
They are heading in the same “fork the whole middleware” like Android. Systemd is the best example, so going for an own display manager is consequent in the current strategy. It’s sad to see this split in the Linux world happening, but it’s nothing really surprising.
The XCB code (both versions) still looks very synchronous to me, i.e. I don’t see any jump/exit back to the event loop.
As far as I can tell the only difference is that the XCB versions are using command pipelining.
yeah it blocks once you access the reply
In the beginning you mention that you see a delay of up to 500ms for the xlib code. What’s the corresponding delay for the xcb code?
in the “I can no longer properly measure it” department