Server : Apache System : Linux server1.cgrithy.com 3.10.0-1160.95.1.el7.x86_64 #1 SMP Mon Jul 24 13:59:37 UTC 2023 x86_64 User : nobody ( 99) PHP Version : 8.1.23 Disable Function : NONE Directory : /usr/share/doc/systemd/ |
A few hints on supporting kdbus as backend in your favorite D-Bus library. ~~~ Before you read this, have a look at the DIFFERENCES and GVARIANT_SERIALIZATION texts you find in the same directory where you found this. We invite you to port your favorite D-Bus protocol implementation over to kdbus. However, there are a couple of complexities involved. On kdbus we only speak GVariant marshaling, kdbus clients ignore traffic in dbus1 marshaling. Thus, you need to add a second, GVariant compatible marshaler to your library first. After you have done that: here's the basic principle how kdbus works: You connect to a bus by opening its bus node in /sys/fs/kdbus/. All buses have a device node there, it starts with a numeric UID of the owner of the bus, followed by a dash and a string identifying the bus. The system bus is thus called /sys/fs/kdbus/0-system, and for user buses the device node is /sys/fs/kdbus/1000-user (if 1000 is your user id). (Before we proceed, please always keep a copy of libsystemd next to you, ultimately that's where the details are, this document simply is a rough overview to help you grok things.) CONNECTING To connect to a bus, simply open() its device node and issue the KDBUS_CMD_HELLO call. That's it. Now you are connected. Do not send Hello messages or so (as you would on dbus1), that does not exist for kdbus. The structure you pass to the ioctl will contain a couple of parameters that you need to know, to operate on the bus. There are two flags fields, one indicating features of the kdbus kernel side ("conn_flags"), the other one ("bus_flags") indicating features of the bus owner (i.e. systemd). Both flags fields are 64bit in width. When calling into the ioctl, you need to place your own supported feature bits into these fields. This tells the kernel about the features you support. When the ioctl returns, it will contain the features the kernel supports. If any of the higher 32bit are set on the two flags fields and your client does not know what they mean, it must disconnect. The upper 32bit are used to indicate "incompatible" feature additions on the bus system, the lower 32bit indicate "compatible" feature additions. A client that does not support a "compatible" feature addition can go on communicating with the bus, however a client that does not support an "incompatible" feature must not proceed with the connection. When a client encountes such an "incompatible" feature it should immediately try the next bus address configured in the bus address string. The hello structure also contains another flags field "attach_flags" which indicates metadata that is optionally attached to all incoming messages. You probably want to set KDBUS_ATTACH_NAMES unconditionally in it. This has the effect that all well-known names of a sender are attached to all incoming messages. You need this information to implement matches that match on a message sender name correctly. Of course, you should only request the attachment of as little metadata fields as you need. The kernel will return in the "id" field your unique id. This is a simple numeric value. For compatibility with classic dbus1 simply format this as string and prefix ":1.". The kernel will also return the bloom filter size and bloom filter hash function number used for the signal broadcast bloom filter (see below). The kernel will also return the bus ID of the bus in a 128bit field. The pool size field specifies the size of the memory mapped buffer. After the calling the hello ioctl, you should memory map the kdbus fd. In this memory mapped region, the kernel will place all your incoming messages. SENDING MESSAGES Use the MSG_SEND ioctl to send a message to another peer. The ioctl takes a structure that contains a variety of fields: The flags field corresponds closely to the old dbus1 message header flags field, though the DONT_EXPECT_REPLY field got inverted into EXPECT_REPLY. The dst_id/src_id field contains the unique id of the destination and the sender. The sender field is overridden by the kernel usually, hence you shouldn't fill it in. The destination field can also take the special value KDBUS_DST_ID_BROADCAST for broadcast messages. For messages intended to a well-known name set the field to KDBUS_DST_ID_NAME, and attach the name in a special "items" entry to the message (see below). The payload field indicates the payload. For all dbus traffic it should carry the value 0x4442757344427573ULL. (Which encodes 'DBusDBus'). The cookie field corresponds with the "serial" field of classic dbus1. We simply renamed it here (and extended it to 64bit) since we didn't want to imply the monotonicity of the assignment the way the word "serial" indicates it. When sending a message that expects a reply, you need to set the EXPECT_REPLY flag in the message flag field. In this case you should also fill out the "timeout_ns" value which indicates the timeout in nsec for this call. If the peer does not respond in this time you will get a notification of a timeout. Note that this is also used for security purposes: a single reply messages is only allowed through the bus as long as the timeout has not ended. With this timeout value you hence "open a time window" in which the peer might respond to your request and the policy allows the response to go through. When sending a message that is a reply, you need to fill in the cookie_reply field, which is similar to the reply_serial field of dbus1. Note that a message cannot have EXPECT_REPLY and a reply_serial at the same time! This pretty much explains the ioctl header. The actual payload of the data is now referenced in additional items that are attached to this ioctl header structure at the end. When sending a message, you attach items of the type PAYLOAD_VEC, PAYLOAD_MEMFD, FDS, BLOOM_FILTER, DST_NAME to it: KDBUS_ITEM_PAYLOAD_VEC: contains a pointer + length pair for referencing arbitrary user memory. This is how you reference most of your data. It's a lot like the good old iovec structure of glibc. KDBUS_ITEM_PAYLOAD_MEMFD: for large data blocks it is preferable to send prepared "memfds" (see below) over. This item contains an fd for a memfd plus a size. KDBUS_ITEM_FDS: for sending over fds attach an item of this type with an array of fds. KDBUS_ITEM_BLOOM_FILTER: the calculated bloom filter of this message, only for undirected (broadcast) message. KDBUS_ITEM_DST_NAME: for messages that are directed to a well-known name (instead of a unique name), this item contains the well-known name field. A single message may consists of no, one or more payload items of type PAYLOAD_VEC or PAYLOAD_MEMFD. D-Bus protocol implementations should treat them as a single block that just happens to be split up into multiple items. Some restrictions apply however: The message header in its entirety must be contained in a single PAYLOAD_VEC item. You may only split your message up right in front of each GVariant contained in the payload, as well is immediately before framing of a Gvariant, as well after as any padding bytes if there are any. The padding bytes must be wholly contained in the preceding PAYLOAD_VEC/PAYLOAD_MEMFD item. You may not split up basic types nor arrays of fixed types. The latter is necessary to allow APIs to return direct pointers to linear arrays of numeric values. Examples: The basic types "u", "s", "t" have to be in the same payload item. The array of fixed types "ay", "ai" have to be fully in contained in the same payload item. For an array "as" or "a(si)" the only restriction however is to keep each string individually in an uninterrupted item, to keep the framing of each element and the array in a single uninterrupted item, however the various strings might end up in different items. Note again, that splitting up messages into separate items is up to the implementation. Also note that the kdbus kernel side might merge separate items if it deems this to be useful. However, the order in which items are contained in the message is left untouched. PAYLOAD_MEMFD items allow zero-copy data transfer (see below regarding the memfd concept). Note however that the overhead of mapping these makes them relatively expensive, and only worth the trouble for memory blocks > 512K (this value appears to be quite universal across architectures, as we tested). Thus we recommend sending PAYLOAD_VEC items over for small messages and restore to PAYLOAD_MEMFD items for messages > 512K. Since while building up the message you might not know yet whether it will grow beyond this boundary a good approach is to simply build the message unconditionally in a memfd object. However, when the message is sealed to be sent away check for the size limit. If the size of the message is < 512K, then simply send the data as PAYLOAD_VEC and reuse the memfd. If it is >= 512K, seal the memfd and send it as PAYLOAD_MEMFD, and allocate a new memfd for the next message. RECEIVING MESSAGES Use the MSG_RECV ioctl to read a message from kdbus. This will return an offset into the pool memory map, relative to its beginning. The received message structure more or less follows the structure of the message originally sent. However, certain changes have been made. In the header the src_id field will be filled in. The payload items might have gotten merged and PAYLOAD_VEC items are not used. Instead, you will only find PAYLOAD_OFF and PAYLOAD_MEMFD items. The former contain an offset and size into your memory mapped pool where you find the payload. If during the HELLO ioctl you asked for getting metadata attached to your message, you will find additional KDBUS_ITEM_CREDS, KDBUS_ITEM_PID_COMM, KDBUS_ITEM_TID_COMM, KDBUS_ITEM_TIMESTAMP, KDBUS_ITEM_EXE, KDBUS_ITEM_CMDLINE, KDBUS_ITEM_CGROUP, KDBUS_ITEM_CAPS, KDBUS_ITEM_SECLABEL, KDBUS_ITEM_AUDIT items that contain this metadata. This metadata will be gathered from the sender at the point in time it sends the message. This information is uncached, and since it is appended by the kernel, trustable. The KDBUS_ITEM_SECLABEL item usually contains the SELinux security label, if it is used. After processing the message you need to call the KDBUS_CMD_FREE ioctl, which releases the message from the pool, and allows the kernel to store another message there. Note that the memory used by the pool is ordinary anonymous, swappable memory that is backed by tmpfs. Hence there is no need to copy the message out of it quickly, instead you can just leave it there as long as you need it and release it via the FREE ioctl only after that's done. BLOOM FILTERS The kernel does not understand dbus marshaling, it will not look into the message payload. To allow clients to subscribe to specific subsets of the broadcast matches we employ bloom filters. When broadcasting messages, a bloom filter needs to be attached to the message in a KDBUS_ITEM_BLOOM item (and only for broadcasting messages!). If you don't know what bloom filters are, read up now on Wikipedia. In short: they are a very efficient way how to probabilistically check whether a certain word is contained in a vocabulary. It knows no false negatives, but it does know false positives. The parameters for the bloom filters that need to be included in broadcast message is communicated to userspace as part of the hello response structure (see above). By default it has the parameters m=512 (bits in the filter), k=8 (nr of hash functions). Note however, that this is subject to change in later versions, and userspace implementations must be capable of handling m values between at least m=8 and m=2^32, and k values between at least k=1 and k=32. The underlying hash function is SipHash-2-4. It is used with a number of constant (yet originally randomly generated) 128bit hash keys, more specifically: b9,66,0b,f0,46,70,47,c1,88,75,c4,9c,54,b9,bd,15, aa,a1,54,a2,e0,71,4b,39,bf,e1,dd,2e,9f,c5,4a,3b, 63,fd,ae,be,cd,82,48,12,a1,6e,41,26,cb,fa,a0,c8, 23,be,45,29,32,d2,46,2d,82,03,52,28,fe,37,17,f5, 56,3b,bf,ee,5a,4f,43,39,af,aa,94,08,df,f0,fc,10, 31,80,c8,73,c7,ea,46,d3,aa,25,75,0f,9e,4c,09,29, 7d,f7,18,4b,7b,a4,44,d5,85,3c,06,e0,65,53,96,6d, f2,77,e9,6f,93,b5,4e,71,9a,0c,34,88,39,25,bf,35 When calculating the first bit index into the bloom filter, the SipHash-2-4 hash value is calculated for the input data and the first 16 bytes of the array above as hash key. Of the resulting 8 bytes of output, as many full bytes are taken for the bit index as necessary, starting from the output's first byte. For the second bit index the same hash value is used, continuing with the next unused output byte, and so on. Each time the bytes returned by the hash function are depleted it is recalculated with the next 16 byte hash key from the array above and the same input data. For each message to send across the bus we populate the bloom filter with all possible matchable strings. If a client then wants to subscribe to messages of this type, it simply tells the kernel to test its own calculated bit mask against the bloom filter of each message. More specifically, the following strings are added to the bloom filter of each message that is broadcasted: The string "interface:" suffixed by the interface name The string "member:" suffixed by the member name The string "path:" suffixed by the path name The string "path-slash-prefix:" suffixed with the path name, and also all prefixes of the path name (cut off at "/"), also prefixed with "path-slash-prefix". The string "message-type:" suffixed with the strings "signal", "method_call", "error" or "method_return" for the respective message type of the message. If the first argument of the message is a string, "arg0:" suffixed with the first argument. If the first argument of the message is a string, "arg0-dot-prefix" suffixed with the first argument, and also all prefixes of the argument (cut off at "."), also prefixed with "arg0-dot-prefix". If the first argument of the message is a string, "arg0-slash-prefix" suffixed with the first argument, and also all prefixes of the argument (cut off at "/"), also prefixed with "arg0-slash-prefix". Similar for all further arguments that are strings up to 63, for the arguments and their "dot" and "slash" prefixes. On the first argument that is not a string, addition to the bloom filter should be stopped however. (Note that the bloom filter does not contain sender nor receiver names!) When a client wants to subscribe to messages matching a certain expression, it should calculate the bloom mask following the same algorithm. The kernel will then simply test the mask against the attached bloom filters. Note that bloom filters are probabilistic, which means that clients might get messages they did not expect. Your bus protocol implementation must be capable of dealing with these unexpected messages (which it needs to anyway, given that transfers are relatively unrestricted on kdbus and people can send you all kinds of non-sense). If a client connects to a bus whose bloom filter metrics (i.e. filter size and number of hash functions) are outside of the range the client supports it must immediately disconnect and continue connection with the next bus address of the bus connection string. INSTALLING MATCHES To install matches for broadcast messages, use the KDBUS_CMD_ADD_MATCH ioctl. It takes a structure that contains an encoded match expression, and that is followed by one or more items, which are combined in an AND way. (Meaning: a message is matched exactly when all items attached to the original ioctl struct match). To match against other user messages add a KDBUS_ITEM_BLOOM item in the match (see above). Note that the bloom filter does not include matches to the sender names. To additionally check against sender names, use the KDBUS_ITEM_ID (for unique id matches) and KDBUS_ITEM_NAME (for well-known name matches) item types. To match against kernel generated messages (see below) you should add items of the same type as the kernel messages include, i.e. KDBUS_ITEM_NAME_ADD, KDBUS_ITEM_NAME_REMOVE, KDBUS_ITEM_NAME_CHANGE, KDBUS_ITEM_ID_ADD, KDBUS_ITEM_ID_REMOVE and fill them out. Note however, that you have some wildcards in this case, for example the .id field of KDBUS_ITEM_ID_ADD/KDBUS_ITEM_ID_REMOVE structures may be set to 0 to match against any id addition/removal. Note that dbus match strings do no map 1:1 to these ioctl() calls. In many cases (where the match string is "underspecified") you might need to issue up to six different ioctl() calls for the same match. For example, the empty match (which matches against all messages), would translate into one KDBUS_ITEM_BLOOM ioctl, one KDBUS_ITEM_NAME_ADD, one KDBUS_ITEM_NAME_CHANGE, one KDBUS_ITEM_NAME_REMOVE, one KDBUS_ITEM_ID_ADD and one KDBUS_ITEM_ID_REMOVE. When creating a match, you may attach a "cookie" value to them, which is used for deleting this match again. The cookie can be selected freely by the client. When issuing KDBUS_CMD_REMOVE_MATCH, simply pass the same cookie as before and all matches matching the same "cookie" value will be removed. This is particularly handy for the case where multiple ioctl()s are added for a single match strings. MEMFDS memfds may be sent across kdbus via KDBUS_ITEM_PAYLOAD_MEMFD items attached to messages. If this is done, the data included in the memfd is considered part of the payload stream of a message, and are treated the same way as KDBUS_ITEM_PAYLOAD_VEC by the receiving side. It is possible to interleave KDBUS_ITEM_PAYLOAD_MEMFD and KDBUS_ITEM_PAYLOAD_VEC items freely, by the reader they will be considered a single stream of bytes in the order these items appear in the message, that just happens to be split up at various places (regarding rules how they may be split up, see above). The kernel will refuse taking KDBUS_ITEM_PAYLOAD_MEMFD items that refer to memfds that are not sealed. Note that sealed memfds may be unsealed again if they are not mapped you have the only fd reference to them. Alternatively to sending memfds as KDBUS_ITEM_PAYLOAD_MEMFD items (where they are just a part of the payload stream of a message) you can also simply attach any memfd to a message using KDBUS_ITEM_PAYLOAD_FDS. In this case, the memfd contents is not considered part of the payload stream of the message, but simply fds like any other, that happen to be attached to the message. MESSAGES FROM THE KERNEL A couple of messages previously generated by the dbus1 bus driver are now generated by the kernel. Since the kernel does not understand the payload marshaling, they are generated by the kernel in a different format. This is indicated with the "payload type" field of the messages set to 0. Library implementations should take these messages and synthesize traditional driver messages for them on reception. More specifically: Instead of the NameOwnerChanged, NameLost, NameAcquired signals there are kernel messages containing KDBUS_ITEM_NAME_ADD, KDBUS_ITEM_NAME_REMOVE, KDBUS_ITEM_NAME_CHANGE, KDBUS_ITEM_ID_ADD, KDBUS_ITEM_ID_REMOVE items are generated (each message will contain exactly one of these items). Note that in libsystemd we have obsoleted NameLost/NameAcquired messages, since they are entirely redundant to NameOwnerChanged. This library will hence only synthesize NameOwnerChanged messages from these kernel messages, and never generate NameLost/NameAcquired. If your library needs to stay compatible to the old dbus1 userspace, you possibly might need to synthesize both a NameOwnerChanged and NameLost/NameAcquired message from the same kernel message. When a method call times out, a KDBUS_ITEM_REPLY_TIMEOUT message is generated. This should be synthesized into a method error reply message to the original call. When a method call fails because the peer terminated the connection before responding, a KDBUS_ITEM_REPLY_DEAD message is generated. Similarly, it should be synthesized into a method error reply message. For synthesized messages we recommend setting the cookie field to (uint32_t) -1 (and not (uint64_t) -1!), so that the cookie is not 0 (which the dbus1 spec does not allow), but clearly recognizable as synthetic. Note that the KDBUS_ITEM_NAME_XYZ messages will actually inform you about all kinds of names, including activatable ones. Classic dbus1 NameOwnerChanged messages OTOH are only generated when a name is really acquired on the bus and not just simply activatable. This means you must explicitly check for the case where an activatable name becomes acquired or an acquired name is lost and returns to be activatable. NAME REGISTRY To acquire names on the bus, use the KDBUS_CMD_NAME_ACQUIRE ioctl(). It takes a flags field similar to dbus1's RequestName() bus driver call, however the NO_QUEUE flag got inverted into a QUEUE flag instead. To release a previously acquired name use the KDBUS_CMD_NAME_RELEASE ioctl(). To list acquired names use the KDBUS_CMD_CONN_INFO ioctl. It may be used to list unique names, well known names as well as activatable names and clients currently queuing for ownership of a well-known name. The ioctl will return an offset into the memory pool. After reading all the data you need, you need to release this via the KDBUS_CMD_FREE ioctl(), similar how you release a received message. CREDENTIALS kdbus can optionally attach various kinds of metadata about the sender at the point of time of sending ("credentials") to messages, on request of the receiver. This is both supported on directed and undirected (broadcast) messages. The metadata to attach is selected at time of the HELLO ioctl of the receiver via a flags field (see above). Note that clients must be able to handle that messages contain more metadata than they asked for themselves, to simplify implementation of broadcasting in the kernel. The receiver should not rely on this data to be around though, even though it will be correct if it happens to be attached. In order to avoid programming errors in applications, we recommend though not passing this data on to clients that did not explicitly ask for it. Credentials may also be queried for a well-known or unique name. Use the KDBUS_CMD_CONN_INFO for this. It will return an offset to the pool area again, which will contain the same credential items as messages have attached. Note that when issuing the ioctl, you can select a different set of credentials to gather, than what was originally requested for being attached to incoming messages. Credentials are always specific to the sender's domain that was current at the time of sending, and of the process that opened the bus connection at the time of opening it. Note that this latter data is cached! POLICY The kernel enforces only very limited policy on names. It will not do access filtering by userspace payload, and thus not by interface or method name. This ultimately means that most fine-grained policy enforcement needs to be done by the receiving process. We recommend using PolicyKit for any more complex checks. However, libraries should make simple static policy decisions regarding privileged/unprivileged method calls easy. We recommend doing this by enabling KDBUS_ATTACH_CAPS and KDBUS_ATTACH_CREDS for incoming messages, and then discerning client access by some capability, or if sender and receiver UIDs match. BUS ADDRESSES When connecting to kdbus use the "kernel:" protocol prefix in DBus address strings. The device node path is encoded in its "path=" parameter. Client libraries should use the following connection string when connecting to the system bus: kernel:path=/sys/fs/kdbus/0-system/bus;unix:path=/var/run/dbus/system_bus_socket This will ensure that kdbus is preferred over the legacy AF_UNIX socket, but compatibility is kept. For the user bus use: kernel:path=/sys/fs/kdbus/$UID-user/bus;unix:path=$XDG_RUNTIME_DIR/bus With $UID replaced by the callers numer user ID, and $XDG_RUNTIME_DIR following the XDG basedir spec. Of course the $DBUS_SYSTEM_BUS_ADDRESS and $DBUS_SESSION_BUS_ADDRESS variables should still take precedence. DBUS SERVICE FILES Activatable services for kdbus may not use classic dbus1 service activation files. Instead, programs should drop in native systemd .service and .busname unit files, so that they are treated uniformly with other types of units and activation of the system. Note that this results in a major difference to classic dbus1: activatable bus names can be established at any time in the boot process. This is unlike dbus1 where activatable names are unconditionally available as long as dbus-daemon is running. Being able to control when activatable names are established is essential to allow usage of kdbus during early boot and in initrds, without the risk of triggering services too early. DISCLAIMER This all is so far just the status quo. We are putting this together, because we are quite confident that further API changes will be smaller, but to make this very clear: this is all subject to change, still! We invite you to port over your favorite dbus library to this new scheme, but please be prepared to make minor changes when we still change these interfaces!