New website and xmpp.rs v0.4 release
Posted 2023-06-05 00:00:00 ‐ 15 min read
More than three and a half years after the last release (v0.3 in September 2019), we are happy to announce xmpp-rs version 0.4.0. If you don't know about xmpp-rs, we are building Rust crates for interoperable and federated instant messaging and other social networking applications.
We are also glad to announce the creation of this very website: xmpp.rs, which is now home to several XMPP-related libraries for the Rust programming language, and hopefully will be used to bring attention to other projects in this space.
xmpp.rs is a collection of libraries which all had releases in the past 3 years, but just as a reminder, let us introduce them to you:
- xmpp (v0.4) is the high-level starting point for writing XMPP applications; this crate is sometimes also known as xmpp-rs,
- tokio-xmpp (v3.3) is the low-level networking crate for asynchronous XMPP communications on the tokio runtime,
- xmpp-parsers (v0.19.2) contains the data structures used in XMPP applications (e.g. messages or user blocklists) and code for (de)serializing with minidom,
- minidom (v0.15.2) is a lightweight DOM library to represent tree-like elements parsed from XML,
- jid (v0.9.3) is a JabberID (JID) parser that validates strings into correct addresses.
Since it's been so long since the last release of the high-level library, it's impossible to list everything that's been happening under the hood. However, we'll try to highlight the main changes in every one of those crates. For more information about the project and its future, stay tuned for new articles.
xmpp (v0.4)
In the xmpp crate, the Event type represents meaningful abstractions about clients and their interactions with the server.
id field in messages
In the v0.4 release, we have refined the ChatMessage and RoomMessage events, for personal and group messages respectively, to expose the message id as received from the client/server.
Modern XMPP features such as message reactions or retractions require that the client knows of a unique identifier for the message in order to interact with it... which is also useful for moderators to delete abusive content. None of these features are facilitated by the xmpp crate at the moment, but the building blocks are getting there.
Exposing and keeping message IDs around is also useful for history synchronization and deduplication. If I'm building a client with the xmpp crate, I can now request from MAM (the history manager) any messages sent after a specific one I know about, without risking duplicate messages in the user interface.
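To illustrate the deduplication idea, here is a minimal sketch using only the standard library. The Message struct and the deduplicate function are hypothetical stand-ins, not types from the xmpp crate; the point is only that a stable id makes it cheap to drop messages that were already displayed.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified stand-in for a message carrying an optional id.
/// This is NOT a type from the xmpp crate.
#[derive(Debug, Clone, PartialEq)]
struct Message {
    id: Option<String>,
    body: String,
}

/// Keeps only the messages whose id has not been seen yet.
/// Messages without an id cannot be deduplicated, so they are always kept.
fn deduplicate(seen: &mut HashSet<String>, incoming: Vec<Message>) -> Vec<Message> {
    incoming
        .into_iter()
        .filter(|msg| match &msg.id {
            // `insert` returns false when the id was already in the set.
            Some(id) => seen.insert(id.clone()),
            None => true,
        })
        .collect()
}

fn main() {
    let mut seen = HashSet::new();

    // A message received live...
    let live = vec![Message { id: Some("a1".into()), body: "hello".into() }];
    let shown = deduplicate(&mut seen, live);
    println!("live messages shown: {}", shown.len());

    // ...is then returned again by a later history query, alongside a new one.
    let history = vec![
        Message { id: Some("a1".into()), body: "hello".into() },
        Message { id: Some("b2".into()), body: "world".into() },
    ];
    let fresh = deduplicate(&mut seen, history);
    println!("history messages shown: {}", fresh.len());
}
```

A real client would also have to decide which identifier to trust (the sender-provided id or a server-assigned one), which this sketch leaves out.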
New events
We also introduced two new events for this release:
- ServiceMessage represents a direct message from a MUC (groupchat server or room) that was not sent by a participant in the groupchat but by the service itself; this is for example used by the biboumi IRC gateway to notify the user when there is an IRC mode change,
- HttpUploadedFile represents a file uploaded by a client via HTTP; it's a very simple event containing only the URL of the file.
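As a rough illustration of how an application might react to these events, here is a sketch with a hypothetical, heavily simplified Event enum. The variant names mirror the ones mentioned above, but the payloads (plain strings) are assumptions made for this example and do not reflect the exact definitions in the xmpp crate.

```rust
/// Hypothetical, simplified event type. The variant names follow the article,
/// but the fields are assumptions and do not match the xmpp crate exactly.
#[derive(Debug)]
enum Event {
    ChatMessage { id: Option<String>, body: String },
    ServiceMessage { service: String, body: String },
    HttpUploadedFile { url: String },
}

/// Turns an event into a line of text for a user interface.
fn describe(event: &Event) -> String {
    match event {
        Event::ChatMessage { id, body } => {
            format!("chat [{}]: {}", id.as_deref().unwrap_or("no id"), body)
        }
        // Sent by the groupchat service itself, not by a participant.
        Event::ServiceMessage { service, body } => {
            format!("notice from {}: {}", service, body)
        }
        Event::HttpUploadedFile { url } => format!("file uploaded to {}", url),
    }
}

fn main() {
    let events = vec![
        Event::ChatMessage { id: Some("a1".into()), body: "hello".into() },
        Event::ServiceMessage {
            service: "irc.jabberfr.org".into(),
            body: "mode change".into(),
        },
        Event::HttpUploadedFile { url: "https://upload.example.com/cat.png".into() },
    ];
    for event in &events {
        println!("{}", describe(event));
    }
}
```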
tokio-xmpp (v3.3)
tokio-xmpp is a lower-level crate for building XMPP applications. It does not expose higher-level abstractions like the xmpp crate does, only providing utilities for sending and receiving abstract XMPP/XML messages (stanzas). The list of (breaking) fixes and optimizations from the past 3 years is too long to enumerate, but we can highlight one main change: custom connection options.
The next release will also contain code for automatic stanza ID assignment.
Custom connection options
In the XMPP ecosystem, we are big fans of DNS SRV records, because they empower us to host specific services on a domain on separate machines depending on the protocol, and to configure failover. However, there are situations in which you want to connect to a specific service without resorting to DNS, and in that case, you can now specify the server to connect to manually.
For example, if you'd like to connect to the virtualhost example.com on localhost, you can now do:
use jid::Jid;
use std::str::FromStr;
use tokio_xmpp::{AsyncClient, AsyncConfig, AsyncServerConfig};

// Bypass DNS SRV resolution and connect to a fixed host and port.
let server_cfg = AsyncServerConfig::Manual {
    host: "127.0.0.1".to_owned(),
    port: 5222,
};
let client_cfg = AsyncConfig {
    jid: Jid::from_str("username@example.com").expect("Invalid JID"),
    password: "mysupersecret".to_owned(),
    server: server_cfg,
};
let mut client = AsyncClient::new_with_config(client_cfg);
In addition to this new new_with_config method, the classic DNS-resolving new is still available for simpler setups:
use tokio_xmpp::AsyncClient;

let mut client = AsyncClient::new(
    "username@example.com",
    "mysupersecret",
).expect("Invalid JID");
xmpp-parsers (v0.19.2)
The xmpp-parsers crate contains definitions for data structures related to the XMPP protocol. If the high-level xmpp crate does not cover your needs, this is certainly where you can dig. In past releases, we have implemented PartialEq for most elements, and added many new data structures and variants for existing enums.
PartialEq support
We have implemented the PartialEq trait for most data structures contained in the crate. This allows easy equality comparison without resorting to tricks.
For the moment this excludes Iq, but support will be added in a future release.
New data structures
It would be impossible to explain and detail all the new XMPP features represented by the new data structures we introduced. Or rather, it would be possible, but would take an entire book dedicated to explaining the XMPP protocol, which would be a fun experiment but is not why we're here today. So let's make a simple non-exhaustive list of all the new possibilities we've added lately:
- the muc::user::Status::ServiceErrorKick variant represents a user getting kicked due to a technical error (such as a server-to-server communications failure) instead of an intentional kick,
- the extdisco module describes External Services Discovery (XEP-0215) for automatically discovering groupchat servers, gateways and other services registered on the server,
- the csi module describes Client State Indications (XEP-0352) to let clients tell the server whether they want to receive all interactions (Active) or only important notifications (Inactive), in order to save bandwidth and battery on mobile devices,
- the http_upload module describes HTTP file uploads (XEP-0363) as implemented by XMPP clients when P2P file exchange is not possible,
- the mam_prefs module describes Message Archive Management Preferences (XEP-0441) for user-controlled message retention; note that actual usage of this specification is currently being debated in the XMPP ecosystem and misusing it can lead to lost messages,
- the occupant_id module describes an OccupantID (XEP-0421), a unique pseudonymous identifier for usage in groupchats,
- the rtt module describes in-band real-time text (XEP-0301) for live transcription and collaborative note-taking applications,
- the bookmarks2 module describes PEP-based bookmarking (XEP-0402 aka "Bookmarks2") for user preferences in regards to groupchats (preferred nickname, room password, autojoin policy),
- the openpgp module describes public keys for OpenPGP for XMPP (XEP-0373 aka OX); note that the actual IM encryption via PGP (XEP-0374) is not supported yet,
- the cert_management module describes TLS client certificates for authentication via SASL (XEP-0257),
- the pubsub module describes Publish-Subscribe (XEP-0060), a modern building block for asynchronous interactions on the XMPP network (e.g. social networking),
- the mix module describes Mediated Information eXchange (XEP-0369), an alternative groupchat standard based on PubSub,
- so many new jingle modules (as seen in the crate docs) for peer-to-peer interactions between clients (file transfers, audio-video conferencing, etc).
The next release will also contain support for Message Reactions (XEP-0444).
minidom (v0.15.2)
Minidom is the XML (de)serialization library used in the lower layers of our XMPP crates. It is a very opinionated library that explicitly focuses on correctly implementing a subset of XML features that are used for XMPP, instead of supporting every XML feature possible.
Under the hood, we've switched our lower-level XML backend from quick-xml to rxml, an experimental crate that, by design, supports a smaller subset of the XML standards. This is great from a security perspective, because rxml cannot fall victim to billion laughs attacks or XXE processing attacks. From a correctness perspective, rxml only supports the UTF-8 encoding, which is what both Rust strings and the XMPP protocol mandate. Finally, migrating to rxml allows us to use the same XML parser in both minidom and tokio-xmpp, reducing our dependencies globally.
Apart from this migration, we had three major changes to the minidom crate: PartialEq implementation for Element and Node, optional XML declaration in writers, and optional namespace declaration in readers.
PartialEq for Element/Node
PartialEq implementations for Element and Node have been changed to ensure namespaces match even if the objects are not structurally equivalent in Rust.
Optional XML declaration in writers
Previously, the default behavior of Element::write_to was to include an XML declaration (<?xml version="1.0"?>) in the output. However, this is strictly optional in the XMPP protocol, as it only ever contains XML.
We have therefore decided to change this default behavior to omit the XML declaration. If you want to keep the explicit declaration, you can now use the Element::write_to_decl method.
Additional namespaces in readers
On the other hand, when deserializing messages from XML, it can be useful to specify a namespace manually, for example when the namespace was previously declared in a parent element but is no longer available in the current reader context. The new method Element::from_reader_with_prefixes lets you do just that.
jid (v0.9.3)
jid is a very simple crate for representing JabberIDs. However, there is some subtlety in this field, because there are different types of JIDs. In the XMPP protocol, a JID can have up to three parts: username@example.com/device.
- the node, or local part, designates a specific account on the server, just like in the email world,
- the domain part designates the server or service,
- the resource part (device in our example) designates a specific client that's connected to the account; contrary to the local/domain parts, the resource part is short-lived, although most clients reuse the same resource part when reconnecting over time.
However, some JIDs can be even shorter... for example an XMPP server or gateway such as irc.jabberfr.org is in itself a valid JID to/from which requests and messages can flow. This is used in a variety of cases, including:
- announcements from your server operator, eg. for planned maintenance,
- registering an account on an XMPP server, or announcing your status (presence) once logged in.
This shorter form of service/server JID is sometimes called a "domain JID", but that's not a specific type of JID. The JID RFC doesn't formally define the different types of JID, but the usual terminology appears in its examples:
- a bare JID has an optional local part, a required domain part, and no attached resource/device (e.g. username@example.com or example.com),
- a full JID is a bare JID with an attached resource (e.g. username@example.com/device or example.com/foobar).
In the jid crate, we have chosen to have two different types for those two cases: BareJid and FullJid. Domain JIDs to interact with servers and gateways are therefore of BareJid type.
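To make the bare/full distinction concrete, here is a minimal sketch of splitting a JID string into its parts, using only the standard library. The ParsedJid type and parse_jid function are hypothetical and deliberately naive: they are not the jid crate's implementation and perform none of the validation or normalization that real JIDs require.

```rust
/// Hypothetical, simplified JID representation for illustration only.
/// The jid crate's own types (BareJid, FullJid, Jid) are richer than this.
#[derive(Debug, PartialEq)]
enum ParsedJid {
    Bare { node: Option<String>, domain: String },
    Full { node: Option<String>, domain: String, resource: String },
}

/// Naively splits a string of the form [node@]domain[/resource].
/// Returns None when a part is present but empty.
fn parse_jid(input: &str) -> Option<ParsedJid> {
    // The resource is everything after the first '/'.
    let (bare, resource) = match input.split_once('/') {
        Some((bare, resource)) => (bare, Some(resource)),
        None => (input, None),
    };
    // The node (local part) is everything before the first '@' of the bare part.
    let (node, domain) = match bare.split_once('@') {
        Some((node, domain)) => (Some(node), domain),
        None => (None, bare),
    };
    if domain.is_empty() || node == Some("") || resource == Some("") {
        return None;
    }
    let node = node.map(str::to_owned);
    let domain = domain.to_owned();
    Some(match resource {
        Some(resource) => ParsedJid::Full { node, domain, resource: resource.to_owned() },
        None => ParsedJid::Bare { node, domain },
    })
}

fn main() {
    for input in ["username@example.com/device", "username@example.com", "irc.jabberfr.org"] {
        println!("{} -> {:?}", input, parse_jid(input));
    }
}
```

Note how a domain JID such as irc.jabberfr.org ends up in the bare variant with no node, which matches the choice described above.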
Since the xmpp v0.3 release, we have made three major changes in the jid crate: optional serde support, equality comparison, and an initial implementation of Unicode normalization for JIDs.
Optional serde support
It is now possible to (de)serialize JIDs using serde, the go-to solution in the Rust ecosystem. This can be enabled with the serde feature in your crate's dependencies, like so:
[dependencies]
jid = { version = "*", features = [ "serde" ] }
For now, the (de)serialization format follows the actual data structure. For example, in JSON, the Jid username@example.com/device would be represented as:
{"Full":"username@example.com/device"}
You will notice that the serialized version contains information about the enum variant, to indicate whether it's a FullJid or BareJid. This is controlled by the default enum representation settings in serde (enum tagging). The next release will introduce a breaking change so that the Jid enum serializes to an actual Jid string like the individual variants.
Equality comparison
It is now possible to check if JIDs are equal using the Rust equality operator ==, as we have implemented the PartialEq trait. You can compare Jid with FullJid or BareJid. However, you cannot naively compare a FullJid with a BareJid, because that could introduce logic errors. For that purpose, you should extract the BareJid from your FullJid (dropping the resource part), like so:
use jid::{BareJid, FullJid};

let id1 = FullJid::new("username", "example.com", "device");
let id2 = BareJid::new("username", "example.com");
// We cannot compare a FullJid with a BareJid directly,
// but we can extract a BareJid from the FullJid for comparison
let bare_id1 = BareJid::from(id1);
if bare_id1 == id2 {
    println!("Of course it is equal!");
}
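For readers curious how this kind of cross-type comparison can be expressed in Rust, here is a sketch using hypothetical stand-in types (Bare, Full and AnyJid are not the jid crate's types). It implements PartialEq between the enum and each concrete type, while deliberately providing no implementation between Bare and Full, so that comparing those two directly does not compile.

```rust
/// Hypothetical stand-ins for illustration; not the jid crate's types.
#[derive(Debug, Clone, PartialEq)]
struct Bare {
    node: String,
    domain: String,
}

#[derive(Debug, Clone, PartialEq)]
struct Full {
    node: String,
    domain: String,
    resource: String,
}

#[derive(Debug, Clone, PartialEq)]
enum AnyJid {
    Bare(Bare),
    Full(Full),
}

// The enum equals a Bare value only when it holds that same bare JID.
impl PartialEq<Bare> for AnyJid {
    fn eq(&self, other: &Bare) -> bool {
        matches!(self, AnyJid::Bare(bare) if bare == other)
    }
}

// The enum equals a Full value only when it holds that same full JID.
impl PartialEq<Full> for AnyJid {
    fn eq(&self, other: &Full) -> bool {
        matches!(self, AnyJid::Full(full) if full == other)
    }
}

// Dropping the resource is an explicit, visible step.
impl From<Full> for Bare {
    fn from(full: Full) -> Bare {
        Bare { node: full.node, domain: full.domain }
    }
}

fn main() {
    let bare = Bare { node: "username".into(), domain: "example.com".into() };
    let full = Full {
        node: "username".into(),
        domain: "example.com".into(),
        resource: "device".into(),
    };

    assert!(AnyJid::Bare(bare.clone()) == bare);
    // A full JID held in the enum never equals a bare one...
    assert!(AnyJid::Full(full.clone()) != bare);
    // ...unless the resource is dropped explicitly first.
    assert!(Bare::from(full) == bare);
}
```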
Initial implementation of unicode normalization (stringprep)
Contrary to older protocols that need to negotiate a charset between clients and servers, XMPP has used UTF-8 from the beginning so that characters from all languages can be represented throughout. That's why the XMPP ecosystem avoided all the mojibake fun that the WWW and IRC had not so long ago.
However, fully supporting Unicode raised some additional questions that needed answers, because there are many ways to represent the same string, or to craft strings that are visually indistinguishable yet different from a binary perspective... and that's why Unicode normalization (aka equivalence/canonicalization) has been standardized.
String normalization is not just used in XMPP though. For example, domain names in the DNS protocol are not case sensitive, so example.com is exactly the same domain as Example.Com.
There are two main use cases for implementing Unicode normalization in any protocol:
- prevent common mistakes and confusion by asserting that user and USER are the same account on a given server,
- prevent homograph attacks where different yet visually-identical characters are used to impersonate someone.
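Both use cases can be illustrated with a small standard-library sketch. The same_account function below is hypothetical and only lowercases its inputs; this is far from what STRINGPREP or PRECIS actually require, but it is enough to show why case folding solves the first problem and does nothing for the second.

```rust
/// Naive, hypothetical comparison that only lowercases both inputs.
/// Real JID normalization (STRINGPREP, PRECIS) involves much more than this.
fn same_account(a: &str, b: &str) -> bool {
    a.to_lowercase() == b.to_lowercase()
}

fn main() {
    // Case folding treats these as the same account.
    assert!(same_account("user", "USER"));

    // Latin 'a' (U+0061) versus Cyrillic 'а' (U+0430): the two strings look
    // alike in many fonts, but they remain different after lowercasing.
    let latin = "admin";
    let lookalike = "\u{0430}dmin";
    assert!(!same_account(latin, lookalike));

    println!("{:?} vs {:?}", latin.as_bytes(), lookalike.as_bytes());
}
```

Detecting lookalike characters requires dedicated confusable-detection logic on top of normalization, which is outside the scope of this sketch.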
In the XMPP world, there are three normalization processes defined by the standards:
- STRINGPREP, the older Unicode normalization standard, is mandated by older XMPP RFCs for node/resource parts.
- PRECIS, the newer Unicode standard, is mandated by newer XMPP RFCs for node/resource parts.
- IDNA normalization is mandated for the domain names.
In the next jid crate release, we will include initial support for Unicode normalization, which can be enabled via the stringprep feature on the crate. Protection against homograph attacks is not covered by this implementation.
Conclusion
That's all for today, which is already quite a lot. We hope this article gave you a good overview of some XMPP features and how we approach them in our crates. We are aware of some ergonomic and technical limitations in our current implementations, and any feedback is always welcome.
If you'd like to build your next social/messaging application in Rust using standard technologies, feel free to reach out.