Rewriting the GNU Coreutils in Rust

As the movement toward memory-safe languages, and Rust in particular, continues to grow, it is worth looking at one of the larger-scale efforts to port decades-old C code to Rust. The uutils project aims to rewrite all of the individual utilities included in the GNU Coreutils project in Rust. Originally created by Jordi Boggiano in 2013, the project aims to provide drop-in replacements for the Coreutils programs, adding the data-race protection and memory safety that Rust provides.

Many readers will be familiar with the Coreutils project. It includes the basic file, process, and text manipulation programs that are expected to exist on every GNU-based operating system. The Coreutils project was created to consolidate three sets of tools that were previously offered separately, Fileutils, Textutils, and Shellutils, along with some other miscellaneous utilities. Many of the programs that are included in the project, such as rm, du, ls, and cat, have been around for multiple decades and, though other implementations exist, these utilities are not available for platforms like Windows in their original form.

Collectively, the Coreutils programs are seen as low-hanging fruit where a working Rust-based version can be produced in a reasonable amount of time. The requirements for each utility are clear and many of them are conceptually straightforward, although that’s not to suggest that the work is easy. While a lot of progress has been made to get uutils into a usable state, it will take some time for it to reach the stability and maturity of Coreutils.

The use of Rust for this project will help to speed this process along since a huge swathe of possible memory errors and other undefined behavior is eliminated entirely. It also opens the door to the use of efficient, race-free multithreading which has the potential to speed up some of the programs under certain conditions. The uutils rewrite also provides an opportunity to not just reimplement Coreutils but to also enhance the functionality of some of the utilities to yield a better user experience, while maintaining compatibility with the GNU versions. For example, feature requests that have long been rejected in the Coreutils project, like adding a progress bar option for utilities like mv and cp, are currently being entertained in this Rust rewrite.
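To make the "race-free multithreading" point concrete, here is a minimal sketch of a wc-style newline counter that scans chunks of a buffer on separate threads. It is an illustration only, not code from uutils; the function name count_newlines_parallel and the chunking strategy are assumptions made for this example. The worker threads only borrow the input, and the compiler checks that none of those borrows can outlive the scope.

```rust
use std::thread;

/// Counts newline bytes in `data` by splitting it into roughly equal
/// chunks and scanning each chunk on its own scoped thread.
/// (Hypothetical helper for illustration; not part of uutils.)
fn count_newlines_parallel(data: &[u8], workers: usize) -> usize {
    // Treat a request for zero workers as one worker.
    let workers = workers.max(1);
    // Ceiling division so every byte lands in exactly one chunk;
    // `chunks` requires a non-zero length, hence the final `max(1)`.
    let chunk_len = ((data.len() + workers - 1) / workers).max(1);

    thread::scope(|scope| {
        // Each thread receives a shared, read-only slice of the input.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().filter(|&&b| b == b'\n').count()))
            .collect();

        // Joining every handle collects the per-chunk counts.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .sum::<usize>()
    })
}
```

Calling count_newlines_parallel(b"a\nb\nc\n", 2) splits the six bytes into two chunks and adds up the per-chunk counts. Whether this is actually faster than a single-threaded scan depends on the input size and the cost of starting threads, which is why the speedup is described above as possible only under certain conditions.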

What has been done so far

On the project’s GitHub page, a table can be found with the utilities divided into three columns: “Done”, “Semi-Done”, and “To-Do”. At the time of this writing, only 23 of the 106 utilities being worked on are not yet in the “Done” column, with 16 of them marked as “Semi-Done” and seven under the “To-Do” column. The utilities under “To-Do” have either not been worked on at all or are currently undergoing their initial implementation (as with pr and chcon). Those in the “Semi-Done” column are missing options that have not yet been implemented, or their behavior is slightly different from their GNU counterparts in certain situations. For example:

  • tail does not support the -F or --retry flags.
  • more fails when input is piped to it as in cat foo.txt | more.
  • install is missing both the -b and --backup flags.
  • Several utilities do not support non-UTF-8 arguments, although this is largely being mitigated by migrating from getopts to clap for command-line argument parsing.
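The non-UTF-8 problem in the last item comes down to which type the arguments are read into. The sketch below uses only the Rust standard library, not getopts or clap, and the function name split_args and its deliberately crude flag rule are assumptions made for illustration. It keeps operands as raw OS strings so that a file name containing invalid UTF-8 bytes survives intact.

```rust
use std::ffi::OsString;
use std::path::PathBuf;

/// Splits raw arguments into flags and file operands without requiring
/// the operands to be valid UTF-8.
/// (Hypothetical helper for illustration; real parsers handle far more.)
fn split_args<I>(args: I) -> (Vec<String>, Vec<PathBuf>)
where
    I: IntoIterator<Item = OsString>,
{
    let mut flags = Vec::new();
    let mut operands = Vec::new();

    for arg in args {
        // Simplification: a flag is valid UTF-8, starts with '-', and is
        // longer than a lone "-" (which conventionally means stdin).
        let is_flag = arg
            .to_str()
            .map_or(false, |s| s.len() > 1 && s.starts_with('-'));

        if is_flag {
            flags.push(arg.to_string_lossy().into_owned());
        } else {
            // PathBuf stores the OS string as-is, so no bytes are lost.
            operands.push(PathBuf::from(arg));
        }
    }

    (flags, operands)
}
```

In a real program the input would come from std::env::args_os().skip(1). The String-based std::env::args() panics as soon as it meets an argument that is not valid Unicode, which is the kind of failure the list item describes.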

It is important to keep in mind that just because a program is marked as “Done” doesn’t mean that all of the tests are passing or that the utility is as performant or memory-efficient as the GNU version. For example, there are open issues to improve the performance of factor (roughly 5x slower) and sort (between 1.5x and 6x slower). In some other cases, the uutils versions are faster than their GNU equivalents. An example is the cp utility, for which a measurable performance improvement has been reported, primarily due to the use of the sendfile() and copy_file_range() system calls, which are not being used in the GNU version despite several proposals to do so.
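The difference between the two copying strategies can be sketched with the standard library alone. This is not the uutils cp implementation, and both function names are assumptions made for this example. The first function delegates to std::fs::copy, which the standard library documents as attempting copy_file_range() on Linux before falling back to reading and writing. The second shows the plain userspace loop that such system calls avoid, where every block travels from the kernel into a buffer and back again.

```rust
use std::fs::{self, File};
use std::io::{self, Read, Write};
use std::path::Path;

/// Copies `src` to `dst` through the standard library, which may hand the
/// work to the kernel on platforms that support it.
/// (Hypothetical wrapper for illustration.)
fn copy_kernel_assisted(src: &Path, dst: &Path) -> io::Result<u64> {
    fs::copy(src, dst)
}

/// Copies `src` to `dst` with an explicit read/write loop in userspace.
/// Returns the number of bytes written.
fn copy_userspace(src: &Path, dst: &Path) -> io::Result<u64> {
    let mut reader = File::open(src)?;
    let mut writer = File::create(dst)?;
    let mut buf = [0u8; 64 * 1024];
    let mut total = 0u64;

    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // end of file
            Ok(n) => n,
            // A read interrupted by a signal is simply retried.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        writer.write_all(&buf[..n])?;
        total += n as u64;
    }

    Ok(total)
}
```

Both functions return the number of bytes copied and produce identical output files. Any speed difference comes from how much data has to cross the kernel boundary, so it will vary with the filesystem and kernel in use.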

At the moment, only 142 of the 624 tests in the Coreutils test suite (about 23%) pass with uutils, compared to around 546 (about 88%) with the GNU version. However, it should be noted that many of the failures are due to differences in the output of the commands.

A separate table shows all of the platforms and architectures that the uutils project currently supports. The major operating systems (Linux, macOS, and Windows) are well accounted for across various architectures, although quite a few utilities are currently not building on Windows. FreeBSD, NetBSD, and Android also compile most binaries, except for a handful of utilities including chroot, uptime, uname, stat, and who. The rows for Redox OS, Solaris, WebAssembly, and Fuchsia are currently all blank, which reflects the lower priority assigned to those platforms at the moment.

The uutils project, currently at version 0.0.6, has already been packaged in the repositories for various Linux distributions and packaging systems. Notably, Sylvestre Ledru, a director at Mozilla and prolific contributor to the Debian and Ubuntu projects, has led the way in getting the project packaged for Debian as an alternative to the GNU Coreutils. Its current state is deemed good enough to get a system with GNOME up and running, install a thousand of the most popular Debian packages, and to build Firefox, the Linux kernel, and LLVM/Clang. Additionally, uutils is present in the repositories for Arch Linux (Community), Homebrew for macOS, and the Exherbo Linux distribution.

Licensing

An important aspect of the uutils project to be aware of is its licensing. All of the utilities in the project are licensed under the permissive MIT License, instead of the GPLv3 license of GNU Coreutils. This potentially makes it more attractive for use in places where software licensed with GPLv3 is not adopted due to its restrictions on Tivoization among other things. The decision to use the MIT License is not without its critics; some who commented in a GitHub issue about the choice would rather see a copyleft license applied to a project of this sort.

The main criticism echoes arguments over FOSS licensing in the past: a non-copyleft license is harmful to the freedoms of end users since it allows a person or organization to incorporate any part of the project into a device or into the distribution of other software without providing the source code, making it impossible for those users to study, change, or improve it. There is also a concern that the license choice is being made to maximize Rust usage without regard for other effects; replacing GPL-licensed tools with alternatives under a more-permissive license is seen by some as a step backward.

Contributing to uutils

The best way to follow the development of the uutils project is through its GitHub repository and official Discord server. Details on how to get started with contributing to the project can be found in a document that is included in the repository.

A lot of work remains to be done to get uutils into a production-ready state. The project has been positioned as a good way to get into Rust development, and there is a list of issues for newcomers as a place to begin. The current focus of the project appears to be full compatibility with the GNU Coreutils and improving the test coverage before tackling other problems. Things like removing unnecessary dependencies, improving performance, and decreasing memory use are better suited to being addressed after the compatibility issues have been ironed out.