Unionfs: User- and Community-Oriented Development of a
Unification File System
David Quigley, Josef Sipek, Charles P. Wright, and Erez Zadok
Stony Brook University
{dquigley,jsipek,cwright,ezk}@cs.sunysb.edu
Appears in the proceedings of the 2006 Ottawa Linux Symposium (OLS 2006)
Abstract
Unionfs is a stackable file system that virtually merges a set of
directories (called branches) into a single logical view. Each branch
is assigned a priority and may be either read-only or read-write.
When the highest priority branch is writable, Unionfs provides
copy-on-write semantics for read-only branches. These copy-on-write
semantics have led to widespread use of Unionfs by LiveCD
projects including Knoppix and SLAX. In this paper we describe our
experiences distributing and maintaining an out-of-kernel module since
November 2004. As of March 2006, Unionfs has been
downloaded by over 6,700 unique users and is used by over two dozen
other projects. The total number of Unionfs users, by extension, is
in the tens of thousands.
1 Introduction
Unionfs is a stackable file system that allows users to specify a
series of directories (also known as branches) which are presented to
users as one virtual directory even though the branches can come from
different file systems. This is commonly referred to as namespace
unification. Unionfs uses a simple priority system which gives each
branch a unique priority. If a file exists in multiple branches, the
user sees only the copy in the higher-priority branch. Unionfs allows
some branches to be read-only, but as long as the highest-priority
branch is read-write, Unionfs uses copy-on-write semantics to provide
an illusion that all branches are writable. This feature allows
Live-CD developers to give their users a writable system based on
read-only media.
There are many uses for namespace unification. The two most common
uses are Live-CDs and diskless/NFS-root clients. On Live-CDs, by
definition, the data is stored on a read-only medium. However, it is
very convenient for users to be able to modify the data. Unifying the
read-only CD with a writable RAM disk gives the user the illusion of
being able to modify the CD. Maintaining an identical system
configuration across multiple diskless systems is another application
of Unionfs. One simply needs to build a read-only system image, and
create a union for each diskless node.
Figure 1: User processes issue system calls, which the kernel's virtual
file system (VFS) directs to stackable file systems. Stackable file
systems in turn pass the calls down to lower-level file systems
(e.g., tmpfs or NFS). (a) Wrapfs layers over a single directory;
(b) Unionfs layers over multiple directories.
Unionfs is based on the FiST stackable file system templates, which
provide support for layering over a single
directory [12]. As shown in
Figure 1(a), the kernel's VFS is responsible for
dispatching file-system-related system calls to the
appropriate file system. To the VFS, a stackable file system appears
as if it were a standard file system, but instead of storing or
retrieving data, a stackable file system passes calls down to lower-level
file systems that are responsible for data storage and retrieval.
In this scenario, NFS is used as the lower-level file system, but any
file system can be used to store the data (e.g., Ext2, Ext3,
Reiserfs, SQUASHFS, isofs, and tmpfs).
To the lower-level file systems, a stackable file system appears as
if it were the VFS. This makes stackable file system development
difficult, because the file system must adhere to the conventions both
of file systems (for processing VFS calls) and of the VFS (for making
VFS calls).
As shown in Figure 1(b), Unionfs extends the FiST
templates to layer over multiple directories, unify directory
contents, and perform copy-on-write. In this example, Unionfs is
layered over two branches: (1) a read-write tmpfs file system
and (2) a read-only NFS file system. The contents of these file
systems are virtually merged, and if operations on the NFS file system
return the read-only file system error code (EROFS) then
Unionfs transparently copies the files to the tmpfs branch.
We originally released Unionfs in November 2004, after roughly
18 months of development as a research
project [10,11]. We released Unionfs as a
standalone kernel module because that was the most expedient way for
users to begin using it and it required less initial effort on our
part. Unionfs was quickly adopted by several LiveCDs such as
SLAX [7] (December 2004) and Knoppix [5]
(March 2005). As of March 2006, Unionfs has been
downloaded by over 6,700 users from 81 countries and is distributed
as part of other projects. Our mailing list currently has 336
subscribers with 53 of them subscribed to our CVS update list.
Unionfs is an integral part of several LiveCDs, so the actual number of
Unionfs users is much larger.
Maintaining Unionfs outside of the kernel has both benefits and
complications. By maintaining the tree outside of the kernel, our user
base is expanded: users can still use their vendor's kernel, and we are
able to support several kernel versions. We were also able to release
Unionfs more frequently than the kernel. However, this makes our code
more complex since we must deal with changing interfaces between
kernel versions. It also raises questions about the point at which
support for a particular older kernel version should be dropped. At
this point, Unionfs has become established enough that we are
moving towards a release that is ready for mainline kernel
submission.
Unionfs has complex allocation schemes (particularly for dentry and
inode objects), and makes more use of kmalloc than other file
systems. One hurdle we had to overcome was the lack of useful
memory-allocation debugging support. The memory-allocation debugging
code in recent -mm kernels does not provide sufficient
debugging information. In our approach, we log kmalloc and dentry
allocations, and then post-process the log to locate memory leaks
and other errors.
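As an illustration of this post-processing step, the following is a minimal sketch, assuming a hypothetical log format with one ALLOC or FREE line per event, keyed by object address (the actual Unionfs log format and file name may differ):

awk '
  $1 == "ALLOC" { live[$2] = $0 }   # remember each allocation by its address
  $1 == "FREE"  { delete live[$2] } # a matching free clears the record
  END { for (p in live) print "possible leak:", live[p] }
' unionfs-alloc.log

Any allocation without a matching free remains in the live array and is reported at the end of the log.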
In our efforts to move toward kernel inclusion we have come across
many aspects that conflict with maintaining an out-of-kernel module.
One of the main issues is the ability to separate research code
from practical code. Features such as persistent inodes and
atomically performing certain operations increase code complexity,
conflicting with the mantra "less code is better code." We also had
to change the way we separate file system components to provide
simpler and more easily maintainable code. In addition, we have to
keep up with changes in kernel interfaces, such as the change of
locking primitives introduced in Linux 2.6.16.
The rest of this paper is organized as follows. In
Section 2 we describe Unionfs use cases. In
Section 3 we describe the challenges of
maintaining an out-of-tree module. In Section 4
we describe some limitations of Unionfs. In
Section 5 we present a brief performance
evaluation of Unionfs. Finally, we conclude in
Section 6.
2 Use Cases
We have identified three primary use cases for Unionfs. All of these
common use cases leverage Unionfs's copy-on-write semantics. The
first and most prevalent use of Unionfs is in LiveCDs. The second
is using Unionfs to provide a common base for several NFS-mounted
machines. The third is to use Unionfs for snapshotting.
LiveCDs.
LiveCDs allow users to boot Linux without modifying any data on a hard
disk. This has several advantages:
- Users can try Linux without committing to it [5,7].
- Special-purpose open-source software can be distributed to
non-technical users (e.g., for music composition [4]).
- System administrators can rescue machines more easily [2].
- Many similar machines can be set up without installing software
(e.g., in a cluster environment [9], or at events that require
certain software).
The simplest use of Unionfs for LiveCDs unifies a standard read-only
ISO9660 file system with a higher-priority read-write tmpfs
file system. Current versions of Knoppix [5] use such a
configuration, which allows users to install and reconfigure programs.
Knoppix begins its boot sequence by loading an initial RAM disk
(initrd) image of an Ext2 file system and then executing a
shell script called /linuxrc. The linuxrc script
first mounts the /proc and /sys file systems. Next,
Knoppix loads various device drivers (e.g., SCSI, IDE, USB, FireWire)
and mounts a compressed ISO9660 image on /KNOPPIX. After the
Knoppix image is mounted, the Unionfs module is loaded. Next a
tmpfs file system is mounted on /ramdisk. Once
Unionfs is mounted, this RAM disk becomes the destination for all of
the changes to the CD-ROM. Next, the directory /UNIONFS is
created and Unionfs is mounted on that directory with the following
command:
mount -t unionfs \
-o dirs=/ramdisk=rw:/KNOPPIX=ro \
/UNIONFS /UNIONFS
The -t unionfs argument tells the mount program that
the file system type is Unionfs. The dirs option specifies the directories that
make up the union. Directories are listed in order of priority,
starting with the highest. In this case, the highest-priority
directory is /ramdisk, which is read-write. The
/ramdisk directory is unified with /KNOPPIX, which
is read-only. The first /UNIONFS argument is a placeholder
for the device name in /proc/mounts, and the second
/UNIONFS argument is the location where Unionfs is mounted.
Finally, linuxrc makes symbolic links from the root directory
to /UNIONFS; for example, each top-level directory such as /etc
becomes a link to its counterpart under /UNIONFS.
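A sketch of this step (the directory list here is illustrative, not Knoppix's exact list):

for d in bin sbin lib usr etc var opt; do
  ln -s /UNIONFS/$d /$d
done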
At this point the linuxrc script exits, and init is
executed.
Other LiveCDs (notably SLAX [7]) use Unionfs both for its
copy-on-write semantics and as a package manager. A SLAX distribution
consists of several modules, which are essentially SQUASHFS
file system images [6]. On boot, the selected modules are
unified to create a single file system view. Unifying the file
systems makes it simple to add or remove packages from the LiveCD,
without regenerating entire file system images. In addition to the
SQUASHFS images, the highest-priority branch is a read-write
tmpfs which provides the illusion that the CD is
read-write.
SLAX uses the pivot_root system call so that the root
file system is indeed Unionfs, whereas Knoppix creates symbolic
links to provide the illusion of a Unionfs-rooted CD. SLAX also
begins its boot sequence by loading an Ext2 initrd image and
executing linuxrc, which mounts /proc and
/sys. Next, SLAX mounts tmpfs on /memory. The next
step is to mount Unionfs on /union with a single branch
/memory/changes using the following command:
mount -t unionfs \
-o dirs=/memory/changes=rw \
unionfs /union
Aside from the branch configuration, the major difference between this
command and the one from Knoppix is that the text unionfs,
rather than /UNIONFS, is used as the placeholder. We recommend
this approach (or better yet, the string
none), because it is less likely to be confused with an
actual path or argument.
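For example, the Knoppix mount shown earlier could equivalently be issued with none as the placeholder:

mount -t unionfs \
  -o dirs=/ramdisk=rw:/KNOPPIX=ro \
  none /UNIONFS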
rootfs / rootfs rw 0 0
/dev/root /mnt/live ext2 rw,nogrpid 0 0
/proc /mnt/live/proc proc rw 0 0
tmpfs /mnt/live/memory tmpfs rw 0 0
unionfs / unionfs rw,dirs=/mnt/live/memory/changes=rw:...:/mnt/live/
-> memory/images/02_core.mo=ro:/mnt/live/memory/images/01_kernel.mo=ro
-> 0 0
/dev/hdb /mnt/live/mnt/hdb iso9660 ro 0 0
/dev/hdb /boot iso9660 ro 0 0
/dev/loop0 /mnt/live/memory/images/01_kernel.mo squashfs ro 0 0
/dev/loop2 /mnt/live/memory/images/02_core.mo squashfs ro 0 0
...
Figure 2: The /proc/mounts file on SLAX after linuxrc
is executed. Note that the Unionfs line has been split (denoted by
->). For brevity, we exclude seven additional SLAX packages.
After mounting the mostly empty Unionfs, SLAX performs hardware detection.
The next step is to load the SLAX modules, which are equivalent to
packages. The first step in loading a module is to mount the
SQUASHFS image on a subdirectory of /memory/images. After the SQUASHFS image
is mounted, SLAX calls our unionctl utility to insert the module into
the union. The following command is used to insert SLAX's kernel
module:
unionctl /union --add \
--after 0 --mode ro \
/memory/images/01_kernel
The --after 0 argument instructs Unionfs to insert the new
directory, /memory/images/01_kernel, after the first branch,
and the --mode ro argument instructs Unionfs to mark this
branch read-only.
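Loading all of the modules therefore amounts to a loop along the following lines (a sketch only: the module source path is illustrative, and each insertion uses --after 0, which would reproduce the branch ordering shown in Figure 2):

for mod in /mnt/hdb/slax/modules/*.mo; do
  img=/memory/images/$(basename "$mod")
  mkdir -p "$img"
  mount -o loop -t squashfs "$mod" "$img"
  unionctl /union --add --after 0 --mode ro "$img"
done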
After all of the modules are loaded, SLAX creates
/union/proc, /union/sys, /union/dev,
/union/tmp, and /union/mnt/live. It then changes
the current working directory to /union and unmounts
/sys. Next, Unionfs is made the root file system using
pivot_root:
pivot_root . mnt/live
This command makes Unionfs the root file system, and
remounts the initial RAM disk on /union/mnt/live. Finally,
SLAX starts init using Unionfs as the root file system:
/usr/bin/chroot . sbin/init
After this procedure, SLAX produces the /proc/mounts file
seen in Figure 2.
NFS-mounted machines.
Another use of Unionfs is to simplify the administration of diskless
machines. A set of machines can share a single read-only NFS root
file system. This enables administrators to maintain a common image
for all of the machines. This root file system is then unified with a
higher-priority read-write branch so that users can customize the
machine or save data. If persistence is not required, then a
tmpfs file system can be used as the highest priority branch.
If persistence is required, then a read-write NFS mount or a
local disk could be used for the user's files.
/bin client1(ro) client2(ro)
/store/client1 client1(rw)
/store/client2 client2(rw)
Figure 3: The contents of /etc/exports on the
server which contains the clients' binaries.
mount -t nfs server:/bin /mnt/nfsbins
mount -t nfs server:/store/`hostname -s` /mnt/persist
mount -t unionfs none /bin -o dirs=/mnt/persist:/mnt/nfsbins=nfsro
Figure 4: Creating a union with two NFS-based shares for
binaries and persistent data.
Figure 3 shows a sample NFS /etc/exports
file for a diskless client configuration. To ensure that none of the
clients can tamper with the shared binaries on the server, we export
the /bin directory read-only. We then export the persistent
storage folder for each client individually. This ensures that one
client cannot tamper with the persistent folder of another.
Figure 4 shows the commands used to create a
union from a shared binary directory and to provide a persistent
backing store for that directory on a second NFS mount. The first
command mounts /bin for our client. The next command mounts
the persistent data store for our client based on its hostname.
Finally, we create a union containing the exported /bin and
/store/`hostname -s` directories and mount it at
/bin on our local client. To export a full system to the
diskless clients via NFS, one simply exports / instead of just
/bin. However, such a setup requires the additional steps, as in
LiveCDs, that make /proc and /dev usable.
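A minimal sketch of such a client setup, assuming the server also exports / read-only and reusing the per-client /store export from Figure 3 (the mount points and the tmpfs-backed /dev are assumptions):

mkdir -p /mnt/nfsroot /mnt/persist /mnt/root
mount -t nfs server:/ /mnt/nfsroot
mount -t nfs server:/store/`hostname -s` /mnt/persist
mount -t unionfs none /mnt/root -o dirs=/mnt/persist=rw:/mnt/nfsroot=nfsro
mount -t proc proc /mnt/root/proc
mount -t sysfs sysfs /mnt/root/sys
mount -t tmpfs tmpfs /mnt/root/dev   # device nodes must then be created, e.g., by udev
cd /mnt/root
pivot_root . mnt/live                # mnt/live must exist in the union, as in the SLAX sequence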
Snapshotting.
The previous usage scenarios all assumed that one or more components
of the union were read-only by necessity (either enforced by hardware
limitations or the NFS server). Unionfs can also provide
copy-on-write semantics by logically marking a physically read-write
branch as read-only. This enables Unionfs to be used for file system
snapshots.
To create a snapshot, the unionctl tool is used to invoke
branch management ioctls that dynamically modify the union
without unmounting and remounting Unionfs. First, unionctl
is used to add a new high-priority branch. For example, the following
command adds /snaps/1 as the highest priority branch to a
union mounted on /union:
unionctl /union --add /snaps/1
Next, unionctl is called for each existing branch to mark
them as read-only. The following command will mark the branch
/snaps/0 read-only:
unionctl /union --mode /snaps/0 ro
Any changes made to the file system take place only in the read-write
branch. Because the read-write branch has a higher priority than all
the other branches, users see the updated
contents.
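Putting these steps together, a snapshot rotation can be scripted; the following is an illustrative sketch (the script, its arguments, and the branch paths are assumptions; only the unionctl invocations come from above):

#!/bin/sh
# Usage: snapshot.sh <union-mountpoint> <new-branch> <existing-branch>...
union=$1; new=$2; shift 2
mkdir -p "$new"
unionctl "$union" --add "$new"        # new highest-priority, read-write branch
for b in "$@"; do
  unionctl "$union" --mode "$b" ro    # freeze each existing branch
done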
3 Challenges
While developing Unionfs we encountered several issues that we feel
developers should address before they decide whether or not to aim for
kernel inclusion. Backward compatibility, changes in kernel
interfaces, and experimental code are three such issues. In Section
3.1, we consider the advantages and disadvantages of maintaining a
module outside of the mainline kernel. In Section 3.2, we discuss the
implications of developing a module that is aiming for inclusion in
the mainline Linux kernel.
3.1 Developing an out-of-kernel module
When first releasing Unionfs, we wanted to ensure that as many people
as possible could use it. To accommodate this, we attempted to
provide backward compatibility with past kernel versions. Initially,
when Unionfs supported Linux 2.4 it was easy to keep up with changing
kernels, since most of the changes between kernel versions were bug
fixes.
In December of 2004, Unionfs was ported to Linux 2.6, which
introduced additional complications. VFS changes between 2.4 and 2.6
(e.g., file pointer update semantics and locking mechanisms) required
#ifdefed sections of code to provide backward compatibility
with Linux 2.4. In addition, since we were supporting Linux 2.6, we
had to be conscious of the fact that the 2.6 kernel interfaces could
change between versions.
The benefit of supporting multiple kernel versions was that we could
enable the use of Unionfs on many different platforms. Although
LiveCD creators mostly preferred Linux 2.6 kernels, we found that some
of them were still working with 2.4. In addition, several people were
using Unionfs for embedded devices, which at the time tended to use
2.4 kernels. However, providing backward compatibility came with a
few disadvantages and raised the question of how far back we would go.
Because there is no standard kernel for LiveCD developers, there were
bug reports and compatibility issues across many different kernel
versions.
Although Unionfs supported multiple kernel versions, we had to choose
which versions to focus on. We increased the minimum kernel version
Unionfs required if: (1) it would make us #ifdef code that
was already #ifdefed for backward compatibility, or (2)
it made the code overly complex. After Unionfs was ported to Linux
2.6, we found ourselves repeatedly raising the minimum kernel version
due to the large number of interface-breaking changes. For example,
2.6.11 introduced the unlocked_ioctl operation. The most
invasive change has been 2.6.16's new mutex system. Even though we
have stopped supporting backward compatibility, users often submit
backward-compatibility patches which we apply but do not support.
Along with backward compatibility came increased code complexity.
Although backward compatibility does not generally add much code, the
readability of the code decreased since we kept many sections of
#ifdefed code. Moreover, it made debugging more difficult
as Unionfs could run in more environments. In February of
2005, we decided to drop support for Linux 2.4 to reduce the
size and complexity of the code. By restricting Unionfs to Linux 2.6,
we were able to cut our code base by roughly 5%. Although this is not
a large percentage, it greatly improved maintainability by lowering
the number of environments that we had to maintain and test against.
By removing Linux 2.4 from our list of supported kernels, we
eliminated eleven different kernel versions that we were supporting.
It also allowed us to eliminate a number of bugs related to backward
compatibility that applied only to Linux 2.4. Before
dropping support for a specific kernel version, we release a final
version of Unionfs that supports that kernel version.
Even though we removed 2.4 support from Unionfs, it did not end the
problems of backward compatibility. With Linux 2.6, a new development
process was introduced where code and interface changes that would
previously have been introduced in a development kernel are placed
into the stable branch. Linux 2.6.16 introduced a new set of locking
mechanisms where semaphores were replaced with mutexes. Although this
is one of the larger changes we have seen, there are many such changes
that force us to deal with backward compatibility within the 2.6
branch itself. This led us to decide in February of 2006 to
drop backward compatibility completely and only work with the latest
release candidate so that we can closely follow the kernel on our path
to inclusion. Since we make a release of Unionfs before every major
change we still have working copies of Unionfs for Linux 2.4 and
earlier versions of Linux 2.6.
3.2 Kernel Inclusion
In our efforts to prepare Unionfs to be submitted to the kernel
mailing list for discussion, we had to address three major issues.
First, due to the incremental nature of Unionfs's development, the
code base needed large amounts of reorganization to conform to kernel
coding conventions. Second, Unionfs user-space utilities use older
methods for interfacing with the kernel that needed to be replaced by
newer, preferred methods, such as configfs and
sysfs. Finally, features that were placed in Unionfs for
research purposes needed to be removed to make the code base more
practical.
Since the Linux kernel is a massive project with people contributing
code to every component, there are very strict guidelines for the way
code should be organized and formatted. While reviewing the code
base for Unionfs, we realized that some of the functions were
unnecessarily long. Even now, due to the complex fan-out nature of
Unionfs, many of the functions are longer than we would like because
of their loops and conditionals.
When looking into the methods available for a user-mode process to
communicate with our file system, we noticed one trend. Every time a
person introduces an ioctl, there is an objection and a
suggestion to find a better way of handling what is needed. Because
Unionfs uses several ioctls for adding branches, marking
branches read-only, and identifying which branch a file is on, we
decided that other methods should be explored. The preferred methods
for modifying and viewing kernel object states are configfs
and sysfs. Although both are good options, they both have
shortcomings that prevented us from using them.
In the case of configfs, the major concern was that the
module is optional. This issue could be addressed by marking
configfs to be selected by Unionfs, but that ignores a larger
issue. Many of the users of Unionfs are using it in embedded devices
and on LiveCDs. If we use configfs to control Unionfs's
configuration, we are forcing those users to use a larger kernel image
that exceeds their memory and storage constraints. With
sysfs, we came across the issue that no file-system-related
kernel objects are defined by sysfs.
To use sysfs, we would have to design and implement a
complete set of VFS kernel objects for sysfs and submit them
for kernel inclusion in addition to Unionfs.
To solve our problem of using ioctls for branch manipulation,
we decided to use the remount functionality that already exists in the
kernel. Remount allows one to change the configuration of a mount
while leaving its files open so processes can continue to use the
files after the remount operation is complete. This lets us provide
the ability to change branch configurations easily without the need
for ioctls, by parsing the new options that are passed in and
applying the differences between the new and old options. However,
this still requires us to maintain two ioctls: one for querying
files and another for debugging.
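As a concrete illustration of the remount approach, the snapshot rotation from Section 2 could then be expressed as a single remount; the exact option syntax below is an assumption modeled on the mount-time dirs= option:

mount -o remount,dirs=/snaps/1=rw:/snaps/0=ro /union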
As of this writing, we are addressing a problem associated with
crossing mount points within a union. The most common occurrence of
this problem is when a LiveCD performs a pivot_root or a
chroot to a Unionfs-mounted path. Currently, LiveCD
developers mount Unionfs and then proceed to move the mount
points for /proc and /sys to /unionfs/proc
and /unionfs/sys, respectively. After this they
pivot_root to the union so that proc and
sys are visible. The reason that this problem exists is that
currently Unionfs stacks on top of the superblock for each branch.
This presents a problem because it does not give us access to the data
structures that permit us to cross mount points. Our solution to this
problem is to redo how Unionfs stacks on top of the branches by
stacking on a dentry and a vfsmount structure. This
will give us the additional information that is needed to build the
structures necessary to cross mount points. Even with the ability to
cross mount points, it is not advised to stack on pseudo file systems
such as sysfs and procfs. Because sysfs and
procfs are not only accessed through Unionfs but are also
manipulated directly by the kernel, inconsistencies can arise
between the caches used by Unionfs and these file systems.
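The mount-point shuffle described at the start of this paragraph looks roughly as follows (a sketch; the /unionfs paths follow the text, and mnt/initrd as the old-root location is an assumption):

mount --move /proc /unionfs/proc
mount --move /sys /unionfs/sys
cd /unionfs
pivot_root . mnt/initrd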
Because Unionfs started as a research project, it had many
theoretically interesting features from a design perspective, which
users did not need in practice. Unionfs provides copyup
functionality: when a file that exists on a read-only branch is
modified, Unionfs attempts to copy the file up to the nearest
read-write branch. Some of the early features of
Unionfs included several copyup modes, which allowed copyup to take
the permissions of the current user, the original permissions of the
file, or a set of permissions specified at mount time.
In addition, there were several delete modes, which performed one of
three actions:
- delete=whiteout (default) locates the first instance of the file and
unlinks only that instance. This mode differs from delete=first in
that it also creates a whiteout for the file in the branch from which
it was removed.
- delete=all finds every instance of the file across all branches and
unlinks them all.
- delete=first locates the first instance of the file and unlinks only
that instance, without creating a whiteout.
In the case of the delete mount option we found that no one was using
the delete=first and delete=all options and that the
delete=whiteout option was strongly preferred. Because our
user base is predominantly composed of LiveCD developers,
delete=first was removed and delete=all is only
present if Unionfs is compiled with UNIONFS_DELETE_ALL
defined.
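For illustration, a delete policy was selected at mount time, roughly along these lines (combining delete= with dirs= in a single option string is an assumption):

mount -t unionfs \
  -o dirs=/rw=rw:/ro=ro,delete=whiteout \
  none /union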
We also had several modes describing the permissions with which a
whiteout was to be created. When a file is deleted, Unionfs creates a
.wh.name file, where name is the name of the deleted file; this
whiteout tells Unionfs to omit the file from the view presented to the
user. Of these permission modes, copyup=currentuser and
copyup=mounter were removed because we found that they went
completely unused by our users:
- copyup=preserve (default) creates the new file with the same
permissions that existed on the file which was unlinked.
- copyup=currentuser creates the new file with the UID, GID, and umask
taken from the current user.
- copyup=mounter creates the new file with the UID, GID, and
permissions specified in the options string of the Unionfs mount.
Although the extra options were interesting research concepts, they
were not practical for our users' needs and only served to increase
code complexity.
Another instance where ideas that are good for research purposes
fail in practice is the creation of whiteouts. Initially, when a
whiteout was created while removing a file, the whiteout was created
atomically via rename and was then truncated. This was done
so that if the process failed half-way through, there would not be any
ambiguity about whether the file existed. This added additional
complexity to the code without sufficient gains in either performance
or functionality. Since then, we have removed atomic whiteout
creation due to the inherent difficulty of maintaining the semantics
of open, yet deleted, files.
4 Limitations
During the development of Unionfs, we had to make certain design
decisions to help the overall implementation. Such decisions often
impose limitations. We have identified three such limitations in
Unionfs: modification of lower-level branches, mmap copyup
with dynamic branch management, and scalability. We discuss each in
detail below.
Modification of lower-level branches.
The current design of Unionfs and other stackable file systems on
Linux results in double caching of data and meta-data. This is an
unfortunate side-effect of the way the Linux VFS is
implemented: there is no easy coordination between objects cached at
different levels [1]. This forces us
to maintain a list of lower VFS objects for each upper object. For
example, a Unionfs inode contains an array of pointers to all the
corresponding inodes on the underlying branches. Unionfs has to copy
certain information from a file's underlying inode to the Unionfs
inode: metadata such as the file size, access permissions,
owner and group, and so on.
Since Unionfs expects the underlying inode (and therefore the file) to
have certain properties (e.g., a size consistent with that saved in
the Unionfs inode), inconsistencies can appear if a process modifies
the lower inode
directly without going through Unionfs. We encourage our users to
avoid modifying the lower branches directly. This works well in
scenarios where many of the branches are stored on read-only media
(e.g., LiveCDs). However, there are some people who want to use
Unionfs to provide a unified view of several frequently changing
directories. Moreover, if users delete or rename files or
directories, then Unionfs points to the older object, again yielding
an inconsistent view.
mmap copyup with dynamic branch management.
When Unionfs was first implemented in early 2004, only
bare-bones functionality existed: the full set of system calls was not
implemented. Some of these system calls, in particular mmap, are
required for certain programs to function properly. The mmap
system call allows programs to map portions of files into a process's
address space. Once a file is mmapped, a process can modify
it by simply writing to the correct location in memory. Currently,
Unionfs does not natively implement mmap operations, but
rather passes them down unchanged to the lower-level file system.
This has the advantage of preventing double caching of data pages and
its associated performance and consistency pitfalls. However, this
comes with the drawback that Unionfs does not receive notification of
readpage or writepage calls, so it cannot perform
copyup during a commit_write. The problem occurs when a
process tries to modify a page backed by a file on a read-only medium.
Just like in the regular open-for-write case, we must copy the file up
to a writable branch and then perform the correct address space
operations.
In March 2006, Shaya Potter, a Unionfs user and contributor,
released a partial implementation of mmap. The major problem
with it is the lack of copyup functionality while using mmap.
Additionally, one has to be careful with the implementation since
certain file systems (e.g., OCFS2, GFS) must take additional steps
before calling prepare_write and commit_write. We
have made this mmap functionality a compile-time option which
is off by default.
Scalability.
The last issue, although we do not consider it as serious as the
previous two, is scalability. Even though most Unionfs users want
two to six branches, there are some that want more. In its current
state, the maximum number of branches that Unionfs supports is 1,024
due to the use of FD_SET and related macros. However, the
overhead of using Unionfs becomes high with just 200 branches, even
for simple operations such as readdir and lookup
(see our evaluation in Section 5). The problem
with these operations is that Unionfs needs to iterate through all the
branches; for each branch, it needs to determine whether the entries
found there are duplicates, whiteouts, and so on. Currently, we are storing
stacking information in a simple linear array. This structure, while
easy to access and use, has a search complexity of O(n).
Of course, there are other operations, such as llseek
operating on directories, which should be examined and possibly
optimized. For other operations, Unionfs is a bit more efficient
because it can use the dentry cache objects that have been populated
by lookup.
5 Evaluation
We conducted our benchmarks on a 1.7GHz Pentium 4 machine with 1.25GB
of RAM.
Its system disk was a 30GB 7,200 RPM Western Digital Caviar IDE
formatted with Ext3.
In addition, the machine had one Maxtor 7,200 RPM 40GB IDE disk
formatted with Ext2, which we used for the tests. We remounted the
lower file systems before every benchmark run to purge file system
caches. We used the Student-t distribution to compute the 95%
confidence intervals for the mean elapsed, system, user, and wait
times. Wait time is the elapsed time less CPU time used and consists
mostly of I/O, but process scheduling can also affect it. In each
case, the half-widths of the confidence intervals were less than 5%
of the mean. The test machine was running a Fedora Core 4 Linux
distribution with a vanilla 2.6.16-rc6 kernel.
Unless otherwise noted, the highest-priority branch was read-write,
while all the other branches, if any, were read-only. More detailed
evaluation can be found in our journal article [10].
5.1 Evaluation Workloads
We chose to perform two benchmarks to test the extreme cases: on one
end of the spectrum there are CPU-intensive workloads, while on
the other end there are I/O-intensive workloads.
OpenSSH build.
Building OpenSSH [8] is a CPU-intensive benchmark. We used
OpenSSH 4.0p1, which contains 74,259 lines of code. It performs several
hundred small configuration tests, and then it builds 155 object
files, one library, and four scripts. This benchmark contains a fair
mix of file system operations, representing a workload characteristic
of users. The highest-priority branch was read-write, while all the
other branches, if any, were read-only.
Postmark.
Postmark v1.5 simulates the operation of electronic mail
servers [3]. It performs a series of file system
operations such as appends, file reads, creations, and deletions.
This benchmark uses little CPU but is I/O intensive. We configured
Postmark to create 20,000 files, between 512-10,240K bytes in size,
and perform 200,000 transactions. We used 200 subdirectories to
prevent linear directory lookups from dominating the results. All of
the branches were read-write, to distribute the load evenly across
branches. This is because Postmark does not have an initial working
set; therefore, using read-only branches does not make sense for this
benchmark.
5.2 Results
Unionfs incurred a maximum overhead of only 10.7% over Ext2 on
the OpenSSH compile, and 71.7% overhead over Ext2 on Postmark. These
results are somewhat worse than our previous
benchmarks [10]. However, the difference in the
OpenSSH compile benchmark appears mainly in I/O wait time, which could
be attributed to copyup taking place; we did not use copyup in our
previous benchmark.
OpenSSH build.
We performed the OpenSSH compile with two different layouts of
the data. The first layout distributes all the files from the source code
tarball over all the branches using a simple round-robin algorithm.
The other layout consists of a copy of the entire source tree on each
branch.
For both layouts, we have measured and plotted the elapsed, system,
and user times.
Figure 5: OpenSSH compile: Source code uniformly distributed across all branches.
When the OpenSSH source code is uniformly distributed across all the
branches, the overhead is a mere 0.99% (Figure 5).
This is due to the simple fact that we must perform several additional
function calls before we hand off control to the lower file system
(Ext2). With more branches, the overhead slightly increases to 2.1%
with 8 branches. This shows that Unionfs scales roughly linearly for
this benchmark.
Figure 6: OpenSSH compile: Source code duplicated on all branches.
With the OpenSSH source code duplicated on all branches
(Figure 6), the overheads were slightly higher.
A single branch configuration incurred 1.2% overhead. The slight
increase in time is a logical consequence of Unionfs having to check
all the branches and, on each branch, dealing with the full source code
tree, which slows down linear directory lookups. The 8-branch
configuration increased runtime by 10.7%. As with the previous
data layout, Unionfs scales roughly linearly.
Postmark.
Figure 7 shows the elapsed, system, and user time
for Postmark. The elapsed time overheads for Unionfs are in the range
of 64.0-71.7% above that of Ext2. Since Postmark is designed to
simulate I/O-intensive workloads, and all the VFS operations have to
pass through Unionfs, it is not surprising that the overhead of
Unionfs becomes apparent. Fortunately, typical user workloads are
not I/O-bound, so one will not notice much performance
degradation, as shown by the OpenSSH compile benchmarks.
Figure 7: Postmark: 20,000 files and 200,000 transactions.
6 Conclusion
It is often easy to create a piece of software whose functionality is
enough for the authors. However, that functionality is usually a
subset of that required by real users. Since the first release in
November 2004, user feedback has helped us make Unionfs more
complete and stable than it would have been had a small team of
developers worked on it without any community feedback. Our users
have used Unionfs for applications that were not even considered back
when Unionfs was originally designed, and have located bugs that would
otherwise have gone unnoticed.
For quite some time, Linux users have wanted a namespace-unifying file
system; Unionfs gives them exactly that. While there are still
several known issues to deal with, Unionfs is steadily becoming a
polished software package. With the increasing use and popularity of
Unionfs we felt that the next logical step was to clean up Unionfs and
submit it for kernel inclusion.
7 Acknowledgments
Unionfs would not be in nearly as good a state if it were not for our
user community, which has submitted bug reports, patches, and
provided us with interesting use cases. There are far too many
contributors to list individually (there are over 37 listed in our
AUTHORS file, which only includes those who have submitted
patches), and we extend thanks to all contributors and users, named
and unnamed. Early adopters and bug reporters like Tomas Matejicek
and Fabian Franz helped immeasurably. Recently, Junjiro Okajima has
fixed many bugs and can be counted on for high-quality patches. The
experimental mmap code currently in Unionfs was contributed
by Shaya Potter. Jay Dave, Puja Gupta, Harikesavan Krishnan, and
Mohammad Nayyer Zubair worked on Unionfs in its early stages.
This work was partially made possible by NSF CAREER award
EIA-0133589, NSF Trusted Computing Award CCR-0310493,
and HP/Intel gift numbers 87128 and 88415.1.
Unionfs is released under the GPL. Sources and documentation can be
downloaded from http://unionfs.filesystems.org.
References
[1] J. S. Heidemann and G. J. Popek. Performance of cache coherence in stackable filing. In Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles (SOSP '95), pages 3-6, Copper Mountain Resort, CO, December 1995. ACM SIGOPS.
[2] Inside Security IT Consulting GmbH. Inside Security Rescue Toolkit. http://insert.cd, 2006.
[3] J. Katcher. PostMark: A New Filesystem Benchmark. Technical Report TR3022, Network Appliance, 1997. www.netapp.com/tech_library/3022.html.
[4] P. Kerr. m-dist: live linux midi distribution. http://plus24.com/m-dist/, 2005.
[5] K. Knopper. Knoppix Linux. www.knoppix.net, 2006.
[6] P. Lougher. SQUASHFS - A squashed read-only filesystem for Linux. http://squashfs.sourceforge.net, 2006.
[7] T. Matejicek. SLAX - your pocket OS. http://slax.linux-live.org, 2006.
[8] OpenBSD. OpenSSH. www.openssh.org, May 2005.
[9] J. Silverman. Clusterix: Bringing the power of computing together. http://clusterix.net, 2004.
[10] C. P. Wright, J. Dave, P. Gupta, H. Krishnan, D. P. Quigley, E. Zadok, and M. N. Zubair. Versatility and unix semantics in namespace unification. ACM Transactions on Storage (TOS), 2(1):1-32, February 2006.
[11] C. P. Wright and E. Zadok. Unionfs: Bringing File Systems Together. Linux Journal, (128):24-29, December 2004.
[12] E. Zadok and J. Nieh. FiST: A Language for Stackable File Systems. In Proc. of the Annual USENIX Technical Conference, pages 55-70, San Diego, CA, June 2000. USENIX Association.