The problem with symbolic links

At the sambaXP 2022 conference, Jeremy Allison gave a talk titled "The UNIX Filesystem API is Deeply Flawed: What to Do About It?". LWN regulars may recall hints of these discussions in a recent comment thread. He began his presentation with the problems that symbolic links ("symlinks") cause for application developers, then explained how the solutions to the problems posed by symbolic links have led to a substantial increase in the complexity of the APIs involved in working with pathnames.

Allison explained that hard links were the first "nice addition" to the original Unix filesystem API; unlike symbolic links, they are not dangerous and are, in fact, easy to use. A hard link is simply the connection between a directory entry and the inode of the file (or directory) to which that entry refers. Unix systems allow multiple links to any file, but require that the inode and all of the directory entries referring to it reside on the same filesystem.
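
To make the hard-link model concrete, here is a minimal C sketch, assuming a POSIX system with a writable /tmp. The function name hard_link_count_demo() and the scratch-directory layout are illustrative, not taken from the talk. It creates a file, adds a second name for it with link(), and uses stat() to check that both names lead to the same inode, whose link count should then be 2.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo: create a file in a scratch directory, add a second
 * hard link to it, and return the inode's link count (or -1 on error). */
static long hard_link_count_demo(void)
{
    char dir[] = "/tmp/hardlink-XXXXXX";
    char original[64], alias[64];
    long result = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(original, sizeof original, "%s/original", dir);
    snprintf(alias, sizeof alias, "%s/alias", dir);

    int fd = open(original, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd >= 0) {
        close(fd);
        /* link() adds a second directory entry for the same inode; it
         * fails with EXDEV if the two names are on different filesystems. */
        struct stat a, b;
        if (link(original, alias) == 0 &&
            stat(original, &a) == 0 && stat(alias, &b) == 0 &&
            a.st_dev == b.st_dev && a.st_ino == b.st_ino)
            result = (long)a.st_nlink; /* one inode, two names */
    }

    unlink(alias);
    unlink(original);
    rmdir(dir);
    return result;
}
```

Because both names are equal entries for one inode, removing either with unlink() only decrements the link count; the data survives until the count reaches zero and no process still has the file open.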

By contrast, symlinks contain another path as data, and the kernel transparently operates on the file at that path when system calls like open() or chown() are called on the symlink. This seemingly innocuous feature has led to an incredible amount of complexity being added in the effort to meet the needs of programs that need to know whether a pathname contains a symbolic link or not. These programs include archiving programs such as tar, synchronization and file transfer programs such as rsync, network file system servers such as Samba, and many others that suffer from security issues because they don't pay enough attention to symbolic links in pathnames.
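
The following sketch (same assumptions: POSIX, a writable /tmp, and a hypothetical function name) shows that a symlink's data is nothing more than a stored path: readlink() returns that string, while stat() on the link transparently resolves it and describes the target file's inode.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if the symlink's data is the target path
 * string and stat() through the link reaches the target's inode. */
static int symlink_demo(void)
{
    char dir[] = "/tmp/symlink-XXXXXX";
    char target[64], link_path[64], buf[64];
    struct stat via_link, direct;
    int ok = 0;

    if (mkdtemp(dir) == NULL)
        return 0;
    snprintf(target, sizeof target, "%s/target", dir);
    snprintf(link_path, sizeof link_path, "%s/link", dir);

    int fd = open(target, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd >= 0 && symlink(target, link_path) == 0) {
        /* The link's contents are just the stored path; readlink()
         * does not NUL-terminate, so terminate the buffer here. */
        ssize_t n = readlink(link_path, buf, sizeof buf - 1);
        if (n >= 0) {
            buf[n] = '\0';
            /* stat() follows the link and describes the target file. */
            ok = strcmp(buf, target) == 0 &&
                 stat(link_path, &via_link) == 0 &&
                 stat(target, &direct) == 0 &&
                 via_link.st_dev == direct.st_dev &&
                 via_link.st_ino == direct.st_ino;
        }
    }
    if (fd >= 0)
        close(fd);

    unlink(link_path);
    unlink(target);
    rmdir(dir);
    return ok;
}
```

Nothing validates the stored path when the link is created; symlink() will record a target that does not exist, which is part of why such links are so easy to plant.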

The variety of security issues resulting from symbolic links can be seen in a search for CVE entries, which gave Allison 1,361 results when he ran it. These include vulnerabilities that facilitate information disclosure, elevation of privilege, and arbitrary file manipulation, including deletion, among other attacks. Without discussing any specific CVE in detail, he gave an example of the type of security issue that can result from mishandling symbolic links.

An application running as root may try to verify that /data/mydir is a normal directory (not a symlink) before opening the /data/mydir/passwd file. Between the time the program checks the directory and the time it opens the file, an attacker could replace the mydir directory with a symbolic link to /etc, so the file actually opened is, unexpectedly, /etc/passwd. This is a kind of race condition known as a TOCTOU (time-of-check-to-time-of-use) race.
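
The race is easier to see when the attacker's step is made deterministic. In this hypothetical sketch, check_then_open() is the vulnerable pattern, and the window callback stands in for another process acting between the check and the use. A scratch directory named decoy plays the role of /etc, so the demonstration touches no real system files; all names here are illustrative.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* The vulnerable pattern: check that `dir` is a real directory, then
 * open `dir/name`. `window` stands in for whatever another process
 * does between the check and the use. */
static int check_then_open(const char *dir, const char *name,
                           void (*window)(void))
{
    struct stat st;
    char path[128];

    /* Time of check: dir must be a directory, not a symlink. */
    if (lstat(dir, &st) != 0 || !S_ISDIR(st.st_mode))
        return -1;

    window(); /* the race window */

    /* Time of use: the kernel resolves the whole path again. */
    snprintf(path, sizeof path, "%s/%s", dir, name);
    return open(path, O_RDONLY);
}

static char scratch[] = "/tmp/toctou-XXXXXX";
static char mydir[64], decoy[64];

/* Simulated attacker: move mydir aside and put a symlink to the decoy
 * directory (standing in for /etc) in its place. */
static void swap_in_symlink(void)
{
    char moved[64];
    snprintf(moved, sizeof moved, "%s/moved", scratch);
    if (rename(mydir, moved) != 0 || symlink(decoy, mydir) != 0)
        perror("swap_in_symlink");
}

/* Create dir/passwd holding `text`. */
static int make_passwd(const char *dir, const char *text)
{
    char path[128];
    snprintf(path, sizeof path, "%s/passwd", dir);
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, text, strlen(text));
    close(fd);
    return n == (ssize_t)strlen(text) ? 0 : -1;
}

/* Run the scenario and copy what the "victim" read into buf.
 * Scratch files are left behind for brevity. */
static int toctou_demo(char *buf, size_t len)
{
    if (mkdtemp(scratch) == NULL)
        return -1;
    snprintf(mydir, sizeof mydir, "%s/mydir", scratch);
    snprintf(decoy, sizeof decoy, "%s/decoy", scratch);
    if (mkdir(mydir, 0700) != 0 || mkdir(decoy, 0700) != 0 ||
        make_passwd(mydir, "expected") != 0 ||
        make_passwd(decoy, "decoy") != 0)
        return -1;

    int fd = check_then_open(mydir, "passwd", swap_in_symlink);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, len - 1);
    close(fd);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return 0;
}
```

The check passes and the open succeeds, yet the bytes read come from decoy/passwd. In a real attack the swap happens concurrently and the attacker simply retries until the timing works; the callback only makes the outcome reproducible.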

Symbolic links and complexity

Symbolic links were created, Allison theorized, because hard links are restricted to linking within the same filesystem, so only symbolic links (which do not have this restriction) could be used if an administrator wanted to add new storage media without changing the paths to users' data. He quoted an advertisement for 4.2BSD, which boasted, "This feature frees the user from the constraints of the strict hierarchy that a tree structure imposes. This flexibility is essential for proper namespace management."

The addition of symbolic links led to the lstat() system call, which provided the means to identify whether the last component of a pathname is a symbolic link. This was unfortunately insufficient to handle symlinks in the directory components earlier in the path, he explained. An application could attempt to verify each component of the path individually, but not atomically; another application could change one of the components during this process, which would lead to security vulnerabilities.
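
The following sketch (hypothetical names, scratch files under /tmp) shows the limitation. With link a symlink to a real directory, lstat() reports link itself as a symlink, but lstat() on link/file follows the intermediate link without complaint and describes the regular file at the end.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* With "link" a symlink to the directory "real", record whether lstat()
 * flags the symlink as the last component and as a middle component.
 * Scratch files are left behind for brevity. */
static int lstat_demo(int *last_is_link, int *middle_is_link)
{
    char dir[] = "/tmp/lstat-XXXXXX";
    char real[64], link_path[64], file[96], via_link[96];
    struct stat st;
    int fd = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(real, sizeof real, "%s/real", dir);
    snprintf(link_path, sizeof link_path, "%s/link", dir);
    snprintf(file, sizeof file, "%s/file", real);
    snprintf(via_link, sizeof via_link, "%s/file", link_path);

    if (mkdir(real, 0700) != 0 || symlink(real, link_path) != 0 ||
        (fd = open(file, O_WRONLY | O_CREAT | O_EXCL, 0600)) < 0)
        return -1;
    close(fd);

    /* Symlink as the last component: lstat() describes the link. */
    if (lstat(link_path, &st) != 0)
        return -1;
    *last_is_link = S_ISLNK(st.st_mode) != 0;

    /* Symlink in the middle: lstat() follows it silently and describes
     * the regular file at the end of the path. */
    if (lstat(via_link, &st) != 0)
        return -1;
    *middle_is_link = S_ISLNK(st.st_mode) != 0;
    return 0;
}
```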

An option to the open() system call, O_NOFOLLOW, has the same problem as lstat(). O_NOFOLLOW tells the system call to fail with ELOOP if the last component in the path is a symbolic link, but it only checks the last component. The C library function realpath() follows symbolic links in a path and produces an absolute canonical pathname which the application can then compare with the original. Allison described this as an attractive but incorrect solution to the problem. Another process could make a change between when realpath() is called and when another function is used to manipulate the file in some way. In other words, the same TOCTOU race applies here.
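
Both limitations can be demonstrated in one sketch, assuming Linux (where open() with O_NOFOLLOW reports ELOOP for a final-component symlink) and the same hypothetical scratch layout as before. realpath() is called with a NULL buffer, which POSIX.1-2008 allows, so it returns a malloc()ed string that must be freed. The struct and function names are illustrative.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

struct nofollow_result {
    int last_errno;       /* errno from open("link", O_NOFOLLOW) */
    int middle_opened;    /* did open("link/file", O_NOFOLLOW) succeed? */
    int realpath_differs; /* did realpath() reveal the symlink? */
};

/* "link" is a symlink to the directory "real", which holds "file".
 * Scratch files are left behind for brevity. */
static int nofollow_demo(struct nofollow_result *r)
{
    char dir[] = "/tmp/nofollow-XXXXXX";
    char real[64], link_path[64], file[96], via_link[96];
    int fd = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(real, sizeof real, "%s/real", dir);
    snprintf(link_path, sizeof link_path, "%s/link", dir);
    snprintf(file, sizeof file, "%s/file", real);
    snprintf(via_link, sizeof via_link, "%s/file", link_path);

    if (mkdir(real, 0700) != 0 || symlink(real, link_path) != 0 ||
        (fd = open(file, O_WRONLY | O_CREAT | O_EXCL, 0600)) < 0)
        return -1;
    close(fd);

    /* Symlink as the last component: O_NOFOLLOW makes open() fail. */
    fd = open(link_path, O_RDONLY | O_NOFOLLOW);
    r->last_errno = fd < 0 ? errno : 0;
    if (fd >= 0)
        close(fd);

    /* Symlink in the middle: O_NOFOLLOW does not notice it. */
    fd = open(via_link, O_RDONLY | O_NOFOLLOW);
    r->middle_opened = fd >= 0;
    if (fd >= 0)
        close(fd);

    /* realpath() resolves every component, so comparing its result with
     * the original reveals the symlink, but only as of this moment: the
     * path can still be swapped before it is used. */
    char *resolved = realpath(via_link, NULL);
    if (resolved == NULL)
        return -1;
    r->realpath_differs = strcmp(resolved, via_link) != 0;
    free(resolved);
    return 0;
}
```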

Allison said the openat() system call was designed as a solution...
