opening /proc/self/fd/1 is not the same as dup(1)
Introduction
Unix processes have file descriptors which point to file descriptions (struct file in Linux). Multiple file descriptors can point to the same file description, for instance by duplicating them with dup(2), or by passing them across process boundaries using fork(2) or UNIX domain sockets (unix(7)).
For a long time, I was under the impression that the same thing happened behind the scenes when opening /dev/fd/${FD} (a.k.a. /proc/${PID}/fd/${FD}) on Linux. I thought I would get a new file descriptor pointing to the same file description, just as if I had called dup(fd). This is wrong!
This feature is mis-documented
The misunderstanding is even documented in my earlier edition of “The Linux Programming Interface” (section 5.11) (but it has been fixed in newer editions, as Michael Kerrisk points out in the comments below):
Opening one of the files in the /dev/fd directory is equivalent to duplicating the corresponding file descriptor. Thus, the following statements are equivalent:
    fd = open("/dev/fd/1", O_WRONLY);
    fd = dup(1);
This is a reasonably simple explanation which is close enough to reality for many practical use cases, and which is true on other Unixes, but it is not fully accurate on Linux. (The book is very comprehensive and useful nevertheless.)
This Red Hat bug from 2000 discusses how that behavior was apparently changed in Linux 1.3.34. The aforementioned equivalence between the open(2) and dup(2) calls is called the “Plan9 semantics” there.
The man page gives usage examples, but does not go into a lot of detail on the exact semantics in the case of open(2).
/dev/fd/* behaves differently on other Unixes
On top of that, the behavior is implemented differently on other Unixes.
From a FreeBSD 14 box:
$ ./dup -dup > out; cat out; echo
1d
$ ./dup -proc > out; cat out; echo
1d
On FreeBSD, the result of open("/dev/fd/1", O_WRONLY) does share the same file description with the original file descriptor, as if we were calling dup(1).
Part 1: An experiment!
It turns out that opening /dev/fd/*, /proc/${PID}/fd/* or /proc/self/fd/* results in a separate file description (struct file) being allocated for you, but it refers to the same underlying file on disk.
You can try it out with the following program:
$ cat dup.c
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int usage(const char *name) {
    printf("Usage: %s [-dup|-proc]\n", name);
    return 0;
}

int main(int argc, char *argv[]) {
    int fd;
    if (argc != 2) {
        return usage(argv[0]);
    }
    if (!strcmp(argv[1], "-dup")) {
        fd = dup(1); // stdout
        if (fd < 0) {
            err(1, "dup");
        }
    } else if (!strcmp(argv[1], "-proc")) {
        fd = open("/dev/fd/1", O_WRONLY);
        if (fd < 0) {
            err(1, "open /dev/fd/1");
        }
    } else {
        return usage(argv[0]);
    }
    write(1, "1", 1);
    write(fd, "d", 1);
    close(fd);
}
When we build and run this program, we can see that the behavior of dup(2) and open(2) is actually different!
Duplicating the file descriptor using dup(2)
$ make dup
cc -Wall -static dup.c -o dup
$ ./dup -dup > out; cat out; echo
1d
$
In the dup(2) case, the struct file is actually shared: both file descriptors refer to the exact same file description. The first write(2) updates the file description’s file position (f_pos). The second write(2) uses the exact same file description, so it sees the updated file position, and the byte gets written after the one that was written before.
Duplicating the file descriptor through /proc
$ ./dup -proc > out; cat out; echo
d
$
In the -proc case, where the descriptor comes from open(2), we see only one byte written to the output file.
So there are two struct files created, and they use independent file positions (f_pos), which are both set to 0 initially.
- The first write(2) through stdout (fd 1) updates the file position from 0 to 1.
- The second write(2) uses a separate file description and overwrites the byte that was previously written.
That’s why we can only see “d” in the output.
Other file types
So far, this was a bit confusing. It’s definitely inconsistent with the theory that opening /dev/fd/* does the same as dup(2). But what happens for file types other than regular files?
TCP Sockets: Cannot be reopened through /proc
You can try this out by redirecting stdout to a socket, using the obscure /dev/tcp extension in bash [1]:
$ nc -l 9999 &
[1] 4166
$ ./dup -proc >/dev/tcp/localhost/9999
dup: open /dev/fd/1: No such device or address
[1]+ Done nc -l 9999
$
The error here is ENXIO: No such device or address.
For sockets, the /proc/self/fd/* entry is a symlink to a name like socket:[16902].
lrwx------ 1 gnoack gnoack 64 Feb 17 23:12 1 -> 'socket:[16902]'
Pipes: Can be reopened through /proc
However, a pipe can be reopened through /dev/fd/1, for example like this:
$ ./dup -proc | cat ; echo
1d
…and this works even though the pipe’s symlink looks like this:
l-wx------ 1 gnoack gnoack 64 Feb 17 23:10 1 -> 'pipe:[15895]'
Part 2: What is really happening
First, let’s recall the in-kernel VFS structure:
The following things happen in a sequence:
- A user space process calls open("/proc/self/fd/1")
- System call handler:
  - parses flags
  - does the path walk, which eventually invokes proc_pid_get_link():
    - fs/proc/base.c:proc_pid_get_link():
      - invokes proc_fd_link() through a callback
        - fs/proc/fd.c:proc_fd_link(): looks up the original struct file* from the target task and returns the ->f_path that existed on that struct file (through an output pointer argument).
      - invokes nd_jump_link(), which sets the result of the path walk in nameidata to the previously set path!
  - eventually calls path_openat().
    - namei.c:path_openat(): Always allocates a new struct file with alloc_empty_file()
    - namei.c:do_open(): calls vfs_open(), which in turn calls do_dentry_open()
    - open.c:do_dentry_open():
      - first initializes the file ops from the inode: f->f_op = fops_get(inode->i_fop)
      - then calls the “open” file operation: f->f_op->open
Where did the no_open pointer come from?
For the TCP socket above, f->f_op->open is set to the no_open function, which unconditionally returns ENXIO. So that socket can’t be reopened through /proc.
The decision about which f_op->open is used for each file is made in inode.c:init_special_inode(), for sockets and pipes.
Summary
- Every call to open(2) results in a new struct file* being allocated.
- The resulting struct file* refers to an existing inode, even for special files like pipes.
- Not all of the special files support this kind of re-opening.
[1] These /dev/tcp/... files do not actually exist: bash treats these paths specially and really just calls the BSD socket API itself, but we can use it here to write directly into a socket.