opening /proc/self/fd/1 is not the same as dup(1)

A dive into Linux's /proc file system

February 19, 2024

Introduction

Unix processes have file descriptors which point to file descriptions (struct file in Linux). Multiple file descriptors can point to the same file description, for instance by duplicating them with dup(2), or by passing them across process boundaries using fork(2) or UNIX Domain Sockets (unix(7)).

For a long time, I was under the impression that the same thing happened behind the scenes when opening /dev/fd/${FD} (a.k.a. /proc/${PID}/fd/${FD}) on Linux. I thought I would get a new file descriptor pointing to the same file description, as if I had called dup(fd). This is wrong!

This feature is mis-documented

The misunderstanding is even documented in section 5.11 of my earlier edition of “The Linux Programming Interface” (it has been fixed in newer editions, as Michael Kerrisk points out in the comments below):

Opening one of the files in the /dev/fd directory is equivalent to duplicating the corresponding file descriptor. Thus, the following statements are equivalent:
fd = open("/dev/fd/1", O_WRONLY);
fd = dup(1);

This is a reasonably simple explanation which is close enough to reality for many practical use cases, and which is true on other Unixes, but it is not fully accurate on Linux. (The book is very comprehensive and useful nevertheless.)

This Red Hat bug from 2000 discusses how that behaviour was apparently changed in Linux 1.3.34. The aforementioned equivalence between the open(2) and dup(2) calls is called the “Plan9 semantics” there.

proc_pid_fd(5) gives usage examples, but does not go into a lot of detail on the exact semantics in the case of open(2).

`/dev/fd/*` behaves differently on other Unixes

On top of that, the behavior is implemented differently on other Unixes.

From a FreeBSD 14 box:

$ ./dup -dup > out; cat out; echo
1d
$ ./dup -proc > out; cat out; echo
1d

On FreeBSD, the result of open("/dev/fd/1", O_WRONLY); does share the same file description with the original file descriptor, as if we were calling dup(1).

Part 1: An experiment!

It turns out, opening /dev/fd/*, /proc/${PID}/fd/* or /proc/self/fd/* (proc_pid_fd(5)) results in a separate file description (struct file) being allocated for you, but it refers to the same underlying file on disk.

You can try it out with the following program:

$ cat dup.c
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

// Prints a usage message to stderr and returns a failing exit status.
int usage(const char *name) {
  fprintf(stderr, "Usage: %s [-dup|-proc]\n", name);
  return 1;
}

int main(int argc, char *argv[]) {
  int fd;

  if (argc != 2) {
    return usage(argv[0]);
  }

  if (!strcmp(argv[1], "-dup")) {
    fd = dup(1);  // stdout
    if (fd < 0) {
      err(1, "dup");
    }
  } else if (!strcmp(argv[1], "-proc")) {
    fd = open("/dev/fd/1", O_WRONLY);
    if (fd < 0) {
      err(1, "open /dev/fd/1");
    }
  } else {
    return usage(argv[0]);
  }

  write(1, "1", 1);
  
  write(fd, "d", 1);
  close(fd);
}

When we build and run this program, we can see that the behavior of dup(2) and open(2) is actually different!

Duplicating the file descriptor using dup(2)

$ make dup
cc -Wall -static    dup.c   -o dup
$ ./dup -dup > out; cat out; echo
1d
$

In the dup(2) case, the struct file is actually shared – both file descriptors refer to the exact same file description. The first write(2) updates the file description’s file position (f_pos). The second write(2) uses the exact same file description, so it sees the updated file position, and the byte gets written after the one that was written before.

Duplicating the file descriptor through `/proc`

$ ./dup -proc > out; cat out; echo
d
$

In the proc_pid_fd(5) case, we see only one byte written to the output file. So there are two struct files created – and they use independent positions f_pos in the file, which are both set to 0 initially.

The first write(2) through stdout (fd 1) updates the file position from 0 to 1.
The second write(2) uses a separate file description and overwrites the byte that was previously written.

That’s why we can only see “d” in the output.

Other file types

So far, this was a bit confusing. It’s definitely inconsistent with the theory that opening /dev/fd/* does the same as dup(2). But what happens for file types other than regular files?

TCP Sockets: Cannot be reopened through `/proc`

You can try this out by redirecting stdout to a socket, using the obscure /dev/tcp extension in bash¹:

$ nc -l 9999 &
[1] 4166
$ ./dup -proc >/dev/tcp/localhost/9999
dup: open /dev/fd/1: No such device or address
[1]+  Done                    nc -l 9999
$

The error here is ENXIO: No such device or address.

For sockets, the /proc/self/fd/* entry is a symlink to a name like socket:[16902].

lrwx------ 1 gnoack gnoack 64 Feb 17 23:12 1 -> 'socket:[16902]'

Pipes: Can be reopened through `/proc`

However, a pipe can be reopened through /dev/fd/1, for example like this:

$ ./dup -proc | cat ; echo
1d

…and this works even though the pipe’s symlink looks like this:

l-wx------ 1 gnoack gnoack 64 Feb 17 23:10 1 -> 'pipe:[15895]'

Part 2: What is really happening

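First, let’s recall the in-kernel VFS structure: each file descriptor in a process’s file descriptor table points to a struct file (the file description), which holds a path (f_path) that leads to a dentry and, through it, to the inode.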

The following things happen in a sequence:

A user space process calls open("/proc/self/fd/1")
System call handler:
- parses flags
- does the path walk, which eventually invokes proc_pid_get_link():
  - fs/proc/base.c:proc_pid_get_link():
    - invokes proc_fd_link() through a callback
      - fs/proc/fd.c:proc_fd_link(): looks up the original struct file* from the target task and returns the ->f_path that existed on that struct file (through an output pointer argument).
    - invokes nd_jump_link(), which sets the result of the path walk in nameidata to the previously set path!
- eventually calls path_openat().
  - namei.c:path_openat(): Always allocates a new struct file with alloc_empty_file()
  - namei.c:do_open(): calls vfs_open(), which in turn calls do_dentry_open()
  - open.c:do_dentry_open():
    - first initializes the file ops from the inode: f->f_op = fops_get(inode->i_fop)
    - then calls the “open” file operation: f->f_op->open

Where did the `no_open` pointer come from?

For the TCP socket above, f->f_op->open is set to the no_open function, which unconditionally returns ENXIO. So that socket can’t be reopened through /proc.

The decision about which f_op->open is used for each file is made in inode.c:init_special_inode, for sockets and pipes.

Summary

Every call to open(2) results in a new struct file* being allocated.
The resulting struct file* refers to an existing inode, even for special files like pipes.
Not all of the special files support this kind of re-opening.

¹ These /dev/tcp/... files do not actually exist: bash treats these paths specially and really just calls the BSD socket API itself… but we can use it here to write directly into a socket.
