opening /proc/self/fd/1 is not the same as dup(1)

A dive into Linux's /proc file system

Introduction

Unix processes have file descriptors which point to file descriptions (struct file in Linux). Multiple file descriptors can point to the same file description, for instance by duplicating them with dup(2), or by passing them across process boundaries using fork(2) or UNIX Domain Sockets (unix(7)).
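To make the sharing concrete, here is a minimal sketch that writes one byte through a descriptor and then asks a dup(2) copy for its current position. The function name and the scratch file path are hypothetical and only serve the illustration. Because both descriptors refer to one file description, the copy should report the position that the write advanced.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>

/* Writes one byte through a descriptor and returns the file position as
 * seen through a dup(2) copy of it. Returns -1 on any error. */
off_t dup_offset_after_write(void) {
  char tmpl[] = "/tmp/dup-demo-XXXXXX";  /* hypothetical scratch file */
  int fd = mkstemp(tmpl);
  if (fd < 0) return -1;
  unlink(tmpl);  /* the open descriptor keeps the file alive */

  off_t pos = -1;
  int copy = dup(fd);  /* second descriptor, same file description */
  if (copy >= 0) {
    if (write(fd, "x", 1) == 1) {
      /* Query the position through the copy, not through fd. */
      pos = lseek(copy, 0, SEEK_CUR);
    }
    close(copy);
  }
  close(fd);
  return pos;
}
```

Calling dup_offset_after_write() should yield 1: the write through fd moved the shared f_pos, and the copy sees it.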

[Figure: file descriptors 0 to 3 in Process 1 and Process 2, with descriptors from both processes pointing to one shared file description, which holds f_pos]

For a long time, I was under the impression that the same thing happened behind the scenes when opening /dev/fd/${FD} (a.k.a. /proc/${PID}/fd/${FD}) on Linux. I thought I would get a new file descriptor pointing to the same file description, as if I had called dup(fd). This is wrong!

This feature is mis-documented

The misunderstanding even appears in my older edition of “The Linux Programming Interface” (section 5.11), although it has been fixed in newer editions, as Michael Kerrisk points out in the comments below:

Opening one of the files in the /dev/fd directory is equivalent to duplicating the corresponding file descriptor. Thus, the following statements are equivalent:

fd = open("/dev/fd/1", O_WRONLY);
fd = dup(1);

This is a reasonably simple explanation which is close enough to reality for many practical use cases, and which is true on other Unixes, but it is not fully accurate on Linux. (The book is very comprehensive and useful nevertheless.)

This Red Hat bug from 2000 discusses how that behaviour was apparently changed in Linux 1.3.34. The aforementioned equivalence between the open(2) and dup(2) calls is called the “Plan9 semantics” there.

The man page gives usage examples, but does not go into a lot of detail on the exact semantics in the case of open(2).

/dev/fd/* behaves differently on other Unixes

On top of that, the behavior is implemented differently on other Unixes.

From a FreeBSD 14 box:

$ ./dup -dup > out; cat out; echo
1d
$ ./dup -proc > out; cat out; echo
1d

On FreeBSD, the descriptor returned by open("/dev/fd/1", O_WRONLY) does share its file description with the original file descriptor, as if we had called dup(1).

Part 1: An experiment!

It turns out, opening /dev/fd/*, /proc/${PID}/fd/* or /proc/self/fd/* results in a separate file description (struct file) being allocated for you, but it refers to the same underlying file on disk.

You can try it out with the following program:

$ cat dup.c
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int usage(const char *name) {
  // Print to stderr so the message is not swallowed when stdout is redirected.
  fprintf(stderr, "Usage: %s [-dup|-proc]\n", name);
  return 1;
}

int main(int argc, char *argv[]) {
  int fd;

  if (argc != 2) {
    return usage(argv[0]);
  }

  if (!strcmp(argv[1], "-dup")) {
    fd = dup(1);  // stdout
    if (fd < 0) {
      err(1, "dup");
    }
  } else if (!strcmp(argv[1], "-proc")) {
    fd = open("/dev/fd/1", O_WRONLY);
    if (fd < 0) {
      err(1, "open /dev/fd/1");
    }
  } else {
    return usage(argv[0]);
  }

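  // First byte: goes through the original file description of stdout.
  write(1, "1", 1);

  // Second byte: goes through fd, which came from dup(2) or from open(2).
  write(fd, "d", 1);
  close(fd);
  return 0;
}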

When we build and run this program, we can see that the behavior of dup(2) and open(2) is actually different!

Duplicating the file descriptor using dup(2)

$ make dup
cc -Wall -static    dup.c   -o dup
$ ./dup -dup > out; cat out; echo
1d
$

In the dup(2) case, the struct file is actually shared – both file descriptors refer to the exact same file description. The first write(2) updates the file description’s file position (f_pos). The second write(2) uses the exact same file description, so it sees the updated file position, and the byte gets written after the one that was written before.

[Figure: in the 'dup' process, dup(2) makes a second file descriptor point to the same file description and its f_pos]

Duplicating the file descriptor through /proc

$ ./dup -proc > out; cat out; echo
d
$

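In the -proc case, which goes through open(2), only one byte ends up in the output file. Here, two struct files exist, and each has its own file position f_pos, both initially set to 0. The first write(2) puts “1” at offset 0 through the original file description, and the second write(2) puts “d” at offset 0 through the new one, overwriting the “1”.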

That’s why we can only see “d” in the output.
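The two independent positions can also be observed directly with lseek(2), without looking at the file contents. The following is a minimal sketch under my own assumptions: the function name and the scratch file path are hypothetical, and it expects /dev/fd to be available. It opens /dev/fd/<fd> for a scratch file, writes one byte through the original descriptor, and reports both positions.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Writes one byte to a scratch file and reports the file position seen
 * through the original descriptor and through a descriptor obtained by
 * opening /dev/fd/<fd>. Returns 0 on success, -1 on any error. */
int compare_offsets(off_t *orig_pos, off_t *reopened_pos) {
  char tmpl[] = "/tmp/reopen-demo-XXXXXX";  /* hypothetical scratch file */
  int fd = mkstemp(tmpl);
  if (fd < 0) return -1;

  char path[64];
  snprintf(path, sizeof(path), "/dev/fd/%d", fd);
  int reopened = open(path, O_WRONLY);

  int rc = -1;
  if (reopened >= 0 && write(fd, "1", 1) == 1) {
    *orig_pos = lseek(fd, 0, SEEK_CUR);
    *reopened_pos = lseek(reopened, 0, SEEK_CUR);
    rc = 0;
  }

  if (reopened >= 0) close(reopened);
  close(fd);
  unlink(tmpl);
  return rc;
}
```

On Linux, I would expect the original position to be 1 and the reopened position to still be 0. On a system where the open behaves like dup(2), both would be 1.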

[Figure: in the 'dup' process, open(2) creates a second file description with its own f_pos]

Other file types

So far, this was a bit confusing. It’s definitely inconsistent with the theory that opening /dev/fd/* does the same as dup(2). But what happens for file types other than regular files?

TCP Sockets: Cannot be reopened through /proc

You can try this out by redirecting stdout to a socket, using the obscure /dev/tcp extension in bash[1]:

$ nc -l 9999 &
[1] 4166
$ ./dup -proc >/dev/tcp/localhost/9999
dup: open /dev/fd/1: No such device or address
[1]+  Done                    nc -l 9999
$ 

The error here is ENXIO: No such device or address.
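The same error can be reproduced without bash or nc. This sketch swaps the TCP socket for one end of a UNIX domain socketpair(2), on the assumption that it is rejected in the same way, and returns the errno from the failed open(2). The function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Tries to reopen one end of a socket pair through /proc/self/fd.
 * Returns the errno of the failed open(2), 0 if the open unexpectedly
 * succeeded, or -1 if the socket pair could not be created. */
int errno_from_reopening_socket(void) {
  int sv[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) return -1;

  char path[64];
  snprintf(path, sizeof(path), "/proc/self/fd/%d", sv[0]);

  int result = 0;
  int fd = open(path, O_WRONLY);
  if (fd < 0) {
    result = errno;  /* save errno before close(2) can change it */
  } else {
    close(fd);
  }

  close(sv[0]);
  close(sv[1]);
  return result;
}
```

On Linux, the return value should compare equal to ENXIO.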

For sockets, the /proc/self/fd/* entry is a symlink to a name like socket:[16902].

lrwx------ 1 gnoack gnoack 64 Feb 17 23:12 1 -> 'socket:[16902]'
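The link target can be read from within a program using readlink(2). Below is a small sketch with hypothetical helper names, which again uses a socketpair(2) as a stand-in for the TCP socket. It checks that the target starts with "socket:[".

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Copies the target of /proc/self/fd/<fd> into buf as a NUL-terminated
 * string (readlink(2) itself does not terminate it). Returns 0 or -1. */
int fd_link_target(int fd, char *buf, size_t size) {
  if (size == 0) return -1;
  char path[64];
  snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
  ssize_t n = readlink(path, buf, size - 1);
  if (n < 0) return -1;
  buf[n] = '\0';
  return 0;
}

/* Creates a socket pair and returns 1 if the link target of one end
 * starts with "socket:[", 0 if it does not, or -1 on error. */
int socket_link_has_socket_prefix(void) {
  int sv[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) return -1;

  char target[128];
  int rc = fd_link_target(sv[0], target, sizeof(target));
  close(sv[0]);
  close(sv[1]);
  if (rc < 0) return -1;
  return strncmp(target, "socket:[", 8) == 0;
}
```

Parsing the link text is only meant to show what ls prints. To actually test whether a descriptor is a socket, fstat(2) with S_ISSOCK is the more robust choice.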

Pipes: Can be reopened through /proc

However, a pipe can be reopened through /dev/fd/1, for example like this:

$ ./dup -proc | cat ; echo
1d

…and this works even though the pipe’s symlink looks like this:

l-wx------ 1 gnoack gnoack 64 Feb 17 23:10 1 -> 'pipe:[15895]'
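The pipe case can be sketched in a single process as well. The hypothetical function below creates a pipe, reopens its write end through /proc/self/fd, writes a byte through the reopened descriptor, and reads it back from the read end.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Reopens the write end of a fresh pipe through /proc/self/fd and sends
 * one byte through the reopened descriptor. Returns the byte that
 * arrives at the read end, or -1 on any error. */
int byte_via_reopened_pipe(void) {
  int p[2];
  if (pipe(p) < 0) return -1;

  char path[64];
  snprintf(path, sizeof(path), "/proc/self/fd/%d", p[1]);

  int result = -1;
  /* The read end p[0] is open, so this open(2) does not block. */
  int reopened = open(path, O_WRONLY);
  if (reopened >= 0) {
    char c;
    if (write(reopened, "d", 1) == 1 && read(p[0], &c, 1) == 1) {
      result = (unsigned char)c;
    }
    close(reopened);
  }

  close(p[0]);
  close(p[1]);
  return result;
}
```

On Linux, the function should return 'd', which shows that the reopened descriptor feeds the same pipe.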

Part 2: What is really happening

First, let’s recall the in-kernel VFS structure:

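[Figure: in-kernel VFS structure. A process's fd refers to a file object (this is the file description). Its f_path is a path that holds mnt (a struct vfsmount) and a dentry. The dentry's d_inode refers to an inode object, which stands for the disk file, and the inode's i_sb refers to the superblock object.]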

The following things happen in a sequence:

Where did the no_open pointer come from?

For the TCP socket above, f->f_op->open is set to the no_open function, which unconditionally returns ENXIO. So that socket can’t be reopened through /proc.
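The dispatch can be pictured with a small user-space model. This is not kernel source: all model_ names are hypothetical, and the structs are cut down to the one function pointer that matters here. It is a sketch of my understanding that the open callback is invoked if one is set, and that a missing callback means the open proceeds.

```c
#include <errno.h>
#include <stddef.h>

struct model_file;

/* Cut-down stand-in for the kernel's struct file_operations. */
struct model_file_operations {
  int (*open)(struct model_file *f);
};

/* Cut-down stand-in for the kernel's struct file. */
struct model_file {
  const struct model_file_operations *f_op;
};

/* Models no_open: refuses every open attempt. */
static int model_no_open(struct model_file *f) {
  (void)f;
  return -ENXIO;  /* kernel convention: negative errno value */
}

/* Models an open callback that accepts the open, as pipes do. */
static int model_pipe_open(struct model_file *f) {
  (void)f;
  return 0;
}

static const struct model_file_operations model_socket_fops = {
  .open = model_no_open,
};
static const struct model_file_operations model_pipe_fops = {
  .open = model_pipe_open,
};

/* Models the step of open(2) that calls f->f_op->open, if present. */
int model_do_open(struct model_file *f) {
  if (f->f_op->open != NULL) return f->f_op->open(f);
  return 0;
}
```

With a model_file that points at model_socket_fops, model_do_open() returns -ENXIO. With model_pipe_fops, it returns 0.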

For sockets and pipes, the decision about which f_op->open is used for each file is made in inode.c:init_special_inode.

Summary


  1. These /dev/tcp/... files do not actually exist: bash treats these paths specially and really just calls the BSD socket API itself… but we can use it here to write directly into a socket.

Comments