Thread by @LucasWerkmeister@wikis.world

@LucasWerkmeister@wikis.world at 10/22/2023, 1:29:51 PM

A cool Bash feature that I think more people should know about: >() and <() syntax, aka “process substitution”.

These can be used as part of a command, to assemble more complicated pipelines. Each <() effectively replaces an input file argument, and each >() replaces an output file argument.

For example:

# diff the hexdump of two binary files
diff <(hexdump file1.bin) <(hexdump file2.bin)

# print a file, followed by its number of lines
cat file.txt | tee >(wc -l)
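Here is a self-contained sketch of both examples that can be pasted into a Bash script. The file names and contents are made up, and od stands in for hexdump (my assumption, in case hexdump is not installed). Capturing the tee output with $(...) is just a convenient way to make sure wc has finished before the result is used.

```bash
#!/usr/bin/env bash
# Sample files with made-up names and contents, in a throwaway directory.
workdir=$(mktemp -d)
printf 'abc\n' > "$workdir/file1.bin"
printf 'abd\n' > "$workdir/file2.bin"

# od stands in for hexdump (an assumption, see above).
# diff exits with status 1 here because the two dumps differ.
dump_diff=$(diff <(od -An -tx1 "$workdir/file1.bin") <(od -An -tx1 "$workdir/file2.bin"))
diff_status=$?

# tee copies its input to stdout and to wc; wc prints the line count at the end.
# The $(...) capture waits for wc, because wc keeps the capture pipe open.
tee_output=$(tee >(wc -l) < "$workdir/file1.bin")

rm -r "$workdir"
printf '%s\n' "$dump_diff" "$tee_output"
```

Without the $(...) capture, the line count from wc can show up slightly after the outer command has already returned, because the inner command runs asynchronously.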

One way to think about this is as an extended version of /dev/stdin and /dev/stdout. In fact this syntax can be used to replace those:

in-command... | some-program -i /dev/stdin -o /dev/stdout | out-command...
# is equivalent to
some-program -i <(in-command...) -o >(out-command...)

But you’re no longer limited to just one input and one output (or two outputs if you count /dev/stderr). You can use this as often as you need. (I used two inputs in the diff example above.)
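To make the /dev/stdin and /dev/stdout comparison concrete, here is a minimal sketch. some_program is a hypothetical shell function that I made up as a stand-in for a program with -i and -o options; it just copies its input file to its output file.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for "some-program -i IN -o OUT": copies IN to OUT unchanged.
some_program() {
    cat "$2" > "$4"
}

# Classic form: one input via stdin, one output via stdout.
via_pipeline=$(printf 'hello\n' | some_program -i /dev/stdin -o /dev/stdout | tr a-z A-Z)

# The same pipeline written with process substitution.
# The $(...) capture only finishes once tr has closed its stdout.
via_procsub=$(some_program -i <(printf 'hello\n') -o >(tr a-z A-Z))

printf '%s\n%s\n' "$via_pipeline" "$via_procsub"
```

Both variables should end up holding the same upper-cased text.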

In terms of what you get at the end of the day, it’s a bit as if you used a bunch of temporary files:

outer-cmd <(in-cmd A) <(in-cmd B) >(out-cmd A) >(out-cmd B)
# is kind of like
in-cmd A > file1
in-cmd B > file2
outer-cmd file1 file2 file3 file4
out-cmd A < file3
out-cmd B < file4

Except that it all happens at the same time, using pipes instead of regular files. Nothing ever ends up on disk.

In fact, you could emulate a much closer version using named pipes:

mkfifo fifo1 fifo2 fifo3 fifo4
in-cmd A > fifo1 &
in-cmd B > fifo2 &
out-cmd A < fifo3 &
out-cmd B < fifo4 &
outer-cmd fifo1 fifo2 fifo3 fifo4 &
wait
rm fifo1 fifo2 fifo3 fifo4

Except, again, what Bash does under the hood doesn’t end up on disk – the pipes are anonymous.
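A runnable sketch of that comparison, reduced to two inputs and using paste as the outer command (my choice for illustration, with made-up sample data): once with explicit named pipes in a temporary directory, once with process substitution.

```bash
#!/usr/bin/env bash
# Variant 1: explicit named pipes in a throwaway directory.
workdir=$(mktemp -d)
mkfifo "$workdir/fifo1" "$workdir/fifo2"
printf 'a\nb\n' > "$workdir/fifo1" &   # writer for the first input
printf '1\n2\n' > "$workdir/fifo2" &   # writer for the second input
with_fifos=$(paste -d, "$workdir/fifo1" "$workdir/fifo2")
wait                                   # reap the two background writers
rm -r "$workdir"

# Variant 2: Bash sets up the pipes itself.
with_procsub=$(paste -d, <(printf 'a\nb\n') <(printf '1\n2\n'))

printf '%s\n' "$with_fifos" "$with_procsub"
```

The two variants should produce identical output; the second one needs no setup or cleanup.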

You can quite simply peek under the hood by using this syntax with a command that *doesn’t* treat its arguments as files.

$ echo <(pwd) >(cat)
/dev/fd/63 /dev/fd/62

echo was called with two /dev/fd paths. (On Linux, /dev/fd is a symlink to /proc/self/fd; I gather the /dev/fd name is more compatible with other unixes.) The numbers refer to file descriptors that Bash set up before running the command.

If echo opened the path /dev/fd/63 for reading, it would be connected to the pwd command and would read pwd’s output from that file descriptor. Similarly, if echo opened /dev/fd/62 for writing, anything it wrote there would end up in cat’s standard input. But echo doesn’t treat its arguments as file paths, so none of this happens.
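For contrast, here is a small sketch of a command that does both: it prints the argument it received, then opens it as a file. show_and_read is a hypothetical helper name I made up for this illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the argument as text, then read it as a file.
show_and_read() {
    printf 'argument: %s\n' "$1"
    cat "$1"
}

output=$(show_and_read <(printf 'hello from the inner command\n'))
printf '%s\n' "$output"
```

The first line of output shows the path Bash substituted, and the second line is what the inner command wrote to it.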

In practice, I use <() a lot more than >(), and often with diff as I showed above. Other uses look like this:

# diff pretty-printed JSON
diff <(jq . file1.json) <(jq . file2.json)
# or even
diff <(curl URL1 | jq .) <(curl URL2 | jq .)

# check that a command produces the expected output
diff expected.txt <(some command)
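The last pattern works well in scripts, because diff's exit status says whether the outputs matched. A minimal sketch, where some_command and the contents of expected.txt are made up for illustration:

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)
printf 'one\ntwo\n' > "$workdir/expected.txt"

# Hypothetical command under test.
some_command() {
    printf 'one\ntwo\n'
}

# diff exits 0 when its inputs are identical and 1 when they differ.
if diff "$workdir/expected.txt" <(some_command) > /dev/null; then
    check_result=match
else
    check_result=mismatch
fi

# A deliberately wrong output, to show the failing case.
if diff "$workdir/expected.txt" <(printf 'one\nthree\n') > /dev/null; then
    wrong_result=match
else
    wrong_result=mismatch
fi

rm -r "$workdir"
printf '%s %s\n' "$check_result" "$wrong_result"
```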

My Bash history also has this one, which prints the lines common to two files:

comm -12 <(sort file1) <(sort file2)

(comm only works on sorted input, so we have to sort the files separately. The <() syntax lets us do it on the fly instead of using temporary files.)
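A self-contained version of the comm example, with two unsorted sample files whose names and contents I made up:

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)
printf 'pear\napple\ncherry\n'   > "$workdir/file1"
printf 'cherry\nbanana\napple\n' > "$workdir/file2"

# -12 suppresses the lines unique to either file, leaving only the common ones.
common=$(comm -12 <(sort "$workdir/file1") <(sort "$workdir/file2"))

rm -r "$workdir"
printf '%s\n' "$common"
```

With this sample data, the common lines are apple and cherry.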

So that’s my rambling thread about this syntax which I find quite hard to explain but very useful ^^

Feel free to share your ideas or suggestions for what else you could use this feature for :)

According to https://www.gnu.org/software/coreutils/manual/html_node/tee-invocation.html, zsh and ksh also support this syntax, but I know nothing about those shells.

The second example at the top is technically a UUOC (useless use of cat; tee >(wc -l) < file.txt would do the same) but I didn’t want to make it even more confusing.

The Bash manpage suggests that if /dev/fd is not available, named pipes (FIFOs) can be used instead, so in that case something might end up on disk after all. (I don’t know for sure, though.)
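One rough way to see which mechanism a given system uses is to look at the path Bash substitutes. This is only a sketch: I am assuming that a path outside /dev/fd means the named-pipe fallback, which I have not verified on such a system.

```bash
#!/usr/bin/env bash
# echo just prints the path that Bash substituted for <(:).
procsub_path=$(echo <(:))

case $procsub_path in
    /dev/fd/*) mechanism='/dev/fd' ;;
    *)         mechanism='not /dev/fd (possibly a named pipe)' ;;
esac

printf '%s -> %s\n' "$procsub_path" "$mechanism"
```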