Proper Treatment 正當作法: Higher-order shell
2008-09-29 01:39

Many Unix utilities take a command line among their arguments and execute it. Well-known examples are nice, sudo, xargs, and find. These programs are sometimes called meta-commands, but I think of them as higher-order programs, by analogy to higher-order functions.

Surely I was not the first to write the following two higher-order programs, given how useful they are.

tmp :: String -> (FilePath -> IO ()) -> IO ()

The first program is tmp. It puts standard input in a temporary file, then invokes the arguments as a command with the name of the temporary file appended.

This program is useful because some programs require input in the form of an ordinary file rather than a pipe. For example, say that we want to retrieve a URL and display the contents using xdvi. It would be nice to be able to say

curl http://www.diku.dk/~andrzej/papers/RC.dvi | xdvi -

or

xdvi <(curl http://www.diku.dk/~andrzej/papers/RC.dvi)

but my xdvi only works on an ordinary file named on the command line:

$ curl http://www.diku.dk/~andrzej/papers/RC.dvi | xdvi -
xdvi.bin: Fatal error: -: No such file,
          and -.dvi doesn't exist either.

$ xdvi <(curl http://www.diku.dk/~andrzej/papers/RC.dvi)
xdvi.bin: Fatal error: /dev/fd/63: File has zero size,
          and /dev/fd/63.dvi doesn't exist either.

Instead, we can use tmp to adapt xdvi to our purpose.

$ curl --silent http://www.diku.dk/~andrzej/papers/RC.dvi |
  tmp xdvi

When the command is done, tmp deletes the temporary file.

$ echo foo | tmp cat
foo

$ echo foo | tmp echo | xargs cat
cat: /tmp/uyocf2fHQx: No such file or directory
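Here is a minimal sketch of how tmp might be written. It is not the author's implementation. It assumes a POSIX shell and a mktemp(1) that, like the GNU and BSD versions, creates a fresh file when called with no arguments. The function body runs in a subshell so that its variables stay private. The final lines let the same file serve as an executable script named tmp on PATH, which the nested sh -c examples later in this post require, because an inner sh cannot see a function defined in the outer shell.

```shell
#!/bin/sh
# Hypothetical sketch of tmp: copy standard input to a temporary file,
# run "$@" with that file's name appended, then delete the file.

tmp() (
  # The ( ) body runs in a subshell, so f and status stay local.
  f=$(mktemp) || exit 1
  cat > "$f"        # store all of standard input in the file
  "$@" "$f"         # run the command with the file name as last argument
  status=$?
  rm -f "$f"        # clean up once the command is done
                    # (a signal that kills tmp early leaves the file behind)
  exit "$status"    # report the command's own exit status
)

# When this file is installed as an executable named tmp, run it.
if [ "$#" -gt 0 ]; then
  tmp "$@"
fi
```

Because tmp reads standard input to the end before starting the command, the command itself sees an exhausted standard input. Any interaction it needs has to come from elsewhere, such as /dev/tty.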

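Often I use tmp when I need to process the same data twice. For example, printing double-sided with manual feed on a printer that has no duplex unit takes two invocations of psselect. The first prints the even pages in reverse order, and the second prints the odd pages once I have reloaded the printed sheets and pressed Enter. The read in between takes its input from /dev/tty because tmp has already used up standard input by the time the command runs.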

$ a2ps -o- Higher-order_shell.mdwn |
  tmp sh -c 'psselect -e -r $0 | lpr;
             read </dev/tty;
             psselect -o $0 | lpr'

The use of $0 above is documented in the sh man page:

-c

Read commands from the command_string operand. Set the value of special parameter 0 (see Special Parameters) from the value of the command_name operand and the positional parameters ($1, $2, and so on) in sequence from the remaining argument operands. No commands shall be read from the standard input.
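A quick check of that rule: tmp appends the temporary file's name as the only argument after the command string, so inside sh -c the name arrives as $0 rather than $1.

```shell
#!/bin/sh
# With sh -c, the first argument after the command string becomes $0,
# and the remaining ones become $1, $2, and so on.
sh -c 'echo "0=$0 1=$1 2=$2"' first second third
# prints: 0=first 1=second 2=third
```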

We can also use sh -c in the same way to nest invocations of tmp. For example, the following command uses lynx to convert two HTML files into plain text, then uses xxdiff to show how the results differ. We need to invoke tmp twice because xxdiff requires both of its arguments to name ordinary files rather than pipes.

$ lynx -dump Control-Monad-State-Lazy.html |
  tmp sh -c 'lynx -dump Control-Monad-State-Strict.html |
             tmp xxdiff $0'

Because xxdiff takes the argument - to mean standard input, the following command works too.

$ lynx -dump Control-Monad-State-Lazy.html |
  tmp sh -c 'lynx -dump Control-Monad-State-Strict.html |
             xxdiff $0 -'

keep :: (forall m. MonadIO m => m ()) -> IO a

The second program is keep. It invokes the arguments as a command and monitors the files the command accesses. Whenever any of the files change, keep invokes the command again. Most of the time, I use keep to invoke LaTeX when I don’t want to bother writing a Makefile:

$ keep pdflatex talk.tex

Internally, keep uses strace (another higher-order program) to find out which files are accessed by the command, and Linux’s inotify facility to watch the files for changes. The incron utility also uses inotify to run a command when files change, but keep discovers which files to watch automatically.
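The following is a rough sketch of the same idea. It is not the author's implementation. It assumes Linux with strace and the inotifywait program from the inotify-tools package installed. The helper name keep_files is made up for this sketch. Its parser recognizes only successful open and openat calls in strace's -o log format. The loop reruns the command whenever one of the files it opened is written, moved, or deleted.

```shell
#!/bin/sh
# Hypothetical sketch of keep: run "$@" under strace, collect the files
# it opened, wait until one of them changes, and repeat.
# Assumes Linux, strace(1), and inotifywait(1) from inotify-tools.

# keep_files LOG: print each regular file that LOG records as
# successfully opened, once. LOG is in strace's -o format, optionally
# with the PID prefix that -f adds. Calls that strace splits into
# "<unfinished ...>" and "<... resumed>" halves are missed.
keep_files() {
  sed -E -n \
    's/^([0-9]+ +)?open(at)?\((AT_FDCWD, )?"([^"]*)".* = [0-9]+$/\4/p' "$1" |
    sort -u |
    while IFS= read -r path; do
      if [ -f "$path" ]; then printf '%s\n' "$path"; fi
    done
}

keep() (
  log=$(mktemp) && list=$(mktemp) || exit 1
  while :; do
    # Rerun the command, logging file-related system calls to $log;
    # its exit status is ignored so that a failed run is retried too.
    strace -f -e trace=%file -o "$log" "$@"
    keep_files "$log" > "$list"
    # Block until a watched file is written, moved, or deleted;
    # stop if inotifywait fails (for example, nothing to watch).
    inotifywait -qq -e close_write,move_self,delete_self \
      --fromfile "$list" || break
  done
  rm -f "$log" "$list"
)

# When this file is installed as an executable named keep, run it.
if [ "$#" -gt 0 ]; then
  keep "$@"
fi
```

The watch is set up only between runs, so files that the command writes itself, such as talk.aux, do not trigger another run. An edit made while the command is still running can be missed until the next change.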

Errata

Thanks to bennymack on Reddit for pointing out that the file command can read standard input. I changed the example from file to xdvi.

I also fixed the out-of-scope type variable in the faux signature of keep.