3  Process notation

Scsh has a notation for controlling Unix processes that takes the form of s-expressions; this notation can then be embedded inside of standard Scheme code. The basic elements of this notation are process forms, extended process forms, and redirections.

3.1  Extended process forms and i/o redirections

An extended process form is a specification of a Unix process to run, in a particular I/O environment:

epf ::= (pf redir1 ... redirn)
where pf is a process form and redir1 ... redirn are redirection specs. A redirection spec is one of:
(< [fdes] file-name)     Open file for read.
(> [fdes] file-name)     Open file create/truncate.
(<< [fdes] object)       Use object's printed rep.
(>> [fdes] file-name)    Open file for append.
(= fdes fdes/port)       Dup2
(- fdes/port)            Close fdes/port.
stdports                 0,1,2 dup'd from standard ports.
The fdes file descriptors have these defaults:
redirection:    <    <<    >    >>
default fdes:   0    0     1    1

The subforms of a redirection are implicitly backquoted, and symbols stand for their print-names. So (> ,x) means "output to the file named by Scheme variable x," and (< /usr/shivers/.login) means "read from /usr/shivers/.login." This implicit backquoting is an important feature of the process notation, as we'll see later (sections 5 and 10.6).

Here are two more examples of i/o redirection:


(< ,(vector-ref fv i)) 
(>> 2 /tmp/buf)
These two redirections cause the file fv[i] to be opened on stdin, and /tmp/buf to be opened for append writes on stderr.

The redirection (<< object) causes input to come from the printed representation of object. For example,

(<< "The quick brown fox jumped over the lazy dog.")
causes reads from stdin to produce the characters of the above string. The object is converted to its printed representation using the display procedure, so
(<< (A five element list))
is the same as
(<< "(A five element list)")
is the same as
(<< ,(reverse '(list element five A))).
(Here we use the implicit backquoting feature to compute the list to be printed.)

The redirection (= fdes fdes/port) causes fdes/port to be dup'd into file descriptor fdes. For example, the redirection

(= 2 1)
causes stderr to be the same as stdout. fdes/port can also be a port, for example:
(= 2 ,(current-output-port))
causes stderr to be dup'd from the current output port. In this case, it is an error if the port is not a file port (e.g., a string port). {Note No port sync}

More complex redirections can be accomplished using the begin process form, discussed below, which gives the programmer full control of i/o redirection from Scheme.
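
As a small illustration of these redirections in use, the sketch below feeds a Scheme string to tr(1) with <<, and separately sends both stdout and stderr of ls(1) to one log file. It assumes tr and ls are on the path; the name shouted and the file /tmp/ls.log are hypothetical choices made for the example. The run and run/string forms are described in sections 3.3 and 3.5.

```scheme
;; Sketch only: assumes tr(1) and ls(1) are installed and on the path.

;; The string is supplied on tr's stdin (fdes 0, the default for <<),
;; and run/string collects tr's stdout into a Scheme string.
(define shouted
  (run/string (tr a-z A-Z) (<< "hello")))

;; fdes 1 is opened on the (hypothetical) file /tmp/ls.log,
;; then fdes 2 is dup'd from fdes 1, so errors land in the same file.
(run (ls -l) (> /tmp/ls.log) (= 2 1))
```

Under these assumptions, shouted should hold the upper-cased text, since the input string carries no trailing newline.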

3.2  Process forms

A process form specifies a computation to perform as an independent Unix process. It can be one of the following:


(begin . scheme-code)            ; Run scheme-code in a fork.
(| pf1 ... pfn)                  ; Simple pipeline
(|+ connect-list pf1 ... pfn)    ; Complex pipeline
(epf . epf)                      ; An extended process form.
(prog arg1 ... argn)             ; Default: exec the program.
The default case (prog arg1 ... argn) is also implicitly backquoted. That is, it is equivalent to:
(begin (apply exec-path `(prog arg1 ... argn)))
Exec-path is the version of the exec() system call that uses scsh's path list to search for an executable. The program and the arguments must be either strings, symbols, or integers. Symbols and integers are coerced to strings. A symbol's print-name is used. Integers are converted to strings in base 10. Using symbols instead of strings is convenient, since it suppresses the clutter of the surrounding "..." quotation marks. To aid this purpose, scsh reads symbols in a case-sensitive manner, so that you can say
(more Readme)
and get the right file. (See section 7 for further details on lexical issues.)

A connect-list is a specification of how two processes are to be wired together by pipes. It has the form ((from1 from2 ... to) ...) and is implicitly backquoted. For example,

(|+ ((1 2 0) (3 3)) pf1 pf2)
runs pf1 and pf2. The first clause (1 2 0) causes pf1's stdout (1) and stderr (2) to be connected via pipe to pf2's stdin (0). The second clause (3 3) causes pf1's file descriptor 3 to be connected to pf2's file descriptor 3.

3.3  Using extended process forms in Scheme

Process forms and extended process forms are not Scheme. They are a different notation for expressing computation that, like Scheme, is based upon s-expressions. Extended process forms are used in Scheme programs by embedding them inside special Scheme forms. There are three basic Scheme forms that use extended process forms: exec-epf, &, and run:


(exec-epf . epf)    ; Nuke the current process.
(& . epf)           ; Run epf in background; return pid.
(run . epf)         ; Run epf; wait for termination.
                    ;    Returns exit status.
These special forms are macros that expand into the equivalent series of system calls. The definition of the exec-epf macro is non-trivial, as it produces the code to handle i/o redirections and set up pipelines. However, the definitions of the & and run macros are very simple:
(& . epf)    =>  (fork (lambda () (exec-epf . epf)))
(run . epf)  =>  (wait (& . epf))
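
To make the two expansions concrete, here is a sketch that writes out (run (true)) by hand, following the definitions above. It assumes true(1) is on the path; the variable name status is a hypothetical choice for the example.

```scheme
;; Sketch only: assumes true(1) is installed and on the path.
;; Hand expansion of (run (true)):
;;   (run (true))  =>  (wait (& (true)))
;;   (& (true))    =>  (fork (lambda () (exec-epf (true))))
(define status
  (wait (fork (lambda () (exec-epf (true))))))
```

If the definitions above hold, status should be the same value that (run (true)) returns.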

Figures 2 and 3 show a series of examples employing a mix of the process notation and the syscall library. Note that regular Scheme is used to provide the control structure, variables, and other linguistic machinery needed by the script fragments.



;; If the resource file exists, load it into X.
(if (file-exists? f)
    (run (xrdb -merge ,f)))

;; Decrypt my mailbox; key is "xyzzy".
(run (crypt xyzzy) (< mbox.crypt) (> mbox))

;; Dump the output from ls, fortune, and from into log.txt.
(run (begin (run (ls))
            (run (fortune))
            (run (from)))
     (> log.txt))

;; Compile FILE with FLAGS.
(run (cc ,file ,@flags))

;; Delete every file in DIR containing the string "/bin/perl":
(with-cwd dir
  (for-each (lambda (file)
              (if (zero? (run (grep -s /bin/perl ,file)))
                  (delete-file file)))
            (directory-files)))
Figure 2:  Example shell script fragments (a)




;; M4 preprocess each file in the current directory, then pipe
;; the input into cc. Errlog is foo.err, binary is foo.exe.
;; Run compiles in parallel.
(for-each (lambda (file)
            (let ((outfile (replace-extension file ".exe"))
                  (errfile (replace-extension file ".err")))
              (& (| (m4) (cc -o ,outfile))
                 (< ,file)
                 (> 2 ,errfile))))
          (directory-files))

;; Same as above, but parallelise even the computation
;; of the filenames.
(for-each (lambda (file)
            (& (begin (let ((outfile (replace-extension file ".exe"))
                            (errfile (replace-extension file ".err")))
                        (exec-epf (| (m4) (cc -o ,outfile))
                                  (< ,file)
                                  (> 2 ,errfile))))))
          (directory-files))

;; DES encrypt string PLAINTEXT with password KEY. My DES program
;; reads the input from fdes 0, and the key from fdes 3. We want to
;; collect the ciphertext into a string and return that, with error
;; messages going to our stderr. Notice we are redirecting Scheme data
;; structures (the strings PLAINTEXT and KEY) from our program into
;; the DES process, instead of redirecting from files. RUN/STRING is
;; like the RUN form, but it collects the output into a string and 
;; returns it (see following section).

(run/string (/usr/shivers/bin/des -e -3)
            (<< ,plaintext) (<< 3 ,key))

;; Delete the files matching regular expression PAT.
;; Note we aren't actually using any of the process machinery here --
;; just pure Scheme.
(define (dsw pat)
  (for-each (lambda (file)
              (if (y-or-n? (string-append "Delete " file))
                  (delete-file file)))
            (file-match #f pat)))
Figure 3:  Example shell script fragments (b)


3.4  Procedures and special forms

It is a general design principle in scsh that all functionality made available through special syntax is also available in a straightforward procedural form. So there are procedural equivalents for all of the process notation. In this way, the programmer is not restricted by the particular details of the syntax. Here are some of the syntax/procedure equivalents:

Notation       Procedure
|              fork/pipe
|+             fork/pipe+
exec-epf       exec-path
redirection    open, dup
&              fork
run            wait + fork
Having a solid procedural foundation also allows for general notational experimentation using Scheme's macros. For example, the programmer can build his own pipeline notation on top of the fork and fork/pipe procedures.

(fork [thunk])         (procedure)
Fork spawns a Unix subprocess. Its exact behavior depends on whether it is called with the optional thunk argument.

With the thunk argument, fork spawns off a subprocess that calls thunk, exiting when thunk returns. Fork returns the subprocess' pid to the parent process.

Without the thunk argument, fork behaves like the C fork() routine. It returns in both the parent and child process. In the parent, fork returns the child's pid; in the child, fork returns #f.
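
The thunk-less case can be sketched as follows. The procedure name spawn-echo is hypothetical, and the example assumes echo(1) is on the path; it only relies on fork returning #f in the child and the child's pid in the parent, as described above.

```scheme
;; Sketch only: spawn-echo is a hypothetical helper, not part of scsh.
;; Assumes echo(1) is installed and on the path.
(define (spawn-echo msg)
  (let ((pid (fork)))                ; returns in both parent and child
    (if pid
        (wait pid)                   ; parent: pid is the child's pid
        (exec-path "echo" msg))))    ; child: fork returned #f
```

Calling (spawn-echo "hello from the child") would then run echo in a subprocess and return its exit status to the caller.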

(fork/pipe [thunk])         (procedure)
Like fork, but the parent and child communicate via a pipe connecting the parent's stdin to the child's stdout. This function side-effects the parent by changing its stdin.

In effect, fork/pipe splices a process into the data stream immediately upstream of the current process. This is the basic function for creating pipelines. Long pipelines are built by performing a sequence of fork/pipe calls. For example, to create a background two-process pipe a | b, we write:


(fork (lambda () (fork/pipe a) (b)))
which returns the pid of b's process.

To create a background three-process pipe a | b | c, we write:


(fork (lambda () (fork/pipe a)
            (fork/pipe b)
            (c)))
which returns the pid of c's process.
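
With concrete programs substituted for a and b, the two-process case might look like the sketch below, which runs the equivalent of ls | wc -l in the background. It assumes ls(1) and wc(1) are on the path; the names pid and wc-status are hypothetical.

```scheme
;; Sketch only: assumes ls(1) and wc(1) are installed and on the path.
(define pid
  (fork (lambda ()
          ;; Splice ls upstream: its stdout feeds this process's stdin.
          (fork/pipe (lambda () (exec-path "ls")))
          ;; This process then becomes wc, reading from the pipe.
          (exec-path "wc" "-l"))))

;; pid refers to the wc process, the last stage of the pipeline.
(define wc-status (wait pid))
```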

(fork/pipe+ conns [thunk])         (procedure)
Like fork/pipe, but the pipe connections between the child and parent are specified by the connection list conns. See the
(|+ conns pf1 ... pfn)
process form for a description of connection lists.

3.5  Interfacing process output to Scheme

There is a family of procedures and special forms that can be used to capture the output of processes as Scheme data. Here are the special forms for the simple variants:


(run/port    .  epf) ; Return port open on process's stdout.
(run/file    .  epf) ; Process > temp file; return file name.
(run/string  .  epf) ; Collect stdout into a string and return.
(run/strings .  epf) ; Stdout->list of newline-delimited strings.
(run/sexp    .  epf) ; Read one sexp from stdout with READ.
(run/sexps   .  epf) ; Read list of sexps from stdout with READ.

Run/port returns immediately after forking off the process; other forms wait for either the process to die (run/file), or eof on the communicating pipe (run/string, run/strings, run/sexps). These special forms just expand into calls to the following analogous procedures:
(run/port* thunk)         (procedure)
(run/file* thunk)         (procedure)
(run/string* thunk)         (procedure)
(run/strings* thunk)         (procedure)
(run/sexp* thunk)         (procedure)
(run/sexps* thunk)         (procedure)
For example, (run/port . epf) expands into
(run/port* (lambda () (exec-epf . epf))).

These procedures can be used to manipulate the output of Unix programs with Scheme code. For example, the output of the xhost(1) program can be manipulated with the following code:


;;; Before asking host REMOTE to do X stuff, 
;;; make sure it has permission.
(while (not (member remote (run/strings (xhost))))
  (display "Pausing for xhost...")
  (read-char))

The following procedures are also of utility for generally parsing input streams in scsh:

(port->string port)         (procedure)
(port->sexp-list port)         (procedure)
(port->string-list port)         (procedure)
(port->list reader port)         (procedure)
Port->string reads the port until eof, then returns the accumulated string. Port->sexp-list repeatedly reads data from the port until eof, then returns the accumulated list of items. Port->string-list repeatedly reads newline-terminated strings from the port until eof, then returns the accumulated list of strings. The delimiting newlines are not part of the returned strings. Port->list generalises these two procedures. It uses reader to repeatedly read objects from a port. It accumulates these objects into a list, which is returned upon eof. The port->string-list and port->sexp-list procedures are trivial to define, being merely port->list curried with the appropriate parsers:

(port->string-list port)  =  (port->list read-line port)
(port->sexp-list port)    =  (port->list read port)
The following compositions also hold:

run/string*   =  port->string      ∘ run/port*
run/strings*  =  port->string-list ∘ run/port*
run/sexp*     =  read              ∘ run/port*
run/sexps*    =  port->sexp-list   ∘ run/port*
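
The behaviour described for port->list can be sketched in portable Scheme as below. This is an illustrative reimplementation, not scsh's actual source: the names my-port->list and my-port->sexp-list are hypothetical, and the usage line assumes open-input-string (R7RS, SRFI 6) is available for building a test port.

```scheme
;; Sketch only: a hypothetical stand-in for port->list.
;; Calls READER on PORT until it yields an eof object, then returns
;; the items read, in the order they were read.
(define (my-port->list reader port)
  (let loop ((items '()))
    (let ((x (reader port)))
      (if (eof-object? x)
          (reverse items)
          (loop (cons x items))))))

;; The sexp variant is my-port->list with READ as the reader.
(define (my-port->sexp-list port)
  (my-port->list read port))

;; Usage, assuming open-input-string exists:
;; (my-port->sexp-list (open-input-string "1 (2 3) foo"))
;; returns (1 (2 3) foo)
```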