/*
 * watershed - an auxiliary verb for optimising away
 *             unnecessary runs of idempotent commands
 *
 * watershed is Copyright 2007 Canonical Ltd
 * written by Ian Jackson <ian@davenant.greenend.org.uk>
 * and this version now maintained as part of chiark-utils
 *
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 3 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public
 * License along with this file; if not, consult the Free Software
 * Foundation's website at www.fsf.org, or the GNU Project website at
 * www.gnu.org.
 *
 */
/*
 *  NB a different fork of this code exists in Ubuntu's udev.
 */
/*
 *
 * usage: watershed [<options>] <command> [<arg>...]
 *
 * options:
 *   -d|--state-dir <state-dir>
 *        default is /var/run/watershed for uid 0
 *                   $HOME/.watershed for others
 *   -i|--command-id <command-id>
 *
 * files used:
 *    <state-dir>/<command-id>.lock            lockfile
 *    <state-dir>/<command-id>.cohort          cohort
 *
 * default <command-id> is
 *    hex(sha256(argv[0]+'\0' + argv[1]+'\0' ... argv[argc-1]+'\0')
 *    '='
 *    mangled argv[0] (all chars [^-+_0-9A-Za-z] replaced with ?
 *                     and max 32 chars)
 *
 * exit status:
 *  127      - something went wrong, or process died with some other signal
 *  SIGPIPE  - process died with SIGPIPE
 *  x        - process called _exit(x)
 *
 * stdin/stdout/stderr:
 *
 *  If watershed exits 127 due to some unexpected problem, a message
 *  is printed to stderr explaining why (obviously).
 *
 *  If a watershed invocation ends up running the process, the process
 *  simply inherits stdin/out/err.  Otherwise stdin/stdout are not used.
 *
 *  If the process run for us by another invocation of watershed exits
 *  zero, or watershed die with the same signal as the process
 *  (currently just SIGPIPE), nothing is printed to stderr.  Otherwise
 *  (ie, failure of the actual process, in another invocation),
 *  watershed prints a description of the wait status to stderr, much
 *  as the shell might.
 *
 */
/*
 * gcc -Wall -Wwrite-strings -Wmissing-prototypes watershed.c -o watershed /usr/lib/libnettle.a
 */
/*
 *
 * Theory:
 *
 *  We consider only invocations with a specific command id (and state
 *  directory), since other invocations are completely independent by
 *  virtue of having different state file pathnames and thus different
 *  state files.  Normally, a command id corresponds to invocations
 *  with a particular set of command line arguments and a state
 *  directory corresponds to a particular euid; environment variable
 *  settings and other inherited process properties are disregarded.
 *
 *  A `cohort' is a set of invocations which can be coalesced into one
 *  run of the command.  For each cohort there is a file, the cohort
 *  file (which may not yet exist, may exist and have a name, or may
 *  be unliked).
 *
 *  An `invocation' is an invocation of the `watershed' program.  A
 *  `process' is an invocation of the requested command.
 *
 *  There is always one current cohort, in one of the following
 *  two states:
 *
 *   * Empty
 *     No invocations are in this cohort yet.
 *     The cohort filename is ENOENT.
 *     This is the initial state for a cohort, and the legal next
 *     state is Accumulating.
 *
 *   * Accumulating
 *     The process for this run has not yet started, so that new
 *     invocations arriving would be satisfied if this cohort were to
 *     run.
 *     The cohort filename refers to this cohort's file.
 *     The legal next state for the cohort is Ready.
 *
 *  Additionally, there may be older cohorts in the following states:
 *
 *   * Ready
 *     The command for this cohort has not yet been run.
 *     The cohort file has no name and is empty.
 *     Only one cohort, the lockholder's, may be in this state.
 *     The next legal states are Running, or exceptionally Forgotten
 *     (if the lockholder crashes and is the only invocation in the
 *     cohort).
 *
 *   * Running
 *     The lockholder is running the command for this cohort.
 *     This state is identical to Ready from the point of view
 *     of all invocations except the lockholder.
 *     The legal next states are Done (the usual case), or (if the
 *     lockholder crashes) Ready or Forgotten.
 *
 *   * Done
 *     The main process for this run has finished.
 *     The cohort file has no name and contains sizeof(int)
 *     bytes, the `status' value from waitpid.
 *     The legal next state is Forgotten.
 *
 *   * Forgotten
 *     All invocations have finished and the cohort file no longer
 *     exists.  This is the final state.
 *
 *  Only the lockholder may move a cohort between states, except that
 *  any invocation may make the current Empty cohort become
 *  Accumulating, and that the kernel will automatically move a cohort
 *  from Running to Ready or from Done to Forgotten, when appropriate.
 *
 * 
 * Algorithm:
 *
 *   1. Open the cohort file (O_CREAT|O_RDWR)   so our cohort is
 *                                                 Accumulating/Ready/
 *                                                    Running/Done
 *                              
 *   2. Acquire lock (see below)                so lockholder's cohort is
 *				                   Accumulating/Ready/Done
 *   3. fstat the open cohort file
 *         If it is nonempty:                      Done
 *          Read status from it and exit.
 *         Otherwise, if nonzero link count:       Accumulating
 *          Unlink the cohort filename
 *         Otherwise:                              Ready
 *
 *   4. Fork and run the command                   Running
 *       and wait for it
 *
 *   5. Write the wait status to the cohort file   Done
 *
 *                      
 *   6. Release the lock                        so we are no longer lockholder
 *                                              but our cohort is still
 *                                                 Done
 *
 *   8. Exit                                       Done/Forgotten
 *
 *  If an invocation crashes (ie, if watershed itself fails, rather
 *  than if the command does) then that invocation's caller will be
 *  informed of the error.
 *
 *  If the lockholder crashes with the cohort in:
 *
 *     Accumulating:
 *       The cohort remains in Accumulating and another invocation can
 *       become the lockholder.  If there are never any other
 *       invocations then the lockfile and cohort file will not be
 *       cleaned up (see below).
 *
 *     Running/Ready:
 *       The cohort goes from Running back to Ready (see above) and
 *       another invocation in the same cohort will become the
 *       lockholder and run it.  If there is no other invocation in
 *       the cohort the cohort goes to Forgotten although the lockfile
 *       will not be cleaned up - see below.
 *
 *     Done:
 *       If there are no more invocations, the cohort is Forgotten but
 *       the lockfile is not cleaned up.
 *
 * Lockfile:
 *
 *  There is one lock for all cohorts.  The lockholder is the
 *  invocation which holds the fcntl lock on the file whose name is
 *  the lockfile.  The lockholder (and no-one else) may unlink the
 *  lockfile.
 *
 *  To acquire the lock:
 *
 *   1. Open the lockfile (O_CREAT|O_RDWR)
 *   2. Acquire fcntl lock (F_SETLKW)
 *   3. fstat the open lockfile and stat the lockfile filenmae
 *      If inode numbers disagree, close lockfile and start
 *      again from the beginning.
 *
 *  To release the lock, unlink the lockfile and then either close it
 *  or exit.  Crashing will also release the lock but leave the
 *  lockfile lying around (which is slightly untidy but not
 *  incorrect); if this is a problem a cleanup task could periodically
 *  acquire and release the lock for each lockfile found:
 *
 * Cleanup:
 *
 *  As described above and below, stale cohort files and lockfiles can
 *  result from invocations which crashed if the same command is never
 *  run again.  Such cohorts are always in Empty or Accumulating.
 *
 *  If it became necessary to clean up stale cohort files and
 *  lockfiles resulting from crashes, the following algorithm should
 *  be executed for each lockfile found, as a cleanup task:
 *
 *   1. Acquire the lock.
 *      This makes us the lockholder.           and the current cohort is in
 *                                                 Empty/Accumulating
 *
 *                                              so now that cohort is
 *   2. Unlink the cohort file, ignoring ENOENT.   Ready/Forgotten
 *   3. Release the lock.                          Ready/Forgotten
 *   4. Exit.                                      Ready/Forgotten
 *
 *  This consists only of legal transitions, so if current cohort
 *  wasn't stale, it will have been moved to Ready and some other
 *  invocation in this cohort will become the lockholder and as normal
 *  from step 4 of the main algorithm.  If the cohort was stale it
 *  will go to Forgotten straight away.
 *
 *  A suitable cleanup script, on a system with with-lock-ex, is: */
 //     #!/bin/sh
 //     set -e
 //     if [ $# != 1 ]; echo >&2 'usage: cleanup <statedir>'; exit 1; fi
 //     cd "$1"
 //     for f in ./*.lock; do
 //       with-lock-ex -w rm -f "${f%.lock}.cohort"
 //     done
/*
 */
