The Common Layer

This layer collects various utility functions such as file I/O and low-level systems such as logging. The largest modules here deal with resource allocation (see the section called System Resource Management in Chapter 8). These are the Open File Manager and the Temporary File Manager. I'll describe the modules in alphabetical order.

The Abort Module

This module implements an Abort type that signals that connection processing should be aborted. This is normally triggered by a time-out. An abort can also be forced before the time-out happens, which is useful if a connection is found to be broken while trying to do I/O.

There are two ways the abort condition can be detected. It can be polled or a CML event can be obtained for use in a CML.select call. Here is the module API.

signature ABORT =
sig
    type Abort

    (*  The arg is the time-out in seconds. *)
    val create:     int -> Abort

    (*  This never times out. *)
    val never:      unit -> Abort

    (*  This returns an abort event for synchronising. *)
    val evt:        Abort -> unit CML.event

    (*  This tests if the event has occurred. *)
    val aborted:    Abort -> bool

    (*  This forces the abort to happen early even if it is the 'never'
        condition.
    *)
    val force:      Abort -> unit
end
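
As an illustration, here is a sketch of my own (not from the server) of how a connection thread can combine both detection styles. The io_evt argument stands for whatever I/O event the connection is waiting on.

fun serve io_evt =
let
    val abort = Abort.create 60         (* give up after 60 seconds *)

    fun step() =
        if Abort.aborted abort          (* polling style *)
        then ()
        else
            CML.select [                (* event style *)
                CML.wrap (Abort.evt abort, fn () => ()),
                CML.wrap (io_evt, fn _ => step())
                ]
in
    step()
end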

In an earlier design I implemented the time-out by starting a new thread for each connection to wait on a time-out event. The thread was used to set the flag that was tested by the aborted function. But since these timer threads lasted for as long as the time-out period, large numbers of them would hang around in the server under a heavy load of hundreds of connections per second. The performance of the CML run-time does not scale very well for large numbers of time-out events, as I explained in the section called Behaviour of Timeout Events in Chapter 7. After a few seconds of heavy load I found the server grinding to a halt.

The current implementation uses an I-variable from the synchronous variables module (see the section called Synchronous Variables in Chapter 6). The I-variable has all of the necessary properties for an abort condition. It can be set and tested like a flag, and an event is available to indicate when it is set. Here is the definition of the Abort type.

datatype Abort = Abort of unit SyncVar.ivar

To manage the timing I've included a manager thread that maintains a map from future points in time to lists of Abort values.

structure Map = IntRedBlackMap

datatype Request = 
        Add of int * Abort          (* (timeout, force) -> abort *)

datatype State = State of {
    time:       int,                (* seconds since startup *)
    live:       (Abort list) Map.map (* waiting to expire *)
    }


fun server ch () =
let
    val start = Time.now()

    fun toTime secs = Time.fromSeconds(LargeInt.fromInt secs)
    fun trunc  time = Int.fromLarge(Time.toSeconds time)


    fun loop (state as State {time, ...}) =
    let
        fun request (Add (delay, abort)) = add delay abort state


        (*  If the timing drifts off it won't hurt if this 
            event is for a time in the past. It will be immediately
            enabled.
        *)
        val time_evt = CML.atTimeEvt(Time.+(start, toTime(time+1)))

        val new_state = CML.select[
                CML.wrap(CML.recvEvt ch,
                    MyProfile.timeIt "abort request" request),

                CML.wrap(time_evt,
                    (*MyProfile.timeIt "abort expire"*) (expire state))
                ]
    in
        loop new_state
    end
... omitted material ...
in
    loop (State {time = 0, live = Map.empty})
end



structure Mgr = Singleton(
                        type input    = Request CML.chan
                        val  newInput = CML.channel
                        val  object   = server
                        )

The Add request inserts an abort into the manager's map.

and add delay abort (state as State {time, live}) =
let
    (*  Find out the end-time in seconds relative to
        the start time of the server, rounded to the
        nearest second.
    *)
    val now   = Time.now()
    val since = Time.-(now, start)
    val ends  = trunc(Time.+(
                    Time.+(since, toTime delay),
                    Time.fromMilliseconds 500
                    ))

    val _ = Log.testInform Globals.TestTimeout Log.Debug
            (fn()=>TF.L ["Abort add delay=", Int.toString delay,
                         " now= ", Time.fmt 6 now,
                         " ends=", Int.toString ends
                         ])

    (* The insert operation will either insert or replace. *)
    fun add_abort() =
    (
        case Map.find(live, ends) of
          NONE =>
            let
                val new_live = Map.insert(live, ends, [abort])
            in
                State {time=time, live=new_live}
            end

        | SOME ab_list =>
            let
                val new_live = Map.insert(live, ends, abort::ab_list)
            in
                State {time=time, live=new_live}
            end
    )
in
    add_abort()
end

The key for the map is the expiry time of the time-out, measured in seconds since the start of the manager. The add_abort function either creates a new entry or adds the Abort to an existing one. The maximum number of entries will be the size of the time-out from the server's configuration plus 1, so it should stay reasonably small. A 60 second time-out would be typical.

The manager counts through the seconds since startup. At each second the expire function scans the keys of the map to see if any lists have expired.

and expire (state as State {time, live}) () =
let
    (*  Find out what the time really is. *)
    val count = trunc(Time.-(Time.now(), start))

    fun check_entry (at_time, ab_list, new_live) =
    (
        if count >= at_time
        then
        (
            Log.testInform Globals.TestTimeout Log.Debug
                (fn()=>TF.L ["Abort expiring, count=",
                             Int.toString count,
                             " live size=",
                             Int.toString(Map.numItems live)
                             ]);

            (* Remove the entry and set all its aborts. *)
            app set_ab ab_list;
            new_live
        )
        else
            (*  Put the entry back into the map. *)
            Map.insert(new_live, at_time, ab_list)
    )


    and set_ab (Abort ivar) = (SyncVar.iPut(ivar, ()))
                                    handle _ => ()

    val new_live = Map.foldli check_entry Map.empty live
in
    State {time=count, live=new_live}
end

Since the red-black map is a pure functional value, the expire function has to build a new map at each scan. This won't be a burden since it only happens once a second and the number of entries is not large. During the rebuild the expired lists are simply omitted. The I-variables in the lists are set and then the Abort values are released. If a client connection still holds a reference to an expired Abort it can poll it or wait on its event. Other Aborts simply become garbage.

Here is the implementation of the API functions.

fun create delay =
let
    fun run() =
    let
        val abort = Abort (SyncVar.iVar())
    in
        CML.send(Mgr.get(), Add(delay, abort));
        abort
    end
in
    MyProfile.timeIt "abort create" run ()
end


fun evt     (Abort ivar) = SyncVar.iGetEvt ivar
fun aborted (Abort ivar) = isSome(SyncVar.iGetPoll ivar)

fun force   (Abort ivar) = (SyncVar.iPut(ivar, ()))
                            handle _ => ()

fun never() = Abort (SyncVar.iVar())

The force function just sets the I-variable directly. If it is already set then this is ignored.

The never value is useful for times when you know there won't be a time-out, for example during the startup of the server. Since it can be forced, every caller must get a distinct value.

The Common Module

This gathers miscellaneous small declarations that are useful throughout the server. The module is normally opened wherever it is used, to avoid qualifying its declarations with a Common. prefix. So I don't want too many declarations here, which would increase the risk of clashes with other identifiers in the server.

The following declarations are involved in aborting the server.

exception FatalX
exception InternalError of string

fun toErr s = (TextIO.output(TextIO.stdErr, s);
               TextIO.flushOut(TextIO.stdErr))

(*  These shutdown the server with the given status code. *)
fun success() = RunCML.shutdown OS.Process.success
fun fail()    = (toErr "Aborting\n"; RunCML.shutdown OS.Process.failure)

The FatalX exception aborts processing in the main thread when some fatal error has been detected. The InternalError exception is for filling in impossible cases in the code (fingers crossed). The success and fail functions shut down the server, returning an appropriate process status. In either case the Startup.finish code is run (see the section called The Startup Module).

This module has the declaration of the SrcPos type for describing the location of an error in a configuration file. There is also STRT, the common hash table over string keys.

The FileIO Module

This module contains utility functions that operate on disk files and directories. They are typically wrappers around the SML Posix functions that log errors for the server. The operations include removing and creating a file, getting file properties such as the size and modification time, and doing controlled reading of files and directories.

The last of these operations is non-trivial. The withTextIn function controls the reading of a text file using the Open File Manager.

fun withTextIn abort file default func =
let
in
    case TextIOReader.openIt abort file of
      NONE   => default               (* the open failed *)
    | SOME h =>
        ((func (TextIOReader.get h))  (* handle an I/O failure with closing *)
            handle x => (Log.logExn x; TextIOReader.closeIt h; default)
        ) before (TextIOReader.closeIt h)
end
handle x => (Log.logExnArg file x; default) 

The calling thread will be blocked until a file descriptor is available, and an abort of the connection due to a time-out is detected while it waits. Care is taken to ensure that neither an I/O error nor an abort leaves the file open. The caller supplies a function func that reads the file via a TextIO.instream. The result from this function is returned if all goes well; if there is an error then the default value is returned. The SML General.before function is useful for attaching a clean-up operation, like closing a file, to an expression.
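
For example, here is a sketch of my own (not from the server) that counts the lines in a file, assuming the Basis version of TextIO.inputLine that returns a string option. If the open fails or an I/O error occurs, the default of 0 is returned.

fun countLines abort file =
    FileIO.withTextIn abort file 0
        (fn strm =>
            let
                fun loop n =
                (
                    case TextIO.inputLine strm of
                      NONE   => n
                    | SOME _ => loop (n+1)
                )
            in
                loop 0
            end)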

The listDir function does a similar job but returns a list of the files in a directory, excluding the dot and dot-dot entries. An empty list is returned if there is an error.

fun listDir abort dir =
let
    fun loop strm rslt =
    (
        case OS.FileSys.readDir strm of
          "" => rslt
        | s  => loop strm (s::rslt)
    )
in
    case DirReader.openIt abort dir of
      NONE => []                (* the open failed *)

    | SOME h =>
        (loop (DirReader.get h) []) before (DirReader.closeIt h)
end
handle x => (Log.logExnArg dir x; [])

The exclCreate function for creating lock files is in this module. There is a description of it in the section called The Startup Module.

The Files Module

This module contains simple utilities that manipulate file names and test properties of files. For example there are the dirName, baseName, splitExt and appendFile functions to break up and build file paths. These are simple wrappers for the SML OS.Path functions. They don't need any further description.

The property testing functions are wrappers around the SML Posix.FileSys functions that do a similar job to the Unix stat and access system calls. Here are some of the functions. FS is an abbreviation of Posix.FileSys.

fun exists path = FS.access(path, [])

fun isDir  path = exists path andalso FS.ST.isDir(FS.stat path)
fun isReg  path = exists path andalso FS.ST.isReg(FS.stat path)
fun isSym  path = exists path andalso FS.ST.isLink(FS.stat path)

fun accessibleDir path =
(
    isDir path andalso FS.access(path, [FS.A_READ, FS.A_EXEC])
)
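
For example, a node handler might check a directory like this (a sketch of my own, not the server's actual code):

fun checkDocRoot path =
    if Files.accessibleDir path
    then ()
    else Log.log Log.Error
            (TextFrag.L ["Cannot access the directory ", path])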

The Log Module

This implements the Logging Manager. It writes time-stamped messages to either standard error or a log file. The messages can have different severity levels in the usual way. The severity threshold level can be set with the LogLevel server configuration parameter (see the section called The Server Parameters in Chapter 8).

The manager is needed to ensure that messages from different threads in the server are logged atomically, so that their fragments don't get interleaved. I also don't want server threads to be held up while logging messages; a thread should be able to send off a message and immediately continue with its work. The manager implements this by using a CML mailbox to receive messages. A mailbox has unlimited buffering so no send operation will ever block. This might introduce a denial-of-service risk if a client connection can be induced to generate large numbers of errors rapidly, but I think that the risk is minuscule.

Here is the basic logging API. Messages are composed from text fragments (see the section called The Text Module) so that they can be built efficiently.

type Level

val Fatal: Level
val Error: Level
val Warn:  Level
val Info:  Level
val Debug: Level

(*  This writes a logging message.
    Error messages are counted.
*)
val log:        Level -> TextFrag.Text -> unit

(*  This writes a logging message related to some source file position.
    Error messages are counted.
*)
val logP:       Level -> Common.SrcPos -> TextFrag.Text -> unit

The log level argument is tested and the message is discarded if the severity is below the threshold. Building a message that is usually discarded, such as an informational or debugging message, is wasteful. For these I have a slightly different API.

val inform:     Level -> (unit -> TextFrag.Text) -> unit
val testInform: int -> Level -> (unit -> TextFrag.Text) -> unit

Here the message is represented by a function that generates the text. The function won't be called if the message is not logged so the cost of generating the message is avoided. A typical use of this is the debugging message:

val _ = Log.inform Log.Debug (fn()=>TF.L 
            ["HTTP reading into file len=", Int.toString len])

The integer to string conversion and assembly of the message will not be done unless debugging messages are being logged.

The logging destination and level are controlled by these functions.

(*  This returns the count of error messages seen so far.
*)
val numErrors:  unit -> int

(*  This waits until the log messages have drained.
*)
val flush:      unit -> unit

(*  Set the file for error logging.  This is currently only
    allowed to be set once.
*)
val setLogFile: string -> unit

(*  Set the level for error logging. *)
val setLevel:   Level -> unit

(*  Set the level for error logging to be at least as low as the
    given level.
*)
val lowerLevel:   Level -> unit

All messages with severity of Error or greater are counted. The server startup code calls the Log.numErrors function to see if errors were reported while reading the configuration files. If so then it aborts the server with a fatal error.

The lowerLevel function is used while reading the server configuration files to lower the log level to Warn to ensure that warning messages can be seen. The flush function is needed when changing the logging destination or level.

Another important source of logging messages is exceptions. Here is the API for logging exceptions.

(*  Log any kind of exception.
    This is guaranteed not to raise any exception.
*)
val logExn:  exn -> unit

(*  Log with some extra information e.g. a file name. *)
val logExnArg:  string -> exn -> unit

The logExn function itself must not raise any exceptions, otherwise code such as the following, from the Listener module, will go wrong when the handler fails to complete.

handle x =>
    let
        (*  See also Connect.getPort *)
        val (_, port) = INetSock.fromAddr sock_addr
    in
        (
            Socket.close sock;
            TmpFile.releasePort port
        ) handle _ => ();               (* being paranoid *)
        Log.logExn x;
        CML.send(lchan, ConnDied)
    end

The manager is a singleton object (see the section called The Singleton Module) running a server thread for an internal protocol. Here is the top-level of this server.

datatype LogRequest =
        ReqLog of Level * TF.Text * Time.time
    |   ReqSetFile of string
    |   ReqNumErrors of int Sy.ivar
    |   ReqFlush of unit Sy.ivar        (* flush with acknowledge *)



fun log_server mbox () =
let
    (*  An imperative state will be OK in this small context. *)
    val num_errors: int ref = ref 0
    val log_strm  = ref TextIO.stdErr
    val log_file  = ref ""


    fun loop() =
    let
        fun timeout() = TextIO.flushOut(!log_strm)
    in
        CML.select[
            CML.wrap(Mailbox.recvEvt mbox, handle_msg),
            CML.wrap(CML.timeOutEvt (Time.fromSeconds 1), timeout)
            ];
        loop()
    end

The manager's state consists of the number of errors and the output stream for logging. (The log file name is only kept to detect an attempt to set the log file twice.) I've been a bit lazy and implemented the state using imperative variables. This saves winding the state through all of the code, especially as it is rarely updated. Since only the server thread updates them they are safe.

I've included a 1 second time-out in the server. This flushes the log stream to disk so that error messages show up promptly.

Here is the protocol handler.

and handle_msg msg =
(
    case msg of
      ReqLog (level, msg, time) => internal_log level msg time

    | ReqSetFile file =>
        (
            if !log_file = ""
            then
                set_log_file file
            else
                internal_log Error (TF.S
                    "Attempted to set the log file twice")
                    (Time.now())
        )

    | ReqNumErrors rvar => Sy.iPut(rvar, !num_errors)

    | ReqFlush rvar => Sy.iPut(rvar, ())
)

The ReqNumErrors message is a request to return the number of errors. The ReqFlush handshakes with the flush function below to make sure that the mailbox is empty.

structure Logger = Singleton(
                        type input    = LogRequest Mailbox.mbox
                        val  newInput = Mailbox.mailbox
                        val  object   = log_server
                        )

fun send_request req = Mailbox.send (Logger.get(), req)
fun flush() =
let
    val rvar = Sy.iVar()
in
    send_request (ReqFlush rvar);
    Sy.iGet rvar
end

The Sy.iGet call in flush will block until the server thread responds to the ReqFlush message. All preceding messages in the mailbox must have been processed by that point.
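
The numErrors function presumably follows the same request-reply pattern. Here is a sketch along the lines of flush:

fun numErrors() =
let
    val rvar = Sy.iVar()
in
    send_request (ReqNumErrors rvar);
    Sy.iGet rvar
end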

Here is the function that actually prints the message.

and internal_log level msg when =
let
    fun put s = TextIO.output(!log_strm, s)

    val date  = Date.fromTimeLocal(when)
    val fdate = Date.fmt "%Y %b %d %H:%M:%S" date
in
    put(concat[fdate, " ", formatLevel level, ": "]);
    TF.appPrefix "\t" put msg;
    put "\n";
    update_counts level
end

I add a simple time-stamp to each message. The appPrefix call applies the put function to each fragment of the message, putting a tab before each subsequent line. This lays out multi-line messages nicely, as long as the TextFrag line-breaking is used properly.

Here is the implementation of the basic log API.

fun log level msg = 
(
    if level >= (!log_level) orelse level = Fatal (* speed up check *)
    then
        send_request (ReqLog (level, msg, Time.now()))
    else
        ();

    if level = Fatal
    then
    (
        flush();
        Common.toErr(concat[formatLevel level, ": ",
                            TF.toString TF.UseLf msg, "\n"]);
        Common.fail()               (* abandon execution *)
    )
    else
        ()
)

There is a potential race condition in that the log_level variable could be set by more than one thread calling setLevel at the same time. But in practice the level is only set at configuration time when only one thread is running.
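
The level-setting functions themselves are simple. Here is a sketch, assuming (as the comparison in log suggests) that Level is internally an integer severity increasing from Debug to Fatal; the default threshold shown is my guess.

val log_level = ref Error       (* the default threshold, an assumption *)

fun setLevel level = (log_level := level)

(*  Only ever lower the threshold, i.e. make logging more verbose. *)
fun lowerLevel level =
(
    if level < (!log_level)
    then log_level := level
    else ()
)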

Here is the implementation of the exception logging API.

fun logSysErr arg (msg: string, _: OS.syserror option) = error [arg, " ", msg]

fun logExn x = logExnArg "" x

and logExnArg arg x =
(
    case x of
      OS.SysErr x        => logSysErr arg x
    | IO.Io {cause, ...} => logExnArg arg cause

    | InternalError msg  => log Fatal (TF.L ["Internal Error: ", msg])
    | FatalX             => log Fatal (TF.S "Fatal Error")

    | x                  => log_any x
)
handle _ => ()              (* don't raise any more exceptions *)

and log_any x = log Error (TF.L [exnName x, ": ", exnMessage x])

This formats all kinds of exceptions. The system and I/O error exceptions are the most likely and they come with extra detail. As a fall-back I can always report the name of the exception with General.exnName.

The Mutex Module

The Singleton pattern (see the section called The Singleton Module) relies on having a static variable that holds a handle to the singleton object. This is updated with the handle the first time that it is accessed. Since it can be accessed by any number of threads I have the classic race-condition problem. I need some sort of mutual exclusion (mutex) to control access to these static variables. The Mutex module implements mutexes using an M-variable as described in the section called Semaphores via Synchronous Variables in Chapter 6. Here is the API for the module.

signature MUTEX =
sig
    type Mutex

    (*  Mutex values can be saved over an exportML so you can
        "statically" create them.
    *)
    val create:     unit -> Mutex

    (*  This runs the function in the critical section.
        This will work with CML running or not.
    *)
    val lock:       Mutex -> (unit -> 'a) -> 'a
end

A mutex can be created as a top-level value in a module. This can be saved in an exported heap without any trouble. The lock function is passed a job function that typically updates some static variable. Here is the implementation.

structure Mutex: MUTEX =
struct
    structure SV = SyncVar

    type Mutex = bool SV.mvar


    fun create() = SV.mVarInit true


    fun lock mutex func =
    (
        SV.mTake mutex;
        let
            val r = func()
        in
            SV.mPut(mutex, true);
            r
        end
        handle x => (
            SV.mPut(mutex, true);
            raise x
            )
    )

end
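
As a usage sketch of my own (computeVersion is a hypothetical expensive initialiser), a lazily-initialised static value can be protected like this:

val version_mutex = Mutex.create()
val version: string option ref = ref NONE

fun getVersion() =
    Mutex.lock version_mutex (fn () =>
        case !version of
          SOME v => v
        | NONE =>
            let
                val v = computeVersion()    (* hypothetical *)
            in
                version := SOME v;
                v
            end)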

The MyProfile Module

This module implements two utilities. The first is a simple run-time timer to measure how many microseconds it takes to run a function. I use this to get some idea of how long the server spends performing each step in returning a page. On my Linux ix86 machine the timer has a resolution of 1 microsecond. It can time functions down to around 5 microseconds with reasonably reliable results.
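
The timer is used through the MyProfile.timeIt function that appears throughout the server code (for example in the Abort module above). Here is a minimal sketch of its core, assuming it simply reports to standard error; the real module may log or accumulate results instead.

fun timeIt label func arg =
let
    val t0 = Time.now()
    val result = func arg
    val t1 = Time.now()
in
    Common.toErr(concat[
        label, ": ",
        LargeInt.toString(Time.toMicroseconds(Time.-(t1, t0))),
        " usec\n"]);
    result
end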

The second utility is some code for profiling. The standard profiling code described in the section called Execution Time Profiling in Chapter 4 will not link with CML since it uses code within the SML/NJ compiler that is linked with the non-CML TextIO module. I have repeated the profiling report function here with some simplifications. It can produce a profiling report on stdout when the server has been compiled with profiling.

The Open File Manager

This module implements the resource management for file descriptors that is described in the section called System Resource Management in Chapter 8. If a connection cannot get enough file descriptors then I could either abort the connection or make it wait until more are available. Aborting is a bit crude. With the concurrent design I should be able to get the parts of the server to cooperate better than that.

The Open Manager is a central place where the usage of file descriptors is counted. Before a connection attempts to open files it must request the Open Manager to allocate it the number of descriptors that it will need. If there are not enough free descriptors then the connection must wait; it will be put onto a queue. When another connection closes its files the descriptors are returned to the Open Manager, which passes them on to a waiting connection. The waiting connections are served in first-in-first-out order for fairness.

If a connection is aborted or fails with an internal exception then there is a risk that files will be left open and the server will "leak" descriptors and eventually grind to a halt. I already rely on the SML garbage collector to clean up a connection if there is a time-out or another abort condition. I want the open files to be cleaned up as well. This is an application of finalisation as described in the section called Weak Pointers in Chapter 4.

I don't want to rely on finalisation to close files during normal operation. This would leave files open unnecessarily until the next garbage collection. The Open Manager must allow files to be opened and closed normally but also detect when an open file has become garbage and close the file and make the file descriptor available for reuse.

A socket for an incoming connection is a special case. Its file descriptor is created by the operating system. The best that the server can do is make sure that it is counted by the Open Manager after the connection is established.

Being Generic

The Open Manager must be able to deal with all of the different kinds of objects that require file descriptors. These include sockets, regular files (binary and text), directories and pipes to communicate with child processes. Each of these kinds has a different way of opening and closing.

I want the Manager to be extensible. It should be easy to add new kinds of file objects. This is something that object-oriented languages do well. In an O-O language I would define an abstract base class for a file object and sub-class it for each kind. The language would dynamically dispatch an open() method on a file object depending on the actual kind of the resource.

Unfortunately SML does not provide any form of dynamic dispatch. I could revert to a variant-record structure with a datatype like

datatype Object =
        Regular of ...
    |   Directory of ...
    ...

But then I would have to have case expressions all over the place where the file object is handled. This is very messy and hard to extend. This is the very problem that the object-oriented paradigm is designed to solve.

An alternative structure is to divide the Manager along the boundaries between kinds. I will create separate managers specialised to each kind of file object. These managers must cooperate to keep a count of the open files so there is yet another Open Counter manager that just counts the files. It does not need to know anything about operating on the files.

The generic manager, OpenMgrFn, is a functor specialised to its kind of file with an implementation module. The resulting kind-specific manager needs its own specialised finalisation manager, Fin, since the finalisation needs to know how to close any open files that it finds.

The resulting class diagram is shown in Figure 9-2.

Figure 9-2. The Open Manager Objects.

The generic functor is organised like this

functor OpenMgrFn(
    structure Impl: OPEN_MGR_IMPL
    ) : OPEN_MGR =
struct
    open Common
    structure TF = TextFrag

    structure Ctr  = OpenCounter
    structure Impl = Impl

    structure Fin = FinaliseFn(
                        structure Type =
                        struct
                            type T = Impl.Opened * Ctr.Allocation
                            fun finalise (opn, _) = ignore(Impl.closeIt opn)
                            val name = Impl.name
                        end)


    type Arg    = Impl.Arg
    type Opened = Impl.Opened
    type Closed = Impl.Closed
    type Holder = Fin.Holder

It takes an implementation structure parameter called Impl. The functor builds a specialised finaliser that it calls Fin.

An implementation conforms to the following signature.

signature OPEN_MGR_IMPL =
sig
    val     name:   string

    type    Arg
    type    Opened
    type    Closed

    (*  This is the number of file descriptors that are needed
        by the open.
    *)
    val num_fds:    int


    datatype Result = 
            Success of Opened
        |   Fail                (* give up totally *)
        |   Retry               (* should try again later *)

    val openIt:     Arg -> Result
    val closeIt:    Opened -> Closed

end

The openIt function must open the object and (if successful) return a value of type Opened that represents the opened object. The Result type is used by the handshaking protocol with the Open Counter manager, which is described in the section called Opening a File.

Finally the Open Manager produces this signature.

signature OPEN_MGR =
sig
    structure Impl: OPEN_MGR_IMPL

    (*  This describes what can be opened or closed. *)
    type    Arg = Impl.Arg

    (*  This represents an open object. *)
    type    Opened = Impl.Opened

    (*  This is the type returned from a close operation. *)
    type    Closed = Impl.Closed

    (*  This is a holder for the object.  The object will be
        finalised if the caller loses its reference to the
        object.
    *)
    type    Holder

    val get:    Holder -> Opened

    (*  Open/close the object.
        This will return NONE if the open failed or was aborted.
    *)
    val openIt:     Abort.Abort -> Arg -> Holder option
    val openIt':    Arg -> Holder option
    val closeIt:    Holder -> Closed
end

The design of these managers depends on the signature constraints being transparent. Transparent means that information about the implementation of a type is known by the compiler and allowed to propagate through the various modules. The ExecReader module relies on this. If you follow through the declarations you find that the following types are identical.

ExecReader.Opened = ExecReader.Impl.Opened = Unix.proc * string

The code for the CGI Node Handler in the section called The CGI Node Handler can extract the Unix.proc value to manipulate the process by writing

val (proc, _) = ExecReader.get holder

The opposite of transparent is an opaque signature constraint, which is indicated by using ":>" instead of ":" before the signature name. With an opaque constraint the implementation of the Opened type would be hidden, since only the name is declared in the OPEN_MGR_IMPL signature.
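
For example, with a made-up module:

structure M :> sig type t  val mk: int -> t end =
struct
    type t = int
    fun mk n = n
end

(*  With the opaque ":>" constraint the compiler no longer knows
    that M.t is int, so an expression like "M.mk 1 + 1" is a type
    error.  With a transparent ":" constraint it would type-check.
*)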

Finalisation

Finalisation is done by maintaining a collection of weak references to each open file. To make this work I have to have one value that is shared between the finalisation manager and a client. The manager keeps a weak reference to this value and the client has one or more normal (strong) references. When all of the strong references have gone the value will be collected and then the weak reference will report that the value is gone. This will be a trigger for the manager to close the file.

Note that the shared value is not the open file. The manager must still have a reference to the file after the shared value has been collected. The client must be careful not to hold a reference to the open file without also having one to the shared value. To make this safer the client will only be able to get to the open file from the shared value.

A finalisation manager has the following signature. The Holder type will be chosen to ensure that it is always copied by reference.

signature FINALISER =
sig
    (*  This is the value that is shared between the client 
        and the manager.
    *)
    type Holder

    (*  This is the value in the holder that will be finalised.  *)
    type T

    val get:    Holder -> T

    (*  This adds a new T to the manager.  *)
    val add:    T -> Holder

    val remove: Holder -> unit
end

The generic code for the manager is in a functor which takes the details about the type of the finalised value as a parameter.

signature FINALISE_TYPE =
sig
    type T

    val finalise:   T -> unit
    val name:       string
end
functor FinaliseFn(
    structure Type: FINALISE_TYPE
    ): FINALISER =
struct

A manager is a concurrent object with a simple list of weak references as its state. It takes messages to add and remove values from its list. Its message protocol is

datatype Req = 
        ReqAdd of T * Holder Sy.ivar
    |   ReqRemove of Holder

(*  When the holder is collected we should have the last
    strong ref to T which we finalise.
*)
type Wref  = int * T * (Holder W.weak)

(*  This requires a linear scan of all held objects which
    shouldn't be a performance problem since GCs are
    infrequent.
*)
type State = int * Wref list

The add message creates the holder. The holder contains a pair of an integer key and the value to be finalised. The integer key allows holders to be identified, since we can't assume that the value supports the equality operator. The key is also stored in the weak reference entries.

type    T = Type.T

type Pair = int * T

(*  We use a ref on the Pairs to ensure that they are 
    copied by reference.
*)
type Holder = Pair ref

The manager must receive a signal telling it when a garbage collection has been done. This is received as a message from the signal manager (see the section called The Signal Manager).

fun server chan () =
let
    val gc_port = SignalMgr.mkGcPort()

    fun loop state =
    (
        loop(CML.select[
            CML.wrap(CML.recvEvt chan, handle_msg state),
            CML.wrap(SignalMgr.gcEvt gc_port, finalise state)
            ])
    )
in
    loop(0, [])
end
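
The handle_msg function is not shown in the listing. As a sketch of what it must do (the details are my assumption): for ReqAdd it builds the holder, replies with it and records a weak reference; for ReqRemove it filters the entry out by its key.

and handle_msg (tag_cnt, wrefs) msg : State =
(
    case msg of
      ReqAdd (value, rvar) =>
        let
            val holder: Holder = ref (tag_cnt, value)
        in
            Sy.iPut(rvar, holder);
            (tag_cnt + 1, (tag_cnt, value, W.weak holder)::wrefs)
        end

    | ReqRemove (ref (tag, _)) =>
        (tag_cnt, List.filter (fn (t, _, _) => t <> tag) wrefs)
)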

The finalise function scans the weak references and tests which ones to keep.

and finalise (state as (tag_cnt, wrefs)) () : State =
let
    val _ = Log.testInform G.TestFinalise Log.Debug
                (fn()=>TF.L ["Finaliser ", Type.name, ": finalising"])

    (*  Test if this wref should be kept or finalised.
    *)
    fun keep (_, value, wref) =
    (
        case W.strong wref of
          NONE   => (Type.finalise value; false)
        | SOME _ => true
    )
in
    (tag_cnt, List.filter keep wrefs)
end

Opening a File

Opening a file requires a sequence of steps involving handshaking between the Open Counter and the Open Manager. The goal is to ensure that there is no chance of a file descriptor being lost due to some error while opening the file.

First here is the signature for the Open Counter manager.

signature OPEN_COUNTER =
sig

    (*  This represents some number of file descriptors. It ensures
        that a release matches the allocation.
    *)
    type Allocation

    datatype Response =
            Success
        |   Fail of Allocation
        |   Retry of Allocation

    (*  Return the response on the supplied channel. *)
    type Start = Allocation * Response CML.chan

    (*  Pass in a channel to receive the start message. *)
    val request:   (int * Start CML.chan) -> unit

    (*  Release n file descriptors. *)
    val release:    Allocation -> unit

    (*  Return the number open and the number pending. *)
    val stats:      unit -> int * int
end

An Allocation value represents some number of open files. It provides some protection against programming errors by ensuring that the client can only return exactly the same number that it allocated.

The file allocation starts with a call from the openIt function of the Open Manager to the request function of the Open Counter. The Open Counter will, either immediately or some time later, start a handshake with the Open Manager using the channel in the request. The Start type is the message passed to the Open Manager to start the handshake. The Open Manager attempts to open the files and responds with a Response type. There is provision for retrying an open that fails due to an unexpected lack of file descriptors. In this case the Open Manager will go to the end of the queue to wait for more descriptors to become available. Figure 9-3 shows the sequence of the handshake for the Success and Retry cases. Fin is the specialised finaliser within the Open Manager.

Figure 9-3. The File Opening Handshake.

A consequence of this design is that only one open operation can occur at a time. The Open Counter runs the handshaking sequentially. This shouldn't be a problem since file opens, even the forking of CGI-BIN scripts, are quick. If this sequential processing proves to be a problem the Open Manager could be changed to run concurrent handshaking.

Here is the code for the openIt function.

fun openIt abort arg =
let
    val schan = CML.channel()

    (*  We may have to try several times.

        To be safe from deadlock there must be no possibility
        of an exception preventing the state transitions from
        completing. Otherwise the counter will block forever.

        So when we abort we must leave a thread behind to finish
        the handshaking. Trying to remove the pending request from
        the counter risks race conditions.
    *)
    fun try() =
    let
        fun got_alloc (alloc, rchan) =
        (
            case Impl.openIt arg of
              Impl.Success opn => (CML.send(rchan, Ctr.Success);
                                   SOME (opn, alloc))

            | Impl.Fail =>  (CML.send(rchan, Ctr.Fail alloc); NONE)

            | Impl.Retry => (CML.send(rchan, Ctr.Retry alloc); try())
        )
        handle _ => (CML.send(rchan, Ctr.Fail alloc); NONE)


        fun got_abort() =
        let
            fun dummy() =
            let
                val (alloc, rchan) = CML.recv schan
            in
                CML.send(rchan, Ctr.Fail alloc)
            end
        in
            CML.spawn dummy;
            NONE
        end
    in
        CML.select[
            CML.wrap(CML.recvEvt schan, got_alloc),
            CML.wrap(Abort.evt abort, got_abort)
            ]
    end
in
    (*  Start trying *)
    Ctr.request (Impl.num_fds, schan);

    (*  Once opened, set up a finaliser on the Opened value. *)
    case try() of
      NONE     => NONE
    | SOME farg => SOME(Fin.add farg)
end

A Specialised Open Manager

Here is the code that creates TextIOReader, an Open Manager specialised to reading text files using the TextIO module (which has the handy inputLine function).

local
    structure E  = Posix.Error
    structure TF = TextFrag

    structure Impl =
    struct
        val     name = "TextIOReader"
        type    Arg = string
        type    Opened = TextIO.instream
        type    Closed = unit

        val     num_fds = 1

        datatype Result = 
                Success of Opened
            |   Fail
            |   Retry


        fun openIt file =
        (
            Success(TextIO.openIn file)
        )
        handle
          x as IO.Io {cause = OS.SysErr (_, SOME err), ...} =>
        (
            if err = E.mfile orelse err = E.nfile
            then
                Retry
            else
            (
                Log.logExn x;   (* a real error *)
                Fail
            )
        )
        | x => (Log.logExn x; Fail)
        

        fun closeIt strm =
        (
            TextIO.closeIn strm
        )
        handle x => Log.logExn x

    end
in
    structure TextIOReader = OpenMgrFn(structure Impl = Impl)
end

This code appears at the top module level. It defines the Impl module, an alias E for the Posix.Error module, and the usual TF alias. The E, TF and Impl declarations are made private to the TextIOReader module by the local...end syntax.

The Impl module has to include all of the declarations in the OPEN_MGR_IMPL signature, including the Result type, which varies with the Opened type. What it opens is named by a file path, so Arg is a string. The opened value is an input text stream, which requires only one file descriptor. The openIt function checks for the EMFILE and ENFILE error codes, which indicate that no file descriptor is available; in this case the open will be retried later.
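
The other specialisations follow the same pattern. For example, a reader for binary files might look like this (a sketch of my own; the server's actual module, if it has one, may differ):

local
    structure E = Posix.Error

    structure Impl =
    struct
        val     name = "BinIOReader"
        type    Arg = string
        type    Opened = BinIO.instream
        type    Closed = unit

        val     num_fds = 1

        datatype Result = 
                Success of Opened
            |   Fail
            |   Retry

        fun openIt file =
        (
            Success(BinIO.openIn file)
        )
        handle
          x as IO.Io {cause = OS.SysErr (_, SOME err), ...} =>
        (
            if err = E.mfile orelse err = E.nfile
            then Retry                  (* no descriptors, try later *)
            else (Log.logExn x; Fail)   (* a real error *)
        )
        | x => (Log.logExn x; Fail)

        fun closeIt strm = (BinIO.closeIn strm) handle x => Log.logExn x
    end
in
    structure BinIOReader = OpenMgrFn(structure Impl = Impl)
end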

Since the web server is going to be forking and exec-ing CGI-BIN scripts, it should be setting the close-on-exec flag on most of the files that it opens. It would be a security breach to let scripts inherit internal files, sockets, etc. Unfortunately there is no mechanism in SML/NJ to operate on a TextIO or BinIO stream at the file descriptor level. I do set the flag for sockets though.

The Signal Manager

The web server should be catching signals so that it can clean up temporary files and such when it is killed. A CML thread can't be interrupted but I can broadcast an interrupt message to all interested threads. But this is rather awkward to handle. Each thread would have to be listening for an interrupt at each place where it may block for a while. At the moment all that I do is terminate the server by calling the common fail() function. This shuts down CML and I have registered a shutdown handler in the Main module. This handler can clean up for the server. See the section called The Startup Module.

I also need to distribute a signal indicating when a garbage collection has been done. This is used by finalisation code, for example see the section called Finalisation.

Signal handling is provided by the Signals module, see the section called Signals in Chapter 4. The GC signal is broadcast to the server using the Multicast module of the CML library. To use this you create a channel to carry a message stream and any thread wishing to receive these messages creates a port which listens to that channel. The signature for the signal manager is

signature SIGNAL_MGR =
sig

    (*  Each client must have its own port. *)
    type GcPort
    type IntPort

    datatype Interrupt = SIGINT | SIGTERM

    (*  This sets up the signal handling. *)
    val init:   unit -> unit


    (*  Create a new client port.
    *)
    val mkGcPort:   unit -> GcPort
    val mkIntPort:  unit -> IntPort


    (*  This creates an event for the arrival of the
        next GC signal. Call it anew for each GC.
    *)
    val gcEvt:  GcPort -> unit CML.event


    (*  This creates an event for the arrival of the
        next interrupting signal.
    *)
    val intEvt: IntPort -> Interrupt CML.event

end

GC messages don't carry any information so I just use the unit type. They are delivered to a GcPort. The mkGcPort function creates a new port to receive a GC message. The gcEvt function returns an event that a thread can select on. The code for handling the GC signal is

type GcPort  = unit Multicast.port
type IntPort = Interrupt Multicast.port

val gc_mchan: unit Multicast.mchan option ref = ref NONE
val int_mchan: Interrupt Multicast.mchan option ref = ref NONE


fun init() =
(
    gc_mchan  := SOME(Multicast.mChannel());
    int_mchan := SOME(Multicast.mChannel());

    Sig.setHandler(Sig.sigGC, Sig.HANDLER gc_handler);
    Sig.setHandler(Sig.sigINT, Sig.HANDLER int_handler);
    Sig.setHandler(Sig.sigTERM, Sig.HANDLER int_handler);

    (*  We'd better catch this for when writing to sockets. *)
    let
        val s = valOf(Sig.fromString "PIPE")
    in
        Sig.setHandler(s, Sig.HANDLER pipe_handler)
    end;
    ()
)


and gc_handler(_, _, kont) =
(
    Log.testInform Globals.TestTiming Log.Debug
        (fn()=>TF.S "GC signalled");
    Multicast.multicast(valOf(!gc_mchan), ());
    kont
)

The GC signal handler just broadcasts a message and continues the server.

The channels can't be set up until the server is running with the CML library. So an init function is required to set up the channels. This must be called at the very beginning of the server startup since the open file manager (the section called The Open File Manager) requires it before any files can be opened. The test harnesses must do the same.
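
As a sketch (not from the server) of how a thread might wait for an interrupting signal:

fun waitForSignal() =
let
    val port = SignalMgr.mkIntPort()
in
    case CML.sync(SignalMgr.intEvt port) of
      SignalMgr.SIGINT  => Common.fail()
    | SignalMgr.SIGTERM => Common.fail()
end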

The Singleton Module

This module is a simple encapsulation of the start-up of a thread that implements a singleton concurrent object. The object is represented by a CML channel or mailbox that receives the messages of its API.

The module is a functor that is specialised by the type of the input channel, a function to create the channel and the function that runs in the thread. Here is the complete code for the module.

functor Singleton (
    type input
    val  newInput: unit -> input
    val  object: input -> unit -> unit
    )
    : SINGLETON =
struct
    structure SV = SyncVar

    type input = input

    val input: input option ref = ref NONE

    (*  An initialised mvar can be saved over an exportML.
        The value it contains is the baton, like a binary semaphore.
    *)
    val mutex = Mutex.create()


    (*  The double-checked locking will be safe in CML since it
        isn't really multi-tasking or SMP (cf Java).
    *)
    fun get() =
    (
        case !input of
          NONE =>
            let
                fun init() =
                (
                    case !input of
                      NONE =>
                        let
                            val i = newInput()
                        in
                            input := SOME i;
                            ignore(CML.spawn (object i));
                            i
                        end
                    | SOME i => i
                )
            in
                Mutex.lock mutex init
            end

        | SOME i => i
    )
end

The module provides a single function called get which returns the channel to the object. A thread for the object is spawned the first time that the get function is called.

The channel is stored in a static variable so its update must be synchronised to protect against more than one thread calling the get function at the same time. See the section called The Mutex Module.

Here is an example of the use of this module. This is taken from the section called The Log Module.

structure Logger = Singleton(
                        type input    = LogRequest Mailbox.mbox
                        val  newInput = Mailbox.mailbox
                        val  object   = log_server
                        )

fun send_request req = Mailbox.send (Logger.get(), req)

The Text Module

A common operation in the server is constructing text messages. This varies from constructing the headers of HTTP responses through to constructing error messages for logging. In a traditional language like C a programmer typically assembles a message into a buffer by copying text fragments. This is copy-by-value for the fragments. If you try to be more efficient and do copy-by-reference for the fragments in C you can easily end up with slower code. This is because you will probably end up calling malloc a few times and the overhead will probably outweigh the cost of copying most strings. The memory management issues make it worse. Strings are often copied just to isolate the various domains of ownership of memory and also to protect against strings being modified.

The low-overhead memory allocation of SML/NJ changes the balance in favour of copy-by-reference. A list of strings is a list of pointers to strings and can usually be constructed faster than the strings can be copied, and of course memory management is not an issue.

The TextFrag module implements a data structure that represents a string as an aggregate of string fragments[1]. It is also independent of the different line termination conventions, a LF or a CR-LF. Here is the Text type.

datatype Text =
        Empty
    |   Nl                (* a line break, perhaps CRLF *)
    |   WS                (* exactly one blank character *)
    |   S of string
    |   L of string list  (* concatenation of some strings *)
    |   C of Text list    (* concatenation of texts *)

In the following description I abbreviate the module name to TF (as I do in the server code). The TF.S constructor is the simplest case that represents a single string. The string should not contain any new-line character. Use the TF.Nl constructor to separate lines. This will be substituted later with whatever line-termination convention you want. The TF.C constructor combines fragments together. For example here are two lines of text.

val foxes =
    TF.C [TF.S "The quick brown fox", TF.Nl,
          TF.S "jumps over the lazy dog"]

The TF.WS constructor is available as a useful separator when joining fragments. The TF.L constructor handles the common case of concatenating multiple strings.

Here is the main part of the API.

datatype LineSep = UseLf | UseCrLf

(*  This applies the function to each string piece. The function
    could be print() for example.
*)
val apply:  LineSep -> (string -> unit) -> Text -> unit

(*  Calculate the length in characters of the text.
*)
val length: LineSep -> Text -> int

(*  This is like apply but it prints the prefix before
    each subsequent line.
*)
val appPrefix:  string -> (string -> unit) -> Text -> unit

(*  Produce the string that the Text corresponds to.
*)
val toString:  LineSep -> Text -> string

The apply function can be used to print a text fragment. For example to print the foxes fragment above to the standard output:

TF.apply TF.UseLf print foxes

The appPrefix function is similar but inserts a prefix string before each subsequent line and always uses LF as the line terminator. I use this for indenting continuation lines in error messages. The toString function concatenates all of the text fragments by copying, which you sometimes may have to do.
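
For example, applying these to the foxes fragment above (the results are shown as comments):

val s = TF.toString TF.UseCrLf foxes
(*  s = "The quick brown fox\r\njumps over the lazy dog"  *)

val n = TF.length TF.UseLf foxes
(*  n = 43, i.e. 19 + 1 + 23 characters  *)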

I'll just show the implementation of the apply API to show you how it works. The other functions are similar. Note that app is the standard SML List.app function.

(*  crlf is the string to apply in place of Nl. *)
fun applyP crlf func Empty   = ()
|   applyP crlf func Nl      = func crlf
|   applyP crlf func WS      = func " "
|   applyP crlf func (S s)   = func s
|   applyP crlf func (L ss)  = app func ss
|   applyP crlf func (C lst) = app (applyP crlf func) lst


fun lsep UseLf   = "\n"
|   lsep UseCrLf = "\r\n"

fun apply sep func text = applyP (lsep sep) func text

fun appPrefix prefix func text = applyP ("\n" ^ prefix) func text

A complex example of text fragments can be found in the fancy directory indexing code that is described in the section called The Directory Node Handler.

The TmpFile Module

In the section called System Resource Management in Chapter 8 I described how the server must manage the amount of disk space used by temporary files. If there is insufficient disk space for a connection to save an incoming entity then the connection must be blocked until the space is available or the connection times-out. If a connection is aborted then any temporary files that belong to it must be deleted.

At the moment the only use I have for temporary files is for saving incoming entities in HTTP requests. I label the body file associated with a connection with the port number of the connection. Then when the connection is closed the file can easily be found and deleted. I don't use the finalisation facility of the section called Finalisation, since I want disk files to be cleaned up as soon as possible and I can rely on the connection code to catch all error or abort conditions for the connection.

Here is the API for the TmpFile Manager.

type TmpFile

(*  Allocate a new file for the given port number.
    If we give up trying to create the file then we return NONE.
*)
val newBodyFile: Abort.Abort -> string -> int -> int -> TmpFile option

(*  Get the file name. *)
val getName:        TmpFile -> string

(*  This releases the files associated with the port number.
    They will be deleted.
*)
val releasePort:    int -> unit


(*  This sets the temp file disk space limit. It must be
    called before any temp files are created, preferably
    from the config.  The size is in bytes.
    The limit must be no larger than 10^9 bytes.
*)
val setDiskLimit:   int -> unit

The TmpFile type represents the allocation of a name and disk space for the file. The caller must still write the data into the file. The newBodyFile function allocates a TmpFile for saving the entity body from a HTTP request. The arguments are the abort condition, temporary directory name, file length and port number. Only one body file should be allocated with the same port number.

The getName function will return the full path of the file for the caller to write to. The releasePort function deletes all files that are labelled with the given port number. The caller should not retain any TmpFile values with that port number after doing this.

The setDiskLimit function sets the number of bytes that are available for sharing out to the temporary files. I assume that all files will be in the same temporary directory or, if not, at least in the same file system. The file system must not be mounted over NFS since files are created with exclusive locking. This is specified in the section called The Server Parameters in Chapter 8.
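
Here is a sketch of my own (the directory name and the writing step are placeholders) of how the connection code might use the manager:

fun saveEntity abort port len =
(
    case TmpFile.newBodyFile abort "/var/tmp/swerve" len port of
      NONE     => NONE              (* no space, or the wait was aborted *)
    | SOME tmp =>
        let
            val path = TmpFile.getName tmp
        in
            (* ... write the len bytes of the entity to path ... *)
            SOME tmp
        end
)

(*  Later, when the connection is closed or aborted:
    TmpFile.releasePort port
*)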

The manager is a singleton concurrent object. Here are the types in the module.

datatype TmpFile = TmpFile of {
        id:         int,
        port:       int,
        file:       string,         (* absolute file path *)
        len:        int             (* a size estimate *)
        }

and AllocRequest =
        Record of Pending
    |   Release of int              (* release all files on the port *)
    |   Undo of TmpFile             (* undo one allocation *)


and Reply = Success of TmpFile | Reject

(*  This state could allow multiple files on the
    same port number.  It must be pure to allow
    recursion through allocate().
*)
and State = State of {
        tmps:       TmpFile list,   (* successfully allocated *)
        pending:    Pending list,
        used:       int,            (* disk space used in bytes *)
        id_cnt:     int,            (* to allocate ids *)
        last_warn:  Time.time option (* time of the last warning *)
        }

(*  A pending request has the port, file and length
    and a reply channel.
*)
withtype Pending = int * string * int * Reply Sy.ivar

The TmpFile value includes a unique identifier for simple equality testing. AllocRequest and Reply make up the protocol for the Manager. The Undo request is for cleaning up if a request is aborted. The Manager's state includes a list of all allocated TmpFile values, all pending requests and counters for disk space and identifiers. The Manager will log a warning if it runs short of disk space. These warnings are limited to one per second to avoid flooding the log. The last_warn field records the time when the last warning was given.

The disk space limit is recorded in a static variable.

val disk_limit = ref (valOf Int.maxInt)

fun setDiskLimit n = (disk_limit := n)

This is set once when the configuration file is read (see the section called The Config Module - Interface) so I don't have to worry about concurrent access to the variable.

Here is the allocate function that implements the Record request from the newBodyFile function.

and allocate
        (state as State {tmps, pending, used, id_cnt, last_warn}) 
        (pend as (port, file, len, rvar))
        =
let
    val _ = Log.testInform Globals.TestTmpFile Log.Debug
                (fn()=>TF.L ["TmpFile allocate file ", file])
in
    if used + len <= !disk_limit
    then
        let
            val tmp = TmpFile {
                        id      = id_cnt,
                        port    = port,
                        file    = file,
                        len     = len
                        }
        in
            Sy.iPut(rvar, Success tmp);

            State {
                tmps    = tmp::tmps,
                pending = pending,
                used    = used + len,
                id_cnt  = id_cnt + 1,
                last_warn = last_warn
                }
        end
    else
        let
            val now = Time.now()
        in
            if last_warn = NONE orelse
               Time.toMilliseconds(Time.-(now, valOf(last_warn)))
                   >= 1000
            then
                Log.log Log.Warn
                    (TF.S "TmpFile: Tmp disk space limit exceeded")
            else
                ();

            State {
                tmps    = tmps,
                pending = pend::pending,
                used    = used,
                id_cnt  = id_cnt,
                last_warn = SOME now
                }
        end
end
handle _ => (* e.g. integer overflow *)
    (
        Sy.iPut(rvar, Reject);
        Log.testInform Globals.TestTmpFile Log.Debug (fn() => TF.L [
            "TmpFile allocation error on port ", Int.toString port]);
        state
    )

This just does a simple check of the requested space against the amount available. If there is enough it sends a Success reply. If there isn't enough space then the request is just added to the pending list and a warning may be logged.

I must be careful not to let an exception abort the Manager. The (warn level) logging and FileIO.removeFile functions do not raise exceptions, but I have to watch out for integer overflow with the large numbers that file sizes can reach.

The release and undo functions implement the remaining two requests to the Manager.

(*  Remove all those files on the port. *)
and release state the_port =
let
    fun keep (TmpFile {port, ...}) = (the_port <> port)
in
    remove state keep
end



(*  Remove based on the id. *)
and undo state tmp =
let
    val TmpFile {id = tmp_id, ...} = tmp
    fun keep (TmpFile {id, ...}) = (tmp_id <> id)
in
    remove state keep
end

These are just two different ways to remove files. The first removes all of the files on a given port. The second removes a particular file based on its unique id. The common code is in the remove function. The different behaviour is captured by the keep functions, which decide which files to keep.

and remove 
        (state as State {tmps, pending, used, id_cnt, last_warn}) 
        keep
        =
let
    val _ = Log.testInform Globals.TestTmpFile Log.Debug
                (fn()=>TF.S "TmpFile removing")

    (*  First remove files.  Calculate the new used space.
        The pending list is separated out.
    *)
    fun filter [] new_tmps new_used =
    (
        State {
            tmps    = new_tmps,
            pending = [],
            used    = new_used,
            id_cnt  = id_cnt,
            last_warn = last_warn
            }
    )
    |   filter (tmp::rest) new_tmps new_used =
    (
        if keep tmp
        then
            filter rest (tmp::new_tmps) new_used
        else
            let   (* This raises nothing. *)
                val TmpFile {file, len, ...} = tmp
            in
                FileIO.removeFile file;
                filter rest new_tmps (new_used - len)
            end
    )


    (*  Retry all of the pending requests. Any that can't
        be satisfied will end up in the pending list again.
        We use a FIFO order for rerunning the requests.
    *)
    fun retry []        new_state = new_state
    |   retry (p::rest) new_state = retry rest (allocate new_state p)

    val filtered_state = filter tmps [] used
    val final_state = retry (rev pending) filtered_state
in
    final_state
end

The keep function is used to filter the TmpFile values that the Manager has retained. If a file is deleted then the amount of used space is reduced by the file's length. There may then be space for some of the pending requests to be satisfied. The simplest way to handle this is to rerun all of the pending requests by feeding them to the allocate function again. Any that still can't be satisfied will end up back in the pending list in the state. Note that pending requests are pushed onto the front of the list, so I must reverse the list before rerunning it in order to serve them in first-come-first-served order. Since there should be few pending requests, if any, in normal operation, this design is efficient enough.

The processing of the newBodyFile function has two parts. The first part is to allocate a file name. I choose names of the form "portnnnn" where "nnnn" is the port number. If there is a name clash then subsequent names of the form "portnnnn_1" etc. are tried. Here is the code for the first part.

fun newBodyFile abort tmp_dir len port =
let
    (*  If we get a name clash then add _ suffixes.
    *)
    fun try n =
    let
        val base = concat[
                    "port", Int.toString port, 
                    if n = 0 then "" else ("_" ^ (Int.toString n))
                    ]

        val file = Files.appendFile tmp_dir base

        val () = Log.testInform Globals.TestTmpFile Log.Debug
                    (fn() => TF.L ["newBodyFile trying ", file]);
    in
        if FileIO.exclCreate file
        then
            allocate port file len
        else
        (
            if n > 10
            then
            (
                Log.error ["File name clashes for file ", file];
                NONE                (* give up *)
            )
            else
                try (n+1)
        )
    end

The try function makes around 10 attempts to create the file. The FileIO.exclCreate function ensures that the name is exclusively allocated to the server. There may be left-over files using the same port if a previous run of the server crashed.
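
For reference, an exclusive create is typically built on the POSIX O_EXCL flag. The following is only a sketch of the idea using the Basis Posix structure; the real FileIO.exclCreate is defined in the FileIO module and may differ in detail.

(*  A sketch of exclusive file creation.  O.excl makes createf fail
    if the file already exists, so no two processes can win the race
    for the same name.
*)
fun exclCreate path =
let
    val fd = Posix.FileSys.createf (path, Posix.FileSys.O_WRONLY,
                 Posix.FileSys.O.excl, Posix.FileSys.S.irwxu)
in
    Posix.IO.close fd;
    true
end
handle OS.SysErr _ => false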

The second part is the reservation of disk space. This is done with the following allocate function in newBodyFile.

and allocate port file len =
let
    val rvar = Sy.iVar()


    fun got_reply (Success tmp) = SOME tmp
    |   got_reply Reject        = (FileIO.removeFile file; NONE)


    and got_abort() = (CML.spawn dummy; NONE)


    (*  Run this to catch left-over allocation requests. *)
    and dummy() =
    (
        case Sy.iGet rvar of
          Success tmp => CML.send(Alloc.get(), Undo tmp)
        | Reject      => ()
    )
in
    CML.send(Alloc.get(), Record(port, file, len, rvar));

    CML.select[
        CML.wrap(Sy.iGetEvt rvar, got_reply),
        CML.wrap(Abort.evt abort, got_abort)
        ]
end

The function sends a Record message to the Manager to record the file's information. Then it waits for the reply or an abort condition. If a Success reply is returned then the TmpFile value is returned to the caller. If a Reject reply is returned then there has been some kind of error. The file is removed and NONE is returned to the caller.

If there is an abort condition then I need to abort the request somehow. It would be tricky to extract the request from the Manager without race conditions. The CML withNack function could be used to inform the Manager about the loss of the client, but for simplicity I leave the request with the Manager and instead leave behind a dummy receiver in a new thread that will immediately undo a successful allocation. The Undo message will remove the file. Assuming that aborts are rare, this will be efficient enough.
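
A hypothetical call site shows how the two parts fit together. The saveBodyInto and rejectRequest names are invented stand-ins for whatever the caller does with the result.

(*  Hypothetical caller: reserve space for an incoming request body. *)
case newBodyFile abort tmp_dir len port of
  SOME tmp => saveBodyInto tmp      (* reservation succeeded *)
| NONE     => rejectRequest ()      (* no space, error or abort *)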

The URL Module

This module implements a parsed representation for a URL. For details on the URL syntax see the section called URL Syntax in Chapter 8.

datatype URL = HTTP_URL of {
    host:       string option,
    port:       int option,
    userinfo:   string option,      (* user name/password *)
    path:       URLPath,
    query:      string option,      (* in the undecoded form *)
    fragment:   string option       (* # suffix, undecoded *)
    }

and URLPath = URLPath of {
    segs:       Segment list,
    absolute:   bool
    }

and Segment = Segment of {
    part:       string,
    params:     string list
    }

All of the parts in the URL are separated into fields in the URL type. I only support HTTP URLs at the moment. The datatype allows room for expansion. All of the HTTP URL syntax is supported but this is more general than the server actually needs. For example the syntax allows parameters to be attached to each segment in the URL path but you would only expect to find parameters on the last segment in practice. The server only ever expects to encounter absolute URL paths and never fragments.
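
To illustrate how the fields line up, a URL such as http://me@www.example.com:8080/a/b;p=1?q=2#frag should parse to a value equivalent to the following. I've reconstructed this from the datatype; it is not output from the parser.

val example: URL =
    HTTP_URL {
        host     = SOME "www.example.com",
        port     = SOME 8080,
        userinfo = SOME "me",
        path     = URLPath {
                       segs = [Segment {part = "a", params = []},
                               Segment {part = "b", params = ["p=1"]}],
                       absolute = true
                       },
        query    = SOME "q=2",
        fragment = SOME "frag"
        }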

I will only describe the API. The code is a lot of long-winded string manipulation to break the URL at all the different kinds of delimiters. Here is the signature for the API.

exception BadURL of string          (* with a reason *)

val emptyURL:   URL


(*  This parses a general URL.  It raises BadURL if the syntax
    is not parsable as an HTTP URL.
*)
val parseURL:   string -> URL


(*  This parses just the path part of a URL, excluding the fragment.
    It raises BadURL if the syntax is not parsable.
*)
val parseURLPath:   string -> URLPath


(*  This parses just a simple path which contains no parameters.
    It raises BadURL if the syntax is not parsable.
*)
val parseSimplePath:   string -> URLPath


(*  This tests if two paths match, ignoring parameters. *)
val samePath:   URLPath -> URLPath -> bool

(*  This splits a path into a prefix of part names and a final name.
    E.g. /a/b/c becomes SOME ([a,b], c) and / becomes NONE
*)
val splitPath:  URLPath -> (URLPath * string) option

(*  Convert back to a valid URL in string form.

    This introduces escapes etc.  For now we only escape the
    "reserved" character class.  We could also escape the mark
    characters for safety.  Netscape does.

*)
val URLToString:    URL     -> string
val pathToString:   URLPath -> string

(*  This removes the % URL escapes from the string.
    Bad escapes are passed through unchanged.
*)
val unescapeURL:    string -> string

(*  This escapes anything that is not an unreserved character in
    the string.
*)
val escapeURL:      string -> string

The parseURL function is the main interface. If the URL has an invalid format then the BadURL exception is raised. This exception carries an error message that the caller may choose to log. The parseURLPath function is not currently used. The parseSimplePath function is for URL paths in the node sections of the configuration file (see the section called The Node Parameters in Chapter 8). A simple path does not allow parameters on a path segment.
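
The comments in the signature pin down the expected behaviour. Here are some reconstructed examples, not a transcript of a session.

(*  samePath (parseURLPath "/a/b;p=1") (parseURLPath "/a/b")
        returns true: segment parameters are ignored.

    splitPath (parseSimplePath "/icons/logo.png")
        returns SOME (the path /icons, "logo.png").

    splitPath (parseSimplePath "/")
        returns NONE.
*)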

The URLToString function is the opposite of parseURL. It is used when formatting headers and also in generating HTML. It must reintroduce escaping for unsafe characters. The escapeURL and unescapeURL functions take care of the escaping. The remaining functions are utilities of occasional use.
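
For example, the escaping functions should behave roughly as follows; the exact character classes are defined inside the module.

(*  escapeURL   "a b"    returns "a%20b"
    unescapeURL "a%20b"  returns "a b"
    unescapeURL "a%2"    returns "a%2"  (a bad escape passes through)
*)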
