Chapter 4. The SML/NJ Extensions

Table of Contents
The Unsafe API
Signals
The SMLofNJ API
The Socket API

These are extensions to the Basis library that are specific to SML/NJ. You can find reference documentation for them on the "Special features of SML/NJ" page, reached via the SML/NJ home page[SML].

The Unsafe API

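The Unsafe API is a collection of functions that bypass the normal safety checks of the language and the Basis library. These functions are available in the Unsafe structure, which provides unchecked access to vectors and arrays, information about the memory representation of values, the interface to the C functions in the runtime, and some miscellaneous operations.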

Unchecked subscripting is used internally by the array and vector functions in the Basis library. Wherever possible you should design your code to make use of the Basis functions. Using the unchecked operations directly puts your program at risk of crashing.

Unsafe Vectors and Arrays

The following monomorphic vectors and arrays are available.

Unsafe.CharVector

This operates on strings, which are vectors of characters.

Unsafe.Word8Vector

This operates on vectors of bytes.

Unsafe.CharArray, Unsafe.Word8Array

These operate on arrays of characters or bytes.

Unsafe.Real64Array

This operates on arrays of double precision reals. The C equivalent would be the array type double[].[1]

These structures conform to one of these two signatures.

signature UNSAFE_MONO_VECTOR =
  sig

    type vector
    type elem

    val sub : (vector * int) -> elem
    val update : (vector * int * elem) -> unit
    val create : int -> vector

  end

signature UNSAFE_MONO_ARRAY =
  sig

    type array
    type elem

    val sub : (array * int) -> elem
    val update : (array * int * elem) -> unit
    val create : int -> array

  end

Note that update is available for vectors as well as arrays, so you can modify the elements of a vector in place. The create functions return a vector or array of the given length whose elements are uninitialised.
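As a minimal sketch of how these operations fit together, the following function reverses a string by filling a freshly created character vector in place. The name reverse is my own, and the code assumes that Unsafe.CharVector.vector is the same type as string, as the description above suggests. The bounds are established once from the size of the input, so the unchecked sub and update calls never see an index outside 0..n-1. The guard for the empty string is a precaution, not something the signatures require.

```sml
(* Reverse a string using the unchecked vector operations.
   Assumes Unsafe.CharVector.vector = string.
*)
fun reverse s =
let
    val n = size s

    (* Every slot of out is written before it is returned,
       since create leaves the elements uninitialised. *)
    fun fill (out, i) =
        if i < n
        then
        (
            Unsafe.CharVector.update(out, i,
                Unsafe.CharVector.sub(s, n - 1 - i));
            fill (out, i + 1)
        )
        else
            out
in
    if n = 0
    then s      (* cautious: avoid creating a zero-length vector *)
    else fill (Unsafe.CharVector.create n, 0)
end

val cba = reverse "abc"
```

Updating a vector in place is only reasonable here because nothing else can hold a reference to the new vector until the function returns it.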

For arrays and vectors of other kinds of elements there are the structures Unsafe.Vector and Unsafe.Array which conform to the following signatures.

signature UNSAFE_VECTOR =
  sig

    val sub : ('a vector * int) -> 'a
    val create : (int * 'a list) -> 'a vector

  end

signature UNSAFE_ARRAY =
  sig

    val sub : ('a array * int) -> 'a
    val update : ('a array * int * 'a) -> unit
    val create : (int * 'a) -> 'a array

  end

The vector create function creates a vector from a list. You have to supply the length of the list as the first argument. The array create function creates an array given a length and an initial value for each element.
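A short sketch of the polymorphic versions follows, using only the functions in the signatures above. The helper vectorOfList is hypothetical. It computes the length from the list itself so that the two arguments to Unsafe.Vector.create cannot disagree.

```sml
(* Hypothetical helper: build a vector from a list, taking
   the length from the list so the arguments stay consistent.
*)
fun vectorOfList l = Unsafe.Vector.create(length l, l)

val v      = vectorOfList [10, 20, 30]
val second = Unsafe.Vector.sub(v, 1)    (* no bounds check *)

(* An array of 4 elements, each initialised to 0. *)
val a      = Unsafe.Array.create(4, 0)
val _      = Unsafe.Array.update(a, 2, 7)
val third  = Unsafe.Array.sub(a, 2)
```

Passing a length that does not match the list is exactly the kind of mistake that the checked Vector.fromList function rules out, which is why the Basis function is the better choice in ordinary code.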

Memory Representation Information

The Unsafe.Object structure provides some functions for getting information about the memory representation of values. For the details, read the source code in the boot/Unsafe/object* files of the compiler. You won't find much use for this in your programs. The most useful functions appear to be toWord32 and toInt32, which can convert a byte array to a 32-bit integer, but there isn't enough functionality here to be useful for serialising values into a wire protocol. (See the section called Integers in Chapter 3 for serialising integers.)

You could use this structure to estimate the size of objects in memory. Here is my version of a function to estimate the size of a value, including pointed-to values. I've used O as an alias for Unsafe.Object.

structure O = Unsafe.Object

(*  Estimate the size of v in 32-bit words.
    Boxed objects have an extra descriptor word
    which also contains the length for vectors
    and arrays.
*)
fun sizeof v =
let
    fun obj_size obj =
    (
        case O.rep obj of
          O.Unboxed => 1    (* inline 31 bits *)
        | O.Real    => 1+2

        | O.Pair      => tup_size obj
        | O.Record    => tup_size obj
        | O.RealArray => tup_size obj

        | O.PolyArray => arr_size obj

        (* includes Word8Vector.vector
           and CharVector.vector
        *)
        | O.ByteVector => 1 +
            ((size(O.toString obj)+3) div 4)

        (* includes Word8Array.array
           and CharArray.array
        *)
        | O.ByteArray =>  1 +
            ((Array.length(O.toArray obj)+3) div 4)

        | _ => 2    (* punt for other objects *)
    )

    (*  Count the record plus the size of
        pointed-to objects in the heap.
    *)
    and tup_size obj =
    let
        fun sz obj =
            if O.boxed obj
            then
                1 + (obj_size obj)
            else
                1
    in
        Vector.foldl
            (fn (obj, s) => s + (sz obj))
            1
            (O.toTuple obj)
    end

    and arr_size obj =
    let
        fun sz obj =
            if O.boxed obj
            then
                1 + (obj_size obj)
            else
                1
    in
        Array.foldl
            (fn (obj, s) => s + (sz obj))
            1
            (O.toArray obj)
    end
in
    obj_size(O.toObject v)
end

This is a main function to try it out.

fun main(arg0, argv) =
let
    fun show name v = print(concat[
            "Size of ", name,
            " = ", Int.toString(sizeof v),
            " 32-bit words\n"])
in
    show "integer"  3;
    show "real"     3.3;
    show "string"   "abc";

    show "pair"     ("abc", 42);
    show "record"   {a = 1, b = 4.5, c = "fred"};

    OS.Process.success
end

See the section called Heap Object Layout in Chapter 7 for more information on object layout in the heap.

The C Interface

The runtime includes a collection of C functions that implement the low-level Basis operations such as those in the Posix structure. The SML code calls these C functions using the functions in the Unsafe.CInterface structure. The C functions must be specially written to take their arguments in the form of SML values, so this is not a general purpose interface to C functions. I mention it only so that you don't mistake it for one.

Later versions of SML/NJ will include a general purpose interface for calling any C function in a shared library which is loaded at run-time.

Miscellaneous Unsafe Operations

The Unsafe.blastRead and Unsafe.blastWrite functions are used to serialise/deserialise entire data structures for writing to files. The blastWrite function is expensive to run since it uses the garbage collector to traverse the data structure to locate all values reachable from the root value. You shouldn't call it often to serialise small data structures. Instead it is intended that you build up an entire data structure and then dump it into a file at exit time.
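The following is a sketch of the dump-at-exit pattern, not a definitive implementation. It assumes that blastWrite has the type 'a -> Word8Vector.vector and that blastRead has the type Word8Vector.vector -> 'a, which you should check against your version of the compiler. The names saveValue and loadTable and the example table type are my own.

```sml
(* Write any value to a file in blasted form.
   Assumes Unsafe.blastWrite : 'a -> Word8Vector.vector.
*)
fun saveValue (file, v) =
let
    val strm = BinIO.openOut file
in
    BinIO.output(strm, Unsafe.blastWrite v);
    BinIO.closeOut strm
end

(* Read a value back. The result type of blastRead is not
   checked, so the annotation must match what was written.
*)
fun loadTable file : (string * int) list =
let
    val strm  = BinIO.openIn file
    val bytes = BinIO.inputAll strm
in
    BinIO.closeIn strm;
    Unsafe.blastRead bytes
end
```

Nothing ties the type given to loadTable to the type of the value that was saved. Reading a file back at the wrong type is as dangerous as a bad cast, so it is worth keeping the save and load functions for a given file side by side.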

The Unsafe.cast function can be used to cast a value to any other type. This of course is very dangerous unless you know the underlying memory representation. Most cases where you might want to do this are already provided for. For example converting between bytes and characters is provided in the Byte structure.
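For comparison, here is a small example of the safe conversions in the Byte structure of the Basis library. These cover the byte and character case without any knowledge of the memory representation.

```sml
(* Single elements. *)
val b : Word8.word = Byte.charToByte #"A"
val c : char       = Byte.byteToChar 0w66

(* Whole strings and byte vectors. *)
val bytes : Word8Vector.vector = Byte.stringToBytes "abc"
val s     : string             = Byte.bytesToString bytes
```

Because the types are checked by the compiler, a mistake here is a compile-time error and not a corrupted heap.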

The other functions in Unsafe should not be used. Some are used by separate systems such as the Concurrent ML library which we will be using later.

The Unsafe.Poll structure is not normally accessible and isn't interesting to us.

Notes

[1]

There should be an Unsafe.Real64Vector but it isn't implemented yet.