
2. Overview of MLton

MLton is a whole-program optimizing SML compiler. Because it compiles the whole program at once, MLton often reduces or eliminates the run-time cost that separate compilation imposes on SML features such as functors, modules, polymorphism, and higher-order functions. With the entire program available, MLton performs many transformations, such as defunctorization, monomorphisation, inlining, unboxing, argument flattening, redundant argument removal, and representation selection. Whole-program compilation is an integral part of MLton's design and is not likely to change.

MLton compiles the full SML language and contains a mostly complete implementation of the Basis Library (section basis library). Any program that is valid according to The Definition of Standard ML but is not accepted by MLton indicates a bug (section bugs). At present, MLton should only be run on valid SML programs, because a good deal of front-end error checking is not done. If you feed MLton an invalid program, it may do any of the following: terminate and produce an executable, fail to terminate, or produce a terse error message. This does not mean that executables generated by MLton behave randomly; it only means that the compiler itself may behave strangely on invalid SML programs. For valid SML programs, MLton's behavior is well defined.

MLton generates standalone executables that often have very good performance (see the performance page). Whole-program compilation does have significant space and time requirements; despite this limitation, MLton is capable of compiling large SML programs. To date, MLton has compiled itself (45K lines) and the ML Kit (75K lines). The distributed version of MLton is self-hosting. Compiling MLton requires a machine with at least 256 MB of RAM.

MLton's runtime system uses a simple two-space stop-and-copy garbage collector, which works well for programs that allocate a lot of ephemeral data but less well for programs with large amounts of long-lived data. Even so, the runtime system does not impose undue overhead when compiling large programs such as the compiler itself. By default, the runtime system automatically resizes the heap and stack. Several command-line arguments also control heap resizing (section manual page).
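To illustrate why a two-space stop-and-copy collector favors ephemeral data, here is a minimal sketch of the general technique (Cheney's breadth-first copying algorithm). It is not MLton's runtime code: the `Pair`/`Value` object layout, the fixed `SEMI_SIZE`, the root array, and the functions `collect` and `alloc_pair` are all hypothetical. Collection touches only live objects, copying them from one semispace to the other and leaving dead objects behind at no cost, so its work grows with the amount of live data. Large long-lived structures are therefore copied again on every collection.

```c
#include <assert.h>

/* Minimal semispace (two-space) stop-and-copy collector, Cheney style.
   Hypothetical object model: every heap object is a two-field pair, and a
   pointer is an index into the current semispace. */

#define SEMI_SIZE 64   /* capacity of each semispace, in pairs (arbitrary) */
#define MAX_ROOTS 8

typedef struct {
    int is_ptr;        /* 1 if val is a heap index, 0 if it is an integer */
    long val;
} Value;

typedef struct {
    Value field[2];
    int forwarded;     /* set once this pair has been copied to to-space */
    long forward;      /* its index in to-space, valid only if forwarded */
} Pair;

static Pair space_a[SEMI_SIZE], space_b[SEMI_SIZE];
static Pair *from_space = space_a;  /* the program allocates here */
static Pair *to_space = space_b;    /* unused until the next collection */
static long alloc_next = 0;         /* bump-pointer allocation index */

static Value roots[MAX_ROOTS];      /* hypothetical root set (stack slots etc.) */
static int num_roots = 0;

/* Copy the object v points to into to-space (unless already copied) and
   return the updated pointer. Integers are returned unchanged. */
static Value forward_value(Value v, long *free_idx) {
    if (!v.is_ptr) return v;
    Pair *old = &from_space[v.val];
    if (!old->forwarded) {
        to_space[*free_idx] = *old;          /* copy both fields verbatim */
        to_space[*free_idx].forwarded = 0;
        old->forwarded = 1;                  /* leave a forwarding pointer */
        old->forward = (*free_idx)++;
    }
    Value nv = { 1, old->forward };
    return nv;
}

/* Copy everything reachable from the roots, then scan to-space left to
   right, forwarding each pointer field, until scan catches up with free.
   Unreachable pairs in from-space are never visited. */
static void collect(void) {
    long scan = 0, free_idx = 0;
    for (int i = 0; i < num_roots; i++)
        roots[i] = forward_value(roots[i], &free_idx);
    while (scan < free_idx) {
        for (int f = 0; f < 2; f++)
            to_space[scan].field[f] = forward_value(to_space[scan].field[f], &free_idx);
        scan++;
    }
    /* Flip: to-space becomes the space the program allocates into. */
    Pair *tmp = from_space;
    from_space = to_space;
    to_space = tmp;
    alloc_next = free_idx;
}

/* Allocate a pair, collecting first if from-space is full. Returns -1 if
   live data alone fills a semispace (a real runtime would grow the heap). */
static long alloc_pair(Value a, Value b) {
    if (alloc_next == SEMI_SIZE) {
        /* Root the arguments so the collector updates them if they move. */
        assert(num_roots + 2 <= MAX_ROOTS);
        roots[num_roots++] = a;
        roots[num_roots++] = b;
        collect();
        b = roots[--num_roots];
        a = roots[--num_roots];
        if (alloc_next == SEMI_SIZE) return -1;
    }
    Pair *p = &from_space[alloc_next];
    p->field[0] = a;
    p->field[1] = b;
    p->forwarded = 0;
    return alloc_next++;
}
```

For example, building a three-element list, storing it in `roots[0]`, and then allocating ten unreachable pairs leaves 13 pairs in from-space; after `collect()`, only the 3 reachable pairs have been copied and `alloc_next` is 3. Bump-pointer allocation into a freshly emptied semispace is also what makes short-lived allocation cheap in this design.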

MLton's implementation of arbitrary-precision arithmetic (the IntInf structure) uses the GNU Multiple Precision Arithmetic Library (GMP). Hence, for IntInf-intensive programs, MLton is often orders of magnitude faster than SML/NJ.
