4.2. WRITING EFFICIENT CODE 39
4.2.2 Minimizing consing
We have already considered the use of stack space. But various constructs in func-
tional programs use another kind of store, usually allocated from an area called the
heap. Whereas the stack grows and shrinks in a sequential manner based on the
flow of control between functions, other storage used by the CAML system cannot
be reclaimed in such a simple way. Instead, the runtime system occasionally needs
to check which bits of allocated memory aren’t being used any more, and reclaim
them for future use, a process known as garbage collection. A particularly important
example is the space used by constructors for recursive types, e.g. ::. For example,
when the following fragment is executed:
let l = 1::[] in tl l;;
a new block of memory, called a ‘cons cell’, is allocated to store the instance of
the :: constructor. Typically this might be three words of storage, one being an
identifier for the constructor, and the other two being pointers to the head and
tail of the list. Now in general, it is difficult to decide when this memory can be
reclaimed. In the above example, we immediately select the tail of the list, so it
is clear that the cons cell can be recycled immediately. But in general this can’t
be decided by looking at the program, since l might be passed to various functions
that may or may not just look at the components of the list. Instead, one needs
to analyze the memory usage dynamically and perform garbage collection of what
is no longer needed. Otherwise one would eventually run out of storage even when
only a small amount is ever needed simultaneously.
Implementors of functional languages work hard on making garbage collection
efficient. Some claim that automatic memory allocation and garbage collection
often works out faster than typical uses of explicit memory allocation in languages
like C (malloc etc.) While we wouldn’t go that far, it is certainly very convenient
that memory allocation is always done automatically. It avoids a lot of tedious and
notoriously error-prone parts of programming.
Many constructs beloved of functional programmers use storage that needs to
be reclaimed by garbage collection. While worrying too much about this would
cripple the style of functional programs, there are some simple measures that can
be taken to avoid gratuitous consing (creation of cons cells). One very simple rule
of thumb is to avoid using append if possible. As can be seen by considering the
way the recursive calls unroll according to the definition
#let rec append l1 l2 =
match l1 with
[] -> l2
| (h::t) -> h::(append t l2);;
this typically generates n cons cells where n is the length of the first argument
list. There are often ways of avoiding appending, such as adding extra accumulator
arguments to functions that can be augmented by direct use of consing. A striking
example is the list reversal function, which we coded earlier as:
#let rec rev =
fun [] -> []
| (h::t) -> append (rev t) [h];;
This typically generates about n
2
/2 cons cells, where n is the length of the list.
The following alternative, using an accumulator, only generates n of them: