Notes about the environment problem

TIES542 Principles of Programming Languages, Spring 2017
Antti-Juhani Kaijanaho

All programmers are probably aware of the basic stack-frame technique of implementing basic subroutine (function, method) calls. There is a stack (usually drawn as growing downward) where each time a subroutine call is made, new stack frame (in Finnish pinokehys) is pushed. The stack frame contains things like

the return address (where control should be returned after the call is finished), and
some local variables of the called function.

The environment problem (also known as the funarg problem) occurs when a language permits both defining subroutines inside subroutines (as lambda expressions or otherwise) and passing subroutines around as data (whether as arguments, as return values, or as the contents of a variable). Except for C, all major modern languages support this in one form or another.

For example, Java has always supported so called anonymous inner classes. These are subclasses declared directly in a new expression and have been used for decades to code simple callbacks (listeners) in AWT and Swing and related applications. Java 8 adds support for lambda expressions, which are a more compact way to express the most common use cases.

The simple version of the environment problem (called the downward funarg problem) concerns situations where a locally declared subroutine is passed as an argument to some subroutine, which (either itself or via some other subroutine it calls) then calls this argument subroutine before returning. A classic example is passing an ordering function, whose behavior depends on local variables of its enclosing subroutine, to a sort subroutine, like this (C-like pseudocode):

enum direction { FORWARD, BACKWARD };
void foo(enum direction dir)
{
    int compare(string a, string b) {
        switch (dir) {
        case FORWARD:
            return strcmp(a, b);
        case BACKWARD:
            return strcmp(b, a);
        }
    }
    ....
    string array[];
    ...
    sort(array, compare);
    ...
}

After foo calls sort, and sort calls its second parameter (here foo's compare), the stack looks like this:

Stack after sort has called foo's inner compare

The environment problem: how does compare know where the dir variable it needs is located? If foo is recursive, there may be multiple dir variables on the stack. Theoretically, sort may also possess a local variable called dir (and in some situations this is not at all a theoretical issue).

The standard solution is to pass compare to sort as an object containing two fields: one specifies the routine code to be called, and the other points out the stack frame of the foo activation that actually created that object. We call the latter the static link. This two-field object is nowadays commonly called a closure (in Finnish sulkeuma); the Moses paper called it FUNCTION.

Things become more complicated if the closure needs to remain live after the activation of the enclosing routine has returned (this is the upward funarg problem). This happens classically when functions return other functions and also when closures are saved in long-lived objects. The latter is quite common with GUI event callbacks: each callback gets saved in the GUI object to await the time a relevant event arrives. By that time, the original subroutine that created the closure has long since finished.

A simple example of the former (using again C-like pseudocode):

( int (string,string) ) make_comparator(enum direction dir)
{
    int return_value(string a, string b) {
        switch (dir) {
        case FORWARD:
            return strcmp(a, b);
        case BACKWARD:
            return strcmp(b, a);
        }
    }
    return return_value;
}


void foo(enum direction dir)
{
   ....
    string array[];
    ...
    sort(array, make_comparator(dir));
    ...
}```

Now, as you can see in the picture below, when sort is ready to call its second argument, the dir variable in the make_comparator stack frame has been long destoyed.

Stack during make_comparator call and after sort is called

The simple solution to this harder problem is to not use the standard stack at all but to allocate stack frames (now better called activation records, in Finnish aktivaatiotietue) in the garbage-collected heap. If each activation record contains a pointer (called a dynamic link) to the activation record of its caller, then logical stack structure is preserved but those activation records whose contents must survive the return of the subroutine are not prematurely destroyed. This is called a spaghetti stack.

More commonly, compilers analyze programs to determine which local variables escape, that is, which local variables must outlive the return of the subroutine. Only escaping variables are allocated on the garbage-collected heap, and non-escaping variables remain on the standard stack. Since most local variables do not escape, this means that only rarely do we have to allocate something on the garbage-collected heap upon function call.