Meeting 14 lecture notes

TIES542 Principles of Programming Languages, Spring 2017
Antti-Juhani Kaijanaho

Basics of subtyping

The essence of subtyping is the principle of substitutibility or the principle of subsumption: if type \(T\) is a subtype of type \(U\) (written sometimes \(T \leq U\), sometimes \(T \mathbin{<:} U\)), then any term of type \(T\) can be used where a term of type \(U\) is expected. Formally:

\[\begin{gathered} \Gamma \vdash t : T \qquad T \leq U \\ \hline \Gamma \vdash t : U \end{gathered}\]

From this principle, we can derive a number of sutyping rules. For example:

Consider the case of simple records containing named fields. Let us say that \(T = \{ a : \mathbf{Num}, b : \mathbf{String} \}\) and \(U = \{ a : \mathbf{Num}, b : \mathbf{String}, c : \mathbf{Num} \}\). Can we justify some sort of subtype relation between them? Suppose that \(o : T\); then we may (type-correctly) write \(o.a\) and \(o.b\). Similarly, if \(p : U\), we may (type-correctly) write \(p.a\), \(p.b\), and \(p.c\). Clearly, \(p\) may be used wherever \(o\) is expected (since \(p\) has all the operations expected of \(o\). More generally, adding fields to a record creates a subtype:

\[\begin{gathered}\hline \{ \ell_1 : T_1, \dots, \ell_m : T_m, \dots, \ell_n : T_n \} \leq \{ \ell_1 : T_1, \dots, \ell_m : T_m \} \end{gathered}\]
Continue considering the case of records. What happens if we replace a field's type with a subtype? Say \(T = \{ a : U \}\) and \(T' = \{ a : U' \}\), with \(U' \leq T'\)? Assume no modification of a field is allowed. This means that the only relevant operation is \(o.a\); if \(o\) is expected to be of type \(T\), then \(o.a\) is expected of type \(U\), but if \(o\) is in fact of type \(T'\), then \(o.a\) is in fact of type \(U'\). It seems \(T'\) may be used whenever \(T\) is expected. Thus:

\[\begin{gathered} T_m' \leq T_m\\ \hline \{ \ell_1 : T_1, \dots, \ell_m : T_m', \dots, \ell_n : T_n \} \leq \{ \ell_1 : T_1, \dots, \ell_m : T_m, \dots, \ell_n : T_n \} \end{gathered}\]
If modification of a field value is allowed, then we must also consider assignment to a field. Assume \(T = \{ a : U \}\) and \(T' = \{ a : U' \}\), with \(U' \leq T'\), as above. Now, if \(x : U\) and \(o : T\), the assignment \(o.a \mathbin{:=} x\) is obviously type-correct. But if \(o : T'\), the assignment \(o.a \mathbin{:=} x\) must be ill-typed (trying to use the supertype \(U\) in place of its subtype \(U'\)). But of \(o : T\) and \(x : U'\), all is well again. Now, combining this result with the previous one, we arrive at the following subtyping rule when field assigment is allowed:

\[\begin{gathered} T_m' \leq T_m\qquad T_m \leq T_m'\\ \hline \{ \ell_1 : T_1, \dots, \ell_m : T_m', \dots, \ell_n : T_n \} \leq \{ \ell_1 : T_1, \dots, \ell_m : T_m, \dots, \ell_n : T_n \} \end{gathered}\]

The case of a function type is more difficult. There are two places where subtyping may affect things:

Let us vary the parameter type. Suppose \(f : T \to U\) and \(f' : T' \to U\), and \(x : T\). Now \(f(x)\) is obviously ok, but what about \(f'(x)\)? Here we are trying to use a \(T\) as if it were \(T'\). At the same time we are trying to use \(T' \to U\) as if it were \(T \to U\).
Let us vary the result type. Suppose \(f : T \to U\) and \(f' : T \to U'\), and \(x : T\). Consider another function \(g : U \to W\). Clearly \(g(f(x))\) is well typed; but what about \(g(f'(x))\)? Here we are trying to use an \(U'\) as if it were \(U\), and a \(T \to U'\) as if it were \(T \to U\).

In summary, we arrive at the following subtyping rule:

\[\begin{gathered} U_1 \leq T_1 \qquad T_2 \leq U_2 \\ \hline T_1 \to T_2 \leq U_1 \to U_2 \end{gathered}\]

Notice how the parameter types invert while the result types do not. We commonly say that the result type is covariant (varies in the same direction) and the parameter type is contravariant (varies in the oppopsite direction). The contravariance of parameter types is what makes subtyping theory difficult.

(A record is said to be covariant in its field types, if field modification is not allowed, and invariant in its field types, otherwise.)

The problem of binary methods

Both of the preliminary reading papers deal with the problem of using the self-type of an object as the type of a method parameter. The canonical example (discussed by Cook et al. on page 128) is of an equality predicate. In Java, we define object equality by overriding the method Object.equals(Object), for example like this:

class A {
    int i;
    public boolean equals(Object o) {
        if (!(o instanceof A)) return false;
        return i == ((A)o).i;
    }
}

This makes some sense if we must be able to compare A objects for equality to any other object, A or otherwise. But there are problems: for example, which equals method is used depends on which object is put in the this position. Secondly, it sometimes makes no sense to compare objects from different parts of the inheritance hierarchy. I would thus like to be able to write something like this:

class A {
    int i;
    public boolean equals(A oa) {
        return i == oa.i;
    }
}

In Java as it stands today, this is brittle, because this does not override Object.equals(Object) and thus which equals method depends also on the declared type of the argument expression! But even dismissing this problem, it is not quite good yet. Consider:

class A {
    int i;
    public boolean equals(A oa) {
        return i == oa.i;
    }
}
class B extends A {
    String s;
    public boolean equals(B ob) {
        return i == ob.i && s.equals(ob.s);
    }
}

Suppose now that we have A a1 = new A(), B b1 = new B(), B b2 = new B(), and A a2 = b2. Because of late binding, b2.equals(a1) and a2.equals(a1) both behave identically, but b1.equals(b2) and b1.equals(a2) do not, even though the parameter objects are exactly the same. This is not satisfactory.

What is needed here is the ability to specify that the parameter type must be exactly the same as the type of the implicit this parameter. There have been various proposals in the literature on how this can be accomplished (Ryu offers one of the most recent ones); they vary in detail but they all share the basic idea of a new magic type name, called This (or something similar). With it, we can write

class A {
    int i;
    public boolean equals(This oa) {
        return i == oa.i;
    }
}
class B extends A {
    String s;
    public boolean equals(This ob) {
        return i == ob.i && s.equals(ob.s);
    }
}

meaning that A can be compared with A and that B can be compared with B but that they may not be mixed, nor can anything else be compared with A or B. But this, even though it is highly desirable, breaks subtyping:

Suppose now that we have A a1 = new A(), A a2 = new B(), and A a3 = new A(). Now, a1.equals(a3) works but a2.equals(a3) does not; thus, B cannot be used everywhere A can, and thus B is not a subtype of A even though it is inherited from it.