Generics in the Common Type System

This document describes the changes to the type system of the CLR to support generics (generic type definitions and generic methods). This design has the following aims:

Orthogonality with respect to the existing CLR.: Where possible, generic type instantiations may occur in any context where existing CLR types may occur.
Language independence.: No assumptions about the source language have been made. We wish to support the existing generics-like features of as many languages as possible. We would also like to ensure that the design permits clean extensions of languages currently lacking generics; in particular, the Microsoft family consisting of VB, VC++ and C#.
Implementation independence.: The specification is independent of any particular implementation: a runtime compiler should be free to expand a generic type instantiation a la C++ templates, or to specialize representations and code on a case-by-case basis, or to share all representations and code (perhaps boxing and unboxing values to achieve this).
Implementation efficiency.: The design permits efficient implementation. Performance of generics should be no worse than the current use of Object to simulate generics; a good implementation should do much better, avoiding casts on reference type instantiations and producing specialized code for value type instantiations.
Statically checked at point of definition, not use.: A generic type definition can be validated and verified independently of its uses (contrast with C++ templates).
Uniform behaviour with respect to type parameters.: Our interpretation of generics or parametric polymorphism is that behaviour of generic types and generic methods should be "the same" at all type instantiations (again contrast C++ templates). This principle can be broken by code that switches on run-time instantiation information through casting and instance-of tests.

For concreteness, code fragments are written in a "C# with generics" pseudo-code.

Specification

Generic Type Definitions

A generic type definition is a class, interface, or value type definition with formal generic parameters. These generic parameters represent types. Each generic parameter has a name, and optionally constraints that must be interface or class types. The generic parameter is in scope in the declarations of:

the generic parameter constraints, the base class that the class extends, and the interfaces that it implements;
all instance members (fields, methods, constructors, properties and events)

Some examples are shown in Section 3.

The following restrictions are enforced:

"Naked" generic parameters (that is, the generic parameter itself, not nested inside some other constructed type) cannot be used to specify the superclass, interfaces, or constraints.

Rationale: the inheritance hierarchy must be determinable at the point of definition, not use.
When instantiations of generic type definitions are ignored, there can be no cycles in the inheritance/interface hierarchy. To be precise, define a graph whose nodes are possibly-parameterized (but uninstantiated) classes and interfaces and whose edges are the following:
1. If a (possibly-parameterized) class or interface C extends or implements a class or interface D, then add an edge from C to D.
2. If a (possibly-parameterized) class or interface C extends or implements an instantiated class or interface D<type-1, ..., type-n> then add an edge from C to D.
The graph is legal if it contains no cycles.

Rationale: when also considering the instantiations of generic type definitions, we wish to avoid cycles and infinite ascending chains in the inheritance hierarchy, whilst permitting definitions such as CloneableArrayList (see Section 2.13) in which there is a benign recursive reference to the class being defined. Here are some example definitions that we wish to outlaw:
Infinite chain:
```
class C<T> : C<C<T>>
```
Cycle of period 1:
```
class C<T> : C<String>
```
Cycle of period 2:
```
class C<T> : D<String>
class D<T> : C<String>
```
Infinite chain through mutual recursion:
```
class C<T> : D<C<T>>
class D<T> : C<D<T>>
```

Type Syntax

The grammar of types in the CLR is extended in the following ways.

Given a generic class, interface or value type definition, the generic parameters can be bound to actual generic arguments (i.e. types) to produce a generic type instantiation. The number of arguments must match the number of generic parameters specified in the generic type definition, and the constraints on the generic parameters must be observed. Examples:

Queue<String>
Pair<int,Object>
Pair<int,Pair<int,String>>
IEnumerable<float>
ICloneable<Queue<String>>
Vec3<double>

Generic parameters can be used as types whenever they are in scope, subject to the restrictions described above.

Type Equivalence

Equivalence on generic class and value type instantiations is by structure. That is, two generic type instantiations are equivalent if and only if they are instantiations of the same generic type definition and their generic arguments are equivalent. This applies to static type signatures and to run-time type representations (type handles).

Rationale: orthogonality, also compare arrays.

Run-time types and casting

Generic type definitions

A consequence of orthogonality with the existing CLR is that objects of generic type instantiations carry sufficient information to recover the actual arguments at run-time (the exact type). This manifests itself through casting and instance-of testing, as well as in reflection capabilities of course (Object.GetType).

The CLR rules for subtyping are extended as follows. Suppose that generic class or interface C has the header (constraints are optional)

C<tyvar-1, ..., tyvar-n> :
class-type, interface-type-1, ..., interface-type-m
where constraints

Then C<type-1,...,type-n> is a subtype of

class-type[substitute type-1 for tyvar-1,...,type-n for tyvar-n];

interface-type-j[substitute type-1 for tyvar-1,...,type-n for tyvar-n] for each j.

A consequence is that user-defined generic type definitions are not covariant in their type parameters, in contrast to the built-in array types and the generic types of the Eiffel programming language. For example, Queue<String> is not a subtype of Queue<Object>.

Rationale: this would lead to a type loophole without complex restrictions.

Accessibility of Members

A member of an instantiation of a generic type definition C is accessible if and only if the member is accessible in C itself.

Signatures and Binding

Instance members (fields and methods) of a generic type instantiation are referred to by member-ref, which consists of two parts:

The parent of the member, in this case an instantiation of the generic type definition. For example: IComparer<String>.
The name and formal signature of the member. For example: int Compare(T,T).

It is possible for distinct members to have identical types when instantiated, but which can be distinguished by member-ref. For example:

class C<S,T>
{
  String f;
  S f;
  void m(S x) {...}
  void m(T x) {...}
  void m(String x) {...}
}

The generic class instantiation C<String,String> now has three methods called m, all with the same type, and two fields called f with the same type. They are easily distinguished through the member-ref encoding described above:

String C<String,String>::f
S C<String,String>::f
void C<String,String>::m(S)
void C<String,String>::m(T)
void C<String,String>::m(String)

How a source language might resolve this kind of overloading is another matter, of course.

Rationale:

For interop purposes it might be useful to define one method with Object argument type and another at type parameter T.

To outlaw such ambiguous classes could involve a relatively expensive check in the class loader.

Inheritance and Overriding

There are two questions to answer here.

Given a method definition, which (if any) method in a superclass or interface does it override or implement? More concretely: which vtable slot (possibly new to this type) does it occupy?
In the absence of such a definition, which method implementations are inherited from the superclass?

The rules that determine the answers are a straightforward generalization of the existing rules, which are based around the method signature: the name and types of arguments and results. However, we must now take into account the instantiations of the superclass and implemented interfaces. This is illustrated in the following C# example:

class C<S,T>
{
  public virtual T m1() { ... } 
  public virtual S m2(T x) { ... } 
} 

class D<R> : C<int,R[]>
{
  override public R[] m1() { ... }
  // inherits int m2(R[])
}

class E<X> : D< List<X> >
{
  // inherits List<X>[] m1()
  override public int m2(List<X>[] x) { ... }
}

A single method body can serve as the implementation of multiple interfaces, just as in the existing runtime:

interface I<T>
{
  void m(T x);
} 

interface J
{
  void m(int x);
}

class C : I<int>, J
{
  void m(int x) { ... }
}

MethodImpls can be used to provide distinct implementations. In C#:

class C : I<int>,
J
{
  void I<int>.m(int x) { ... }
  void J.m(int x) { ... }
}

Restrictions

A class must not (directly or indirectly through interface inheritance) implement the same interface at more than one instantiation. For example, the class definitions below are illegal:

interface I<T> {
void m(T x); void n(); }
class C : I<int>, I<string>
class D<T> : I<T>, I<double>

Rationale:

The possibility of multiple instantiations introduces new ambiguities for which there's no obvious natural resolution. (e.g. in the above definition which method implementation does D<double>::n() refer to?).
It's tricky to implement without incurring overhead on every interface invoke.

Generic Methods

A generic method is a method for which formal generic parameters are declared. As with generic type definitions, each generic parameter has a name, and optionally constraints that must be interface or class types. See Section 2.14 for some examples.

The generic parameters are in scope in the types of the arguments and result (despite the appearance of the generic parameters to the right of the result type) and in the body of the method.

Generic instance (virtual and non-virtual) methods can occur inside generic type definitions, in which case the generic parameters of both the class and the method are in scope in the method.

Examples

Generic value type

public struct Pair<A, B> 
{ 
  A fst;
  B snd;

  Pair(A a, B b)
  { this.fst = a; this.snd = b; }

  public A GetFst() { return fst; }
  public B GetSnd() { return snd; }
}

Generic class

public class Stack<T>
{
  T[] items; 
  int nitems;
  const int initialsize = 20;

  public Stack() 
  { 
    nitems = 0; 
    items = new T[initialsize]; 
  }

  void Push(T item) 
  {
    if (items.Length == nitems)
    { T[] temp = items;
      items = new T[nitems*2];
      Array.Copy(temp, items, nitems); 
    }
    items[nitems++] = item;
  }

  T Pop() 
  { 
    if (nitems == 0) throw new EmptyStack(); 
    else return items[--nitems]; 
  }

  bool IsEmpty() { return nitems==0; }
}

Generic interfaces

public interface IEnumerator<T> 
{ 
  bool MoveNext();
  T GetCurrent();
  void Reset();
}

public interface Ienumerable<T>
{
  Ienumerator<T> GetEnumerator();
}

public interface Icloneable<T>
    where T: Object   // notice constraint
{
  T Clone();
}

public interface Icomparable<T>
{
  int Compare(T x, T y);
}

Subclassing and implementing

public class ArrayEnumerator<T> : Ienumerator<T> {...}

public class Vowels : Ienumerator<String> {...}

public class Primes : Ienumerator<int> {...}

public abstract class PrintableList<T : Iprintable> :
List<T>, Iprintable {...}

// Generic parameters in constraints (aka "F-bounded" polymorphism)
// Also notice recursive reference to class in interface
public abstract class CloneableArrayList<T : Icloneable<T>>
: ArrayList<T>, Icloneable<CloneableArrayList<T>> {...}

Generic Methods

public class ArrayUtils
{
  static void Reverse<T>(T[] arr, int index, int length)
  {
    int i = index;
    int j = index + length - 1; 
    T temp;

    while (i < j) 
    { temp = arr[i];
      arr[i] = arr[j];
      arr[j] = temp;
      i++; j--; }
  }

  static T[] Slice<T>(T[] arr, int index, int length)
  {
    T[] copy = new T[length];
    int j;
    for (j = 0; j < length; j++)
      copy[j] = arr[index+j];
    return copy;
  }
  static void Sort<T>(T[] arr, Icomparer<T> comparer) {...}
}

// Generic instance method inside a generic class definition
class FastArray<T>
{
  ...methods from earlier...
  void PairWith<U>(FastArray<U> that)
  {
    int n = this.Length;
    FastArray<Pair<T,U>> result = new FastArray<Pair<T,U>>(n);
    for (int j = 0; j < n; j++)
      result[j] = new Pair<T,U>(this[j], that[j]);
    return result;
  }

  void PairWith<U>(U that)
  {
    int n = this.Length;
    FastArray<Pair<T,U>> result = new FastArray<Pair<T,U>>(n);
    for (int j = 0; j < n; j++)
      result[j] = new Pair<T,U>(this[j], that);
    return result;
  }
}
// Example of PairWith in action, assuming arr1 : FastArray<int>
FastArray<Pair<int,String>> arr2 = arr1.PairWith<String>("foo");