IL extensions for generics

This document describes the changes to the Common Language Runtime to support generics. These changes are:

Compatible.
IL code from the V1 CLR will run unmodified in the generics extension.
Minimal.
The changes introduce as few new instructions as possible. They also change as few existing instructions as possible.
Verifiable.
The extensions are verifiable.
JIT-friendly.
It should be possible to compile IL to native code using a variety of strategies. It is assumed that sophisticated type tracking is not performed (as the Econo-JIT does only precise tracking of value types). It is also possible to compile most generic code at installation time.
Compiler-friendly.
Generation of IL from source languages with generics is straightforward. It should be possible to generate code more-or-less directly from the parse tree.

Overview of Changes

To explain the proposed changes it is helpful to categorize instructions according to their "generic behaviour".

Many of the existing instructions are generic in the sense that they work uniformly over any V1 type; hence they can also work over values of type T (where T is a formal type parameter). Others are reference-generic as they work over any reference type; hence they can also work over values of type T when T is constrained by a class or interface. Further instructions are value-generic as they work over any value type; these can be generalized to work over reference types too, and hence over values of type T.

Some instructions are parameterized by type signatures that might contain formal generic parameters, and others are parameterized by member signatures that might contain formal generic parameters and are used for binding purposes in addition to conveying type information. The remainder of the instructions are either (a) unverifiable, (b) not generic, or (c) control flow.

Instructions

Category

Changes required

ldloc, ldloca, stloc,
ldarg, ldarga, starg,
dup, pop, ret

Generic.

No change.

beq, bne.un,
brfalse, brtrue,
ldind.ref, stind.ref,
ldelem.ref, stelem.ref,
ldnull

Reference-generic.

New: generic instructions ldelem and stelem.

cpobj, initobj, ldobj, stobj, box, unbox

Value-generic.

Change: generalize to all types (except unbox), and bind generic parameters in token. New: generic instruction unbox.any.

ldtoken,
newarr, ldelema,
castclass, isinst,
mkrefany, refanyval

Parameterized by type signature.

Bind generic parameters in the signature.

ldfld, ldflda, stfld,
ldsfld, ldsflda, stsfld,
call, callvirt, newobj,
jmp, ldftn, ldvirtftn

Parameterized by member signature.

When used on generic class instantiations or to access per-instantiation static fields, the member-reference must include an instantiation TypeSpec. When used on generic methods, it must take a method-instantiation token instead.

We now discuss each of the categories above in more detail.

Generic instructions

The generic instructions listed in the table operate on values of any type. They are extended to operate on values of type T (for some generic parameter T), and on instantiations of generic type definitions (including generic value types).

Reference-generic instructions

The reference-generic instructions listed in the table operate on or produce values of any reference type. They are extended to operate on values of type T (for some generic parameter T), and on generic class and interface instantiations (but not generic value types).

Value-generic instructions

The instructions listed in the third row of the table (cpobj, initobj, ldobj, stobj, box, unbox) currently operate on value types only. All but the last can be generalized to operate on reference types also. We introduce a new instruction unbox.any which generalizes unbox to work on reference types and value types.

Type tokens

Some instructions are parameterized by type tokens (typedefs, typerefs or typespecs). In some cases the types are redundant (because the type of the object on the stack can be determined statically anyway) but aid non-type-tracking JITs. Instructions in this subcategory include box and unbox. Other instructions require the type token for instance testing (castclass, isinst, refanyval, ldelema) or instance creation (newarr, mkrefany). Finally ldtoken determines the run-time type handle that corresponds to a metadata type token.

These instructions are extended to deal with the two new forms of types: generic parameters and generic type instantiations.

Member signatures

The instructions ldfld, ldflda, stfld, ldsfld, ldsflda, stsfld, call, callvirt, newobj, ldftn, ldvirtftn and jmp are parameterized by member signature tokens. In order to use these instructions with generic type instantiations, the signature must be a member reference (not definition) in which the generic type instantiation is expressed as a TypeSpec. (Compare the use of pseudo methods for arrays where a TypeSpec is used to describe the particular array type).

In contrast to array signatures, though, the signature itself (argument and result types) are the formal signature of the field or method and hence may include generic parameters.

New instructions

ldelem - load an element of an array

Format

Assembly Format

Description

A3 <T>

ldelem type

Load the element at index onto the top of the stack

Stack Transition:

..., array, index

..., value

Description:

The ldelem instruction loads the value of the element with index index (of type U) in the zero-based one-dimensional array array and places it on the top of the stack. The type of the return value is indicated by the type token type in the instruction.

If type refers to a typespec that includes generic parameters then these are first bound to the actual generic arguments of the enclosing generic type definition and/or generic method.

In other respects it has the same behaviour as the ldelem.* instruction corresponding to the generic type instantiation.

The ldelem instruction has the same behavior as the ldelema instruction followed by the (generalized) ldobj instruction, but can be implemented more efficiently.

Exceptions:

NullReferenceException is thrown if array is null.

IndexOutOfRangeException is thrown if index is larger than the bound of array.

ArrayTypeMismatchException is thrown if array doesn't hold elements of the required type.

Verifiability:

Correct IL code requires that array is either null or an array.

stelem - store an element of an array

Format

Assembly Format

Description

A4 <T>

stelem type

Replace array element at index with the value on the stack

Stack Transition:

..., array, index, value

...,

Description:

The stelem instruction replaces the value of the element with zero-based index index (of type U) in the one-dimensional array array with value. Arrays are objects and hence represented by a value of type O. The value has the type specified by the token type in the instruction.

If type refers to a typespec that includes generic parameters then these are first bound to the actual generic arguments of the enclosing generic type definition and/or generic method.

In other respects it has the same behaviour as the stelem.* instruction corresponding to the generic type instantiation.

The stelem instruction has the same behaviour as the ldelema instruction followed by the (generalized) stobj instruction, but can be implemented more efficiently.

Exceptions:

NullReferenceException is thrown if array is null.

IndexOutOfRangeException is thrown if index is larger than the bound of array.

ArrayTypeMismatchException is thrown if array doesn't hold elements of the required type.

Verifiability:

Correct IL requires that array be a zero-based, one-dimensional array.

unbox.any - convert boxed type to raw form

Format

Assembly Format

Description

A5 <T>

unbox.any type

Extract the value type data from obj, its boxed representation

Stack Transition:

..., obj

..., value

Description:

Type is a metadata token (a typeref, typedef, or typespec) indicating the type of value. If type refers to a typespec that includes generic parameters then these are first bound to the actual generic arguments of the enclosing generic type definition and/or generic method.

When applied to the boxed form of a value type, the unbox.any instruction extracts the value contained within obj and is therefore equivalent to unbox followed by ldobj.

When applied to a reference type, the unbox.any instruction has the same effect as castclass type.

Exceptions:

InvalidCastException is thrown if obj is not a (boxed) type.

NullReferenceException is thrown if obj is null.

TypeLoadException is thrown if type cannot be found. This is typically detected when IL is converted to native code rather than at runtime.

Changes to Existing Instructions

box, unbox

Format:

box type
unbox type

Description of change:

If the type signature type contains generic parameters then these are bound to the corresponding actual generic arguments to the enclosing generic method and/or generic type definition.

The box instruction is generalized to work over reference types, in which case it behaves as a no-op. Verifiable IL requires that the argument of the box instruction has type O.

call, callvirt

Format:

call method
callvirt method

Description of change:

In order to invoke a method on an object of a generic type instantiation, method must be a member reference whose parent is a TypeSpec that specifies the generic type instantiation of the object.
[Rationale: whilst this instantiation information is not strictly necessary, as it is always possible to determine the instantiated type of the object on the stack, not all JITs do sufficiently precise type-tracking.]

When method refers to a generic method with n generic parameters, method must be a method-instantiation token that specifies the method and its generic parameters.

Verifiability changes:

The binding of generic parameters must satisfy the constraints (if any) specified in the generic type definition or generic method.

Examples:

call instance int32
class MyStack<int32>::Length()
callvirt !0 class MyStack<class System.String>::Pop()
call void ArrayUtils::Reverse<int64>(!0[])
call void class FastArray<int32>::PairWith<int64>(!0)

newobj

Format:

newobj method

Description of change:

In order to create an object of a generic type instantiation, method must be the member reference of a constructor whose parent is a TypeSpec that specifies the generic type instantiation.

Verifiability changes:

The instantiation must satisfy the constraints (if any) specified in the generic type definition.

Examples:

newobj instance void class MyStack<int32>::.ctor()

newarr

Format:

newarr type

Description of change:

If type refers to a typespec that includes generic parameters then these are first bound to the corresponding actual generic arguments of the enclosing generic type definition and/or generic method. This may involve run-time lookup.

ldelema

Format:

ldelema type

Description of change:

If type refers to a typespec that includes generic parameters then these are first bound to the corresponding actual generic arguments of the enclosing generic type definition and/or generic method. This may involve run-time lookup.

castclass, isinst

Format:

castclass class
isinst class

Description of change:

If the type signature class contains generic parameters then these are bound to the corresponding actual generic arguments of the enclosing generic method and/or generic type definition. This may involve run-time lookup.

cpobj, initobj, ldobj, stobj

Format:

cpobj classTok

initobj classTok

ldobj classTok

ldobj classTok

Description of change:

classTok is permitted to refer to a class or interface type, in which case:

If classTok refers to a typespec that includes generic parameters then these are first bound to the corresponding actual generic arguments of the enclosing generic type definition and/or generic method.

ldfld, ldflda, stfld

Format:

ldfld field
ldflda field
stfld field

Description of change:

When field refers to an instance field in a generic type definition, it must be a member reference whose parent is a TypeSpec that specifies the generic type instantiation.

Example:

ldfld !0 value class MyPair<int32,class System.String>::fst

ldsfld, ldsflda, stsfld

Format:

ldsfld field
ldsflda field
stsfld field

Description of change:

When field refers to a per-instantiation static field in a generic type definition, it must be a member reference whose parent is a TypeSpec that specifies the generic type instantiation.

Assembler grammar

Changes are marked in red.

<decl> ::=
...
| .class <classAttr>* <id>
[ < <formalgenpars> > ] // formal generic parameters

[extends <typeSpec>] // can be a type instantiation
[implements <typeSpec> [, < typeSpec >]*]

{ <classDecl>* }

<type> ::=
...
| ! <int32> // class generic parameter accessed by number
| !! <int32> // method generic parameter accessed by number
| <type>
< <genpars> > // generic type instantiation

<methodHead> ::=
<methAttr>* <callConv> <type> <methodName> [ < <formalgenpars> >]
( <signature> ) <implAttr>*

<genpars> ::= <type> [, <genpars>]*
<formalgenpars> ::= <formalgenpar> [
, <formalgenpars>]*
<formalgenpar> ::= <constraints> <id>
<constraints> ::=

<type>
| ( <genpars> )

<instr_type> ::=
... | ldelem | stelem | unbox.any

Examples

Stack class

// Class takes a single parameter of ref or value type
.class public Stack<any T>
{
  // !0 refers to the first (and only) generic parameter
  .field private !0[] items
  .field private int32 nitems

  .method public instance void .ctor()
  {
    .maxstack 10
    ldarg.0
    call instance void System.Object::.ctor()
    ldarg.0
    ldc.i4 0
    stfld int32 class Stack<!0>::nitems
    ldarg.0
    ldc.i4 20
    call instance void !0[]::.ctor(int32)
    stfld !0[] class Stack<!0>::items
    ret
  }

  // Signature for method has generic parameter as result type
  .method public !0 Pop()
  {
    .maxstack 10
    .locals(int32)
    ldarg.0
    ldfld int32 Stack::nitems
    dup
    brfalse EmptyStack
    ldc.i4 1
    sub
    stloc.0
    ldarg.0
    ldloc.0
    stfld int32 class Stack<!0>::nitems
    ldarg.0
    ldfld !0[] class Stack<!0>::items
    ldloc.0
    ldelem !0
    ret
  EmptyStack:
    pop
    newobj instance void Empty::.ctor()
    throw
  }

  .method public void Push(!0)
  {
    .maxstack 10
    .locals(!0[])    // Locals can be generic
    ldarg.0
    ldfld !0[] class Stack<!0>::items
    callvirt int32 System.Array::GetLength()
    ldarg.0
    ldfld class Stack<!0>::nitems
    bne.un NoExpand
    // code for array expansion
    ldarg.0
    ldfld !0[] class Stack<!0>::items
    stloc.0
    ldarg.0
    ldfld int32 class Stack<!0>::nitems
    ldc.i4 2
    mul
    call instance void !0[]::.ctor(int32)
    ldarg.0
    stfld !0[] class Stack<!0>::items
    ldloc.0
    ldarg.0
    ldfld !0[] class Stack<!0>::items
    ldarg.0
    ldfld class Stack<!0>::nitems
    call void System.Array::Copy(class System.Array, 
      class System.Array, int32)        
  NoExpand:
    ldarg.0
    ldfld !0[] class Stack<!0>::items
    ldarg.0
    ldfld int32 class Stack<!0>::nitems
    ldarg.1
    stelem !0
    ldarg.0
    ldarg.0
    ldfld int32 class Stack<!0>::nitems
    ldc.i4 1
    add
    stfld int32 class Stack<!0>::nitems
    ret
  }

  .method public static void main() il managed implemented
  {
    .entrypoint
    .maxstack  10

    // Locals can be generic type instantiations
    .locals(class Stack<class System.String>, 
    class Stack<int64>)

    // Create a Stack<String> with two elements
    newobj instance void class Stack<class System.String>::.ctor()
    stloc.0
    ldloc.0    
    ldstr "Rock!"
    // meth sig includes generic parameters
    callvirt void class Stack<class System.String>::Push(!0)
    ldloc.0    
    ldstr "Generics"
    callvirt void class Stack<class System.String>::Push(!0)

    // Pop the elements and print them
    ldloc.0
    // notice generic parameters again
    callvirt !0 class Stack<class System.String>::Pop()
    call void System.Console::WriteLine(class System.String)
    ldloc.0
    callvirt !0 class Stack<class System.String>::Pop()
    call void System.Console::WriteLine(class System.String)

    // Create a Stack<int64> with one element
    newobj instance void class Stack<int64>::.ctor()
    stloc.1
    ldloc.1    
    ldc.i8 123456789
    callvirt void class Stack<int64>::Push(!0)
    ...
    ret
  }
}

Generic method

  // static void Reverse<T>(T[] arr, int index, int length)
.method public static
void Reverse<any>(!!0[] arr,int32 index,int32 length)
  {
    .maxstack 10
    .locals(
      int32 i,          // int i;
      int32 j,          // int j;
      !!0 temp)         // T temp;
    ldarg index         // i = index;
    stloc i
    ldarg index
    ldarg length        // j = index + length - 1;
    add
    ldc.i4 1
    sub
    stloc j
  Loop:
    ldloc i             // while (i < j) {
    ldloc j
    bge Finish
    ldarg arr           // temp = arr[i];
    ldloc i
    ldelem !!0
    stloc temp
    ldarg arr           // arr[i] = arr[j]; 
    ldloc i
    ldarg arr
    ldloc j
    ldelem !!0
    stelem !!0
    ldarg arr           // arr[j] = temp;
    ldloc j
    ldloc temp
    stelem !!0
    ldloc i             // i++;
    ldc.i4 1
    add
    stloc i
    ldloc j             // j--;
    ldc.i4 1
    sub
    stloc j
    br Loop             // }
  Finish:
    ret
  }