This document describes the changes to the Common Language Runtime to support generics. These changes are:
To explain the proposed changes it is helpful to categorize instructions according to their "generic behaviour".
Many of the existing instructions are generic in the sense that they work uniformly over any V1 type; hence they can also work over values of type T (where T is a formal type parameter). Others are reference-generic as they work over any reference type; hence they can also work over values of type T when T is constrained by a class or interface. Further instructions are value-generic as they work over any value type; these can be generalized to work over reference types too, and hence over values of type T.
Some instructions are parameterized by type signatures that might contain formal generic parameters, and others are parameterized by member signatures that might contain formal generic parameters and are used for binding purposes in addition to conveying type information. The remainder of the instructions are either (a) unverifiable, (b) not generic, or (c) control flow.
Instructions |
Category |
Changes required |
ldloc, ldloca, stloc, |
Generic. |
No change. |
beq, bne.un, |
Reference-generic. |
New: generic instructions ldelem and stelem. |
cpobj, initobj, ldobj, stobj, box, unbox |
Value-generic. |
Change: generalize to all types (except unbox), and bind generic parameters in token. New: generic instruction unbox.any. |
ldtoken, |
Parameterized by type signature. |
Bind generic parameters in the signature. |
ldfld, ldflda, stfld, |
Parameterized by member signature. |
When used on generic class instantiations or to access per-instantiation static fields, the member-reference must include an instantiation TypeSpec. When used on generic methods, it must take a method-instantiation token instead. |
We now discuss each of the categories above in more detail.
The generic instructions listed in the table operate on values of any type. They are extended to operate on values of type T (for some generic parameter T), and on instantiations of generic type definitions (including generic value types).
The reference-generic instructions listed in the table operate on or produce values of any reference type. They are extended to operate on values of type T (for some generic parameter T), and on generic class and interface instantiations (but not generic value types).
The instructions listed in the third row of the table (cpobj, initobj, ldobj, stobj, box, unbox) currently operate on value types only. All but the last can be generalized to operate on reference types also. We introduce a new instruction unbox.any which generalizes unbox to work on reference types and value types.
Some instructions are parameterized by type tokens (typedefs, typerefs or typespecs). In some cases the types are redundant (because the type of the object on the stack can be determined statically anyway) but aid non-type-tracking JITs. Instructions in this subcategory include box and unbox. Other instructions require the type token for instance testing (castclass, isinst, refanyval, ldelema) or instance creation (newarr, mkrefany). Finally ldtoken determines the run-time type handle that corresponds to a metadata type token.
These instructions are extended to deal with the two new forms of types: generic parameters and generic type instantiations.
The instructions ldfld, ldflda, stfld, ldsfld, ldsflda, stsfld, call, callvirt, newobj, ldftn, ldvirtftn and jmp are parameterized by member signature tokens. In order to use these instructions with generic type instantiations, the signature must be a member reference (not definition) in which the generic type instantiation is expressed as a TypeSpec. (Compare the use of pseudo methods for arrays where a TypeSpec is used to describe the particular array type).
In contrast to array signatures, though, the signature itself (argument and result types) are the formal signature of the field or method and hence may include generic parameters.
Format |
Assembly Format |
Description |
A3 <T> |
ldelem type |
Load the element at index onto the top of the stack |
..., array, index
..., value
The ldelem instruction loads the value of the element with index index (of type U) in the zero-based one-dimensional array array and places it on the top of the stack. The type of the return value is indicated by the type token type in the instruction.
If type refers to a typespec that includes generic parameters then these are first bound to the actual generic arguments of the enclosing generic type definition and/or generic method.
In other respects it has the same behaviour as the ldelem.* instruction corresponding to the generic type instantiation.
The ldelem instruction has the same behavior as the ldelema instruction followed by the (generalized) ldobj instruction, but can be implemented more efficiently.
NullReferenceException is thrown if array is null.
IndexOutOfRangeException is thrown if index is larger than the bound of array.
ArrayTypeMismatchException is thrown if array doesn't hold elements of the required type.
Correct IL code requires that array is either null or an array.
Format |
Assembly Format |
Description |
A4 <T> |
stelem type |
Replace array element at index with the value on the stack |
..., array, index, value
...,
The stelem instruction replaces the value of the element with zero-based index index (of type U) in the one-dimensional array array with value. Arrays are objects and hence represented by a value of type O. The value has the type specified by the token type in the instruction.
If type refers to a typespec that includes generic parameters then these are first bound to the actual generic arguments of the enclosing generic type definition and/or generic method.
In other respects it has the same behaviour as the stelem.* instruction corresponding to the generic type instantiation.
The stelem instruction has the same behaviour as the ldelema instruction followed by the (generalized) stobj instruction, but can be implemented more efficiently.
NullReferenceException is thrown if array is null.
IndexOutOfRangeException is thrown if index is larger than the bound of array.
ArrayTypeMismatchException is thrown if array doesn't hold elements of the required type.
Correct IL requires that array be a zero-based, one-dimensional array.
Format |
Assembly Format |
Description |
A5 <T> |
unbox.any type |
Extract the value type data from obj, its boxed representation |
..., obj
..., value
Type is a metadata token (a typeref, typedef, or typespec) indicating the type of value. If type refers to a typespec that includes generic parameters then these are first bound to the actual generic arguments of the enclosing generic type definition and/or generic method.
When applied to the boxed form of a value type, the unbox.any instruction extracts the value contained within obj and is therefore equivalent to unbox followed by ldobj.
When applied to a reference type, the unbox.any instruction has the same effect as castclass type.
InvalidCastException is thrown if obj is not a (boxed) type.
NullReferenceException is thrown if obj is null.
TypeLoadException is thrown if type cannot be found. This is typically detected when IL is converted to native code rather than at runtime.
box type
unbox type
If the type signature type contains generic parameters then these are bound to the corresponding actual generic arguments to the enclosing generic method and/or generic type definition.
The box instruction is generalized to work over reference types, in which case it behaves as a no-op. Verifiable IL requires that the argument of the box instruction has type O.
call method
callvirt method
In order to invoke a method on an object of a
generic type instantiation, method must be a member reference whose
parent is a TypeSpec that specifies the generic type instantiation of the
object.
[Rationale: whilst this instantiation information is not strictly
necessary, as it is always possible to determine the instantiated type of the
object on the stack, not all JITs do sufficiently precise type-tracking.]
When method refers to a generic method with n generic parameters, method must be a method-instantiation token that specifies the method and its generic parameters.
The binding of generic parameters must satisfy the constraints (if any) specified in the generic type definition or generic method.
call instance int32 class MyStack<int32>::Length() callvirt !0 class MyStack<class System.String>::Pop() call void ArrayUtils::Reverse<int64>(!0[]) call void class FastArray<int32>::PairWith<int64>(!0)
newobj method
In order to create an object of a generic type instantiation, method must be the member reference of a constructor whose parent is a TypeSpec that specifies the generic type instantiation.
The instantiation must satisfy the constraints (if any) specified in the generic type definition.
newobj instance void class MyStack<int32>::.ctor()
newarr type
If type refers to a typespec that includes generic parameters then these are first bound to the corresponding actual generic arguments of the enclosing generic type definition and/or generic method. This may involve run-time lookup.
ldelema type
If type refers to a typespec that includes generic parameters then these are first bound to the corresponding actual generic arguments of the enclosing generic type definition and/or generic method. This may involve run-time lookup.
castclass class
isinst class
If the type signature class contains generic parameters then these are bound to the corresponding actual generic arguments of the enclosing generic method and/or generic type definition. This may involve run-time lookup.
cpobj classTok
initobj classTok
ldobj classTok
ldobj classTok
classTok is permitted to refer to a class or interface type, in which case:
If classTok refers to a typespec that includes generic parameters then these are first bound to the corresponding actual generic arguments of the enclosing generic type definition and/or generic method.
ldfld field
ldflda field
stfld field
When field refers to an instance field in a generic type definition, it must be a member reference whose parent is a TypeSpec that specifies the generic type instantiation.
ldfld !0 value class MyPair<int32,class System.String>::fst
ldsfld field
ldsflda field
stsfld field
When field refers to a per-instantiation static field in a generic type definition, it must be a member reference whose parent is a TypeSpec that specifies the generic type instantiation.
Changes are marked in red.
<decl> ::=
...
| .class <classAttr>* <id>
[
<
<formalgenpars>
>
]
//
formal generic parameters
[extends
<typeSpec>]
//
can be a type instantiation
[implements <typeSpec> [,
<
typeSpec
>]*]
{ <classDecl>* }
<type> ::=
...
|
!
<int32> // class generic parameter accessed by number
|
!!
<int32> // method generic parameter accessed by number
| <type> < <genpars>
>
// generic type instantiation
<methodHead> ::=
<methAttr>* <callConv> <type> <methodName> [
<
<formalgenpars> >]
( <signature> ) <implAttr>*
<genpars>
::= <type> [, <genpars>]*
<formalgenpars> ::= <formalgenpar> [
,
<formalgenpars>]*
<formalgenpar> ::= <constraints> <id>
<constraints> ::=
<type>
|
(
<genpars> )
<instr_type> ::=
...
|
ldelem
|
stelem
|
unbox.any
// Class takes a single parameter of ref or value type .class public Stack<any T> { // !0 refers to the first (and only) generic parameter .field private !0[] items .field private int32 nitems .method public instance void .ctor() { .maxstack 10 ldarg.0 call instance void System.Object::.ctor() ldarg.0 ldc.i4 0 stfld int32 class Stack<!0>::nitems ldarg.0 ldc.i4 20 call instance void !0[]::.ctor(int32) stfld !0[] class Stack<!0>::items ret } // Signature for method has generic parameter as result type .method public !0 Pop() { .maxstack 10 .locals(int32) ldarg.0 ldfld int32 Stack::nitems dup brfalse EmptyStack ldc.i4 1 sub stloc.0 ldarg.0 ldloc.0 stfld int32 class Stack<!0>::nitems ldarg.0 ldfld !0[] class Stack<!0>::items ldloc.0 ldelem !0 ret EmptyStack: pop newobj instance void Empty::.ctor() throw } .method public void Push(!0) { .maxstack 10 .locals(!0[]) // Locals can be generic ldarg.0 ldfld !0[] class Stack<!0>::items callvirt int32 System.Array::GetLength() ldarg.0 ldfld class Stack<!0>::nitems bne.un NoExpand // code for array expansion ldarg.0 ldfld !0[] class Stack<!0>::items stloc.0 ldarg.0 ldfld int32 class Stack<!0>::nitems ldc.i4 2 mul call instance void !0[]::.ctor(int32) ldarg.0 stfld !0[] class Stack<!0>::items ldloc.0 ldarg.0 ldfld !0[] class Stack<!0>::items ldarg.0 ldfld class Stack<!0>::nitems call void System.Array::Copy(class System.Array, class System.Array, int32) NoExpand: ldarg.0 ldfld !0[] class Stack<!0>::items ldarg.0 ldfld int32 class Stack<!0>::nitems ldarg.1 stelem !0 ldarg.0 ldarg.0 ldfld int32 class Stack<!0>::nitems ldc.i4 1 add stfld int32 class Stack<!0>::nitems ret } .method public static void main() il managed implemented { .entrypoint .maxstack 10 // Locals can be generic type instantiations .locals(class Stack<class System.String>, class Stack<int64>) // Create a Stack<String> with two elements newobj instance void class Stack<class System.String>::.ctor() stloc.0 ldloc.0 ldstr "Rock!" // meth sig includes generic parameters callvirt void class Stack<class System.String>::Push(!0) ldloc.0 ldstr "Generics" callvirt void class Stack<class System.String>::Push(!0) // Pop the elements and print them ldloc.0 // notice generic parameters again callvirt !0 class Stack<class System.String>::Pop() call void System.Console::WriteLine(class System.String) ldloc.0 callvirt !0 class Stack<class System.String>::Pop() call void System.Console::WriteLine(class System.String) // Create a Stack<int64> with one element newobj instance void class Stack<int64>::.ctor() stloc.1 ldloc.1 ldc.i8 123456789 callvirt void class Stack<int64>::Push(!0) ... ret } }
// static void Reverse<T>(T[] arr, int index, int length) .method public static void Reverse<any>(!!0[] arr,int32 index,int32 length) { .maxstack 10 .locals( int32 i, // int i; int32 j, // int j; !!0 temp) // T temp; ldarg index // i = index; stloc i ldarg index ldarg length // j = index + length - 1; add ldc.i4 1 sub stloc j Loop: ldloc i // while (i < j) { ldloc j bge Finish ldarg arr // temp = arr[i]; ldloc i ldelem !!0 stloc temp ldarg arr // arr[i] = arr[j]; ldloc i ldarg arr ldloc j ldelem !!0 stelem !!0 ldarg arr // arr[j] = temp; ldloc j ldloc temp stelem !!0 ldloc i // i++; ldc.i4 1 add stloc i ldloc j // j--; ldc.i4 1 sub stloc j br Loop // } Finish: ret }