How generics were added to .NET



Before we dive into the technical details, let’s start with a quick history lesson, courtesy of Don Syme, who worked on adding generics to .NET and then went on to design and implement F#, which is a pretty impressive set of achievements!!

Background and History

Update: Don Syme pointed out another research paper related to .NET generics: Combining Generics, Precompilation and Sharing Between Software Based Processes (pdf)

To give you an idea of how these events fit into the bigger picture, here are the dates of the .NET Framework releases up to 2.0, which was the first version to have generics:

Version number   CLR version   Release date
1.0              1.0           2002-02-13
1.1              1.1           2003-04-24
2.0              2.0           2005-11-07

Aside from the historical perspective, what I find most fascinating is just how much the addition of generics in .NET was due to the work done by Microsoft Research. From .NET/C# Generics History:

It was only through the total dedication of Microsoft Research, Cambridge during 1998-2004, to doing a complete, high quality implementation in both the CLR (including NGEN, debugging, JIT, AppDomains, concurrent loading and many other aspects), and the C# compiler, that the project proceeded.

He then goes on to say:

What would the cost of inaction have been? What would the cost of failure have been? No generics in C# 2.0? No LINQ in C# 3.0? No TPL in C# 4.0? No Async in C# 5.0? No F#? Ultimately, an erasure model of generics would have been adopted, as for Java, since the CLR team would never have pursued a in-the-VM generics design without external help.

Wow, C# and .NET would look very different without all these features!!

The ‘Gyro’ Project - Generics for Rotor

Unfortunately there doesn’t exist a publicly accessible version of the .NET 1.0 and 2.0 source code, so we can’t go back and look at the changes that were made (if I’m wrong, please let me know as I’d love to read it).

However, we do have the next best thing, the ‘Gyro’ project in which the equivalent changes were made to the ‘Shared Source Common Language Implementation’ (SSCLI) code base (a.k.a ‘Rotor’). As an aside, if you want to learn more about the Rotor code base I really recommend the excellent book by Ted Neward, which you can download from his blog.

Gyro 1.0 was released in 2003, which implies that it was created after the work had been done in the real .NET Framework source code. I assume that Microsoft Research wanted to publish the ‘Rotor’ implementation so it could be studied more widely. Gyro is also referenced in one of Don Syme’s posts, from Some History: 2001 “GC#” research project draft, from the MSR Cambridge team:

With Dave Berry’s help we later published a version of the corresponding code as the “Gyro” variant of the “Rotor” CLI implementation.

The rest of this post will look at how generics were implemented in the Rotor source code.

Note: There are some significant differences between the Rotor source code and the real .NET framework. Most notably the JIT and GC are completely different implementations (due to licensing issues, listen to DotNetRocks show 360 - Ted Neward and Joel Pobar on Rotor 2.0 for more info). However, the Rotor source does give us an accurate idea about how other core parts of the CLR are implemented, such as the Type-System, Debugger, AppDomains and the VM itself. It’s interesting to compare the Rotor source with the current CoreCLR source and see how much of the source code layout and class names have remained the same.


Implementation

To make things easier for anyone who wants to follow along, I created a GitHub repo that contains the Rotor code for .NET 1.0 and then checked in the Gyro source code on top, which means that you can see all the changes in one place:

Gyro changes to implement generics

The first thing you notice in the Gyro source is that all the files contain this particular piece of legalese:

 ;    By using this software in any fashion, you are agreeing to be bound by the
 ;    terms of this license.
 ;   
+;    This file contains modifications of the base SSCLI software to support generic
+;    type definitions and generic methods. These modifications are for research
+;    purposes. They do not commit Microsoft to the future support of these or
+;    any similar changes to the SSCLI or the .NET product. -- 31st October, 2002.
+;   
 ;    You must not remove this notice, or any other, from this software.

It’s funny that they needed to add the line ‘They do not commit Microsoft to the future support of these or any similar changes to the SSCLI or the .NET product’, even though they were just a few months away from doing just that!!

Components (Directories) with the most changes

To see where the work was done, let’s start with a high-level view, showing the directories with a significant number of changes (> 1% of the total changes):

$ git diff --dirstat=lines,1 464bf98 2714cca
   0.1% bcl/
  14.4% csharp/csharp/sccomp/
   9.1% debug/di/
  11.9% debug/ee/
   2.1% debug/inc/
   1.9% debug/shell/
   2.5% fjit/
  21.1% ilasm/
   1.5% ildasm/
   1.2% inc/
   1.4% md/compiler/
  29.9% vm/

Note: fjit is the “Fast JIT” compiler, i.e. the version released with Rotor, which was significantly different to the one available in the full .NET framework.

The full output from git diff --dirstat=lines,0 is available here and the output from git diff --stat is here.

0.1% bcl/ is included only to show that very few C# code changes were needed. These were mostly plumbing code to expose the underlying C++ methods and changes to the various ToString() methods to include generic type information, e.g. ‘Class[int,double]’. However, there are 2 more significant ones:

  • bcl/system/reflection/emit/opcodes.cs (diff)
  • bcl/system/reflection/emit/signaturehelper.cs (diff)
    • Add the ability to parse method metadata that contains generic related information, such as methods with generic parameters.

Files with the most changes

Next, we’ll take a look at the specific classes/files that had the most changes, as this gives us a really good idea about where the complexity was.

Added   Deleted   Net change (added - deleted)   File
1794    323       1471                           debug/di/module.cpp
1418    337       1081                           vm/class.cpp
1335    308       1027                           vm/jitinterface.cpp
1616    888       728                            debug/ee/debugger.cpp
741     46        695                            csharp/csharp/sccomp/symmgr.cpp
693     0         693                            vm/genmeth.cpp
999     362       637                            csharp/csharp/sccomp/clsdrec.cpp
926     321       605                            csharp/csharp/sccomp/fncbind.cpp
559     0         559                            vm/typeparse.cpp
605     156       449                            vm/siginfo.cpp
417     29        388                            vm/method.hpp
642     255       387                            fjit/fjit.cpp
379     0         379                            vm/jitinterfacegen.cpp
3045    2672      373                            ilasm/parseasm.cpp
465     94        371                            vm/class.h
515     163       352                            debug/inc/cordb.h
339     0         339                            vm/generics.cpp
733     418       315                            csharp/csharp/sccomp/parser.cpp
471     169       302                            debug/shell/dshell.cpp
382     88        294                            csharp/csharp/sccomp/import.cpp

Components of the Runtime

Now we’ll look at individual components in more detail so we can get an idea of how different parts of the runtime had to change to accommodate generics.

Type System changes

Not surprisingly the bulk of the changes are in the Virtual Machine (VM) component of the CLR and related to the ‘Type System’. Obviously adding ‘parameterised types’ to a type system that didn’t already have them requires wide-ranging and significant changes, which are shown in the list below:

  • vm/class.cpp (diff)
    • Allow the type system to distinguish between open and closed generic types and provide APIs for working with them, such as IsGenericVariable() and GetGenericTypeDefinition()
  • vm/genmeth.cpp (diff)
    • Contains the bulk of the functionality to make ‘generic methods’ possible, i.e. MyMethod<T, U>(T item, U filter), including the work done to enable ‘shared instantiation’ of generic methods
  • vm/typeparse.cpp (diff)
    • Changes needed to allow generic types to be looked up by name, i.e. ‘MyClass[System.Int32]’
  • vm/siginfo.cpp (diff)
    • Adds the ability to work with ‘generic-related’ method signatures
  • vm/method.hpp (diff) and vm/method.cpp (diff)
    • Provides the runtime with generic related methods such as IsGenericMethodDefinition(), GetNumGenericMethodArgs() and GetNumGenericClassArgs()
  • vm/generics.cpp (diff)
    • All the completely new ‘generics’ specific code is in here, mostly related to ‘shared instantiation’ which is explained below
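
To make the ‘look-up by name’ item more concrete, here is a minimal sketch of how a name such as ‘MyClass[System.Int32]’ could be split into a generic type name and its type arguments. It is not taken from vm/typeparse.cpp: the ParsedTypeName struct and the ParseTypeName functions are hypothetical names invented for this illustration, and the sketch assumes well-formed input and ignores arrays, whitespace and assembly-qualified names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type for this sketch; not the Rotor/Gyro data structure.
struct ParsedTypeName {
    std::string name;                  // e.g. "MyClass"
    std::vector<ParsedTypeName> args;  // e.g. { "System.Int32" }
};

// Parses one type name starting at text[pos].
// Stops at ',' or ']' (which belong to an enclosing argument list) or at end of input.
ParsedTypeName ParseTypeName(const std::string& text, std::size_t& pos) {
    ParsedTypeName result;
    while (pos < text.size() && text[pos] != '[' && text[pos] != ',' && text[pos] != ']') {
        result.name += text[pos++];
    }
    if (pos < text.size() && text[pos] == '[') {
        ++pos;  // consume '['
        for (;;) {
            // Each argument may itself be generic, so recurse.
            result.args.push_back(ParseTypeName(text, pos));
            if (pos < text.size() && text[pos] == ',') {
                ++pos;  // consume ',' and parse the next argument
                continue;
            }
            break;
        }
        // Assumption: the input is well formed, so the list must be closed here.
        assert(pos < text.size() && text[pos] == ']');
        ++pos;  // consume ']'
    }
    return result;
}

// Convenience overload that parses a whole string from the beginning.
ParsedTypeName ParseTypeName(const std::string& text) {
    std::size_t pos = 0;
    return ParseTypeName(text, pos);
}
```

Calling ParseTypeName("MyClass[System.Int32]") yields the name "MyClass" with a single argument named "System.Int32". Nested arguments are handled by the recursion, so a real parser would mostly differ in error handling and in the extra syntax it has to accept.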

Bytecode or ‘Intermediate Language’ (IL) changes

The main way in which the implementation of generics in the CLR differs from the JVM is that generics are ‘fully reified’ instead of using ‘type erasure’. This was possible because the CLR designers were willing to break backwards compatibility, whereas the JVM had been around longer, so I assume that this was a much less appealing option. For more discussion on this issue see Erasure vs reification and Reified Generics for Java. Update: this HackerNews discussion is also worth a read.
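
As a rough illustration of the difference, the toy model below treats the ‘runtime identity’ of an instantiated generic type as a string. It is only a sketch: the Instantiation struct and the ReifiedIdentity and ErasedIdentity functions are hypothetical names made up for this example, and neither the CLR nor the JVM represents types this way. In the reified model the type arguments are part of the identity, whereas in the erasure model they are dropped after compilation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of an instantiated generic type.
struct Instantiation {
    std::string genericName;            // e.g. "List"
    std::vector<std::string> typeArgs;  // e.g. { "System.Int32" }
};

// Reified model: the type arguments survive until runtime,
// so List[System.Int32] and List[System.String] are distinct types.
std::string ReifiedIdentity(const Instantiation& t) {
    std::string id = t.genericName + "[";
    for (std::size_t i = 0; i < t.typeArgs.size(); ++i) {
        if (i > 0) id += ",";
        id += t.typeArgs[i];
    }
    return id + "]";
}

// Erasure model: the type arguments are only known to the compiler,
// so every instantiation looks the same at runtime.
std::string ErasedIdentity(const Instantiation& t) {
    return t.genericName;
}
```

In this model a List of System.Int32 and a List of System.String get different identities from ReifiedIdentity but the same one from ErasedIdentity, which is why a runtime with erased generics cannot tell the two apart.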

The specific changes made to the .NET Intermediate Language (IL) op-codes can be seen in inc/opcode.def (diff). In essence, 3 new instructions were added.

In addition, the IL Assembler tool (ILASM) needed significant changes, as did its counterpart, the IL Disassembler (ILDASM), so that both could handle the additional instructions.

There is also a whole section titled ‘Support for Polymorphism in IL’ that explains these changes in greater detail in Design and Implementation of Generics for the .NET Common Language Runtime.

Shared Instantiations

From Design and Implementation of Generics for the .NET Common Language Runtime:

Two instantiations are compatible if for any parameterized class its compilation at these instantiations gives rise to identical code and other execution structures (e.g. field layout and GC tables), apart from the dictionaries described below in Section 4.4. In particular, all reference types are compatible with each other, because the loader and JIT compiler make no distinction for the purposes of field layout or code generation. On the implementation for the Intel x86, at least, primitive types are mutually incompatible, even if they have the same size (floats and ints have different parameter passing conventions). That leaves user-defined struct types, which are compatible if their layout is the same with respect to garbage collection i.e. they share the same pattern of traced pointers

From a comment with more info:

// For an generic type instance return the representative within the class of
// all type handles that share code.  For example, 
//    <int> --> <int>,
//    <object> --> <object>,
//    <string> --> <object>,
//    <List<string>> --> <object>,
//    <Struct<string>> --> <Struct<object>>
//
// If the code for the type handle is not shared then return 
// the type handle itself.
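
The mapping described in that comment can be sketched as follows. This is a simplified model written for this post, not the Gyro code: the Type struct and the Canonical and ToString functions are hypothetical names, and the rule for structs only follows the examples in the comment (canonicalise each type argument), whereas the paper's full rule compares the GC layout of user-defined structs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified model of a type; not the Rotor TypeHandle.
struct Type {
    enum class Kind { Primitive, Reference, Struct };
    Kind kind;
    std::string name;
    std::vector<Type> args;  // generic type arguments, if any
};

// Renders a type in the notation used by the comment, e.g. "Struct<object>".
std::string ToString(const Type& t) {
    std::string s = t.name;
    if (!t.args.empty()) {
        s += "<";
        for (std::size_t i = 0; i < t.args.size(); ++i) {
            if (i > 0) s += ",";
            s += ToString(t.args[i]);
        }
        s += ">";
    }
    return s;
}

// Returns the representative of the set of types that can share code with t.
Type Canonical(const Type& t) {
    if (t.kind == Type::Kind::Reference) {
        // All reference types are compatible, so they collapse to 'object'.
        return Type{Type::Kind::Reference, "object", {}};
    }
    if (t.kind == Type::Kind::Struct) {
        // Assumption: a generic struct keeps its own identity and only its
        // type arguments are replaced by their representatives.
        Type result{Type::Kind::Struct, t.name, {}};
        for (const Type& arg : t.args) {
            result.args.push_back(Canonical(arg));
        }
        return result;
    }
    // Primitive types are mutually incompatible, so each represents itself.
    assert(t.args.empty());
    return t;
}
```

With these definitions, a string and a List of string both map to object, an int maps to itself, and a Struct of string maps to Struct<object>, matching the examples in the comment.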

In addition, this comment explains the work that needs to take place to allow shared instantiations when working with generic methods.

Update: If you want more info on the ‘code-sharing’ that takes place, I recommend reading these 4 posts:

Compiler and JIT Changes

It seems like almost every part of the compiler had to change to accommodate generics, which is not surprising given that they touch so many parts of the code we write: types, classes and methods. Some of the biggest changes were:

  • csharp/csharp/sccomp/clsdrec.cpp - +999 -362 - (diff)
  • csharp/csharp/sccomp/emitter.cpp - +347 -127 - (diff)
  • csharp/csharp/sccomp/fncbind.cpp - +926 -321 - (diff)
  • csharp/csharp/sccomp/import.cpp - +382 -88 - (diff)
  • csharp/csharp/sccomp/parser.cpp - +733 -418 - (diff)
  • csharp/csharp/sccomp/symmgr.cpp - +741 -46 - (diff)

In the ‘just-in-time’ (JIT) compiler extra work was needed because it’s responsible for implementing the additional ‘IL Instructions’. The bulk of these changes took place in fjit.cpp (diff) and fjitdef.h (diff).

Finally, a large amount of work was done in vm/jitinterface.cpp (diff) to enable the JIT to access the extra information it needed to emit code for generic methods.

Debugger Changes

Last, but by no means least, a significant amount of work was done to ensure that the debugger could understand and inspect generic types. It goes to show just how much inside information a debugger needs to have about the type system in a managed language.

  • debug/ee/debugger.cpp (diff)
  • debug/ee/debugger.h (diff)
  • debug/di/module.cpp (diff)
  • debug/di/rsthread.cpp (diff)
  • debug/shell/dshell.cpp (diff)

Further Reading

If you want even more information about generics in .NET, there are also some very useful design docs available (included in the Gyro source code download):

Also, Pre-compilation for .NET Generics by Andrew Kennedy & Don Syme (pdf) is an interesting read.