Subverting .NET Type Safety with 'System.Runtime.CompilerServices.Unsafe'

14 Sep 2016 - 2169 words

In which we use `System.Runtime.CompilerServices.Unsafe`, a generic API (“type-safe” but still “unsafe”), and mess with the C# Type System!

The post covers the following topics:

What it is and why it’s useful
How it works
Code samples
Tricks you can do with it
Using it safely

What it is and why it’s useful

The XML documentation comments for System.Runtime.CompilerServices.Unsafe state that it:

Contains generic, low-level functionality for manipulating pointers.

But we can get a better understanding of what it is by looking at the actual API definition from the current NuGet package (4.0.0):

// Contains generic, low-level functionality for manipulating pointers.
public static class Unsafe
{
    // Casts the given object to the specified type.
    public static T As<T>(object o) where T : class;

    // Returns a pointer to the given by-ref parameter.    
    public static void* AsPointer<T>(ref T value);

    // Copies a value of type T to the given location.    
    public static void Copy<T>(void* destination, ref T source);

    // Copies a value of type T to the given location.
    public static void Copy<T>(ref T destination, void* source);

    // Copies bytes from the source address to the destination address.
    public static void CopyBlock(void* destination, void* source, uint byteCount);

    // Initializes a block of memory at the given location with a given initial value.    
    public static void InitBlock(void* startAddress, byte value, uint byteCount);

    // Reads a value of type T from the given location.
    public static T Read<T>(void* source);
    
    // Returns the size of an object of the given type parameter.    
    public static int SizeOf<T>();

    // Writes a value of type T to the given location.
    public static void Write<T>(void* destination, T value);
}

Note: I edited the XML doc-comments for brevity; the full versions are available in the source. There are also some additional methods that have been added to the API, but to make use of them you have to use a version of the C# compiler with support for ref returns and locals.
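As a quick tour of a few of the methods listed above, here is a minimal sketch. `ApiTour` and its helpers are hypothetical names; it assumes a project compiled with unsafe code enabled (on .NET Core 2.1 and later `Unsafe` ships with the framework, otherwise reference the NuGet package). It fills a stack buffer with `InitBlock`, copies it into a pinned array with `CopyBlock`, and round-trips a `double` through `Copy` and `Read`.

```csharp
using System;
using System.Runtime.CompilerServices;

public static unsafe class ApiTour
{
    // InitBlock fills memory with a single byte value; CopyBlock copies raw bytes.
    // Keep 'length' small: the source buffer lives on the stack.
    public static byte[] InitThenCopy(byte fill, int length)
    {
        byte* source = stackalloc byte[length];
        var destination = new byte[length];

        Unsafe.InitBlock(source, fill, (uint)length);
        fixed (byte* target = destination) // pin the managed array while we hold a pointer to it
        {
            Unsafe.CopyBlock(target, source, (uint)length);
        }
        return destination;
    }

    // Copy<T>(void* destination, ref T source) writes a T to an address;
    // Read<T> reads it back from the same address.
    public static double CopyThenRead(double value)
    {
        double slot = 0;
        Unsafe.Copy(&slot, ref value);
        return Unsafe.Read<double>(&slot);
    }

    public static void Main()
    {
        Console.WriteLine(BitConverter.ToString(InitThenCopy(0xAB, 4))); // AB-AB-AB-AB
        Console.WriteLine(CopyThenRead(2.5) == 2.5);                     // True
    }
}
```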

However, this doesn’t really tell us why it’s useful. For some background on that, we can look at the GitHub issue “Provide a generic API to read from and write to a pointer”.

So, at a high level, the goals of the System.Runtime.CompilerServices.Unsafe library are to:

Provide a safer way of writing low-level unsafe code
- Without this library you have to resort to fixed and pointer manipulation, which can be error-prone
Allow access to functionality that can’t be expressed in C#, but is possible in IL
- For instance, Unsafe.SizeOf<T>() gives access to the sizeof IL opcode
Save developers from having to repeatedly write the same unsafe code
- There are already code bases making use of it, including Kestrel, the high-performance web server based on libuv.
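To make the second point concrete, here is a minimal sketch that sizes a buffer for any `T` (the `SizeOfDemo` class and `BytesNeeded` helper are hypothetical names). C#’s own `sizeof` operator can’t be applied to an unconstrained generic type parameter, whereas `Unsafe.SizeOf<T>()` can; for a reference type it reports the size of the reference itself, not of the object it points to.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class SizeOfDemo
{
    // Works for any T, because Unsafe.SizeOf<T>() emits the IL 'sizeof' instruction.
    public static int BytesNeeded<T>(int count) => checked(count * Unsafe.SizeOf<T>());

    public static void Main()
    {
        Console.WriteLine(BytesNeeded<int>(10)); // 40
        Console.WriteLine(BytesNeeded<Guid>(2)); // 32
        // For a reference type, 'sizeof' gives the size of the reference (pointer-sized).
        Console.WriteLine(Unsafe.SizeOf<string>() == IntPtr.Size); // True
    }
}
```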

It’s also worth pointing out that the library is primarily for use with value types (int, float, etc.) rather than classes or other reference types. You can use it with classes; however, you have to pin them first, so they don’t move about in memory whilst you are working with the pointer.
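A minimal sketch of that difference, assuming a project compiled with unsafe code enabled (`Counter` is a hypothetical class used only for illustration): a stack local can be handed to `Unsafe.AsPointer` directly, while a field of a heap-allocated object is pinned with `fixed` before any pointer work happens.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical reference type, used only for this illustration.
public sealed class Counter
{
    public int Count;
}

public static unsafe class PinningDemo
{
    // A stack local is never moved by the GC, so taking its address needs no pinning.
    public static int IncrementLocal(int start)
    {
        int local = start;
        int* p = (int*)Unsafe.AsPointer(ref local);
        Unsafe.Write(p, Unsafe.Read<int>(p) + 1);
        return local;
    }

    // A field of a class instance lives on the GC heap and can be moved by a
    // compacting collection, so pin it with 'fixed' while the pointer is in use.
    public static int IncrementField(Counter counter)
    {
        fixed (int* p = &counter.Count)
        {
            Unsafe.Write(p, Unsafe.Read<int>(p) + 1);
        }
        return counter.Count;
    }

    public static void Main()
    {
        Console.WriteLine(IncrementLocal(9));                           // 10
        Console.WriteLine(IncrementField(new Counter { Count = 41 })); // 42
    }
}
```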

Update: It was pointed out to me that Niels wrote an initial implementation of this library in a separate project, before Microsoft made their own version.

How it works

Because the library allows access to functionality that can’t be expressed in C#, it has to be written in raw IL, which is then compiled by a custom build step. As an example, we will look at the AsPointer method, which has the following signature:

public static void* AsPointer<T>(ref T value)

The IL for this is shown below. Note how the ref keyword becomes & in IL, and <T> is expressed as !!T:

.method public hidebysig static void* AsPointer<T>(!!T& 'value') cil managed aggressiveinlining
{
    .custom instance void System.Runtime.Versioning.NonVersionableAttribute::.ctor() = ( 01 00 00 00 )
    .maxstack 1
    ldarg.0
    conv.u
    ret
} // end of method Unsafe::AsPointer

Here we can see that it’s making use of the conv.u IL instruction. For reference, the explanation of this, along with some of the other opcodes used by the library, is shown below:

Conv_U - Converts the value on top of the evaluation stack to unsigned native int, and extends it to native int.
Ldobj - Copies the value type object pointed to by an address to the top of the evaluation stack.
Stobj - Copies a value of a specified type from the evaluation stack into a supplied memory address.
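To connect the IL back to C#, the sketch below (hypothetical `IlEquivalents` class, unsafe code enabled) checks that `AsPointer`, being only `ldarg.0` plus `conv.u`, yields the same address as the `&` operator for a stack local, and that a value stored with the generic `Write` is seen by a plain pointer dereference. That `Read`/`Write` are built on `ldobj`/`stobj` is an assumption based on the opcode list above.

```csharp
using System;
using System.Runtime.CompilerServices;

public static unsafe class IlEquivalents
{
    // AsPointer reinterprets the managed reference as an unmanaged pointer,
    // so for a stack local it matches C#'s & operator.
    public static bool AsPointerMatchesAddressOf()
    {
        int value = 123;
        void* viaUnsafe = Unsafe.AsPointer(ref value);
        int* viaAddressOf = &value;
        return viaUnsafe == viaAddressOf;
    }

    // A generic store through the pointer is visible to an ordinary dereference
    // and to the generic Read<T>.
    public static bool WriteMatchesDereference(long input)
    {
        long slot = 0;
        long* p = &slot;
        Unsafe.Write(p, input);
        return *p == input && Unsafe.Read<long>(p) == input;
    }

    public static void Main()
    {
        Console.WriteLine(AsPointerMatchesAddressOf());  // True
        Console.WriteLine(WriteMatchesDereference(-42)); // True
    }
}
```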

After searching around, I found several other places in the .NET runtime that make use of raw IL in this way.

Code samples

There’s a nice set of unit tests that show the main use cases for the library. For instance, here is how to use Unsafe.Write(..) to directly change the value of an int via a pointer:

[Fact]
public static unsafe void WriteInt32()
{
    int value = 10;
    int* address = (int*)Unsafe.AsPointer(ref value);
    int expected = 20;
    Unsafe.Write(address, expected);

    Assert.Equal(expected, value);
    Assert.Equal(expected, *address);
    Assert.Equal(expected, Unsafe.Read<int>(address));
}

You can write something similar by manipulating pointers directly, but it’s not as straightforward (unless you are familiar with C or C++):

int value = 10;
int* ptr = &value;
*ptr = 30;
Console.WriteLine(value); // prints "30"
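Where the library really earns its keep is in generic code: C# won’t let you declare a `T*` for an unconstrained type parameter, so the direct-pointer style above can’t be made generic. A minimal sketch using only `Unsafe.Read<T>` and `Unsafe.Write<T>` (the `Swap<T>` helper is hypothetical, not part of the library; the caller must pass pointers to valid, correctly typed memory):

```csharp
using System;
using System.Runtime.CompilerServices;

public static unsafe class GenericPointers
{
    // Swap two values of any type T through untyped pointers.
    // No 'T*' is needed because Read<T>/Write<T> take a void*.
    public static void Swap<T>(void* a, void* b)
    {
        T temp = Unsafe.Read<T>(a);
        Unsafe.Write(a, Unsafe.Read<T>(b));
        Unsafe.Write(b, temp);
    }

    public static void Main()
    {
        int x = 1, y = 2;
        Swap<int>(&x, &y);
        Console.WriteLine($"{x} {y}"); // 2 1

        Guid g1 = Guid.NewGuid(), g2 = Guid.Empty;
        Swap<Guid>(&g1, &g2);
        Console.WriteLine(g1 == Guid.Empty); // True
    }
}
```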

For a more real-world use case, the code below shows how you can access an array of KeyValuePair<DateTime, decimal> directly as a byte[] (taken from a GitHub discussion):

var dt = new KeyValuePair<DateTime, decimal>[2];
ref byte asRefByte = ref Unsafe.As<KeyValuePair<DateTime, decimal>, byte>(ref dt[0]);
fixed (byte * ptr = &asRefByte)
{
    // Treat the KeyValuePair<DateTime, decimal> as if it were a byte []
    ...
}

(This example is based on the StackOverflow question “Get unsafe pointer to array of KeyValuePair<DateTime,decimal> in C#”.)
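A self-contained version of that pattern might look like the sketch below. It assumes C# 7 or later (for `ref` locals) and a version of the library that includes `Unsafe.As<TFrom, TTo>`, as the snippet above already does; `GetRawBytes` is a hypothetical helper that copies the array’s raw bytes out with `CopyBlock` rather than working on them in place.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

public static unsafe class KeyValuePairBytes
{
    // Copies the raw in-memory bytes of a KeyValuePair<DateTime, decimal>[] into a byte[].
    // Field order and padding are CLR implementation details, so treat the result as
    // opaque bytes, not as a portable serialization format.
    public static byte[] GetRawBytes(KeyValuePair<DateTime, decimal>[] items)
    {
        int byteCount = checked(items.Length * Unsafe.SizeOf<KeyValuePair<DateTime, decimal>>());
        var result = new byte[byteCount];
        if (byteCount == 0)
            return result;

        // Reinterpret a reference to the first element as a reference to its first byte.
        ref byte first = ref Unsafe.As<KeyValuePair<DateTime, decimal>, byte>(ref items[0]);
        fixed (byte* source = &first)       // pinning an element pins the whole array
        fixed (byte* destination = result)
        {
            Unsafe.CopyBlock(destination, source, (uint)byteCount);
        }
        return result;
    }

    public static void Main()
    {
        var items = new KeyValuePair<DateTime, decimal>[2];
        items[0] = new KeyValuePair<DateTime, decimal>(new DateTime(2016, 9, 14), 1.5m);
        byte[] raw = GetRawBytes(items);
        Console.WriteLine(raw.Length);
    }
}
```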

Tricks you can do with it

Despite providing you with a nice strongly-typed API, you still have to mark your code as unsafe, which is a bit of a give-away that you can use it to do things that normal C# can’t!

Breaking immutability

Strings in C# are immutable, and the runtime goes to great lengths to ensure you can’t bypass this behaviour. However, under the hood, String data is just bytes that can be manipulated; indeed, the runtime does this manipulation itself inside the StringBuilder class.

So, using Unsafe.Write(..), we can modify the contents of a String - yay!! However, this code will potentially break the behaviour of the String class in many subtle ways (for instance, string literals are interned, so every other use of the same literal sees the change as well), so don’t ever use it in a real application!!

var text = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

Console.WriteLine("String Length {0}", text.Length); // prints 26
Console.WriteLine("Text: \"{0}\"", text); // prints "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

var pinnedText = GCHandle.Alloc(text, GCHandleType.Pinned);
char* textAddress = (char*)pinnedText.AddrOfPinnedObject().ToPointer();

// Make an immutable string think that it is shorter than it actually is!!!
Unsafe.Write(textAddress - 2, 5);

Console.WriteLine("String Length {0}", text.Length); // prints 5
Console.WriteLine("Text: \"{0}\"", text); // prints "ABCDE"

// change the 2nd character 'B' to '@'
Unsafe.Write(textAddress + 1, '@');

Console.WriteLine("Text: \"{0}\"", text); // prints "A@CDE"

pinnedText.Free();
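Before doing anything destructive, you can confirm the layout that the `textAddress - 2` offset relies on with a read-only check. A minimal sketch (hypothetical `StringLayout` class, unsafe code enabled), assuming the CLR’s undocumented string layout in which the `Int32` length immediately precedes the first `char`:

```csharp
using System;
using System.Runtime.CompilerServices;

public static unsafe class StringLayout
{
    // Reads (but never writes) the Int32 stored 4 bytes, i.e. two chars, before
    // the first character; on the CLR this is the string's length field.
    public static int ReadLengthField(string text)
    {
        fixed (char* firstChar = text) // pins the string and points at its first char
        {
            return Unsafe.Read<int>(firstChar - 2);
        }
    }

    public static void Main()
    {
        string text = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
        Console.WriteLine(ReadLengthField(text) == text.Length);
    }
}
```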

Messing with the CLR type-system

But we can go even further than that and do a really nasty trick to completely defeat the CLR type-system. This code is horrible and could potentially break the CLR in several ways, so, as before, don’t ever use it in a real application!!

int intValue = 5;
float floatValue = 5.0f;
object boxedInt = (object)intValue, boxedFloat = (object)floatValue;

var pinnedFloat = GCHandle.Alloc(boxedFloat, GCHandleType.Pinned);
var pinnedInt = GCHandle.Alloc(boxedInt, GCHandleType.Pinned);

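IntPtr* floatAddress = (IntPtr*)pinnedFloat.AddrOfPinnedObject().ToPointer();
IntPtr* intAddress = (IntPtr*)pinnedInt.AddrOfPinnedObject().ToPointer();

Console.WriteLine("Type: {0}, Value: {1}", boxedInt.GetType().FullName, boxedInt);

// Make an int think it's a float!!!
// The Method Table pointer sits immediately before the boxed data; reading and
// writing it as an IntPtr keeps this correct in both 32-bit and 64-bit processes.
IntPtr floatType = Unsafe.Read<IntPtr>(floatAddress - 1);
Unsafe.Write(intAddress - 1, floatType);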

Console.WriteLine("Type: {0}, Value: {1}", boxedInt.GetType().FullName, boxedInt);

pinnedFloat.Free();
pinnedInt.Free();

Which prints out:

Type: System.Int32, Value: 5

Type: System.Single, Value: 7.006492E-45

Yep, we’ve managed to convince an int (Int32) that it’s actually a float (Single) and make it behave like one instead!!

This works by overwriting the Method Table pointer of the boxed int with the one from the boxed float. So when it looks up its type or prints out its value, it uses the float methods instead! Thanks to @Porges for the example that motivated this; his code does the same thing using fixed instead.
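A read-only variant makes the mechanism visible without corrupting anything. This sketch (hypothetical `MethodTableInspection` class, unsafe code enabled) assumes, as the trick does, that the pointer-sized word just before a boxed value’s data is its Method Table pointer, and that `RuntimeTypeHandle.Value` returns that same pointer on CoreCLR and .NET Framework; neither is a documented contract.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public static unsafe class MethodTableInspection
{
    // Reads (without modifying) the pointer-sized word stored just before a boxed
    // value's data. Pinning only succeeds for blittable boxed values such as int or float.
    public static IntPtr ReadMethodTable(object boxed)
    {
        GCHandle handle = GCHandle.Alloc(boxed, GCHandleType.Pinned);
        try
        {
            IntPtr* data = (IntPtr*)handle.AddrOfPinnedObject().ToPointer();
            return Unsafe.Read<IntPtr>(data - 1);
        }
        finally
        {
            handle.Free();
        }
    }

    public static void Main()
    {
        Console.WriteLine(ReadMethodTable(5) == typeof(int).TypeHandle.Value);
        Console.WriteLine(ReadMethodTable(5.0f) == typeof(float).TypeHandle.Value);
        Console.WriteLine(ReadMethodTable(5) == ReadMethodTable(5.0f));
    }
}
```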

Using it safely

Despite the library requiring you to annotate your code with unsafe, there are still some safe, or more accurately safer, ways to use it!

Fortunately, one of the main .NET runtime developers provided a nice list of what you can and can’t do.

But as with all unsafe code, you’re asking the runtime to let you do things that it normally prevents (and protects you from), so you have to be careful!
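One way to apply that care is to keep all the checking in safe code and confine `Unsafe` to a small, pinned region. The sketch below is a hypothetical helper, not something from the library or from the list mentioned above: it validates bounds itself (the library performs none) and uses `Unsafe.ReadUnaligned<T>`, a later addition that is not in the 4.0.0 listing above, because an arbitrary offset into a `byte[]` need not be 4-byte aligned.

```csharp
using System;
using System.Runtime.CompilerServices;

public static unsafe class SaferReads
{
    // Hypothetical helper: read a native-endian Int32 at any offset in a byte[].
    // Unsafe performs no bounds checks, so validate everything before touching memory.
    public static int ReadInt32(byte[] buffer, int offset)
    {
        if (buffer == null)
            throw new ArgumentNullException(nameof(buffer));
        if (offset < 0 || offset > buffer.Length - sizeof(int))
            throw new ArgumentOutOfRangeException(nameof(offset));

        fixed (byte* p = buffer) // pin the array for as long as the pointer is in use
        {
            // ReadUnaligned, because 'offset' need not be a multiple of 4.
            return Unsafe.ReadUnaligned<int>(p + offset);
        }
    }

    public static void Main()
    {
        var buffer = new byte[8];
        byte[] encoded = BitConverter.GetBytes(0x12345678);
        Array.Copy(encoded, 0, buffer, 3, encoded.Length); // deliberately misaligned
        Console.WriteLine(ReadInt32(buffer, 3) == 0x12345678); // True
    }
}
```

Keeping the bounds check in ordinary C# means a bad offset surfaces as an exception rather than as a silent read of neighbouring memory.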
