Roslyn code base – performance lessons (part 2)

In my previous post, I talked about some of the general performance lessons that can be learnt from the Roslyn project. This post builds on that and looks at specific examples from the code base.

Generally the performance gains within Roslyn come down to one thing:

Ensuring the garbage collector does the least possible amount of work

.NET is a managed language and one of the features that it provides is memory management, via the garbage collector (GC). However GC doesn’t come for free, it has to find and inspect all the live objects (and their descendants) in the “mark” phrase, before cleaning up any dead objects in the “sweep” phase.

This is backed up by the guidance provided for contributing to Roslyn, from the Coding Conventions section:

  • Avoid allocations in compiler hot paths:
    • Avoid LINQ.
    • Avoid using foreach over collections that do not have a struct enumerator.
    • Consider using an object pool. There are many usages of object pools in the compiler to see an example.

It’s interesting to see LINQ specifically called out, I think it’s great and it does allow you to write much more declarative and readable code, in fact I’d find it hard to write C# code without it. But behind the scenes there are lots of hidden allocations going on and they are not always obvious. If you don’t believe me, have a go at Joe Duffy’s quiz (about 1/2 way through the blog post).

Techniques used

There are several techniques used in the Roslyn code base that either minimise or eliminate allocations, thus giving the GC less work to do. One important characteristic all of them share is that they are only applied to “Hot Paths” within the code. Optimising prematurely is never recommended, nor is using optimisations on parts of your code that are rarely exercised. You need to measure and identify the bottlenecks and understand what are the hot-paths through your code, before you apply any optimisations.

Avoiding allocations altogether

Within the .NET framework there are many methods that cause allocations, for instance String.Trim(..) or any LINQ methods. To combat this we can find several examples where code was specifically re-written, for example:

Another good lesson is that each improvement is annotated with a “// PERF:” comment to explain the reasoning, I guess this is to prevent another developer coming along and re-factoring the code to something more readable (at the expense of performance).

Object pooling with a Cache

Another strategy used is object pooling where rather than newing up objects each time, old ones are re-used. Again this helps relieve pressure on the GC as less objects are allocated and the ones that are, stick around for a while (often the life-time of the program). This is a sweet-spot for the .NET GC, as per the advice from Rico Mariani’s excellent Garbage Collector Basics and Performance Hints:

Too Many Almost-Long-Life Objects
Finally, perhaps the biggest pitfall of the generational garbage collector is the creation of many objects, which are neither exactly temporary nor are they exactly long-lived. These objects can cause a lot of trouble, because they will not be cleaned up by a gen0 collection (the cheapest), as they will still be necessary, and they might even survive a gen1 collection because they are still in use, but they soon die after that.

We can see how this was handled in Roslyn in the code below from StringBuilderPool, that makes use of the more generic ObjectPool infrastructure and helper classes. Obviously it was such a widely used pattern that they build a generic class to handle the bulk of the work, making it easy to write an implementation for a specific type, including StringBuilder, Dictionary, HashSet and Stream.

internal static class StringBuilderPool
{
  public static StringBuilder Allocate()
  {
    return SharedPools.Default<StringBuilder>().AllocateAndClear();
  }

  public static void Free(StringBuilder builder)
  {
    SharedPools.Default<StringBuilder>().ClearAndFree(builder);
  }

  public static string ReturnAndFree(StringBuilder builder)
  {
    SharedPools.Default<StringBuilder>().ForgetTrackedObject(builder);
    return builder.ToString();
  }
}

Having a class like this makes sense, a large part of compiling is parsing and building strings. Not only do they use a StringBuilder to save lots of temporary String allocations, but they also re-use StringBuilder objects to save the GC the work of having to clean up these.

Interestingly enough this technique has also been used inside the .NET framework itself, you can see this in the code below from StringBuilderCache.cs. Again, the comment shows that the optimisation was debated and a trade-off between memory usage and efficiency was weighed up.

internal static class StringBuilderCache
{
  // The value 360 was chosen in discussion with performance experts as a compromise between using
  // as little memory (per thread) as possible and still covering a large part of short-lived
  // StringBuilder creations on the startup path of VS designers.
  private const int MAX_BUILDER_SIZE = 360;

  [ThreadStatic]
  private static StringBuilder CachedInstance;

  public static StringBuilder Acquire(int capacity = StringBuilder.DefaultCapacity)
  {
    if(capacity <= MAX_BUILDER_SIZE)
    {
      StringBuilder sb = StringBuilderCache.CachedInstance;
      if (sb != null)
      {
        // Avoid stringbuilder block fragmentation by getting a new StringBuilder
        // when the requested size is larger than the current capacity
        if (capacity <= sb.Capacity)
        {
          StringBuilderCache.CachedInstance = null;
          sb.Clear();
          return sb;
        }
      }
    }
    return new StringBuilder(capacity);
  }

  public static void Release(StringBuilder sb)
  {
    if (sb.Capacity <= MAX_BUILDER_SIZE)
    {
      StringBuilderCache.CachedInstance = sb;
    }
  }

  public static string GetStringAndRelease(StringBuilder sb)
  {
    string result = sb.ToString();
    Release(sb);
    return result;
  }
}

Which you then use like this:

var builder = StringBuilderCache.Acquire();
// use the builder as normal, i.e. builder.Append(..)
string data = StringBuilderCache.GetStringAndRelease(builder);

Specialised Collections

Finally there are several examples where custom collections were written to ensure that excessive memory overhead wasn’t created. For instance in the code below from PENamesTypeSymbol.cs, you can clearly see that specific collections are re-used whenever there are 0, 1 or up-to 6 items.
The comment clearly spells out the trade-off, so whilst these collections aren’t as efficient when doing lookups (O(n) v O(log n)), they are more efficient in terms of space and so the trade-off is worth it. It’s also interesting to note that the size of 6 wasn’t chose randomly, in their tests they found that 50% of the time there were 6 items or fewer, so these optimisations will give a performance gain in the majority of scenarios.

private static ICollection<string> CreateReadOnlyMemberNames(HashSet<string> names)
{
  switch (names.Count)
  {
    case 0:
      return SpecializedCollections.EmptySet<string>();
    case 1:
      return SpecializedCollections.SingletonCollection(names.First());
    case 2:
    case 3:
    case 4:
    case 5:
    case 6:
      // PERF: Small collections can be implemented as ImmutableArray.
      // While lookup is O(n), when n is small, the memory savings are more valuable.
      // Size 6 was chosen because that represented 50% of the names generated in the Picasso end to end.
      // This causes boxing, but that's still superior to a wrapped HashSet
      return ImmutableArray.CreateRange(names);
    default:
      return SpecializedCollections.ReadOnlySet(names);
  }
}

Summary

All in all there are some really nice tricks and examples of high-performance code to be found in the Roslyn code base. But the main lesson is that you should never be applying these for the sake of it or because they look clever. They should only be used in conjunction with proper performance testing that identifies the parts of your code that cause it to run slower than your performance goals.

Interestingly enough StackOverflow faced a similar issue a few years back, see In managed code we trust, our recent battles with the .NET Garbage Collector, but that’s a subject for another post, stay tuned!


Update: Since first writing this post, I’ve found out about an excellent talk Essential Truths Everyone Should Know about Performance in a Large Managed Codebase, in which Dustin Campbell (a Roslyn Program Manager), talks about how they improved the performance of Roslyn. I can’t recommend it enough.

About these ads

8 thoughts on “Roslyn code base – performance lessons (part 2)

  1. Pov of just another application programmer:
    From Dustin Campbell’s examples, seems like C# optimization is about coding as if we were using C or C++.
    1. Understand what your source code compiles to (the boxing and TrimStart examples)
    2. Manage memory objects with a certain lifetime (not gen0, not long lived).
    3. Use object pools where possible.
    4. Know that the garbage collector exists, and that memory allocation/freeing is not magic. Lend it a helping hand wherever possible.
    5. Identify read-only objects and reduce the times they are created.
    etc.

  2. @sgdevs Yeah it does seem that at some point you need to undo that work that the managed run-time does or at least fight against it. But I think that overall you still get a benefit by having a managed run-time and GC.

  3. @sgdevs When performance matter you have to understand how things are working inside. It’s even more true with .NET as you get the framework which hides a lot of complexity to save you time. But you have to learn how to work with the Garbage Collector and not against it.
    In a WPF project recently, devs were complaining about performance and had no clue about how the Garbage Collector was working, memory allocation in .NET, etc… Just by explaining how it worked, they realised they were keeping references to objects because of registered events. They never took the time when they switch from VB6 to .NET to actually learn .NET. This is now a real pain to fix all the performance issues because best practices were not known.
    When tackling a new technology it’s always important to understand how things are going if you have high expectations and want to deliver high quality code.

  4. // PERF: Avoid calling string.Trim() because that allocates a new substring
    That is not true!
    String.Trim() does not always returns a new string: Take Reflector and look at the the implementation:
    [SecurityCritical]
    private string CreateTrimmedString(int start, int end) {
    int length = (end – start) + 1;
    if (length == this.Length) {
    return this; //No new string is allocated!
    }
    if (length == 0) return Empty;
    return InternalSubString(start, length, false);
    }
    But what is true is that it allocates a lot of Int32 and even shares the same method for TrimStart, TrimEnd and Trim, using IFs to differenciate between what to trim. Surely not fully optimized.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s