Open Source .NET – 1 year later
08 Dec 2015 - 1225 words
A little over a year ago Microsoft announced that they were open sourcing large parts of the .NET framework. At the time Scott Hanselman did a nice analysis of the source, using Microsoft Power BI. Inspired by this and now that a year has passed, I wanted to try and answer the question:
How much Community involvement has there been since Microsoft open sourced large parts of the .NET framework?
I will be looking at the 3 following projects, as they are all highly significant parts of the .NET ecosystem and are also some of the most active/starred/forked projects within the .NET Foundation:
- Roslyn - The .NET Compiler Platform (“Roslyn”) provides open-source C# and Visual Basic compilers with rich code analysis APIs.
- CoreCLR - the .NET Core runtime, called CoreCLR, and the base library, called mscorlib. It includes the garbage collector, JIT compiler, base .NET data types and many low-level classes.
- CoreFX - the .NET Core foundational libraries, called CoreFX. It includes classes for collections, file systems, console, XML, async and many others.
GitHub itself has some nice built-in graphs; for instance, you can see the Commits per Month over an entire year:
You can also get a nice dashboard showing the Monthly Pulse.
However to answer the question above, I needed more data. Fortunately GitHub provides a really comprehensive API, which combined with the excellent Octokit.net library and the brilliant LINQPad, meant I was able to easily get all the data I needed. Here’s a sample LINQPad script if you want to start playing around with the API yourself.
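For readers who don't use LINQPad, here is a rough Python equivalent of the data-gathering step. It is only a sketch, not the script I used: it calls the public `GET /repos/{owner}/{repo}/issues` endpoint with the standard library, and the function names and the `max_pages` cap are made up for illustration. Unauthenticated requests are heavily rate limited, so a real run would need an access token.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.github.com"


def issues_url(owner, repo, page, per_page=100):
    """Build the URL for one page of a repository's issues (open and closed)."""
    query = urllib.parse.urlencode(
        {"state": "all", "per_page": per_page, "page": page}
    )
    return f"{API_ROOT}/repos/{owner}/{repo}/issues?{query}"


def fetch_all_issues(owner, repo, max_pages=5):
    """Fetch up to max_pages pages of issues, stopping early on an empty page.

    max_pages is an illustrative safety cap, not something the API requires.
    """
    items = []
    for page in range(1, max_pages + 1):
        request = urllib.request.Request(
            issues_url(owner, repo, page),
            headers={"Accept": "application/vnd.github+json"},
        )
        with urllib.request.urlopen(request) as response:
            batch = json.load(response)
        if not batch:
            break
        items.extend(batch)
    return items


def split_issues_and_pull_requests(items):
    """The issues endpoint also returns pull requests, marked by a 'pull_request' key."""
    issues = [item for item in items if "pull_request" not in item]
    pull_requests = [item for item in items if "pull_request" in item]
    return issues, pull_requests
```

Calling `fetch_all_issues("dotnet", "roslyn")` and passing the result to `split_issues_and_pull_requests` gives two lists that can be analysed separately.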
However, knowing the “# of Issues” or “Merged Pull Requests” per month on its own isn’t that useful, because it doesn’t tell us anything about who created the issue or submitted the PR. Fortunately GitHub classifies users into categories. For instance, in the image below from Roslyn Issue #670 we can see what type of user posted each comment: an “Owner”, a “Collaborator”, or blank, which signifies a “Community” member, i.e. someone who (AFAICT) doesn’t work at Microsoft.
So now that we can get the data we need, what results do we get?
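As a rough illustration of that classification, the sketch below buckets items by the `author_association` value that the GitHub REST API currently reports on issues and comments. Treating `MEMBER` as a “Collaborator” and everything else as “Community” is my assumption for the example, and the helper names are hypothetical.

```python
from collections import Counter

# Assumption: organisation members are grouped with collaborators,
# and any other association (CONTRIBUTOR, NONE, ...) counts as Community.
COLLABORATOR_ASSOCIATIONS = {"MEMBER", "COLLABORATOR"}


def classify_author(item):
    """Map an issue, PR or comment to 'Owner', 'Collaborator' or 'Community'."""
    association = item.get("author_association", "NONE")
    if association == "OWNER":
        return "Owner"
    if association in COLLABORATOR_ASSOCIATIONS:
        return "Collaborator"
    return "Community"


def count_by_author_type(items):
    """Count how many items each type of author created."""
    return Counter(classify_author(item) for item in items)
```

Feeding this a list of issues gives the raw numbers behind an “issues opened by author type” comparison.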
Here you can see that the Owners and Collaborators do in some cases dominate, e.g. in Roslyn, where almost 60% of the issues were opened by them. But in other cases the Community is very active, especially in CoreCLR, where Community members are opening more issues than Owners and Collaborators combined. Part of the reason for this is the nature of the different repositories: CoreCLR is the most visible part of the .NET framework as it encompasses most of the libraries that .NET developers would use on a day-to-day basis, so it’s not surprising that the Community has lots of suggestions for improvements or bug fixes. In addition, CoreCLR has been around for a much longer time, so the Community has had more time to use it and find the parts it doesn’t like. Roslyn, by contrast, is a much newer project, so there has been less time to use it, and finding bugs in a compiler is by its nature harder to do.
However, if we look at Merged Pull Requests, we can see that the overall amount of Community contributions across the 3 projects is much lower, only accounting for roughly 12%. This isn’t that surprising, though, as there’s a much higher bar for getting a pull request accepted. Firstly, if the project is using this mechanism, you have to pick an issue that is “up for grabs”; then you have to get any API changes through a review; then finally you have to address any compatibility/performance/correctness issues that come up during the code review itself. So 12% is actually a pretty good result, as there is a non-trivial amount of work involved in getting your PR merged, especially considering most Community members will be working in their spare time.
Update: I was wrong about the “up for grabs” requirement, see this comment from David Kean and this tweet for more information. “Up for grabs” is a guideline meant to help new users, but it is not a requirement; you can submit PRs for issues that don’t have that label.
Finally, if you look at the amount per month (see the 2 graphs below, click for larger images), it’s hard to pick out any definite trends or say whether the Community is contributing more or less over time. But you can say that over a year the Community has consistently contributed, and it doesn’t look like that contribution is going to end. It is not just an initial burst that happened straight after the projects were open sourced; it is a sustained level of contributions over an entire year.
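A monthly breakdown like the one described above could be computed along these lines. This is a sketch under assumptions: each record is a plain dict with an ISO 8601 `merged_at` timestamp (empty or missing if the PR was never merged) and a hypothetical `author_type` field that has already been set to “Owner”, “Collaborator” or “Community”.

```python
from collections import defaultdict


def community_share_by_month(pull_requests):
    """Return {'YYYY-MM': percentage of merged PRs authored by the Community}."""
    totals = defaultdict(lambda: [0, 0])  # month -> [community merged, all merged]
    for pr in pull_requests:
        merged_at = pr.get("merged_at")
        if not merged_at:
            continue  # closed without being merged, or still open
        month = merged_at[:7]  # '2015-03-14T09:30:00Z' -> '2015-03'
        totals[month][1] += 1
        if pr["author_type"] == "Community":
            totals[month][0] += 1
    return {
        month: 100.0 * community / merged
        for month, (community, merged) in sorted(totals.items())
    }
```

Plotting the resulting percentages month by month is one way to check whether the contribution level is sustained or just an initial burst.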
The last thing that I want to do whilst I have the data is to take a look at the most popular Issue Labels and see what they tell us about the type of work that has been going on since the 3 projects were open sourced.
Here are a few observations about the results:
- Having CodeGen so high on the list is not that surprising considering that RyuJIT - the next-gen .NET JIT Compiler was only released 2 years ago. However, it’s a bit worrying that there were so many issues, especially considering that some of them have severe consequences, as the devs at Stack Overflow found out! (On a related note, if you want to find out lots of low-level details about what the JIT does, just take a look at all the issues that @MikeDN has commented on. Unbelievably for someone with that much knowledge, he doesn’t actually work on the product itself, or even on another team at Microsoft!!)
- It’s nice to see that all 3 projects have lots of “Up for Grabs” issues, see Roslyn, CoreCLR and CoreFX, plus the Community seems to be grabbing them back!
- Finally, I love the fact that Performance and Optimisation are being taken seriously, after all Performance is a Feature!!
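If you want to reproduce a label ranking like the one discussed above, a minimal sketch follows. It assumes each issue is a dict shaped like the GitHub API response, where `labels` is a list of objects that each have a `name`; the function name and the `top_n` parameter are made up for the example.

```python
from collections import Counter


def most_common_labels(issues, top_n=10):
    """Return the top_n (label name, count) pairs across all the given issues."""
    counts = Counter(
        label["name"]
        for issue in issues
        for label in issue.get("labels", [])
    )
    return counts.most_common(top_n)
```

Running this per repository, rather than over all 3 projects at once, keeps repository-specific labels such as CodeGen from being drowned out.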