"Stack Walking" in the .NET Runtime

21 Jan 2019 - 6375 words

What is ‘stack walking’, well as always the ‘Book of the Runtime’ (BotR) helps us, from the relevant page:

The CLR makes heavy use of a technique known as stack walking (or stack crawling). This involves iterating the sequence of call frames for a particular thread, from the most recent (the thread’s current function) back down to the base of the stack.

The runtime uses stack walks for a number of purposes:

The runtime walks the stacks of all threads during garbage collection, looking for managed roots (local variables holding object references in the frames of managed methods that need to be reported to the GC to keep the objects alive and possibly track their movement if the GC decides to compact the heap).

On some platforms the stack walker is used during the processing of exceptions (looking for handlers in the first pass and unwinding the stack in the second).

The debugger uses the functionality when generating managed stack traces.

Various miscellaneous methods, usually those close to some public managed API, perform a stack walk to pick up information about their caller (such as the method, class or assembly of that caller).

The rest of this post will explore what ‘Stack Walking’ is, how it works and why so many parts of the runtime need to be involved.

Table of Contents

Where does the CLR use ‘Stack Walking’?
The ‘Stack Walking’ API
Unwinding ‘Native’ Code
Unwinding ‘JITted’ Code
- Help from the ‘JIT Compiler’
Further Reading
- Stack Unwinding (general)
- Stack Unwinding (other runtimes)

Where does the CLR use ‘Stack Walking’?

Before we dig into the ‘internals’, let’s take a look at where the runtime utilises ‘stack walking’, below is the full list (as of .NET Core CLR ‘Release 2.2’). All these examples end up calling into the Thread::StackWalkFrames(..) method here and provide a callback that is triggered whenever the API encounters a new section of the stack (see How to use it below for more info).

Common Scenarios

Garbage Collection (GC)
- ScanStackRoots(..) here -> callback
Exception Handling (unwinding)
- x86 - UnwindFrames(..) here -> callback
- x64 - ResetThreadAbortState(..) here -> callback
Exception Handling (resumption):
- ExceptionTracker::FindNonvolatileRegisterPointers(..) here -> callback
- ExceptionTracker::RareFindParentStackFrame(..) here -> callback
Threads:
- Thread::IsRunningIn(..) (AppDomain) here -> callback
- Thread::DetectHandleILStubsForDebugger(..) here -> callback
Thread Suspension:
- Thread::IsExecutingWithinCer() (‘Constrained Execution Region’) here (wrapper and callback)
- Thread::HandledJITCase(..) here -> callback, alternative callback

Debugging/Diagnostics

Debugger
- DebuggerWalkStack(..) here -> callback
- DebuggerWalkStackProc() here (called from DebuggerWalkStack(..)) -> callback
Managed APIs (e.g System.Diagnostics.StackTrace)
- Managed code calls via an InternalCall (C#) here into DebugStackTrace::GetStackFramesInternal(..) (C++) here
- Before ending up in DebugStackTrace::GetStackFramesHelper(..) here -> callback
DAC (via by SOS) - Scan for GC ‘Roots’
- DacStackReferenceWalker::WalkStack<..>(..) here -> callback
Profiling API
- ProfToEEInterfaceImpl::ProfilerStackWalkFramesWrapper(..) here -> callback
Event Pipe (Diagnostics)
- EventPipe::WalkManagedStackForThread(..) here -> callback
CLR prints a Stack Trace (to the console/log, DEBUG builds only)
- PrintStackTrace() here (and other functions) -> callback

Obscure Scenarios

Reflection
- RuntimeMethodHandle::GetCurrentMethod(..) here (callback)
Application (App) Domains (See ‘Stack Crawl Marks’ below)
- SystemDomain::GetCallersMethod(..) here (also GetCallersType(..) and GetCallersModule(..)) (callback)
- SystemDomain::GetCallersModule(..) here (callback)
‘Code Pitching’
- CheckStacksAndPitch() here (wrapper and callback)
Extensible Class Factory (System.Runtime.InteropServices.ExtensibleClassFactory)
- RegisterObjectCreationCallback(..) here (callback)
Stack Sampler (unused?)
- StackSampler::ThreadProc() here (wrapper and callback)

Stack Crawl Marks

One of the above scenarios deserves a closer look, but firstly why are ‘stack crawl marks’ used, from coreclr/issues/#21629 (comment):

Unfortunately, there is a ton of legacy APIs that were added during netstandard2.0 push whose behavior depend on the caller. The caller is basically passed in as an implicit argument to the API. Most of these StackCrawlMarks are there to support these APIs…

So we can see that multiple functions within the CLR itself need to have knowledge of their caller. To understand this some more, let’s look an example, the GetType(string typeName) method. Here’s the flow from the externally-visible method all the way down to where the work is done, note how a StackCrawlMark instance is passed through:

Type::GetType(string typeName) implementation (Creates StackCrawlMark.LookForMyCaller)
RuntimeType::GetType(.., ref StackCrawlMark stackMark) implementation
RuntimeType::GetTypeByName(.., ref StackCrawlMark stackMark, ..) implementation
extern void GetTypeByName(.., ref StackCrawlMark stackMark, ..) definition (call into native code, i.e. [DllImport(JitHelpers.QCall, ..)])
RuntimeTypeHandle::GetTypeByName(.., QCall::StackCrawlMarkHandle pStackMark, ..) implementation
TypeHandle TypeName::GetTypeManaged(.., StackCrawlMark* pStackMark, ..) implementation
TypeHandle TypeName::GetTypeWorker(.. , StackCrawlMark* pStackMark, ..) implementation
SystemDomain::GetCallersAssembly(StackCrawlMark *stackMark,..) implementation
SystemDomain::GetCallersModule(StackCrawlMark* stackMark, ..) implementation
SystemDomain::CallersMethodCallbackWithStackMark(..) callback implementation

In addition the JIT (via the VM) has to ensure that all relevant methods are available in the call-stack, i.e. they can’t be removed:

Prevent in-lining CEEInfo::canInline(..) implementation
Prevent removal via a ‘tail call’ CEEInfo::canTailCall(..) implementation

However, the StackCrawlMark feature is currently being cleaned up, so it may look different in the future:

Exception Handling

The place that most .NET Developers will run into ‘stack traces’ is when dealing with exceptions. I originally intended to also describe ‘exception handling’ here, but then I opened up /src/vm/exceptionhandling.cpp and saw that it contained over 7,000 lines of code!! So I decided that it can wait for a future post 😁.

However, if you want to learn more about the ‘internals’ I really recommend Chris Brumme’s post The Exception Model (2003) which is the definitive guide on the topic (also see his Channel9 Videos) and as always, the ‘BotR’ chapter ‘What Every (Runtime) Dev needs to Know About Exceptions in the Runtime’ is well worth a read.

Also, I recommend talking a look at the slides from the ‘Internals of Exceptions’ talk’ and the related post .NET Inside Out Part 2 — Handling and rethrowing exceptions in C# both by Adam Furmanek.

The ‘Stack Walking’ API

Now that we’ve seen where it’s used, let’s look at the ‘stack walking’ API itself. Firstly, how is it used?

How to use it

It’s worth pointing out that the only way you can access it from C#/F#/VB.NET code is via the StackTrace class, only the runtime itself can call into Thread::StackWalkFrames(..) directly. The simplest usage in the runtime is EventPipe::WalkManagedStackForThread(..) (see here), which is shown below. As you can see it’s as simple as specifying the relevant flags, in this case ALLOW_ASYNC_STACK_WALK | FUNCTIONSONLY | HANDLESKIPPEDFRAMES | ALLOW_INVALID_OBJECTS and then providing the callback, which in the EventPipe class is the StackWalkCallback method (here)

bool EventPipe::WalkManagedStackForThread(Thread *pThread, StackContents &stackContents)
{
    CONTRACTL
    {
        NOTHROW;
        GC_NOTRIGGER;
        MODE_ANY;
        PRECONDITION(pThread != NULL);
    }
    CONTRACTL_END;

    // Calling into StackWalkFrames in preemptive mode violates the host contract,
    // but this contract is not used on CoreCLR.
    CONTRACT_VIOLATION( HostViolation );

    stackContents.Reset();

    StackWalkAction swaRet = pThread->StackWalkFrames(
        (PSTACKWALKFRAMESCALLBACK) &StackWalkCallback,
        &stackContents,
        ALLOW_ASYNC_STACK_WALK | FUNCTIONSONLY | HANDLESKIPPEDFRAMES | ALLOW_INVALID_OBJECTS);

    return ((swaRet == SWA_DONE) || (swaRet == SWA_CONTINUE));
}

The StackWalkFrame(..) function then does the heavy-lifting of actually walking the stack, before triggering the callback shown below. In this case it just records the ‘Instruction Pointer’ (IP/CP) and the ‘managed function’, which is an instance of the MethodDesc obtained via the pCf->GetFunction() call:

StackWalkAction EventPipe::StackWalkCallback(CrawlFrame *pCf, StackContents *pData)
{
    CONTRACTL
    {
        NOTHROW;
        GC_NOTRIGGER;
        MODE_ANY;
        PRECONDITION(pCf != NULL);
        PRECONDITION(pData != NULL);
    }
    CONTRACTL_END;

    // Get the IP.
    UINT_PTR controlPC = (UINT_PTR)pCf->GetRegisterSet()->ControlPC;
    if (controlPC == 0)
    {
        if (pData->GetLength() == 0)
        {
            // This happens for pinvoke stubs on the top of the stack.
            return SWA_CONTINUE;
        }
    }

    _ASSERTE(controlPC != 0);

    // Add the IP to the captured stack.
    pData->Append(controlPC, pCf->GetFunction());

    // Continue the stack walk.
    return SWA_CONTINUE;
}

How it works

Now onto the most interesting part, how to the runtime actually walks the stack. Well, first let’s understand what the stack looks like, from the ‘BotR’ page:

Stack Description from BotR

The main thing to note is that a .NET ‘stack’ can contain 3 types of methods:

Managed - this represents code that started off as C#/F#/VB.NET, was turned into IL and then finally compiled to native code by the ‘JIT Compiler’.
Unmanaged - completely native code that exists outside of the runtime, i.e. a OS function the runtime calls into or a user call via P/Invoke. The runtime only cares about transitions into or out of regular unmanaged code, is doesn’t care about the stack frame within it.
Runtime Managed - still native code, but this is slightly different because the runtime case more about this code. For example there are quite a few parts of the Base-Class libraries that make use of InternalCall methods, for more on this see the ‘Helper Method’ Frames section later on.

So the ‘stack walk’ has to deal with these different scenarios as it proceeds. Now let’s look at the ‘code flow’ starting with the entry-point method StackWalkFrames(..):

Thread::StackWalkFrames(..) here
- the entry-point function, the type of ‘stack walk’ can be controlled via these flags
Thread::StackWalkFramesEx(..) here
- worker-function that sets up the StackFrameIterator, via a call to StackFrameIterator::Init(..) here
StackFrameIterator::Next() here, then hands off to the primary worker method StackFrameIterator::NextRaw() here that does 5 things:
1. CheckForSkippedFrames(..) here, deals with frames that may have been allocated inside a managed stack frame (e.g. an inlined p/invoke call).
2. UnwindStackFrame(..) here, in-turn calls:
  - x64 - Thread::VirtualUnwindCallFrame(..) here, then calls VirtualUnwindNonLeafCallFrame(..) here or VirtualUnwindLeafCallFrame(..) here. All of of these functions make use of the Windows API function RtlLookupFunctionEntry(..) to do the actual unwinding.
  - x86 - ::UnwindStackFrame(..) here, in turn calls UnwindEpilog(..) here and UnwindEspFrame(..) here. Unlike x64, under x86 all the ‘stack-unwinding’ is done manually, within the CLR code.
3. PostProcessingForManagedFrames(..) here, determines if the stack-walk is actually within a managed method rather than a native frame.
4. ProcessIp(..) here has the job of looking up the current managed method (if any) based on the current instruction pointer (IP). It does this by calling into EECodeInfo::Init(..) here and then ends up in one of:
  - EEJitManager::JitCodeToMethodInfo(..) here, that uses a very cool looking data structure refereed to as a ‘nibble map’
  - NativeImageJitManager::JitCodeToMethodInfo(..) here
  - ReadyToRunJitManager::JitCodeToMethodInfo(..) here
5. ProcessCurrentFrame(..) here, does some final house-keeping and tidy-up.
CrawlFrame::GotoNextFrame() here
- in-turn calls pFrame->Next() here to walk through the ‘linked list’ of frames which drive the ‘stack walk’ (more on these ‘frames’ later)
StackFrameIterator::Filter() here
- essentially a huge switch statement that handles all the different Frame States and decides whether or not the ‘stack walk’ should continue.

When it gets a valid frame it triggers the callback in Thread::MakeStackwalkerCallback(..) here and passes in a pointer to the current CrawlFrame class defined here, this exposes methods such as IsFrameless(), GetFunction() and GetThisPointer(). The CrawlFrame actually represents 2 scenarios, based on the current IP:

Native code, represented by a Frame class defined here, which we’ll discuss more in a moment.
Managed code, well technically ‘managed code’ that was JITted to ‘native code’, so more accurately a managed stack frame. In this situation the MethodDesc class defined here is provided, you can read more about this key CLR data-structure in the corresponding BotR chapter.

See it ‘in Action’

Fortunately we’re able to turn on some nice diagnostics in a debug build of the CLR (COMPLUS_LogEnable, COMPLUS_LogToFile & COMPLUS_LogFacility). With that in place, given C# code like this:

internal class Program {
    private static void Main() {
        MethodA();
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private void MethodA() {
        MethodB();
    }
    
    [MethodImpl(MethodImplOptions.NoInlining)]
    private void MethodB() {
        MethodC();
    }
    
    [MethodImpl(MethodImplOptions.NoInlining)]
    private void MethodC() {
        var stackTrace = new StackTrace(fNeedFileInfo: true);
        Console.WriteLine(stackTrace.ToString());
    }
}

We get the output shown below, in which you can see the ‘stack walking’ process. It starts in InitializeSourceInfo and CaptureStackTrace which are methods internal to the StackTrace class (see here), before moving up the stack MethodC -> MethodB -> MethodA and then finally stopping in the Main function. Along the way its does a ‘FILTER’ and ‘CONSIDER’ step before actually unwinding (‘finished unwind for …’):

TID 4740: STACKWALK    starting with partial context
TID 4740: STACKWALK: [000] FILTER  : EXPLICIT : PC= 00000000`00000000  SP= 00000000`00000000  Frame= 00000002`9977cc48  vtbl= 00007ffd`74a105b0 
TID 4740: STACKWALK: [001] CONSIDER: EXPLICIT : PC= 00000000`00000000  SP= 00000000`00000000  Frame= 00000002`9977cc48  vtbl= 00007ffd`74a105b0 
TID 4740: STACKWALK: [001] FILTER  : EXPLICIT : PC= 00000000`00000000  SP= 00000000`00000000  Frame= 00000002`9977cc48  vtbl= 00007ffd`74a105b0 
TID 4740: STACKWALK: [002] CONSIDER: EXPLICIT : PC= 00000000`00000000  SP= 00000000`00000000  Frame= 00000002`9977cdd8  vtbl= 00007ffd`74995220 
TID 4740: STACKWALK    LazyMachState::unwindLazyState(ip:00007FFD7439C45C,sp:000000029977C338)
TID 4740: STACKWALK: [002] CALLBACK: EXPLICIT : PC= 00000000`00000000  SP= 00000000`00000000  Frame= 00000002`9977cdd8  vtbl= 00007ffd`74995220 
TID 4740: STACKWALK    HelperMethodFrame::UpdateRegDisplay cached ip:00007FFD72FE9258, sp:000000029977D300
TID 4740: STACKWALK: [003] CONSIDER: FRAMELESS: PC= 00007ffd`72fe9258  SP= 00000002`9977d300  method=InitializeSourceInfo 
TID 4740: STACKWALK: [003] CALLBACK: FRAMELESS: PC= 00007ffd`72fe9258  SP= 00000002`9977d300  method=InitializeSourceInfo 
TID 4740: STACKWALK: [004] about to unwind for 'InitializeSourceInfo', SP: 00000002`9977d300 , IP: 00007ffd`72fe9258 
TID 4740: STACKWALK: [004] finished unwind for 'InitializeSourceInfo', SP: 00000002`9977d480 , IP: 00007ffd`72eeb671 
TID 4740: STACKWALK: [004] CONSIDER: FRAMELESS: PC= 00007ffd`72eeb671  SP= 00000002`9977d480  method=CaptureStackTrace 
TID 4740: STACKWALK: [004] CALLBACK: FRAMELESS: PC= 00007ffd`72eeb671  SP= 00000002`9977d480  method=CaptureStackTrace 
TID 4740: STACKWALK: [005] about to unwind for 'CaptureStackTrace', SP: 00000002`9977d480 , IP: 00007ffd`72eeb671 
TID 4740: STACKWALK: [005] finished unwind for 'CaptureStackTrace', SP: 00000002`9977d5b0 , IP: 00007ffd`72eeadd0 
TID 4740: STACKWALK: [005] CONSIDER: FRAMELESS: PC= 00007ffd`72eeadd0  SP= 00000002`9977d5b0  method=.ctor 
TID 4740: STACKWALK: [005] CALLBACK: FRAMELESS: PC= 00007ffd`72eeadd0  SP= 00000002`9977d5b0  method=.ctor 
TID 4740: STACKWALK: [006] about to unwind for '.ctor', SP: 00000002`9977d5b0 , IP: 00007ffd`72eeadd0 
TID 4740: STACKWALK: [006] finished unwind for '.ctor', SP: 00000002`9977d5f0 , IP: 00007ffd`14c620d3 
TID 4740: STACKWALK: [006] CONSIDER: FRAMELESS: PC= 00007ffd`14c620d3  SP= 00000002`9977d5f0  method=MethodC 
TID 4740: STACKWALK: [006] CALLBACK: FRAMELESS: PC= 00007ffd`14c620d3  SP= 00000002`9977d5f0  method=MethodC 
TID 4740: STACKWALK: [007] about to unwind for 'MethodC', SP: 00000002`9977d5f0 , IP: 00007ffd`14c620d3 
TID 4740: STACKWALK: [007] finished unwind for 'MethodC', SP: 00000002`9977d630 , IP: 00007ffd`14c62066 
TID 4740: STACKWALK: [007] CONSIDER: FRAMELESS: PC= 00007ffd`14c62066  SP= 00000002`9977d630  method=MethodB 
TID 4740: STACKWALK: [007] CALLBACK: FRAMELESS: PC= 00007ffd`14c62066  SP= 00000002`9977d630  method=MethodB 
TID 4740: STACKWALK: [008] about to unwind for 'MethodB', SP: 00000002`9977d630 , IP: 00007ffd`14c62066 
TID 4740: STACKWALK: [008] finished unwind for 'MethodB', SP: 00000002`9977d660 , IP: 00007ffd`14c62016 
TID 4740: STACKWALK: [008] CONSIDER: FRAMELESS: PC= 00007ffd`14c62016  SP= 00000002`9977d660  method=MethodA 
TID 4740: STACKWALK: [008] CALLBACK: FRAMELESS: PC= 00007ffd`14c62016  SP= 00000002`9977d660  method=MethodA 
TID 4740: STACKWALK: [009] about to unwind for 'MethodA', SP: 00000002`9977d660 , IP: 00007ffd`14c62016 
TID 4740: STACKWALK: [009] finished unwind for 'MethodA', SP: 00000002`9977d690 , IP: 00007ffd`14c61f65 
TID 4740: STACKWALK: [009] CONSIDER: FRAMELESS: PC= 00007ffd`14c61f65  SP= 00000002`9977d690  method=Main 
TID 4740: STACKWALK: [009] CALLBACK: FRAMELESS: PC= 00007ffd`14c61f65  SP= 00000002`9977d690  method=Main 
TID 4740: STACKWALK: [00a] about to unwind for 'Main', SP: 00000002`9977d690 , IP: 00007ffd`14c61f65 
TID 4740: STACKWALK: [00a] finished unwind for 'Main', SP: 00000002`9977d6d0 , IP: 00007ffd`742f9073 
TID 4740: STACKWALK: [00a] FILTER  : NATIVE   : PC= 00007ffd`742f9073  SP= 00000002`9977d6d0 
TID 4740: STACKWALK: [00b] CONSIDER: EXPLICIT : PC= 00007ffd`742f9073  SP= 00000002`9977d6d0  Frame= 00000002`9977de58  vtbl= 00007ffd`74a105b0 
TID 4740: STACKWALK: [00b] FILTER  : EXPLICIT : PC= 00007ffd`742f9073  SP= 00000002`9977d6d0  Frame= 00000002`9977de58  vtbl= 00007ffd`74a105b0 
TID 4740: STACKWALK: [00c] CONSIDER: EXPLICIT : PC= 00007ffd`742f9073  SP= 00000002`9977d6d0  Frame= 00000002`9977e7e0  vtbl= 00007ffd`74a105b0 
TID 4740: STACKWALK: [00c] FILTER  : EXPLICIT : PC= 00007ffd`742f9073  SP= 00000002`9977d6d0  Frame= 00000002`9977e7e0  vtbl= 00007ffd`74a105b0 
TID 4740: STACKWALK: SWA_DONE: reached the end of the stack

To find out more, you can search for these diagnostic message in \vm\stackwalk.cpp, e.g. in Thread::DebugLogStackWalkInfo(..) here

Unwinding ‘Native’ Code

As explained in this excellent article:

There are fundamentally two main ways to implement exception propagation in an ABI (Application Binary Interface):

“dynamic registration”, with frame pointers in each activation record, organized as a linked list. This makes stack unwinding fast at the expense of having to set up the frame pointer in each function that calls other functions. This is also simpler to implement.

“table-driven”, where the compiler and assembler create data structures alongside the program code to indicate which addresses of code correspond to which sizes of activation records. This is called “Call Frame Information” (CFI) data in e.g. the GNU tool chain. When an exception is generated, the data in this table is loaded to determine how to unwind. This makes exception propagation slower but the general case faster.

It turns out that .NET uses the ‘table-driven’ approach, for the reason explained in the ‘BotR’:

The exact definition of a frame varies from platform to platform and on many platforms there isn’t a hard definition of a frame format that all functions adhere to (x86 is an example of this). Instead the compiler is often free to optimize the exact format of frames. On such systems it is not possible to guarantee that a stackwalk will return 100% correct or complete results (for debugging purposes, debug symbols such as pdbs are used to fill in the gaps so that debuggers can generate more accurate stack traces).

This is not a problem for the CLR, however, since we do not require a fully generalized stack walk. Instead we are only interested in those frames that are managed (i.e. represent a managed method) or, to some extent, frames coming from unmanaged code used to implement part of the runtime itself. In particular there is no guarantee about fidelity of 3rd party unmanaged frames other than to note where such frames transition into or out of the runtime itself (i.e. one of the frame types we do care about).

Frames

To enable ‘unwinding’ of native code or more strictly the transitions ‘into’ and ‘out of’ native code, the CLR uses a mechanism of Frames, which are defined in the source code here. These frames are arranged into a hierachy and there is one type of Frame for each scenario, for more info on these individual Frames take a look at the excellent source-code comments here.

Frame (abstract/base class)
- GCFrame
- FaultingExceptionFrame
- HijackFrame
- ResumableFrame
  - RedirectedThreadFrame
- InlinedCallFrame
- HelperMethodFrame
  - HelperMethodFrame_1OBJ
  - HelperMethodFrame_2OBJ
  - HelperMethodFrame_3OBJ
  - HelperMethodFrame_PROTECTOBJ
- TransitionFrame
  - StubHelperFrame
  - SecureDelegateFrame
    - MulticastFrame
  - FramedMethodFrame
    - ComPlusMethodFrame
    - PInvokeCalliFrame
    - PrestubMethodFrame
    - StubDispatchFrame
    - ExternalMethodFrame
    - TPMethodFrame
- UnmanagedToManagedFrame
  - ComMethodFrame
    - ComPrestubMethodFrame
  - UMThkCallFrame
- ContextTransitionFrame
- TailCallFrame
- ProtectByRefsFrame
- ProtectValueClassFrame
- DebuggerClassInitMarkFrame
- DebuggerSecurityCodeMarkFrame
- DebuggerExitFrame
- DebuggerU2MCatchHandlerFrame
- FuncEvalFrame
- ExceptionFilterFrame

‘Helper Method’ Frames

But to make sense of this, let’s look at one type of Frame, known as HelperMethodFrame (above). This is used when .NET code in the runtime calls into C++ code to do the heavy-lifting, often for performance reasons. One example is if you call Environment.GetCommandLineArgs() you end up in this code (C#), but note that it ends up calling an extern method marked with InternalCall:

[MethodImplAttribute(MethodImplOptions.InternalCall)]
private static extern string[] GetCommandLineArgsNative();

This means that the rest of the method is implemented in the runtime in C++, you can see how the method call is wired up, before ending up SystemNative::GetCommandLineArgs here, which is shown below:

FCIMPL0(Object*, SystemNative::GetCommandLineArgs)
{
    FCALL_CONTRACT;

    PTRARRAYREF strArray = NULL;

    HELPER_METHOD_FRAME_BEGIN_RET_1(strArray); // <-- 'Helper method Frame' started here

    // Error handling and setup code removed for clarity

    strArray = (PTRARRAYREF) AllocateObjectArray(numArgs, g_pStringClass);
    // Copy each argument into new Strings.
    for(unsigned int i=0; i<numArgs; i++)
    {
        STRINGREF str = StringObject::NewString(argv[i]);
        STRINGREF * destData = ((STRINGREF*)(strArray->GetDataPtr())) + i;
        SetObjectReference((OBJECTREF*)destData, (OBJECTREF)str, strArray->GetAppDomain());
    }
    delete [] argv;

    HELPER_METHOD_FRAME_END(); // <-- 'Helper method Frame' ended/closed here

    return OBJECTREFToObject(strArray);
}
FCIMPLEND

Note: this code makes heavy use of macros, see this gist for the original code and then the expanded versions (Release and Debug). In addition, if you want more information on these mysterious FCalls as they are known (and the related QCalls) see Mscorlib and Calling Into the Runtime in the ‘BotR’.

But the main thing to look at in the code sample is the HELPER_METHOD_FRAME_BEGIN_RET_1() macro, with ultimately installs an instance of the HelperMethodFrame_1OBJ class. The macro expands into code like this:

FrameWithCookie < HelperMethodFrame_1OBJ > __helperframe(__me, Frame::FRAME_ATTR_NONE, (OBJECTREF * ) & strArray); 
{
  __helperframe.Push(); // <-- 'Helper method Frame' pushed

  Thread * CURRENT_THREAD = __helperframe.GetThread();
  const bool CURRENT_THREAD_AVAILABLE = true;
  (void) CURRENT_THREAD_AVAILABLE;; {
	Exception * __pUnCException = 0;
	Frame * __pUnCEntryFrame = ( & __helperframe);
	bool __fExceptionCatched = false;;
	try {;

	  // Original code from SystemNative::GetCommandLineArgs goes in here

	} catch (Exception * __pException) {;
	  do {} while (0);
	  __pUnCException = __pException;
	  UnwindAndContinueRethrowHelperInsideCatch(__pUnCEntryFrame, __pUnCException);
	  __fExceptionCatched = true;;
	}
	if (__fExceptionCatched) {;
	  UnwindAndContinueRethrowHelperAfterCatch(__pUnCEntryFrame, __pUnCException);
	}
  };
  
  __helperframe.Pop(); // <-- 'Helper method Frame' popped
};

Note: the Push() and Pop() against _helperMethodFrame that make it available for ‘stack walking’. You can also see the try/catch block that the CLR puts in place to ensure any exceptions from native code are turned into managed exceptions that C#/F#/VB.NET code can handle. If you’re interested the full macro-expansion is available in this gist.

So in summary, these Frames are pushed onto a ‘linked list’ when calling into native code and popped off the list when returning from native code. This means that are any moment the ‘linked list’ contains all the current or active Frames.

Native Unwind Information

In addition to creating ‘Frames’, the CLR also ensures that the C++ compiler emits ‘unwind info’ for native code. We can see this if we use the DUMPBIN tool and run dumpbin /UNWINDINFO coreclr.dll. We get the following output for SystemNative::GetCommandLineArgs(..) (that we looked at before):

  0002F064 003789B0 00378B7E 004ED1D8  ?GetCommandLineArgs@SystemNative@@SAPEAVObject@@XZ (public: static class Object * __cdecl SystemNative::GetCommandLineArgs(void))
    Unwind version: 1
    Unwind flags: EHANDLER UHANDLER
    Size of prologue: 0x3B
    Count of codes: 13
    Unwind codes:
      29: SAVE_NONVOL, register=r12 offset=0x1C8
      25: SAVE_NONVOL, register=rdi offset=0x1C0
      21: SAVE_NONVOL, register=rsi offset=0x1B8
      1D: SAVE_NONVOL, register=rbx offset=0x1B0
      10: ALLOC_LARGE, size=0x190
      09: PUSH_NONVOL, register=r15
      07: PUSH_NONVOL, register=r14
      05: PUSH_NONVOL, register=r13
    Handler: 00148F14 __GSHandlerCheck_EH
    EH Handler Data: 00415990
    GS Unwind flags: EHandler UHandler
    Cookie Offset: 00000180

  0002F070 00378B7E 00378BB4 004ED26C
    Unwind version: 1
    Unwind flags: EHANDLER UHANDLER
    Size of prologue: 0x0A
    Count of codes: 2
    Unwind codes:
      0A: ALLOC_SMALL, size=0x20
      06: PUSH_NONVOL, register=rbp
    Handler: 0014978C __CxxFrameHandler3
    EH Handler Data: 00415990

If you want to understand more of what’s going on here I really recommend reading the excellent article x64 Manual Stack Reconstruction and Stack Walking. But in essence the ‘unwind info’ describes which registers are used within a method and how big stack is for that method. These pieces of information are enough to tell the runtime how to ‘unwind’ that particular method when walking the stack.

Differences between Windows and Unix

However, to further complicate things, the ‘native code unwinding’ uses a different mechanism for ‘Windows’ v. ‘Unix’, as explained in coreclr/issues/#177 (comment):

Stack walker for managed code. JIT will generate regular Windows style unwinding info. We will reuse Windows unwinder code that we currently have checked in for debugger components for unwinding calls in managed code on Linux/Mac. Unfortunately, this work requires changes in the runtime that currently cannot be tested in the CoreCLR repo so it is hard to do this in the public right now. But we are working on fixing that because, as I mentioned at the beginning, our goal is do most work in the public.

Stack walker for native code. Here, in addition to everything else, we need to allow GC to unwind native stack of any thread in the current process until it finds a managed frame. Currently we are considering using libunwind (http://www.nongnu.org/libunwind) for unwinding native call stacks. @janvorli did some prototyping/experiments and it seems to do what we need. If you have any experience with this library or have any comments/suggestions please let us know.

This also shows that there are 2 different ‘unwind’ mechanisms for ‘managed’ or ‘native’ code, we will discuss how the “stack walker for managed code” works in Unwinding ‘JITted’ Code.

There is also some more information in coreclr/issues/#177 (comment):

My current work has two parts, as @sergiy-k has already mentioned. The windows style unwinder that will be used for the jitted code and Unix unwinder for native code that uses the libunwind’s low level unw_xxxx functions like unw_step etc.

So, for ‘native code’ the runtime uses an OS specific mechanism, i.e. on Unix the Open Source ‘libunwind’ library is used. You can see the differences in the code below (from here), under Windows Thread::VirtualUnwindCallFrame(..) (implementation) is called, but on Unix (i.e. FEATURE_PAL) PAL_VirtualUnwind(..) (implementation) is called instead:

#ifndef FEATURE_PAL
    pvControlPc = Thread::VirtualUnwindCallFrame(&ctx, &nonVolRegPtrs);
#else // !FEATURE_PAL
    ...
    BOOL success = PAL_VirtualUnwind(&ctx, &nonVolRegPtrs);
    ...
    pvControlPc = GetIP(&ctx);
#endif // !FEATURE_PAL

Before we more on, here are some links to the work that was done to support ‘stack walking’ when .NET Core CLR was ported to Linux:

Unwinding ‘JITted’ Code

Finally, we’re going to look at what happens with ‘managed code’, i.e. code that started off as C#/F#/VB.NET, was turned into IL and then compiled into native code by the ‘JIT Compiler’. This is the code that you generally want to see in your ‘stack trace’, because it’s code you wrote yourself!

Help from the ‘JIT Compiler’

Simply, what happens is that when the code is ‘JITted’, the compiler also emits some extra information, stored via the EECodeInfo class, which is defined here. Also see the ‘Unwind Info’ section in the JIT Compiler <-> Runtime interface, note how it features seperate sections for TARGET_ARM, TARGET_ARM64, TARGET_X86 and TARGET_UNIX.

In addition, in CodeGen::genFnProlog() here the JIT emits a function ‘prologue’ that contains several pieces of ‘unwind’ related data. This is also imlemented in CEEJitInfo::allocUnwindInfo(..) in this piece of code, which behaves differently for each CPU architecture:

#if defined(_TARGET_X86_)
    // Do NOTHING
#elif defined(_TARGET_AMD64_)
    pUnwindInfo->Flags = UNW_FLAG_EHANDLER | UNW_FLAG_UHANDLER;
    ULONG * pPersonalityRoutine = (ULONG*)ALIGN_UP(&(pUnwindInfo->UnwindCode[pUnwindInfo->CountOfUnwindCodes]), sizeof(ULONG));
    *pPersonalityRoutine = ExecutionManager::GetCLRPersonalityRoutineValue();
#elif defined(_TARGET_ARM64_)
    *(LONG *)pUnwindInfo |= (1 << 20); // X bit
    ULONG * pPersonalityRoutine = (ULONG*)((BYTE *)pUnwindInfo + ALIGN_UP(unwindSize, sizeof(ULONG)));
    *pPersonalityRoutine = ExecutionManager::GetCLRPersonalityRoutineValue();
#elif defined(_TARGET_ARM_)
    *(LONG *)pUnwindInfo |= (1 << 20); // X bit
    ULONG * pPersonalityRoutine = (ULONG*)((BYTE *)pUnwindInfo + ALIGN_UP(unwindSize, sizeof(ULONG)));
    *pPersonalityRoutine = (TADDR)ProcessCLRException - baseAddress;
#endif

Also, the JIT has several Compiler::unwindXXX(..) methods, that are all implemented in per-CPU source files:

Fortunately, we can ask the JIT to output the unwind info that it emits, however this only works with a Debug version of the CLR. Given a simple method like this:

private void MethodA() {
    try {
        MethodB();
    } catch (Exception ex) {
        Console.WriteLine(ex.ToString());
    }
}

if we call SET COMPlus_JitUnwindDump=MethodA, we get the following output with 2 ‘Unwind Info’ sections, one for the try and the other for the catch block:

Unwind Info:
  >> Start offset   : 0x000000 (not in unwind data)
  >>   End offset   : 0x00004e (not in unwind data)
  Version           : 1
  Flags             : 0x00
  SizeOfProlog      : 0x07
  CountOfUnwindCodes: 4
  FrameRegister     : none (0)
  FrameOffset       : N/A (no FrameRegister) (Value=0)
  UnwindCodes       :
    CodeOffset: 0x07 UnwindOp: UWOP_ALLOC_SMALL (2)     OpInfo: 11 * 8 + 8 = 96 = 0x60
    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rsi (6)
    CodeOffset: 0x02 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rdi (7)
    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
Unwind Info:
  >> Start offset   : 0x00004e (not in unwind data)
  >>   End offset   : 0x0000e2 (not in unwind data)
  Version           : 1
  Flags             : 0x00
  SizeOfProlog      : 0x07
  CountOfUnwindCodes: 4
  FrameRegister     : none (0)
  FrameOffset       : N/A (no FrameRegister) (Value=0)
  UnwindCodes       :
    CodeOffset: 0x07 UnwindOp: UWOP_ALLOC_SMALL (2)     OpInfo: 5 * 8 + 8 = 48 = 0x30
    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rsi (6)
    CodeOffset: 0x02 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rdi (7)
    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)

This ‘unwind info’ is then looked up during a ‘stack walk’ as explained in the How it works section above.

So next time you encounter a ‘stack trace’ remember that a lot of work went into making it possible!!

Performance is a Feature!