This is the retelling of a fairly short debugging investigation. While it’s not as intricate as most of the investigations I post on my blog, I think this one is interesting because it showcases one of the most difficult and unexpected challenges you may face when writing a profiler.

It started as I opened Visual Studio 2026 to prototype some code, and the window seemingly froze.

Since I’m working on the performance of ReSharper, I immediately look into any unexpected delay when using Visual Studio. At this point, I thought that something was just taking a long time to initialize, and that it would only last a few seconds. As fast as I could, I launched the Superluminal Performance profiler and attached it to the instance of Visual Studio. I then waited for Visual Studio to finish loading, but after about 30 seconds I realized that it was taking way too long. I stopped the trace and checked the results.

The trace was very intriguing, as it showed that the process was doing pretty much… nothing. Meanwhile, Visual Studio was still frozen, so I finally understood that it was some sort of deadlock rather than a performance issue. I then captured a memory dump and loaded it into a fresh instance of Visual Studio.

In that situation you could be tempted to attach a debugger directly to the frozen process, but I strongly recommend not doing that unless you know that you can reproduce the issue. It’s not rare that a debugger causes the target process to completely crash, especially when it’s in an unstable state (which is the reason why you’re debugging it to begin with). I prefer to start my investigations from a memory dump while leaving the frozen process in the background, and only attach a debugger when I need some live data that can’t be extracted from the dump.

When investigating a deadlock, I always start by checking the “Parallel Stacks” view of Visual Studio:

There were a ton of threads, so at this point it was hard to tell what was relevant and what wasn’t. But I immediately noticed a few things:

  • The main thread was JIT compiling some function (as indicated by the CILJit::compileMethod frame), and was executing RtlAddGrowableFunctionTable.
  • Most of the other threads were stuck in RtlEnterCriticalSection while JIT compiling some code, probably waiting on a lock held by the main thread.

To be 100% sure that the main thread was indeed stuck, I captured a second memory dump and compared it to the first one. Sure enough, I found the same callstack.

To understand what exactly the main thread was blocked on, I checked the disassembly:

That was very unexpected. The thread was not waiting on a lock, or stuck in a syscall. It was just executing… test rax, rax. In other words, a simple null check. I double-checked in the other memory dump and yes: the thread was stuck on the exact same instruction.

I know of only one situation where this could happen: if the thread was suspended. This rang a bell, because it looked a lot like a problem we had to deal with back at Datadog when writing profilers (foreshadowing). I tried to think of a way to verify this, and it’s surprisingly difficult. If you attach a debugger, you can’t run any command until you pause the execution. And when you do, of course the thread is going to be suspended, since you paused the execution! Eventually, after a completely unproductive discussion with ChatGPT, I checked Process Explorer, and thankfully the “threads” tab gives the suspend count:
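
For reference, there is also a programmatic way to check, relying on the fact that SuspendThread returns the thread’s previous suspend count. Here is a minimal sketch (it briefly perturbs the target, and it obviously only works against the live process, not a dump):

#include <windows.h>

// Returns true if the thread was already suspended by someone else.
bool IsThreadSuspended(DWORD threadId)
{
    HANDLE thread = OpenThread(THREAD_SUSPEND_RESUME, FALSE, threadId);
    if (thread == nullptr)
        return false; // couldn't open the thread

    // SuspendThread returns the *previous* suspend count, or (DWORD)-1 on failure.
    DWORD previousCount = SuspendThread(thread);
    if (previousCount != (DWORD)-1)
        ResumeThread(thread); // undo our own suspension

    CloseHandle(thread);
    return previousCount != (DWORD)-1 && previousCount > 0;
}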

So Visual Studio was frozen because the main thread was suspended… Which raises the question: who suspended it? I already knew what to look for: in ReSharper we experimented with a watchdog that collects information when the UI thread is stuck for a long time, and I fully expected Visual Studio to have a similar mechanism. Sure enough, I found this thread:

NtWaitForAlertByThreadId()
RtlpAcquireSRWLockSharedContended()
RtlAcquireSRWLockShared()
RtlpLookupDynamicFunctionEntry()
RtlLookupFunctionEntry()
CUserModeStackTrace::GetStackTraceX64(struct _CONTEXT,unsigned short,unsigned __int64 *,unsigned short,unsigned short *)
CUserModeStackTrace::GetStackTraceOfThreadHandleImpl(void *,unsigned short,bool,unsigned __int64 *,unsigned short,unsigned short *)
CUserModeStackTrace::GetStackTraceOfThreadHandle(void *,unsigned __int64 *,unsigned short,unsigned short *,unsigned __int64 *)
`anonymous namespace'::PerfMonitoringHelper::GetBestCallStackForTargetThreadByHandle()
CResponsivenessMonitor::MonitorWorker(void)
CResponsivenessMonitor::ResponsivenessThreadProc(void *)

This “responsiveness monitor” was trying to inspect a thread (GetStackTraceX64) and got stuck while acquiring a lock in RtlLookupFunctionEntry. That lock was probably held by the main thread, which got suspended. To confirm that, I wanted to check which thread the “responsiveness monitor” was targeting. Thankfully, the names of some of the functions were quite explicit (GetStackTraceOfThreadHandle), so I loaded the stack frame for that function and inspected the registers to see if I could find a thread handle:

0:047> k
 # Child-SP          RetAddr               Call Site
00 00000017`4b27eae8 00007ffd`28030f4d     ntdll!NtWaitForAlertByThreadId+0x14
01 00000017`4b27eaf0 00007ffd`2803afa3     ntdll!RtlpAcquireSRWLockSharedContended+0x27d
02 00000017`4b27eb70 00007ffd`28100f0d     ntdll!RtlAcquireSRWLockShared+0x53
03 00000017`4b27eba0 00007ffd`2802e2fe     ntdll!RtlpLookupDynamicFunctionEntry+0x1d
04 00000017`4b27ebd0 00007ffc`26161eee     ntdll!RtlLookupFunctionEntry+0x35e
05 00000017`4b27ec20 00007ffc`26161e28     VsLog!CUserModeStackTrace::GetStackTraceX64+0x86
06 00000017`4b27edb0 00007ffc`26163baa     VsLog!CUserModeStackTrace::GetStackTraceOfThreadHandleImpl+0x150
07 00000017`4b27f7c0 00007ffc`26163a92     VsLog!CUserModeStackTrace::GetStackTraceOfThreadHandle+0x5a
08 00000017`4b27f800 00007ffc`261391ec     VsLog!`anonymous namespace'::PerfMonitoringHelper::GetBestCallStackForTargetThreadByHandle+0x102
09 00000017`4b27f8a0 00007ffc`2613897b     Microsoft_VisualStudio_PerfWatson_Provider!CResponsivenessMonitor::MonitorWorker+0x65c
0a 00000017`4b27fa80 00007ffd`2652e8d7     Microsoft_VisualStudio_PerfWatson_Provider!CResponsivenessMonitor::ResponsivenessThreadProc+0x1b
0b 00000017`4b27fab0 00007ffd`280ac53c     kernel32!BaseThreadInitThunk+0x17
0c 00000017`4b27fae0 00000000`00000000     ntdll!RtlUserThreadStart+0x2c

0:047> .frame /r 6
06 00000017`4b27edb0 00007ffc`26163baa     VsLog!CUserModeStackTrace::GetStackTraceOfThreadHandleImpl+0x150
rax=00000000000001e3 rbx=00000000000014a4 rcx=00007ffd281f4250
rdx=0000000000000000 rsi=000000174b27f900 rdi=0000000000000000
rip=00007ffc26161e28 rsp=000000174b27edb0 rbp=000001580648ce50
 r8=0000480febfdc84d  r9=0000480febfdc845 r10=00007ffd281f4250
r11=00007ffd281f4250 r12=00000000000014a4 r13=000000000000005b
r14=000000174b27f928 r15=0000000000000000
iopl=0         nv up ei pl zr na po nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
VsLog!CUserModeStackTrace::GetStackTraceOfThreadHandleImpl+0x150:

k displays the callstack, then .frame /r displays the registers for the given stack frame. In this case, I loaded the registers for frame 6, which was GetStackTraceOfThreadHandleImpl, and looked for anything that looked like a handle. On Windows, a handle is really just an index into a per-process table, so the value is expected to be very small. Two registers had promising values: rax=1e3 and rbx=14a4. I inspected each of them with the !handle command:

0:047> !handle 1e3
Handle 00000000000001e3
  Type         	Mutant
0:047> !handle 14a4
Handle 00000000000014a4
  Type         	Thread

One of the values was indeed a thread handle! I then used the f flag to display all the information about the handle:

0:047> !handle 14a4 f
Handle 00000000000014a4
  Type         	Thread
  Attributes   	0
  GrantedAccess	0x1fffff:
         Delete,ReadControl,WriteDac,WriteOwner,Synch
         Terminate,Suspend,Alert,GetContext,SetContext,SetInfo,QueryInfo,SetToken,Impersonate,DirectImpersonate
  HandleCount  	7
  PointerCount 	196433
  Name         	<none>
  Object specific information
    Thread Id   86a0.854c
    Priority    10
    Base Priority 0

The Thread Id 86a0.854c part (the process id and the thread id, in hexadecimal) matched the id of the main thread, which confirmed my theory.

Writing profilers is hard

So, what happened? For some reason, the “responsiveness monitor” built into Visual Studio decided to inspect the main thread. To walk the callstack of a thread on Windows x64, you roughly need to:

  1. Suspend the target thread.
  2. Capture its register context with GetThreadContext.
  3. Look up the unwind information for the current instruction pointer with RtlLookupFunctionEntry.
  4. Use that information to unwind to the caller’s frame with RtlVirtualUnwind, then repeat steps 3 and 4 until the bottom of the stack.
  5. Resume the thread.

The issue happened during step 3: to read the function table, RtlLookupFunctionEntry needs to acquire a lock, and that lock was held by the suspended thread! This is a common situation that profilers must deal with. And it’s not limited to RtlLookupFunctionEntry: anything the profiler does that requires taking a lock can deadlock if it happens while another thread is suspended (common examples include allocating memory, which requires the heap lock, or loading a DLL, which requires the loader lock). Nor is it limited to Windows: Linux profilers face the exact same issue.
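
To make the deadlock more concrete, here is a minimal sketch of such a stack walker (not the actual Visual Studio or Datadog code, just the standard pattern built on the documented Windows APIs), with the problematic call from step 3 highlighted:

#include <windows.h>
#include <vector>

// Walk the stack of another thread in the same process (Windows x64).
std::vector<DWORD64> WalkStack(HANDLE hThread)
{
    std::vector<DWORD64> frames;

    // Step 1: suspend the target thread so its stack stops changing under us.
    if (SuspendThread(hThread) == (DWORD)-1)
        return frames;

    // Step 2: capture its register context.
    CONTEXT context = {};
    context.ContextFlags = CONTEXT_FULL;
    if (GetThreadContext(hThread, &context))
    {
        while (context.Rip != 0 && frames.size() < 128)
        {
            frames.push_back(context.Rip);

            // Step 3: look up the unwind information for the current instruction.
            // This is the call that takes the function-table lock, and therefore
            // deadlocks if the suspended thread happens to own that lock.
            DWORD64 imageBase = 0;
            PRUNTIME_FUNCTION function = RtlLookupFunctionEntry(context.Rip, &imageBase, nullptr);

            if (function == nullptr)
            {
                // Leaf function: the return address is directly at RSP.
                context.Rip = *(DWORD64*)context.Rsp;
                context.Rsp += 8;
            }
            else
            {
                // Step 4: virtually unwind to the caller's frame.
                PVOID handlerData = nullptr;
                DWORD64 establisherFrame = 0;
                RtlVirtualUnwind(UNW_FLAG_NHANDLER, imageBase, context.Rip, function,
                                 &context, &handlerData, &establisherFrame, nullptr);
            }
        }
    }

    // Step 5: resume the thread.
    ResumeThread(hThread);
    return frames;
}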

What can be done about it? Unfortunately there is no great solution. Speaking from experience, the Datadog continuous profiler uses two different approaches:

  • On Windows, it uses an additional “deadlock monitor” thread. Its responsibility is to make sure that the profiler thread is making progress. If the profiler thread takes too long, it assumes that a deadlock occurred and forcefully resumes the suspended thread, thus resolving the deadlock (see the sketch after this list). It generally works, but it has a few drawbacks: because the approach is based on a timeout, it takes some time to detect the deadlock, which negatively impacts the performance of the profiled application. Also, the deadlock monitor cannot use any Windows API that requires a lock, lest it itself become deadlocked.
  • On Linux, it detours libc functions that may cause deadlocks (dlopen and pthread_create, to name a few), so that the profiler knows not to suspend a thread that is currently executing them (also sketched below). The drawbacks here are that it’s hard to build an exhaustive list of calls to detour (you have to discover them the hard way), and it skews the profiling (since a thread is never profiled while calling these functions, it might appear as if it’s doing less work than it actually is).
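
To illustrate the first approach, here is a heavily simplified sketch of the deadlock-monitor idea (the names and the timeout are made up for the example, and a real implementation also has to handle synchronization with the sampler and shutdown):

#include <windows.h>
#include <atomic>

// Hypothetical state published by the sampler thread before it suspends a target.
std::atomic<HANDLE> g_suspendedThread{ nullptr };
std::atomic<ULONGLONG> g_suspendStartTick{ 0 };

// If the sampler keeps a thread suspended for too long, assume it deadlocked and
// forcefully resume the target. Note that this loop must avoid any API that could
// itself take a user-mode lock, or the monitor could deadlock in turn.
DWORD WINAPI DeadlockMonitorThread(LPVOID)
{
    for (;;)
    {
        Sleep(100);

        HANDLE target = g_suspendedThread.load();
        if (target == nullptr)
            continue; // no sampling in progress

        if (GetTickCount64() - g_suspendStartTick.load() > 1000)
        {
            // Resuming the target releases whatever lock it was holding,
            // which unblocks the sampler as well.
            ResumeThread(target);
            g_suspendedThread.store(nullptr);
        }
    }
}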
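
And a similarly simplified sketch of the second approach: an LD_PRELOAD-style wrapper around dlopen that flags the thread as unsafe to suspend (the flag name and the way the sampler consumes it are, again, made up for the example):

#ifndef _GNU_SOURCE
#define _GNU_SOURCE // for RTLD_NEXT
#endif
#include <dlfcn.h>

// Per-thread flag read by the sampler before suspending a thread: when non-zero,
// the thread may be holding a dangerous lock and must be skipped.
thread_local volatile int t_inUnsafeRegion = 0;

// Interposed dlopen, loaded via LD_PRELOAD so it shadows the libc version.
extern "C" void* dlopen(const char* filename, int flags)
{
    using dlopen_fn = void* (*)(const char*, int);
    static dlopen_fn real_dlopen = (dlopen_fn)dlsym(RTLD_NEXT, "dlopen");

    t_inUnsafeRegion++;                         // the sampler will skip this thread
    void* handle = real_dlopen(filename, flags);
    t_inUnsafeRegion--;
    return handle;
}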

It was really interesting for me to see the Visual Studio team, with all their experience, fall into the same pitfalls as we did back when I worked at Datadog (to be clear, I’m not shaming anyone: this is a very, very tough problem to solve). Nowadays, when writing a profiler with Silhouette, I usually rely on ICorProfilerInfo10::SuspendRuntime, which delegates to the .NET runtime the responsibility of finding a good place to stop the threads. Under the hood it uses the same APIs as the garbage collector, so it’s battle-tested. The major drawback is that it suspends all threads instead of just one, so it’s not a great approach when performance is a concern.
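
For illustration, here is roughly what that looks like, assuming the profiler already obtained an ICorProfilerInfo10 pointer during initialization (headers come from the .NET profiling API):

#include <cor.h>
#include <corprof.h>

// Ask the runtime to bring all managed threads to a safe point (the same
// suspension mechanism used by the garbage collector), sample, then resume.
HRESULT SampleAllThreads(ICorProfilerInfo10* info)
{
    HRESULT hr = info->SuspendRuntime();
    if (FAILED(hr))
        return hr;

    // ... walk the stacks of the suspended threads here ...

    return info->ResumeRuntime();
}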