Recently, Dan Shechter sent me a code sample
that illustrates a certain peculiarity of the .NET
garbage collector. Here's a version of the sample:
using System;
using System.Text;
class Program
{
static long g_LastMemory;
static void
DisplayMemory(string caption,
bool showDelta)
{
long memory =
GC.GetTotalMemory(true);
StringBuilder builder =
new StringBuilder();
builder.AppendFormat("{0}: {1:###,###,###,##0} bytes",
caption, memory);
if (showDelta)
builder.AppendFormat(" (delta:
{0:###,###,###,##0} bytes)",
memory - g_LastMemory);
Console.WriteLine(builder.ToString());
g_LastMemory = memory;
}
static void Main()
{
DisplayMemory("Initial memory", false);
byte[] myBigArray = new byte[1024 *
1024];
DisplayMemory("After allocating 1 megabyte", true);
for (int i = 0; i < myBigArray.Length; i++)
myBigArray[i] = (byte)(i % 255);
DisplayMemory("After manipulating 1
megabyte", true);
myBigArray = null;
DisplayMemory("After releasing 1 megabyte",
true);
}
}
The sample application is fairly simple; it manipulates an array and displays
the total memory size. One important detail we should pay attention to is that a garbage collection is forced when "DisplayMemory()"
calls "GC.GetTotalMemory()" with a value of "true."
Consequently, the
application flow looks like this:
- Force a garbage collection and display the initial memory size.
- Allocate a megabyte-sized byte array.
- Force a garbage collection and display the current memory size.
- Loop through the byte array and assign values to each element in the
array.
- Force a garbage collection and display the current memory size.
- Set the byte array to null.
- Force a garbage collection and display the current memory size.
When the application is built with Visual Studio 2005 in the "Debug"
configuration and run at the command-line, the results are predictable.
Initial memory: 212,520 bytes
After allocating 1 megabyte: 1,263,748 bytes (delta: 1,051,228 bytes)
After manipulating 1 megabyte: 1,263,748 bytes (delta: 0 bytes)
After releasing 1 megabyte: 215,156 bytes (delta: -1,048,592 bytes)
However, when built in the "Release" configuration, the results are quite
unexpected.
Initial memory: 212,552 bytes
After allocating 1 megabyte: 1,263,752 bytes (delta: 1,051,200 bytes)
After manipulating 1 megabyte: 215,160 bytes (delta: -1,048,592 bytes)
After releasing 1 megabyte: 215,160 bytes (delta: 0 bytes)
In the "Release" build,
the byte array is reclaimed by the garbage collector before it is set to null.
From a memory-optimization perspective, this makes sense because the byte array
isn't actually used after its elements are assigned except to set it to null.
Releasing the memory early relieves pressure on the garbage collector. However,
to the old-school programmer who is accustomed to memory remaining allocated until
explicitly freed, this is harder to swallow.
The lesson we learn from this example is not to expect the memory in a garbage-collected environment
to be treated in predictable ways. Anyone who tries to pin down the garbage
collector (excepting CLR developers) will find it to be an elusive and magical creature. I
am always looking for the "why" behind the magic, so let's track down
exactly how this optimization is applied under the covers.
Our first step is to check the IL emitted by the C# compiler to see if the
compiler simply moves the null assignment. Here is the disassembled IL
code with annotations to show exactly what each chunk of instructions does:
.method private
hidebysig static
void Main() cil
managed
{
.entrypoint
.maxstack 4
.locals init (
[0] uint8[] myBigArray,
[1] int32 i)
ldstr "Initial memory"
ldc.i4.0
call void
GCTest.Program::DisplayMemory(string,
bool)
/
ldc.i4 1048576
newarr uint8
stloc.0
ldstr "After allocating
1 megabyte"
ldc.i4.1
call void
GCTest.Program::DisplayMemory(string,
bool)
ldc.i4.0
stloc.1
br.s LengthCheck
LoopBody:
ldloc.0
ldloc.1
ldloc.1
ldc.i4 255
rem
conv.u1
stelem.i1
ldloc.1
ldc.i4.1
add
stloc.1
LengthCheck:
ldloc.1
ldloc.0
ldlen
conv.i4
blt.s LoopBody
ldstr "After
manipulating 1 megabyte"
ldc.i4.1
call void
GCTest.Program::DisplayMemory(string,
bool)
ldnull
stloc.0
ldstr "After releasing
1 megabyte"
ldc.i4.1
call void
GCTest.Program::DisplayMemory(string,
bool)
ret
}
The code above shows us that the byte array is explicitly set to null
after "DisplayMemory()" is called. While the IL code is interesting,
the optimization isn't there. That leaves only one possibility. The optimization
must happen at JIT time when the IL is compiled to native instructions.
The Visual Studio 2005 debugger is the most robust tool for viewing native
JIT-compiled code. However, we must set a few options first.
- Switch to the "Release" solution configuration.
- In the Solution Explorer, right-click on the project node and choose
Properties from the context menu to bring up the Application Designer.
- On the Build tab of the Application Designer, press the Advanced button
to display the Advanced Build Settings dialog. In this dialog, set Debug
Info to "pdb-only" and click OK.
- On the Debug tab of the Application Designer, ensure that "Enabled
unmanaged code debugging" is checked.
At this point, we can return to the sample code, set a breakpoint at the beginning of the "Main()"
method and
run the application by pressing F5. When the application enters break mode, we
can view the disassembled native code by selecting Debug -> Windows ->
Disassembly from the Visual Studio menu. Here's what it looks like:
static
void Main()
{
DisplayMemory("Initial memory",
false);
00000000 push edi
00000001 push esi
00000002 cmp dword ptr ds:[00912DC8h],0
00000009 je 00000010
0000000b call 78F9228E
00000010 xor esi,esi
00000012 xor edi,edi
00000014 mov ecx,dword ptr ds:[022B303Ch]
0000001a xor edx,edx
0000001c call dword ptr ds:[00913040h]
byte[] myBigArray =
new byte[1024 * 1024];
00000022 mov edx,100000h
00000027 mov ecx,7915985Ah
0000002c call FF802120
00000031 mov esi,eax
DisplayMemory("After allocating 1 megabyte",
true);
00000033 mov ecx,dword ptr ds:[022B3040h]
00000039 mov edx,1
0000003e call dword ptr ds:[00913040h]
for (int i = 0; i < myBigArray.Length; i++)
00000044 xor edi,edi
00000046 nop
00000047 jmp 00000062
myBigArray[i] = (byte)(i % 255);
00000049 mov eax,edi
0000004b mov ecx,0FFh
00000050 cdq
00000051 idiv eax,ecx
00000053 cmp edi,dword ptr [esi+4]
00000056 jb 0000005D
00000058 call 78F94671
0000005d mov byte ptr [esi+edi+8],dl
for (int i = 0; i < myBigArray.Length; i++)
00000061 inc edi
00000062 cmp edi,dword ptr [esi+4]
00000065 jl 00000049
DisplayMemory("After manipulating 1 megabyte",
true);
00000067 mov ecx,dword ptr ds:[022B3044h]
0000006d mov edx,1
00000072 call dword ptr ds:[00913040h]
myBigArray = null;
00000078 xor esi,esi
DisplayMemory("After releasing 1 megabyte",
true);
0000007a mov ecx,dword ptr ds:[022B3048h]
00000080 mov edx,1
00000085 call dword ptr ds:[00913040h]
}
0000008b nop
0000008c pop esi
0000008d pop edi
0000008e ret
Examining the native instructions reveals that the optimization
isn't there, either. The
byte array is stored in the ESI register at 00000031 and is explicitly cleared
using the old assembly trick of XOR'ing a register with itself at 00000078.
However, the registry is cleared after "DisplayMemory()" is called. In fact,
when we remove
the breakpoint and allow the application to finish running, we see similar results
to what we saw in the "Debug" configuration.
Initial memory: 150,584 bytes
After allocating 1 megabyte: 1,211,176 bytes (delta: 1,060,592 bytes)
After manipulating 1 megabyte: 1,211,176 bytes (delta: 0 bytes)
After releasing 1 megabyte: 162,584 bytes (delta: -1,048,592 bytes)
What did we do wrong?
The problem is that, by default, Visual Studio 2005 disables the JIT optimizer
during debugging. Doing so significantly simplifies managed code debugging. For
those of you who are interested,
Vance Morrison has
written a great article about this
"feature".
Here is a summary of his instructions to force Visual Studio to show optimized
JIT-compiled code:
- Select Tools -> Options from the Visual Studio main menu to open the
Options dialog.
- In the Options dialog, choose the Debugging | General options page.
- Ensure that "Enable Just My Code (Managed only)" is unchecked.
- Ensure that "Suppress JIT optimization on module load (Managed only)" is
unchecked.
- Click OK to close the Options dialog.
Now when we run the application, the results indicate that the optimization we saw when we ran it in "Release" configuration at the
command-line is present.
Initial memory: 150,612 bytes
After allocating 1 megabyte: 1,211,176 bytes (delta: 1,060,564 bytes)
After manipulating 1 megabyte: 162,584 bytes (delta: -1,048,592 bytes)
After releasing 1 megabyte: 162,584 bytes (delta: 0 bytes)
With the additional options selected, we again set a breakpoint at the beginning
of the "Main()" method and run the application. Here is the disassembled (and
fully optimized) native code:
static
void Main()
{
DisplayMemory("Initial memory",
false);
00000000 push edi
00000001 push esi
00000002 push ebx
00000003 mov ecx,dword ptr ds:[022B303Ch]
00000009 xor edx,edx
0000000b call dword ptr ds:[00913040h]
byte[] myBigArray =
new byte[1024 * 1024];
00000011 mov edx,100000h
00000016 mov ecx,7915985Ah
0000001b call FF802120
00000020 mov edi,eax
DisplayMemory("After allocating 1 megabyte",
true);
00000022 mov ecx,dword ptr ds:[022B3040h]
00000028 mov edx,1
0000002d call dword ptr ds:[00913040h]
for (int i = 0; i < myBigArray.Length; i++)
00000033 xor esi,esi
00000035 mov ebx,dword ptr [edi+4]
00000038 test ebx,ebx
0000003a jle 0000004F
myBigArray[i] = (byte)(i % 255);
0000003c mov eax,esi
0000003e mov ecx,0FFh
00000043 cdq
00000044 idiv eax,ecx
00000046 mov byte ptr [edi+esi+8],dl
for (int i = 0; i < myBigArray.Length; i++)
0000004a inc esi
0000004b cmp ebx,esi
0000004d jg 0000003C
DisplayMemory("After manipulating 1 megabyte",
true);
0000004f mov ecx,dword ptr ds:[022B3044h]
00000055 mov edx,1
0000005a call dword ptr ds:[00913040h]
DisplayMemory("After releasing 1 megabyte",
true);
00000060 mov ecx,dword ptr ds:[022B3048h]
00000066 mov edx,1
0000006b call dword ptr ds:[00913040h]
00000071 pop ebx
}
00000072 pop esi
00000073 pop edi
00000074 ret
This time, the byte array is stored in the EDI register (ESI is used for the
index variable of the for-loop) at 00000020. However, this register isn't ever
cleared! The code to set the byte array to null has been completely removed. How
does the garbage collector know that it can be reclaimed? To find out, we'll
have to peek into the internal data structures of the CLR. Fortunately,
Microsoft has provided us with a tool to allow us to do so.
The SOS.DLL ("Son of Strike") debugger extension is one of the most
powerful tools available for debugging .NET applications. With it, we can
examine the inner workings of the CLR at runtime. However, because it is
so low-level, many .NET developers don't know that it exists or have never tried
using it. In the case of our byte array, figuring out why the memory is being
reclaimed earlier than we expect would be nearly impossible without SOS. It can
show us how the JIT optimizer provides the garbage collector with the
information that it needs to reclaim the memory.
Loading the SOS extension is easy. While still at the breakpoint, we can open
the Immediate Window by selecting Debug -> Windows -> Immediate from the Visual
Studio main menu. Then, we just have type the following into the Immediate
Window:
.load sos
We should be rewarded with the following message:
extension C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\sos.dll loaded
First, we need to examine the "MethodDesc" for the current method. This data
structure describes important runtime details about a managed method. SOS provides a
command called "ip2md" (Instruction-Pointer-To-MethodDesc). According
to the documentation, "given an address in managed jitted code, ip2md attempts
to find the MethodDesc associated with it." Since we're interested in the
MethodDesc for the current method, we can pass the CPU's instruction pointer
(stored in the EIP register) to ip2md. To display the contents of the CPU
registers, we select Debug -> Windows -> Registers from the
Visual Studio main menu. Using the
current value of the EIP register, we type the
following command into the Intermediate Window (your EIP value may differ from
the one shown here):
!ip2md 01100073
SOS will display the MethodDesc information for the current method.
MethodDesc: 00912ff8
Method Name: Program.Main()
Class: 00911284
MethodTable: 00913008
mdToken: 06000002
Module: 00912c14
IsJitted: yes
m_CodeOrIL: 01100070
Now that we have the MethodDesc, its address can be used with the "gcinfo"
command to gather detailed information about how the garbage collector tracks
registers containing managed objects. I should point out that the gcinfo
documentation states that it is really intended for CLR developers and is "very
difficult for anyone else to make sense of." In other words, we're being a bit
naughty, but I'm OK with that. To use the gcinfo command with the MethodDesc
address, we type the following:
!gcinfo 00912ff8
And here's what comes back:
entry point 01100070
Normal JIT generated code
GC info 1120aae800911800
Method info block:
method size = 0075
prolog size = 3
epilog size = 4
epilog count = 1
epilog end = yes
callee-saved regs = EDI ESI EBX
ebp frame = no
fully interruptible= yes
double align = no
arguments size = 0 DWORDs
stack frame size = 0 DWORDs
untracked count = 0
var ptr tab count = 0
epilog at 0071
75 B1 40 |
Pointer table:
F0 49 | 0009 reg ECX becoming live
F0 08 | 0011 reg ECX becoming dead
F0 47 | 0020 reg EAX becoming live
7A | 0022 reg EDI becoming live
4E | 0028 reg ECX becoming live
F0 03 | 0033 reg EAX becoming dead
08 | 0033 reg ECX becoming dead
F2 3C | 004F reg EDI becoming dead
4E | 0055 reg ECX becoming live
F0 0B | 0060 reg ECX becoming dead
4E | 0066 reg ECX becoming live
F0 0B | 0071 reg ECX becoming dead
FF |
The highlighted lines in the pointer table show the range of instructions
where the EDI register (which holds the byte array) is live. While live, the EDI
register and not a
candidate for garbage collection. EDI becomes live at 00000022 and is considered
dead at 0000004f. If we refer back to the method's optimized JIT-compiled code, we can
highlight this range of instructions.
static
void Main()
{
DisplayMemory("Initial memory",
false);
00000000 push edi
00000001 push esi
00000002 push ebx
00000003 mov ecx,dword ptr ds:[022B303Ch]
00000009 xor edx,edx
0000000b call dword ptr ds:[00913040h]
byte[] myBigArray =
new byte[1024 * 1024];
00000011 mov edx,100000h
00000016 mov ecx,7915985Ah
0000001b call FF802120
00000020 mov edi,eax
DisplayMemory("After allocating 1 megabyte",
true);
00000022 mov ecx,dword ptr ds:[022B3040h]
00000028 mov edx,1
0000002d call dword ptr ds:[00913040h]
for (int i = 0; i < myBigArray.Length; i++)
00000033 xor esi,esi
00000035 mov ebx,dword ptr [edi+4]
00000038 test ebx,ebx
0000003a jle 0000004F
myBigArray[i] = (byte)(i % 255);
0000003c mov eax,esi
0000003e mov ecx,0FFh
00000043 cdq
00000044 idiv eax,ecx
00000046 mov byte ptr [edi+esi+8],dl
for (int i = 0; i < myBigArray.Length; i++)
0000004a inc esi
0000004b cmp ebx,esi
0000004d jg 0000003C
DisplayMemory("After manipulating 1 megabyte",
true);
0000004f mov ecx,dword ptr ds:[022B3044h]
00000055 mov edx,1
0000005a call dword ptr ds:[00913040h]
DisplayMemory("After releasing 1 megabyte",
true);
00000060 mov ecx,dword ptr ds:[022B3048h]
00000066 mov edx,1
0000006b call dword ptr ds:[00913040h]
00000071 pop ebx
}
00000072 pop esi
00000073 pop edi
00000074 ret
Since the EDI register is live only until DisplayMemory() is called at
0000004f, the garbage collector is free to reclaim the byte array's memory .
That's the magic. Although we thought that we were holding a strong reference to
the byte array, the JIT optimizer decided otherwise.
Like most magic, this puzzle turns out to have a simple solution in the end.
Instead of some convoluted reordering of code, the JIT optimizer simply marks a
range of instructions.
Spelunking the JIT optimizer has been a whirlwind adventure. Each step has taken us deeper under the hood of the CLR and revealed new sparkling gems of
information. Whether this information can be practically applied to daily coding
is unlikely. However, the tools that we have explored can help us to investigate some
of the most elusive bugs that we might encounter. Also, pinning down the garbage
collector has served to strengthen our faith in that magical creature's ability
to do its job. And finally, we have learned that the JIT optimizer is a sorcerer that
toils especially hard to work its own magic.