A few days ago I got an email from Sebastien (the originator of the Gendarme project) telling me that the rule I’d written for Gendarme was barfing with a NullReferenceException while processing Mono’s 1.0 build of System.DirectoryServices.dll. This of course is not good - NullReferenceException is always the fault of the programmer - if you haven’t checked to make sure a variable is non-null, you shouldn’t be using it :) The stack trace is below:

Error executing rules on assembly '/usr/lib/mono/1.0/System.DirectoryServices.dll'
Details: System.NullReferenceException: Object reference not set to an instance of an object
at Gendarme.Rules.Exceptions.DontDestroyStackTrace.GetNumPops (IInstruction instr)
at Gendarme.Rules.Exceptions.DontDestroyStackTrace.ProcessCatchPath ([snip])
at Gendarme.Rules.Exceptions.DontDestroyStackTrace.CheckMethod ([snip])
at Gendarme.Framework.Runner.Process (IAssemblyDefinition assembly)
at ConsoleRunner.Main (System.String[] args)

I had tested the DontDestroyStackTrace rule by running it against many of the larger assemblies in the Mono and MS .NET distributions. It even found some code in some of these assemblies that was improperly re-throwing exceptions, so it was obviously working normally under the majority of circumstances.

From examining the stack trace above, it’s easy to see that the exception is coming from the DontDestroyStackTrace.GetNumPops() function. This function’s job is to examine an IL instruction and determine the number of items that it will pop off of the stack. After poking through GetNumPops, I was convinced that something odd was up.

I fired up Gendarme in VS.NET and ran it against Mono 1.1.5’s shipped version of System.DirectoryServices.dll. Lo and behold, the same error popped up. Walking back up the stack, I determined that the error itself came during examination of the IL for the method System.DirectoryServices.DirectorySearcher.DoSearch().

The IL instruction that was causing the error was a simple ret. My first thought was “How did I miss this in GetNumPops()?”, which was immediately followed by “How on earth did this error not occur before? Seriously, ret isn’t exactly an uncommon instruction.” But then it dawned on me: ret is not a legal instruction inside of an IL catch handler. The only valid way to leave a catch handler in IL is via the leave[.s] instruction. Since the entirety of the IL inspection code in DontDestroyStackTrace is built to deal with catch blocks specifically, none of the tests had dealt with a ret instruction before.

At this point, it is obvious that somehow, somewhere, the wheels had come completely off the track. The instruction that was causing GetNumPops() to throw the NullReferenceException was the ret instruction at offset 0x1ec.

.method private hidebysig instance void DoSearch() cil managed
{
.maxstack 73
.locals init ( [snip...] )

[snip...]

L_00d0: ldloc.1
L_00d1: callvirt instance [snip: LdapConnection::Search(...)]
L_00d6: stloc.3
L_00d7: br L_01e1
L_00dc: ldnull
L_00dd: stloc.s entry1
L_00df: ldloc.3
L_00e0: callvirt instance [snip: LdapSearchResults::next()]
L_00e5: stloc.s entry1
L_00e7: leave L_0119
L_00ec: stloc.s exception1
L_00ee: ldloc.s exception1
L_00f0: callvirt instance int32 [snip: LdapException::get_ResultCode()]
L_00f5: stloc.s num2
L_00f7: ldloc.s num2
L_00f9: ldc.i4.3
L_00fa: beq L_010c
L_00ff: ldloc.s num2
L_0101: ldc.i4.4
L_0102: beq L_010c
L_0107: br L_0111
L_010c: br L_01e1
L_0111: ldloc.s exception1
L_0113: throw
L_0114: leave L_0119
L_0119: ldarg.0
L_011a: ldfld [snip: DirectorySearcher::_conn]
L_011f: newobj instance void [snip: DirectoryEntry::.ctor(...)]
L_0124: stloc.s entry2

[snip...]

L_01ec: ret
L_01ed: ret
.try L_00df to L_00ec catch [Novell.Directory.Ldap]Novell.Directory.Ldap.LdapException handler L_00ec to L_0119
}

At the bottom of the IL listing above, you can see the offending ret instruction. At the very bottom, you’ll see the catch handler block definition. Notice that the catch handler starts at 0x00ec and goes through 0x0119. The offending ret instruction is way outside of that range! That means that, for some reason, DontDestroyStackTrace is examing code that’s way outside of the defined catch block - which it’s not designed to do.

I ruled out the path-generation algorithm in DontDestroyStackTrace being at fault, because an error that let it generate paths that escaped catch blocks would have shown up very quickly a long time ago (every .NET function’s IL ends with a ret).

After going over all of the IL in the defined catch handler region, it struck me that the br at 0x010c was branching to an address outside of the defined catch handler region. This IL is illegal - the only instruction that can branch out of a catch handler is the leave instruction. The code in DontDestroyStackTrace’s path-generation algorithm was not built to check to ensure that all branches were legal, so it just happily kept going until it found a function-terminating instruction. The code that checks the generated paths, however, was not built to deal with ret’s, so it was barfing.

At this point, I’m thinking “Sweet - I found a compiler bug.” In order to file a bug report against something like this, you really need to get your ducks in a row, however. I needed a good explanation of how I found the bug and where, along with a small peice of code that would easily reproduce it for all to see. I tracked down the generated code to the following C#:

private void DoSearch() {
...
    while(lsc.hasMore())
    {
        LdapEntry nextEntry = null;
        try 
        {
            ...
        }
        catch(LdapException e)
        {
            switch (e.ResultCode) {
            case LdapException.SIZE_LIMIT_EXCEEDED:
            case LdapException.TIME_LIMIT_EXCEEDED:
                continue; // <- here is the problem - this isn't being compiled properly
            default :
                throw e;
            }
        }
        ...
}

The above code is perfectly valid. However, the compiler did not properly emit code for the continue instruction, which requires a branch back to the beginning of the while loop. Since the continue was inside a catch, it should have emitted a leave instead of the regular br.

I coded up a small snipped that mirrored the above code: a continue inside a switch in a catch block inside a while loop. I compiled it with mcs, and ran Gendarme against it.

No error.

I checked over the IL for my snippet - there were no illegal branches this time. I thought for sure that my code snippet would exhibit the problem. Then I checked through the bugzilla database for mono, and found a number of bug fixes for the compiler (which I had, because I regularly build mono from source on my machine) that recently addressed issues of illegal branches being generated inside catch blocks. To be doubly sure, I touched the DirectorySearcher.cs file and rebuilt. When I ran Gendarme against the newly compiled System.DirectoryServices.dll assembly, it didn’t report any errors.

So, what’s the moral of the story? When you find a bug, develop a reproducable test case for it, check the bug database for something similar, and finally, try to reproduce it using the current build from source. :)