<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="wordpress/2.0.2" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>"You Are Hearing Me Talk"</title>
	<link>http://blog.russellsprojects.com</link>
	<description>... about programming stuff</description>
	<pubDate>Wed, 15 Nov 2006 14:55:15 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.0.2</generator>
	<language>en</language>
			<item>
		<title>Accidentally finding a bug that doesn&#8217;t exist&#8230;</title>
		<link>http://blog.russellsprojects.com/?p=3</link>
		<comments>http://blog.russellsprojects.com/?p=3#comments</comments>
		<pubDate>Sat, 12 Aug 2006 23:20:24 +0000</pubDate>
		<dc:creator>Russ</dc:creator>
		
	<category>.NET Programming</category>
	<category>Gendarme</category>
	<category>Mono</category>
		<guid isPermaLink="false">http://blog.russellsprojects.com/?p=3</guid>
		<description><![CDATA[<p>A few days ago I got an email from Sebastien (the originator of the Gendarme project) telling me that the rule I&#8217;d written for Gendarme was barfing with a NullReferenceException while processing Mono&#8217;s 1.0 build of System.DirectoryServices.dll.  This of course is not good - NullReferenceException is always the fault of the programmer - if [...]</p>
]]></description>
			<content:encoded><![CDATA[<p>A few days ago I got an email from Sebastien (the originator of the Gendarme project) telling me that the rule I&#8217;d written for <a target="_blank" title="Gendarme" href="http://www.mono-project.com/Gendarme">Gendarme</a> was barfing with a <code>NullReferenceException</code> while processing Mono&#8217;s 1.0 build of <code>System.DirectoryServices.dll</code>.  This of course is not good - <code>NullReferenceException</code> is always the fault of the programmer - if you haven&#8217;t checked to make sure a variable is non-<code>null</code>, you shouldn&#8217;t be using it <img src='http://blog.russellsprojects.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />   The stack trace is below:</p>

<pre>
Error executing rules on assembly '/usr/lib/mono/1.0/System.DirectoryServices.dll'
Details: System.NullReferenceException: Object reference not set to an instance of an object
at Gendarme.Rules.Exceptions.DontDestroyStackTrace.GetNumPops (IInstruction instr)
at Gendarme.Rules.Exceptions.DontDestroyStackTrace.ProcessCatchPath ([snip])
at Gendarme.Rules.Exceptions.DontDestroyStackTrace.CheckMethod ([snip])
at Gendarme.Framework.Runner.Process (IAssemblyDefinition assembly)
at ConsoleRunner.Main (System.String[] args)
</pre>

<p>I had tested the <code>DontDestroyStackTrace</code> rule by running it against many of the larger assemblies in the Mono and MS .NET distributions.  It even found some code in some of these assemblies that was improperly re-throwing exceptions, so it was obviously working normally under the majority of circumstances.</p>

<p>From examining the stack trace above, it&#8217;s easy to see that the exception is coming from the <code>DontDestroyStackTrace.GetNumPops()</code> function.  This function&#8217;s job is to examine an IL instruction and determine the number of items that it will pop off of the stack.  After poking through <code>GetNumPops</code>, I was convinced that something odd was up.</p>

<p>I fired up Gendarme in VS.NET and ran it against Mono 1.1.5&#8217;s shipped version of <code>System.DirectoryServices.dll</code>.  Lo and behold, the same error popped up.  Walking back up the stack, I determined that the error itself came during examination of the IL for the method <code>System.DirectoryServices.DirectorySearcher.DoSearch()</code>.</p>

<p>The IL instruction that was causing the error was a simple <code>ret</code>.  My first thought was &#8220;How did I miss this in <code>GetNumPops()</code>?&#8221;, which was immediately followed by &#8220;How on earth did this error not occur before? Seriously, <code>ret</code> isn&#8217;t exactly an uncommon instruction.&#8221;  But then it dawned on me: <code>ret</code> is <strong>not</strong> a legal instruction inside of an IL <code>catch</code> handler.  The only valid way to leave a <code>catch</code> handler in IL is via the <code>leave[.s]</code> instruction.  Since the entirety of the IL inspection code in <code>DontDestroyStackTrace</code> is built to deal with <code>catch</code> blocks specifically, none of the tests had dealt with a <code>ret</code> instruction before.</p>

<p>At this point, it is obvious that somehow, somewhere, the wheels had come completely off the track.  The instruction that was causing <code>GetNumPops()</code> to throw the <code>NullReferenceException</code> was the <code>ret</code> instruction at offset <code>0x1ec</code>.</p>

<pre>.method private hidebysig instance void DoSearch() cil managed
{
.maxstack 73
.locals init ( [snip...] )

[snip...]

L_00d0: ldloc.1
L_00d1: callvirt instance [snip: LdapConnection::Search(...)]
L_00d6: stloc.3
L_00d7: br L_01e1
L_00dc: ldnull
L_00dd: stloc.s entry1
L_00df: ldloc.3
L_00e0: callvirt instance [snip: LdapSearchResults::next()]
L_00e5: stloc.s entry1
L_00e7: leave L_0119
L_00ec: stloc.s exception1
L_00ee: ldloc.s exception1
L_00f0: callvirt instance int32 [snip: LdapException::get_ResultCode()]
L_00f5: stloc.s num2
L_00f7: ldloc.s num2
L_00f9: ldc.i4.3
L_00fa: beq L_010c
L_00ff: ldloc.s num2
L_0101: ldc.i4.4
L_0102: beq L_010c
L_0107: br L_0111
L_010c: br L_01e1
L_0111: ldloc.s exception1
L_0113: throw
L_0114: leave L_0119
L_0119: ldarg.0
L_011a: ldfld [snip: DirectorySearcher::_conn]
L_011f: newobj instance void [snip: DirectoryEntry::.ctor(...)]
L_0124: stloc.s entry2

[snip...]

L_01ec: ret
L_01ed: ret
.try L_00df to L_00ec catch [Novell.Directory.Ldap]Novell.Directory.Ldap.LdapException handler L_00ec to L_0119
}</pre>
<p>
At the bottom of the IL listing above, you can see the offending <code>ret</code> instruction.  At the very bottom, you&#8217;ll see the <code>catch</code> handler block definition.  Notice that the <code>catch</code> handler starts at <code>0x00ec</code> and goes through <code>0x0119</code>.  The offending <code>ret</code> instruction is way outside of that range!  That means that, for some reason, <code>DontDestroyStackTrace</code> is examing code that&#8217;s way outside of the defined <code>catch</code> block - which it&#8217;s not designed to do.
</p>
<p>
I ruled out the path-generation algorithm in <code>DontDestroyStackTrace</code> being at fault, because an error that let it generate paths that escaped <code>catch</code> blocks would have shown up very quickly a long time ago (every .NET function&#8217;s IL ends with a <code>ret</code>).
<p>
After going over all of the IL in the defined <code>catch</code> handler region, it struck me that the <code>br</code> at <code>0x010c</code> was branching to an address outside of the defined <code>catch</code> handler region.  This IL is illegal - the only instruction that can branch out of a <code>catch</code> handler is the <code>leave</code> instruction.  The code in <code>DontDestroyStackTrace</code>&#8217;s path-generation algorithm was not built to check to ensure that all branches were legal, so it just happily kept going until it found a function-terminating instruction.  The code that checks the generated paths, however, was not built to deal with <code>ret</code>&#8217;s, so it was barfing.
</p>
<p>
At this point, I&#8217;m thinking &#8220;Sweet - I found a compiler bug.&#8221;  In order to file a bug report against something like this, you really need to get your ducks in a row, however.  I needed a good explanation of how I found the bug and where, along with a small peice of code that would easily reproduce it for all to see.  I tracked down the generated code to the following C#:
</p>
<pre>
private void DoSearch() {
...
    while(lsc.hasMore())
    {
        LdapEntry nextEntry = null;
        try 
        {
            ...
        }
        catch(LdapException e)
        {
            switch (e.ResultCode) {
            case LdapException.SIZE_LIMIT_EXCEEDED:
            case LdapException.TIME_LIMIT_EXCEEDED:
                continue; // <- here is the problem - this isn't being compiled properly
            default :
                throw e;
            }
        }
        ...
}
</pre>

<p>
The above code is perfectly valid.  However, the compiler did not properly emit code for the <code>continue</code> instruction, which requires a branch back to the beginning of the <code>while</code> loop.  Since the <code>continue</code> was inside a <code>catch</code>, it should have emitted a <code>leave</code> instead of the regular <code>br</code>.
</p>

<p>
I coded up a small snipped that mirrored the above code: a <code>continue</code> inside a <code>switch</code> in a <code>catch</code> block inside a <code>while</code> loop.  I compiled it with mcs, and ran Gendarme against it.
</p>

<p>
No error.
</p>

<p>
I checked over the IL for my snippet - there were no illegal branches this time.  I thought for sure that my code snippet would exhibit the problem.  Then I checked through the bugzilla database for mono, and found a number of bug fixes for the compiler (which I had, because I regularly build mono from source on my machine) that recently addressed issues of illegal branches being generated inside <code>catch</code> blocks.  To be doubly sure, I <code>touch</code>ed the <code>DirectorySearcher.cs</code> file and rebuilt.  When I ran Gendarme against the newly compiled <code>System.DirectoryServices.dll</code> assembly, it didn&#8217;t report any errors.
</p>

<p>
So, what&#8217;s the moral of the story?  <strong>When you find a bug, develop a reproducable test case for it, check the bug database for something similar, and finally, try to reproduce it using the current build from source</strong>. <img src='http://blog.russellsprojects.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> 
</p>
]]></content:encoded>
			<wfw:commentRSS>http://blog.russellsprojects.com/?feed=rss2&amp;p=3</wfw:commentRSS>
		</item>
	</channel>
</rss>
