<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Shape of Code &#187; if statement</title>
	<atom:link href="http://shape-of-code.coding-guidelines.com/tag/if-statement/feed/" rel="self" type="application/rss+xml" />
	<link>http://shape-of-code.coding-guidelines.com</link>
	<description></description>
	<lastBuildDate>Sun, 12 Feb 2012 20:42:27 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Estimating the quality of a compiler implemented in mathematics</title>
		<link>http://shape-of-code.coding-guidelines.com/2011/05/02/estimating-the-quality-of-a-compiler-implemented-in-mathematics/</link>
		<comments>http://shape-of-code.coding-guidelines.com/2011/05/02/estimating-the-quality-of-a-compiler-implemented-in-mathematics/#comments</comments>
		<pubDate>Mon, 02 May 2011 23:23:41 +0000</pubDate>
		<dc:creator>Derek-Jones</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[compiler]]></category>
		<category><![CDATA[compiler testing]]></category>
		<category><![CDATA[faults]]></category>
		<category><![CDATA[formal methods]]></category>
		<category><![CDATA[if statement]]></category>
		<category><![CDATA[semantics]]></category>
		<category><![CDATA[validation suite]]></category>

		<guid isPermaLink="false">http://shape-of-code.coding-guidelines.com/?p=403</guid>
		<description><![CDATA[How can you tell if a language implementation done using mathematical methods lives up to the claims being made about it, without doing lots of work? Answers to the following questions should give you a good idea of the quality of the implementation, from a language specification perspective, at least for C. How long did [...]]]></description>
			<content:encoded><![CDATA[<p>How can you tell if a language implementation done using mathematical methods lives up to the claims being made about it, without doing lots of work?  Answers to the following questions should give you a good idea of the quality of the implementation, from a language specification perspective, at least for C.</p>
<ul>
<li>How long did it take you to write it?  I have yet to see any full implementation of a major language done in less than a man year; just understanding and handling the semantics, plus writing the test cases, will take this long.  I would expect an answer of at least several man years.</li>
<li>Which professional validation suites have you tested the implementation against?  Many man years of work have gone into the <a href="http://www.peren.com/">Perennial</a> and <a href="http://www.plumhall.com/">PlumHall</a> C validation suites and correctly processing either of them is a non-trivial task.  The <a href="http://gcc.gnu.org/onlinedocs/gccint/C-Tests.html">gcc test suite</a> is too light-weight to count.  The <a href="http://www.knosof.co.uk/whoguard.html">C Model Implementation</a> passed both.</li>
<li>How many faults have you found in the C Standard that have been accepted by WG14 (DRs for <a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr.htm">C90</a> and <a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/summary.htm">C99</a>)?  Everybody I know who has created a full implementation of a C front end based on the text of the C Standard has found faults in the existing wording.  Creating a high quality formal definition requires great attention to detail and it is to be expected that some ambiguities/inconsistencies will be found in the Standard.  C Model Implementation project discoveries include <a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_017.html">these</a> and <a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_040.html">these</a>.</li>
<li>How many &#8216;rules&#8217; does the implementation contain?  For the C Model Implementation (originally written in Pascal and then translated to C) every <em>if-statement</em> it contained was cross referenced to either a requirement in the C90 standard or to an internal documentation reference; there were 1,327 references to the Environment and Language clauses (200 of which were in the preprocessor and 187 involved syntax).  My <a href="http://www.knosof.co.uk/cbook">C99 book</a> lists 2,043 sentences in the equivalent clauses, consistent with a 70% increase in page count over C90.  The page count for C1X is around 10% greater than C99.  So for a formal definition of C99 or C1X we are looking at around 2,000 language specific &#8216;rules&#8217; plus others associated with internal housekeeping functions.</li>
<li>What percentage of the implementation is executed by test cases?  How do you know code/mathematics works if it has not been tested?  The front end of the C Model Implementation contains 6,900 basic blocks of which 87 are not executed by any test case (98.7% coverage); most of the unexecuted basic blocks require unusual error conditions to occur, e.g., disc full, and we eventually gave up trying to figure out whether a small number of them were dead code or just needed the right form of input (these days <a href="http://shape-of-code.coding-guidelines.com/2009/11/27/software-maintenance-via-genetic-programming/">genetic programming</a> could be used to help out and also to improve the quality of coverage to something like say <a href="http://en.wikipedia.org/wiki/Modified_Condition/Decision_Coverage">MC/DC</a>, but developing on a PC with a 16M hard disc does limit what can be done {the later arrival of a Sun 4 with 32M of RAM was mind blowing}).</li>
</ul>
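<p>The coverage figure quoted in the last question is simple arithmetic on the two block counts.  A minimal sketch, assuming a hypothetical helper named <code>block_coverage_percent</code> (not part of the C Model Implementation):</p>

```c
/* Percentage of basic blocks executed by at least one test case.
   Hypothetical helper for illustration; returns 0.0 for inconsistent
   inputs (no blocks, or more unexecuted blocks than blocks). */
double block_coverage_percent(unsigned total_blocks, unsigned unexecuted_blocks)
{
   if (total_blocks == 0 || unexecuted_blocks > total_blocks)
      return 0.0;
   return 100.0 * (total_blocks - unexecuted_blocks) / total_blocks;
}
```

<p>Calling <code>block_coverage_percent(6900, 87)</code> gives a value that rounds to the 98.7% quoted above.</p>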
<p>Other suggested questions or numbers applicable to other languages most welcome.  Some <a href="http://www.knosof.co.uk/vulnerabilities/langconform.pdf">forms of language definition</a> do not include a written specification, which makes any measurement of implementation conformance problematic.</p>
]]></content:encoded>
			<wfw:commentRss>http://shape-of-code.coding-guidelines.com/2011/05/02/estimating-the-quality-of-a-compiler-implemented-in-mathematics/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Estimating variance when measuring source</title>
		<link>http://shape-of-code.coding-guidelines.com/2009/10/08/estimating-variance-when-measuring-source/</link>
		<comments>http://shape-of-code.coding-guidelines.com/2009/10/08/estimating-variance-when-measuring-source/#comments</comments>
		<pubDate>Thu, 08 Oct 2009 19:25:11 +0000</pubDate>
		<dc:creator>Derek-Jones</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[average]]></category>
		<category><![CDATA[binomial distribution]]></category>
		<category><![CDATA[case-label]]></category>
		<category><![CDATA[error bars]]></category>
		<category><![CDATA[if statement]]></category>
		<category><![CDATA[probability]]></category>
		<category><![CDATA[source code]]></category>
		<category><![CDATA[source measurement]]></category>
		<category><![CDATA[switch]]></category>
		<category><![CDATA[variance]]></category>

		<guid isPermaLink="false">http://shape-of-code.coding-guidelines.com/?p=121</guid>
		<description><![CDATA[Yesterday I finally delivered a paper on if/switch usage measurements to the ACCU magazine editor and today I read about a switch statement usage that if common, would invalidate a chunk of my results. Does anything jump out at you in the following snippet? switch &#40;x&#41; &#123; case 1: &#123; z++; ... break; &#125; ... [...]]]></description>
			<content:encoded><![CDATA[<p>Yesterday I finally delivered a paper on if/switch usage measurements to the <a href="http://www.accu.org">ACCU</a> magazine editor and today I <a href="http://peeterjoot.wordpress.com/2009/10/02/switch-in-c-confused-by-freeform-text/">read about</a> a <code>switch</code> statement usage that, if common, would invalidate a chunk of my results.  Does anything jump out at you in the following snippet?</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #b1b100;">switch</span> <span style="color: #009900;">&#40;</span>x<span style="color: #009900;">&#41;</span>
   <span style="color: #009900;">&#123;</span>
   <span style="color: #b1b100;">case</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">:</span>
             <span style="color: #009900;">&#123;</span>
             z<span style="color: #339933;">++;</span>
             ...
             <span style="color: #000000; font-weight: bold;">break</span><span style="color: #339933;">;</span>
             <span style="color: #009900;">&#125;</span>
...</pre></div></div>

<p>Yes, those <code>{  }</code> delimiting the case-labeled statement sequence.  A quick check of my C source benchmarks showed this usage occurring in around 1% of case-labels.  Panic over.</p>
<p>What is the statistical significance, i.e., <a href="http://en.wikipedia.org/wiki/Variance">variance</a>, of that 1%?  Have I simply measured an unrepresentative sample?  What would be a representative sample, and what would be the expected variance within a representative sample?</p>
<p>I am interested in commercial software development and so I have selected half a dozen or so largish code bases as my source benchmark, preferably written in a commercial environment even if currently available as Open source.  I would prefer this benchmark to be an order of magnitude larger and perhaps I will get around to adding more programs soon.</p>
<p>My if/switch measurements were aimed at finding usage characteristics that varied between the two kinds of selection statements. One characteristic measured was the number of equality tests in the associated controlling expression.  For instance, in:</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>x <span style="color: #339933;">==</span> <span style="color: #0000dd;">1</span> <span style="color: #339933;">||</span> x <span style="color: #339933;">==</span> <span style="color: #0000dd;">2</span><span style="color: #009900;">&#41;</span>
   z<span style="color: #339933;">--;</span>
<span style="color: #b1b100;">else</span> <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>x <span style="color: #339933;">==</span> <span style="color: #0000dd;">3</span><span style="color: #009900;">&#41;</span>
   z<span style="color: #339933;">++;</span></pre></div></div>

<p>the first controlling expression contains two equality tests and the second one equality test.</p>
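<p>A rough sketch of how such a count might be taken from the text of a controlling expression follows.  The function name <code>count_equality_tests</code> is hypothetical, only <code>==</code> is counted, and the text is assumed to contain no string literals, character constants or comments; it is not the tool used for the measurements reported here.</p>

```c
#include <stddef.h>

/* Naive count of == operators in the text of a controlling expression.
   Assumption: the text contains no string literals, character constants
   or comments, which a real measurement tool would have to skip. */
size_t count_equality_tests(const char *expr)
{
   size_t count = 0;

   for (size_t i = 0; expr[i] != '\0'; i++)
      if (expr[i] == '=' && expr[i + 1] == '=')
         {
         count++;
         i++;   /* step over the second '=' */
         }
   return count;
}
```

<p>Applied to the two controlling expressions above, the sketch returns 2 for <code>x == 1 || x == 2</code> and 1 for <code>x == 3</code>.</p>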
<p>Plotting the percentage of equality tests that occur in the controlling expressions of <a href="http://shape-of-code.coding-guidelines.com/2009/08/108/">if-if/if-else-if</a> sequences and <code>switch</code> statements we get the following:</p>
<p><img src="http://www.coding-guidelines.com/images/numeqvar.jpg" alt="Number of equality tests in controlling expression" /></p>
<p>Do these results indicate that if-if/if-else-if sequences and <code>switch</code> statements differ in the number of equality tests contained in their controlling expressions?  If I measured a completely different set of source code, would the results be very different?</p>
<p>To answer this question a probability model is needed. Take as an example the controlling expressions present in an if-if sequence.  If each controlling expression is independent of the others, then the probability of two equality tests, for instance, occurring in any of these expressions is constant and thus given a large sample the distribution of two equality tests in the source has a <a href="http://en.wikipedia.org/wiki/Binomial_distribution">binomial distribution</a>.  The same argument can be applied to other numbers of equality tests and other kinds of sequence.</p>
<p><img src="http://www.coding-guidelines.com/images/numequnk.jpg" alt="Number of equality tests in controlling expression, with error bars" /></p>
<p>For each measurement point in the above plot the associated error bars span the square-root of the variance of that point (assuming a binomial distribution, for a <a href="http://en.wikipedia.org/wiki/Normal_distribution">normal distribution</a> the length of this span is known as the standard deviation).  The error bars overlap suggesting that the apparent difference in percentage of equality tests in each kind of sequence is not statistically significant.</p>
<p>The existence of some dependency between controlling expression equality tests would invalidate this simple analysis, or at least reduce its reliability.  I did notice that in a sequence containing a controlling expression with two equality tests, that controlling expression tended to appear later in the sequence (the reverse of the example given above).  Did I notice this because I tend to write this way?  A question for another day.</p>
]]></content:encoded>
			<wfw:commentRss>http://shape-of-code.coding-guidelines.com/2009/10/08/estimating-variance-when-measuring-source/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>To if-else-if or if-if, that is the question</title>
		<link>http://shape-of-code.coding-guidelines.com/2009/08/21/108/</link>
		<comments>http://shape-of-code.coding-guidelines.com/2009/08/21/108/#comments</comments>
		<pubDate>Fri, 21 Aug 2009 01:33:14 +0000</pubDate>
		<dc:creator>Derek-Jones</dc:creator>
				<category><![CDATA[empirical]]></category>
		<category><![CDATA[cost/benefit]]></category>
		<category><![CDATA[else]]></category>
		<category><![CDATA[if statement]]></category>
		<category><![CDATA[source code]]></category>
		<category><![CDATA[switch-statement]]></category>

		<guid isPermaLink="false">http://shape-of-code.coding-guidelines.com/?p=108</guid>
		<description><![CDATA[I am currently measuring if-statements, occurring in visible source, that might be mapped to an equivalent switch-statement. The most obvious usage to look for is a sequence of if-else-if statements that all involve the same expression being tested against an integer constant, as in if &#40;x == 1&#41; stmt_1; else if &#40;x == 2&#41; stmt_2; [...]]]></description>
			<content:encoded><![CDATA[<p>I am currently measuring <em>if-statement</em>s, occurring in visible source, that might be mapped to an equivalent <em>switch-statement</em>.  The most obvious usage to look for is a sequence of <em>if-else-if</em> statements that all involve the same expression being tested against an integer constant, as in</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>x <span style="color: #339933;">==</span> <span style="color: #0000dd;">1</span><span style="color: #009900;">&#41;</span>
   stmt_1<span style="color: #339933;">;</span>
<span style="color: #b1b100;">else</span>
   <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>x <span style="color: #339933;">==</span> <span style="color: #0000dd;">2</span><span style="color: #009900;">&#41;</span>
      stmt_2<span style="color: #339933;">;</span>
   <span style="color: #b1b100;">else</span>
      <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>x <span style="color: #339933;">==</span> <span style="color: #0000dd;">3</span><span style="color: #009900;">&#41;</span>
         stmt_3<span style="color: #339933;">;</span></pre></div></div>

<p>Another possible sequence is:</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>x <span style="color: #339933;">==</span> <span style="color: #0000dd;">1</span><span style="color: #009900;">&#41;</span>
   stmt_1<span style="color: #339933;">;</span>
<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>x <span style="color: #339933;">==</span> <span style="color: #0000dd;">2</span><span style="color: #009900;">&#41;</span>
   stmt_2<span style="color: #339933;">;</span>
<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>x <span style="color: #339933;">==</span> <span style="color: #0000dd;">3</span><span style="color: #009900;">&#41;</span>
   stmt_3<span style="color: #339933;">;</span></pre></div></div>

<p>provided all but the last conditionally executed arms do not change the value of the common control variable (e.g., <code>x</code>).</p>
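<p>The proviso matters because the two forms diverge as soon as an arm assigns to the control variable.  The sketch below is illustrative only; the function names are hypothetical and the assignment <code>x = 2</code> is added purely to show the difference.</p>

```c
/* Both functions return the number of arms executed.  The first arm
   assigns to x to show where the two forms diverge. */
int arms_run_if_else_if(int x)
{
   int arms = 0;

   if (x == 1)
      { x = 2; arms++; }
   else if (x == 2)
      arms++;
   else if (x == 3)
      arms++;
   return arms;
}

int arms_run_if_if(int x)
{
   int arms = 0;

   if (x == 1)
      { x = 2; arms++; }   /* changes the value tested by the next if */
   if (x == 2)
      arms++;
   if (x == 3)
      arms++;
   return arms;
}
```

<p>For an argument of 1 the <em>if-else-if</em> version executes one arm while the <em>if-if</em> version executes two; for any other argument the two versions agree.</p>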
<p>I started to wonder about what would cause a developer to choose one of these forms over the other.  Perhaps the <em>if-if</em> form would be used when it was obvious that the common conditional variable was not modified in the conditionally executed arm. This would imply that there would be more statements in the arms of <em>if-else-if</em> sequences than <em>if-if</em> sequences.  The following plot of percentage occurrence (over all detected <em>if-else-if</em>/<em>if-if</em> forms) of line number difference between pairs of associated if-statements (e.g., when the controlling expression occurs on line <equ>x</equ> and the following if-statement controlling expression occurs on line <equ>x+2</equ> the distance is 2) shows that this is not the case:</p>
<p><img src="http://www.coding-guidelines.com/images/ifarmdist.jpg" alt="Lines between if-statement controlling expressions" /></p>
<p><del datetime="2009-08-21T13:35:05+00:00">Just over a quarter of the arms contain a single statement (or to be exact the code is contained on a single line); this suggests that when using the <em>if-else-if</em> form most developers put the <code>else</code> and <code>if</code> on the same line.  At the next distance along the percentage of <em>if-else-if</em> forms is twice as great as the <em>if-if</em>, probably because of <code>else</code> and <code>if</code> appearing on separate lines (as in the introductory example) in one case and less frequently a comment/blank line in the other.  Next along, why the big increase in <em>if-if</em> forms?  A comment + blank line, or perhaps no comment or blank line but the use of curly brackets (this is too off the track of where I am supposed to be going to investigate).</del></p>
<p><ins datetime="2009-08-21T13:35:05+00:00">This morning I realized why the original plot did not look right: one of the data sets was a long way off adding up to 100%.  An updated version has been uploaded.</p>
<p>It turns out that a single statement (or at least a single line) is more common in the <em>if-else-if</em> form, the opposite of what I had expected.  At slightly larger distances there are still differences that can be attributed to <code>else</code> and <code>if</code> appearing on separate lines, curly brackets and a comment/blank line, but the effect is not as large as seen in the original, less accurate, plot.</ins></p>
<p>I have a feeling that I ought to say something about the <em>if-else-if</em> form being preferred to the <em>if-if</em> form.  One of the forms will have its behavior changed if the common control variable is modified in one of its arms.  But is this an intended or unintended behavior?  What is the typical characteristic usage of a common control variable, e.g., do they tend to be accessed but not modified in a given function definition?  At the moment I see no obvious cost or benefit strongly favoring one usage over the other, so I will remain silent on the issue.</p>
]]></content:encoded>
			<wfw:commentRss>http://shape-of-code.coding-guidelines.com/2009/08/21/108/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Implementing the between operation</title>
		<link>http://shape-of-code.coding-guidelines.com/2009/07/30/implementing-the-between-operation/</link>
		<comments>http://shape-of-code.coding-guidelines.com/2009/07/30/implementing-the-between-operation/#comments</comments>
		<pubDate>Thu, 30 Jul 2009 01:09:08 +0000</pubDate>
		<dc:creator>Derek-Jones</dc:creator>
				<category><![CDATA[empirical]]></category>
		<category><![CDATA[between]]></category>
		<category><![CDATA[culture]]></category>
		<category><![CDATA[if statement]]></category>
		<category><![CDATA[semantics]]></category>

		<guid isPermaLink="false">http://shape-of-code.coding-guidelines.com/?p=106</guid>
		<description><![CDATA[What code do developers write to check whether a value lies between two bounds (i.e., a between operation)?  I would write (where MIN and MAX might be symbolic names or numeric literals): if &#40; x &#62;= MIN &#38;&#38; x &#60;= MAX &#41; that is I would check the lowest value first. Performing the test in [...]]]></description>
			<content:encoded><![CDATA[<p>What code do developers write to check whether a value lies between two bounds (i.e., a between operation)?  I would write (where <code>MIN</code> and <code>MAX</code> might be symbolic names or numeric literals):</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">   <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span> x <span style="color: #339933;">&gt;=</span> MIN <span style="color: #339933;">&amp;&amp;</span> x <span style="color: #339933;">&lt;=</span> MAX <span style="color: #009900;">&#41;</span></pre></div></div>

<p>that is I would check the lowest value first.  Performing the test in this order just seems the natural thing to do, perhaps because I live in a culture that writes left to right and a written sequence of increasing numbers usually has the lowest number on the left.</p>
<p>I am currently measuring various forms of <em>if-statement</em> conditional expressions that occur in visible source as part of some research on if/switch usage by developers and the <em>between</em> operation falls within the set of expressions of interest.  I was not expecting to see any usage of the form:</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">   <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span> x <span style="color: #339933;">&lt;=</span> MAX <span style="color: #339933;">&amp;&amp;</span> x <span style="color: #339933;">&gt;=</span> MIN <span style="color: #009900;">&#41;</span></pre></div></div>

<p>that is with the maximum value appearing first.  The first program measured  threw up seven instances of this usage, all with the minimum value being negative and in five cases the maximum value being zero.  Perhaps left to right ordering still applied, but to the absolute value of the bounds.</p>
<p>Measurements of the second and subsequent programs threw up instances that did not follow any of the patterns I had dreamt up.  Of the 326 <em>between</em> operations appearing in the measured source 24 had what I consider to be the unnatural order.  Presumably the developers using this form of <em>between</em> consider it to be natural, so what is their line of thinking?  Are they thinking in terms of the semantics behind the numbers (in about a third of cases symbolic constants appear in the source rather than literals) and this semantics has an implied left to right order?  Perhaps the authors come from a culture where the maximum value often appears on the left.</p>
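<p>When the operands have no side effects the two orderings compute the same result; only the test that short-circuits differs.  A minimal sketch that checks this over a range of values follows (all function names are hypothetical, and <code>hi</code> is assumed to be less than <code>INT_MAX</code> so that the loop counter does not overflow).</p>

```c
/* Lower bound tested first. */
int between_min_first(int x, int min, int max)
{
   return x >= min && x <= max;
}

/* Upper bound tested first. */
int between_max_first(int x, int min, int max)
{
   return x <= max && x >= min;
}

/* Nonzero if the two orderings agree for every x in lo..hi (inclusive).
   Assumes hi is less than INT_MAX. */
int orderings_agree(int min, int max, int lo, int hi)
{
   for (int x = lo; x <= hi; x++)
      if (between_min_first(x, min, max) != between_max_first(x, min, max))
         return 0;
   return 1;
}
```

<p>For example, <code>orderings_agree(-5, 0, -100, 100)</code> is nonzero, which is consistent with the choice of ordering being a matter of how developers think about the bounds rather than of behavior.</p>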
<p>Suggestions welcome.</p>
]]></content:encoded>
			<wfw:commentRss>http://shape-of-code.coding-guidelines.com/2009/07/30/implementing-the-between-operation/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>

