<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Shape of Code &#187; average</title>
	<atom:link href="http://shape-of-code.coding-guidelines.com/tag/average/feed/" rel="self" type="application/rss+xml" />
	<link>http://shape-of-code.coding-guidelines.com</link>
	<description></description>
	<lastBuildDate>Sun, 29 Jan 2012 23:49:36 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Estimating variance when measuring source</title>
		<link>http://shape-of-code.coding-guidelines.com/2009/10/08/estimating-variance-when-measuring-source/</link>
		<comments>http://shape-of-code.coding-guidelines.com/2009/10/08/estimating-variance-when-measuring-source/#comments</comments>
		<pubDate>Thu, 08 Oct 2009 19:25:11 +0000</pubDate>
		<dc:creator>Derek-Jones</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[average]]></category>
		<category><![CDATA[binomial distribution]]></category>
		<category><![CDATA[case-label]]></category>
		<category><![CDATA[error bars]]></category>
		<category><![CDATA[if statement]]></category>
		<category><![CDATA[probability]]></category>
		<category><![CDATA[source code]]></category>
		<category><![CDATA[source measurement]]></category>
		<category><![CDATA[switch]]></category>
		<category><![CDATA[variance]]></category>

		<guid isPermaLink="false">http://shape-of-code.coding-guidelines.com/?p=121</guid>
		<description><![CDATA[Yesterday I finally delivered a paper on if/switch usage measurements to the ACCU magazine editor and today I read about a switch statement usage that, if common, would invalidate a chunk of my results. Does anything jump out at you in the following snippet? switch &#40;x&#41; &#123; case 1: &#123; z++; ... break; &#125; ... [...]]]></description>
			<content:encoded><![CDATA[<p>Yesterday I finally delivered a paper on if/switch usage measurements to the <a href="http://www.accu.org">ACCU</a> magazine editor and today I <a href="http://peeterjoot.wordpress.com/2009/10/02/switch-in-c-confused-by-freeform-text/">read about</a> a <code>switch</code> statement usage that, if common, would invalidate a chunk of my results. Does anything jump out at you in the following snippet?</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #b1b100;">switch</span> <span style="color: #009900;">&#40;</span>x<span style="color: #009900;">&#41;</span>
   <span style="color: #009900;">&#123;</span>
   <span style="color: #b1b100;">case</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">:</span>
             <span style="color: #009900;">&#123;</span>
             z<span style="color: #339933;">++;</span>
             ...
             <span style="color: #000000; font-weight: bold;">break</span><span style="color: #339933;">;</span>
             <span style="color: #009900;">&#125;</span>
...</pre></div></div>

<p>Yes, those <code>{  }</code> delimiting the case-labeled statement sequence.  A quick check of my C source benchmarks showed this usage occurring in around 1% of case-labels.  Panic over.</p>
<p>What is the statistical significance, i.e., the <a href="http://en.wikipedia.org/wiki/Variance">variance</a>, of that 1%? Have I simply measured an unrepresentative sample? What would a representative sample look like, and what variance would be expected within it?</p>
<p>I am interested in commercial software development, so I have selected half a dozen or so largish code bases as my source benchmark, preferably ones written in a commercial environment even if they are now available as open source. I would prefer this benchmark to be an order of magnitude larger, and perhaps I will get around to adding more programs soon.</p>
<p>My if/switch measurements were aimed at finding usage characteristics that varied between the two kinds of selection statements. One characteristic measured was the number of equality tests in the associated controlling expression.  For instance, in:</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>x <span style="color: #339933;">==</span> <span style="color: #0000dd;">1</span> <span style="color: #339933;">||</span> x <span style="color: #339933;">==</span> <span style="color: #0000dd;">2</span><span style="color: #009900;">&#41;</span>
   z<span style="color: #339933;">--;</span>
<span style="color: #b1b100;">else</span> <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>x <span style="color: #339933;">==</span> <span style="color: #0000dd;">3</span><span style="color: #009900;">&#41;</span>
   z<span style="color: #339933;">++;</span></pre></div></div>

<p>the first controlling expression contains two equality tests and the second one equality test.</p>
<p>Plotting the percentage of equality tests that occur in the controlling expressions of <a href="http://shape-of-code.coding-guidelines.com/2009/08/108/">if-if/if-else-if</a> sequences and <code>switch</code> statements gives the following:</p>
<p><img src="http://www.coding-guidelines.com/images/numeqvar.jpg" alt="Number of equality tests in controlling expression" /></p>
<p>Do these results indicate that if-if/if-else-if sequences and <code>switch</code> statements differ in the number of equality tests contained in their controlling expressions?  If I measured a completely different set of source code, would the results be very different?</p>
<p>To answer these questions a probability model is needed. Take as an example the controlling expressions present in an if-if sequence. If each controlling expression is independent of the others, and the probability that it contains, say, two equality tests is the same for every expression, then the number of expressions containing two equality tests has a <a href="http://en.wikipedia.org/wiki/Binomial_distribution">binomial distribution</a>. The same argument can be applied to other numbers of equality tests and other kinds of sequence.</p>
<p><img src="http://www.coding-guidelines.com/images/numequnk.jpg" alt="Number of equality tests in controlling expression, with error bars" /></p>
<p>For each measurement point in the above plot the associated error bars span the square root of the variance of that point, calculated assuming a binomial distribution (this span is the standard deviation, a term most often encountered with the <a href="http://en.wikipedia.org/wiki/Normal_distribution">normal distribution</a>). The error bars overlap, suggesting that the apparent difference in the percentage of equality tests in each kind of sequence is not statistically significant.</p>
<p>The existence of some dependency between the equality tests in controlling expressions would invalidate this simple analysis, or at least reduce its reliability. I did notice that, in a sequence containing a controlling expression with two equality tests, that expression tended to appear later in the sequence (the reverse of the example given above). Did I notice this because I tend to write this way? A question for another day.</p>
]]></content:encoded>
			<wfw:commentRss>http://shape-of-code.coding-guidelines.com/2009/10/08/estimating-variance-when-measuring-source/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Average distance between two fields</title>
		<link>http://shape-of-code.coding-guidelines.com/2008/12/02/average-distance-between-two-fields/</link>
		<comments>http://shape-of-code.coding-guidelines.com/2008/12/02/average-distance-between-two-fields/#comments</comments>
		<pubDate>Wed, 03 Dec 2008 00:39:47 +0000</pubDate>
		<dc:creator>Derek-Jones</dc:creator>
				<category><![CDATA[Datatypes]]></category>
		<category><![CDATA[average]]></category>
		<category><![CDATA[datatype]]></category>
		<category><![CDATA[distance]]></category>

		<guid isPermaLink="false">http://shape-of-code.coding-guidelines.com/?p=12</guid>
		<description><![CDATA[If I randomly pick two fields from an aggregate type definition containing N fields what will be the average distance between them (adjacent fields have distance 1, if separated by one field they have distance 2, separated by two fields they have distance 3 and so on)? For example, a struct containing five fields has [...]]]></description>
			<content:encoded><![CDATA[<p>If I randomly pick two fields from an aggregate type definition containing N fields, what will be the average distance between them (adjacent fields have distance 1, if separated by one field they have distance 2, separated by two fields they have distance 3 and so on)?</p>
<p>For example, a <code>struct</code> containing five fields has four field pairs having distance 1 from each other, three distance 2, two distance 3, and one field pair having distance 4; the average is 2.</p>
<p>The surprising answer, to me at least, is (N+1)/3.</p>
<p><strong>Proof</strong>: The average distance can be obtained by summing the distances between all possible field pairs and dividing this value by the number of possible different pairs.</p>
<pre>                  Distance 1  2  3  4  5  6
Number of fields
            4              3  2  1
            5              4  3  2  1
            6              5  4  3  2  1
            7              6  5  4  3  2  1</pre>
<p>The above table shows the pattern that occurs as the number of fields in a definition increases.</p>
<p>In the case of a definition containing five fields the sum of the distances of all field pairs is: (4*1 + 3*2 + 2*3 + 1*4) = 20 and the number of different pairs is: (4+3+2+1) = 10. Dividing these two values gives the average distance between two randomly chosen fields, i.e., 20/10 = 2.</p>
<p>Summing the distances over every field pair for a definition containing 2, 3, 4, 5, 6, 7, &#8230; fields gives the sequence: 1, 4, 10, 20, 35, 56, &#8230; This is sequence <a href="http://www.research.att.com/~njas/sequences/A000292">A000292</a> in the On-Line Encyclopedia of Integer Sequences and is given by the formula n*(n+1)*(n+2)/6 (where n = N − 1, i.e., the number of fields minus 1).</p>
<p>Counting the different field pairs for definitions containing 2, 3, 4, 5, 6, 7, 8, &#8230; fields gives the sequence: 1, 3, 6, 10, 15, 21, 28, &#8230; This is sequence <a href="http://www.research.att.com/~njas/sequences/A000217">A000217</a> and is given by the formula n*(n + 1)/2 (with the same n = N − 1).</p>
<p>Dividing these two formulas gives (n*(n+1)*(n+2)/6) / (n*(n+1)/2) = (n + 2)/3, and substituting n = N − 1 yields (N + 1)/3.</p>
]]></content:encoded>
			<wfw:commentRss>http://shape-of-code.coding-guidelines.com/2008/12/02/average-distance-between-two-fields/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

