<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Shape of Code &#187; probability distribution</title>
	<atom:link href="http://shape-of-code.coding-guidelines.com/tag/probability-distribution/feed/" rel="self" type="application/rss+xml" />
	<link>http://shape-of-code.coding-guidelines.com</link>
	<description></description>
	<lastBuildDate>Sun, 12 Feb 2012 20:42:27 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Benford&#8217;s law and numeric literals in source code</title>
		<link>http://shape-of-code.coding-guidelines.com/2008/12/13/benfords-law-and-numeric-literals-in-source-code/</link>
		<comments>http://shape-of-code.coding-guidelines.com/2008/12/13/benfords-law-and-numeric-literals-in-source-code/#comments</comments>
		<pubDate>Sat, 13 Dec 2008 01:02:40 +0000</pubDate>
		<dc:creator>Derek-Jones</dc:creator>
				<category><![CDATA[data analysis]]></category>
		<category><![CDATA[Benford's law]]></category>
		<category><![CDATA[C source]]></category>
		<category><![CDATA[literals]]></category>
		<category><![CDATA[probability distribution]]></category>
		<category><![CDATA[random sample]]></category>
		<category><![CDATA[scale invariant]]></category>

		<guid isPermaLink="false">http://shape-of-code.coding-guidelines.com/?p=30</guid>
		<description><![CDATA[Benford&#8217;s law applies to values derived from a surprising number of natural and man-made processes. I was very optimistic that it would also apply to numeric literals in source code. Measurements of C source showed that I was wrong (the chi-square fit was 1,680 for decimal integer literals and 132,398 for floating literals). Probability [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://en.wikipedia.org/wiki/Benford's_law">Benford&#8217;s law</a> applies to values derived from a surprising number of natural and man-made processes.  I was very optimistic that it would also apply to numeric literals in source code.  Measurements of C source showed that I was wrong (the chi-square fit was 1,680 for decimal integer literals and 132,398 for floating literals).</p>
<p><img src="http://www.coding-guidelines.com/images/benfordint.jpg" alt="Image goes here." /></p>
<p>Probability that the leading digit of a (decimal or hexadecimal) integer literal has a particular value (dotted lines show the values predicted by Benford&#8217;s law).</p>
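<p>For reference, Benford&#8217;s law predicts that a leading decimal digit d occurs with probability log10(1 + 1/d).  The following is a minimal Python sketch of that prediction and of a Pearson chi-square statistic measured against it.  The function names are illustrative, and since the exact calculation behind the chi-square figures quoted above is not given, this is one conventional formulation rather than the method used for those measurements.</p>

```python
import math


def benford_probability(digit):
    """Predicted probability that the leading decimal digit is `digit` (1 to 9)."""
    return math.log10(1 + 1 / digit)


def chi_square_against_benford(counts):
    """Pearson chi-square statistic for observed leading-digit counts.

    `counts` maps each digit 1 to 9 to the number of times it was seen
    as a leading digit; missing digits are treated as zero.
    """
    total = sum(counts.get(d, 0) for d in range(1, 10))
    statistic = 0.0
    for d in range(1, 10):
        expected = total * benford_probability(d)
        observed = counts.get(d, 0)
        statistic += (observed - expected) ** 2 / expected
    return statistic
```

<p>With nine digit categories there are 8 degrees of freedom, for which the 5% critical value is roughly 15.5, so statistics in the thousands indicate a very poor fit.</p>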
<p>What are the conditions necessary for a sample of values to follow Benford&#8217;s law?  A number of circumstances have been found to result in sample values having a leading digit that follows Benford&#8217;s law, including:</p>
<ul>
<li>Selecting <a href="http://www.tphill.net/publications/BENFORD%20PAPERS/statisticalDerivationSigDigitLaw1995.pdf">random samples from different sets of values</a>, where each set has a different <a href="http://en.wikipedia.org/wiki/Probability_distribution">probability distribution</a> (i.e., select the distributions at random and then collect a sample of values from each of these distributions).</li>
<li>Deriving the sample <a href="http://www.tphill.net/publications/BENFORD%20PAPERS/baseInvarianceBenford1995.pdf">values from a process</a> that is <a href="http://en.wikipedia.org/wiki/Scale_invariance">scale invariant</a>.</li>
<li>Deriving the sample values from a process that involves <a href="http://cswww.essex.ac.uk/technical-reports/2001/CSM-349.pdf">multiplying independent values</a> having a uniform distribution.</li>
</ul>
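<p>The last of these conditions is easy to try out.  The sketch below, in Python, multiplies independent uniform random values and tallies the leading digit of each product.  The function names, the number of factors and the sample size are all assumptions made for illustration, not taken from the linked report.</p>

```python
import random
from collections import Counter


def leading_digit(value):
    """First non-zero decimal digit of a positive number."""
    # Scientific notation always starts with the leading digit, e.g. '3.2e-05'.
    return int(f"{value:.15e}"[0])


def simulate_product_digits(factors, samples, seed=0):
    """Leading-digit frequencies for products of `factors` uniform values."""
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(samples):
        product = 1.0
        for _ in range(factors):
            product *= 1.0 - rng.random()  # uniform on (0, 1], never zero
        tally[leading_digit(product)] += 1
    return {d: tally[d] / samples for d in range(1, 10)}
```

<p>Calling <code>simulate_product_digits(1, 20000)</code> should give roughly equal frequencies for all nine digits, while <code>simulate_product_digits(10, 20000)</code> should give frequencies close to the Benford values (about 0.30 for a leading <code>1</code>).</p>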
<p>Samples that have been found to follow Benford&#8217;s law include lists of physical constants and accounting data (so much so that it has been used to detect <a href="http://www.journalofaccountancy.com/Issues/1999/May/nigrini.htm">accounting fraud</a>).  However, the number of data-sets containing values whose leading digit follows Benford&#8217;s law is <a href="http://cswww.essex.ac.uk/technical-reports/2001/CSM-349.pdf">not as great as some would have us believe</a>.</p>
<p>Why don&#8217;t the leading digits of numeric literals in source code follow Benford&#8217;s law?</p>
<li>Perhaps small values are over-represented because they are used as offsets to access the storage either side of some pointer (in C, C++ and Java, but not Pascal or Fortran, the availability of the <code>++</code>/<code>--</code> operators reduces the number of instances of <code>1</code> used to increment/decrement a value).  But this only applies to integer types, not floating types.</li>
<p><img src="http://www.coding-guidelines.com/images/benfordflt.jpg" alt="Image goes here." /></p>
<p>Probability that the leading, first non-zero, digit of a floating literal has a particular value (dashed line predicted by Benford&#8217;s law).</p>
<li>Perhaps there exists a high degree of correlation between the values of literals.  I&#8217;m not yet sure how to look for this.</li>
<li>Why is there a huge spike at <code>5</code> for the floating-point literals?  Have values been rounded to produce <code>0.5</code>?  This looks like an area where methods used for accounting fraud detection might be applied (not that any fraud is implied, just irregularity).</li>
<li>Why is the distribution of the leading digit fairly uniform for hexadecimal literals?</li>
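<p>Leading-digit counts of the kind plotted above could be gathered along the following lines.  This is a rough Python sketch, not the tool used for the measurements in this post: the token pattern follows the shape of C preprocessing numbers, comments and string literals are not stripped, zero and octal literals are skipped, and all of the names are made up for illustration.</p>

```python
import re
from collections import Counter

# Identifiers are matched first so that digits inside names such as x1 are skipped.
TOKEN = re.compile(r"[A-Za-z_]\w*|\.?\d(?:[eEpP][+-]|[\w.])*")


def classify(token):
    """Return (kind, leading_digit) for a numeric token, or None to skip it."""
    lower = token.lower()
    if lower.startswith("0x"):
        kind, digits, nonzero = "hex", lower[2:], "123456789abcdef"
    elif "." in lower or "e" in lower:
        kind, digits, nonzero = "float", lower.split("e")[0], "123456789"
    elif lower.startswith("0"):
        return None  # zero and octal literals are ignored in this sketch
    else:
        kind, digits, nonzero = "decimal", lower, "123456789"
    for ch in digits:
        if ch in nonzero:
            return kind, ch
    return None  # e.g. 0.0 has no non-zero digit


def leading_digit_counts(source):
    """Tally the leading digits of numeric literals found in C source text."""
    counts = {"decimal": Counter(), "hex": Counter(), "float": Counter()}
    for match in TOKEN.finditer(source):
        token = match.group()
        if token[0].isalpha() or token[0] == "_":
            continue  # an identifier or keyword, not a literal
        result = classify(token)
        if result is not None:
            kind, digit = result
            counts[kind][digit] += 1
    return counts
```

<p>The three tallies returned by <code>leading_digit_counts</code> correspond to the decimal, hexadecimal and floating literal categories discussed here.</p>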
<p>These surprising measurements show that there is a lot to the shape of numeric literals that is yet to be discovered.</p>
]]></content:encoded>
			<wfw:commentRss>http://shape-of-code.coding-guidelines.com/2008/12/13/benfords-law-and-numeric-literals-in-source-code/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

