<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Shape of Code &#187; reverse engineer</title>
	<atom:link href="http://shape-of-code.coding-guidelines.com/tag/reverse-engineer/feed/" rel="self" type="application/rss+xml" />
	<link>http://shape-of-code.coding-guidelines.com</link>
	<description></description>
	<lastBuildDate>Sun, 12 Feb 2012 20:42:27 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Using evolution to reduce competition</title>
		<link>http://shape-of-code.coding-guidelines.com/2011/05/18/using-evolution-to-reduce-competition/</link>
		<comments>http://shape-of-code.coding-guidelines.com/2011/05/18/using-evolution-to-reduce-competition/#comments</comments>
		<pubDate>Wed, 18 May 2011 02:07:24 +0000</pubDate>
		<dc:creator>Derek-Jones</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[communications protocol]]></category>
		<category><![CDATA[court case]]></category>
		<category><![CDATA[document]]></category>
		<category><![CDATA[EU]]></category>
		<category><![CDATA[evolution]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[reverse engineer]]></category>
		<category><![CDATA[Skype]]></category>

		<guid isPermaLink="false">http://shape-of-code.coding-guidelines.com/?p=443</guid>
		<description><![CDATA[The Microsoft purchase of Skype got me thinking back to my time as an advisor to the Monitoring Trustee appointed by the European Commission in the EU/Microsoft competition court case. The Commission wanted to introduce competition into the Windows Work Group server market and it hoped that by requiring Microsoft to license all of the [...]]]></description>
			<content:encoded><![CDATA[<p>The Microsoft purchase of <a href="http://en.wikipedia.org/wiki/Skype">Skype</a> got me thinking back to my time as an advisor to the <a href="http://ec.europa.eu/competition/antitrust/cases/microsoft/implementation.html">Monitoring Trustee</a> appointed by the European Commission in the <a href="http://en.wikipedia.org/wiki/European_Union_Microsoft_competition_case">EU/Microsoft competition court case</a>.  The Commission wanted to introduce competition into the Windows Work Group server market and it hoped that by requiring Microsoft to license all of the necessary <a href="http://msdn.microsoft.com/en-us/library/cc216517%28v=prot.10%29.aspx">communication protocols</a> companies would produce products that were plug-compatible with Microsoft products.  The major flaw in this plan turned out to be economics: we estimated it would cost around £100 million to implement the protocols, and making a worthwhile profit on this investment looked decidedly problematic.</p>
<p>Microsoft&#8217;s approach to publishing protocol specifications went through three stages: 1) doing everything they could not to do it, 2) following the judgment handed down by the court, 3) actively documenting additional protocols and making all the documents publicly available.  Yes, as the documentation process progressed Microsoft started to see the benefits of having English prose documentation (previously the documentation was the source code) but I suspect the switch from (2) to (3) was made possible by the economic analysis that implied there would not be any competition in the server market.</p>
<p>Skype have not made their client/server protocols public; will Microsoft do so?  I suspect not, because there is no benefit for them in doing so.  Also, I&#8217;m sure that Microsoft will want to steer clear of anti-trust authorities and will not be making Skype an integral part of Windows&#8217; internal functionality.</p>
<p>What progress has been made in reverse engineering the Skype protocols? There is a <a href="http://www1.cs.columbia.edu/~salman/skype/">community of people trying to figure them out</a> but they have not made the kind of progress that enabled <a href="http://en.wikipedia.org/wiki/Andrew_Tridgell">Andrew Tridgell</a> to quickly get something useful up and running that could then <a href="http://www.samba.org/samba/docs/10years.html">evolve into</a> a full-blown <a href="http://en.wikipedia.org/wiki/Samba_%28software%29">implementation of a Microsoft protocol</a>.</p>
<p>What lessons can Skype product managers learn from the Microsoft experience of having to make their proprietary protocols available to third parties?  I don&#8217;t think Microsoft intentionally did any of the following:</p>
<ol>
<li>Don&#8217;t write any English prose documentation; ensure that the source code is the only specification of the protocols.  This will make it easier for point 3) to occur,</li>
<li>proprietary protocols are your friend, even designing &#8216;better&#8217; alternatives to non-proprietary protocols,</li>
<li>don&#8217;t put too much of a brake on evolution, i.e., allow developers to do what they always want to do, which is to make quick fixes to the code and tweak it here and there, <a href="http://shape-of-code.coding-guidelines.com/2010/06/18/network-protocols-also-evolve-into-a-tangle-of-dependencies">resulting in a tangle</a> that cannot be simplified.  This will significantly drive up third-party costs as they will not be able to create a product handling a useful subset (i.e., they will have to implement everything) and the tangle makes it harder for them to be sure that what they have done is correct.</li>
</ol>
<p>What might be the short-term costs of following this strategy?  Very good developers are used to learning by reading code (lack of documentation is a fact of life for many of them).  Experience has shown that allowing developers to make quick fixes and tweak code often results in difficult-to-maintain code (ok, so a small group of developers have to be paid above the market rate to ensure access to their code memory).  If developers really do dig themselves into a very large hole it is always possible to completely redesign the protocols and provide a very major upgrade (Skype can always reinvent its own protocols, an option not available to third parties which have to follow slavishly behind; this option has always been open to Microsoft with its protocols, i.e., the courts did not place any restrictions on protocol changes).</p>
<p>Where did the £100 million figure come from?  The problem of estimating development cost was approached from various angles.  The one I used was to estimate the number of requirements at 50,000 (there are 38,158 MUSTs in the first public release of the documents) of which 1,651 occur in the SMB specification for which there is a 450KLOC implementation (i.e., <a href="http://ftp.samba.org/pub/samba/stable/">samba source in 2006</a>), giving an estimate of (50000/1651)*450K -> 13.6 MLOC in the final implementation.  At £10 per line we get a bit more than £100 million. </p>
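<p>The arithmetic behind this estimate can be checked with a few lines of Python.  This is only a sketch of the calculation described above; the variable and function names are mine, and the inputs are the figures quoted in this post.</p>

```python
# Figures quoted in the post; the names are illustrative only.
TOTAL_REQUIREMENTS = 50_000   # estimated requirements across all protocols
SMB_REQUIREMENTS = 1_651      # MUSTs in the SMB specification
SMB_LINES = 450_000           # size of the 2006 Samba source (450 KLOC)
COST_PER_LINE = 10            # pounds per line of code


def estimate(total_reqs, known_reqs, known_lines, cost_per_line):
    """Scale a known implementation size by the ratio of requirement counts."""
    lines = (total_reqs / known_reqs) * known_lines
    return lines, lines * cost_per_line


lines, cost = estimate(TOTAL_REQUIREMENTS, SMB_REQUIREMENTS, SMB_LINES, COST_PER_LINE)
print(f"{lines / 1e6:.1f} MLOC, about GBP {cost / 1e6:.0f} million")
# prints: 13.6 MLOC, about GBP 136 million
```

<p>The result, roughly £136 million, is the "bit more than £100 million" quoted above.  The estimate assumes that lines of code scale linearly with the number of requirements, which is a simplification.</p>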
]]></content:encoded>
			<wfw:commentRss>http://shape-of-code.coding-guidelines.com/2011/05/18/using-evolution-to-reduce-competition/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Minimum information needed for writing a code generator</title>
		<link>http://shape-of-code.coding-guidelines.com/2010/01/29/minimum-information-needed-for-writing-a-code-generator/</link>
		<comments>http://shape-of-code.coding-guidelines.com/2010/01/29/minimum-information-needed-for-writing-a-code-generator/#comments</comments>
		<pubDate>Fri, 29 Jan 2010 00:12:04 +0000</pubDate>
		<dc:creator>Derek-Jones</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[compiler writer]]></category>
		<category><![CDATA[instruction set]]></category>
		<category><![CDATA[iPhone OS]]></category>
		<category><![CDATA[literal generation]]></category>
		<category><![CDATA[loader]]></category>
		<category><![CDATA[reverse engineer]]></category>
		<category><![CDATA[Turing machine]]></category>

		<guid isPermaLink="false">http://shape-of-code.coding-guidelines.com/?p=175</guid>
		<description><![CDATA[If a compiler writer is faced with writing a back-end for an undocumented processor, what is the minimum amount of information that needs to be reverse engineered? It is possible to implement a universal computer using a cpu that has a single instruction, subtract-and-branch-if-lessthan-or-equal-to-zero. This is all very well, but processors based on using a [...]]]></description>
			<content:encoded><![CDATA[<p>If a compiler writer is faced with writing a back-end for an <a href="http://shape-of-code.coding-guidelines.com/2010/01/secret-instruction-sets-about-to-make-a-come-back/">undocumented processor</a>, what is the minimum amount of information that needs to be reverse engineered?</p>
<p>It is possible to implement a <a href="http://en.wikipedia.org/wiki/Universal_computer">universal computer</a> using a cpu that has a single instruction, <a href="http://en.wikipedia.org/wiki/One_instruction_set_computer"><code>subtract-and-branch-if-lessthan-or-equal-to-zero</code></a>.  This is all very well, but <a href="http://ce.et.tudelft.nl/MOVE/">processors based on using a single instruction</a> are a bit thin on the ground and the processor to hand is likely to support a larger number of simpler instructions.</p>
<p>A <code>subtract-and-branch-if-lessthan-or-equal-to-zero</code> instruction could be implemented on a register based machine using the appropriate sequence of <code>load-from-memory</code>, <code>subtract-two-registers</code>, <code>store-register-to-memory</code> and <code>jump-if-subtract-lessthan-or-equal</code> instructions.  Information about other instructions, such as add and multiply, would be useful for code optimization. (The <a href="http://en.wikipedia.org/wiki/Turing_machine">Turing machine</a> model of computation is sufficiently far removed from how most programs and computers operate that it is not considered further.)</p>
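<p>As a rough illustration, here is a small Python simulation of that instruction sequence.  It assumes one common convention for the single instruction (three operands <code>a</code>, <code>b</code>, <code>target</code>; subtract the value at <code>a</code> from the value at <code>b</code> and branch if the result is less than or equal to zero; a negative target halts).  The memory layout and function names are made up for this sketch.</p>

```python
def subleq_step(mem, pc):
    """Emulate one instruction stored at mem[pc], mem[pc+1], mem[pc+2].

    Each line below stands in for one instruction of the register machine.
    """
    a, b, target = mem[pc], mem[pc + 1], mem[pc + 2]
    r0 = mem[a]        # load-from-memory
    r1 = mem[b]        # load-from-memory
    r1 = r1 - r0       # subtract-two-registers
    mem[b] = r1        # store-register-to-memory
    if 0 >= r1:        # jump-if-subtract-lessthan-or-equal
        return target
    return pc + 3      # otherwise fall through to the next instruction


def run(mem, pc=0, max_steps=10_000):
    """Run until a negative program counter (assumed halt convention)."""
    steps = 0
    while pc >= 0 and max_steps > steps:
        pc = subleq_step(mem, pc)
        steps += 1
    return mem


# Add the value at address 9 to the value at address 10, using address 11
# as a scratch location that starts at zero.
program = [
    9, 11, 3,     # scratch = scratch - a   (scratch becomes -a)
    11, 10, 6,    # b = b - scratch         (b becomes b + a)
    11, 11, -1,   # scratch = 0, then halt
    7, 5, 0,      # data: a = 7, b = 5, scratch = 0
]
result = run(program)
print(result[10])  # prints: 12
```

<p>Addition falls out of two subtractions, which is why knowing about a real add instruction is an optimization rather than a necessity.</p>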
<p>Are we done?  In theory yes, in practice no.  A couple of practical problems have been glossed over: how do source literals (e.g., <code>"Hello World"</code>) initially get written to storage, where does the storage used by the program come from, and what is the file format of an executable?</p>
<p>Literals that are not created using an instruction (most processors have instructions for loading an integer constant into a register) are written to a part of the executable file that is read into storage by the <a href="http://en.wikipedia.org/wiki/Loader_%28computing%29">loader</a> on program startup.  All well and good if we know enough about the format of an executable file to be able to correctly generate this information and can get the operating system to put it in the desired storage location.  Otherwise we have to figure out some other solution.</p>
<p>If we know two storage locations containing values that differ by one, a sequence of instructions could subtract one value from the other to eventually obtain any desired value.  This bootstrap process would speed up as a wider range of known value/location pairs was built up.</p>
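<p>A minimal sketch of this bootstrap, in Python, assuming subtraction is the only arithmetic available.  The function names are hypothetical, and the doubling trick is just one way of exploiting values that have already been built; it only handles non-negative targets.</p>

```python
def sub(x, y):
    """The only arithmetic operation assumed to be available."""
    return x - y


def build_constant(target, p, q):
    """Build a non-negative target from two known values where q - p == 1.

    Returns the value and the number of subtractions used.
    """
    one = sub(q, p)       # the two known values differ by one
    zero = sub(one, one)
    ops = 2
    result = zero
    for bit in bin(target)[2:]:                   # most significant bit first
        result = sub(result, sub(zero, result))   # double: v - (0 - v)
        ops += 2
        if bit == "1":
            result = sub(result, sub(zero, one))  # add one: v - (0 - 1)
            ops += 2
    return result, ops


value, ops = build_constant(1000, 41, 42)
print(value)  # prints: 1000
```

<p>Counting up one at a time would need on the order of <code>target</code> subtractions; reusing intermediate values, as the doubling step does, needs a number proportional to the bits in <code>target</code>.</p>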
<p>How do we go about obtaining a chunk of storage?  An executable file usually contains information on the initial amount of storage needed by a program when it is loaded.  Calls to the heap manager are another way of obtaining storage.  Again we need to know where to write the appropriate information in the executable file.</p>
<p>What minimum amount of storage might be expected to be available?  A program executing within a stack/heap based memory model has a default amount of storage allocated for the stack (a minimum of 16k, I believe, under Mac OS X or iPhone OS).  A program could treat the stack as its storage.  Ideally what we need is the ability to access storage via an offset from the stack pointer; at worst we would have to adjust the stack pointer to the appropriate offset, pop the value into a register and then reset the stack pointer; storing would involve a push.</p>
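<p>The worst case can be modelled in a few lines of Python.  This sketch assumes a stack that grows towards lower addresses with the stack pointer indexing the top item; the class and method names are invented for illustration and do not describe any particular processor.</p>

```python
class StackOnlyMemory:
    """Storage reachable only through push, pop and stack pointer changes."""

    def __init__(self, size):
        self.cells = [0] * size
        self.sp = size            # empty stack; grows towards index 0

    def push(self, value):
        self.sp -= 1
        self.cells[self.sp] = value

    def pop(self):
        value = self.cells[self.sp]
        self.sp += 1
        return value

    def load(self, offset):
        """Read the slot at sp + offset using only a pop."""
        saved = self.sp
        self.sp = saved + offset      # adjust the stack pointer
        value = self.pop()            # pop the value into a "register"
        self.sp = saved               # reset the stack pointer
        return value

    def store(self, offset, value):
        """Write the slot at sp + offset using only a push."""
        saved = self.sp
        self.sp = saved + offset + 1  # so that the push lands on the slot
        self.push(value)
        self.sp = saved


m = StackOnlyMemory(16)
for v in (10, 20, 30):
    m.push(v)
m.store(1, 99)                    # overwrite the 20
print(m.load(0), m.load(1), m.load(2))  # prints: 30 99 10
```

<p>Every access costs three operations instead of one, which is the price of not knowing the addressing modes.</p>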
<p>Having performed some calculation we probably want to communicate one or more values to the outside world.  A call to a library routine, such as <code>printf</code>, needs information on the parameter passing conventions (e.g., which parameters get stored in which registers or have storage allocated for them {a function returning a structure type usually has the necessary storage allocated by the calling function with the address passed as an extra parameter}) and the location of any return value.  If <a href="http://en.wikipedia.org/wiki/Application_binary_interface">ABI</a> information is not available a bit of lateral thinking might be needed to come up with an <a href="http://web.archive.org/web/20051205153506/http://ipodlinux.org/stories/piezo/index.html">alternative output method</a>.</p>
<p>I have not said anything about making use of signals and exception handling.  These are hard enough to get right when documentation is available.  The Turing machine theory folk usually skip over these real-world issues and I will join them for the time being.</p>
]]></content:encoded>
			<wfw:commentRss>http://shape-of-code.coding-guidelines.com/2010/01/29/minimum-information-needed-for-writing-a-code-generator/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

