<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Shape of Code &#187; grammar</title>
	<atom:link href="http://shape-of-code.coding-guidelines.com/tag/grammar/feed/" rel="self" type="application/rss+xml" />
	<link>http://shape-of-code.coding-guidelines.com</link>
	<description></description>
	<lastBuildDate>Sun, 29 Jan 2012 23:49:36 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Simple generator for compiler stress testing source</title>
		<link>http://shape-of-code.coding-guidelines.com/2011/04/25/simple-generator-for-compiler-stress-testing-source/</link>
		<comments>http://shape-of-code.coding-guidelines.com/2011/04/25/simple-generator-for-compiler-stress-testing-source/#comments</comments>
		<pubDate>Mon, 25 Apr 2011 02:14:50 +0000</pubDate>
		<dc:creator>Derek-Jones</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[code generation]]></category>
		<category><![CDATA[compiler]]></category>
		<category><![CDATA[grammar]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[source code]]></category>
		<category><![CDATA[test generator]]></category>
		<category><![CDATA[testing]]></category>

		<guid isPermaLink="false">http://shape-of-code.coding-guidelines.com/?p=390</guid>
		<description><![CDATA[Since writing my C book I have been interested in the problem of generating source that has the syntactic and semantic statistical characteristics of human written code. Generating code that obeys a language&#8217;s syntax is straightforward. Take a specification of the syntax (say in some yacc-like form) and &#8216;generate&#8217; each of the terminals/nonterminals on [...]]]></description>
			<content:encoded><![CDATA[<p>Since writing my <a href="http://www.knosof.co.uk/cbook">C book</a> I have been interested in the problem of generating source that has the syntactic and semantic statistical characteristics of human written code.</p>
<p>Generating code that obeys a language&#8217;s syntax is straightforward.  Take a specification of the syntax (say in some yacc-like form) and &#8216;generate&#8217; each of the terminals/nonterminals on the right-hand-side of the start symbol.  Nonterminals will lead to rules having right-hand-sides that in turn need to be &#8216;generated&#8217;, a random selection being made when a nonterminal has more than one possible rhs rule.  Output occurs when a terminal is &#8216;generated&#8217;.</p>
<p>For the code to mimic human written code it is necessary to bias the random selection process; a numeric value at the start of each rhs rule can be used to specify the percentage probability of that rule being chosen for the corresponding nonterminal.</p>
<p>The following example generates a subset of C expressions; nonterminals are in lowercase, terminals are in uppercase and each is implemented as a call to a function having that name:</p>

<div class="wp_syntax"><div class="code"><pre class="bnf" style="font-family:monospace;">%grammar
&nbsp;
first_rule <span style="color: #006600; font-weight: bold;">:</span> def_ident <span style="color: #a00;">&quot; = &quot;</span> expr <span style="color: #a00;">&quot; ;\n&quot;</span> END_EXPR_STMT <span style="color: #666666; font-style: italic;">;</span>
&nbsp;
def_ident <span style="color: #006600; font-weight: bold;">:</span> MK_IDENT <span style="color: #666666; font-style: italic;">;</span>
&nbsp;
constant <span style="color: #006600; font-weight: bold;">:</span> MK_CONSTANT <span style="color: #666666; font-style: italic;">;</span>
&nbsp;
identifier <span style="color: #006600; font-weight: bold;">:</span> KNOWN_IDENT <span style="color: #666666; font-style: italic;">;</span>
&nbsp;
primary_expr <span style="color: #006600; font-weight: bold;">:</span>
	       <span style="">30</span> constant <span style="color: #006600; font-weight: bold;">|</span>
               <span style="">60</span> identifier <span style="color: #006600; font-weight: bold;">|</span>
               <span style="">10</span> <span style="color: #a00;">&quot; (&quot;</span> expr <span style="color: #a00;">&quot;) &quot;</span> <span style="color: #666666; font-style: italic;">;</span>
&nbsp;
multiplicative_expr <span style="color: #006600; font-weight: bold;">:</span>
		<span style="">50</span> primary_expr <span style="color: #006600; font-weight: bold;">|</span>
                <span style="">40</span> multiplicative_expr <span style="color: #a00;">&quot; * &quot;</span> primary_expr <span style="color: #006600; font-weight: bold;">|</span>
                <span style="">10</span> multiplicative_expr <span style="color: #a00;">&quot; / &quot;</span> primary_expr <span style="color: #666666; font-style: italic;">;</span>
&nbsp;
additive_expr <span style="color: #006600; font-weight: bold;">:</span>
		<span style="">50</span> multiplicative_expr <span style="color: #006600; font-weight: bold;">|</span>
                <span style="">25</span> additive_expr <span style="color: #a00;">&quot; + &quot;</span> multiplicative_expr <span style="color: #006600; font-weight: bold;">|</span>
                <span style="">25</span> additive_expr <span style="color: #a00;">&quot; - &quot;</span> multiplicative_expr <span style="color: #666666; font-style: italic;">;</span>
&nbsp;
expr <span style="color: #006600; font-weight: bold;">:</span> START_EXPR additive_expr FINISH_EXPR <span style="color: #666666; font-style: italic;">;</span></pre></div></div>

<p>A 250 line awk program (awk only because I use it often enough for simple text processing that it is second nature) translates this into two Python lists:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">productions = <span style="color: black;">&#91;</span> <span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>,
<span style="color: black;">&#91;</span> <span style="color: #ff4500;">1</span>, <span style="color: #ff4500;">1</span>, <span style="color: #ff4500;">1</span>, <span style="color: #808080; font-style: italic;"># first_rule</span>
<span style="color: #ff4500;">0</span>, <span style="color: #ff4500;">5</span>, <span style="color: black;">&#91;</span><span style="color: #ff4500;">2</span>, <span style="color: #ff4500;">1001</span>, <span style="color: #ff4500;">3</span>, <span style="color: #ff4500;">1002</span>, <span style="color: #ff4500;">1003</span>, <span style="color: black;">&#93;</span>,
<span style="color: black;">&#93;</span>,
<span style="color: black;">&#91;</span> <span style="color: #ff4500;">2</span>, <span style="color: #ff4500;">1</span>, <span style="color: #ff4500;">1</span>, <span style="color: #808080; font-style: italic;"># def_ident</span>
<span style="color: #ff4500;">0</span>, <span style="color: #ff4500;">1</span>, <span style="color: black;">&#91;</span><span style="color: #ff4500;">1004</span>, <span style="color: black;">&#93;</span>,
<span style="color: black;">&#93;</span>,
<span style="color: black;">&#91;</span> <span style="color: #ff4500;">4</span>, <span style="color: #ff4500;">1</span>, <span style="color: #ff4500;">1</span>, <span style="color: #808080; font-style: italic;"># constant</span>
<span style="color: #ff4500;">0</span>, <span style="color: #ff4500;">1</span>, <span style="color: black;">&#91;</span><span style="color: #ff4500;">1005</span>, <span style="color: black;">&#93;</span>,
<span style="color: black;">&#93;</span>,
<span style="color: black;">&#91;</span> <span style="color: #ff4500;">5</span>, <span style="color: #ff4500;">1</span>, <span style="color: #ff4500;">1</span>, <span style="color: #808080; font-style: italic;"># identifier</span>
<span style="color: #ff4500;">0</span>, <span style="color: #ff4500;">1</span>, <span style="color: black;">&#91;</span><span style="color: #ff4500;">1006</span>, <span style="color: black;">&#93;</span>,
<span style="color: black;">&#93;</span>,
<span style="color: black;">&#91;</span> <span style="color: #ff4500;">6</span>, <span style="color: #ff4500;">3</span>, <span style="color: #ff4500;">0</span>, <span style="color: #808080; font-style: italic;"># primary_expr</span>
<span style="color: #ff4500;">30</span>, <span style="color: #ff4500;">1</span>, <span style="color: black;">&#91;</span><span style="color: #ff4500;">4</span>, <span style="color: black;">&#93;</span>,
<span style="color: #ff4500;">60</span>, <span style="color: #ff4500;">1</span>, <span style="color: black;">&#91;</span><span style="color: #ff4500;">5</span>, <span style="color: black;">&#93;</span>,
<span style="color: #ff4500;">10</span>, <span style="color: #ff4500;">3</span>, <span style="color: black;">&#91;</span><span style="color: #ff4500;">1007</span>, <span style="color: #ff4500;">3</span>, <span style="color: #ff4500;">1008</span>, <span style="color: black;">&#93;</span>,
<span style="color: black;">&#93;</span>,
<span style="color: black;">&#91;</span> <span style="color: #ff4500;">7</span>, <span style="color: #ff4500;">3</span>, <span style="color: #ff4500;">0</span>, <span style="color: #808080; font-style: italic;"># multiplicative_expr</span>
<span style="color: #ff4500;">50</span>, <span style="color: #ff4500;">1</span>, <span style="color: black;">&#91;</span><span style="color: #ff4500;">6</span>, <span style="color: black;">&#93;</span>,
<span style="color: #ff4500;">40</span>, <span style="color: #ff4500;">3</span>, <span style="color: black;">&#91;</span><span style="color: #ff4500;">7</span>, <span style="color: #ff4500;">1009</span>, <span style="color: #ff4500;">6</span>, <span style="color: black;">&#93;</span>,
<span style="color: #ff4500;">10</span>, <span style="color: #ff4500;">3</span>, <span style="color: black;">&#91;</span><span style="color: #ff4500;">7</span>, <span style="color: #ff4500;">1010</span>, <span style="color: #ff4500;">6</span>, <span style="color: black;">&#93;</span>,
<span style="color: black;">&#93;</span>,
<span style="color: black;">&#91;</span> <span style="color: #ff4500;">8</span>, <span style="color: #ff4500;">3</span>, <span style="color: #ff4500;">0</span>, <span style="color: #808080; font-style: italic;"># additive_expr</span>
<span style="color: #ff4500;">50</span>, <span style="color: #ff4500;">1</span>, <span style="color: black;">&#91;</span><span style="color: #ff4500;">7</span>, <span style="color: black;">&#93;</span>,
<span style="color: #ff4500;">25</span>, <span style="color: #ff4500;">3</span>, <span style="color: black;">&#91;</span><span style="color: #ff4500;">8</span>, <span style="color: #ff4500;">1011</span>, <span style="color: #ff4500;">7</span>, <span style="color: black;">&#93;</span>,
<span style="color: #ff4500;">25</span>, <span style="color: #ff4500;">3</span>, <span style="color: black;">&#91;</span><span style="color: #ff4500;">8</span>, <span style="color: #ff4500;">1012</span>, <span style="color: #ff4500;">7</span>, <span style="color: black;">&#93;</span>,
<span style="color: black;">&#93;</span>,
<span style="color: black;">&#91;</span> <span style="color: #ff4500;">3</span>, <span style="color: #ff4500;">1</span>, <span style="color: #ff4500;">1</span>, <span style="color: #808080; font-style: italic;"># expr</span>
<span style="color: #ff4500;">0</span>, <span style="color: #ff4500;">3</span>, <span style="color: black;">&#91;</span><span style="color: #ff4500;">1013</span>, <span style="color: #ff4500;">8</span>, <span style="color: #ff4500;">1014</span>, <span style="color: black;">&#93;</span>,
<span style="color: black;">&#93;</span>,
<span style="color: black;">&#93;</span>
&nbsp;
terminal = <span style="color: black;">&#91;</span> <span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>,
<span style="color: black;">&#91;</span> STR_TERM, <span style="color: #483d8b;">&quot; = &quot;</span><span style="color: black;">&#93;</span>,
<span style="color: black;">&#91;</span> STR_TERM, <span style="color: #483d8b;">&quot; ;<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: black;">&#93;</span>,
<span style="color: black;">&#91;</span> FUNC_TERM, END_EXPR_STMT<span style="color: black;">&#93;</span>,
<span style="color: black;">&#91;</span> FUNC_TERM, MK_IDENT<span style="color: black;">&#93;</span>,
<span style="color: black;">&#91;</span> FUNC_TERM, MK_CONSTANT<span style="color: black;">&#93;</span>,
<span style="color: black;">&#91;</span> FUNC_TERM, KNOWN_IDENT<span style="color: black;">&#93;</span>,
<span style="color: black;">&#91;</span> STR_TERM, <span style="color: #483d8b;">&quot; (&quot;</span><span style="color: black;">&#93;</span>,
<span style="color: black;">&#91;</span> STR_TERM, <span style="color: #483d8b;">&quot;) &quot;</span><span style="color: black;">&#93;</span>,
<span style="color: black;">&#91;</span> STR_TERM, <span style="color: #483d8b;">&quot; * &quot;</span><span style="color: black;">&#93;</span>,
<span style="color: black;">&#91;</span> STR_TERM, <span style="color: #483d8b;">&quot; / &quot;</span><span style="color: black;">&#93;</span>,
<span style="color: black;">&#91;</span> STR_TERM, <span style="color: #483d8b;">&quot; + &quot;</span><span style="color: black;">&#93;</span>,
<span style="color: black;">&#91;</span> STR_TERM, <span style="color: #483d8b;">&quot; - &quot;</span><span style="color: black;">&#93;</span>,
<span style="color: black;">&#91;</span> FUNC_TERM, START_EXPR<span style="color: black;">&#93;</span>,
<span style="color: black;">&#91;</span> FUNC_TERM, FINISH_EXPR<span style="color: black;">&#93;</span>,
<span style="color: black;">&#93;</span></pre></div></div>

<p>which can be executed by a simple interpreter:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> exec_rule<span style="color: black;">&#40;</span>some_rule<span style="color: black;">&#41;</span> :
 rule_len=<span style="color: #008000;">len</span><span style="color: black;">&#40;</span>some_rule<span style="color: black;">&#41;</span>
 cur_action=<span style="color: #ff4500;">0</span>
 <span style="color: #ff7700;font-weight:bold;">while</span> <span style="color: black;">&#40;</span>cur_action <span style="color: #66cc66;">&lt;</span> rule_len<span style="color: black;">&#41;</span> :
    <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: black;">&#40;</span>some_rule<span style="color: black;">&#91;</span>cur_action<span style="color: black;">&#93;</span> <span style="color: #66cc66;">&gt;</span> term_start_base<span style="color: black;">&#41;</span> :
       gen_terminal<span style="color: black;">&#40;</span>some_rule<span style="color: black;">&#91;</span>cur_action<span style="color: black;">&#93;</span>-term_start_base<span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">else</span> :
       exec_rule<span style="color: black;">&#40;</span>select_rule<span style="color: black;">&#40;</span>productions<span style="color: black;">&#91;</span>some_rule<span style="color: black;">&#91;</span>cur_action<span style="color: black;">&#93;</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    cur_action+=<span style="color: #ff4500;">1</span>
&nbsp;
productions.<span style="color: black;">sort</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
start_code<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
ns=<span style="color: #ff4500;">0</span>
<span style="color: #ff7700;font-weight:bold;">while</span> <span style="color: black;">&#40;</span>ns <span style="color: #66cc66;">&lt;</span> <span style="color: #ff4500;">2000</span><span style="color: black;">&#41;</span> : <span style="color: #808080; font-style: italic;"># Loop generating lots of test cases</span>
   exec_rule<span style="color: black;">&#40;</span>select_rule<span style="color: black;">&#40;</span>productions<span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
   ns+=<span style="color: #ff4500;">1</span>
&nbsp;
end_code<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></div></div>
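
<p>The interpreter relies on two helpers, <code>select_rule</code> and <code>gen_terminal</code>, that are not shown.  A minimal sketch of what they might look like follows; the reading of the production layout (nonterminal number, rule count, a flag, then a weight/length/rhs triple per rule), the value of <code>term_start_base</code>, the <code>STR_TERM</code>/<code>FUNC_TERM</code> tags, the small sample <code>terminal</code> list and the use of <code>random.choices</code> are all assumptions inferred from the lists above, not the original code:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">import random
import sys

term_start_base = 1000      # assumed: terminal numbers in rhs lists start above this
STR_TERM, FUNC_TERM = 0, 1  # assumed tags for the two kinds of terminal

# Tiny stand-in for the generated terminal list (index 0 is a placeholder).
terminal = [[0], [STR_TERM, " = "], [STR_TERM, " ;\n"]]

def select_rule(production, rng=random):
    # Assumed layout: [nonterminal, rule count, flag,
    #                  weight, length, rhs, weight, length, rhs, ...]
    num_rules = production[1]
    if num_rules == 1:
        return production[5]             # single rhs, its weight of 0 is ignored
    weights = production[3::3][:num_rules]
    bodies = production[5::3][:num_rules]
    # Weighted random choice, so a rule with weight 60 is picked about 60% of the time.
    return rng.choices(bodies, weights=weights, k=1)[0]

def gen_terminal(term_num, out=sys.stdout):
    kind, value = terminal[term_num]
    if kind == STR_TERM:
        out.write(value)                 # literal text taken from the grammar
    else:
        value()                          # function terminal, e.g. MK_IDENT</pre></div></div>

<p>Passing a seeded <code>random.Random</code> instance as <code>rng</code> makes a run repeatable, which helps when a generated test case needs to be recreated.</p>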

<p>Naive syntax-directed generation results in a lot of code that violates one or more fundamental semantic constraints.  For instance, the assignment <code>(1+1)=3</code> is syntactically valid in many languages, which invariably specify a semantic constraint requiring the lhs of an assignment operator to be some kind of modifiable storage location.  The simplest solution to this problem is to change the syntax to limit the kinds of constructs that can be generated on the lhs of an assignment.</p>
<p>The hardest semantic association to get right is the connection between variable declarations and references to those variables in expressions.  One solution is to mimic how I think many developers write code, that is to generate the statements first and then generate the required definitions for the appropriate variables.</p>
<p>A whole host of minor semantic issues require the syntax generated code to be tweaked, e.g., division by zero occurs more often in untweaked generated code than human code.  There are also statistical patterns within the semantics of human written code, e.g., frequency of use of local variables, that need to be addressed.</p>
<p>A few weeks ago the source of <a href="http://embed.cs.utah.edu/csmith/">Csmith</a>, a C source generator designed to stress the code generation phase of a compiler, was released.  Over the years various people have written C compiler stress testers, most recently <a href="http://www.npl.co.uk/">NPL</a> implemented one in Java, but this is the first time that the source has been released.  Imagine my disappointment on discovering that Csmith contained around 40 KLOC of code, only a bit smaller than a <a href="http://www.knosof.co.uk/whoguard.html">C compiler</a> I had once helped write.  I decided to see if my &#8216;human characteristics&#8217; generator could be used to create a compiler code generator stress tester.</p>
<p>The idea behind compiler code generator stress testing is to generate a program containing some complicated sequence of code, compile and run it, comparing the value produced against the value that is supposed to be produced.</p>
<p>I modified the human characteristics generator to produce pairs of statements like the following:</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">i <span style="color: #339933;">=</span> i_3 <span style="color: #339933;">*</span> i_6 <span style="color: #339933;">&amp;</span> i_2 <span style="color: #339933;">&lt;&lt;</span> i_7 <span style="color: #339933;">;</span>
chk_result<span style="color: #009900;">&#40;</span>i<span style="color: #339933;">,</span> <span style="color: #0000dd;">3</span> <span style="color: #339933;">*</span> <span style="color: #0000dd;">6</span> <span style="color: #339933;">&amp;</span> <span style="color: #0000dd;">2</span> <span style="color: #339933;">&lt;&lt;</span> <span style="color: #0000dd;">7</span><span style="color: #339933;">,</span> __LINE__<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>the second argument to <code>chk_result</code> is the value that <code>i</code> should contain (while generating the expression to assign to <code>i</code> the corresponding constant expression with the variables replaced by their known values is also created).</p>
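
<p>Building the two expressions in parallel can be sketched as follows.  This is only an illustration: it assumes, as the example suggests, that a variable named <code>i_N</code> holds the value N, the name <code>emit_pair</code> and its parameters are hypothetical, and a flat list of operands and operators stands in for the grammar-driven generation:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">def emit_pair(values, operators):
    # values: known values of the variables used, e.g. 3 stands for i_3
    # operators: binary operators placed between consecutive operands
    var_expr = "i_%d" % values[0]
    const_expr = str(values[0])
    for op, val in zip(operators, values[1:]):
        # Append the same operator to both forms so they stay in step.
        var_expr += " %s i_%d" % (op, val)
        const_expr += " %s %d" % (op, val)
    assign = "i = %s ;" % var_expr
    check = "chk_result(i, %s, __LINE__);" % const_expr
    return assign, check

for line in emit_pair([3, 6, 2], ["*", "-"]):
    print(line)</pre></div></div>

<p>Because both strings are built from the same sequence of operators and operands, the constant expression has the same shape as the expression assigned to <code>i</code>.</p>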
<p>Having the compiler evaluate the constant expression simplifies the stress tester and provides another check that the compiler gets things right (or gets two different things wrong in the same way, in which case we probably don&#8217;t get to see any failure message).  The <a href="http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48742">first gcc bug I found</a> concerned this constant expression (in fact this same compiler bug crops up with alarming regularity in the generated code).</p>
<p>As previously mentioned, connecting variables in expressions to a corresponding definition is a lot of work.  I simplified this problem by assuming that an integer variable <code>i</code> would be predefined in the surrounding support code and that this would be the only variable ever assigned to in the generated code.</p>
<p>There is some simple house-keeping that wraps everything within a program and provides the appropriate variable definitions.</p>
<p>The grammar used to generate full C expressions is 228 lines, the awk translator 252 lines and the Python interpreter 55 lines; just over 1% of Csmith in LOC and it is very easy to configure.  However, an awful lot of functionality needs to be added before it starts to rival Csmith, not least of which is support for assignment to more than one integer variable!</p>
]]></content:encoded>
			<wfw:commentRss>http://shape-of-code.coding-guidelines.com/2011/04/25/simple-generator-for-compiler-stress-testing-source/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>GLR parsing is the future</title>
		<link>http://shape-of-code.coding-guidelines.com/2009/08/27/glr-parsing-is-the-future/</link>
		<comments>http://shape-of-code.coding-guidelines.com/2009/08/27/glr-parsing-is-the-future/#comments</comments>
		<pubDate>Thu, 27 Aug 2009 15:54:23 +0000</pubDate>
		<dc:creator>Derek-Jones</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[ambiguity]]></category>
		<category><![CDATA[grammar]]></category>
		<category><![CDATA[parsing]]></category>
		<category><![CDATA[runtime error]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[syntax]]></category>
		<category><![CDATA[the future]]></category>

		<guid isPermaLink="false">http://shape-of-code.coding-guidelines.com/?p=113</guid>
		<description><![CDATA[Traditionally parser generators have required that their input grammar be LALR(1) or some close variant (I would include LL(1) in this set). Back when 64k was an unimaginably large amount of memory being able to squeeze parser tables in a few kilobytes was very important; people received PhDs on parser table compression. There is still [...]]]></description>
			<content:encoded><![CDATA[<p>Traditionally parser generators have required that their input grammar be LALR(1) or some close variant (I would include LL(1) in this set).  Back when 64k was an unimaginably large amount of memory, being able to squeeze parser tables into a few kilobytes was very important; people received PhDs on parser table compression.</p>
<p>There is still a market for compact, fast parsers.  Formal language grammars abound in communication protocols and vendors of communications hardware are very interested in keeping down costs by minimizing the storage needed by their devices.</p>
<p>The trouble with LALR(1) is that value 1.  It means that the parser only looks ahead one token in the input stream.  This often means that a grammar is flagged as being ambiguous (i.e., it contains shift/reduce or reduce/reduce conflicts) when it is actually just locally ambiguous, i.e., reading tokens further ahead in the input stream would provide sufficient context to unambiguously specify the appropriate grammar production.</p>
<p>Restructuring a grammar to make it LALR(1) requires a lot of thought and skill and inexperienced users often give up.  I once spent a month trying to remove the conflicts in the SQL/2 grammar specified by the SQL ISO standard; I managed to get the number down from over 1,000 to a small number that I decided I could live with.</p>
<p>It has taken a long time for parser generators to break out of the 64k mentality, but over the last few years it has started to happen.  There have been two main approaches: 1) LR(n) provides a mechanism to look further ahead than one token, i.e., <code>n</code> tokens, and 2) <a href="http://en.wikipedia.org/wiki/GLR_parser">GLR</a> parsing.</p>
<p>I think that GLR parsing is the future for two reasons:</p>
<ul>
<li>It is supported by the most widely used parser generator, <a href="http://www.gnu.org/software/bison/">bison</a>.</li>
<li>It enables working parsers to be created with much less thought and effort than a LALR(1) parser.  (I don&#8217;t know how it compares against LR(n)).</li>
</ul>
<p>GLR parsers resolve any language ambiguities by effectively delaying decisions until runtime in the hope that reading enough tokens will resolve local ambiguities.  If an ambiguity in the token stream cannot be resolved, a runtime error occurs (this is the one big downside of a GLR parser; the parser generated by a LALR(1) parser generator may produce lots of build time warnings but never produces errors when the parser is executed).</p>
<p>One example of a truly ambiguous construct (discussed <a href="http://shape-of-code.coding-guidelines.com/2008/12/parsing-without-a-symbol-table">here</a> a while ago) is:</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">x <span style="color: #339933;">*</span> y<span style="color: #339933;">;</span></pre></div></div>

<p>which in C/C++ could be a declaration of <code>y</code> to be a pointer to <code>x</code>, or an expression that multiplies <code>x</code> and <code>y</code>.</p>
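
<p>The choice between the two readings depends on information that is not in the grammar, namely whether <code>x</code> currently names a type.  A toy sketch of that decision, where the function <code>reading_of</code> and the set of known type names are hypothetical and stand in for a real symbol table:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">def reading_of(tokens, type_names):
    # tokens is expected to have the form [name, "*", name, ";"]
    left, star, right, semi = tokens
    if (star, semi) != ("*", ";"):
        raise ValueError("expected: name * name ;")
    if left in type_names:
        # x names a type, so this declares y as a pointer to x
        return "declaration of %s as pointer to %s" % (right, left)
    # otherwise x is an ordinary object and this is a multiplication
    return "expression multiplying %s by %s" % (left, right)

print(reading_of(["x", "*", "y", ";"], {"x"}))
print(reading_of(["x", "*", "y", ";"], set()))</pre></div></div>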
<p>Tools that can detect these global ambiguities in a grammar are starting to appear, e.g., <a href="http://www.lsv.ens-cachan.fr/~schmitz/software">DTWA</a> is a bison extension.</p>
<p>I reviewed an early draft of the new O&#8217;Reilly book &#8220;flex &#038; bison&#8221; and tried to get the <a href="http://www.johnlevine.com/">author</a> to be more upbeat on GLR support in bison; I think I got him to be a bit less cautious.</p>
]]></content:encoded>
			<wfw:commentRss>http://shape-of-code.coding-guidelines.com/2009/08/27/glr-parsing-is-the-future/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Finding the &#8216;minimum&#8217; faulty program</title>
		<link>http://shape-of-code.coding-guidelines.com/2009/03/17/finding-the-minimum-faulty-program/</link>
		<comments>http://shape-of-code.coding-guidelines.com/2009/03/17/finding-the-minimum-faulty-program/#comments</comments>
		<pubDate>Tue, 17 Mar 2009 00:43:07 +0000</pubDate>
		<dc:creator>Derek-Jones</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[ACCU]]></category>
		<category><![CDATA[compiler testing]]></category>
		<category><![CDATA[grammar]]></category>
		<category><![CDATA[mentoring]]></category>
		<category><![CDATA[source code]]></category>
		<category><![CDATA[test generator]]></category>

		<guid isPermaLink="false">http://shape-of-code.coding-guidelines.com/?p=91</guid>
		<description><![CDATA[A few weeks ago I received an inquiry about running a course/workshop on compiler writing. This does not happen very often and it reminded me that many years ago the ACCU asked if I would run a mentored group on compiler writing; I was busy writing a book at the time. The inquiry [...]]]></description>
			<content:encoded><![CDATA[<p>A few weeks ago I received an inquiry about running a course/workshop on compiler writing.  This does not happen very often and it reminded me that many years ago the <a href="http://www.accu.org">ACCU</a> asked if I would run a mentored group on compiler writing; I was busy <a href="http://www.knosof.co.uk/cbook">writing a book</a> at the time.  The inquiry got me thinking it would be fun to run a compiler writing mentored group over a 6-9 month period and I emailed the general ACCU reflector asking if anybody was interested in joining such a group (any reader wanting to join the group has to be a member of the ACCU).</p>
<p>Over the weekend I had a brainwave for a project, automatic compiler test generation coupled with a program source code minimizer (I need a better name for this bit).  Automatic test generation sounds great in theory but in practice whittling down the source code of those programs that result in a fault being exhibited, to create a usable sized test case that is practical for debugging purposes can be a major effort.  What is needed is a tool to automatically do the whittling, i.e., a test case minimizer.</p>
<p>A simple algorithm for whittling down the source of a large test program is to continually throw away that half/third/quarter of the code that is not needed for the fault to manifest itself.  A compiler project that took as input source code, removed half/third/quarter of the code and generated output that could be compiled and executed is realistic.  The input/reduce/output process could be repeated until the generated source was considered to have reached some minimum.  OK, this will soak up some cpu time, but computers are cheap and people are expensive.</p>
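
<p>The throw-away-a-chunk loop can be sketched as follows.  This is a simplification under stated assumptions: it removes runs of lines rather than syntactic units, and <code>still_fails</code> is a hypothetical callback that would compile and run a candidate and report whether the fault still shows up; a candidate that no longer compiles would simply be rejected by that callback:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">def reduce_source(lines, still_fails):
    # Try dropping a half, then a third, then a quarter, ... of the lines.
    parts = 2
    while parts in range(2, len(lines) + 1):
        chunk = -(-len(lines) // parts)      # ceiling division
        for start in range(0, len(lines), chunk):
            candidate = lines[:start] + lines[start + chunk:]
            if candidate and still_fails(candidate):
                lines = candidate            # the dropped chunk was not needed
                parts = 2                    # go back to trying halves
                break
        else:
            parts += 1                       # no chunk could go, try smaller ones
    return lines

program = ["a = 1;", "b = 2;", "c = 3;", "BUG;", "d = 4;", "e = 5;"]
print(reduce_source(program, lambda src: "BUG;" in src))</pre></div></div>

<p>The loop stops once single lines have been tried and none can be removed, so the result is a local minimum rather than the smallest possible failing program.</p>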
<p>Where does the test source code come from?  Easy, it is generated from the same yacc grammar that the compiler, written by the mentored group member, uses to parse its input.  Fortunately such a <a href="http://search.cpan.org/~dcoppit/yagg-1.4001/yagg ">generation tool</a> is available and ready to use.</p>
<p>The beauty is using the same grammar to generate tests and parse input.  This means there is no need to worry about which language subset to use initially and support for additional language syntax can be added incrementally.  </p>
<p>Experience shows that automatically generated test programs quickly uncover faults in production compilers, even when working with language subsets.  Compiler implementors are loath to spend time cutting down a large program to find the statement/expression where the fault lies; this project will produce a tool that does the job for them.</p>
<p>So to recap, the mentored group is going to write one or more automatic source code generators that will be used to stress test compilers written by other people (e.g., gcc and Microsoft).  Group members will also write their own compiler that reads in this automatically generated source code, throws some of it away and writes out syntactically/semantically correct source code.  Various scripts will be written to glue this all together.</p>
<p>Group members can pick the language they want to work with.  The initial subset could just include support for integer types, if-statements and binary operators.</p>
<p>If you had trouble making any sense of all this, don&#8217;t join the group.</p>
]]></content:encoded>
			<wfw:commentRss>http://shape-of-code.coding-guidelines.com/2009/03/17/finding-the-minimum-faulty-program/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

