Posts Tagged ‘if’

Using Coccinelle to match if sequences

August 31st, 2009 No comments

I have been using Coccinelle to obtain measurements of various properties of C if and switch statements. It is rare to find a tool that does exactly what is desired but it is often possible to combine various tools to achieve the desired result.

I am interested in measuring sequences of if-else-if statements and one of the things I wanted to know was how many sequences of a given length occurred. Writing a pattern for each possible sequence was the obvious solution, but what is the longest sequence I should search for? A better solution is to use a pattern that matches short sequences and writes out the position (line/column number) where they occur in the code, as in the following Coccinelle pattern:

@ if_else_if_else @
expression E_1, E_2; 
statement S_1, S_2, S_3;
position p_1, p_2;
if@p_1 (E_1)
else if@p_2 (E_2)
script:python @ expr_1 << if_else_if_else.E_1;
                expr_2 << if_else_if_else.E_2;
                loc_1 << if_else_if_else.p_1;
                loc_2 << if_else_if_else.p_2;
print "--- ifelseifelse"
print loc_1[0].line, " ", loc_1[0].column, " ", expr_1
print loc_2[0].line, " ", loc_2[0].column, " ", expr_2

noting that in a sequence of source such as:

if (x == 1)
   if (x == 2)
      if (x == 3)

the tokens if (x == 2) will be matched twice, the first setting the position metavariable p_2 and then setting p_1. An awk script was written to read the Coccinelle output and merge together adjacent pairs of matches that were part of a longer if-else-if sequence.

The first pattern did not concern itself with the form of the controlling expression, it simply wrote it out. A second set of patterns was used to match those forms of controlling expression I was interested in, but first I had to convert the output into syntactically correct C so that it could be processed by Coccinelle. Again awk came to the rescue, converting the output:

--- ifelseifelse
186   2   op == FFEBLD_opSUBRREF
191   7   op == FFEBLD_opFUNCREF
--- ifelseifelse
1094   3   anynum && commit
1111   8   ( c [ colon + 1 ] == '*' ) && commit

into a separate function for each matched sequence:

void f_1(void) {
// --- ifelseifelse
/* 186   2 */   op == FFEBLD_opSUBRREF ;
/* 191   7 */   op == FFEBLD_opFUNCREF ;
void f_2(void) {
// --- ifelseifelse
/* 1094   3 */   anynum && commit ;
/* 1111   8 */   ( c [ colon + 1 ] == '*' ) && commit ;

The Coccinelle pattern:

@ if_eq_1 @
expression E_1;
constant C_1, C_2;
position p_1, p_2;
E_1 == C_1@p_1 ;
E_1 == C_2@p_2 ;
script:python @ expr_1<< if_eq_1.E_1;
                const_1 << if_eq_1.C_1;
                const_2 << if_eq_1.C_2;
                loc_1 << if_eq_1.p_1;
                loc_2 << if_eq_1.p_2;
print loc_1[0].line, " ", loc_1[0].column, " 3 ", expr_1, " == ", const_1
print loc_2[0].line, " ", loc_2[0].column, " 2 ", expr_1, " == ", const_2

matches a sequence of two statements which consist of an expression being compared for equality against a constant, with the expression being identical in both statements. Again positions were written out for post-processing, i.e., joining together matched sequences.

I was interested in any sequence of if-else-if that could be converted to an equivalent switch-statement. Equality tests against a constant is just one form of controlling expression that meets this requirement, another is the between operation. Separate patterns could be written and run over the generated C source containing the extracted controlling expressions.

Breaking down the measuring process into smaller steps reduced the amount of time needed to get a final result (with Coccinelle 0.1.19 the first pattern takes round 70 minutes, thanks to Julia Lawall‘s work to speed things up, an overhead that only has to occur once) and allows the same controlling expression patterns to be run against the output of both the if-else-if and if-if patterns.

At the end of this process I ended up with a list information (line numbers in source code and form of controlling expression) on if-statement sequences that could be rewritten as a switch-statement.