Archive for the ‘data analysis’ Category

Designing a processor for increased source portability costs

February 9th, 2010 Derek-Jones 4 comments

How might a vendor make it difficult for developers to port open source applications to their proprietary CPU? Keeping the instruction set secret is one technique; another is to design a CPU that breaks often relied-upon assumptions that developers make about the characteristics of the architecture on which their code executes.

Of course, breaking architectural assumptions does not prevent open source being ported to a platform, but it could significantly slow down the migration, giving customers more time to become locked into the software shipped with the product.

Which assumptions should be broken to have the maximum impact on porting open source? The major open source applications (e.g., Firefox, MySQL) run on 32/64-bit architectures that have an unsigned address space, that represent integers using two’s complement, and whose arithmetic operations on these integer values wrap on over/underflow.

32/64-bit. There is plenty of experience showing that migrating code from 16-bit to 32-bit environments can involve a lot of effort (e.g., migrating Windows 286/386 code to the Intel 486), and plenty of companies are finding the migration from 32 to 64 bits costly.

Designing a 128-bit processor might not be cost effective, but what about a 40-bit processor, like a number of high-end DSP chips? I suspect that there are many power-of-2 assumptions lurking in a lot of code. A 40-bit integer type could prove very expensive for ports of code written with a 32/64-bit mindset (dare I suggest a 20-bit short? DSP vendors have preferred 16 bits because it uses less storage).
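One kind of power-of-2 assumption can be sketched by simulating the integer width. The example below is only an illustration, not a measurement of real code: it runs the 32-bit FNV-1a hash with unsigned arithmetic wrapping at 2**int_width, where int_width is a parameter of the simulation standing in for the width of the host's unsigned int. Code that silently relies on 32-bit wrapping produces a different value on a hypothetical 40-bit machine unless the result is explicitly masked.

```python
FNV_OFFSET_BASIS = 2166136261   # standard constants for 32-bit FNV-1a
FNV_PRIME = 16777619


def fnv1a(data: bytes, int_width: int) -> int:
    """FNV-1a where unsigned arithmetic wraps at 2**int_width.

    int_width models the host's unsigned int; it is an assumption of
    this simulation, not part of the FNV algorithm.
    """
    modulus = 1 << int_width
    h = FNV_OFFSET_BASIS
    for byte in data:
        h = ((h ^ byte) * FNV_PRIME) % modulus
    return h


hash_32 = fnv1a(b"a", 32)   # what the code was written to produce
hash_40 = fnv1a(b"a", 40)   # same source on a simulated 40-bit int

print(hex(hash_32), hex(hash_40))
# An explicit mask restores the 32-bit result on the wider machine.
print(hash_40 & 0xFFFFFFFF == hash_32)
```

The port is easy once the dependency has been found; the cost lies in finding every place where the width was assumed rather than stated.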

Unsigned address space (i.e., lowest address is zero). Some code assumes that addresses with the top bit set are at the top end of memory and not just below the middle (e.g., some garbage collectors). Processors having a signed address space (i.e., zero is in the middle of storage) are sufficiently rare (e.g., the Inmos Transputer) that source is unlikely to support a HAS_SIGNED_ADDRESS build option.
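The top-bit assumption can be illustrated with a small simulation. This is a sketch under stated assumptions: a 32-bit address width is chosen arbitrarily, and the function names are hypothetical. The same bit pattern lands in the upper half of an unsigned address space and in the lower half of a signed one.

```python
WIDTH = 32                      # assumed address width for this sketch
TOP_BIT = 1 << (WIDTH - 1)


def as_signed(bits: int) -> int:
    """Interpret a WIDTH-bit pattern as a signed (two's complement) address."""
    return bits - (1 << WIDTH) if bits & TOP_BIT else bits


def top_bit_set(bits: int) -> bool:
    return bool(bits & TOP_BIT)


def in_upper_half_unsigned(bits: int) -> bool:
    """Unsigned space runs from 0 to 2**WIDTH - 1."""
    return bits >= TOP_BIT


def in_upper_half_signed(bits: int) -> bool:
    """Signed space runs from -2**(WIDTH-1) to 2**(WIDTH-1) - 1; zero is the middle."""
    return as_signed(bits) >= 0


address = 0x80001000
print(top_bit_set(address))             # True
print(in_upper_half_unsigned(address))  # True
print(in_upper_half_signed(address))    # False
```

Code that uses the top bit as a cheap "high memory" test gives the opposite answer on the signed machine.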

How much code might need to be rewritten? I have no idea. While the code is likely to be very important there might not be a lot of it.

Two’s complement. Developers are constantly told not to write code that relies on the internal representation of data types. However, they might be forgiven for thinking that nobody uses anything other than two’s complement to represent integer types these days (I suspect Univac does not have that much new code ported to its range of one’s complement machines).

How much code will break when ported to a one’s complement processor? The representation of negative numbers in one’s complement and two’s complement is different and the representation of positive numbers the same. In common usage positive values are significantly more common than negative values and many variables (having a signed type) never get to hold a negative value.
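The difference between the two representations can be made concrete by computing the bit patterns directly. The sketch below assumes a 16-bit width (an arbitrary choice) and uses hypothetical helper names. It also shows one idiom, testing the low bit for oddness, that happens to work for negative values in two's complement but not in one's complement.

```python
WIDTH = 16                      # assumed integer width for this sketch
MASK = (1 << WIDTH) - 1


def twos_complement(value: int) -> int:
    """Bit pattern of value in WIDTH-bit two's complement."""
    return value & MASK


def ones_complement(value: int) -> int:
    """Bit pattern of value in WIDTH-bit one's complement.

    Negative values are the bitwise inverse of their magnitude.
    """
    return value if value >= 0 else ~(-value) & MASK


def low_bit_says_odd(pattern: int) -> bool:
    """The common 'x & 1' oddness test applied to a raw bit pattern."""
    return bool(pattern & 1)


for value in (5, -1, -3):
    print(value, hex(twos_complement(value)), hex(ones_complement(value)))

# -3 is odd: the low-bit test agrees in two's complement only.
print(low_bit_says_odd(twos_complement(-3)))  # True
print(low_bit_says_odd(ones_complement(-3)))  # False
```

Positive values have identical patterns, which is consistent with the observation that code handling only non-negative values is unaffected.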

While I have no practical experience of this, and do not know of anybody who has, I suspect the use of one’s complement might not be that big a problem. If you have experience please comment.

Arithmetic that wraps (i.e., positive values overflow negative and negative values underflow positive). While expressions explicitly written to wrap might be rare, how many calculations contain intermediate values that have wrapped but deliver a correct final result because they are ‘unwrapped’ by a subsequent operation?
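An intermediate value that wraps and is then 'unwrapped' can be simulated as follows. This is a sketch assuming a 32-bit signed int; the function names and the operand values are chosen purely for illustration. The expression a + b - c overflows in the addition, yet wrapping arithmetic still delivers the mathematically correct result, while saturating arithmetic does not.

```python
INT_MIN = -(2 ** 31)            # assumed 32-bit signed int
INT_MAX = 2 ** 31 - 1


def wrap(value: int) -> int:
    """Reduce value modulo 2**32 into the range INT_MIN..INT_MAX."""
    return (value - INT_MIN) % (2 ** 32) + INT_MIN


def saturate(value: int) -> int:
    """Clamp value to the range INT_MIN..INT_MAX."""
    return max(INT_MIN, min(INT_MAX, value))


def add_then_subtract(a: int, b: int, c: int, adjust) -> int:
    """Evaluate a + b - c, applying adjust after every operation."""
    return adjust(adjust(a + b) - c)


a, b, c = 2_000_000_000, 1_000_000_000, 1_500_000_000

print(add_then_subtract(a, b, c, wrap))      # 1500000000, the exact answer
print(add_then_subtract(a, b, c, saturate))  # 647483647
```

How often real programs depend on this behaviour without their authors knowing is the open question.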

Arithmetic operations that saturate are needed in applications such as graphics where, for instance, increasing the brightness should not suddenly cause the darkest setting to occur. Some graphics processors include support for arithmetic operations that saturate.

The impact of saturation arithmetic on portability is difficult to judge. A lot of code contains variables having signed char and short types, but when they appear as the operands of a binary operation they are promoted to int in C/C++/etc., which probably has sufficient range not to overflow (most values created during program execution are small). Again I am lacking in practical experience and comments are welcome.

Floating-point. Many programs do not make use of floating-point arithmetic and those that do rarely manipulate such values at the bit level. Using a non-IEEE 754 floating-point representation will probably have little impact on the portability of applications of interest to most users.

Update. Thanks to Cate for pointing out that I had forgotten to discuss why using non-8-bit chars is not a worthwhile design decision.

Both POSIX and the C/C++ Standards require that the char type be represented in at least 8 bits. Computers supporting fewer than 8 bits were still being used in the early 80s (e.g., the much beloved ICL 1900 supported 6-bit characters). The C Standard also requires that char be the smallest unit of addressable storage, which means that it must be possible for a pointer to point at an object having a char type.

Designing a processor where the smallest unit of storage is greater than 8-bits but not a power-of-2 is likely to substantially increase all sorts of costs and complicate things enormously (e.g., interfaces to main memory which are designed to work with power of two interfaces). The purpose of this design is to increase other people’s cost, not the proprietary vendor’s cost.

What about that pointer requirement? Perhaps the smallest unit of storage that a pointer could address might be 16 or 40 bits? Such processors exist and compiler writers have used both solutions to the problems they present. One solution is for a pointer to contain the address of the storage location + offset of the byte within that storage (Cray used this approach on a processor whose pointers could only point at 64-bit chunks of storage, with the compiler generating the code to extract the appropriate byte), the other is to declare that the char type occupies 40-bits (several DSP compilers have taken this approach).

Having the compiler declare that char is not 8 bits wide would cause all sorts of grief, so let’s not go there. What about the Cray compiler approach?

Some of the address bits on 64-bit processors are not used yet (because few customers need that amount of storage) so compiler writers could get around host-processor pointers not supporting the granularity needed to point at 8-bit objects by storing the extra information in ‘unused’ pointer bits (the compiler generating the appropriate insertion and extraction code). The end result is that the compiler can hide pointer addressability issues :-).
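The insertion and extraction code a compiler would have to generate can be sketched as below. Everything about the layout is an assumption made for illustration: a 64-bit pointer whose top 3 bits are unused and hold the byte offset, 64-bit words with the lowest-numbered byte in the low-order bits, and memory modelled as a dictionary mapping word addresses to word values. It is not a description of the actual Cray scheme.

```python
BYTES_PER_WORD = 8
OFFSET_SHIFT = 61                       # hypothetical: top 3 pointer bits are unused
WORD_MASK = (1 << OFFSET_SHIFT) - 1


def make_char_pointer(word_address: int, byte_offset: int) -> int:
    """Pack a word address and a byte offset into one 64-bit value."""
    assert 0 <= word_address <= WORD_MASK
    assert 0 <= byte_offset < BYTES_PER_WORD
    return (byte_offset << OFFSET_SHIFT) | word_address


def split_char_pointer(pointer: int):
    """Return (word_address, byte_offset)."""
    return pointer & WORD_MASK, pointer >> OFFSET_SHIFT


def char_pointer_add(pointer: int, count: int) -> int:
    """Pointer arithmetic in units of bytes, carrying into the word address."""
    word_address, byte_offset = split_char_pointer(pointer)
    index = word_address * BYTES_PER_WORD + byte_offset + count
    return make_char_pointer(index // BYTES_PER_WORD, index % BYTES_PER_WORD)


def load_char(memory: dict, pointer: int) -> int:
    """Fetch the whole word, then shift and mask out the addressed byte."""
    word_address, byte_offset = split_char_pointer(pointer)
    return (memory[word_address] >> (8 * byte_offset)) & 0xFF


memory = {
    100: int.from_bytes(b"ABCDEFGH", "little"),
    101: int.from_bytes(b"IJKLMNOP", "little"),
}
p = make_char_pointer(100, 7)
print(chr(load_char(memory, p)))        # H
p = char_pointer_add(p, 1)              # crosses into the next word
print(chr(load_char(memory, p)))        # I
```

Source code never sees any of this, which is why the approach hides the addressability issue from the developer.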

Using third party measurement data

February 17th, 2009 Derek-Jones No comments

Until today, to the best of my knowledge, all of the source code analysis papers I have read were written by researchers who had control of the code analysis tools they used and had some form of localised access to the source. By control of the code analysis tools I mean that the researchers specified the tool options and had the ability to check the behavior of the tool, in many cases the source of the tool was available to them and often even written by them, and the localised access may have involved downloading lots of code from the web.

I have just been reading about a broad brush analysis of comment usage based on data provided by a commercial code repository that offers API access to some basic code metrics.

At first I was very frustrated by the lack of depth to the analysis provided in the paper, but then I realised that the authors’ intent was to investigate a few broad ideas about comment usage in a large number of projects (around 10,000). The authors complained in their blog about some of the referees’ comments and having to submit a shorter paper. I can see where the referees are coming from: the papers are lacking in depth of analysis, but they do contain some interesting results.

I was very interested in Figure 2:
Comment density as a function of source code lines in a given commit
which plots the comment density of the lines in a source code commit. I would expect the ratio to be higher for small commits because a developer probably has a relatively fixed amount to say about updates involving a smallish number of lines (which probably fixes a problem). Larger commits are probably updated functionality and so would have a comment density similar to the ‘average’.

The problem with relying on third parties to supply the data is that obtaining the answers to follow-up questions invariably involves lots of work, e.g., creating an environment to perform the measurements needed for the follow-up questions. However, the third party approach can significantly reduce the amount of work needed to get to a point where the interestingness of the results can be gauged.

Benford’s law and numeric literals in source code

December 13th, 2008 Derek-Jones No comments

Benford’s law applies to values derived from a surprising number of natural and man-made processes. I was very optimistic that it would also apply to numeric literals in source code. Measurements of C source showed that I was wrong (the chi-square fit was 1,680 for decimal integer literals and 132,398 for floating literals).

Image goes here.

Probability that the leading digit of a (decimal or hexadecimal) integer literal has a particular value (dotted lines predicted by Benford’s law).

What are the conditions necessary for a sample of values to follow Benford’s law? A number of circumstances have been found to result in sample values having a leading digit that follows Benford’s law, including:

  • Selecting random samples from different sets of values where each set has a different probability distribution (i.e., select the distributions at random and then collect a sample of values from each of these distributions).
  • If the sample values are derived from a process that is scale invariant.
  • If the sample values are derived from a process that involves multiplying independent values having a uniform distribution.
Samples that have been found to follow Benford’s law include lists of physical constants and accounting data (so much so that it has been used to detect accounting fraud). However, the number of data-sets containing values whose leading digit follows Benford’s law is not as great as some would make us believe.
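For reference, Benford’s law predicts that the leading digit d occurs with probability log10(1 + 1/d). The sketch below computes those probabilities and a chi-square statistic for a list of decimal literals. The helper names and the sample literals are hypothetical, and this is not necessarily how the figures quoted in this post were calculated; hexadecimal literals would need a different leading-digit rule and are not handled.

```python
import math
from collections import Counter


def benford_probability(digit: int) -> float:
    """Probability that the leading digit equals digit (1 to 9)."""
    return math.log10(1 + 1 / digit)


def leading_digit(literal: str):
    """First non-zero decimal digit of a decimal literal, or None if there is none."""
    for ch in literal:
        if ch in "123456789":
            return int(ch)
    return None


def chi_square_vs_benford(literals) -> float:
    """Chi-square statistic comparing leading-digit counts with Benford's law."""
    digits = [d for d in map(leading_digit, literals) if d is not None]
    if not digits:
        raise ValueError("no literal with a non-zero digit")
    observed = Counter(digits)
    total = len(digits)
    statistic = 0.0
    for d in range(1, 10):
        expected = total * benford_probability(d)
        statistic += (observed[d] - expected) ** 2 / expected
    return statistic


for d in range(1, 10):
    print(d, round(benford_probability(d), 3))

# Hypothetical literals, for illustration only; "0" has no leading digit and is skipped.
sample = ["1", "2", "10", "0.5", "100", "0", "16", "3.14", "1e-3", "255"]
print(chi_square_vs_benford(sample))
```

Larger values of the statistic indicate a poorer fit to the predicted distribution.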

    Why don’t the leading digits of numeric literals in source code follow Benford’s law?

  • Perhaps small values are over-represented because they are used as offsets to access the storage either side of some pointer (in C/C++/Java, but not Pascal/Fortran, the availability of the ++/-- operators reduces the number of instances of 1 used to increment/decrement a value). But this only applies to integer types, not floating types.
    Image goes here.

    Probability that the leading, first non-zero, digit of a floating literal has a particular value (dashed line predicted by Benford’s law).

  • Perhaps there exists a high degree of correlation between the value of literals. I’m not yet sure how to look for this.
  • Why is there a huge spike at 5 for the floating-point literals? Have values been rounded to produce 0.5? This looks like an area where methods used for accounting fraud detection might be applied (not that any fraud is implied, just irregularity).
  • Why is the distribution of the leading digit fairly uniform for hexadecimal literals?
    These surprising measurements show that there is a lot to the shape of numeric literals that is yet to be discovered.
