Archive for the ‘data analysis’ Category

Designing a processor for increased source portability costs

February 9th, 2010 Derek-Jones 4 comments

How might a vendor make it difficult for developers to port open source applications to their proprietary CPU? Keeping the instruction set secret is one technique; another is to design a CPU that breaks the assumptions developers often rely on about the characteristics of the architecture on which their code executes.

Of course, breaking architectural assumptions does not prevent open source being ported to a platform, but it could significantly slow down the migration, giving more time for customers to become locked into the software shipped with the product.

Which assumptions should be broken to have the maximum impact on porting open source? The major open source applications (e.g., Firefox, MySQL, etc.) run on 32/64-bit architectures that have an unsigned address space, that represent integers using two’s complement, and whose arithmetic operations on these integer values wrap on over/underflow.

32/64-bit. There is plenty of experience showing that migrating code from 16-bit to 32-bit environments can involve a lot of effort (e.g., migrating Windows 286/386 code to the Intel 486), and plenty of companies are finding the migration from 32 to 64 bits costly.

Designing a 128-bit processor might not be cost effective, but what about a 40-bit processor, like a number of high end DSP chips? I suspect that there are many power-of-2 assumptions lurking in a lot of code. A 40-bit integer type could prove very expensive for ports of code written with a 32/64-bit mindset (dare I suggest a 20-bit short; DSP vendors have preferred 16-bits because it uses less storage?).
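As an illustration of the kind of power-of-2 assumption that might be lurking, here is a minimal Python sketch of a rotate-left written with a 32-bit mindset. The function name and the `word_bits` parameter are hypothetical; `word_bits` simply models the width of the machine's integer type, and the only truncation applied is the one the hardware would perform.

```python
def rotl32_unmasked(x, n, word_bits):
    """Rotate x left by n, assuming (wrongly, perhaps) a 32-bit word.

    word_bits models the width of the host integer type; bits shifted
    past that width are lost, everything else is kept.
    """
    word_mask = (1 << word_bits) - 1
    return ((x << n) | (x >> (32 - n))) & word_mask


value = 0x80000001
on_32_bit = rotl32_unmasked(value, 4, 32)  # high bits fall off the end
on_40_bit = rotl32_unmasked(value, 4, 40)  # high bits survive in bits 32..39

print(hex(on_32_bit))  # 0x18
print(hex(on_40_bit))  # 0x800000018
```

The source is identical in both cases; only the word width changes, and the result no longer fits the 32-bit value the author had in mind.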

Unsigned address space (i.e., lowest address is zero). Some code assumes that addresses with the top bit set are at the top end of memory and not just below the middle (e.g., some garbage collectors). Processors having a signed address space (i.e., zero is in the middle of storage) are sufficiently rare (e.g., the Inmos Transputer) that source is unlikely to support a HAS_SIGNED_ADDRESS build option.
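A small Python sketch of why a top-bit test stops meaning "high memory". It assumes a 32-bit address whose bit pattern is read as a two's complement value on the signed machine; the helper names are made up for illustration.

```python
WIDTH = 32  # assumed address width


def position_unsigned(pattern):
    """Distance from the lowest address when addresses are unsigned."""
    return pattern


def position_signed(pattern):
    """Distance from the lowest address when addresses are signed.

    The pattern is read as a two's complement value, so the lowest
    address is -2**(WIDTH-1) rather than zero.
    """
    top_bit = pattern >> (WIDTH - 1)
    value = pattern - (1 << WIDTH) if top_bit else pattern
    return value + (1 << (WIDTH - 1))


pattern = 0xFFFFFFFF  # top bit set
middle = 1 << (WIDTH - 1)

print(position_unsigned(pattern) >= middle)  # True: upper half of memory
print(position_signed(pattern) >= middle)    # False: just below the middle
```

With signed addresses the pattern 0x80000000 is the very bottom of memory, so any "top bit set means high address" shortcut points the wrong way.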

How much code might need to be rewritten? I have no idea. While the code is likely to be very important there might not be a lot of it.

Two’s complement. Developers are constantly told not to write code that relies on the internal representation of data types. However, they might be forgiven for thinking that nobody uses anything other than two’s complement to represent integer types these days (I suspect Univac does not have that much new code ported to its range of one’s complement machines).

How much code will break when ported to a one’s complement processor? The representation of negative numbers in one’s complement and two’s complement is different and the representation of positive numbers the same. In common usage positive values are significantly more common than negative values and many variables (having a signed type) never get to hold a negative value.
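The difference can be made concrete with a short Python sketch that computes the bit patterns for a 16-bit integer (the width and function names are chosen purely for illustration).

```python
BITS = 16  # illustrative width
MASK = (1 << BITS) - 1


def twos_complement(value):
    """Bit pattern of value in two's complement."""
    return value & MASK


def ones_complement(value):
    """Bit pattern of value in one's complement."""
    if value >= 0:
        return value & MASK
    return ~(-value) & MASK  # invert every bit of the magnitude


for v in (5, -1):
    print(v, hex(twos_complement(v)), hex(ones_complement(v)))
# 5 0x5 0x5
# -1 0xffff 0xfffe


def looks_odd(pattern):
    """A bit-level parity test that quietly assumes two's complement."""
    return pattern & 1 == 1
```

Positive values are unaffected, but a bit-level idiom such as `looks_odd` gives the wrong answer for -1 on the one's complement pattern, because its low-order bit is zero.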

While I have no practical experience, and do not know of anybody who has, I suspect the use of one’s complement might not be that big a problem. If you have experience please comment.

Arithmetic that wraps (i.e., positive values overflow negative and negative values underflow positive). While expressions explicitly written to wrap might be rare, how many calculations contain intermediate values that have wrapped but deliver a correct final result because they are ‘unwrapped’ by a subsequent operation?

Arithmetic operations that saturate are needed in applications such as graphics where, for instance, increasing the brightness should not suddenly cause the darkest setting to occur. Some graphics processors include support for arithmetic operations that saturate.
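The following Python sketch models a 32-bit signed int under both behaviours, to show an intermediate value that wraps and is then "unwrapped" by the next operation. The helper names and the operand values are made up for the example.

```python
INT_MIN, INT_MAX = -(1 << 31), (1 << 31) - 1


def wrap(value):
    """Reduce value to 32-bit two's complement range, wrapping on overflow."""
    return (value - INT_MIN) % (1 << 32) + INT_MIN


def saturate(value):
    """Clamp value to 32-bit range, sticking at the nearest limit."""
    return max(INT_MIN, min(INT_MAX, value))


def evaluate(a, b, c, fix):
    """Compute (a + b) - c, applying fix after every operation."""
    return fix(fix(a + b) - c)


a, b, c = 2_000_000_000, 1_000_000_000, 1_500_000_000

print(evaluate(a, b, c, wrap))      # 1500000000 (intermediate a + b wrapped)
print(evaluate(a, b, c, saturate))  # 647483647  (intermediate a + b clamped)
```

The mathematically correct answer, 1,500,000,000, fits in 32 bits; wrapping delivers it despite the overflowing intermediate sum, while saturation does not.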

The impact of saturation arithmetic on portability is difficult to judge. A lot of code contains variables having signed char and short types, but when they appear as operands of a binary operation they are promoted to int in C/C++/etc., which probably has sufficient range not to overflow (most values created during program execution are small). Again I am lacking in practical experience and comments are welcome.

Floating-point. Many programs do not make use of floating-point arithmetic and those that do rarely manipulate such values at the bit level. Using a non-IEEE 754 floating-point representation will probably have little impact on the portability of applications of interest to most users.

Update. Thanks to Cate for pointing out that I had forgotten to discuss why using non-8-bit chars is not a worthwhile design decision.

Both POSIX and the C/C++ Standards require that the char type be represented in at least 8 bits. Computers supporting fewer than 8 bits were still being used in the early 80s (e.g., the much beloved ICL 1900 supported 6-bit characters). The C Standard also requires that char be the smallest unit of addressable storage, which means that it must be possible for a pointer to point at an object having a char type.

Designing a processor where the smallest unit of storage is greater than 8-bits but not a power-of-2 is likely to substantially increase all sorts of costs and complicate things enormously (e.g., interfaces to main memory which are designed to work with power of two interfaces). The purpose of this design is to increase other people’s cost, not the proprietary vendor’s cost.

What about that pointer requirement? Perhaps the smallest unit of storage that a pointer could address might be 16 or 40 bits? Such processors exist and compiler writers have used both solutions to the problems they present. One solution is for a pointer to contain the address of the storage location + offset of the byte within that storage (Cray used this approach on a processor whose pointers could only point at 64-bit chunks of storage, with the compiler generating the code to extract the appropriate byte), the other is to declare that the char type occupies 40-bits (several DSP compilers have taken this approach).

Having the compiler declare that char is not 8 bits wide would cause all sorts of grief, so let’s not go there. What about the Cray compiler approach?

Some of the address bits on 64-bit processors are not used yet (because few customers need that amount of storage) so compiler writers could get around host-processor pointers not supporting the granularity needed to point at 8-bit objects by storing the extra information in ‘unused’ pointer bits (the compiler generating the appropriate insertion and extraction code). The end result is that the compiler can hide pointer addressability issues :-) .
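A rough Python sketch of the insertion and extraction code such a compiler might generate. The layout is an assumption made for illustration (byte offset in the top 3 bits of a 64-bit pointer, word address in the remaining bits, little-endian byte order within a word); it is not a description of any actual Cray or DSP compiler.

```python
WORD_BYTES = 8       # bytes per addressable 64-bit word
OFFSET_SHIFT = 61    # assumed: top 3 bits of the pointer hold the byte offset
ADDR_MASK = (1 << OFFSET_SHIFT) - 1


def make_char_pointer(word_address, byte_offset):
    """Pack a word address and a byte offset into one pointer value."""
    return (byte_offset << OFFSET_SHIFT) | word_address


def load_char(memory, pointer):
    """Fetch the whole word, then extract the addressed byte."""
    word = memory[pointer & ADDR_MASK]
    byte_offset = pointer >> OFFSET_SHIFT
    return (word >> (8 * byte_offset)) & 0xFF


def increment(pointer):
    """Advance by one char, carrying into the word address when needed."""
    word_address = pointer & ADDR_MASK
    byte_offset = (pointer >> OFFSET_SHIFT) + 1
    if byte_offset == WORD_BYTES:
        word_address, byte_offset = word_address + 1, 0
    return make_char_pointer(word_address, byte_offset)


# Word-addressed memory holding the 16 bytes "portable source!".
memory = [int.from_bytes(b"portable", "little"),
          int.from_bytes(b" source!", "little")]

p = make_char_pointer(0, 7)
print(chr(load_char(memory, p)))             # e
print(chr(load_char(memory, increment(p))))  # (a space, first byte of word 1)
```

Every char access costs a shift and a mask, and pointer arithmetic needs an explicit carry, but none of this is visible in the source being ported.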

Using third party measurement data

February 17th, 2009 Derek-Jones No comments

Until today, to the best of my knowledge, all of the source code analysis papers I have read were written by researchers who had control of the code analysis tools they used and had some form of localised access to the source. By control of the code analysis tools I mean that the researchers specified the tool options and had the ability to check the behavior of the tool; in many cases the source of the tool was available to them and often even written by them. The localised access may have involved downloading lots of code from the web.

I have just been reading about a broad brush analysis of comment usage based on data provided by a commercial code repository that offers API access to some basic code metrics.

At first I was very frustrated by the lack of depth to the analysis provided in the paper, but then I realised that the authors’ intent was to investigate a few broad ideas about comment usage in a large number of projects (around 10,000). The authors complained in their blog about some of the referees’ comments and having to submit a shorter paper. I can see where the referees are coming from: the papers are lacking in depth of analysis, but they do contain some interesting results.

I was very interested in Figure 2 (“Comment density as a function of source code lines in a given commit”), which plots the comment density of the lines in a source code commit. I would expect the ratio to be higher for small commits because a developer probably has a relatively fixed amount to say about updates involving a smallish number of lines (which probably fixes a problem). Larger commits are probably updated functionality and so would have a comment density similar to the ‘average’.

The problem with relying on third parties to supply the data is that obtaining the answers to follow-up questions invariably involves lots of work, e.g., creating an environment to perform the measurements needed for the follow-up questions. However, the third-party approach can significantly reduce the amount of work needed to get to a point where the interestingness of the results can be gauged.

Benford’s law and numeric literals in source code

December 13th, 2008 Derek-Jones No comments

Benford’s law applies to values derived from a surprising number of natural and man-made processes. I was very optimistic that it would also apply to numeric literals in source code. Measurements of C source showed that I was wrong (the chi-square fit was 1,680 for decimal integer literals and 132,398 for floating literals).

Image goes here.

Probability that the leading digit of a (decimal or hexadecimal) integer literal has a particular value (dotted lines predicted by Benford’s law).
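For reference, Benford’s law predicts that leading digit d occurs with probability log10(1 + 1/d). The Python sketch below computes those predictions and a Pearson chi-square statistic against them. It is only one plausible way of measuring the fit (the exact procedure behind the figures quoted above is not stated), and the counts used are hypothetical, not the measured data.

```python
import math


def benford_probability(digit):
    """Predicted probability that the leading digit equals digit (1..9)."""
    return math.log10(1 + 1 / digit)


def chi_square(observed_counts):
    """Pearson chi-square of observed leading-digit counts against Benford.

    observed_counts maps each digit 1..9 to the number of occurrences.
    """
    total = sum(observed_counts.values())
    statistic = 0.0
    for digit in range(1, 10):
        expected = total * benford_probability(digit)
        statistic += (observed_counts.get(digit, 0) - expected) ** 2 / expected
    return statistic


for d in range(1, 10):
    print(d, round(benford_probability(d), 3))

# Hypothetical counts: a perfectly uniform sample is a poor fit.
uniform_counts = {d: 100 for d in range(1, 10)}
print(chi_square(uniform_counts))
```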

What are the conditions necessary for a sample of values to follow Benford’s law? A number of circumstances have been found to result in sample values having a leading digit that follows Benford’s law, including:

  • Selecting random samples from different sets of values where each set has a different probability distribution (i.e., select the distributions at random and then collect a sample of values from each of these distributions)
  • If the sample values are derived from a process that is scale invariant.
  • If the sample values are derived from a process that involves multiplying independent values having a uniform distribution.

Samples that have been found to follow Benford’s law include lists of physical constants and accounting data (so much so that it has been used to detect accounting fraud). However, the number of data-sets containing values whose leading digit follows Benford’s law is not as great as some would make us believe.
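The third circumstance listed above (multiplying independent values having a uniform distribution) is easy to try out. This Python sketch is a small simulation under arbitrary assumptions of my choosing: 20 factors per product, 20,000 products and a fixed seed.

```python
import random
from collections import Counter


def leading_digit(x):
    """First significant digit of a positive float, via scientific notation."""
    return int(f"{x:e}"[0])


def simulate(samples=20000, factors=20, seed=1):
    """Leading-digit frequencies of products of uniform random values."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(samples):
        product = 1.0
        for _ in range(factors):
            product *= 1.0 - rng.random()  # uniform on (0, 1]
        counts[leading_digit(product)] += 1
    return {d: counts[d] / samples for d in range(1, 10)}


frequencies = simulate()
for d in range(1, 10):
    print(d, round(frequencies[d], 3))
```

The printed frequencies should fall off from digit 1 to digit 9 in roughly the way log10(1 + 1/d) predicts, with about 30% of the products starting with a 1.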

Why don’t the leading digits of numeric literals in source code follow Benford’s law?

  • Perhaps small values are over represented because they are used as offsets to access the storage either side of some pointer (in C/C++/Java, but not Pascal/Fortran, the availability of the ++/-- operators reduces the number of instances of 1 used to increment/decrement a value). But this only applies to integer types, not floating types.

Image goes here.

Probability that the leading, first non-zero, digit of a floating literal has a particular value (dashed line predicted by Benford’s law).

  • Perhaps there exists a high degree of correlation between the values of literals. I’m not yet sure how to look for this.
  • Why is there a huge spike at 5 for the floating-point literals? Have values been rounded to produce 0.5? This looks like an area where methods used for accounting fraud detection might be applied (not that any fraud is implied, just irregularity).
  • Why is the distribution of the leading digit fairly uniform for hexadecimal literals?

These surprising measurements show that there is a lot about the shape of numeric literals that is yet to be discovered.
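For anybody wanting to repeat this kind of measurement, here is a minimal Python sketch of tallying the leading non-zero digit of literals that have already been extracted from the source by some tokenizer. It is not the tool used for the measurements above, and the sample literals are invented.

```python
from collections import Counter


def leading_digit(literal):
    """First non-zero digit of a C numeric literal, or None if there is none.

    Hexadecimal literals (0x prefix) are scanned for hex digits; anything
    else is treated as decimal. Scanning stops at the exponent marker so
    that digits in the exponent or suffix are never counted.
    """
    text = literal.lower()
    if text.startswith("0x"):
        digits, allowed, exponent = text[2:], "123456789abcdef", "p"
    else:
        digits, allowed, exponent = text, "123456789", "e"
    for ch in digits:
        if ch == exponent:
            break
        if ch in allowed:
            return ch
    return None


# Invented sample; a real run would feed in every literal in the code base.
literals = ["1", "0x1F", "0.5f", "100UL", "0x00ff", "0", "2.5e-3", "0.5"]
tally = Counter(leading_digit(lit) for lit in literals)
print(tally)
```

Zero-valued literals have no leading non-zero digit and are tallied under `None`, which keeps them out of any comparison against Benford’s law.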
