Home > Uncategorized > Randomizing a list of items using sort() and rand()

Randomizing a list of items using sort() and rand()

I’m busy putting together the experiment I will be running at the ACCU conference next week. If you are attending the conference please reserve your Thursday lunchtime slot for taking part as a subject!

Experiment generation invariably involves randomizing the sequence of items seen by every subject. While few languages support a randomise function many support sorting and random number generation. These two functions can be combined to create a randomize function; simply append a random number to the start of each item, sort it and then strip off the random number. Voilà a randomized list (awk code below).

function rand_items(items)
{
for (v in items)
   {
   items[v]=rand() " " items[v]
   }
asort(items)
for (v in items)
   {
   sp_pos=index(items[v], " ")
   items[v]=substr(items[v], sp_pos+1)
   }
}

This randomization problem is not yet listed on Rosetta code and probably has longer solutions in other languages.

Update (the next day).

The glow I have had for the last 10 years over coming up with a neat solution to a problem has now disappeared. Following the link kindly provided by D. Herring in the comments eventually lead me to the Fisher-Yates shuffle, which has O(n) performance (the call to sort is probably O(n log n). The following shuffles a deck of cards:

for (i = 0; i < 52; i++)
   {
   j = i + (rand() % (52 - i));
   tmp = card[i];
   card[i] = card[j];
   card[j] = tmp;
   }

The proof of the uniform shuffling behavior of Fisher-Yates (also known as Knuth shuffle) is straight forward but not nearly as appealing as using rand and sort.

  1. Anonymous
    April 21st, 2012 at 06:11 | #1

    Using a similar idea you can randomly select N elements from a stream of elements of unknown size if you keep the N biggest values (after you prepend the random numbers).

    For each element
    …prepend random number
    …if the new value is bigger than any of the saved values, replace the smallest saved value with this value.
    strip N saved values of prefix
    return the N saved values

    So you don’t need to know the size of the stream/list you are reading, you will always have N randomly sampled elements as long as you read N elements :)

  2. D Herring
    April 21st, 2012 at 12:33 | #2

    Here’s the same algorithm in CL, circa 2008.
    http://www.pvk.ca/Blog/Lisp/trivial_uniform_shuffling.html

    I look forward to reading about your latest ACCU experiment. The results are usually quite interesting.

  3. April 21st, 2012 at 13:32 | #3

    @D Herring
    Thanks for the link which lead me to the Fisher-Yates shuffle and an update to the post.

  4. Magnus
    May 2nd, 2012 at 05:40 | #4

    The standard C++ library has random_shuffle() in the algorithm include file, which is O(n).

    random_shuffle(v.begin(), v.end());

    which I suppose would be random_shuffle(begin(v), end(v)); in the new-fangled C++11. There’s a also an override where you can supply your own RNG.

  5. May 2nd, 2012 at 15:10 | #5

    @Magnus
    Yes, it has been there since C++99. These days there are freely available libraries for doing almost everything. I guess my post should have said something like “if you are using a language that does not easily support linking against third party libraries (awk in my case) here is a useful technique”.

  6. June 5th, 2012 at 09:24 | #6

    The Knuth (a.k.a. Fisher-Yates shuffle) is on Rosetta Code here: http://rosettacode.org/wiki/Knuth_shuffle

  7. Jonathan Thornburg
    June 7th, 2012 at 15:33 | #7

    If you’re using C, you probably don’t want to use rand() : it’s a very poor random-number generator. (For example, on many implementations the least significant bit of rand()s result is either constant, or just tlggles bacn and forth between 0 and 1 with each successive call.) random() or drand48() are much better.

  1. No trackbacks yet.

A question to answer *