{ on programming and the internets }


by Louis Brandy

Algorithms In Real Life

Sometimes I notice computer science or other assorted math nerdery in real life. These are their stories.

Alphabetizing Papers

Have you ever seen a teacher alphabetizing a couple dozen papers? This is basically a sorting problem, right? And you are a computer scientist, so sorting problems should be interesting to you. Have you ever analyzed how people alphabetize papers? How does she (for the sake of brevity I’ve made the sexist assumption that this teacher is female) do it? Almost always, she puts all the ‘A’ names in this pile, ‘B’ names in that one, and so on. The group ranges my vary (maybe A through C in this pile, etc.), but the algorithm is almost always the same. Once she’s done with that, what does she do? She tends to go letter by letter and use some other algorithm on each group. In my experience, she uses an insertion sort.

Did it ever strike you as odd that human beings used such an algorithm? Did you ever, as a young, naive, computer programmer, snicker to yourself thinking that she was using an inferior algorithm? I did. Everyone knows that quicksort is the best and fastest way to sort, right? Why don’t teachers use quicksort? My original answer to this question was that unlike computers, the human brain doesn’t do all comparisons equally. It’s just “easier” for our brains this way. Splitting into letter groups makes each smaller problem more manageable.  I was certain, in those times, there might exist some faster algorithm that teachers could use based on a quicksort. I was wrong.

Back to the computer science part: do you know the name of the sorting algorithm that these teachers use? It’s called a bucket sort. A bucket sort followed by individual insertion sorts (exactly what teachers tend to do) is a linear time sorting algorithm. Yes, linear time. Did you know there are linear time sorting algorithms? But you thought n*log(n) was the best possible sorting algorithm? That’s only the bound for comparison based sorting. When you have some notion of the distribution of the items to be sorted, you can break through that boundary and do linear time sorting. The catch with linear time sorting is that the input must follow some known distribution. This, by the way, is why teachers instinctively break the piles into various types of groupings. If there are alot of papers, you need finer group ranges. Furthermore, the ideal bucket setup would distribute the papers roughly evenly. The letter S might need its own bucket, but you can put all the letters up through F in their own bucket. Teachers have alot of experience with both the general problem and their specific problem (for example, the peculiarities of a particular class’ name distribution) and so they are optimizing the algorithm given the known distribution. They are setting up the parameters of the linear time sort (number of buckets, bucket ranges, etc.) exactly as they should to optimize the sort time.

The main downside to these linear sort algorithms is that they require alot of extra space (versus comparison-based sorting). You tend need to need an auxiliary bookkeeping array on the order of the original problem to do them. This isn’t a problem in real life! She just needs a larger table. In a very real sense, this supposedly “naive” algorithm that teachers use is among the very best possible. And woe to any computer science student who thinks to himself that she’s doing it wrong.

The Coffee Problem

So I woke up the other day, took my shower, got dressed and got on the elevator. My apartment complex has complimentary coffee in the lobby area (that’s why the rent is so high). This particular morning, when I got there, I noticed all of the little tags that tell me which dispenser was which were missing. So there I stood, with five identical coffee dispensers, virtually certain at least one of them was decaf. I don’t want decaf. What do I do?

If I was David Carruso or that dude from CSI: Las Vegas, I could have smelled the coffee, detected the bean’s nation of origin and deduced which was decaf based on the trade distribution of that country’s beans throughout the continental United States. Or maybe I could have used some nearby chemical MacGuyver style to determine which was caffeine. I’m just a dude needing caffeine, so I thought for a second, and then went for the middle one. How did they likely set these up? It seemed logical to me that since there were five, no more than two would be decaf. That seems like a good assumption. Furthermore, my “opponent”, being a rational person, likely put the decafs together, and on one end. That means the middle one is probably caffenieted.

(Another good answer I received from someone else is just to mix them all up. At worst you get 60% caffeine.)

trackback

42 Responses to “Algorithms In Real Life”

  1. October 20th, 2008 at 11:20 am

    Jeremy says:

    > A bucket sort followed by individual insertion sorts (exactly what teachers tend to do) is a linear time sorting algorithm.

    Since the size of each bucket still depends on N (assuming an even distribution, the size of each bucket is expected to be N/number_of_buckets) sorting by this method is still a quadratic time sorting algorithm. Insertion sorting the buckets, rather than the whole, just reduces the constant (by a factor of 1/number_of_buckets^2).

  2. October 20th, 2008 at 11:40 am

    Joel says:

    Jeremy:

    As hinted in the article, prior knowledge of the distribution allows one to set buckets of approximately constant size, by *(e.g.) giving “S” names their own, and sorting “A-F” together, and by adjusting the number of buckets to suit the number of entities being sorted.

  3. October 20th, 2008 at 12:03 pm

    Sam Hughes says:

    You can never do linear time sorting because you have to observe n*log(k) bits of information just to distinguish all the k distinct elements of the list you’re sorting.

    Also, knowing the expected distribution well enough to sort in expected pseudo “linear time” requires extremely fine-grained knowledge of the expected distribution. You have to be able to guess the probability density of keys on all intervals within some constant factor. That is, there must be some number X such that for all string prefixes, your guess of the proportion of keys with that string prefix is less than X times the actual value.

    What most students don’t know is that quicksort is actually no better than n*(log(n))^2, when considering the theoretical cost of comparisons. The so-called “linear” algorithms take n*log(n) time.

  4. October 20th, 2008 at 1:45 pm

    Tommy says:

    Yeah – I was going to say to just mix them.
    As stated, you’d end up with at least 60% your normal caffeine intake – enough to make it to the coffee maker at work.

  5. October 20th, 2008 at 2:25 pm

    Jeff says:

    I like to put decaf in the middle :-)

  6. October 20th, 2008 at 2:27 pm

    Pavel says:

    Maybe this isn’t a solution that would naturally come to coders, but I would probably have asked a staff member which coffee contained caffeine.

  7. October 20th, 2008 at 2:31 pm

    Bill Bennett says:

    Decaf coffee tastes a lot different. Caffeine has a bitter sharpness.

  8. October 20th, 2008 at 2:37 pm

    Rujith de Silva says:

    How about taking a sip of coffee off each dispenser?

  9. October 20th, 2008 at 2:47 pm

    Joe says:

    In this example, the teacher knows nothing about each of the buckets other than “all the papers in the stack math the regular expression [A-F].*” The papers within each bucket are essentially a randomly distributed subset of all the papers being sorted.

    Each bucket contains m=(n/k) papers, requiring O(m^2) to sort using insertion sort because the smaller stack of papers still exhibits no order. This is still quadratic time.

    The reason a teacher sorts buckets is because of the physical and psychological constraints imposed by the size, dexterity and musculature of her hands and the short term memory capacity of the human brain. She could have sorted the whole stack faster if it were comfortable for her to hold the entire stack in one hand and if she were able to locate the insertion point in log(n) time.

  10. October 20th, 2008 at 3:07 pm

    jb says:

    I agree that the teacher incurs a knowledge penalty the first time she runs the sorting, since she doesn’t know the names of all the students. Once she has the names, she can cache that knowledge and reuse the strategy. So the initial sorting setup cost is amortized over 100+ sorting instances.

    I mean, if I had to write a program to sort a shuffled deck of cards in the fastest time possible, I would establish a simple indexing rule based on the card’s suite and value, and place it in its correct slot in a 52 item array. Ignoring the time required to learn the fact that there are 52 cards with regular and consistent numbering in a deck, that’s a linear sort, isn’t it?

  11. October 20th, 2008 at 3:16 pm

    Laurie says:

    Yes, you’ve invented the American Flag sort.

    http://en.wikipedia.org/wiki/American_flag_sort

  12. October 20th, 2008 at 3:18 pm

    Dave says:

    A more common sense solution would be to fill a cup with a little of each…the majority of the coffee will be caffeinated.

  13. October 20th, 2008 at 3:21 pm

    louis says:

    @Joe,

    Bucket sort is often said to be a “linear” sort for a variety of reasons. People well learned in algorithms can add alot of caveats to that classification. Regardless, it is most certainly not a quadratic operation (in the average case, with the proper assumptions in place). For “correct” choices of the number of buckets, their distribution, and a well behaved data set, the sum of many m^2 is not the same as n^2. Therefore, it’s not correct to say it’s “still” quadratic time. Not necessarily with n, it isn’t. If we choose k correctly (as a function of n) then it is not correct to imply that O(m^2) implies O(n^2).

  14. October 20th, 2008 at 3:24 pm

    Jon says:

    The best sorting algorithm for the teacher would be to sort the students alphabetically and have them pass their papers in along their rows. It’s efficient for collection of the papers and they are pre-sorted.

  15. October 20th, 2008 at 4:00 pm

    ocd says:
  16. October 20th, 2008 at 4:10 pm

    Lev says:

    Actually the best solution is simply not to sort at all. Collect and grade in random order. To record grades into grade book just find the name on the list, or in computer-based grade book type or click the name.

    To return the papers just do a single pass calling out names and have students just come up.

  17. October 20th, 2008 at 4:24 pm

    louis says:

    I like the word alot alot. The odds of me correcting that particular mistake are slim. I’ll do my best in the future.

  18. October 20th, 2008 at 4:32 pm

    Anonymous says:

    “For “correct” choices of the number of buckets, their distribution, and a well behaved data set, the sum of many m^2 is not the same as n^2.”

    But because m is merely n times some constant 1/k (where k is the number of buckets), O(m^2) IS the same as O(n^2). Just in the same way that O(100n^2) is the same as O(n^2). It’s still quadratic even in the average case (that is, it scales quadratically with the size of n).

    You may argue that in the case where k = n, the sorting is linear–like sorting a deck of cards as mentioned above. Well that’s true, except n is supposed to be a variable and k is supposed to be a constant. If you make n a constant, then we’re not talking about algorithmic complexity at all. If you make k a variable, then for every run of the algorithm with a different n, you must re-calculate the distribution to fit correctly into k buckets, which is itself a sorting algorithm!

  19. October 20th, 2008 at 4:37 pm

    Carrington says:

    There is no such word as “alot” — it’s two words: a lot. Claiming to like a non-existent word is at best a lazy way to avoid correcting your mistake. and at worst it’s pandering to the lowest common denominator of intellectual sloth.

    I know this is the internet, home of colloquialism and illiteracy, but why not put a little effort in and rise above the cliche?

  20. October 20th, 2008 at 4:40 pm

    Caffeeeeeine Junkyie says:

    For 100% caffeine intake, Dispense equal amounts of coffee from all 5 coffee pots into two cups. Drink the first cup, and 2/3’s of the second cup. 1/0.6=1.667. Is my math good?

  21. October 20th, 2008 at 4:41 pm

    louis says:

    @anon

    As you pointed out, if k is a constant, bucket sort reduces to the complexity of the underlying sort (for insertion, n^2).

    Bucket sort, however, depends vitally on a wisely chosen k. The choice of k must take into account n to be “wise”. Therefore, the value of k necessarily is a function of n. Therefore, it is not constant.

    For our teacher example, teachers tend to use as many buckets as will make the size of each stack manageable. If there are 40 “S” papers, they will tend to make “Sa-Sc”, “Se-Sf”, etc type piles until they finally get managable stacks.

    This is exactly analogous to the case where you sort N uniformly distributed numbers by making N buckets. In this case, your expected number of a values per bucket is 1. Teachers may use some larger value (say, 5). Regardless, the value of k is chosen such that the sorting time PER BUCKET is effectively O(1).

    Now, all of this is qualified by alot of assumptions about our dataset being well-behaved and so on. It works for teachers because usual class sizes (say, 15 students to a hundred, or so) works very well with the number of “natural” buckets at their disposal.

  22. October 20th, 2008 at 4:45 pm

    louis says:

    As for the coffee problem, I wasn’t aware that caffeinated and decaf had extremely different tastes. I’ll be honest, I’m not sure I’ve ever willingly drank decaf. Maybe next time they are labeled, I’ll taste them both and hone my taste buds. That’s clearly the best solution.

  23. October 20th, 2008 at 5:03 pm

    Anonymous says:

    Louis:

    You’re assuming O(n) then, because you presume that the optimum distribution of the buckets has already been determined and can be known a priori. However, the optimum distribution of the buckets, as well as the number of buckets, is dependent upon n, as you’ve said already. That means, that if n changes, your optimum distribution must be altered. This effectively makes n a constant, because the algorithm breaks if n is changed.

    The only way for the algorithm not to break would be for you to include an algorithm that, if n changes, examines the dataset and determines the proper number of buckets k and distribution into those buckets. However, that would itself be a classical sorting problem, which would be O(n lg n). So, if n changes, then you must first run an O(n lg n) algorithm to restore your distribution’s ideal state, then run the O(n) algorithm to sort according to that distribution, which totals out to O(n lg n).

    That is exactly what algorithmic complexity measures–if n changes, how does the running time change? If n doubles, does the running time double [O(n)], or does it increase by a factor of four [O(n^2)], etc. So you can’t assume an essentially static n and then claim to measure algorithmic complexity.

  24. October 20th, 2008 at 5:19 pm

    louis says:

    @anon:

    Yes, I am assuming that the distribution is known a priori. I do not consider this an unreasonable assumption (when we consider things like student’s last names).

    You’ve used the word “optimal” which is a scary word that I would shy away from. An “optimal” distribution is not needed. Teachers do not need an “optimal” notion of the distribution. They just need a reasonable way of splitting the problem into some fixed and manageable size (say 5 papers). And teachers can accomplish this by splitting buckets into sub buckets that become too large until they reach the magical “manageable” size. This procedure can begin to resemble a radix sort.

    The important point is that the distribution is independent of n and therefore it does not “break” as you claimed, when n changes. So while the number of buckets is likely a function strictly of n, the bucket ranges is likely strictly a function of n and the distribution.

    If I wanted to make 1 element per bucket, on average, over a uniform distribution from 0 to M, I’d make n buckets spaced equally with a range of M/n. In this case, the number of buckets -is- a function of n but requires no examination of the data. Nothing “breaks” if I change n.

    So yes, if we were unlucky and our uniform distribution produced numbers that weren’t so uniform, our bucket sort would break down into the n^2 of an insertion sort. In the average case, however, it would still be O(n).

  25. October 20th, 2008 at 5:28 pm

    Gautam says:

    hmm.. going by the assumptions that
    – there are at most 2 decaf
    – both are kept together starting at any position

    I’d mix coffee from the 1st, 3rd and 5th pot.

    This gaurantees atleast 66% caffeinated coffee.

  26. October 20th, 2008 at 6:12 pm

    Anonymous says:

    “They just need a reasonable way of splitting the problem into some fixed and manageable size (say 5 papers). And teachers can accomplish this by splitting buckets into sub buckets that become too large until they reach the magical “manageable” size.”

    Okay. You’re correct that as long as the number of papers in each bucket has a fixed maximum bounded size, then sorting within the buckets is O(1).

    But let’s say you’ve come up with your buckets, which properly subdivide your problem into fixed-size manageable sets. And let’s even say you can do this in linear time, or constant time, or it is known a priori. Now you need to decide which bucket each student belongs in. That means for every student, you must perform a binary search (we’ll assume the buckets are in sorted order) to find their proper bucket–which is running an O(lg n) algorithm n times, or O(n lg n).

    “If I wanted to make 1 element per bucket, on average, over a uniform distribution from 0 to M, I’d make n buckets spaced equally with a range of M/n. In this case, the number of buckets -is- a function of n but requires no examination of the data. Nothing “breaks” if I change n.”

    In this case (essentially having a bucket for every student), you don’t need to know just how many buckets there are, you also need to know what order the buckets are in. If you create n buckets, then you must run an O(n lg n) algorithm to put them in the proper order.

    If you are using a heuristic like the first letter of the students’ last name, then you have either a constant k (26 buckets corresponding to the first letter of the last name for instance), but no constant upper bound on the number of entries in each bucket (what happens when your class doubles in size and all the new students’ last names begin with Q?), or you have a variable number of buckets that you don’t know a priori and must sort.

    This isn’t really showing an O(n) sorting algorithm, it’s showing a linear-time algorithm with low space complexity to map from a random permutation of one specific set to the permutation of that set where each element is in order, given that you know a priori the “buckets,” which requires you to have already sorted the list once using an O(n lg n) algorithm. If it were a linear-time sort, we could implement such a sorting algorithm by computer and forever be rid of the scourge of O(n lg n) sorts. :P

  27. October 20th, 2008 at 6:18 pm

    louis says:

    Anonymous,

    A few issues. First, we need to be clear that bucket sorting is only linear in the AVERAGE case. As you pointed out, if everyone’s name starts with Q, we get screwed and end up doing n^2 insertion sorts on everything. This shouldn’t be surprising, because quicksort is ALSO n^2 in the worst case scenario. It is only n*log(n) in the average case.

    The other main sticking point is your claim that putting a paper into the correct pile is not O(1). This isn’t true. If we change the problem from alphabetizing to sorting numbers, one through 100, using 10 bins. The number goes into bin x/10. I do not need to do a binary search to figure this out, all I have to do is divide by 10, and that will tell me which bin to put it in.

    Similarly, I do not need to sort my bins because every bin is defined in such a way that they are already sorted. Every number in bin 0 is before bin 1, and so forth.

    The case of alphabetizing papers is just an extension of this numerical version.

  28. October 20th, 2008 at 6:32 pm

    Anon says:

    Without knowing how the teacher would sort arbitrarily large inputs, it’s impossible to reason on the big-O cost of her sort. Yes, it is possible to do a linear time sort on arbitrarily large inputs with a fixed number of buckets (check out radix sort – i.e. use some constant upper bound n on name length, then sort on the nth letter, then the (n-1)th, and so on), but in a fixed size case like this you should never be considering O-notation, but the actual number of comparisons. I suspect that bucket-sort followed by insertion sort (given a reasonable choice of buckets) will perform better than a quicksort on a small input size.

  29. October 20th, 2008 at 7:44 pm

    Sam Hughes says:

    Bucket sort is not linear at all. If your number of buckets grows on the order of f(n), the cost of figuring out which bucket a particular name lies in will grow on the order of log(f(n)). An ideal distribution of values will leave the size of the buckets’ insides to be on the order of n/f(n). Which means they take on the order of n/f(n) * log(n/f(n)) time to sort. So the cost of sorting all the individual buckets is n * log(n/f(n)). And the cost of filling the buckets in the first place is on the order of n * log(f(n)).

    So we have that the two operations together take on the order of n*log(n/f(n)) + n*log(f(n)) time. This simplifies to n*log(n).

    At plausible disagreement with this math would be the argument that sorting each inner bucket allegedly only takes O(n/f(n)) time. But unless f(n) grows slower than n^k for every positive k, the n*log(f(n)) term alone is sufficient to simplify to n*log(n). And if f(n) does grow slower than n^k for every positive k, it can be seen just by the number of recursions that performance is worse than n^k, for any k < 2.

    So, bucket sort (with distinct elements) is not linear in the best case.

    And of course this notion of being linear — but only for small input sizes — is an absurdity.

  30. October 20th, 2008 at 7:44 pm

    anon says:

    Easier solution to the coffee problem: Drink three cups, each from a separate carafe. Then you are guaranteed to have at least one cup of caffeinated coffee.

  31. October 20th, 2008 at 7:46 pm

    Sam Hughes says:

    Oh, I see somebody is claiming you can find the bucket in O(1) time.

    This is not true. Even to construct the integer which identifies the bucket number takes O(log(n)) time, when there are n buckets.

  32. October 20th, 2008 at 7:50 pm

    anon says:

    Oh, and for the sorting: Each class is a constant size. All sorts on a particular class are O(1). The end.

  33. October 20th, 2008 at 8:08 pm

    louis says:

    @Sam

    O(log(n)) to find the appropriate bucket? Please justify.

    In the trivial example of sorting integers between 0 and M with k buckets, the bucket is X*k/M. That is not a logarithmic operation.

    Honestly, I can’t believe how much defending I’ve had to do with the relatively easy to verify fact that proper bucket sorting is a linear time operation.

  34. October 20th, 2008 at 10:50 pm

    Fritz says:

    I choose to teach at a small liberal arts college where computer science is somewhat on the fringe. Between that and the infamous downturn in enrollments, my classes are so small that sorting papers takes no time to speak of (certainly less than filling out this comment).

    I think this is called “transcending the problem”: it may not qualify as a general-purpose sorting algorithm, though … :) .

  35. October 20th, 2008 at 11:21 pm

    Bob says:

    Bullchitt – The coffee gets set up the same way everywhere. Regular on the left, sissy-decaf on the right. Grab the backup dispenser (it’s behind the 1st one from the left), and leave with the best cup in the house.
    Came here through PopUrls. You ‘guys’ are weenies.

  36. October 20th, 2008 at 11:50 pm

    Joe says:

    @louis

    In your original example the teacher was sorting names. When you argue O(1) bucket lookup you use integers and you specifically switch to evenly distributed integers because that normalizes both the range of values for the bucket (bucket width) and the number of entries in the bucket (bucket depth). That doesn’t match with the task you are comparing to.

    You can either have uneven “bucket width” to keep the bucket size under control at the cost of log(n) bucket lookup (not to mention n*log(n) to gather the data to size the buckets), or you can have fixed “bucket width” (constant lookup time) at the cost of paying an n^2 sort for clustered buckets of n elements. The problem is that the penalty for clustered buckets grows very quickly (average of the square of the element count) and it must stay under log(n) which is a very slow growing function.

    You’re trying to show that bucket sort, with insertion sorted buckets, will beat the n*log2(n) performance of quicksort quite handily for sets like names. I don’t think it is going to work. In a class of 30 students, it’s about a wash. Things get much worse the larger the set is.

  37. October 21st, 2008 at 3:50 am

    Anonymous says:

    My teachers always collected papers in alphabetical order, usually by seating students in alphabetical order.

  38. October 21st, 2008 at 8:05 am

    louis says:

    @Joe,

    You don’t need to have “fixed” width buckets to have constant time look up. But I concede that in general you are correct when you say that completely arbitrary bucket widths is rarely a constant time look up (barring the creation of some prohibitively large look up table).

    As for your second comment, I’d encourage you try to a bucket sort versus a quick sort on 30 papers (or a deck of cards). You’ll find the bucket sort is substantially better :P

    To your more important point, though, that as the data set get larger, a “naive” bucket sort becomes problematic. This is true. If you mildly alter the bucket sort into the iterative variety, like a postal sort or radix sort, it will still outperform quicksort for large N.

  39. October 23rd, 2008 at 1:28 am

    Vinh says:

    $myclass['doug'] = 3;
    $myclass['johnny'] = 23;
    $myclass['joanne'] = 22;
    $myclass['zach'] = 30;
    $myclass['abe'] = 1;

    foreach($papers as $paper)
    $sorted_papers[$myclass[$paper->name]] = $paper;

    It all simply boils down to that ya?
    that’s Linear paper arranging in my book….

  40. October 24th, 2008 at 6:04 pm

    Grover Cleveland says:

    I do not believe in buckets. Nor sorting. Nor any combination of the two.
    I believe in Bourbon Democrats.

  41. October 30th, 2008 at 12:37 pm

    Martin says:

    Concerning the paper sorting, teachers (and people in general) go for less “optimal” sorting algorithms to avoid permutations. Quick sort is nice in programming, but requires permuting many times. Try to do that with a stack of paper, and you’ll find you want to spend a bit more time on comparison than permutation.

  42. November 3rd, 2008 at 11:41 pm

    Christopher Elwell says:

    I like thinking in this way. And seeing the world around me, thinking about the molecules and atoms as a particle simulation, it is wonderful. God created it all. And beautifully.

Leave a Reply


Need a new job?