Sometimes I notice computer science or other assorted math nerdery in real life. These are their stories.
Did it ever strike you as odd that human beings used such an algorithm? Did you ever, as a young, naive, computer programmer, snicker to yourself thinking that she was using an inferior algorithm? I did. Everyone knows that quicksort is the best and fastest way to sort, right? Why don't teachers use quicksort? My original answer to this question was that unlike computers, the human brain doesn't do all comparisons equally. It's just "easier" for our brains this way. Splitting into letter groups makes each smaller problem more manageable. I was certain, in those times, there might exist some faster algorithm that teachers could use based on a quicksort. I was wrong.
Back to the computer science part: do you know the name of the sorting algorithm that these teachers use? It's called a bucket sort. A bucket sort followed by individual insertion sorts (exactly what teachers tend to do) is a linear time sorting algorithm. Yes, linear time. Did you know there are linear time sorting algorithms? But you thought n*log(n) was the best possible sorting algorithm? That's only the bound for comparison based sorting. When you have some notion of the distribution of the items to be sorted, you can break through that boundary and do linear time sorting. The catch with linear time sorting is that the input must follow some known distribution. This, by the way, is why teachers instinctively break the piles into various types of groupings. If there are alot of papers, you need finer group ranges. Furthermore, the ideal bucket setup would distribute the papers roughly evenly. The letter S might need its own bucket, but you can put all the letters up through F in their own bucket. Teachers have alot of experience with both the general problem and their specific problem (for example, the peculiarities of a particular class' name distribution) and so they are optimizing the algorithm given the known distribution. They are setting up the parameters of the linear time sort (number of buckets, bucket ranges, etc.) exactly as they should to optimize the sort time.
The main downside to these linear sort algorithms is that they require alot of extra space (versus comparison-based sorting). You tend need to need an auxiliary bookkeeping array on the order of the original problem to do them. This isn't a problem in real life! She just needs a larger table. In a very real sense, this supposedly "naive" algorithm that teachers use is among the very best possible. And woe to any computer science student who thinks to himself that she's doing it wrong.
If I was David Carruso or that dude from CSI: Las Vegas, I could have smelled the coffee, detected the bean's nation of origin and deduced which was decaf based on the trade distribution of that country's beans throughout the continental United States. Or maybe I could have used some nearby chemical MacGuyver style to determine which was caffeine. I'm just a dude needing caffeine, so I thought for a second, and then went for the middle one. How did they likely set these up? It seemed logical to me that since there were five, no more than two would be decaf. That seems like a good assumption. Furthermore, my "opponent", being a rational person, likely put the decafs together, and on one end. That means the middle one is probably caffenieted.
(Another good answer I received from someone else is just to mix them all up. At worst you get 60% caffeine.)