Weekly Thoughts: Algorithm Aversion, The Most Dangerous Equation and Small Data
Here are three statistics-related items that caught our eye this week:
Algorithm Aversion
We recently read an interesting Financial Times op-ed by David Siegel, co-chair of the algorithmic investment firm Two Sigma, which makes the case for increased use of technological solutions to many of life’s biggest problems. Many people are hesitant to cede decision-making authority on important issues related to life, death, and money, among others. If they cannot decide personally, most prefer human advice on such matters. Siegel suggests this tendency to avoid algorithmic solutions is increasingly suboptimal. From the article:
“In fields as wide ranging as medical diagnosis, meteorology and finance, dozens of studies have found that algorithms at least match — and usually surpass — subjective human analysis. Researchers have devised algorithms that estimate the likelihood of events such as a particular convict lapsing back into crime after being released from custody, or a particular business start-up going bust. When they pitted the predictive powers of these programs against human observers, they found that the humans did worse.”
The speed and precision of modern computing are only augmented by its consistency. Siegel notes that comparisons of human and computer performance often assume peak output for both. Incorporating reasonable assumptions about human frailty (your author, for instance, could have written this article faster had he not paused to get coffee) makes the comparison considerably more lopsided.
Interestingly, however, the population at large remains skeptical even when presented with compelling evidence of technology’s advantages. In 2014, three University of Pennsylvania researchers published a study on this phenomenon, which they called “algorithm aversion.” Their research indicated that:
“…people are especially averse to algorithmic forecasters after seeing them perform, even when they see them outperform a human forecaster. This is because people more quickly lose confidence in algorithmic than human forecasters after seeing them make the same mistake. In 5 studies, participants either saw an algorithm make forecasts, a human make forecasts, both, or neither. They then decided whether to tie their incentives to the future predictions of the algorithm or the human. Participants who saw the algorithm perform were less confident in it, and less likely to choose it over an inferior human forecaster. This was true even among those who saw the algorithm outperform the human.”
The problem, it seems, comes down to trust. Humans are quick to ditch computers at the first sign of an error, even if those errors are less severe than the mistakes humans themselves make. Siegel notes that it is up to practitioners to advocate for the increased use of algorithms. In business settings, where owners or employees can be the beneficiaries of such technology, it is also important to try to overcome our own biases to ensure we are using all the tools at our disposal optimally.
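The dynamic can be sketched with a tiny simulation. This is purely illustrative (invented error distributions, not the study’s data): an algorithm whose errors are smaller on average still misses on every forecast, which is all an aversion-prone observer needs to see to lose confidence in it.

```python
import random

random.seed(0)

def mean_abs_error(errors):
    return sum(abs(e) for e in errors) / len(errors)

# Hypothetical forecast errors: the algorithm errs on every forecast,
# but its errors are drawn from a tighter distribution than the human's.
algo_errors = [random.gauss(0, 1.0) for _ in range(1000)]
human_errors = [random.gauss(0, 1.5) for _ in range(1000)]

algo_mae = mean_abs_error(algo_errors)
human_mae = mean_abs_error(human_errors)

# An observer fixating on any single algorithmic miss can still conclude
# "the algorithm makes mistakes," despite its lower average error.
```

The point of the sketch: "makes visible mistakes" and "is the better forecaster" are fully compatible, and judging the former while ignoring the latter is exactly the bias the study describes.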
The Most Dangerous Equation
Any embrace of technological or algorithmic solutions must also come with an understanding of the limitations involved. Oversimplifying statistical tools can, in some cases, do more harm than good. This week we came across an interesting 2007 article from American Scientist about the dangers posed by the misuse of certain equations in real-world settings. According to the article, de Moivre’s equation, which holds that the variance of the sample mean is inversely proportional to the sample size (equivalently, its standard deviation shrinks with the square root of the sample size), has had particularly harmful effects. As an example, consider the following graph from the article:
The map breaks counties in the United States into deciles by kidney-cancer rate, shading the lowest decile (teal) and the highest (red). We’d like to look at the data and infer something about the relationship between cancer rates and living in a particular area. However, both cohorts (teal and red) tend to be sparsely populated rural counties in the Midwest, South, and West; in fact, some of these counties sit adjacent to one another. From the article:
“What is going on? We are seeing de Moivre’s equation in action. The variation of the mean is inversely proportional to the sample size, so small counties display much greater variation than large counties. A county with, say, 100 inhabitants that has no cancer deaths would be in the lower category, but if it has 1 cancer death it would be among the highest. Counties like Los Angeles, Cook or Miami-Dade with millions of inhabitants do not bounce around like that.”
The point is that small samples are far more likely to be overrepresented at the tails of a distribution, given the greater variability they exhibit. From health statistics to school size to safety data, this is a lesson that, if forgotten, can have profound public policy and business ramifications. De Moivre’s equation is a timely reminder that algorithms and statistics are helpful only so long as we remember the potential limitations of these tools.
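De Moivre’s point is easy to reproduce with a small simulation (illustrative numbers, not the article’s data): give every county the same underlying death rate, and the small counties will still dominate both tails of the observed rates.

```python
import random

random.seed(42)

TRUE_RATE = 0.001  # identical underlying death rate everywhere (assumed)

def observed_rate(population):
    """Simulate one county: count deaths, return the observed death rate."""
    deaths = sum(1 for _ in range(population) if random.random() < TRUE_RATE)
    return deaths / population

# Many small counties and a few large ones, all with the same true risk.
small_counties = [observed_rate(100) for _ in range(500)]
large_counties = [observed_rate(100_000) for _ in range(10)]

# A 100-person county is either at zero deaths or, with a single death,
# at a rate ten times the true one; large counties hug the mean.
spread_small = max(small_counties) - min(small_counties)
spread_large = max(large_counties) - min(large_counties)
```

Run this and the small counties land in both the lowest and highest deciles at once, exactly the pattern on the kidney-cancer map, despite every county sharing the same true rate.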
Small Data
The other issue to keep in mind when increasing the use of algorithmic solutions is the human element. Behavioral economists have risen to prominence by highlighting that purely rational models do not always do the best job of explaining human behavior. Richard Thaler, a professor of economics and behavioral science at the University of Chicago’s Booth School of Business, recently provided an excellent example in a New York Times article. He describes his experience creating an exam that was intended to be difficult and produce a wide dispersion of scores. When the students in his class scored an average of 72 out of 100, they were furious. From the article:
“What was odd about this reaction was that I had already explained that the average numerical score on the exam had absolutely no effect on the distribution of letter grades. We employed a curve in which the average grade was a B+, and only a tiny number of students received grades below a C. I told the class this, but it had no effect on the students’ mood. They still hated my exam, and they were none too happy with me either. As a young professor worried about keeping my job, I wasn’t sure what to do.
Finally, an idea occurred to me. On the next exam, I raised the points available for a perfect score to 137. This exam turned out to be harder than the first. Students got only 70 percent of the answers right but the average numerical score was 96 points. The students were delighted!”
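The arithmetic behind Thaler’s fix is simple to verify (figures taken from the anecdote above):

```python
POINTS_AVAILABLE = 137
FRACTION_CORRECT = 0.70  # students answered about 70 percent correctly

# 137 * 0.70 = 95.9, which rounds to the 96-point average Thaler reports.
numerical_score = round(POINTS_AVAILABLE * FRACTION_CORRECT)

# A 96 feels better than a 72, even though 96 out of 137 is a *lower*
# fraction correct than 72 out of 100.
```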
The difficulty here, from a traditional economic perspective, is that the students’ reaction is completely irrational. However, there is ample evidence that irrational or irrelevant factors often play an important role in decision making. Ultimately, then, most well-functioning algorithms will need to be blended with some acknowledgment of this irrationality in order to generate useful output.
Interestingly, some of the most innovative companies are following this prescription through the use of so-called “small data,” which uses surveys and human judgment to help guide where to take the big data analysis next. From another recent New York Times article:
“Facebook has tons of data on how people use its site. It’s easy to see whether a particular news feed story was liked, clicked, commented on or shared. But not one of these is a perfect proxy for more important questions: What was the experience like? Did the story connect you with your friends? Did it inform you about the world? Did it make you laugh?
To get to these measures, Facebook has to take an old-fashioned approach: asking. Every day, hundreds of individuals load their news feed and answer questions about the stories they see there. Big data (likes, clicks, comments) is supplemented by small data (‘Do you want to see this post in your News Feed?’) and contextualized (‘Why?’).”
For us, the key takeaway is that different types of questions require different techniques for gathering and acting on data. Big data is uniquely equipped to answer what?, but it is often humans who are best able to tackle the more nuanced why? or how?, the answers to which will ultimately drive the next iteration of the big data algorithm. This symbiotic relationship suggests that as algorithmic solutions grow, the need for human insight into those solutions will grow in kind. As the authors of the article pointed out, “no one data set, no matter how big, is going to tell us exactly what we need. The new mountains of blunt data sets make human creativity, judgment, intuition and expertise more valuable, not less.”
Your Chenmark Capital Team