The Signal and the Noise: Why So Many Predictions Fail-but Some Don't
Nate Silver

Ended: Oct. 29, 2012

The instinctual shortcut that we take when we have “too much information” is to engage with it selectively, picking out the parts we like and ignoring the remainder, making allies with those who have made the same choices and enemies of the rest.
The problem, Poggio says, is that these evolutionary instincts sometimes lead us to see patterns when there are none there. “People have been doing that all the time,” Poggio said. “Finding patterns in random noise.”
Moody’s estimated the extent to which mortgage defaults were correlated with one another by building a model from past data—specifically, they looked at American housing data going back to about the 1980s.101 The problem is that from the 1980s through the mid-2000s, home prices were always steady or increasing in the United States. Under these circumstances, the assumption that one homeowner’s mortgage has little relationship to another’s was probably good enough. But nothing in that past data would have described what happened when home prices began to decline in tandem. The housing collapse was an out-of-sample event, and their models were worthless for evaluating default risk under those conditions.
But forecasters often resist considering these out-of-sample problems. When we expand our sample to include events further apart from us in time and space, it often means that we will encounter cases in which the relationships we are studying did not hold up as well as we are accustomed to. The model will seem to be less powerful. It will look less impressive in a PowerPoint presentation (or a journal article or a blog post). We will be forced to acknowledge that we know less about the world than we thought we did. Our personal and professional incentives almost always discourage us from doing this.
One of the pervasive risks that we face in the information age, as I wrote in the introduction, is that even if the amount of knowledge in the world is increasing, the gap between what we know and what we think we know may be widening. This syndrome is often associated with very precise-seeming predictions that are not at all accurate. Moody’s carried out their calculations to the second decimal place—but they were utterly divorced from reality. This is like claiming you are a good shot because your bullets always end up in about the same place—even though they are nowhere near the target.
Tetlock found that some had done better than others. On the losing side were those experts whose predictions were cited most frequently in the media. The more interviews that an expert had done with the press, Tetlock found, the worse his predictions tended to be. Another subgroup of experts had done relatively well, however. Tetlock, with his training as a psychologist, had been interested in the experts’ cognitive styles—how they thought about the world. So he administered some questions lifted from personality tests to all the experts. On the basis of their responses to these questions, Tetlock was able to classify his experts along a spectrum between what he called hedgehogs and foxes. The reference to hedgehogs and foxes comes from the title of an Isaiah Berlin essay on the Russian novelist Leo Tolstoy—The Hedgehog and the Fox. Berlin had in turn borrowed his title from a passage attributed to the Greek poet Archilochus: “The fox knows many little things, but the hedgehog knows one big thing.” Unless you are a fan of Tolstoy—or of flowery prose—you’ll have no particular reason to read Berlin’s essay. But the basic idea is that writers and thinkers can be divided into two broad categories: Hedgehogs are type A personalities who believe in Big Ideas—in governing principles about the world that behave as though they were physical laws and undergird virtually every interaction in society. Think Karl Marx and class struggle, or Sigmund Freud and the unconscious. Or Malcolm Gladwell and the “tipping point.” Foxes, on the other hand, are scrappy creatures who believe in a plethora of little ideas and in taking a multitude of approaches toward a problem. They tend to be more tolerant of nuance, uncertainty, complexity, and dissenting opinion. If hedgehogs are hunters, always looking out for the big kill, then foxes are gatherers. Foxes, Tetlock found, are considerably better at forecasting than hedgehogs. They had come closer to the mark on the Soviet Union, for instance. Rather than seeing the USSR in highly ideological terms—as an intrinsically “evil empire,” or as a relatively successful (and perhaps even admirable) example of a Marxist economic system—they instead saw it for what it was: an increasingly dysfunctional nation that was in danger of coming apart at the seams.
Foxes sometimes have more trouble fitting into type A cultures like television, business, and politics. Their belief that many problems are hard to forecast—and that we should be explicit about accounting for these uncertainties—may be mistaken for a lack of self-confidence. Their pluralistic approach may be mistaken for a lack of conviction; Harry Truman famously demanded a “one-handed economist,” frustrated that the foxes in his administration couldn’t give him an unqualified answer.
Our brains, wired to detect patterns, are always looking for a signal, when instead we should appreciate how noisy the data is.
We have trouble distinguishing a 90 percent chance that the plane will land safely from a 99 percent chance or a 99.9999 percent chance, even though these imply vastly different things about whether we ought to book our ticket.
Ultimately, the right attitude is that you should make the best forecast possible today—regardless of what you said last week, last month, or last year. Making a new forecast does not mean that the old forecast just disappears. (Ideally, you should keep a record of it and let people evaluate how well you did over the whole course of predicting an event.) But if you have reason to think that yesterday’s forecast was wrong, there is no glory in sticking to it. “When the facts change, I change my mind,” the economist John Maynard Keynes famously said. “What do you do, sir?”
Sanders’s wife, a special-needs educator, pointed him toward research suggesting that most of us are still in a state of mental adolescence until about the age of twenty-four.40 Before that age, Sanders will cut a player some slack if he sees signs that their mental tools are developing. After that, he needs to see performance.
The key to making a good forecast, as we observed in chapter 2, is not in limiting yourself to quantitative information. Rather, it’s having a good process for weighing the information appropriately. This is the essence of Beane’s philosophy: collect as much information as possible, but then be as rigorous and disciplined as possible when analyzing it. The litmus test for whether you are a competent forecaster is if more information makes your predictions better. If you’re screwing it up, you have some bad habits and attitudes, like Phil Tetlock’s political pundits did. If Prospect A is hitting .300 with twenty home runs and works at a soup kitchen during his off days, and Prospect B is hitting .300 with twenty home runs but hits up nightclubs and snorts coke during his free time, there is probably no way to quantify this distinction. But you’d sure as hell want to take it into account.
The idea takes on various forms, but no one took it further than Pierre-Simon Laplace, a French astronomer and mathematician. In 1814, Laplace made the following postulate, which later came to be known as Laplace’s Demon: We may regard the present state of the universe as the effect of its past and the cause of its future. An intellect which at a certain moment would know all forces that set nature in motion, and all positions of all items of which nature is composed, if this intellect were also vast enough to submit these data to analysis, it would embrace in a single formula the movements of the greatest bodies of the universe and those of the tiniest atom; for such an intellect nothing would be uncertain and the future just like the past would be present before its eyes.13 Given perfect knowledge of present conditions (“all positions of all items of which nature is composed”), and perfect knowledge of the laws that govern the universe (“all forces that set nature in motion”), we ought to be able to make perfect predictions (“the future just like the past would be present”). The movement of every particle in the universe should be as predictable as that of the balls on a billiard table. Human beings might not be up to the task, Laplace conceded. But if we were smart enough (and if we had fast enough computers) we could predict the weather and everything else—and we would find that nature itself is perfect.
At loggerheads with the determinists are the probabilists, who believe that the conditions of the universe are knowable only with some degree of uncertainty.* Probabilism was, at first, mostly an epistemological paradigm: it avowed that there were limits on man’s ability to come to grips with the universe. More recently, with the discovery of quantum mechanics, scientists and philosophers have asked whether the universe itself behaves probabilistically. The particles Laplace sought to identify begin to behave like waves when you look closely enough—they seem to occupy no fixed position. How can you predict where something is going to go when you don’t know where it is in the first place? You can’t. This is the basis for the theoretical physicist Werner Heisenberg’s famous uncertainty principle.14 Physicists interpret the uncertainty principle in different ways, but it suggests that Laplace’s postulate cannot literally be true. Perfect predictions are impossible if the universe itself is random.
What could go wrong? Chaos theory. You may have heard the expression: the flap of a butterfly’s wings in Brazil can set off a tornado in Texas. It comes from the title of a paper19 delivered in 1972 by MIT’s Edward Lorenz, who began his career as a meteorologist. Chaos theory applies to systems in which each of two properties holds: The systems are dynamic, meaning that the behavior of the system at one point in time influences its behavior in the future; and they are nonlinear, meaning they abide by exponential rather than additive relationships. Dynamic systems give forecasters plenty of problems—as I describe in chapter 6, for example, the fact that the American economy is continually evolving in a chain reaction of events is one reason that it is very difficult to predict. So do nonlinear ones: the mortgage-backed securities that triggered the financial crisis were designed in such a way that small changes in macroeconomic conditions could make them exponentially more likely to default.
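A minimal sketch of the sensitivity that chaos theory describes, using the logistic map as a stand-in for a nonlinear dynamic system (the map, its parameter, and the starting values are my own illustration, not an example from the book):

```python
# Logistic map: a toy nonlinear, dynamic system. Two starting points that
# differ by one part in a million diverge completely after a few dozen steps,
# even though the rule is fully deterministic.

def logistic_map(x0, r=3.9, steps=50):
    """Iterate x -> r * x * (1 - x) and return the whole trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

a = logistic_map(0.500000)   # one initial condition
b = logistic_map(0.500001)   # the same, perturbed by one part in a million

for step in (10, 30, 50):
    print(f"step {step}: {a[step]:.4f} vs {b[step]:.4f}")
# By the later steps the two trajectories bear no resemblance to each other:
# a tiny error in the initial conditions swamps the forecast.
```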
In 1940, the chance of an American being killed by lightning in a given year was about 1 in 400,000.33 Today, it’s just 1 chance in 11,000,000, making it almost thirty times less likely. Some of this reflects changes in living patterns (more of our work is done indoors now) and improvement in communications technology and medical care, but it’s also because of better weather forecasts.
Forecasts made eight days in advance, for example, demonstrate almost no skill; they beat persistence but are barely better than climatology. And at intervals of nine or more days in advance, the professional forecasts were actually a bit worse than climatology.
The meteorologists at the Weather Channel will fudge a little bit under certain conditions. Historically, for instance, when they say there is a 20 percent chance of rain, it has actually only rained about 5 percent of the time.47 In fact, this is deliberate and is something the Weather Channel is willing to admit to. It has to do with their economic incentives. People notice one type of mistake—the failure to predict rain—more than another kind, false alarms. If it rains when it isn’t supposed to, they curse the weatherman for ruining their picnic, whereas an unexpectedly sunny day is taken as a serendipitous bonus. It isn’t good science, but as Dr. Rose at the Weather Channel acknowledged to me: “If the forecast was objective, if it has zero bias in precipitation, we’d probably be in trouble.”
Studies from Katrina and other storms have found that having survived a hurricane makes one less likely to evacuate the next time one comes.
The magnitude scale is logarithmic; a one-point increase in the scale indicates that the energy release has multiplied by thirty-two.
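A quick arithmetic check of that claim (the factor of roughly 32 corresponds to the standard relation of about 10^1.5 in energy per magnitude point; the calculation is mine, not the book's):

```python
# Energy released scales by about 10**1.5 (~32x) per whole point of magnitude.
per_step = 10 ** 1.5
print(round(per_step, 1))    # ~31.6x per magnitude point
print(round(per_step ** 2))  # ~1000x: a magnitude 7 releases roughly a
                             # thousand times the energy of a magnitude 5
```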
As pointed out by the Nobel Prize–winning economist Robert Lucas37 in 1976, the past data that an economic model is premised on resulted in part from policy decisions in place at the time. Thus, it may not be enough to know what current policy makers will do; you also need to know what fiscal and monetary policy looked like during the Nixon administration. A related doctrine known as Goodhart’s law, after the London School of Economics professor who proposed it,38 holds that once policy makers begin to target a particular variable, it may begin to lose its value as an economic indicator. For instance, if the government artificially takes steps to inflate housing prices, they might well increase, but they will no longer be good measures of overall economic health.
Historically, for instance, there has been a reasonably strong correlation between GDP growth and job growth. Economists refer to this as Okun’s law. During the Long Boom of 1947 through 1999, the rate of job growth40 had normally been about half the rate of GDP growth, so if GDP increased by 4 percent during a year, the number of jobs would increase by about 2 percent. The relationship still exists—more growth is certainly better for job seekers. But its dynamics seem to have changed. After each of the last couple of recessions, considerably fewer jobs were created than would have been expected during the Long Boom years. In the year after the stimulus package was passed in 2009, for instance, GDP was growing fast enough to create about two million jobs according to Okun’s law.41 Instead, an additional 3.5 million jobs were lost during the period.
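A back-of-the-envelope sketch of the rule of thumb described above (the one-half coefficient is taken from the passage's Long Boom figures, not from any formal statement of Okun's law):

```python
# Long Boom rule of thumb: job growth ran at about half the rate of GDP growth.
def expected_job_growth(gdp_growth_pct, coefficient=0.5):
    return coefficient * gdp_growth_pct

print(expected_job_growth(4.0))  # 2.0 -> 4% GDP growth implied ~2% job growth
```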
Just as there are political pundits who make careers out of making implausible claims to partisan audiences, there are bears, bulls, and contrarians who will always have a constituency in the marketplace for economic ideas. (Sometimes economic forecasts have expressly political purposes too. It turns out that the economic forecasts produced by the White House, for instance, have historically been among the least accurate of all,69 regardless of whether it’s a Democrat or a Republican in charge.)
“The way we think about it is if you take something like initial claims on unemployment insurance, that’s a very good predictor for unemployment rates, which is a good predictor for economic activity,” I was told by Google’s chief economist, Hal Varian, at Google’s headquarters in Mountain View, California. “We can predict unemployment initial claims earlier because if you’re in a company and a rumor goes around that there are going to be layoffs, then people start searching ‘where’s the unemployment office,’ ‘how am I going to apply for unemployment,’ and so on. It’s a slightly leading indicator.”
Robin Hanson, an economist at George Mason University, is an advocate of the supply-side alternative. I met him for lunch at one of his favorite Moroccan places in northern Virginia. He’s in his early fifties but looks much younger (despite being quite bald), and is a bit of an eccentric. He plans to have his head cryogenically frozen when he dies.71 He is also an advocate of a system he calls “futarchy” in which decisions on policy issues are made by prediction markets72 rather than politicians. He is clearly not a man afraid to challenge the conventional wisdom. Hanson writes a blog called Overcoming Bias, in which he presses his readers to consider which cultural taboos, ideological beliefs, or misaligned incentives might constrain them from making optimal decisions.
More broadly, it means recognizing that the amount of confidence someone expresses in a prediction is not a good indication of its accuracy—to the contrary, these qualities are often inversely correlated. Danger lurks, in the economy and elsewhere, when we discourage forecasters from making a full and explicit account of the risks inherent in the world around us.
Perhaps the bigger problem from a statistical standpoint, however, is that precise predictions aren’t really possible to begin with when you are extrapolating on an exponential scale.
One of the most useful quantities for predicting disease spread is a variable called the basic reproduction number. Usually designated as R0, it measures the number of uninfected people that can expect to catch a disease from a single infected individual. An R0 of 4, for instance, means that—in the absence of vaccines or other preventative measures—someone who gets a disease can be expected to pass it along to four other individuals before recovering (or dying) from it. In theory, any disease with an R0 greater than 1 will eventually spread to the entire population in the absence of vaccines or quarantines. But the numbers are sometimes much higher than this: R0 was about 3 for the Spanish flu, 6 for smallpox, and 15 for measles. It is perhaps well into the triple digits for malaria, one of the deadliest diseases in the history of civilization, which still accounts for about 10 percent of all deaths in some parts of the world today.
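A minimal sketch of why an R0 above 1 is the critical threshold (my own illustration, a naive branching process that ignores immunity and interventions, not a model from the book):

```python
# With no immunity or countermeasures, each generation of infections is
# roughly R0 times the size of the one before it.

def expected_cases_by_generation(r0, generations=10):
    """Naive branching-process approximation: cases in generation g ~ r0**g."""
    return [round(r0 ** g) for g in range(generations)]

print(expected_cases_by_generation(0.8))  # dies out when R0 < 1
print(expected_cases_by_generation(1.5))  # slow but relentless growth
print(expected_cases_by_generation(4.0))  # explosive growth, e.g. the R0 of 4
                                          # described in the passage
```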
In many cases involving predictions about human activity, the very act of prediction can alter the way that people behave. Sometimes, as in economics, these changes in behavior can affect the outcome of the prediction itself, either nullifying it or making it more accurate. Predictions about the flu and other infectious diseases are affected by both sides of this problem. A case where a prediction can bring itself about is called a self-fulfilling prediction or a self-fulfilling prophecy. This can happen with the release of a political poll in a race with multiple candidates, such as a presidential primary. Voters in these cases may behave tactically, wanting to back a candidate who could potentially win the state rather than waste their vote, and a well-publicized poll is often the best indication of whether a candidate is fit to do that. In the late stages of the Iowa Republican caucus race in 2012, for example, CNN released a poll that showed Rick Santorum surging to 16 percent of the vote when he had been at about 10 percent before.60 The poll may have been an outlier—other surveys did not show Santorum gaining ground until after the CNN poll had been released.61 Nevertheless, the poll earned Santorum tons of favorable media coverage and some voters switched to him from ideologically similar candidates like Michele Bachmann and Rick Perry. Before long, the poll had fulfilled its own destiny, with Santorum eventually winning Iowa while Bachmann and Perry finished far out of the running.
This self-defeating quality can also be a problem for the accuracy of flu predictions because their goal, in part, is to increase public awareness of the disease and therefore change the public’s behavior. The most effective flu prediction might be one that fails to come to fruition because it motivates people toward more healthful choices.
The epidemiologists I spoke with for this chapter—in a refreshing contrast to their counterparts in some other fields—were strongly aware of the limitations of their models. “It’s stupid to predict based on three data points,” Marc Lipsitch told me, referring to the flu pandemics in 1918, 1957, and 1968. “All you can do is plan for different scenarios.”
If you can’t make a good prediction, it is very often harmful to pretend that you can. I suspect that epidemiologists, and others in the medical community, understand this because of their adherence to the Hippocratic oath. Primum non nocere: First, do no harm. Much of the most thoughtful work on the use and abuse of statistical models and the proper role of prediction comes from people in the medical profession.88 That is not to say there is nothing on the line when an economist makes a prediction, or a seismologist does. But because of medicine’s intimate connection with life and death, doctors tend to be appropriately cautious. In their field, stupid models kill people. It has a sobering effect. There is something more to be said, however, about Chip Macal’s idea of “modeling for insights.” The philosophy of this book is that prediction is as much a means as an end. Prediction serves a very central role in hypothesis testing, for instance, and therefore in all of science.
Bayes’s theorem is concerned with conditional probability. That is, it tells us the probability that a theory or hypothesis is true if some event has happened. Suppose you are living with a partner and come home from a business trip to discover a strange pair of underwear in your dresser drawer. You will probably ask yourself: what is the probability that your partner is cheating on you? The condition is that you have found the underwear; the hypothesis you are interested in evaluating is the probability that you are being cheated on. Bayes’s theorem, believe it or not, can give you an answer to this sort of question—provided that you know (or are willing to estimate) three quantities: First, you need to estimate the probability of the underwear’s appearing as a condition of the hypothesis being true—that is, you are being cheated upon. Let’s assume for the sake of this problem that you are a woman and your partner is a man, and the underwear in question is a pair of panties. If he’s cheating on you, it’s certainly easy enough to imagine how the panties got there. Then again, even (and perhaps especially) if he is cheating on you, you might expect him to be more careful. Let’s say that the probability of the panties’ appearing, conditional on his cheating on you, is 50 percent. Second, you need to estimate the probability of the underwear’s appearing conditional on the hypothesis being false. If he isn’t cheating, are there some innocent explanations for how they got there? Sure, although not all of them are pleasant (they could be his panties). It could be that his luggage got mixed up. It could be that a platonic female friend of his, whom you trust, stayed over one night. The panties could be a gift to you that he forgot to wrap up. None of these theories is inherently untenable, although some verge on dog-ate-my-homework excuses. Collectively you put their probability at 5 percent. Third and most important, you need what Bayesians call a prior probability (or simply a prior). What is the probability you would have assigned to him cheating on you before you found the underwear? Of course, it might be hard to be entirely objective about this now that the panties have made themselves known. (Ideally, you establish your priors before you start to examine the evidence.) But sometimes, it is possible to estimate a number like this empirically. Studies have found, for instance, that about 4 percent of married partners cheat on their spouses in any given year,33 so we’ll set that as our prior. If we’ve estimated these values, Bayes’s theorem can then be applied to establish a posterior probability. This is the number that we’re interested in: how likely is it that we’re being cheated on, given that we’ve found the underwear? The calculation (and the simple algebraic expression that yields it) is in figure 8-3. As it turns out, this probability is still fairly low: 29 percent. This may still seem counterintuitive—aren’t those panties pretty…
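To make the arithmetic concrete, here is the calculation behind the 29 percent figure, using the numbers given in the passage (a sketch in code; the book presents it as figure 8-3, which is not reproduced here):

```python
# Bayes's theorem with the passage's estimates: prior 4%,
# P(underwear | cheating) = 50%, P(underwear | not cheating) = 5%.

def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """P(H | E) = P(E | H) P(H) / [P(E | H) P(H) + P(E | not H) P(not H)]"""
    numerator = p_evidence_if_true * prior
    return numerator / (numerator + p_evidence_if_false * (1 - prior))

print(round(posterior(0.04, 0.50, 0.05), 2))  # 0.29 -> about 29 percent
```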
Usually, however, we focus on the newest or most immediately available information, and the bigger picture gets lost.
This is not to suggest that our priors always dominate the new evidence, however, or that Bayes’s theorem inherently produces counterintuitive results. Sometimes, the new evidence is so powerful that it overwhelms everything else, and we can go from assigning a near-zero probability of something to a near-certainty of it almost instantly. Consider a somber example: the September 11 attacks. Most of us would have assigned almost no probability to terrorists crashing planes into buildings in Manhattan when we woke up that morning. But we recognized that a terror attack was an obvious possibility once the first plane hit the World Trade Center. And we had no doubt we were being attacked once the second tower was hit.
In 2005, Ioannidis published an influential paper, “Why Most Published Research Findings Are False,”40 in which he cited a variety of statistical and theoretical arguments to claim that (as his title implies) the majority of hypotheses deemed to be true in journals in medicine and most other academic and scientific professions are, in fact, false. Ioannidis’s hypothesis, as we mentioned, looks to be one of the true ones; Bayer Laboratories found that they could not replicate about two-thirds of the positive findings claimed in medical journals when they attempted the experiments themselves.41 Another way to check the veracity of a research finding is to see whether it makes accurate predictions in the real world—and as we have seen throughout this book, it very often does not. The failure rate for predictions made in entire fields ranging from seismology to political science appears to be extremely high.
Meanwhile, as we know from Bayes’s theorem, when the underlying incidence of something in a population is low (breast cancer in young women; truth in the sea of data), false positives can dominate the results if we are not careful. Figure 8-6 represents this graphically. In the figure, 80 percent of true scientific hypotheses are correctly deemed to be true, and about 90 percent of false hypotheses are correctly rejected. And yet, because true findings are so rare, about two-thirds of the findings deemed to be true are actually false! Unfortunately, as Ioannidis figured out, the state of published research in most fields that conduct statistical testing is probably very much like what you see in figure 8-6.* Why is the error rate so high? To some extent, this entire book represents an answer to that question. There are many reasons for it—some having to do with our psychological biases, some having to do with common methodological errors, and some having to do with misaligned incentives. Close to the root of the problem, however, is a flawed type of statistical thinking that these researchers are applying.
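A sketch of the arithmetic behind figure 8-6: the 80 percent and 90 percent rates come from the passage, while the share of hypotheses that are actually true is an assumption chosen here for illustration (the book's figure uses its own numbers):

```python
# When true hypotheses are rare, false positives dominate the "significant" results.

def false_discovery_rate(p_true, sensitivity=0.80, specificity=0.90):
    """Fraction of findings deemed true that are in fact false."""
    true_positives = p_true * sensitivity
    false_positives = (1 - p_true) * (1 - specificity)
    return false_positives / (true_positives + false_positives)

for p_true in (0.30, 0.10, 0.06):
    print(f"{p_true:.0%} of hypotheses true -> "
          f"{false_discovery_rate(p_true):.0%} of positive findings are false")
# Once only a few percent of tested hypotheses are true, roughly two-thirds of
# the findings deemed true are actually false, as the passage describes.
```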
Fisher and his contemporaries had no problem with the formula called Bayes’s theorem per se, which is just a simple mathematical identity. Instead, they were worried about how it might be applied. In particular, they took issue with the notion of the Bayesian prior.46 It all seemed too subjective: we have to stipulate, in advance, how likely we think something is before embarking on an experiment about it? Doesn’t that cut against the notion of objective science? So Fisher and his contemporaries instead sought to develop a set of statistical methods that they hoped would free us from any possible contamination from bias. This brand of statistics is usually called “frequentism” today, although the term “Fisherian” (as opposed to Bayesian) is sometimes applied to it.47 The idea behind frequentism is that uncertainty in a statistical problem results exclusively from collecting data among just a sample of the population rather than the whole population. This makes the most sense in the context of something like a political poll. A survey in California might sample eight hundred people rather than the eight million that will turn out to vote in an upcoming election there, producing what’s known as sampling error. The margin of error that you see reported alongside political polls is a measure of this: exactly how much error is introduced because you survey eight hundred people in a population of eight million? The frequentist methods are designed to quantify this.
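The margin of error mentioned here is a standard sampling-error calculation; a quick sketch (the formula is the usual frequentist one for a proportion, not anything specific to the book):

```python
from math import sqrt

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated from n people."""
    return z * sqrt(p * (1 - p) / n)

print(f"{margin_of_error(800):.1%}")  # ~3.5% for a sample of 800 voters
```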
The bigger problem, however, is that the frequentist methods—in striving for immaculate statistical procedures that can’t be contaminated by the researcher’s bias—keep him hermetically sealed off from the real world. These methods discourage the researcher from considering the underlying context or plausibility of his hypothesis, something that the Bayesian method demands in the form of a prior probability. Thus, you will see apparently serious papers published on how toads can predict earthquakes,50 or how big-box stores like Target beget racial hate groups,51 which apply frequentist tests to produce “statistically significant” (but manifestly ridiculous) findings.
But perhaps the bigger problem is the way that Fisher’s statistical philosophy tends to conceive of the world. It emphasizes the objective purity of the experiment—every hypothesis could be tested to a perfect conclusion if only enough data were collected. However, in order to achieve that purity, it denies the need for Bayesian priors or any other sort of messy real-world context. These methods neither require nor encourage us to think about the plausibility of our hypothesis: the idea that cigarettes cause lung cancer competes on a level playing field with the idea that toads predict earthquakes. It is, I suppose, to Fisher’s credit that he recognized that correlation does not always imply causation. However, the Fisherian statistical methods do not encourage us to think about which correlations imply causations and which ones do not. It is perhaps no surprise that after a lifetime of thinking this way, Fisher lost the ability to tell the difference.
Recently, however, some well-respected statisticians have begun to argue that frequentist statistics should no longer be taught to undergraduates.68 And some professions have considered banning Fisher’s hypothesis test from their journals.69 In fact, if you read what’s been written in the past ten years, it’s hard to find anything that doesn’t advocate a Bayesian approach.
In most cases, we cannot test our ideas as quickly as Google, which gets feedback more or less instantaneously from hundreds of millions of users around the world. Nor do we have access to a supercomputer, as Deep Blue’s engineers did. Progress will occur at a much slower rate. Nevertheless, a commitment to testing ourselves—actually seeing how well our predictions work in the real world rather than in the comfort of a statistical model—is probably the best way to accelerate the learning process.
Poker is sometimes perceived to be a highly psychological game, a battle of wills in which opponents seek to make perfect reads on one another by staring into one another’s souls, looking for “tells” that reliably betray the contents of the other hands. There is a little bit of this in poker, especially at the higher limits, but not nearly as much as you’d think. (The psychological factors in poker come mostly in the form of self-discipline.) Instead, poker is an incredibly mathematical game that depends on making probabilistic judgments amid uncertainty, the same skills that are important in any type of prediction.
Only the very worst ones will have failed to commit the most basic odds calculations to memory: that a flush has about a 1-in-3 chance of coming in with two cards to come, or that a pair of aces will beat a pair of kings about 80 percent of the time. The core analytic skill, rather, is what players call “hand reading”: figuring out which cards your opponent might hold, and how they might affect her decisions throughout the rest of the hand.
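The flush figure is easy to verify with a little combinatorics (my own check, framed in hold 'em terms: nine cards complete a four-card flush, with 47 cards unseen after the flop and two cards to come):

```python
from math import comb

outs, unseen = 9, 47
p_miss_both = comb(unseen - outs, 2) / comb(unseen, 2)  # neither card completes it
p_flush = 1 - p_miss_both
print(round(p_flush, 3))  # ~0.35, i.e. roughly the 1-in-3 chance cited above
```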
In a televised game in 2009, two world-class players, Tom Dwan and Phil Ivey, played a hand in which the pot size eventually reached more than a million dollars.7 In the hand, Ivey caught a miracle card on the turn to make him a 5-high straight. Unfortunately, the same card also gave Dwan a 7-high straight,* the only possible better hand. “If anybody can get away from this, it’s Phil Ivey,” one of the announcers said, implying that it would be a sign of superior poker talent if he folded. In fact, throwing away the hand would have been a terrible play. Given what Ivey knew at the time, and how aggressively he and Dwan play, he should have expected to have the best hand at least 90 percent of the time. If Ivey hadn’t lost all his chips on the hand, he would have been playing badly. While television coverage has been a great boon to poker, it leaves many casual players with misleading impressions about the right way to play it, focusing too much on the results and not enough on the correct decision-making process. “It’s not very common that you can narrow someone’s holdings down to one hand,” Dwan told me. “Definitely much less common than most pros and TV shows would have you believe.”
Yet for all his apparent bravado—Dwan is fairly low-key in person12—his approach to thinking about poker and the world in general is highly probabilistic. He profits because his opponents are too sure of themselves. “It’s important in most areas of life to come up with a probability instead of a yes or no,” he told me. “It’s a huge flaw that people make in a lot of areas that they analyze, whether they’re trying to form a fiscal union, pay for groceries, or hoping that they don’t get fired.”
Dwan seeks to exploit these tendencies by deliberately obfuscating his play. If the most important technical skill in poker is learning how to forecast your opponent’s hand range, the next-most-important one is making your own play unpredictable. “The better people are, the less certain you’re going to be about what they have or what they’re doing or what their range is,” Dwan says. “And they’ll be more apt to manipulate that to take advantage of your judgments.”
When you raise before the flop, for instance, the opponent will typically put you on big cards like those containing aces, kings, and queens. You will have those hands sometimes, of course. But I would also raise with hands like the ones we were worried about the Lawyer having, hands with small cards like 7 6. What I found is that when big cards came on the board, like an ace or king, the opponent would often give me credit for catching those cards and fold. If smaller cards came instead, meanwhile, I’d often have made a pair or some kind of good draw. Sometimes, I’d even make an unlikely-seeming hand like a straight with these cards, which could send my opponents into tilt. One interesting thing about poker is that the very best players and the very worst ones both play quite randomly, although for different reasons.* Thus, you can sometimes fool opponents into thinking you are a weak player even if you are likely to take their money.
In fact, bluffing and aggressive play is not just a luxury in poker but a necessity—otherwise your play is just too predictable. Poker games have become extremely aggressive since I stopped playing regularly five years ago, and game theory13 as well as computer simulations14 strongly suggest this is the optimal approach. Blitzing your opponent with a deluge of possibilities is the best way to complicate his probability calculations.
The name for the curve comes from the well-known business maxim called the Pareto principle or 80-20 rule (as in: 80 percent of your profits come from 20 percent of your customers16). As I apply it here, it posits that getting a few basic things right can go a long way. In poker, for instance, simply learning to fold your worst hands, bet your best ones, and make some effort to consider what your opponent holds will substantially mitigate your losses. If you are willing to do this, then perhaps 80 percent of the time you will be making the same decision as one of the best poker players like Dwan—even if you have spent only 20 percent as much time studying the game.
If you have strong analytical skills that might be applicable in a number of disciplines, it is very much worth considering the strength of the competition. It is often possible to make a profit by being pretty good at prediction in fields where the competition succumbs to poor incentives, bad habits, or blind adherence to tradition—or because you have better data or technology than they do. It is much harder to be very good in fields where everyone else is getting the basics right—and you may be fooling yourself if you think you have much of an edge.
The Bayesian method described in the book The Mathematics of Poker, for instance, would suggest that a player who had made $30,000 in his first 10,000 hands at a $100/$200 limit hold ’em game was nevertheless more likely than not to be a long-term loser.
There is strong empirical and theoretical evidence that there is a benefit in aggregating different forecasts. Across a number of disciplines, from macroeconomic forecasting to political polling, simply taking an average of everyone’s forecast rather than relying on just one has been found to reduce forecast error,14 often by about 15 or 20 percent. But before you start averaging everything together, you should understand three things. First, while the aggregate forecast will essentially always be better than the typical individual’s forecast, that doesn’t necessarily mean it will be good. For instance, aggregate macroeconomic forecasts are much too crude to predict recessions more than a few months in advance. They are somewhat better than individual economists’ forecasts, however. Second, the most robust evidence indicates that this wisdom-of-crowds principle holds when forecasts are made independently before being averaged together. In a true betting market (including the stock market), people can and do react to one another’s behavior. Under these conditions, where the crowd begins to behave more dynamically, group behavior becomes more complex. Third, although the aggregate forecast is better than the typical individual’s forecast, it does not necessarily hold that it is better than the best individual’s forecast. Perhaps there is some polling firm, for instance, whose surveys are so accurate that it is better to use their polls and their polls alone rather than dilute them with numbers from their less-accurate peers.
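A minimal simulation of the first point, under idealized assumptions of my own (independent, equally skilled forecasters with Gaussian errors; real forecasters are correlated, which is why the real-world gain is closer to the 15 or 20 percent cited above):

```python
import random
random.seed(0)

TRUTH = 2.0           # the quantity being forecast, arbitrary units
N_FORECASTERS = 10
N_TRIALS = 10_000

individual_err = aggregate_err = 0.0
for _ in range(N_TRIALS):
    forecasts = [TRUTH + random.gauss(0, 1.0) for _ in range(N_FORECASTERS)]
    individual_err += abs(forecasts[0] - TRUTH)                   # one forecaster
    aggregate_err += abs(sum(forecasts) / N_FORECASTERS - TRUTH)  # the average

print(f"typical individual error:   {individual_err / N_TRIALS:.2f}")
print(f"average-of-forecasts error: {aggregate_err / N_TRIALS:.2f}")
# With fully independent errors the average is far more accurate; correlated,
# herd-following forecasters shrink that advantage considerably.
```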
“If you talk to a lot of investment managers,” Blodget told me, “the practical reality is they’re thinking about the next week, possibly the next month or quarter. There isn’t a time horizon; it’s how you’re doing now, relative to your competitors. You really only have ninety days to be right, and if you’re wrong within ninety days, your clients begin to fire you. You get shamed in the media, and your performance goes to hell. Fundamentals do not help you with that.”
It may be no coincidence that many of the successful investors profiled in Michael Lewis’s The Big Short, who made money betting against mortgage-backed securities and other bubbly investments of the late 2000s, were social misfits to one degree or another.
These statistics represent a potential complication for the efficient-market hypothesis: when it’s not your own money on the line but someone else’s, your incentives may change. Under some circumstances, in fact, it may be quite rational for traders to take positions that lose money for their firms and their investors if it allows them to stay with the herd and reduces their chance of getting fired.70 There is significant theoretical and empirical evidence71 for herding behavior among mutual funds and other institutional investors.72 “The answer as to why bubbles form,” Blodget told me, “is that it’s in everybody’s interest to keep markets going up.”
The third claim—that water vapor will also increase along with gases like CO2, thereby enhancing the greenhouse effect—is modestly bolder. Water vapor, not CO2, is the largest contributor to the greenhouse effect.21 If there were an increase in CO2 alone, there would still be some warming, but not as much as has been observed to date or as much as scientists predict going forward. But a basic thermodynamic principle known as the Clausius–Clapeyron relation, which was proposed and proved in the nineteenth century, holds that the atmosphere can retain more water vapor at warmer temperatures. Thus, as CO2 and other long-lived greenhouse gases increase in concentration and warm the atmosphere, the amount of water vapor will increase as well, multiplying the effects of CO2 and enhancing warming.
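A hedged sketch of the Clausius-Clapeyron point, using a Magnus-type approximation for saturation vapor pressure (the constants are a common textbook version and vary slightly between sources; this is my illustration, not a formula from the book):

```python
from math import exp

def saturation_vapor_pressure_hpa(temp_c):
    """Approximate saturation vapor pressure over water, in hectopascals."""
    return 6.112 * exp(17.67 * temp_c / (temp_c + 243.5))

e_15 = saturation_vapor_pressure_hpa(15.0)
e_16 = saturation_vapor_pressure_hpa(16.0)
print(f"{e_16 / e_15 - 1:.1%}")  # ~6-7% more water vapor capacity per extra degree C
```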
First, Armstrong and Green contend that agreement among forecasters is not related to accuracy—and may reflect bias as much as anything else. “You don’t vote,” Armstrong told me. “That’s not the way science progresses.” Next, they say the complexity of the global warming problem makes forecasting a fool’s errand. “There’s been no case in history where we’ve had a complex thing with lots of variables and lots of uncertainty, where people have been able to make econometric models or any complex models work,” Armstrong told me. “The more complex you make the model the worse the forecast gets.” Finally, Armstrong and Green write that the forecasts do not adequately account for the uncertainty intrinsic to the global warming problem. In other words, they are potentially overconfident.
The improvements in weather forecasts are a result of two features of their discipline. First, meteorologists get a lot of feedback—weather predictions play out daily, a reality check that helps keep them well-calibrated. This advantage is not available to climate forecasters and is one of the best reasons to be skeptical about their predictions, since they are made at scales that stretch out to as many as eighty or one hundred years in advance. Meteorologists also benefit, however, from a strong understanding of the physics of the weather system, which is governed by relatively simple and easily observable laws. Climate forecasters potentially have the same advantage. We can observe clouds and we have a pretty good idea of how they behave; the challenge is more in translating that into mathematical terms.
There are also periodic fluctuations that take hold at periods of a year to a decade at a time. One is dictated by what is called the ENSO cycle (the El Niño–Southern Oscillation). This cycle, which evolves over intervals of about three years at a time,57 is instigated by temperature shifts in the waters of the tropical Pacific. El Niño years, when the cycle is in full force, produce warmer weather in much of the Northern Hemisphere, and probably reduce hurricane activity in the Gulf of Mexico.58 La Niña years, when the Pacific is cool, do just the opposite. Beyond that, relatively little is understood about the ENSO cycle. Another such medium-term process is the solar cycle. The sun gives off slightly more and slightly less radiation over cycles that last for about eleven years on average. (This is often measured through sunspots, the presence of which correlates with higher levels of solar activity.) But these cycles are somewhat irregular: Solar Cycle 24, for instance, which was expected to produce a maximum of solar activity (and therefore warmer temperatures) in 2012 or 2013, turned out to be somewhat delayed.59 Occasionally, in fact, the sun can remain dormant for decades at a time; the Maunder Minimum, a period of about seventy years during the late seventeenth and early eighteenth centuries when there was very little sunspot activity, may have triggered cooler temperatures in Europe and North America.60 Finally, there are periodic interruptions from volcanoes, which blast sulfur—a gas that has an anti-greenhouse effect and tends to cool the planet—into the atmosphere. The eruption of Mount Pinatubo in 1991 reduced global temperatures by about 0.2°C for a period of two years, equivalent to a decade’s worth of greenhouse warming.
Uncertainty in forecasts is not necessarily a reason not to act—the Yale economist William Nordhaus has argued instead that it is precisely the uncertainty in climate forecasts that compels action,86 since the high-warming scenarios could be quite bad. Meanwhile, our government spends hundreds of billions toward economic stimulus programs, or initiates wars in the Middle East, under the pretense of what are probably far more speculative forecasts than are pertinent in climate science.
Schmidt received numerous calls from reporters asking him what October blizzards in New York implied about global warming. He told them he wasn’t sure; the models didn’t go into that kind of detail. But some of his colleagues were less cautious, and the more dramatic their claims, the more likely they were to be quoted in the newspaper.
The question of sulfur emissions, the basis for those global cooling forecasts in the 1970s, may help to explain why the IPCC’s 1990 forecast went awry and why the panel substantially lowered their range of temperature predictions in 1995. The Mount Pinatubo eruption in 1991 burped sulfur into the atmosphere, and its effects were consistent with climate models.90 But it nevertheless underscored that the interactions between different greenhouse gases can be challenging to model and can introduce error into the system. Sulfur emissions from manmade sources peaked in the early 1970s before declining91 (figure 12-8), partly because of policy like the Clean Air Act signed into law by President Nixon in 1970 to combat acid rain and air pollution. Some of the warming trend during the 1980s and 1990s probably reflected this decrease in sulfur, since SO2 emissions counteract the greenhouse effect. Since about 2000, however, sulfur emissions have increased again, largely as the result of increased industrial activity in China,92 which has little environmental regulation and a lot of dirty coal-fired power plants. Although the negative contribution of sulfur emissions on global warming is not as strong as the positive contribution from carbon—otherwise those global cooling theories might have proved to be true!—this may have provided for something of a brake on warming.
Is there a plausible hypothesis that explains why 2007 was warmer than 1987 or 1947 or 1907—other than through changes in atmospheric composition? One of the most tangible contributions of climate models, in fact, is that they find it impossible to replicate the current climate unless they account for the increased atmospheric concentration of CO2 and other greenhouse gases.
This book advises you to be wary of forecasters who say that the science is not very important to their jobs, or scientists who say that forecasting is not very important to their jobs! These activities are essentially and intimately related. A forecaster who says he doesn’t care about the science is like the cook who says he doesn’t care about food. What distinguishes science, and what makes a forecast scientific, is that it is concerned with the objective world. What makes forecasts fail is when our concern only extends as far as the method, maxim, or model.
Nevertheless, this book encourages readers to think carefully about the signal and the noise and to seek out forecasts that couch their predictions in percentage or probabilistic terms. They are a more honest representation of the limits of our predictive abilities. When a prediction about a complex phenomenon is expressed with a great deal of confidence, it may be a sign that the forecaster has not thought through the problem carefully, has overfit his statistical model, or is more interested in making a name for himself than in getting at the truth.
If you had come to a proper estimate of the uncertainty in near-term temperature patterns, the downward revision would not be terribly steep. As we found, there is about a 15 percent chance that there will be no net warming over a decade even if the global warming hypothesis is true, because of the variability in the climate. Conversely, if temperature changes are purely random and unpredictable, the chance of a cooling decade would be 50 percent since an increase and a decrease in temperatures are equally likely. Under Bayes’s theorem (figure 12-12), a no-net-warming decade would cause you to revise downward your estimate of the global warming hypothesis’s likelihood to 85 percent from 95 percent. On the other hand, if you had asserted that there was just a 1 percent chance that temperatures would fail to increase over the decade, your theory is now in much worse shape because you are claiming that this was a more definitive test. Under Bayes’s theorem, the probability you would attach to the global warming hypothesis has now dropped to just 28 percent. When we advance more confident claims and they fail to come to fruition, this constitutes much more powerful evidence against our hypothesis. We can’t really blame anyone for losing faith in our forecasts when this occurs; they are making the correct inference under Bayesian logic.
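The two revisions quoted here fall straight out of Bayes's theorem; a sketch of the calculation with the passage's numbers (the book presents it as figure 12-12, not reproduced here):

```python
# Prior: 95% confidence in the global warming hypothesis. A no-net-warming decade
# is expected 15% of the time if the hypothesis is true (or 1% of the time, for
# the overconfident forecaster) versus 50% of the time if temperatures move at random.

def posterior(prior, p_obs_if_true, p_obs_if_false):
    num = p_obs_if_true * prior
    return num / (num + p_obs_if_false * (1 - prior))

print(round(posterior(0.95, 0.15, 0.50), 2))  # 0.85: a modest downward revision
print(round(posterior(0.95, 0.01, 0.50), 2))  # 0.28: the overconfident claim is
                                              # hit much harder by the same decade
```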
I do not mean to suggest that the territory occupied by the two sides is symmetrical. In the scientific argument over global warming, the truth seems to be mostly on one side: the greenhouse effect almost certainly exists and will be exacerbated by manmade CO2 emissions. This is very likely to make the planet warmer. The impacts of this are uncertain, but are weighted toward unfavorable outcomes.
In politics, one is expected to give no quarter to his opponents. It is seen as a gaffe when one says something inconvenient—and true.113 Partisans are expected to show equal conviction about a set of beliefs on a range of economic, social, and foreign policy issues that have little intrinsic relation to one another. As far as approximations of the world go, the platforms of the Democratic and Republican parties are about as crude as it gets.
Imagine that you live in a seismically active area like California. Over a period of a couple of decades, you experience magnitude 4 earthquakes on a regular basis, magnitude 5 earthquakes perhaps a few times a year, and a handful of magnitude 6s. If you have a house that can withstand a magnitude 6 earthquake but not a magnitude 7, would it be right to conclude that you have nothing to worry about? Of course not. According to the power-law distribution that these earthquakes obey, those magnitude 5s and magnitude 6s would have been a sign that larger earthquakes were possible—inevitable, in fact, given enough time. The big one is coming, eventually. You ought to have been prepared. Terror attacks behave in something of the same way. The Lockerbie bombing and Oklahoma City were the equivalent of magnitude 7 earthquakes. While destructive enough on their own, they also implied the potential for something much worse—something like the September 11 attacks, which might be thought of as a magnitude 8. It was not an outlier but instead part of the broader mathematical pattern.
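A rough sketch of the extrapolation the passage describes (the ten-fold rarity per magnitude step is the standard Gutenberg-Richter rule of thumb, and the event counts are illustrative assumptions, not data from the book):

```python
def expected_per_year(base_magnitude, base_rate_per_year, target_magnitude,
                      rarity_factor=10.0):
    """Power-law scaling: each magnitude step is ~rarity_factor times rarer."""
    steps = target_magnitude - base_magnitude
    return base_rate_per_year / (rarity_factor ** steps)

rate_m7 = expected_per_year(5, 3.0, 7)  # assume ~3 magnitude-5 events per year
print(rate_m7)                          # 0.03 per year...
print(round(1 / rate_m7))               # ...about one every 33 years: rare,
                                        # but inevitable given enough time
```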
Bayes’s theorem requires us to state—explicitly—how likely we believe an event is to occur before we begin to weigh the evidence. It calls this estimate a prior belief. Where should our prior beliefs come from? Ideally, we would like to build on our past experience or even better the collective experience of society. This is one of the helpful roles that markets can play. Markets are certainly not perfect, but the vast majority of the time, collective judgment will be better than ours alone. Markets form a good starting point to weigh new evidence against, particularly if you have not invested much time in studying a problem. Of course, markets are not available in every case. It will often be necessary to pick something else as a default. Even common sense can serve as a Bayesian prior, a check against taking the output of a statistical model too credulously. (These models are approximations and often rather crude ones, even if they seem to promise mathematical precision.) Information becomes knowledge only when it’s placed in context. Without it, we have no way to differentiate the signal from the noise, and our search for the truth might be swamped by false positives. What isn’t acceptable under Bayes’s theorem is to pretend that you don’t have any prior beliefs. You should work to reduce your biases, but to say you have none is a sign that you have many. To state your beliefs up front—to say “Here’s where I’m coming from”12—is a way to operate in good faith and to recognize that you perceive reality through a subjective filter.