Update (May 2): We’re finally seeing more fair-minded commentators taking a balanced view of Reinhart’s and Rogoff’s research. After investigating data discrepancies for New Zealand (pointed out by reader Margaret below), Bloomberg’s Matthew C. Klein called out the three University of Massachusetts Amherst authors for not recognizing that the so-called data “omissions” were suspect data points as explained by Reinhart and Rogoff all along (h/t Marginal Revolution). ECONJEFF would have liked to have seen more “professional courtesy” and called for economists to establish “norms of good behavior” for replications/critiques. Also, John Mauldin applied his usual clear thinking to the topic in his weekly “Outside the Box” article. And blogger George Washington looks at the bigger picture in the austerity debate, noting Reinhart’s and Rogoff’s advocacy of debt write-downs in Europe (which would have meant less austerity in countries like Ireland and Portugal) and stressing that until governments stop saving “the big banks instead of the little guy” and other basic problems are fixed, “neither stimulus nor austerity can ever work” (h/t Angry Bear).
For those surfers who haven’t already lost interest in the spirited debate over a 2% calculation discrepancy in an historical average, please read on.
I’ll share some thoughts on the academic paper that was released last week by three University of Massachusetts Amherst scholars and the Internet frenzy that followed. My goal is to clear up a handful of fallacies that have taken hold in the media and blogosphere, while also throwing in my editorial comments and “scores” on each of the parties involved.
The three Massachusetts authors – Thomas Herndon, Michael Ash and Robert Pollin, now known on the Internet as “HAP” – argued that a heavily cited 2010 paper by Carmen Reinhart and Kenneth Rogoff (RR) contained fatal errors. In a critique that was released to the public last Tuesday, they revealed a calculation error in one of RR’s spreadsheets. And they also argued that RR omitted data points without justification and used an unconventional weighting method in their statistical averages. In response, RR acknowledged the calculation error but defended their data set and weighting methods.
This is more than an obscure academic debate only because RR’s conclusions are well-known in both academic and political circles. In their 2010 paper and a second paper released last year (with Vincent Reinhart), they suggested that economic growth tends to slow after government debt rises above 90% of GDP.
They’ve backed their thesis with calculations covering a few different country groupings and time periods, and also by summarizing the work of a handful of other researchers who’ve reached similar conclusions. It’s also clear that they truly believe that excessive government debt leads to lower growth – on a conceptual basis – as do many other people.
It’s always politics. Never personal.
But their beliefs don’t sit well with HAP, who argue that RR have too much influence over public policy decisions in both the U.S. and Europe. HAP suggest that policy could be less austere in both regions. They don’t like to hear politicians repeat RR’s warnings about the dangers of high debt, and they hope to discredit RR to bring an end to their perceived role in current policies.
HAP’s paper concludes with the statement: “RR’s findings have served as an intellectual bulwark in support of austerity politics. The fact that RR’s findings are wrong should therefore lead us to reassess the austerity agenda itself in both Europe and the United States.”
But HAP haven’t challenged the full breadth of RR’s thinking and research on fiscal policy matters, at least in the critique they issued last week. Instead, they focus on just one of the calculations in one of RR’s papers – arithmetic average returns for 1946 through 2009 in the first paper.
Here are the competing views on these average returns:
From this simple chart, pundits and bloggers launched a days-long game of “whisper down the lane.” We’ve been fed a succession of incomplete, exaggerated, misleading and erroneous reports, as I’ll explain in these six observations:
- For all the public focus on RR’s spreadsheet calculation error, it didn’t have a meaningful effect on their results. As reported by HAP in their paper (see page 7), it changed the arithmetic average in the >90% bucket on the right-hand side of the chart by 0.3 percentage points. That’s pocket change. But the error’s insignificance was emphasized in only two of the many accounts I’ve read (kudos to Justin Fox of the Harvard Business Review and Brad Plumer of the Washington Post, with apologies to those I didn’t read). In a couple of the very earliest reports on the paper (see here and here), the authors eventually backtracked by adding a mix of clarifications, corrections and updates to their original posts, presumably after recognizing that they had overstated the error’s significance. But they both left their prose written in a way that continued to emphasize it. And their later clarifications didn’t stop other commentators from reporting that the return differences shown in the chart are explained entirely by the error, which is simply untrue. Nor did they prevent sensationalistic titles such as these: “How an Excel error fueled panic over the federal debt” (LA Times), “FAQ: Reinhart, Rogoff and the Excel Error That Changed History” (BusinessWeek) and “Math in a Time of Excel: Economists’ Error Undermines Influential Paper” (DailyFinance).
- Much of the reporting extended beyond the 2010 paper, leading readers to believe that HAP’s critique invalidates RR’s other work, including their 2009 bestseller, This Time is Different. An LA Times report, which was also picked up by RealClearPolitics, even claimed that RR “popularized” the 90% threshold in their book. In fact, the book did no such thing, nor did RR publish any similar results before 2010.
- The dispute centers on the slope and significance of the line in the chart above, and particularly the last segment leading to the 90% bucket, not whether it’s rising or falling. But that didn’t stop pundits from writing their accounts in ways that suggested disagreement about the line’s direction. Moreover, RR pointed out that they placed more emphasis on medians than averages in their discussions (which is entirely consistent with a reread of their papers), and the medians escaped HAP’s critique without comment (more on this below). The fact that HAP’s average return calculations yield similar results to RR’s medians has received almost no attention in the public discussion.
- RR never presented 90% as a magic number – where 89.9 is a clear, sunny day and 90.1 a Category 5 hurricane – nor did they neglect to recognize that correlation is not causation. I’ve cited the paper on several occasions and never saw it in the way that their critics claim it was presented. Marginal Revolution – probably the most heavily trafficked economics blog – recently republished a 2010 post that likewise didn’t consider 90% to be either “sacred” or “stable.” I doubt that any of those who read RR carefully saw 90% as more than the upper limit on one of their buckets and a reasonable marker to use in their conclusions.
- The austerity push in Europe wasn’t triggered in any way, shape or form by RR’s research. It’s based on northern Europe’s struggle to limit the potential damage to their own economies from fiscal crises in the peripheral countries, as explained by OpenEurope here.
- Similarly, RR aren’t the puppet masters controlling Republican budget strategies in the U.S., notwithstanding Paul Ryan’s reference to their research, which was discussed by HAP in their paper and repeated many times in articles last week. I’m not aware of any public comments from Ryan on the matter, but it seems unlikely that we’ll wake up tomorrow and read all about his conversion to the “debt doesn’t matter” school based on HAP’s critique.
And now for the scorecard that I promised. I’ll start with RR:
-0.5 for an Excel error that should have been caught before publication. But this is a minor issue, as I pointed out above. I’ve reread the paper to check the effect, and the error didn’t change a single word. We all make mistakes, and this one wasn’t even a factor. It’s like the stumble that costs a distance runner a fraction of a second but doesn’t change his position in the race. I’ll say it again: It didn’t change a single word.
Dead-even on the debate over the weighting method. RR have a clear and logical defense for their approach, while HAP offered a reasonable criticism. This happens all the time in academia. People think and act differently, and they also approach research differently.
Dead-even on the data omissions raised by HAP. I have no reason to doubt RR’s defense that their data set wasn’t complete when they wrote the paper. I’ve used their data on several occasions and seen it evolve over time, with significant additions to their government defaults in 2011, for example. And it’s not easy to build such a large data set that you can use with confidence, let alone share with your peers as RR have done graciously. One of the disputed countries (New Zealand), in particular, didn’t surprise me in the least, since earlier this year I collected significantly different debt and growth figures from different sources. Unless you choose to be careless, these types of discrepancies take time to sort out.
-1 for the interaction effects of their various methods. Based on the mix of methods that RR chose, HAP pointed out that their average return for the >90% bucket assigned a 14% weight to a single year’s growth in New Zealand. The year happened to be 1951, when New Zealand’s economy contracted by 7.6%. This seems like too much weight for such an extreme result, and it would have been helpful to report its effect on the results. But it’s not the intellectual travesty that many have made it out to be. Empirical work is always vulnerable to outlying data. The important thing is not to make your methods perfect, which is impossible, but to recognize their limitations.
+10 for their contribution to their field. Yes, I’m biased in that I believe they’ve built the world’s most comprehensive history of the types of risks that are most threatening to us today. Their data set and book are tremendous accomplishments. And remember, they operate in the field of macroeconomics. If you were to review all of the published papers in this field for the last, say, 100 years, and weigh them up against real life events, the vast majority could be shown to have major shortcomings. Many have done real damage, leading policymakers to adopt views that are hopelessly disconnected from reality. It’s no exaggeration to say that the foundations of conventional macroeconomic theory have been discredited repeatedly in the last century. And it’s the papers that rely on unrealistic, abstract theories that should concern us most, not a 2% disagreement in an historical average. If you’re interested in an example of the type of paper that deserves to be stamped out, see this post from earlier this year. By comparison, HAP vs. RR is pretty ho-hum to me.
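On the weighting point in the -1 score above, here is a rough Python sketch of how a single year can end up carrying about 14% of an average. It compares averaging each country first and then averaging the countries against pooling every country-year together. Only the -7.6% figure comes from the discussion above. The seven-country count is an assumption chosen because 1/7 is roughly 14%, and the other country names and growth numbers are invented for illustration. This is not a reconstruction of RR’s or HAP’s actual spreadsheets.

```python
from statistics import mean

# Hypothetical growth rates (percent) for years spent in the >90% debt bucket.
# Only New Zealand's -7.6 comes from the text; the other six countries and
# their values are made up, and the count of seven is an assumption.
growth_by_country = {
    "Country A": [2.0] * 10,
    "Country B": [2.0] * 10,
    "Country C": [2.0] * 10,
    "Country D": [2.0] * 10,
    "Country E": [2.0] * 10,
    "Country F": [2.0] * 10,
    "New Zealand": [-7.6],  # a single year in the bucket
}


def country_weighted_mean(data):
    """Average each country's years first, then average across countries."""
    return mean(mean(years) for years in data.values())


def pooled_mean(data):
    """Average every country-year observation with equal weight."""
    return mean(g for years in data.values() for g in years)


def single_year_weight(data, country):
    """Weight that one year of `country` receives in the country-weighted mean."""
    return 1 / (len(data) * len(data[country]))


nz_weight = single_year_weight(growth_by_country, "New Zealand")
by_country = country_weighted_mean(growth_by_country)
by_year = pooled_mean(growth_by_country)

print(f"Weight on New Zealand's single year: {nz_weight:.1%}")
print(f"Country-weighted average: {by_country:.2f}")
print(f"Pooled country-year average: {by_year:.2f}")
```

With these made-up inputs, the one bad year pulls the country-weighted figure down far more than the pooled figure. Neither approach is obviously right: pooling lets countries with long high-debt spells dominate, while country weighting lets a short spell count as much as a long one.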
Here are my scores for HAP:
+2 for delivering a thoroughly researched critique on one aspect of RR’s paper.
-1 for the way it was done. I don’t know this for sure, but it seems as though RR gave HAP their data and spreadsheets and yet didn’t even receive an advance copy of the critique. Their initial response suggests they saw it for the first time last Tuesday afternoon, well after blogger Rortybomb had already read the critique, interviewed the authors, examined their spreadsheets and written his article. Because of this apparent ambush, many people formed their opinions without seeing both sides of the story.
-1 for failing to even acknowledge the majority of RR’s results on the empirical relationship between growth and debt. They had no comment whatsoever on the very first result cited in RR’s 2010 paper – the finding that the median return is about 1% lower when debt rises above 90% of GDP. And they also failed to comment on the first result cited in RR’s 2012 paper, which also referenced a return difference of about a percent. Instead, they focused exclusively on arithmetic averages over a single time period and calculated a revised return difference of, well, about 1%. In other words, we’re being asked to cross out RR’s 1% and replace it with the more “accurate” 1% that’s been reported by HAP (see chart below). Can someone please remind me what we’re arguing about?
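For readers wondering why medians and averages can tell different stories, here is a small Python sketch. It shows that one extreme observation moves an arithmetic average much more than it moves a median. Apart from the -7.6% contraction mentioned earlier, the growth figures are invented, so treat this as an illustration of the statistics and not as a replication of either paper.

```python
from statistics import mean, median

# Hypothetical annual growth rates (percent); only -7.6 appears in the text.
typical_years = [1.8, 2.1, 2.4, 1.5, 2.9, 2.2]
with_outlier = typical_years + [-7.6]

# How far each statistic moves when the single extreme year is added.
mean_shift = mean(typical_years) - mean(with_outlier)
median_shift = median(typical_years) - median(with_outlier)

print(f"Mean:   {mean(typical_years):.2f} -> {mean(with_outlier):.2f}")
print(f"Median: {median(typical_years):.2f} -> {median(with_outlier):.2f}")
print(f"Shift in mean: {mean_shift:.2f}, shift in median: {median_shift:.2f}")
```

In this toy sample the average drops by more than a percentage point while the median barely moves, which is one reason a median can be the steadier summary when a data set contains a few extreme years.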
Overall, HAP certainly offered some analysis for people to consider, while pointing out weaknesses in the 2010 paper, as should be expected of a critique. But they’ve just as certainly failed to disprove RR’s thesis that high debt tends to be associated with lower growth. (Note that I’ve not claimed causation, and nor did RR in their papers.)
And lastly, let’s assign a score to the pundits.
In the meantime, pundits with a predisposition toward loose fiscal policy have launched a character assassination of remarkable force. One only needs to read a few of the more critical essays and comment threads to see RR subjected to a treatment normally reserved for crooks and felons.
And the most amazing thing about the past week may be how many people became instant experts on exactly how RR described their research to policymakers all over the world. I must have been the only one who missed the nightly Reinhart and Rogoff Hour on national TV. Since I follow their work and use their data more than most, it seems odd that I was the guy left out of the loop.
Which brings me to the scoring for the pundits who unleashed this frenzy. To me, their contribution isn’t so much a number but a smell. They’ve left a stench of hypocrisy and a strong whiff of political trickery, by using sensationalistic language and misrepresenting the real issues.
It’s easy to see why they sided with HAP – they’re philosophically opposed to RR’s policy advice.
And it’s easy to see why they emphasized the insignificant spreadsheet error – no one would have paid attention if they hadn’t.
But in the process of using the critique as an opportunity for political chest banging, they made clear errors in articles that were written to heap scorn on someone else’s errors. And they should own up to their mistakes exactly as RR did.