Revisiting the Reinhart-Rogoff Kerfuffle and the Consequences of High Government Debt

Last September, the Centre for Independent Studies (CIS) asked us for an article updating our commentary on the Reinhart-Rogoff controversy, while combining it with one of our government debt studies. The article appeared in the latest issue of the organization’s POLICY Magazine, which hit the newsstands in Australia last month (the CIS is based in New South Wales). Kudos to the CIS and editor Stephen Kirchner for excellent editing. Here’s the first part of the article; we’ll post the second tomorrow.

The Consequences of High Government Debt: Reinhart and Rogoff Versus Pundits (Part 1)

By the onset of the 2008–09 global financial crisis, anyone with a passing interest in the consequences of excessive debt was familiar with economists Carmen Reinhart and Kenneth Rogoff.

Reinhart and Rogoff (I’ll call them ‘RR’) had spent years studying debt crises, and few economists were as qualified as RR to interpret events as they unfolded. They accurately predicted that the hangover from the crisis would be long and painful—and that public debt would increase rapidly to offset any contraction on the private side.

The second prediction, in particular, led to further research on government debt. In a 2010 paper titled ‘Growth in a Time of Debt,’ RR suggested that economic growth tends to be unusually low after government debt rises above 90% of GDP.[i] They confirmed this result in a second paper in 2012, which dug deeper into the growth-debt relationship.[ii] The 90% result soon became RR’s best-known work, familiar to policymakers throughout the world.

RR in the crosshairs

But fame has its drawbacks, as RR learned the hard way. They were a prime target for populist economists who prefer to downplay the risks of excessive government borrowing. Even though their conclusions were consistent with the findings of other researchers, RR were the best known of the bunch and most clearly in the crosshairs.

Relatively unknown research team sparks media frenzy

Enter three University of Massachusetts scholars: Thomas Herndon, Michael Ash, and Robert Pollin, who quickly became known as ‘HAP.’ In April 2013, HAP released a harsh critique of RR’s 2010 paper, arguing that it contained fatal errors.[iii] They revealed an embarrassing calculation error in one of RR’s spreadsheets, which they examined as part of their critique. They also argued that RR omitted data points without justification and used an unconventional weighting method in their statistical averages. In response, RR acknowledged the calculation error but defended their dataset and weighting methods.[iv]

The academic dispute quickly went viral, with heavy coverage by bloggers, newspapers and even The Colbert Report. But instead of a measured, balanced assessment of the perspectives of two teams of academics, we saw what happens when a politically charged research debate lands in the laps of pundits with preconceived ideas about what the research should say. Media reports were filled with misinformation.

My first goal here is to straighten out several fallacies that took hold. I’ll also add editorial comments and ‘scores’ on each of the parties involved, as well as observations on the poor quality of the contributions from much of the punditocracy.

The second goal is to share an example of the value of RR’s extensive government debt database. I’ve been using this data in my own research for a few years, and recently published a study exploring the outcomes of 63 high government debt episodes. The conclusions challenge conventional thinking about the implications of rising public debt.

It’s always politics—Never personal

Before starting with the fallacies in the RR-HAP controversy, it’s important to recognise the principal combatants’ biases. First, it’s clear that RR truly believe that excessive government debt leads to lower growth—on a conceptual basis—as do many other people. Second, these beliefs don’t sit well with HAP, who argue that RR have too much influence over public policy decisions in the United States and Europe.

HAP suggest that policy could be less austere in both regions. They don’t like to hear politicians repeat RR’s warnings about the dangers of high debt, and hoped to discredit RR and end RR’s perceived role in current policies. HAP’s paper concludes with the statement:

RR’s findings have served as an intellectual bulwark in support of austerity politics. The fact that RR’s findings are wrong should therefore lead us to reassess the austerity agenda itself in both Europe and the United States.

But HAP didn’t challenge the full breadth of RR’s thinking and research. Instead, they focused on one calculation in RR’s first paper on debt and growth—arithmetic average economic growth rates from 1946 to 2009. Figure 1 shows the competing views on these average growth rates.

From this simple chart, pundits launched a giant game of ‘whisper down the lane.’ We were fed a succession of incomplete, exaggerated, misleading and erroneous reports, as explained in these eight observations:

1. For all the public focus on RR’s calculation error, it didn’t have a meaningful effect on their results. As reported by HAP in their paper (p. 7), it changed the arithmetic average in the >90% bucket on the right hand side of the chart by 0.3 percentage points. That’s pocket change. But the error’s insignificance was emphasised in only two of the many early accounts I read (by Justin Fox of the Harvard Business Review and Brad Plumer of the Washington Post).[v] The authors of two of the very earliest reports on HAP’s paper eventually backtracked, adding a mix of clarifications, corrections and updates to their original posts, presumably after recognising they had overstated the error’s significance.[vi] But both left their prose written in a way that continued to emphasise it. And their later clarifications didn’t stop other commentators from reporting that the growth differences shown were explained entirely by the error, which is untrue. Nor did they prevent sensational titles such as ‘How an Excel error fueled panic over the federal debt’ (LA Times), ‘FAQ: Reinhart, Rogoff and the Excel error that changed history’ (Bloomberg BusinessWeek), and ‘Math in a time of Excel: Economists’ error undermines influential paper’ (DailyFinance).[vii]
2. Much of the reporting extended beyond the 2010 paper, leading readers to believe that HAP’s critique invalidates RR’s other work, including their 2009 bestseller, This Time is Different.[viii] An LA Times report even claimed that RR ‘popularized’ the 90% threshold in their book. In fact, the book did no such thing, nor did RR publish any similar results before 2010.
3. The dispute centres on the slope and significance of the line in Figure 1, particularly the last segment leading to the 90% bucket, not whether it’s rising or falling. But that didn’t stop pundits from writing their accounts in ways that suggested disagreement about the line’s direction. Moreover, RR pointed out that they placed more emphasis on medians than averages (which is entirely consistent with a review of their work), and the medians escaped HAP’s critique without comment (more on this below). The fact that HAP’s average economic growth calculations yielded similar results to RR’s medians received almost no attention in the public discussion.
4. Despite RR’s data being posted on their websites for public access, pundits outrageously claimed that it wasn’t made available.[ix] It’s not clear why they got this so wrong, but the accusation became a part of many reports, just like the other falsehoods. In late May, RR finally posted screenshots from the ‘WayBack Machine,’ an independent site that stores whole web pages from the past, to prove that their data was accessible as far back as October 2010.[x] Unfortunately, it was too late to sway many of those who had read about RR’s alleged secrecy, formed their conclusions, and moved on.
5. Contrary to claims by HAP, the austerity push in Europe wasn’t triggered in any way, shape or form by RR’s research. It’s based on northern Europe’s struggle to limit the potential damage to its own economies from fiscal crises in the peripheral countries in the context of the European Monetary Union.[xi] In other words, it’s largely a matter of regional politics. What’s more, to the extent that policymakers even noticed RR’s advice, they would have heard a message of caution about austerity. The public record shows quite clearly that RR were opposed to policies of ‘withdrawing fiscal stimulus too quickly,’ choosing instead to emphasise the critical importance of structural reforms, and in some cases, debt write-downs.[xii]
6. While HAP and many others made a fuss about RR’s alleged influence over the media, with much complaining about a particular Washington Post editorial that referenced the 90% threshold, this part of their story was thankfully refuted by, well, the Washington Post. Weighing in on the dispute, the ‘WaPo’ editors noted that it was ‘preposterous’ to blame RR for global austerity, and that RR’s hold on their own thinking was ‘rather overstated in some quarters.’[xiii]
7. Similarly, RR aren’t puppet-masters controlling Republican budget strategies in the United States, notwithstanding Paul Ryan’s reference to their research, which was discussed by HAP in their paper and repeated many times by RR’s critics. I’m not aware of any public comments from Ryan on the matter, but it seems unlikely that we’ll wake up one day and read about his conversion to the ‘debt doesn’t matter’ school based on HAP’s critique.
8. Finally, RR never presented 90% as a magic number (where 89.9 is a clear, sunny day and 90.1 a Category 5 hurricane), nor did they neglect to recognise that correlation is not causation. The 90% threshold is similar to the 200 mg/dL cholesterol level that the American Heart Association (AHA) warns will ‘raise your risk’ of heart disease; neither figure implies a sharp drop-off or ‘cliff’ at the exact threshold point. As an example of a correct interpretation of RR’s research, Tyler Cowen of Marginal Revolution, one of the most heavily trafficked economics blogs, wrote in 2010 that 90% wasn’t ‘sacred’ or ‘stable.’[xiv] I always saw it as merely the upper limit on one of RR’s buckets and a reasonable marker to use in conclusions. Such markers are needed to make sense of complicated risks. And yet, anti-RR pundits suggest that it’s bad research to attempt an answer to the question: ‘At what point does debt become a problem?’ This is just as illogical as a slam on the AHA for its advice that we should lay off the fats if our cholesterol rises above 200.

Keeping score

And now for the scorecard I promised. I’ll start with RR.

-0.5 for an Excel error that should have been caught before publication. But this is a minor issue, as I pointed out above. I reread the paper to check the effect, and the error didn’t change a single word. We all make mistakes, and this one wasn’t even a factor. It’s like the stumble that costs a distance runner a fraction of a second but doesn’t change his position in the race. I repeat: It didn’t change a single word.

No score on the debate over the weighting method. RR have a clear and logical defence for their approach, while HAP offered a reasonable criticism. This happens all the time in academia. People think and act differently, and they also approach research differently.

No score on HAP’s accusation that RR selectively omitted certain data points. I have no reason to doubt RR’s defence that their dataset wasn’t complete when they wrote the paper. I’ve used their data on several occasions and seen it evolve, with significant additions to their government default data in 2011, for example. And it takes time to build such a large dataset that you can use with confidence, let alone share with your peers as RR have graciously done.

-1 for the interactional effects of their various methods. Based on the mix of methods that RR chose, HAP pointed out that the average growth rate for RR’s >90% bucket assigned a 14% weight to a single year’s growth in New Zealand. The year happened to be 1951, when New Zealand’s economy reportedly (but not correctly—see HAP’s scores below) contracted by 7.6%. This seems too much weight for such an extreme result and it would have been helpful for RR to highlight its effect. But it’s hardly the intellectual travesty HAP made it out to be. Empirical work is always vulnerable to outliers in the data. The important thing is not to make your methods perfect, which is impossible, but to recognise their limitations.
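To see how a single year can end up with a weight of about 14%, here is a minimal sketch comparing two ways of averaging growth within a debt bucket. It assumes a ‘country-weighted’ scheme (average each country’s years, then average the country averages) and a ‘pooled’ scheme (average every country-year together). The scheme labels, the function names and all the data are hypothetical and invented for illustration; only the -7.6 figure and the single-year structure echo the text, and this is not a reproduction of either team’s spreadsheet.

```python
from statistics import mean

# Hypothetical growth rates (percent) for country-years in one debt bucket.
# Every figure is invented for illustration except -7.6, which echoes the
# single-year contraction mentioned in the text.
growth_by_country = {
    "A": [2.0, 3.0, 2.5, 2.5],
    "B": [1.0, 2.0, 3.0],
    "C": [3.0, 3.0],
    "D": [2.0, 2.0, 2.0, 2.0, 2.0],
    "E": [1.5, 2.5],
    "F": [4.0, 2.0, 3.0],
    "single_year": [-7.6],  # a country with only one year in the bucket
}


def country_weighted_mean(data):
    """Average each country's years first, then average the country means."""
    return mean([mean(years) for years in data.values()])


def pooled_mean(data):
    """Average every country-year together, one equal weight per observation."""
    return mean([g for years in data.values() for g in years])


def observation_weight(data, country, scheme):
    """Share of the bucket average carried by one year from `country`."""
    if scheme == "country":
        # Each country gets 1/n_countries, split evenly across its years.
        return 1 / (len(data) * len(data[country]))
    if scheme == "pooled":
        # Each country-year gets 1/n_observations.
        return 1 / sum(len(years) for years in data.values())
    raise ValueError(f"unknown scheme: {scheme}")


country_avg = country_weighted_mean(growth_by_country)
pooled_avg = pooled_mean(growth_by_country)
w_country = observation_weight(growth_by_country, "single_year", "country")
w_pooled = observation_weight(growth_by_country, "single_year", "pooled")

print(f"country-weighted average: {country_avg:.2f}% (outlier weight {w_country:.1%})")
print(f"pooled average:           {pooled_avg:.2f}% (outlier weight {w_pooled:.1%})")
```

With seven countries in the bucket, the lone year carries one-seventh of the country-weighted average, which is in line with the 14% figure above, against one-twentieth of the pooled average in this made-up sample. Neither scheme is obviously right; the sketch only shows why the choice interacts so strongly with an outlier.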

+10 for their contribution to their field. Yes, I’m biased in that I believe RR have built the world’s most comprehensive history of the types of risks that are most threatening to us today. Their dataset and book are tremendous accomplishments. And remember, they operate in the field of macroeconomics. If you were to review all the published papers in this field for the last, say, 100 years, and weigh them against real life events, the vast majority could be shown to have major shortcomings. Many have done real damage, leading policymakers to adopt views that are hopelessly disconnected from reality. It’s no exaggeration to say the foundations of conventional macroeconomic theory have been discredited repeatedly in the last century. And of most concern are the papers that rely on unrealistic, abstract theories, not a 2% disagreement in a historical average. By comparison, HAP versus RR is ho-hum.

Here are my scores for HAP:

+2 for delivering a helpful critique on one aspect of RR’s paper, with a comprehensive collection of charts that clearly illustrates the historical results.

-2 for the way it was done. Reports from both sides suggest that RR gave their spreadsheets to HAP but didn’t even receive an advance copy of the critique. Before RR knew of the analysis, blogger Rortybomb had already read HAP’s critique, interviewed the authors, examined their spreadsheets, and written the first article to hit the blogosphere, triggering an avalanche of coverage on financial and political sites. Because of this ambush, many people formed their opinions without seeing both sides of the story.

-1 for the analysis of interactional effects. While these effects were noteworthy, it turns out that HAP got them wrong. As Reinhart disclosed on her website, she discovered that the 1951 New Zealand GDP data in RR’s initial dataset (they had turned to other sources by the time of their 2012 paper) was incorrect, thanks to an error in a third-party database that’s heavily used and highly regarded by economists.[xv] HAP then compounded the error by adding New Zealand data for 1946 to 1950, which was also incorrect.[xvi] That New Zealand featured so prominently isn’t surprising; I too had dropped the country from unrelated research published in March 2013 because I hadn’t sorted out discrepancies in data obtained from different sources.[xvii] Considering HAP’s vehemence in attacking RR’s data choices, HAP should have investigated these choices more thoroughly before publishing their critique.

-3 for failing to acknowledge the most important of RR’s results on the empirical relationship between growth and debt. HAP had no comment whatsoever on the very first result cited in RR’s 2010 paper: the finding that the median growth rate is about 1% lower when debt rises above 90% of GDP. And HAP also failed to comment on the first result cited in RR’s 2012 paper, which also referenced a growth difference of about 1%. Based on RR’s papers and interviews, it should be no surprise that pre-2013 accounts of their research highlighted the 1% difference, as John Mauldin and Jonathan Tepper do in their 2011 book, Endgame: ‘Rogoff and Reinhart show that when the ratio of debt to GDP rises above 90 percent, there seems to be a reduction of about 1 percent in GDP.’[xviii] But HAP chose to focus exclusively on arithmetic averages over a single time period and calculated a revised difference of, well, about 1%. In other words, they asked us to cross out RR’s 1% and replace it with their more ‘accurate’ 1% (see Figure 2). So what exactly was the difference we were arguing about?
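Why it matters whether one looks at medians or arithmetic averages can be shown with a minimal sketch. The growth figures below are hypothetical, invented purely for illustration and not drawn from RR’s or HAP’s data; the point is only that one extreme year moves an average far more than it moves a median.

```python
from statistics import mean, median

# Hypothetical annual growth rates (percent); invented for illustration only.
typical_years = [1.5, 2.0, 2.0, 2.5, 2.5, 3.0, 3.5]
with_outlier = typical_years + [-7.6]  # add one severe contraction

# How far each summary statistic falls once the outlier is included.
mean_shift = mean(typical_years) - mean(with_outlier)
median_shift = median(typical_years) - median(with_outlier)

print(f"mean:   {mean(typical_years):.2f} -> {mean(with_outlier):.2f}")
print(f"median: {median(typical_years):.2f} -> {median(with_outlier):.2f}")
print(f"mean falls by {mean_shift:.2f}, median falls by {median_shift:.2f}")
```

In this made-up sample the average drops by more than a full point while the median barely moves, which is one reason a median-based result is less exposed to a single questionable data point.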

Overall, HAP certainly offered some analysis for consideration, while pointing out weaknesses in the 2010 paper, as is expected in a critique. But they just as certainly failed to disprove RR’s thesis that high debt tends to be associated with lower growth.

Assigning a score to the pundits

In the meantime, pundits inclined towards loose fiscal policy launched a character assassination of remarkable force. One only needed to read a few of the more critical essays and comment threads to see RR subjected to a treatment normally reserved for crooks and felons.

Most remarkable about this episode is how everyone became instant experts on exactly how RR described their research to policymakers all over the world. I must have been the only one who missed the nightly Reinhart and Rogoff Hour on national television.

Which brings me to the scoring for the pundits who unleashed the frenzy. Their contribution isn’t so much a number as an odour. They left a stench of hypocrisy and a strong whiff of political trickery by using sensational language and misrepresenting the real issues.

It’s easy to see why they sided with HAP—the pundits are philosophically opposed to any research suggesting that high government debt can have unwanted consequences. Moreover, they emphasised the insignificant spreadsheet errors because no one would have otherwise paid attention to their rhetoric.

In using HAP’s critique as an opportunity for political chest banging, pundits themselves made clear errors in articles written to heap scorn on someone else’s errors. Add the many critical points they failed to mention, and much of the Reinhart-Rogoff reporting amounted to nothing more than a witch-hunt.

Endnotes

[i] Carmen M. Reinhart and Kenneth S. Rogoff, ‘Growth in a Time of Debt,’ American Economic Review 100 (2010), 573–578.

[ii] Carmen M. Reinhart, Vincent R. Reinhart, and Kenneth S. Rogoff, ‘Debt Overhangs: Past and Present,’ Journal of Economic Perspectives (forthcoming).

[iii] Thomas Herndon, Michael Ash, and Robert Pollin, ‘Does High Public Debt Consistently Stifle Economic Growth? A Critique of Reinhart and Rogoff,’ Working Paper Series 322 (Political Economy Research Institute, University of Massachusetts, Amherst: April 2013).

[iv] See Chris Cook, ‘Reinhart-Rogoff recrunch the numbers,’ ft.com blog, Financial Times (17 April 2013); Carmen M. Reinhart, ‘Response to critics about content,’ author website (26 April 2013).

[v] Justin Fox, ‘Reinhart, Rogoff, and how the economic sausage is made,’ HBR blog, Harvard Business Review (17 April 2013); Brad Plumer, ‘Is the evidence for austerity based on an Excel spreadsheet error?’ Wonkblog, Washington Post (16 April 2013).

[vi] Mike Konczal, ‘Researchers finally replicated Reinhart-Rogoff, and there are serious problems,’ Next New Deal Rortybomb blog (Roosevelt Institute, 16 April 2013); Matthew Yglesias, ‘Is the Reinhart-Rogoff result based on a simple spreadsheet error?’ Moneybox blog, Slate (16 April 2013).

[vii] Michael Hiltzik, ‘How an Excel error fueled panic over the federal debt,’ LA Times (16 April 2013); Peter Coy, ‘FAQ: Reinhart, Rogoff and the Excel error that changed history,’ Bloomberg BusinessWeek (18 April 2013); Eamon Murphy, ‘Math in a time of Excel: Economists’ error undermines influential paper,’ Daily Finance blog (19 April 2013).

[viii] Carmen M. Reinhart and Kenneth S. Rogoff, This Time is Different: Eight Centuries of Financial Folly (Princeton, NJ: Princeton University Press, 2009).

[ix] See, for example, Paul Krugman, ‘How the case for austerity has crumbled,’ The New York Review of Books (6 June 2013).

[x] Carmen M. Reinhart and Kenneth S. Rogoff, ‘Letter to PK,’ Carmen M. Reinhart’s website (25 May 2013); F.F. Wiley, ‘It’s time to change focus from Reinhart-Rogoff witch hunts to Krugman’s contradictions,’ Cyniconomics.com (28 May 2013).

[xi] See, for example, ‘Is the academic premise for austerity in the Eurozone crumbling? Not quite …’ Open Europe blog (17 April 2013).

[xii] Carmen Reinhart and Kenneth Rogoff, ‘The Reinhart and Rogoff response to critics about intent: Debt, growth, and reality,’ Carmen M. Reinhart’s website (30 April 2013); ‘Reinhart and Rogoff: Selected interviews, op-eds and media on the policy response to crisis,’ Carmen M. Reinhart’s website.

[xiii] Editorial, ‘Rogoff-Reinhart error is not the source of global “austerity”,’ The Washington Post (21 April 2013).

[xiv] Tyler Cowen, ‘What’s the critical debt-GDP ratio,’ Marginal Revolution blog (23 July 2010).

[xv] ‘A note on the New Zealand data,’ Carmen M. Reinhart’s website (27 April 2013).

[xvi] See, for example, Matthew C. Klein, ‘Reinhart and Rogoff were right about New Zealand,’ Bloomberg View (30 April 2013).

[xvii] F.F. Wiley, ‘Answering the most important question in today’s economy,’ Cyniconomics.com (20 March 2013).

[xviii] John Mauldin and Jonathan Tepper, Endgame: The End of the Debt Supercycle and How It Changes Everything (Hoboken, NJ: John Wiley, 2011), 113.