Stuart Buck: Are scholars or journalists more to blame when correlation and causation are confused?

News stories about everything from nutrition to epidemiology to family behavior often confuse correlation with causation. Drink coffee, we are told, and you will lower your risk of dying (or perhaps raise it, depending on the week). Get married, and you will have stronger bones.

Sophisticated news consumers understand that it’s best to discount such stories, which do not report on randomized experiments or any other research design that could show causation. The articles are invariably about correlations, akin to demonstrating that sunburn goes up along with accidental drowning, which is true not because either one causes the other, but because both occur in the summer.
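
For readers who like to see the sunburn example in numbers, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the function names, the share of summer days, and the daily counts are invented, not taken from any real data. Season is the only thing that drives either quantity, so any correlation between them comes from the shared cause.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate_days(n_days=5000, seed=0):
    """Hypothetical daily data: season drives both outcomes independently."""
    rng = random.Random(seed)
    days = []
    for _ in range(n_days):
        summer = rng.random() < 0.25  # assumed share of summer days
        # Neither outcome depends on the other, only on the season.
        sunburns = rng.gauss(10.0 if summer else 2.0, 1.0)
        drownings = rng.gauss(5.0 if summer else 1.0, 1.0)
        days.append((summer, sunburns, drownings))
    return days

days = simulate_days()
overall_r = pearson([d[1] for d in days], [d[2] for d in days])

# Hold the common cause fixed by looking only at summer days.
summer_days = [d for d in days if d[0]]
within_summer_r = pearson([d[1] for d in summer_days],
                          [d[2] for d in summer_days])

print(f"all days:    r = {overall_r:.2f}")
print(f"summer only: r = {within_summer_r:.2f}")
```

Across all days the two series are strongly correlated, yet within summer alone the correlation is close to zero, which is what one would expect if season, not sunburn, explains the pattern.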

Why are these stories so common, though? In many cases, the problem is that journalists overemphasize the possibility of causation, while failing to mention any disclaimers that the scholarly authors may have tried to highlight. Yet it is not always the journalists’ fault. A recent controversy shows that some social scientists do not even seem to understand that they mentioned causation in the first place.

The study in question came from the August issue of Demography. On the basis of a nationally representative longitudinal survey of American elderly adults, the authors claimed that, even after controlling for parents’ own education level and income, having children who graduated from college (as compared with children who dropped out of high school) raised the parents’ lifespan from 69 to 71 years.

Unsurprisingly, journalists from the Wall Street Journal to the Washington Post wrote articles claiming that sending your children to college will make you live longer. In turn, an economist writing in the New York Times and my own article in Slate strongly criticized the scholarly article for confusing correlation and causation. As I pointed out, the authors claimed that adult children who go to college could increase their parents’ lifespan by having “more flexible jobs” or more “comfort with the Internet.” But a far more likely possibility is that unobserved factors like parental motivation are causing both college attendance and a longer life.

The authors wrote a curious response to these criticisms on the Rand blog. They now say, “Correlation does not prove causation—yes, we know this. Nowhere in our article do we assert that the relationship we find is causal. On the contrary, we discuss this limitation at length in our work.”

Nowhere? The article is full of causal statements. Here is a partial list from the “Discussion” section alone (emphasis mine):

• “We show that later in life, adult offspring become critical for ensuring the health and survival of their parents.”
• “This research isolates a mechanism through which differences in health and mortality come about, to wit: the differential educational attainments of offspring.”
• “Our results suggest that in the United States, parents benefit from having more educated offspring—a benefit that extends beyond the effects of parents’ own SES [socioeconomic status].”
• “This work shows that another way to influence the health of the elderly is through their offspring.”
• “Policies targeting one generation of the family may set in motion a series of reactions that lead to improved health for others in previous generations.”
• “Improving offspring’s lives may benefit not only the offspring themselves over their lifetimes but their parents as well.”

All of these statements directly point to causality: if A “benefit[s]” or “lead[s] to” or “influence[s]” or “improve[s]” or is “critical for ensuring” B, that means that A is playing a causal role as to B. Some of these statements are hedged with words like “suggest” or “may,” but others are not. In either case, the authors repeatedly attribute a causal role to children in affecting their parents’ lifespan.

To be fair, the authors do concede that “finding such a relationship does not prove a causal interaction (for instance, unobserved personality traits might be associated with parental mortality, parental health behaviors, and offspring’s schooling).” But this brief concession is undermined by all the claims of causation elsewhere. Indeed, on the article’s final page, the authors briefly acknowledge the “possible endogeneity of offspring’s schooling with parental mortality,” and if they wished to avoid claims of causality, that would have been the place to admit that their analysis cannot show that offspring actually improve their parents’ survival.

But they did the exact opposite: the same paragraph goes on to say that “we choose to examine the causal nature of this relationship more descriptively,” thereby showing that “one way highly educated offspring improve parents’ survival chances is by improving their parents’ health behaviors.” In other words, even after acknowledging an endogeneity concern that should have been fatal, the authors still claim to have found that offspring “improve”—an undeniably causal term—both their parents’ behavior and lifespan.

A further wrinkle is that, in the authors’ response, they say that the “main novelty of our article was to advance the idea that a possible mechanism that underpins differences in mortality among older persons is the resources and behaviors of their adult offspring.”

But their method of isolating a mechanism seems wrong. The authors say in their article that they want to find out “whether offspring improve [note that this is causal language] their parents’ health by changing their health behaviors” via a plausible mechanism or mediator, such as less parental smoking or more parental exercise. After all, if offspring schooling is connected to mortality only through parents’ accidental deaths or something else that offspring do not generally affect, then there would be no causal mechanism between offspring schooling and parental survival.

The authors then look first at whether offspring schooling directly predicts parental behaviors like smoking or exercise. Having found that it does (this is still sheer correlation, by the way), they return to their Cox mortality model and start adding control variables for parental behavior, like “ever smoked,” or smokes now, or exercises. They say that adding in these parental behaviors lowers the coefficients on offspring education. Thus, “the hypothesis that health behaviors are part of how offspring’s schooling is translated into survival gains [causal language yet again] for parents is supported by these findings.”

Given that “ever smoked” is also a parental behavior that changes the coefficient on offspring schooling, one of two things must be true:
1) Children who graduate from college are somehow going back in time and preventing their parents from “ever” smoking in the past; or,
2) The authors’ way of trying to show causation is irremediably flawed, as it implies the impossible.

In other words, the authors’ attempt to demonstrate a mechanism between children’s schooling and parental lifespan does not make sense—it merely amounts to showing that children’s schooling is correlated with other things about parents, including the way the parents behaved before the children even existed. This provides further evidence that it is actually parents’ characteristics that affect their children’s education, rather than vice versa.
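
The point can be illustrated with a small simulation sketch. It is not the authors' analysis: ordinary least squares stands in for their Cox mortality model, and every name and number (the unobserved "trait," the effect sizes, the sample size) is a hypothetical chosen for illustration. By construction, offspring schooling has no effect at all on parental lifespan, and "ever smoked" is fixed before the children's schooling exists.

```python
import random

def mean(values):
    return sum(values) / len(values)

def cov(a, b):
    """Population covariance of two equal-length sequences."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)

def slope_unadjusted(x, y):
    """OLS slope of y on x, with an intercept."""
    return cov(x, y) / cov(x, x)

def slope_adjusted(x, z, y):
    """OLS slope on x when y is regressed on x and z, with an intercept."""
    numerator = cov(x, y) * cov(z, z) - cov(z, y) * cov(x, z)
    denominator = cov(x, x) * cov(z, z) - cov(x, z) ** 2
    return numerator / denominator

def simulate_families(n=20000, seed=1):
    """Hypothetical families in which schooling never affects lifespan."""
    rng = random.Random(seed)
    schooling, ever_smoked, lifespan = [], [], []
    for _ in range(n):
        trait = rng.gauss(0.0, 1.0)  # unobserved parental characteristic
        # Parent's smoking history, determined by the parent alone.
        smoked = 1.0 if trait + rng.gauss(0.0, 1.0) < 0 else 0.0
        # Offspring schooling depends on the parental trait.
        school = 13.0 + 1.5 * trait + rng.gauss(0.0, 2.0)
        # Lifespan depends on trait and smoking; schooling is absent.
        life = 75.0 + 1.0 * trait - 3.0 * smoked + rng.gauss(0.0, 3.0)
        schooling.append(school)
        ever_smoked.append(smoked)
        lifespan.append(life)
    return schooling, ever_smoked, lifespan

schooling, ever_smoked, lifespan = simulate_families()
unadjusted = slope_unadjusted(schooling, lifespan)
adjusted = slope_adjusted(schooling, ever_smoked, lifespan)

print(f"schooling coefficient, no controls:        {unadjusted:.2f}")
print(f"schooling coefficient, with 'ever smoked': {adjusted:.2f}")
```

In this setup the coefficient on offspring schooling is positive and shrinks once "ever smoked" is added, even though schooling does nothing to lifespan. A drop in the coefficient after adding a parental behavior is therefore just what confounding by parental characteristics would produce, and cannot by itself establish a mechanism.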

The broader lesson here is that when scholars do not acknowledge how often they have used causal language, journalists need to be ever more skeptical readers of research. In turn, if journalists fail to question scholars’ assertions of causation, readers might be better off ignoring a good deal of science journalism altogether. Better science and better science journalism are what society needs in order to make informed decisions.

Stuart Buck is the vice president of research integrity at the Laura and John Arnold Foundation, and manages an initiative dedicated to improving science research. He is also on the board of the new Center for Healthcare Transparency.

Competing interests: The author has no relevant competing interests to declare.