Jonathan Haidt and Robby Soave debate the impact of social media, some initial thoughts

In my little circle of the world, the debate between Jonathan Haidt and Robby Soave on the impact of social media has been making the rounds. The overall event was fascinating, but towards the end a statistician in the audience pushes back against Haidt; the exchange can be found here. The audience member says,

I think at the heart of what you said was that anxiety in first among young girls had increased, had tripled, and the analysis you had done had showed that there was a 0.20 correlation.

Continuing, the audience member says,

I’m not sure that you understand that what you were saying was that four-fifths of that tripling was due to something else and that at best one-fifth of that tripling was due to involving social media if there was causation.

To end his comment/question, he reminded Haidt that correlation doesn’t equal causation.

Haidt responds in three ways that I find interesting. First, he agrees on the correlation and causation point. Second, Haidt concedes a bit about the implications of a 0.2 correlation. Third, and most importantly, he then talks about the dose-response for social media.

Before I found the paper and read it, I had three impressions.

First, yes, obviously correlation doesn’t equal causation, but why is correlation even being brought up? Correlation is typically used as a measure of how two variables vary together, and we care about variance only in specific cases, so why does correlation matter in this case?

Second, yes, the statistician is generally right. Correlation can also be understood as a measure of what is and isn’t explained. In this understanding, for example with the R^2, a measure of 0.2 would mean that 0.8 of the variance isn’t accounted for. (Strictly, if 0.2 is the correlation r rather than the R^2, then the R^2 is 0.04 and 0.96 of the variance isn’t accounted for.)

Finally, it is interesting that Haidt shifted to dose-response because that’s the important part. Correlations aren’t what we should be concerned with. Indeed, we should be concerned with effect sizes.
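On the second impression, the two readings of "0.2" give different answers, so it is worth spelling out the arithmetic. This is a minimal sketch that assumes a simple linear relationship with a single predictor, where the share of variance explained is the square of the correlation. The function and variable names are mine, not anything from the debate or the paper.

```python
# Share of variance explained under two readings of "0.2".
# Assumption: one predictor, simple linear regression, so R^2 = r * r.

def variance_explained(r: float) -> float:
    """Return R^2 for a simple linear regression with correlation r."""
    return r * r

# Reading 1: 0.2 is already an R^2.
r_squared_if_given = 0.20
unexplained_if_r_squared = 1.0 - r_squared_if_given

# Reading 2: 0.2 is the correlation coefficient r, as cited in the debate.
r = 0.20
r_squared_from_r = variance_explained(r)
unexplained_if_r = 1.0 - r_squared_from_r

print(f"If 0.2 is R^2: unexplained share = {unexplained_if_r_squared:.2f}")
print(f"If 0.2 is r:   R^2 = {r_squared_from_r:.2f}, "
      f"unexplained share = {unexplained_if_r:.2f}")
```

Either way, this is a statement about variance across individuals, not about what fraction of a change over time is attributable to social media, and it says nothing about the size of the effect per hour of use.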

The paper - “Underestimating digital media harm”

So I pulled the paper and read through it, “Underestimating digital media harm” by Jean M. Twenge, Jonathan Haidt, Thomas E. Joiner, and W. Keith Campbell. I am working my way through this literature and this is one of the papers I hadn’t yet read, so I am glad this occasion got me to dive into it.

Haidt’s paper opens, “Orben and Przybylski use a new and advanced statistical technique to run tens of thousands of analyses across three large datasets. The authors conclude that the association of screen time with wellbeing is negative but ‘too small to warrant policy change.'” They offer six responses.

A small interlude. I wrote about Orben and Przybylski when it came out. As I noted back in 2019,

What really sets apart the new paper from Amy Orben and Andy Przybylski is that it aims to capture a more complete picture of how variables interact. The problem that Orben and Przybylski tackle is an endemic one in social science. Sussing out the causal relationship between two variables will always be confounded by other related variables in the dataset. So how do you choose the right combination of variables to test?

An analytical approach first developed by Simonsohn, Simmons and Nelson outlines a method for solving this problem. As Orben and Przybylski wrote, “Instead of reporting a handful of analyses in their paper, (researchers) report all results of all theoretically defensible analyses.” The result is a range of possible coefficients, which can then be plotted along a curve, a specification curve. Below is the specification curve from one of the datasets that Orben and Przybylski analyzed.

Amy Orben and Andrew Przybylski explained why their method is important to policy makers:

Although statistical significance is often used as an indicator that findings are practically significant, the paper moves beyond this surrogate to put its findings in a real-world context. In one dataset, for example, the negative effect of wearing glasses on adolescent well-being is significantly higher than that of social media use. Yet policymakers are currently not contemplating pumping billions into interventions that aim to decrease the use of glasses.

The goal of Orben and Przybylski (2019) was to get at effect sizes. And I see this as the big thing that Haidt isn’t discussing in the (2020) response. Maybe I am missing something major here, but correlations aren’t important; effect sizes are.
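For readers who haven't seen one, the specification curve idea from the interlude can be sketched in a few lines. This is a toy illustration, not Orben and Przybylski's code or data: the dataset is synthetic, the variable names (`sleep`, `family_income`, `exercise`) are hypothetical, and the only thing varied across specifications is the set of control variables. The full method also varies how outcomes and predictors are defined and adds inference on top, which is omitted here.

```python
import itertools
import random

def ols_coefficients(rows, y):
    """Ordinary least squares via the normal equations (X'X)b = X'y.
    `rows` is a list of feature lists; an intercept column is added here."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Synthetic data (invented): sleep confounds screen time and well-being.
random.seed(0)
n = 500
controls = {"sleep": [], "family_income": [], "exercise": []}
screen_time, wellbeing = [], []
for _ in range(n):
    sleep = random.gauss(0, 1)
    income = random.gauss(0, 1)
    exercise = random.gauss(0, 1)
    screen = -0.5 * sleep + random.gauss(0, 1)
    wb = (-0.1 * screen + 0.4 * sleep + 0.2 * income
          + 0.2 * exercise + random.gauss(0, 1))
    controls["sleep"].append(sleep)
    controls["family_income"].append(income)
    controls["exercise"].append(exercise)
    screen_time.append(screen)
    wellbeing.append(wb)

# One regression per subset of controls; keep the screen-time coefficient.
names = list(controls)
curve = []
for size in range(len(names) + 1):
    for subset in itertools.combinations(names, size):
        rows = [[screen_time[i]] + [controls[c][i] for c in subset]
                for i in range(n)]
        beta = ols_coefficients(rows, wellbeing)
        curve.append((beta[1], subset))  # beta[0] is the intercept

# Sorting the coefficients gives the "curve".
curve.sort()
for coef, subset in curve:
    print(f"{coef:+.3f}  controls={subset or ('none',)}")
```

In a setup like this one, specifications that leave out the confounder should land further from the true coefficient of -0.1, which is the point of reporting the whole range rather than one hand-picked model.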

Still, Haidt and his co-authors criticize Orben and Przybylski (2019) over six points.

As they write, “The first issue is the consideration of only monotonic effects. Associations between digital media use and well-being are often non-monotonic; in fact, Przybylski himself named this the Goldilocks hypothesis. Associations often follow a J-shaped curve (see Extended Data Fig. 1).”

[Image: Screen Shot 2022-03-11 at 12.19.19 PM.png]

I could be wrong, but I always thought that the point of monotonicity was that the function doesn’t decrease at any point in its defined space. In the technical parlance, its first derivative isn’t negative at any point. So a J-shaped curve should be monotonic, but what makes a J-shape unique is that its slope changes. It can be differentiated, and that needs to be properly accounted for in the model.

Non-monotonic effects would mean that the function acts like an inverted-U. If social media effects were non-monotonic, then the dose response would turn back down at some point. The bad effects would increase over more usage and then go down when someone uses the tech all of the time. See this post for more.
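Whether a particular J-shaped curve counts as monotonic depends on whether it includes an initial dip before it climbs. Here is a minimal sketch with made-up numbers (not taken from any dataset or from the paper's figure) that checks three hypothetical dose-response shapes. The helper name `is_monotonic` is mine.

```python
def is_monotonic(values):
    """True if the sequence never reverses direction, that is, it is
    entirely non-decreasing or entirely non-increasing."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return all(d >= 0 for d in diffs) or all(d <= 0 for d in diffs)

hours = [0, 1, 2, 3, 4, 5, 6, 7]  # hypothetical hours of daily use

# Hypothetical "harm" curves, invented for illustration only.
shapes = {
    # Rises faster and faster but never dips: the slope changes, not the direction.
    "steepening curve": [h ** 2 for h in hours],
    # Lowest at light use rather than at zero use, then climbs.
    "J with an initial dip": [(h - 1) ** 2 for h in hours],
    # Climbs, then turns back down at heavy use.
    "inverted U": [-(h - 3.5) ** 2 for h in hours],
}

results = {name: is_monotonic(curve) for name, curve in shapes.items()}
for name, monotonic in results.items():
    print(f"{name}: monotonic = {monotonic}")
```

Only the first shape passes the check. A curve whose lowest point sits at light use rather than at zero use reverses direction once, so a purely linear term would miss that part of the pattern.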

“The second issue is the aggregation of data across screen time types and gender. The mental health crisis among adolescents that began after 2012 is hitting girls far harder than boys, in multiple countries. Thus, it is vital that researchers pay special attention to girls, and to the types of media that became more popular after 2012.”

Haidt presents good evidence that the impact of tech use seems to be larger for teen girls than for teen boys. See, for example, Figure 1. And yet the paper never tests it. A simple mean difference test would show exactly that. Why not conduct a simple test? Maybe there is more that I am missing, but this should have been done.
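To be concrete about what I mean by a simple mean difference test, here is a sketch using Welch's t statistic on synthetic scores. Everything about the data is invented: the group means, the spread, and the sample sizes are assumptions chosen for illustration, and the p-value uses a normal approximation that is only reasonable for large samples. Testing whether the association itself differs by gender would need something more, such as an interaction term, which this sketch does not attempt.

```python
import math
import random
from statistics import NormalDist, mean, variance

def welch_t(a, b):
    """Welch's t statistic for a difference in means (unequal variances)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

random.seed(1)
# Synthetic depressive-symptom scores among heavy users (invented numbers).
girls = [random.gauss(2.6, 0.9) for _ in range(400)]
boys = [random.gauss(2.3, 0.9) for _ in range(400)]

t_stat = welch_t(girls, boys)

# Large samples, so the t distribution is close to the standard normal.
# This is an approximation, not an exact Welch p-value.
p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))

print(f"mean difference = {mean(girls) - mean(boys):.3f}")
print(f"Welch t = {t_stat:.2f}, approximate two-sided p = {p_value:.4f}")
```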

Continuing, Haidt and his colleagues write, “The third issue is the use of individual items. Orben and Przybylski’s effect sizes include many individual items, which are lower in internal reliability than multiple-item scales and thus produce lower effect sizes. In addition, scales with more items count more heavily in the analysis—not because they are more important, but because of the arbitrary fact of having more items.”

The fourth concern that they raise is a little eyebrow-raising, and it seems that this is where Haidt got the 0.20 correlation.

“The fourth issue is missing measures. The Monitoring the Future dataset measures digital media use in two ways: (1) on a scale of ‘never’ to ‘almost every day’, which has very low variance, as the vast majority of teens now use digital media every day; and (2) in hours per week, which has sufficient variance. Surprisingly, Orben and Przybylski did not include the Monitoring the Future hours-per-week items on non-television digital media (social media, internet use, gaming, texting and video chat); they only included the low-variance items. The low-variance items produce substantially lower linear r values. For example, the r value for happiness and social media use on the low-variance item is −0.01, compared with −0.09 when measured in hours (Table 2 in ref. 9 ). Although Supplementary Fig. 5 in ref. 1 lists these hourly items, it does not report any statistics using them. In addition, Orben and Przybylski do not include the measure of self-harm behaviours included in the MCS.”

In other words, the paper takes umbrage with Orben and Przybylski because they are missing measures. They are missing measurements of depression. For psychologists like Twenge and Haidt as well as Orben and Przybylski, the measures are important because they seemingly track changes in depression over time.

They aim to prove that point by charting out all of the linear r values “between well-being and various factors in boys and girls from two datasets.”

[Image: Screen Shot 2022-03-11 at 1.05.49 PM.png]

But this graph is confusing because it says that well-being is highly correlated with heroin use and social media. It also suggests that exercise for boys, heroin use in girls, and exercise in girls all have about the same correlation. Great, I guess. But why does this matter at all?

In some ways I am not able to parse this research because I am still thinking through measurement. A lot of this research relies upon an index to measure changes in wellbeing, explored in “Increases in Depressive Symptoms, Suicide-Related Outcomes, and Suicide Rates Among U.S. Adolescents After 2010 and Links to Increased New Media Screen Time.”

This index is constructed using six items from the Bentler Medical and Psychological Functioning Inventory depression scale, including “Life often seems meaningless,” “I enjoy life as much as anyone,” “The future often seems hopeless,” “I feel that I can’t do anything right,” “I feel that my life is not very useful,” and “It feels good to be alive.” As Twenge et al. wrote of this index, “Response choices ranged from 1 (disagree) to 5 (agree). After recoding the two reverse-scored items, item-mean scores were computed (α = .86).”
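For anyone unfamiliar with how an index like this gets built, here is a rough sketch of the three steps in that quote: reverse-score, average the items, and compute Cronbach's alpha. The responses are synthetic, and I am assuming the two reverse-scored items are the positively worded ones ("I enjoy life as much as anyone" and "It feels good to be alive"), since the quote doesn't name them. None of this is Twenge et al.'s code.

```python
import random
from statistics import mean, variance

# Assumption: positions of the two positively worded items, in the order
# the six items are listed above (0-indexed).
REVERSED = {1, 5}

def recode(responses):
    """Reverse-score the assumed positive items on a 1-5 scale (x -> 6 - x)."""
    return [6 - x if i in REVERSED else x for i, x in enumerate(responses)]

def cronbach_alpha(rows):
    """Cronbach's alpha; `rows` is a list of respondents, each k item scores."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def fake_respondent():
    """Synthetic answers driven by one latent trait plus noise (invented)."""
    latent = random.gauss(0, 1)
    answers = []
    for i in range(6):
        signal = -latent if i in REVERSED else latent
        raw = 3 + signal + random.gauss(0, 0.7)
        answers.append(min(5, max(1, round(raw))))  # clamp to the 1-5 scale
    return answers

random.seed(2)
raw_answers = [fake_respondent() for _ in range(300)]
scored = [recode(r) for r in raw_answers]

index_scores = [mean(r) for r in scored]  # item-mean score per respondent
alpha = cronbach_alpha(scored)

print(f"mean index score = {mean(index_scores):.2f}")
print(f"Cronbach's alpha = {alpha:.2f}")
```

A high alpha says the items move together across respondents in one sample. It doesn't say whether each item moved the same way over time, which is the question I care about.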

To summarize, Twenge et al. made an index and then tested on it. But it matters a lot how this index has changed over time. This is where I am starting because it is the bedrock. Everyone should be wary of testing on indices. For example, of the six items, what if two of the questions changed substantially over time? What if two of the six were the reason why depression got more prevalent? It would change the analysis completely.

This is why Orben and Przybylski analyzed each one of these survey responses in their (2019) paper. I could go on, but their work aims to get at effect sizes, and it shows that tech does have a negative impact, but it is a small one all things considered. To read the back and forth, start with Orben and Przybylski (2019), then this piece from Twenge, Haidt, et al. (2020), then Orben and Przybylski (2020).

Then think about this, the cherry on the pie.

Most social media research relies on self-reporting methods, which are systematically biased and often unreliable. Communication professor Michael Scharkow, for example, compared self-reports of Internet use with the computer log files, which show everything that a computer has done and when, and found that “survey data are only moderately correlated with log file data.” A quartet of psychology professors in the UK discovered that self-reported smartphone use and social media addiction scales face similar problems in that they don’t correctly capture reality. Patrick Markey, Professor and Director of the IR Laboratory at Villanova University, summarized the work, “the fear of smartphones and social media was built on a castle made of sand.”

So, um, none of the measures of Internet use are reliable.

Teen suicide rates

Still, the most alarming trends come in hospital visits for suspected suicide attempts and in suicides. Haidt’s biggest concern is that ER visits for suspected suicide attempts are up this year for teen girls. But the CDC has cautioned about divining causes. In part, xxx.

So I pulled the data.

I am still munching on this, so please consider it a draft. It gave me a lot to think about and some ideas. And I am sorry for how long this response is. It takes a while to wind up.

https://www.nature.com/articles/s41390-022-01952-w

As the CDC official release noted,

Among adolescents aged 12–17 years, the number of weekly ED visits for suspected suicide attempts decreased during spring 2020 compared with that during 2019 (Figure 1) (Table). ED visits for suspected suicide attempts subsequently increased for both sexes. Among adolescents aged 12–17 years, mean weekly number of ED visits for suspected suicide attempts were 22.3% higher during summer 2020 and 39.1% higher during winter 2021 than during the corresponding periods in 2019, with a more pronounced increase among females. During winter 2021, ED visits for suspected suicide attempts were 50.6% higher among females compared with the same period in 2019; among males, such ED visits increased 3.7%. Among adolescents aged 12–17 years, the rate of ED visits for suspected suicide attempts also increased as the pandemic progressed (Supplementary Figure 1, https://stacks.cdc.gov/view/cdc/106695). Compared with the rate during the corresponding period in 2019, the rate of ED visits for suspected suicide attempts was 2.4 times as high during spring 2020, 1.7 times as high during summer 2020, and 2.1 times as high during winter 2021 (Table). This increase was driven largely by suspected suicide attempt visits among females.

Among men and women aged 18–25 years, a 16.8% drop in the number of ED visits for suspected suicide attempts occurred during spring 2020 compared with the 2019 reference period (Figure 2) (Table). Although ED visits for suspected suicide attempts subsequently increased, they remained consistent with 2019 counts (Figure 2). However, the ED visit rate for suspected suicide attempts among adults aged 18–25 years was higher throughout the pandemic compared with that during 2019 (Supplementary Figure 2, https://stacks.cdc.gov/view/cdc/106696). Compared with the rate in 2019, the rate was 1.6 times as high during spring 2020, 1.1 times as high during summer 2020, and 1.3 times as high during winter 2021 (Table).

https://www.nytimes.com/2020/11/12/health/covid-teenagers-mental-health.html

Relatively little research has focused on children and young people (CYP) whose mental health and wellbeing improved during Covid-19 lockdown measures, but about 1/3 of those in the UK surveyed did better. A deep read: https://buff.ly/377WMXo

Extra resources

“Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations.” (link) Luke Stein’s Stanford graduate economics core. Always start here. It’s 65 pages of clean stats review.
“People often ask me how social media and the internet contribute to teenagers’ risk of suicide. The teens we spoke with rarely discussed them alone as a trigger for their suicidal thoughts. However, for already vulnerable adolescents, technology can provide a forum for more trauma, worsening conflict or isolation. Further, having easy access to information on the internet about how to engage in self-harm can be dangerous for teens with mental health concerns.” (link)
Jonathan Haidt and Jean Twenge put together a useful Google Doc summarising the available evidence.

First published Mar 11, 2022