Using Data Gathered from Social Media to Assess Public Health

Social media can be a very silly place. Teenagers gossiping. Twenty-somethings getting political. Thirty-somethings making needlessly paranoid posts in their neighborhood Facebook groups. Baby boomers posting lots and lots of pictures that are always inexplicably blurry….

We aren’t exactly talking about the Library of Alexandria here. Nevertheless, researchers can use social media to gain valuable insights. Here we look at how social media posts can be used to assess public health.

The Situation

Data is just information. Right now, social media is one of the biggest repositories of information ever to exist. “You’re right!” you say. “My son just tweeted a picture of his Taco Bell order. Someone call the Department of Health.”

Ha. Yes. We will get right on that. Of course, the premise isn’t that every social media post contributes to researchers’ understanding of public health. Rather, social media as a whole can give a broad-strokes picture of the public mindset on a wide range of health issues.

No, it’s probably not the best way to derive hard facts. It is, after all, a largely opinion-based platform and therefore highly colored by inaccurate or subjective statements.

Nevertheless, it’s a great way to take the temperature of an issue relating to public health. For example….


Researcher Linda (not a real person) wants to learn more about vaccine hesitancy in rural America. She’s nervous. She’s excited. She’s nervous….

She already has a data set provided by the local public health department. Only twenty-eight percent of people in Town Y (a real town! No, just kidding, also fake) have been fully vaccinated against COVID-19. Another twenty percent have received only the first dose. Even so, this town is one of the least vaccinated in the United States.

Halfway into flu season, Researcher Linda discovers that flu vaccine numbers have gone down considerably in Town Y as well. She wants to find out why.

Of course, she can go door to door or hold studies at the local university. She will do both eventually. However, these methods have their shortcomings. After all, would you open your door to a stranger who wants to talk about medicine?

Social media data can bridge some of that understanding gap. Using social media, Linda can find out what sort of language people use to describe vaccines. What percentage of people have expressed an intention to get the flu vaccine? How much confidence does Town Y have in the public health system in general?
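To make the “what percentage” question concrete, here is a minimal sketch of how a researcher might estimate the share of posts that express an intention to get a flu shot. It assumes Linda already has the post text in a Python list. The phrase patterns, the negation check, and the sample posts are all hypothetical and for illustration only. Keyword matching like this is crude: real studies typically build and validate a classifier, because sarcasm and unusual phrasing slip right past simple rules.

```python
import re

# Hypothetical phrases signalling intent to get a flu shot (illustration only).
INTENT_PATTERNS = [
    r"\b(getting|got|get|booked|scheduled)\b.*\bflu (shot|vaccine|jab)\b",
    r"\bflu (shot|vaccine|jab)\b.*\b(tomorrow|today|this week)\b",
]
# Very rough negation check so "not getting a flu shot" is not counted as intent.
NEGATION = r"\b(not|never|won't|refuse)\b"

def expresses_intent(post: str) -> bool:
    """Return True if the post matches an intent pattern and contains no negation word."""
    text = post.lower()
    if re.search(NEGATION, text):
        return False
    return any(re.search(p, text) for p in INTENT_PATTERNS)

def intent_share(posts: list[str]) -> float:
    """Fraction of posts (0.0 to 1.0) that express intent to get vaccinated."""
    if not posts:
        return 0.0
    return sum(expresses_intent(p) for p in posts) / len(posts)

# Hypothetical sample posts from Town Y.
sample_posts = [
    "Getting my flu shot at the pharmacy this week",
    "No way I'm putting that flu vaccine in my body",
    "Booked a flu jab for the kids on Saturday",
    "Taco Bell again tonight",
]
print(f"{intent_share(sample_posts):.0%}")  # prints "50%"
```

Note that the denominator here is posts, not people. Counting one vote per account, and filtering out accounts that aren’t real people, would be the next refinements.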

The research can’t begin and end here, but it gives her a great baseline. She can use these findings as a springboard for framing future research questions and gain an understanding of Town Y that conversation alone would never have given her.

The Problems

Of course, social media comes with its issues. For one thing, people tend to say things there that don’t necessarily reflect how they would behave in real life. Because the internet feels anonymous, it often brings out people’s baser behaviors.

This phenomenon is known as “online disinhibition.” Basically, people feel comfortable saying anything and everything that pops into their heads because they can do it safely behind a mask of anonymity.

Ok. They wouldn’t say it in real life. So what? They said it here. Data is data, right?

Eh. Not so much. “Trollish behavior,” as it is often called online, is less about expressing the truth and more about generating a reaction. Some people simply like to be disruptive online, and that can corrupt data collection.

Then there is deliberate disinformation. You hear about this all the time: bots (accounts that aren’t even run by real people) spreading incorrect information. It’s a form of cyber terrorism that has been used to disrupt elections and, yes, even to spread bad information about public health events.

A well-designed filtering algorithm can control for these influences to an extent, but they never disappear entirely and must be taken into account whenever research is done on social media.
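What might “controlling for these influences” look like? Below is a deliberately simple sketch of a rule-based filter that drops accounts showing several common bot red flags before any posts are analyzed. The account fields (account age, posting rate, share of duplicate posts) and every threshold are hypothetical assumptions chosen for illustration. Real bot detection usually relies on many more signals and trained models, and no filter like this catches everything.

```python
from dataclasses import dataclass

@dataclass
class Account:
    handle: str
    account_age_days: int
    posts_per_day: float
    duplicate_post_ratio: float  # share of posts that are near-identical copies (hypothetical field)

# Hypothetical thresholds, for illustration only.
MAX_POSTS_PER_DAY = 50
MIN_ACCOUNT_AGE_DAYS = 30
MAX_DUPLICATE_RATIO = 0.5

def bot_score(acct: Account) -> int:
    """Count how many simple red flags an account raises (0 to 3)."""
    flags = 0
    if acct.posts_per_day > MAX_POSTS_PER_DAY:
        flags += 1
    if acct.account_age_days < MIN_ACCOUNT_AGE_DAYS:
        flags += 1
    if acct.duplicate_post_ratio > MAX_DUPLICATE_RATIO:
        flags += 1
    return flags

def filter_likely_humans(accounts: list[Account], max_flags: int = 1) -> list[Account]:
    """Keep only accounts that raise at most `max_flags` red flags."""
    return [a for a in accounts if bot_score(a) <= max_flags]

# Hypothetical accounts.
accounts = [
    Account("townY_mom", 2100, 3.5, 0.02),
    Account("vax_truth_8812", 6, 240.0, 0.9),
    Account("new_here", 10, 1.0, 0.0),
]
kept = filter_likely_humans(accounts)
print([a.handle for a in kept])  # prints ['townY_mom', 'new_here']
```

Requiring more than one red flag before discarding an account is a design choice: a brand-new account on its own is not suspicious, and throwing out every newcomer would bias the sample.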


Bots or no bots, social media is still a great way to gain insights into public health that add to the broader body of biostatistics, and into a wide range of other issues as well. Encyclopedia Britannica probably isn’t going to start transcribing Mark Schmark’s tweets about the Lakers, but researchers can use large swaths of social media posts to find out more about what people think about a given topic.

It’s a valuable tool, especially when it is used with other research methods to take a closer look at an issue.