Is It Okay to Study Your Posts Without Asking?

A Researcher Visits Your Favourite Forum

Scraping data can feel like just collecting facts — but those facts came from real people.

Imagine you post a joke in a Minecraft forum. Later, you stumble across a science paper that quotes your exact words as part of a study. The researchers never asked you. They never told you they were watching. They simply grabbed your public comment along with thousands of others, because it was sitting out in the open. Is that fair? Or does something feel a little off?

This is the knotty question at the heart of Internet research ethics — the rules about how scientists should behave when they study people through online data. The internet has made it easy to collect huge amounts of information about real human beings, from their tweets to their search histories. But many of the old ethical guidelines were written before social media even existed. When is it okay to study people without asking? When does “public” really mean public? And who gets to decide?

Why We Have Rules for Studying People

Even the simplest recruitment poster must be approved by an ethics review board before it goes up.

Long before the internet, researchers sometimes hurt people in the name of science. One famous example is the Tuskegee study in the United States, where doctors lied to hundreds of African American men, telling them they were receiving free healthcare while secretly studying how a serious disease progressed without treatment. The study ran for forty years and became a shocking lesson in why research needs strong protections.

In response to such abuses, governments created ethical principles. The Belmont Report (1979) laid out three bedrock ideas for any research with human beings: respect for persons (people must be free to choose whether to participate), beneficence (maximize good outcomes and minimize harm), and justice (benefits and risks should be shared fairly). These ideas shaped the Common Rule, a set of U.S. regulations that requires most federally funded research on human subjects to be reviewed by an Institutional Review Board (IRB). An IRB checks a study’s plan to make sure it protects people’s privacy, obtains proper consent, and doesn’t cause unnecessary harm.

Crucially, the rules were designed for a world of paper forms, face-to-face interviews, and medical needles. They use concepts like private information — things a person reasonably expects are not being recorded and will not be made public. But what happens when your casual chat, your movie ratings, and your daily jogging route all live online?

The Internet Blurs the Lines

An open online forum can feel as public as a park, or as private as a locker room, depending on the moment.

Researchers have long talked about the internet as either a tool (something you use, like an online survey platform) or a venue (a place you enter, like a chat room or a game world). The distinction helped decide when human-subject protections kick in. A tool might be less personal; a venue is where social life happens.

But that line has nearly collapsed. Today, social media platforms are a tool, a venue, and a massive data-collection machine all at once. When a researcher scrapes thousands of Reddit posts, is she studying a published text, like pages in a library? Or is she watching people in their digital “living room” without knocking?

A key test is how the rules define private information. Under the Common Rule, information counts as private when a person can reasonably expect that nobody is observing or recording them, or that something they shared for one purpose will not be made public. If your post is visible to anyone on the web, a researcher might argue you have no reasonable expectation of privacy. But is that actually how people experience social media? Many users don’t fully understand how their data flows, who can see it, or how it can be combined with other data. A study that scrapes public profiles might harvest personal details users believed were only visible to friends, either because the platform’s privacy settings were confusing or because the rules changed without notice.

Can They Just Take Your Posts?

Even a zip code, a movie ticket, and a few clicks can be puzzle pieces that lead straight back to you.

Traditional research ethics tries to protect you by removing personally identifiable information (PII) — your name, address, Social Security number. Strip those away, the thinking goes, and the data becomes anonymous and safe. The internet made that idea wobble.

In a famous 2002 study, a researcher showed that knowing just a person’s zip code, date of birth, and gender was often enough to re-identify them from “anonymous” public records. Later experiments showed that sets of movie ratings, or seemingly random web search queries, could be used to identify specific people. An IP address, the number assigned to your device’s connection when you go online, can often be traced back to you or your household. Under the European Union’s General Data Protection Regulation (GDPR), an IP address can count as personal data. In the United States, the rules are less clear.
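
The zip code example above can be tried out in a few lines of Python. This is only a sketch with made-up data: the names, the two lists, and the helper functions `match_key` and `reidentify` are invented for illustration and do not come from any real study. It shows how a list with the names removed can be matched against a public list that still has names, using only zip code, date of birth, and gender.

```python
# Made-up "anonymous" records: names were removed, other details were kept.
anonymous_records = [
    {"zip": "02138", "birth_date": "1990-07-31", "gender": "F", "favourite_game": "Minecraft"},
    {"zip": "02139", "birth_date": "1985-01-15", "gender": "M", "favourite_game": "Chess"},
]

# Made-up public list that still includes names (imagine a club roster).
public_list = [
    {"name": "Alex Example", "zip": "02138", "birth_date": "1990-07-31", "gender": "F"},
    {"name": "Sam Sample", "zip": "02139", "birth_date": "1985-01-15", "gender": "M"},
    {"name": "Jo Placeholder", "zip": "02139", "birth_date": "1972-11-02", "gender": "F"},
]


def match_key(record):
    """Combine the three details that both lists have in common."""
    return (record["zip"], record["birth_date"], record["gender"])


def reidentify(anonymous, public):
    """Return (name, favourite_game) pairs where exactly one public name fits."""
    names_by_key = {}
    for person in public:
        names_by_key.setdefault(match_key(person), []).append(person["name"])

    matches = []
    for record in anonymous:
        names = names_by_key.get(match_key(record), [])
        # Only count it as a re-identification if the match is unambiguous.
        if len(names) == 1:
            matches.append((names[0], record["favourite_game"]))
    return matches


matches = reidentify(anonymous_records, public_list)
for name, game in matches:
    print(name, "->", game)
```

If two people on the public list shared the same three details, the sketch would skip them, because it could not tell them apart. The more people who share your details, the harder you are to pick out.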

This means that even when a researcher scrubs names and emails from a dataset, the information isn’t truly anonymous if a motivated person (or a powerful algorithm) could link it back to a real individual. And the harm can be real: a scraped post taken out of context could damage someone’s reputation, cost them a job, or expose a vulnerable person to harassment. Because of this, some ethicists now argue that the old distinction between “identified” and “anonymous” data is breaking down. Data detailed enough to be useful often remains identifiable, at least to someone.
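
One rough way to see how well a scrubbed dataset hides people is to count how many records share the same combination of details. The sketch below uses invented records and two hypothetical helpers, `group_sizes` and `count_unique`. It is an illustration of the idea, not a tool that real ethics boards are known to use.

```python
from collections import Counter

# Made-up records with names and emails already removed.
scrubbed_records = [
    {"zip": "02138", "birth_year": 2010, "gender": "F"},
    {"zip": "02138", "birth_year": 2010, "gender": "F"},
    {"zip": "02138", "birth_year": 2011, "gender": "M"},
    {"zip": "02139", "birth_year": 2009, "gender": "F"},
]


def group_sizes(records, fields):
    """Count how many records share each combination of the chosen fields."""
    return Counter(tuple(record[field] for field in fields) for record in records)


def count_unique(records, fields):
    """Count the combinations that belong to exactly one record."""
    sizes = group_sizes(records, fields)
    return sum(1 for size in sizes.values() if size == 1)


# Looking at one detail, most people blend into a group.
unique_by_zip = count_unique(scrubbed_records, ["zip"])

# Looking at three details together, more people stand alone.
unique_by_all = count_unique(scrubbed_records, ["zip", "birth_year", "gender"])

print("Stand alone by zip code only:", unique_by_zip)
print("Stand alone by all three details:", unique_by_all)
```

Anyone who stands alone in their group is the easiest to link back to a real person, and adding more details to a dataset tends to leave more people standing alone.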

Do You Need to Ask? The Puzzle of Consent

A terms-of-service page can be thousands of words long, and barely anyone reads a single line.

Informed consent is the cornerstone of ethical research. Before you join a study, the researcher explains what’s going on, what the risks are, and asks for your clear, voluntary agreement. On the internet, that process gets tangled.

First, can researchers even find you to ask? Many online spaces use pseudonyms; verifying a person’s real identity or age can be nearly impossible. Second, what if the study is massive and runs without anyone noticing? One controversial experiment, published in 2014, secretly adjusted the news feeds of nearly 690,000 Facebook users to see if their emotions would shift. The researchers didn’t ask for consent. They relied on Facebook’s terms of service, a document more than 9,000 words long that barely mentioned “research”. Later it turned out that the version in effect during the study didn’t mention it at all.

Even when a platform’s privacy policy does mention research, users are extremely unlikely to read and understand it. If a scientist scrapes your data because you clicked “I agree” years ago while signing up, did you really give meaningful consent? Many philosophers and legal scholars argue that true consent requires that you know what you’re agreeing to. That’s almost impossible when data collection is invisible, continuous, and designed to be ignored.

There’s also the problem of shared data. Governments and funding agencies increasingly require researchers to place their final datasets into public archives for other scientists to use. Someone who shared data for one study might later find their de-identified information in somebody else’s project, possibly used for a purpose they never imagined, all without a fresh consent conversation. The ethical questions multiply.

Why This Matters to You

You may never see the researcher, but your data can have a long life far beyond your screen.

Internet research isn’t just about faraway scientists. When a game company tests different versions of a feature to see which keeps you playing longer, or when a social platform studies how you scroll to improve its advertising, they are conducting research on you. Most companies have no formal ethics review. Their experiments can shape your moods, your friendships, and your beliefs — often without your knowledge.

The rules lag behind the technology. In a 2020 survey, fewer than a quarter of IRBs in the United States felt prepared to handle the ethical puzzles of “big data” research that involves millions of people’s online traces. Meanwhile, more and more of your life, from your location to your heartbeat on a smartwatch to your voice commands, becomes potential research data.

Yet you aren’t powerless. The same principles that emerged after Tuskegee — respect, fairness, protection from harm — still matter. Figuring out how to apply them to a world of invisible data collection is one of the great ethical challenges of our time. The next time you post, message, or even just swipe, you might wonder: who else is watching, and what story are they telling with my life?

Think about it

If you post a funny video on a public platform and a researcher uses it in a study without telling you, is that fair? Why or why not?
A company studies your clicks to show you better ads. Is that different from a university studying your posts to understand teenage friendships? What’s the difference?
Should a new law require everyone doing internet research to get permission from each person whose data they collect — even if the data is already public? What might be the downsides?
