Chapter 16: Social media research

Danielle Berkovic

Learning outcomes

Upon completion of this chapter, you should be able to:

  • Understand the three different typologies of social media research.
  • Identify the steps and key considerations in undertaking social media research.
  • Articulate the ethical and legal challenges associated with social media research.


What is social media?

Social media are online platforms that allow individuals, communities and organisations to collaborate, connect, interact and build community by enabling them to create, co-create, modify, share and engage with user-generated content that is easily accessible.1 Examples of social media include Facebook, Twitter (now known as X), Instagram, Pinterest, LinkedIn and Reddit. The global scale and use of social media are huge: 57 per cent of the world’s population uses social media (4.48 billion people) and 99 per cent of these users access social media by smartphone.2 Social media connectivity is growing: between 2020 and 2021, 520 million more people joined at least one social media platform.3 Global populations using social media represent a large and as yet mostly untapped source from which researchers can download, analyse and interpret data on some of the world’s most pressing research agendas.

Types of social media studies

There are no specific study designs for social media research, but researchers have developed three typologies: extant, elicited and enacted.4 The three typologies are described in more detail below, and Table 16.1 provides examples of social media research based on each typology.

  • Extant: Social media research that is extant seeks to use existing data through unobtrusive observation. The key idea is to observe, so there is no direct contact between participants and the researcher. Example data types include existing materials such as blog posts, tweets, or Instagram photos.4
  • Elicited: Social media research that is elicited seeks to use data from participants in response to the researcher. The key idea is to interact, so there is interaction between >1 consenting participant and the researcher. Example data types include responding to a researcher-initiated blog post, tweet, or Instagram photo.4
  • Enacted: Social media research that is enacted seeks to use data generated with participants during the study. The key idea is to co-develop, so there is collaboration between >1 participant and the researcher. Example data types include creative interaction, vignettes, problem-centred or scenario interviews using video or text, chat or messaging features. Of note, enacted studies are more likely to be mixed methods in design.4

How to undertake a social media study

Given the variability in social media platforms and the type of data that can be collected and analysed, there are as yet no standard guidelines on how to conduct a social media study. A mixed-methods framework designed by Andreotta et al5 offers one approach to collecting and analysing social media data in four phases.

Phase 1: Harvest social media and compile a corpus

Researchers can use automated tools to find social media data, extract this data and compile it into a corpus (a collection of written texts). Researchers might search for content posted in a particular time frame, containing specific content or posted by users with certain characteristics. This all depends on the research question and its aims and objectives.
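As a minimal sketch of the filtering step in this phase, the Python example below assumes that posts have already been exported from a platform into simple records. The Post class, its field names, the compile_corpus function and the example records are all hypothetical illustrations, not part of the Andreotta et al framework or of any platform’s tools.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Post:
    """One harvested post; the field names are hypothetical."""
    post_id: str
    posted_on: date
    text: str


def compile_corpus(posts, start, end, keywords):
    """Keep posts dated within [start, end] that mention any keyword."""
    lowered = [k.lower() for k in keywords]
    return [
        p for p in posts
        if start <= p.posted_on <= end
        and any(k in p.text.lower() for k in lowered)
    ]


# Invented example records, not real data.
harvested = [
    Post("a1", date(2020, 3, 25), "Arthritis flare while staying at home"),
    Post("a2", date(2020, 5, 2), "Arthritis clinic reopened today"),
    Post("a3", date(2020, 4, 1), "Baking bread again"),
]

corpus = compile_corpus(
    harvested,
    start=date(2020, 3, 20),
    end=date(2020, 4, 20),
    keywords=["arthritis"],
)
print([p.post_id for p in corpus])  # only "a1" is in the time frame and on topic
```

The time frame and keywords would come from the research question; filtering by user characteristics could be added in the same way if those fields were available and permitted under the platform’s terms and conditions.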

Phase 2: Use data science techniques to compress the corpus along a dimension of relevance

Although researchers may want to examine the entire dataset, it is often more practical to focus on a subset of the data. Researchers can use data science techniques (e.g. algorithms, topic modelling) or manual qualitative methods (e.g. by narrowing the search strategy) to identify a representative subset of data to analyse.
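The idea of compressing a corpus along a dimension of relevance can be illustrated with a deliberately simple stand-in for topic modelling: a keyword-overlap score. This is a sketch only; the relevance and compress functions, the query terms, the threshold and the example posts are assumptions made for illustration, and a real study would more likely use an established topic-modelling tool.

```python
import re


def relevance(text, query_terms):
    """Fraction of (lowercase) query terms that appear in the text."""
    tokens = set(re.findall(r"[a-z0-9']+", text.lower()))
    return sum(term in tokens for term in query_terms) / len(query_terms)


def compress(corpus, query_terms, threshold):
    """Keep texts whose score meets the threshold, most relevant first."""
    scored = [(relevance(text, query_terms), text) for text in corpus]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in kept]


# Invented example posts, not real data.
posts = [
    "My arthritis pain is worse during lockdown",
    "Lockdown baking",
    "Arthritis pain tips",
]
query = ["arthritis", "pain", "lockdown"]

subset = compress(posts, query, threshold=0.5)
print(subset)  # the first and third posts; "Lockdown baking" scores too low
```

Lowering the threshold corresponds to the more lenient strategy a researcher might adopt if the compressed corpus turns out to be too narrow.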

Phase 3: Extract a subset of data from the most relevant spaces of the corpus

Once the data from phase 2 has been identified, researchers can extract the data most relevant for answering the research question. This may include all data in the compressed corpus, but the researcher may choose to randomly sample from the corpus if it is too large to analyse. If the data is perceived to be too narrow, the researcher can revisit phase 2 and adopt a more lenient search strategy.
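Random sampling from a compressed corpus might look like the following sketch. The draw_sample function, the sample size and the seed are hypothetical choices; fixing the seed is simply one way to make the sample reproducible for co-investigators or reviewers.

```python
import random


def draw_sample(corpus, n, seed):
    """Return n items chosen at random without replacement.

    If the corpus has n items or fewer, the whole corpus is returned.
    """
    if n >= len(corpus):
        return list(corpus)
    rng = random.Random(seed)  # local generator, so the draw is repeatable
    return rng.sample(corpus, n)


# Invented placeholder items standing in for posts.
corpus = [f"post-{i}" for i in range(10)]

sample = draw_sample(corpus, n=3, seed=42)
print(len(sample))  # 3
```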

Phase 4: Perform a qualitative analysis on this corpus of data

The final phase involves performing a qualitative analysis to address the research question, aims and objectives.

What should be considered before conducting social media research?

Franz et al have outlined several concepts to consider prior to conducting social media research:6

  • Participants: What kinds of participants, if any, will be included in the study? This generally applies only to the enacted social media typology. The researcher needs to consider the user characteristics of various social media platforms; for example, Facebook users tend to be older than Instagram users.
  • Social media platform: There is a plethora of social media platforms, and each contains different types of data. Facebook, for example, collects a combination of public and private information about individual users. Twitter’s advanced search filters can be used to select desired variables and data about social media users. Many Instagram users have a ‘hidden’ profile that is inaccessible to researchers who are not ‘friends’ with them. Some platforms may also prohibit researchers from using their data; the words ‘no research’ may be indicated. These factors need to be considered when deciding what data is needed to address the research question.
  • Data analysis: Depending on the size of the dataset, researchers may prefer a manual or an automated approach to coding and data analysis. Content, framework and thematic analysis, as described in Chapters 23–25, are methods often used by researchers to analyse social media research data.
  • Data protection: ReCODE Health7 is a US-based online resource that helps researchers navigate ethical issues in social media research, which are outlined in the following section.

Ethical and legal challenges associated with social media research

Several challenges are associated with social media research. First, social media users are generally unaware that their data could be used by researchers, which raises questions of informed consent.8 Second, the terms and conditions of social media platforms define the relationship between the platform and its users, but these legal agreements are often long and complex, and it is questionable whether users understand them.9 Third, the terms and conditions of social media platforms change frequently and are likely to change while the researcher’s work is underway. It is essential to check this information regularly to ensure that the research being undertaken does not violate these rules.10 At a minimum, the researcher must apply for ethical approval from their institution’s ethics committee to conduct a social media study. Such a committee will require answers to the following questions:

  • What action or process has the researcher implemented to ensure that the data used is published by a reliable source?
  • Is it possible to verify that social media users are the correct age or gender identity for the proposed study, and that they have the conditions, experiences or circumstances stated in their profiles?
  • How will the data be extracted from the platform? It is important to be mindful when publishing research findings, as it may be possible to reverse-identify individual users.
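One way to lower the risk of reverse identification when publishing is to report coded data in aggregate rather than quoting posts verbatim. The sketch below shows this idea in Python; the summarise_themes function, the post identifiers and the theme labels are hypothetical, and aggregation alone does not guarantee that individuals cannot be identified.

```python
from collections import Counter


def summarise_themes(coded_posts):
    """Map each theme to (count, percentage of all coded posts).

    coded_posts is a list of (post_id, theme) pairs. Post identifiers
    and post text are deliberately left out of the summary.
    """
    counts = Counter(theme for _, theme in coded_posts)
    total = sum(counts.values())
    return {
        theme: (n, round(100 * n / total, 1))
        for theme, n in counts.most_common()
    }


# Invented example codes, not real data.
coded = [
    ("p1", "advice sharing"),
    ("p2", "personal stories"),
    ("p3", "advice sharing"),
    ("p4", "advice sharing"),
]

summary = summarise_themes(coded)
print(summary)  # counts and percentages only, with no quoted posts
```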

Advantages and disadvantages of social media research

There are several strengths and limitations in conducting social media research.

Advantages of social media research include the ability to:

  • engage with or gain access to populations that are difficult to access through traditional recruitment efforts; for example, highly immunocompromised people, or those living in the most remote communities
  • gain easy access to a large volume of data that would otherwise take researchers months or years to collect themselves
  • collect data quickly through search functions on various social media platforms; for example, using the advanced search function on Twitter

Disadvantages of social media research include:

  • problems in identifying participants – the researcher is never fully able to know who participants are; for example, a social media account may be a bot. At the same time, the researcher is never able to fully de-identify people either
  • users’ ability to delete or modify their content
  • the highly curated nature of social media – posts are often not truly reflective of people’s views and perspectives
  • the monitoring or prohibition of certain social media platforms in some countries; for example, in China and Iran. This limits data collection from some parts of the world. Governments in Australia are also poised to ban TikTok from devices issued to staff, which means that obtaining data from some populations may also be restricted

Table 16.1. Examples of social media studies

Article title

  • Extant: Tweets by people with arthritis during the COVID-19 pandemic: content and sentiment analysis11
  • Elicited: Barriers to managing fertility: findings from the understanding fertility management in contemporary Australia Facebook discussion group12
  • Enacted: Exploring the relationship between perceptions of social capital and enacted support online13

First author, year

  • Extant: Berkovic, 2020
  • Elicited: Holton, 2016
  • Enacted: Stefanone, 2012

CC Licence

  • Extant: CC BY 4.0
  • Elicited: CC BY 2.0

Aim(s) and objective(s)

  • Extant: ‘To identify proxy topics of importance for individuals with arthritis during the COVID-19 pandemic, and to explore the emotional context of these tweets by people with arthritis during the early phase of the pandemic.’ [Abstract]
  • Elicited: ‘To identify public opinion about sexual and reproductive health in Australia’ [Abstract]
  • Enacted: ‘To explore the relationship between perceptions of online relationships and actual, enacted support.’(p455)

Social media platform

  • Extant: Twitter
  • Elicited: Facebook
  • Enacted: Facebook

Statement on the terms and conditions of the social media platform

  • Extant: ‘All data were collected and reported according to the terms and conditions of Twitter, which state that content posted by individuals is publicly available to syndicate, broadcast, distribute, retweet, promote or publish, excluding private information (e.g. home addresses and identity documents). Use of tweets by individuals outside of Twitter can be carried out with no compensation paid to the individual tweeter, as use of Twitter is agreed upon as sufficient compensation.’ [Methods, Design]
  • Elicited: ‘Facebook allows users to determine how much of their personal information is publicly displayed. Profile security settings can be public (e.g. allowing access to the complete profile by any Facebook user) or private (e.g. limiting access of some or all profile information). Before joining the discussion group, participants were asked to ensure that their Facebook privacy settings were consistent with what they wanted to reveal to the group. Participation in the group was voluntary, and participants could withdraw at any time. A request to join the group was taken as informed consent to participate.’ [Methods, Ethics]
  • Enacted: Not included.

Recruitment

  • Extant: No participants were recruited for this study.
  • Elicited: English-speaking women and men aged 18 to 50 years who were Facebook users living in Australia were sought and invited to participate in the online discussion group. ‘From October through December 2013, an advertisement briefly describing the research and discussion group was placed on the Facebook pages of all users meeting the eligibility criteria… The project page provided further details about the research and the participation involved. Those who chose to participate requested to join the group by clicking on a link on the Facebook project page. The group moderator approved requests and sent participants a ‘welcome to the group’ message on Facebook, inviting them to participate in the discussion by posting their responses to questions and to comments from other members.’ [Methods, Recruitment and Procedures]
  • Enacted: Participants were recruited from a communication class at a large US-based university and instructed not to discuss the study with anyone else until the completion of the study (2-week period).

Data collection

  • Extant: ‘Tweets were retrospectively extracted from March 20 to April 20, 2020… The desktop version of the Twitter website (versus the mobile app) was used for data collection for ethical purposes, with only publicly available tweets extracted, rather than through a private login. In addition to the tweets themselves, accompanying data fields were extracted from each tweet using a customised template. Extracted data fields included (where possible): Twitter profile blurb, gender identity of tweeter, country of tweeter, number of likes, number of retweets, number of replies, hashtags used, number of hashtags and use of accompanying photos.’ [Methods, Data Collection]
  • Elicited: ‘In a closed-group moderated discussion, participants responded to questions about how people in Australia attempt to manage three aspects of fertility: (1) avoiding pregnancy, (2) achieving pregnancy, and (3) difficulties conceiving. Non-identifiable demographic information was sought; no personal accounts of fertility management were requested.’
  • Enacted: 49 participants sent requests for instrumental help from their Facebook friends to determine the accessibility of networked resources and online social capital. Each participant was instructed to examine their entire Facebook friend network and to think about their 6 strongest and 6 weakest relationships on this site. Then they were required to record the identities and contact information for each of these 12 online friends. Participants completed a brief survey measuring demographic information and their perceptions about a series of relationship characteristics for each of the 12 friends. Finally, they were instructed to send each friend the following message: ‘Hey, [First Name] I need your help with a class project I’m working on. I need people to provide labels for a series of online images. I’d really appreciate your help! Please go to [study URL] and take the quick survey and label as many images as you can. Your participation will be a huge help. Thanks!’

Data analysis

  • Extant: Content and sentiment analysis. Content analysis was used to characterise the textual contents of tweets related to arthritis and COVID-19. Sentiment analysis was employed to assess the emotion associated with the theme of each tweet using Glaser and Strauss’s 6 codes for sentiment analysis.
  • Elicited: Thematic analysis using the 4 systematic steps appropriate for focus groups: organising, shaping, summarising and explaining.
  • Enacted: Survey data was analysed using Williams’ Internet Social Capital Scale. Frequency of communication was measured with a 7-point scale. Responses to the message were measured using binary outcomes (yes/no response to the message) and the quality of help provided based on participants’ perceptions.

Data protection

  • Extant: ‘To avoid reverse identification of participants based on their tweets (which can be found through internet searches), tweets analysed in this study are not quoted verbatim. Instead, all data are expressed in aggregate form through descriptive statistics and qualitative syntheses.’
  • Elicited: ‘Before the group began, researchers made decisions about reasonable expectations for privacy, ownership of any data generated and means of moderating the discussion and removing any offensive posts. These expectations were outlined to potential participants on the project’s Facebook page.’ [Methods, Recruitment and Procedures]
  • Enacted: Not stated.

Main findings

  • Extant: Content analysis revealed 7 themes from the tweets: (1) health care experiences, (2) personal stories, (3) links to or advertisements of relevant blogs, (4) discussion of arthritis-related symptoms, (5) advice sharing, (6) messages of positivity and (7) stay-at-home messaging. Sentiment analysis categorised the 7 themes into ‘great’, ‘swell’, ‘so-so’, ‘bad’, ‘wretched’ and ‘no sentiment’ categories.
  • Elicited: Four main themes about fertility management were identified: (1) access, (2) geographical location, (3) knowledge and (4) cost. Participants reported that young people and people from rural areas faced barriers accessing contraception and fertility services. Limited knowledge about sex and reproduction, and the cost of fertility services and contraception were also said to impede effective fertility management.
  • Enacted: Perceptions of online support are associated with actual, enacted support.


Summary

Social media platforms can provide researchers with access to data that can be downloaded, analysed and interpreted to understand some of the world’s most pressing research agendas. There are 3 social media research typologies: extant, elicited and enacted, each involving varying levels of participant involvement. There is no definitive method of conducting social media research, and there are several ethical and legal challenges that researchers need to address.


References

  1. Weinberg BD, Pehlivan E. Social spending: managing the social media mix. Bus Horiz. 2011;54(3):275-282. doi:10.1016/j.bushor.2011.01.008
  2. Yoon S, Wee S, Lee VSY, et al. Patterns of use and perceived value of social media for population health among population health stakeholders: a cross-sectional web-based survey. BMC Public Health. 2021;21(1):1312. doi:10.1186/s12889-021-11370-y
  3. Statista. Number of social media users worldwide from 2017 to 2027 (in billions). Accessed April 13, 2023.
  4. Salmons J. Using social media in data collection: designing studies with the qualitative e-research framework. In: Quan-Haase A, Sloan L, eds. The SAGE Handbook of Social Media Research Methods. 2nd ed. SAGE; 2022:113-125. Accessed September 7, 2023. ISBN-10: 1529720966
  5. Andreotta M, Nugroho R, Hurlstone MJ, et al. Analyzing social media data: a mixed-methods framework combining computational and qualitative text analysis. Behav Res Methods. 2019;51(4):1766-1781. doi:10.3758/s13428-019-01202-8
  6. Franz D, Marsh HE, Chen JI, et al. Using Facebook for qualitative research: a brief primer. J Med Internet Res. 2019;21(8):e13544.
  7. Research Centre for Optimal Digital Ethics Health (ReCODE Health). Collectively Shaping Responsible and Ethical Practices in Digital Health. University of California. Accessed April 13, 2023.
  8. Golder S, Ahmed S, Norman G, et al. Attitudes toward the ethics of research using social media: a systematic review. J Med Internet Res. 2017;19(6):e195. doi:10.2196/jmir.7082
  9. Townsend L, Wallace C. Social Media Research: A Guide to Ethics. University of Aberdeen. Accessed April 19, 2023.
  10. Schneble CO, Favaretto M, Elger BS, et al. Social media terms and conditions and informed consent from children: ethical analysis. JMIR Pediatr Parent. 2021;4(2):e22281. doi:10.2196/22281
  11. Berkovic D, Ackerman IN, Briggs AM, et al. Tweets by people with arthritis during the COVID-19 pandemic: content and sentiment analysis. J Med Internet Res. 2020;22(12):e24550. doi:10.2196/24550
  12. Holton S, Rowe H, Kirkman M, et al. Barriers to managing fertility: findings from the understanding fertility management in contemporary Australia Facebook discussion group. Interact J Med Res. 2016;5(1):e7. doi:10.2196/ijmr.4492
  13. Stefanone MA, Kwon KH, Lackaff D. Exploring the relationship between perceptions of social capital and enacted support online. J Comput Mediat Commun. 2012;17(4):451-466. doi:10.1111/j.1083-6101.2012.01585.x