Social Media Data Stewardship


Interdisciplinary academic social media research is a rapidly growing field.

For many researchers, social media data (both user- and system-generated) is a rich source of behavioral data. It can reveal how we communicate and interact with each other online, and what that might mean for our society as we speed towards an increasingly computer-mediated future. That future holds many promises, but also some perils. In the aftermath of the much-debated 2014 Facebook mood experiment, questions about how industry and academic researchers should handle and use social media data are more relevant than ever.

There is still something of a Wild West mentality around handling and using social media data. Many key questions about proper data management processes remain unsettled. For example, what can and can’t you do with social media data? When is it appropriate for a researcher to name a person in their dataset, and when is it not? As part of my recent appointment as a Canada Research Chair in Social Media Data Stewardship, I aim to examine these and related questions in detail and to conduct studies that will help settle some of them.

So what exactly is Social Media Data Stewardship? It is a new concept that I am proposing. It covers the many data management processes that social media researchers must navigate today when collecting, storing, analysing, visualizing, publishing and reusing social media data. A working definition of ‘Social Media Data Stewardship’ is given below.


To understand the new concept of ‘Social Media Data Stewardship’, we must first understand what we mean by ‘data management processes’. These are processes such as collection, retrieval, re-use, sharing, archiving, preservation and disposal. They are often grouped under a recently emerged umbrella term, data curation, or under an even broader concept, data stewardship. The key functions of data curation are to “enable data discovery and retrieval, maintain its quality, add value, and provide for reuse over time”. Data stewardship expands the focus of data curation to also include preservation and long-term data management (Lazorchak, 2011).

The Archives, Library and Information Science field has been actively tackling issues of data and information management for decades. The rapid accumulation of scientific research data (primarily in the “hard” sciences) has brought questions about the stewardship of research data to the forefront of the research community, including recent work by Palmer and by Chao. However, questions about the stewardship of social media data, and the related ethical considerations and implications, remain largely unanswered and understudied.

Defining Social Media Data Stewardship (SMDS)

To offer a common framework for handling social media data and discussing the data- and user-driven challenges associated with it, I propose expanding the original notion of data stewardship and applying it to the management of social media data. Here is a working definition:

Social Media Data Stewardship (SMDS) is a set of data- and user-driven principles to guide all aspects of managing social media data including its collection, storage, analysis, publication, reuse, sharing and preservation. 

Social Media Data + Data Stewardship = Social Media Data Stewardship: processes related to all aspects of managing social media data, including its collection, storage, analysis, publication, reuse, sharing and preservation.

To study SMDS, we at the Social Media Lab are launching a new 5-year initiative whose goal is to develop a social media data stewardship framework to inform the future development of digital research infrastructure. Because SMDS is a multi-faceted notion, getting a full picture requires studying different stakeholders and their practices:

  1. Data consumers (researchers, policy and decision makers working in the private and public sectors);
  2. Data producers (social media users);
  3. Data intermediaries (social media platforms).

Table 1 below outlines some of the initial research questions and dimensions we plan to study as part of this initiative. We hope that the resulting SMDS framework will allow both data consumers and producers to unlock the full potential of social media data while still considering the ethical implications of using available data.

To follow the initiative’s updates and to join our growing community of researchers interested in this subject, please contact us or visit our new website for this initiative:




Table 1: Initial research questions and dimensions
Collection
• Where and how is data being collected?
• How should social media data be properly sampled?
• How can historical data be accessed and collected?
• Should researchers collect informed consent from social media users even when they are working with publicly available social media data? If so, under what circumstances?
• If yes, how should researchers go about collecting consent, especially from anonymous users?
• What API rate limits do social media platforms set for data collection?
Storage/ Preservation
• What are efficient data structures for storing and later retrieving such multidimensional datasets?
• How do we preserve social media data that spans time, space, formats and platforms?
• What are effective metadata schemas for representing social media data?
• What are the main strategies and ethical considerations when preserving social media data?
• Should social media users whose data is being preserved be permitted to request deletion of their records? (“the right to be forgotten”)
• Can social media data be stored outside social media platforms, and if so, for how long?
• Should previously collected records be deleted if their creator also deletes them?
Analysis
• How should bots (automatically created accounts) be dealt with? Should they be detected and excluded from the data?
• How should sensitive social network data be handled?
• What are the ethical considerations of studying users and their behavior across multiple social media platforms?
• Are there any types of analysis or data transformation processes that social media platforms do not permit?
• When aggregation and/or anonymization of social media data is required, what methods and techniques should be used?
Publication
• What are the issues associated with publishing the whole dataset?
• Should researchers present data only in aggregated form to avoid linking to individual posts and users?
• Can researchers publish the data they used for the analysis, for example, as part of a journal publication?
Sharing/ Reuse
• What are effective mechanisms, tools, strategies and repositories for sharing large and dynamic social media datasets?
• Should only anonymized data be shared with other researchers?
• If the original collection required users’ consent, should social media users be re-contacted for permission to reuse their data in a different study or project?
• Can researchers share the data they have collected with other researchers?
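Several of the questions above (anonymization, aggregated publication, deletion by the creator) have a concrete data-handling side. The Python sketch below is offered purely as an illustration. It is not a method proposed by this initiative. It makes the following assumptions:

* The record fields (`post_id`, `user_id`, `created_at`), the secret key, and the suppression threshold of 5 are all hypothetical.
* Posts whose IDs appear in a deletion list are dropped.
* Platform user IDs are replaced with keyed hashes (HMAC-SHA256).
* Only daily post counts are released, and days with fewer than a minimum number of posts are suppressed.

Keyed hashing is pseudonymization, not anonymization. Post text, timestamps and network structure can still re-identify users, which is why the shared records here leave out the text.

```python
import hashlib
import hmac
from collections import Counter
from datetime import datetime

# Hypothetical secret held only by the research team; never shared with the data.
SECRET_KEY = b"replace-with-a-long-random-secret"


def pseudonymize(user_id: str, key: bytes = SECRET_KEY) -> str:
    """Map a platform user ID to a stable token using HMAC-SHA256.

    The same ID always yields the same token, so per-user analysis still works,
    but the token cannot be reversed by hashing guessed usernames without the key.
    """
    digest = hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # truncated for readability


def prepare_for_sharing(posts, deleted_ids=frozenset()):
    """Drop posts their creators deleted, pseudonymize authors, and omit post text."""
    return [
        {"author": pseudonymize(p["user_id"]), "created_at": p["created_at"]}
        for p in posts
        if p["post_id"] not in deleted_ids
    ]


def aggregate_by_day(posts, min_count=5):
    """Count posts per calendar day, suppressing days with fewer than min_count posts."""
    counts = Counter(
        datetime.fromisoformat(p["created_at"]).date().isoformat() for p in posts
    )
    return {day: n for day, n in sorted(counts.items()) if n >= min_count}
```

The hash is keyed, so an outsider without the key cannot recover a user by hashing candidate usernames. The research team can still link one user's posts across the dataset. The small-cell threshold reflects the "aggregated form only" option raised under Publication. The right value would depend on the study.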