The Problem

Making data widely available to researchers is good policy. It enables replication and validation of scientific findings and maximizes return on research investment. For these reasons, sponsors and publishers expect or mandate the sharing of data where possible.

However, data containing sensitive information about individuals cannot be shared openly without appropriate safeguards. An extensive body of statutes, regulations, institutional policies, consent forms, data sharing agreements, and common practices govern how sensitive data should be used and disclosed in different contexts.

Researchers and institutions that manage and share data must interpret how the various legal requirements and other data privacy and security standards constrain their handling of a given dataset. DataTags helps researchers navigate these complex issues.

How DataTags Works

DataTags automates the assessment of the data handling rules that apply to an individual dataset through a three-step process: a questionnaire, an assessment, and tag assignment. The output is a set of tags that describe how the dataset can be stored, transmitted, or used over time.

Tag Levels

The DataTags prototype generates a set of tags based on the model data classification levels described in the Harvard Research Data Security Policy. The tag levels, a draft version of which appears below, describe the data security and other handling requirements that must be implemented based on the legal restrictions and privacy risks associated with a dataset.

Risk Levels and Associated Tags

These tags denote the minimum handling requirements based on the risks associated with a dataset.

| Level | DUA Agreement Method | Authentication | Transit | Storage | Description |
|---|---|---|---|---|---|
| Blue | None | None | Clear | Clear | Non-confidential information that can be stored and shared freely |
| Green | None | Email or OAuth | Clear | Clear | Potentially identifiable but not harmful personal information, shared with some access control |
| Yellow | Click Through | Password | Encrypted | Clear | Potentially harmful personal information, shared with loosely verified and/or approved recipients |
| Orange | Sign | Password | Encrypted | Encrypted | May include sensitive, identifiable personal information, shared with verified and/or approved recipients under agreement |
| Red | Sign | Two Factor | Encrypted | Encrypted | Very sensitive identifiable personal information, shared with strong verification of approved recipients under signed agreement |
| Crimson | Sign | Two Factor | Double Encryption | Double Encryption | Requires explicit permission for each transaction, using strong verification of approved recipients under signed agreement |

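The tag levels above can also be written down as a small lookup structure. The following is only a sketch of one way to do that in Python: the names `Level`, `Handling`, `REQUIREMENTS`, and `requirements_for` are hypothetical and are not taken from the DataTags code base. The cell values are copied from the draft table, and the numeric ordering simply follows the table from least to most restrictive.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Tag levels in table order, least to most restrictive (hypothetical encoding)."""
    BLUE = 1
    GREEN = 2
    YELLOW = 3
    ORANGE = 4
    RED = 5
    CRIMSON = 6


@dataclass(frozen=True)
class Handling:
    """Minimum handling requirements for one level (field names are assumptions)."""
    dua: str             # "DUA Agreement Method" column
    authentication: str  # "Authentication" column
    transit: str         # "Transit" column
    storage: str         # "Storage" column


# Values copied from the draft table; the dictionary itself is illustrative.
REQUIREMENTS = {
    Level.BLUE: Handling("None", "None", "Clear", "Clear"),
    Level.GREEN: Handling("None", "Email or OAuth", "Clear", "Clear"),
    Level.YELLOW: Handling("Click Through", "Password", "Encrypted", "Clear"),
    Level.ORANGE: Handling("Sign", "Password", "Encrypted", "Encrypted"),
    Level.RED: Handling("Sign", "Two Factor", "Encrypted", "Encrypted"),
    Level.CRIMSON: Handling(
        "Sign", "Two Factor", "Double Encryption", "Double Encryption"
    ),
}


def requirements_for(level: Level) -> Handling:
    """Look up the minimum handling requirements for a tag level."""
    return REQUIREMENTS[level]
```

Using `IntEnum` makes levels directly comparable, so a check such as `Level.RED > Level.ORANGE` expresses "Red is more restrictive than Orange" without extra code.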
Step 1
Questionnaire

The user answers a series of questions designed to elicit the key properties of a given dataset.

Step 2
Assessment

Based on the user’s responses, DataTags applies inference rules to determine which handling requirements are relevant to the dataset.

Step 3
Tag Assignment

DataTags generates simple, iconic tags that indicate how the dataset can be stored, transmitted, or used based on its properties and the applicable restrictions.
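The three steps can be pictured as a small pipeline. The sketch below is a toy illustration in Python, not the DataTags rule set: the question keys, the `RULES` list, and the functions `assess` and `assign_tag` are all invented here, loosely following the level descriptions in the table. The real questionnaire and inference rules are derived from the applicable laws and policies and are considerably more detailed.

```python
from typing import Mapping

# Level names from the tag table, least to most restrictive.
LEVELS = ["Blue", "Green", "Yellow", "Orange", "Red", "Crimson"]

# Step 2 input: hypothetical inference rules mapping a "yes" answer to the
# minimum level it requires. These are illustrative, not the DataTags rules.
RULES = [
    ("has_personal_info", "Green"),
    ("potentially_harmful", "Yellow"),
    ("sensitive_identifiable", "Orange"),
    ("very_sensitive", "Red"),
]


def assess(answers: Mapping[str, bool]) -> str:
    """Step 2: apply every rule and keep the most restrictive level triggered."""
    level = "Blue"  # default when no rule fires
    for question, required in RULES:
        if answers.get(question, False):
            if LEVELS.index(required) > LEVELS.index(level):
                level = required
    return level


def assign_tag(answers: Mapping[str, bool]) -> dict:
    """Step 3: package the assessed level as a tag for the dataset."""
    return {"level": assess(answers)}


# Step 1: answers a user might give to the (hypothetical) questionnaire.
answers = {"has_personal_info": True, "potentially_harmful": True}
tag = assign_tag(answers)
```

Taking the most restrictive level among all triggered rules reflects the idea that the tags denote minimum handling requirements: a dataset that meets several conditions must satisfy the strictest of them.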

DataTags is being designed to integrate with the open source data repository software Dataverse and its suite of access controls and statistical analysis tools. It will also operate as a standalone tool and as an application that can be integrated with other platforms.

Research

DataTags is an open source tool being developed by the Privacy Tools for Sharing Research Data project at Harvard University.

The goal of this broad, multidisciplinary collaboration between the Center for Research on Computation and Society, the Institute for Quantitative Social Science, the Berkman Center for Internet & Society, and the Data Privacy Lab is to help enable the collection, analysis, and sharing of personal data for research while providing privacy for individual subjects.

More information about the development of the DataTags prototype is available from IQSS.