Responsible Conduct in Research:
Research Data Management


Vicky Rampin & Nicholas Wolf


Get this presentation:

https://nyu-dataservices.github.io/RCR-DataManagement

The Problem

Researchers work with a lot of data...

...but how should it be organized?

Most Scientific Research Data From the 1990s Is Lost Forever

Article in The Atlantic

A new study has found that as much as 80 percent of the raw scientific data collected by researchers in the early 1990s is gone forever, mostly because no one knows where to find it.

Disappearing Data

Human Error


Washington Post: "An Alarming Number of Scientific Papers Contain Excel Errors"

The Solution

Spell out in detail how you will account for this in your grant (Data Management Plan)


Managing the way data is collected, processed, analyzed, preserved, and published for greater reuse by the community and the original researcher.

What is Data?

"the recorded factual material commonly accepted in the scientific community as necessary to validate research findings." -Federal Office of Management & Budget Circular A-110

Federal Regulations

High-Level View of RDM

  • Data Type: format of data to be generated
  • Group Roles: who is primarily responsible for carrying out RDM? Set group norms
  • Data Storage: where will you store your data, and how will you back up your data?
  • Data Archiving: how will you preserve and make your data available to others?

Basically, think to yourself:

if I wanted to use this data in 10 years, what would I need to pack with it to make it useful?

Keep all those things

Documentation with the Open Science Framework

  • Wiki: document your lab procedures, standards, etc.
  • Collaborators: add collaborators of all levels, on different parts of your project
  • Components: sub-projects to organize your research
  • Version Control: upload files of the same name & OSF will track your versions!
  • Add-Ons: use OSF to bring together tools you use, such as GitHub
  • Registrations: when you have an unchanging version of your project, register it & get a DOI!

Documentation with Jupyter Notebooks

  1. Web Application
    • in-browser code editing: syntax highlighting & indentation
    • run code in-browser: results attached to parent code
    • display results in LaTeX, HTML, SVG, & more
  2. Notebook
    • a complete record of a session, interleaving code with text, maths, & objects
    • can export to LaTeX, PDF, slideshows, a webpage, etc.

Documenting Local Files

Storage Rules!

NYU Storage Resources

Feature | NYU Google Drive | NYU Box | NYU Research Workspace
Intended use | General data use requiring password access | General data, including sensitive or secure data | High-capacity data storage
Storage size | Unlimited | Unlimited | 2 TB
Sharing and user control | Yes | Yes | Yes
Versioning and file change tracking | Yes | Some | Snapshots of files
Funder requirements | Moderate risk security | High risk security | U.S. based data location

Anonymizing Data

  • Anonymizing data:
    • Direct identifiers (name, DOB, SSN, address, id numbers, etc.)
    • Indirect identifiers (variables in combination that enable identification)

  • Solutions:
    • Removal of identifying variables
    • Binning values/top coding (i.e. hide unique outlier values or aggregate values)
    • Disturbing/perturbing (add small random values to each recorded value, so individual records change but the overall statistics stay accurate)
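
The three solutions above can be sketched in a few lines of Python. This is only a minimal illustration, not a vetted de-identification procedure: the field names (`name`, `ssn`, `age`, `income`), the cap, the bin width, and the noise scale are all hypothetical choices made for the example, and real projects should follow their IRB and funder guidance.

```python
import random

# Hypothetical records: every field name and value is invented for illustration.
RECORDS = [
    {"name": "A. Smith", "ssn": "000-00-0001", "age": 34, "income": 52_000},
    {"name": "B. Jones", "ssn": "000-00-0002", "age": 61, "income": 900_000},
]

# Assumed set of direct identifiers for this made-up dataset.
DIRECT_IDENTIFIERS = {"name", "ssn"}


def remove_identifiers(row, identifiers=DIRECT_IDENTIFIERS):
    """Removal: drop the columns that identify a person directly."""
    return {key: value for key, value in row.items() if key not in identifiers}


def top_code(value, cap):
    """Top coding: replace outlier values above `cap` with the cap itself."""
    return min(value, cap)


def bin_value(value, width):
    """Binning: report a range such as '30-39' instead of the exact value."""
    low = (value // width) * width
    return f"{low}-{low + width - 1}"


def perturb(value, scale, rng):
    """Disturbing: add zero-mean Gaussian noise with standard deviation `scale`."""
    return value + rng.gauss(0, scale)


def anonymize(rows, seed=0):
    """Apply all three steps to each row; the seed only makes the demo repeatable."""
    rng = random.Random(seed)
    cleaned = []
    for row in rows:
        out = remove_identifiers(row)
        out["age"] = bin_value(out["age"], width=10)
        capped = top_code(out["income"], cap=150_000)
        out["income"] = round(perturb(capped, scale=1_000, rng=rng))
        cleaned.append(out)
    return cleaned


if __name__ == "__main__":
    for row in anonymize(RECORDS):
        print(row)
```

Removing direct identifiers alone does not deal with indirect identifiers, which is why the sketch also bins age and top-codes income before adding noise.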

Long Term Storage

Choose what you want to preserve and be able to access in the long term, but no matter what, make sure you keep:

  • documentation (lab/field notebooks, etc.)
  • tools & analysis

Put your data into an archival format!

  • open + accessible
  • software agnostic
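
As one possible illustration of an open, software-agnostic format, the sketch below writes a table to plain UTF-8 CSV and saves a small JSON file next to it with column descriptions and a checksum. The function name `archive_table`, the file naming, and the metadata fields are assumptions made for this example, not a standard required by any repository.

```python
import csv
import hashlib
import json
from pathlib import Path


def archive_table(rows, columns, out_dir, stem="dataset"):
    """Write `rows` to CSV plus a JSON sidecar describing the file.

    rows    -- list of dicts, one per record, with exactly the keys in `columns`
    columns -- dict mapping each column name to a human-readable description
    out_dir -- folder to write into (created if missing)
    stem    -- hypothetical base name for the two output files
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Plain-text CSV can be opened by nearly any tool, now or in ten years.
    csv_path = out_dir / f"{stem}.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(columns))
        writer.writeheader()
        writer.writerows(rows)

    # A checksum lets a future user confirm the file has not been altered.
    checksum = hashlib.sha256(csv_path.read_bytes()).hexdigest()

    # The sidecar keeps the documentation together with the data.
    metadata = {
        "file": csv_path.name,
        "sha256": checksum,
        "row_count": len(rows),
        "columns": columns,
    }
    meta_path = out_dir / f"{stem}.metadata.json"
    meta_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return csv_path, meta_path


if __name__ == "__main__":
    # Made-up example data.
    example_rows = [{"site": "A", "temp_c": 21.5}, {"site": "B", "temp_c": 19.0}]
    example_columns = {
        "site": "Field site label",
        "temp_c": "Air temperature in degrees Celsius",
    }
    print(archive_table(example_rows, example_columns, "archive_demo"))
```

Keeping the column descriptions beside the data addresses the "what would I need to pack with it" question for the simplest case of a single table.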

Archival Storage in Repositories

When you publish, you should make the underlying data available in a repository that issues DOIs! You then link that DOI in your "Supplementary Materials" section!

This means that anyone who wants to use your data must go to this repository, download it, and cite their use if they publish using it!



Example: Dryad Data Repository

Advantages to Tracking Citations:

  • Demonstrate to funders/promotion committees you & your data make big impacts in your field!
    • they judge proposals on intellectual merit and wider impact
    • tangible evidence to weigh against the cost of research

  • Monitor usage of datasets!
    • You can know what forms of data prep and data publication are most effective for sharing/open science!
    • Uncover opportunities for collaboration amongst peers

Getting Credit for Your Data

Data Management To-Do List

1. Create a Researcher Identity

Open Researcher & Contributor ID

  • free! persistent identifier for researchers (think DOI)
  • link all your publications to you rather than someone with your same name!
  • many journals are asking for an ORCID upon submission of materials

Do you have one? No? Let’s get you one at ORCID.org!


2. Get a Home for Your Research

Open Science Framework

  • Wiki for documentation!
  • Collaborators of all levels, on different parts of your project!
  • Components: sub-projects to organize your research!
  • Add-Ons: use OSF to bring together tools you use!

3. Know What Data Management Funders Want

Applying Best Practices

Data Management Plans

A document that describes how you will collect, organise, manage, store, secure, back up, preserve, and share your data.

From NSF’s Data Management Plan Guidelines:

  1. the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project;
  2. the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies);
  3. policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements;
  4. policies and provisions for re-use, re-distribution, and the production of derivatives;
  5. plans for archiving data, samples, and other research products, and for preservation of access to them.

Example Data Management Plans

Thank you! Questions?


Email us: vicky.rampin@nyu.edu or nicholas.wolf@nyu.edu

Learn more about RDM: guides.nyu.edu/data_management

Get this presentation: guides.nyu.edu/data_management/resources

Make an appointment: guides.nyu.edu/appointment