Estimating costs for data management and sharing

Information and tools for understanding and estimating costs involved in managing and sharing data

Investigators may include costs involved in managing and sharing data according to their Data Management and Sharing (DMS) Plan. Currently, there is no fee for sharing data through the Open Data Commons for datasets under 30 GB in size.

Guidance from NIH NOT-OD-21-015 on allowable costs: Reasonable, allowable costs may be included in NIH budget requests when associated with:

  1. Curating data and developing supporting documentation, including formatting data according to accepted community standards; de-identifying data; preparing metadata to foster discoverability, interpretation, and reuse; and formatting data for transmission to and storage at a selected repository for long-term preservation and access.

  2. Local data management considerations, such as unique and specialized information infrastructure necessary to provide local management and preservation (e.g., before deposit into an established repository).

  3. Preserving and sharing data through established repositories, such as data deposit fees necessary for making data available and accessible. For example, if a Data Management and Sharing Plan proposes preserving and sharing scientific data for 10 years in an established repository with a deposition fee, the cost for the entire 10-year period must be paid prior to the end of the period of performance. If the Plan proposes deposition to multiple repositories, costs associated with each proposed repository may be included.

How should these costs be estimated? In our experience, researchers tend to underestimate the time and effort involved in managing and sharing data. Depending on the institution and on the size, nature, and complexity of the data, the major cost is usually personnel rather than storage or access. If you do not have a dedicated data steward in your lab, make sure you budget for the personnel needed to manage and share the data.
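As a rough illustration, the three allowable cost categories listed above (curation effort, local data management, and repository fees) can be combined into a simple budget estimate. The sketch below is only an assumption about how one might organize the arithmetic: the names `PersonnelItem` and `estimate_dms_budget`, and every number in the usage example, are hypothetical placeholders and not figures from NIH or the Open Data Commons. It reflects two points from the text: personnel time is budgeted per project year, and repository fees for the whole retention period are counted within the award because they must be paid before the period of performance ends.

```python
from dataclasses import dataclass


@dataclass
class PersonnelItem:
    """One person's data management effort (all values are hypothetical)."""
    role: str
    hours_per_year: float
    hourly_rate: float  # fully loaded rate, in the budget's currency


def estimate_dms_budget(personnel, project_years, annual_local_cost,
                        deposit_fee_per_year, retention_years):
    """Return a breakdown of estimated data management and sharing costs.

    personnel            -- list of PersonnelItem entries
    project_years        -- length of the period of performance
    annual_local_cost    -- local storage/infrastructure cost per project year
    deposit_fee_per_year -- repository fee per year of preservation (0 if free)
    retention_years      -- years the data will be preserved in the repository
    """
    # Personnel effort recurs in every year of the project.
    personnel_cost = project_years * sum(
        p.hours_per_year * p.hourly_rate for p in personnel
    )
    # Local management costs apply only while the project is running.
    local_cost = project_years * annual_local_cost
    # Repository fees cover the whole retention period and are budgeted
    # up front, since they must be paid before the award ends.
    repository_cost = deposit_fee_per_year * retention_years
    return {
        "personnel": personnel_cost,
        "local": local_cost,
        "repository": repository_cost,
        "total": personnel_cost + local_cost + repository_cost,
    }


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    staff = [PersonnelItem("data steward", hours_per_year=200, hourly_rate=50.0)]
    budget = estimate_dms_budget(staff, project_years=5, annual_local_cost=500.0,
                                 deposit_fee_per_year=100.0, retention_years=10)
    for category, amount in budget.items():
        print(f"{category}: {amount:,.2f}")
```

Even with modest placeholder values, personnel dominates the total, which matches the observation above that staff time, not storage, is usually the largest cost.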

Some resources that can help:

Cost drivers for data, adapted from the National Academies of Sciences, Engineering, and Medicine report Life-Cycle Decisions for Biomedical Data

