The ODC has a set of standards designed to ensure data is FAIR (Findable, Accessible, Interoperable, and Reusable). These standards make the data more useful for everyone, including your future self!
We can divide the standards into three types:
Data formatting specifications. What type of files to use and how data should be organized in those files.
Common terminology. A set of terms (e.g., variables) that are pre-defined and allow for all shared data to have a common language. For example, this may include minimal required variables or common data elements.
Metadata standards. Associated information about a dataset is needed to make the data FAIR. This includes a formatted data dictionary and the information needed to generate a digital object identifier (DOI).
ODC supports uploading spreadsheet data in .csv (comma-separated values) file format. You can save your data to .csv format using the most common spreadsheet software, such as Excel.
ODC uses tidy formatting of a spreadsheet or tabular data. The basics are simple: each row is an observation, and each column is a measure, field, or variable. A tidy data format is a great way to make data shareable and understandable by humans and machines!
The following two images illustrate how to create a .csv file with data organized in the tidy format.
Unique subject ID column: ODC is organized around subjects. One of your dataset columns must contain the subject or animal identifier (e.g. Subject_ID). This identifier should be unique for each subject. If subject_1 represents two different animals present in two different experiments, the identifier is not unique.
Columns as Variables: Each column represents a study parameter, outcome measure, field, or variable.
First row lists the Variable names: The first row of the dataset contains the names of the columns.
Subsequent rows as observations: Each row represents a single observation for a single subject.
A subject could have multiple rows. For example, your dataset might include multiple timepoints for each subject, in which case each row might represent a unique observation of a subject at a specific time point. The Subject ID column will help identify all the data for each specific subject in the dataset.
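The tidy layout described above can be sketched in a few lines of Python. This is only an illustration using the standard csv module: the subject identifiers, the Timepoint_days and Score columns, and the helper names are hypothetical and are not part of any ODC tool.

```python
import csv
import io

# Hypothetical example data: two subjects, each measured at two timepoints.
# The column names Timepoint_days and Score are illustrative only.
rows = [
    {"Subject_ID": "rat_01", "Timepoint_days": 7, "Score": 9},
    {"Subject_ID": "rat_01", "Timepoint_days": 14, "Score": 12},
    {"Subject_ID": "rat_02", "Timepoint_days": 7, "Score": 8},
    {"Subject_ID": "rat_02", "Timepoint_days": 14, "Score": 11},
]


def to_tidy_csv(records):
    """Write records as CSV text: header row first, then one row per observation."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def rows_for_subject(csv_text, subject_id, id_column="Subject_ID"):
    """Return every observation (row) that belongs to one subject."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row[id_column] == subject_id]


tidy_text = to_tidy_csv(rows)
print(tidy_text)
print(rows_for_subject(tidy_text, "rat_01"))
```

Each subject appears on two rows, one per timepoint, and the Subject_ID column is what ties those rows together. Note that values read back from a .csv file are text, so numeric columns need to be converted before analysis.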
Column/Variable name requirements (based on best practices):
Keep variable names short. Avoid variable names longer than 64 characters.
Variable names must start with a letter.
Variable names should be intuitive (e.g. use “Date_Birth” instead of “DB”).
Avoid spaces in variable names. Use underscore (“_”) instead.
Avoid special characters except underscores (“_”) and periods (“.”). If you must use special characters, verify the corresponding Variable and data are uploaded correctly.
No duplicated Variable names: Every column header must have a unique name. If two of your columns have the same name, you will receive an error during data upload.
Avoid the use of commas: CSV files use commas to separate the contents of one cell from another. If you use commas in a cell, they may be read as delimiter characters (i.e. cell separators), which can lead to errors in data upload. If you must use commas, always double check that your data is uploaded correctly. Microsoft Excel can also save .csv files in a way that prevents misinterpretation of commas in your data. However, we generally recommend avoiding the use of commas in your dataset altogether.
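If you prepare your data with a script, the naming recommendations above can be checked before upload. The following is a minimal sketch, not the validation that ODC itself performs: the function names and the exact pattern (letters, digits, underscores, and periods, starting with a letter) are assumptions based on the list above.

```python
import re
from collections import Counter

MAX_NAME_LENGTH = 64
# Assumed pattern: letters, digits, underscores and periods, starting with a letter.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_.]*")


def check_variable_names(names):
    """Return a list of human-readable problems found in the column names."""
    problems = []
    for name in names:
        if len(name) > MAX_NAME_LENGTH:
            problems.append(f"{name!r}: longer than {MAX_NAME_LENGTH} characters")
        if not name[:1].isalpha():
            problems.append(f"{name!r}: does not start with a letter")
        elif not NAME_PATTERN.fullmatch(name):
            problems.append(f"{name!r}: contains spaces or special characters")
    for name, count in Counter(names).items():
        if count > 1:
            problems.append(f"{name!r}: duplicated {count} times")
    return problems


def cells_with_commas(rows):
    """Return (row_index, column_name) for every cell value containing a comma."""
    return [
        (index, column)
        for index, row in enumerate(rows)
        for column, value in row.items()
        if "," in str(value)
    ]


headers = ["Subject_ID", "Date Birth", "2nd_score", "Weight", "Weight"]
for problem in check_variable_names(headers):
    print(problem)
print(cells_with_commas([{"Notes": "mild, recovered"}, {"Notes": "none"}]))
```

The checks only report problems; they do not rename anything, so you stay in control of how each variable is fixed.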
A crucial aspect of making data interoperable and reusable is using common definitions for the same things, so that data collected in one study is comparable to data collected by others. For instance, what one researcher defines as "injury severity" should mean the same thing across the research community. However, this is extremely challenging in practice because there is rarely a single way to define what we do in the laboratory. A solution can be common terminologies that serve as reference models and standards for defining data variables (also known as data elements). These specify how to name variables and how to define them and, in some instances, how the variables need to be collected or measured to fulfill those definitions.
The ODC uses different sets of common terminologies depending on the community and the projects supported.
These common terminologies are still in development and are likely to evolve and change over time. We can help you understand and navigate these terminologies. Contact us if you need help!
ODC-SCI community data elements (CoDEs). The ODC-SCI has a set of data elements endorsed by the community board that serves as the minimal required variables necessary for making data public through the ODC-SCI.
There are currently several federally-supported efforts to develop and update common data elements for TBI. Prominent examples are:
PRECISE-TBI CDEs. The PRE Clinical Interagency reSearch resourcE-TBI (PRECISE-TBI) project uses the ODC-TBI as a data-sharing platform. PRECISE-TBI is developing a set of CDEs for pre-clinical TBI research. Those CDEs will be available for use as common terminology for data shared through the ODC-TBI.
TOP-NT TBI CDEs. The Translational Outcomes Project In Neurotrauma (TOP-NT) is a consortium for developing and validating clinically relevant biomarkers for traumatic brain injury (TBI).
The ODC-SCI community board has approved the definition of a set of community data elements or CoDEs and established them as a minimal set of variables required for any dataset to be published through the ODC-SCI with a DOI.
If you get used to including these variables with the following names during the preparation of your data, you will reduce the time to get a DOI!
The list below includes the required variable name (in bold font) and the definition for each CoDE. You can download an ODC data dictionary template with the CoDEs.
Subject_ID: Unique identifiers for each subject in the dataset
Species: Species of the subject
Strain: Strain of the subject
Animal_origin: Vendor or origin of the animal
Age: Age of the subject at start of experiment. If age is available at different timepoints, age is provided at the corresponding time in a corresponding time/timepoint variable
Weight: Weight of the subject at start of experiment. If weight is available at different timepoints, weight is provided at the corresponding time in a corresponding time/timepoint variable
Sex: Sex of the subject
Group: Name or identifier of the experimental group in which the subject was included, if any
Laboratory: Name of laboratory, usually the PI
StudyLeader: Name of person responsible for overseeing project
Exclusion_in_origin_study: Whether the subject was excluded from the study that originated the data. "Total exclusion" if excluded from the entire study; otherwise, specify the experiments or measures from which the animal was excluded, if any. For example, an animal may have been run in behavior but excluded from histological analyses because tissue was lost. Reasons for exclusion can be specified in the Exclusion_reason variable.
Exclusion_reason: Reason for which the subject was excluded from the study that originated the data, as specified in the Exclusion_in_origin_study variable
Cause_of_Death: Cause of death (e.g. perfusion/necropsy, died during surgery, euthanized for health reasons, etc)
Injury_type: Type or model of injury used in the subject (e.g. contusion, complete transection, partial section)
Injury_device: Name of the device used for the injury
Injury_level: Spinal cord level at which the injury was performed including segment (e.g. cervical; C) and number (e.g. C5)
Injury_details: Other details about the injury that might be relevant to understanding the severity and type of injury performed
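Before requesting a DOI, you can check whether your dataset headers include every CoDE listed above. The sketch below is an illustration only: the function name is hypothetical, and exact, case-sensitive matching of the names is an assumption rather than documented ODC behavior.

```python
# CoDE variable names copied from the list above.
REQUIRED_CODES = [
    "Subject_ID", "Species", "Strain", "Animal_origin", "Age", "Weight", "Sex",
    "Group", "Laboratory", "StudyLeader", "Exclusion_in_origin_study",
    "Exclusion_reason", "Cause_of_Death", "Injury_type", "Injury_device",
    "Injury_level", "Injury_details",
]


def missing_codes(headers):
    """Return the CoDE names, in list order, that are absent from the headers.

    Matching is exact and case-sensitive (an assumption for this sketch).
    """
    present = set(headers)
    return [name for name in REQUIRED_CODES if name not in present]


example_headers = ["Subject_ID", "Species", "Strain", "Sex", "Score"]
print(missing_codes(example_headers))
```

An empty result means all the CoDE columns are present; extra columns such as outcome measures are ignored by the check.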
Metadata refers to "data about the data" or information that may not constitute the data itself but provides an understanding of different aspects of the data. For instance, keywords associated with a dataset or the date on which a dataset was uploaded to a repository can be considered part of the metadata that accompanies the data. There are different types of metadata depending on their goal. A data dictionary, as described below, can be considered descriptive metadata that provides definitions and other elements for the content of a dataset. The citation of a dataset (similar to the citation of a paper) provides referencing metadata, and a data reuse license may provide legal metadata. Using standardized metadata increases the Findability and Interoperability of the data resources. The ODCs utilize the following standards.
ORCID. The ODCs support the Open Researcher and Contributor ID or ORCID, a global standard identifier for researchers. Users can link their ODC accounts and profiles to ORCID and use it for identification.
RRIDs. The ODCs support the use of Research Resource Identifiers or RRIDs, a standard identification number for the catalog of scientific tools and resources.
ODC-SCI: SCR_016673
ODC-TBI: SCR_021736
Creative Commons License. All datasets published on the ODC are under the Creative Commons Attribution License (CC-BY 4.0).
ODC Data dictionary. A data dictionary or codebook provides information about the dataset variables. It is one of the most important pieces of information to include with a dataset for anyone who wants to interpret and reuse the data.
ODC narrative summary (abstract). ODC offers a metadata narrative and summary where data owners can provide information about the dataset. This information is unique to each dataset and is essential for archiving, interpretability, and reuse.
A data dictionary, also known as a codebook, provides information about the dataset variables. It is one of the most important pieces of information to include with a dataset for anyone who wants to interpret and reuse the data. Even if you are not planning on releasing your data, it is good data management practice to have data dictionaries for your datasets. You may know now what a variable name means in your spreadsheet (e.g., jtemp_6), but will your PI or colleagues know when you leave the lab? Will you know if you try to reuse the data two years from now? A data dictionary is a critical lab asset that ensures the data that have taken great effort and resources to acquire will not go to waste in the future due to poor documentation.
Data dictionaries can fulfill funding requirements for datasets to be accompanied by proper documentation.
The data dictionary used by the ODC is a .csv file (a comma-separated values file). Learn more about .csv files here.
The file must contain the following column names in the first row:
VariableName: * Variables (i.e. column headers) that appear in the dataset. You must include all of your dataset variables in the data dictionary.
Title: * Title is the full name of the variable when the VariableName contains abbreviations or shorthand. If the VariableName is already a complete name, you can copy and paste the VariableName into the Title entry.
Unit_of_Measure: Units for the variable (if applicable).
Description: * Definitions and descriptions of the variable. The description should explain what the variable represents in enough detail such that a reader can understand the contents of the column in the dataset.
DataType: Specify whether the variable specifically contains Numeric, Categorical, Ordinal, Date, or Free Text data.
PermittedValues: If the variable is not numeric or free text, list all possible values here (e.g. "Male, Female" for the variable "Sex"). If the variable is numeric or free text, you can leave this blank (for numeric variables, use the MinimumValue and MaximumValue columns instead).
MinimumValue: If the variable is numeric, list the Minimum possible value. For example, if you expect a variable to be between 0-100, write 0 for MinimumValue. If there is no minimum value, leave this blank.
MaximumValue: If the variable is numeric, list the Maximum possible value. For example, if you expect a variable to be between 0-100, write 100 for MaximumValue. If there is no maximum value, leave this blank.
Comments: Additional notes such as exclusion criteria, reasons for special values, etc.
VariableName, Title, and Description are always required and cannot be left blank for data dictionary upload; every row in the data dictionary must have a VariableName, Title, and Description. The other columns are optional for upload, but are required for dataset publication.
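The data dictionary layout described above can be checked with a short script before upload. This is a sketch under stated assumptions, not the check ODC runs: the function name and the example rows are hypothetical, and only the rules stated above are tested (all nine columns present, the three always-required fields filled in, and every dataset variable described).

```python
import csv
import io

# Column names copied from the list above.
DICTIONARY_COLUMNS = [
    "VariableName", "Title", "Unit_of_Measure", "Description", "DataType",
    "PermittedValues", "MinimumValue", "MaximumValue", "Comments",
]
ALWAYS_REQUIRED = ["VariableName", "Title", "Description"]


def check_data_dictionary(dictionary_text, dataset_headers):
    """Return a list of problems found in a data dictionary given as CSV text."""
    reader = csv.DictReader(io.StringIO(dictionary_text))
    found = reader.fieldnames or []
    missing_columns = [c for c in DICTIONARY_COLUMNS if c not in found]
    if missing_columns:
        return [f"missing dictionary columns: {', '.join(missing_columns)}"]
    problems = []
    described = set()
    for line_number, row in enumerate(reader, start=2):  # row 1 is the header
        for field in ALWAYS_REQUIRED:
            if not (row[field] or "").strip():
                problems.append(f"row {line_number}: {field} is blank")
        described.add((row["VariableName"] or "").strip())
    for header in dataset_headers:
        if header not in described:
            problems.append(f"dataset variable {header!r} is not in the dictionary")
    return problems


# Hypothetical dictionary: the Weight row has no Title, and Age is not described.
example_dictionary = (
    "VariableName,Title,Unit_of_Measure,Description,DataType,"
    "PermittedValues,MinimumValue,MaximumValue,Comments\n"
    "Subject_ID,Subject identifier,,Unique identifier for each subject,Free Text,,,,\n"
    'Sex,Sex,,Sex of the subject,Categorical,"Male, Female",,,\n'
    "Weight,,g,Weight at start of experiment,Numeric,,0,,\n"
)
print(check_data_dictionary(example_dictionary, ["Subject_ID", "Sex", "Weight", "Age"]))
```

In the example, the PermittedValues entry "Male, Female" is wrapped in double quotes so that its comma is read as part of the cell rather than as a cell separator.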
ODC offers a metadata form where you can provide information about the dataset in a standardized way. You can access the Metadata Editor for each dataset from a dataset view page. This information is unique to each dataset and helps with the interpretability and reuse of the data.
Title: Title that will be displayed on the dataset citation. Note that this will not change the title of the dataset visible within the ODC itself. Please include the species, sex, lesion type and area in your title.
Abstract: The Abstract includes 3 fields: Study Purpose, Data Collected, Data Usage Notes.
Study Purpose: Short description of the overall study purpose that resulted in the dataset.
Data Collected: Summary of what kind of data is included in the dataset and how the data was collected. Please include important experiment parameters (such as experimental model and injury severity) and critical outcome measures.
Conclusions: Summary of conclusions (if any) made with the dataset at the time of dataset publication.
Keywords: Keywords can be added to allow search engines to locate the DOI and dataset citation once the dataset is published. You can add your own keyword or start typing to see the ones ODC has already registered. You can reorder the keywords after they are added by dragging/dropping them in the list.
Provenance / Originating Publication: This section allows for entering publications that are related to the dataset. You can either import the information automatically or introduce it manually.
Import from existing publication: enter the DOI or PMID. Note that we can only import information from some preprint articles. Check the “Import authors as contributors” checkbox to import the publication’s author list automatically as contributors of the dataset (you will have to assign the dataset author and contact author labels to the respective entries after importing). After import, choose to edit the publication entry and fill out the remaining fields: Citation Relevance.
Manual: If you want to enter information manually, you can create a blank entry by leaving the DOI/PMID field blank and hitting “Import/Add Publication.” Choose to edit the new entry and fill out the appropriate fields: DOI, PMID, Citation, Citation Relevance.
Relevant links: This section allows for adding links to external resources that are relevant to the dataset. For example, if omics data associated with the dataset have been deposited in another repository, the link can be provided here. This section can also be used to link the current dataset to a published dataset in ODC.
Notes: The Notes section should be used to provide important guidance for others on using your data. Relevant information may include technical issues during the experiment that may require data exclusion, specifics about the techniques that may prevent merging with other datasets, and so on. The goal is to provide information useful to data re-users and prevent data misuse.
Funding and Acknowledgements: The Funding and Acknowledgements section requires 2 fields for each entry: "Funding Agency" and "Funding Identifier and PI Initials". You can reorder the entries after they are added by dragging/dropping them in the list.
Funding Agency: Name of funding agency.
Funding Identifier and PI Initials: Respective funding ID (e.g. grant number) and PI Initials in parentheses. For example: 4F0887Z (AC).
Contributors / Authors: ODC considers any Author of a dataset a Contributor. Authors will have their names attached to the citation of a dataset if published. Contributors who are not authors are people whose contribution to the dataset you want to acknowledge. Each entry is added as a contributor to the dataset by default; if you check the options to include an entry as an author or contact author, the appropriate label will be applied to the contributor/author entry. Each Contributor/Author entry has 5 fields to fill out (First Name, Middle Initial, Last Name, ORCID, and Affiliation), plus a Contact Email if the person is a contact author. Data that was imported might be incomplete and will need to be edited.
First Name: Person’s first name.
Middle Initial: Person’s middle initial (if relevant).
Last Name: Person’s last name.
ORCID: Person’s associated ORCID.
Affiliation: Person’s associated affiliation at time of publishing the dataset.
Contact Email: Person's contact email (field appears only if they are a contact author)
DOI: This is provided by ODC once you go through the DOI request process
Dataset Citation: This field shows you what the citation of the dataset will look like if released to the public. It is constructed automatically from the provided list of authors and the title. You can see the citation update as you change those two pieces of information!
Dataset Info: Dataset info is automatically populated from the other sections. The section includes: Contact Author information, Lab, ODC-SCI Accession Number, Number of Records in Dataset, Fields per Record, number of associated Files.
License: The License is automatically assigned if and when the dataset is published. All datasets published on the ODC-SCI will be under the Creative Commons Attribution License (CC-BY 4.0).