Open Data Commons
Data dictionary


Last updated 6 days ago

A data dictionary, also known as a codebook, provides information about the variables in a dataset. It is one of the most important pieces of information to include with a dataset for anyone who wants to interpret and reuse the data. Even if you are not planning to release your data, keeping a data dictionary for each dataset is good data management practice. You may know now what a variable name means in your spreadsheet (e.g., jtemp_6), but will your PI or colleagues know when you leave the lab? Will you know if you try to reuse the data two years from now? A data dictionary is a critical lab asset: it ensures that data that took great effort and resources to acquire will not go to waste in the future due to poor documentation.

Data dictionaries can also fulfill funding requirements that datasets be accompanied by proper documentation.

The ODC data dictionary

The data dictionary used by the ODC is a .csv file (a comma-separated values file). Learn more about .csv files here.

Download the pre-clinical data dictionary template

The data dictionary file must contain the following column names in the first row:

  • VariableName: * Variables (i.e. column headers) that appear in the dataset. You must include all of your dataset variables in the data dictionary. Tip: select the variable (header) row in your dataset file and Copy it; then, in cell A2 of the data dictionary file, use Paste Special > Transpose. All your variable names should now appear in the first column of your data dictionary file.

  • Title: * Title is the full name of the variable when the VariableName contains abbreviations or shorthand. If the VariableName is already a complete name, you can copy and paste the VariableName into the Title entry.

  • Unit_of_Measure: Units for the variable (if applicable).

  • Description: * Definitions and descriptions of the variable. The description should explain what the variable represents in enough detail such that a reader can understand the contents of the column in the dataset.

  • DataType: Specify whether the variable contains Numeric, Categorical, Ordinal, Date, or Free Text data.

  • PermittedValues: If the variable is not numeric or free text, list all possible values here (e.g. "Male, Female" for the variable "Sex"). If the variable is numeric or free text, you can leave this blank (for numeric variables, use the MinimumValue and MaximumValue columns instead).

  • MinimumValue: If the variable is numeric, list the minimum possible value. For example, if you expect a variable to be between 0 and 100, write 0 for MinimumValue. If there is no minimum value, leave this blank.

  • MaximumValue: If the variable is numeric, list the maximum possible value. For example, if you expect a variable to be between 0 and 100, write 100 for MaximumValue. If there is no maximum value, leave this blank.

  • Comments: Additional notes such as exclusion criteria, reasons for special values, etc.
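
If you prefer a script to a spreadsheet, the Paste Special > Transpose tip under VariableName can be approximated as follows. This is a minimal sketch, not an ODC tool: the function name `dictionary_skeleton`, the example variable `Subject_ID`, and the example data row are hypothetical, and the column order simply follows the list above, which may differ from the order in the downloadable template.

```python
import csv
import io

# Column names from the list above. The order is an assumption; check it
# against the downloadable template before uploading.
DICTIONARY_COLUMNS = [
    "VariableName", "Title", "Unit_of_Measure", "Description", "DataType",
    "PermittedValues", "MinimumValue", "MaximumValue", "Comments",
]


def dictionary_skeleton(dataset_csv_text):
    """Return CSV text for a data dictionary with one row per dataset variable."""
    reader = csv.reader(io.StringIO(dataset_csv_text))
    header = next(reader)  # first row of the dataset holds the variable names
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(DICTIONARY_COLUMNS)
    for name in header:
        # Only VariableName is filled in; every other cell is left blank
        # for you to complete by hand.
        writer.writerow([name] + [""] * (len(DICTIONARY_COLUMNS) - 1))
    return out.getvalue()


# Hypothetical dataset with three variables and one data row.
example_dataset = "Subject_ID,Sex,jtemp_6\nS01,Male,36.5\n"
skeleton = dictionary_skeleton(example_dataset)
print(skeleton)
```

The result is a dictionary file whose first column lists every dataset variable, ready for Title, Description, and the remaining columns to be filled in.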

VariableName, Title, and Description are always required and cannot be left blank for data dictionary upload; every row in the data dictionary must have a VariableName, Title, and Description. The other columns are optional for upload, but are required for dataset publication.
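
Before uploading, you may want to check these rules locally. The sketch below is one possible way to do so and is not the ODC's own validation: the function name `check_dictionary`, the message wording, and the example descriptions are hypothetical. It checks only what this page states, namely that the first row contains the listed column names and that no row leaves VariableName, Title, or Description blank.

```python
import csv
import io

# Columns that must appear in the first row, per the list above.
ALL_COLUMNS = [
    "VariableName", "Title", "Unit_of_Measure", "Description", "DataType",
    "PermittedValues", "MinimumValue", "MaximumValue", "Comments",
]
# Columns that cannot be blank in any row for upload.
REQUIRED_FOR_UPLOAD = ["VariableName", "Title", "Description"]


def check_dictionary(dictionary_csv_text):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    reader = csv.DictReader(io.StringIO(dictionary_csv_text))
    fieldnames = reader.fieldnames or []
    for col in ALL_COLUMNS:
        if col not in fieldnames:
            problems.append(f"missing column: {col}")
    # Records are counted from 2 because the header is row 1.
    for row_no, row in enumerate(reader, start=2):
        for col in REQUIRED_FOR_UPLOAD:
            if not (row.get(col) or "").strip():
                problems.append(f"row {row_no}: {col} is blank")
    return problems


# Hypothetical dictionary: the second variable has no Title or Description.
example = (
    "VariableName,Title,Unit_of_Measure,Description,DataType,"
    "PermittedValues,MinimumValue,MaximumValue,Comments\n"
    'Sex,Sex,,Biological sex of the subject,Categorical,"Male, Female",,,\n'
    "jtemp_6,,,,Numeric,,,,\n"
)
problems = check_dictionary(example)
for problem in problems:
    print(problem)
```

An empty result only means the file passes these two checks. The columns that are optional for upload still need to be completed before publication.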

ODC-SCI_DataDictionary_Version2.0Feb182025.csv
Example of a data dictionary created in a spreadsheet format