Data dictionary

A data dictionary, also known as a codebook, describes the variables in a dataset. It is one of the most important pieces of documentation to include with a dataset for anyone who wants to interpret and reuse the data. Even if you do not plan to release your data, keeping a data dictionary for each of your datasets is good data management practice. You may know now what a variable name in your spreadsheet means (e.g., jtemp_6), but will your PI or colleagues know after you leave the lab? Will you know when you try to reuse the data two years from now? A data dictionary is a critical lab asset: it ensures that data that took great effort and resources to acquire do not go to waste because of poor documentation.

Data dictionaries can also fulfill funding requirements that datasets be accompanied by proper documentation.

The ODC data dictionary

The data dictionary used by the ODC is a .csv (comma-separated values) file. Learn more about .csv files here.

Download the pre-clinical data dictionary template

The data dictionary file must contain the following column names in the first row:

  • VariableName: * Variables (i.e., column headers) that appear in the dataset. You must include all of your dataset variables in the data dictionary. Tip: In your dataset file, select the row of variable names and Copy. Then, in the data dictionary file, select cell A2 and choose Paste Special > Transpose. All of your variable names should now appear in the first column of the data dictionary file.

  • Title: * Title is the full name of the variable when the VariableName contains abbreviations or shorthand. If the VariableName is already a complete name, you can copy and paste the VariableName into the Title entry.

  • Unit_of_Measure: Units for the variable (if applicable).

  • Description: * Definition and description of the variable. The description should explain what the variable represents in enough detail that a reader can understand the contents of the column in the dataset.

  • DataType: Specify whether the variable contains Numeric, Categorical, Ordinal, Date, or Free Text data.

  • PermittedValues: If the variable is not numeric or free text, list all possible values here (e.g., "Male, Female" for the variable "Sex"). If the variable is numeric or free text, you can leave this blank (for numeric variables, use the MinimumValue and MaximumValue columns instead).

  • MinimumValue: If the variable is numeric, list the minimum possible value. For example, if you expect a variable to be between 0 and 100, write 0 for MinimumValue. If there is no minimum value, leave this blank.

  • MaximumValue: If the variable is numeric, list the maximum possible value. For example, if you expect a variable to be between 0 and 100, write 100 for MaximumValue. If there is no maximum value, leave this blank.

  • Comments: Additional notes such as exclusion criteria, reasons for special values, etc.
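
As an alternative to the copy-and-transpose tip under VariableName, the empty data dictionary can be generated with a short script. The following is a minimal Python sketch, not an official ODC tool: it reads the header row of a dataset and writes a data dictionary with the nine column names listed above and one row per variable. The function name build_dictionary_skeleton and the example dataset are hypothetical and used only for illustration.

```python
import csv
import io

# Column names expected in the first row of the data dictionary,
# taken from the list above.
DICTIONARY_COLUMNS = [
    "VariableName", "Title", "Unit_of_Measure", "Description", "DataType",
    "PermittedValues", "MinimumValue", "MaximumValue", "Comments",
]


def build_dictionary_skeleton(dataset_csv_text):
    """Return data dictionary CSV text with one empty row per dataset variable."""
    reader = csv.reader(io.StringIO(dataset_csv_text))
    variable_names = next(reader)  # first row of the dataset holds the variable names

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=DICTIONARY_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for name in variable_names:
        row = dict.fromkeys(DICTIONARY_COLUMNS, "")  # all fields start blank
        row["VariableName"] = name
        writer.writerow(row)
    return out.getvalue()


# Hypothetical dataset, for illustration only.
example_dataset = "subject_id,sex,jtemp_6\n1,Male,36.5\n"
skeleton = build_dictionary_skeleton(example_dataset)
print(skeleton)
```

The remaining columns (Title, Description, and so on) still have to be filled in by hand; the script only saves the transposition step.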

VariableName, Title, and Description are always required and cannot be left blank for data dictionary upload; every row in the data dictionary must have a VariableName, Title, and Description. The other columns are optional for upload, but are required for dataset publication.
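
The required-field rule can be checked before upload. The sketch below is one possible local check in Python, assuming the data dictionary is available as CSV text; it is not the validation the ODC itself performs. The function name find_blank_required_fields and the example rows are hypothetical.

```python
import csv
import io

# Columns that cannot be left blank for data dictionary upload.
REQUIRED_FOR_UPLOAD = ("VariableName", "Title", "Description")


def find_blank_required_fields(dictionary_csv_text):
    """Return (row_number, column) pairs where a required field is blank."""
    reader = csv.DictReader(io.StringIO(dictionary_csv_text))
    problems = []
    # Row 1 is the header, so data rows start at row 2.
    for row_number, row in enumerate(reader, start=2):
        for column in REQUIRED_FOR_UPLOAD:
            value = row.get(column) or ""  # missing columns count as blank
            if not value.strip():
                problems.append((row_number, column))
    return problems


# Hypothetical data dictionary: the first variable has no Description.
example_dictionary = (
    "VariableName,Title,Unit_of_Measure,Description\n"
    "body_weight,Body weight,g,\n"
    "sex,Sex,,Sex of the subject\n"
)
problems = find_blank_required_fields(example_dictionary)
print(problems)  # [(2, 'Description')]
```

The same loop could be extended to the optional columns when preparing a dataset for publication, since those columns are required at that stage.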
