Data formatting specifications

ODC supports uploading spreadsheet data in the .csv (comma-separated values) file format. Most common spreadsheet software, such as Excel, can save your data as a .csv file.

If you want to know more about .csv files, check this out.
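For readers who prepare data with a script rather than a spreadsheet program, the following is a minimal sketch of producing .csv text with Python's standard `csv` module. The column names and values are hypothetical and only illustrate the layout; they are not taken from an ODC dataset.

```python
import csv
import io

# Hypothetical example rows; column names and values are illustrative only.
rows = [
    {"Subject_ID": "rat_001", "Weight_g": 312, "Notes": "healthy, active"},
    {"Subject_ID": "rat_002", "Weight_g": 298, "Notes": "none"},
]


def rows_to_csv_text(rows):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()      # first row: the column names
    writer.writerows(rows)    # one row per observation
    return buffer.getvalue()


csv_text = rows_to_csv_text(rows)
print(csv_text)

# Reading the text back recovers the same cells (all values come back as strings).
parsed = list(csv.DictReader(io.StringIO(csv_text, newline="")))
```

Note that the `csv` module wraps the cell `healthy, active` in double quotes so that its comma is not read as a separator. To write to disk instead of a string, the same writer can be given a file opened with `open(path, "w", newline="")`.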

The tidy format

ODC expects spreadsheet or tabular data in the tidy format. The basics are simple: each row is an observation, and each column is a measure, field, or variable. The tidy format is a great way to make data shareable and understandable by both humans and machines!

Want to know more about tidy data? Check this out!
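As an illustration of the idea, here is a small sketch that reshapes a "wide" table (one row per subject, one column per time point) into rows where each row is a single observation. The function `wide_to_tidy`, the column names, and the values are all hypothetical; this is one possible way to do the reshaping, not a procedure prescribed by ODC.

```python
# Hypothetical wide-format data: one row per subject, one column per time point.
wide_rows = [
    {"Subject_ID": "rat_001", "Weight_Day1": 310, "Weight_Day7": 325},
    {"Subject_ID": "rat_002", "Weight_Day1": 295, "Weight_Day7": 301},
]


def wide_to_tidy(rows, id_column, value_columns, time_name, value_name):
    """Turn each (subject, time point) pair into its own row.

    value_columns maps a time label to the wide column holding that value.
    """
    tidy = []
    for row in rows:
        for time_label, source_column in value_columns.items():
            tidy.append({
                id_column: row[id_column],       # which subject
                time_name: time_label,           # when it was observed
                value_name: row[source_column],  # what was measured
            })
    return tidy


tidy_rows = wide_to_tidy(
    wide_rows,
    id_column="Subject_ID",
    value_columns={1: "Weight_Day1", 7: "Weight_Day7"},
    time_name="Day",
    value_name="Weight_g",
)
for row in tidy_rows:
    print(row)
```

After the reshaping, each subject appears on two rows, one per time point, and the `Subject_ID` column ties those rows together.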

The following two images illustrate how to create a .csv file with data organized in the tidy format

General considerations when formatting your data for ODC

  • Unique subject ID column: ODC is organized around subjects. One of your dataset columns must contain the subject or animal identifier (e.g. Subject_ID). This identifier must be unique for each subject. For example, if subject_1 refers to two different animals in two different experiments, the identifier is not unique.

  • Columns as Variables: Each column represents a study parameter, outcome measure, field, or variable.

  • First row lists the Variable names: The first row of the dataset contains the names of the columns.

  • Subsequent rows as observations: Each row represents a single observation for a single subject.

    • A subject could have multiple rows. For example, your dataset might include multiple timepoints for each subject, in which case each row might represent a unique observation of a subject at a specific time point. The Subject ID column will help identify all the data for each specific subject in the dataset.

  • Column/Variable name requirements (based on best practices):

    • Keep variable names short. Avoid variable names longer than 64 characters.

    • Variable names must start with a letter.

    • Variable names should be intuitive (e.g. use “Date_Birth” instead of “DB”).

    • Avoid spaces in variable names. Use underscore (“_”) instead.

    • Avoid special characters except underscores (“_”) and periods (“.”). If you must use special characters, verify the corresponding Variable and data are uploaded correctly.

  • No duplicated Variable names: Every column header must have a unique name. If two of your columns have the same name, you will receive an error during data upload.

  • Avoid the use of commas: CSV files use commas to separate the contents of one cell from another. A comma inside a cell may be read as a delimiter character (i.e. a cell separator), which can lead to errors during data upload. If you must use commas, always double-check that your data is uploaded correctly. Microsoft Excel can also save CSV files in a way that prevents commas in your data from being misinterpreted. However, we generally recommend avoiding commas in your dataset altogether.
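Several of the considerations above can be checked with a short script before uploading. The sketch below is not ODC's own validation; the function names, the default `Subject_ID` column name, and the regular expression are assumptions made for illustration. It also treats the recommendations (no spaces, no special characters, no commas in cells) as reportable problems, which is stricter than the guidance itself.

```python
import csv
import io
import re

# Assumed rule: a letter first, then letters, digits, underscores, or periods.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_.]*")
MAX_NAME_LENGTH = 64


def check_headers(headers, subject_column="Subject_ID"):
    """Return a list of problems found in the column names."""
    problems = []
    seen = set()
    for name in headers:
        if name in seen:
            problems.append(f"duplicate column name: {name!r}")
        seen.add(name)
        if len(name) > MAX_NAME_LENGTH:
            problems.append(f"name longer than {MAX_NAME_LENGTH} characters: {name!r}")
        if not NAME_PATTERN.fullmatch(name):
            problems.append(f"name has a bad first character, a space, or a special character: {name!r}")
    if subject_column not in headers:
        problems.append(f"missing subject identifier column: {subject_column!r}")
    return problems


def check_csv_text(csv_text, subject_column="Subject_ID"):
    """Check the header row, then flag any cell that contains a comma."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    headers = next(reader)
    problems = check_headers(headers, subject_column)
    # The header is row 1, so data rows are numbered from 2.
    for row_number, row in enumerate(reader, start=2):
        for name, cell in zip(headers, row):
            if "," in cell:
                problems.append(f"row {row_number}, column {name!r}: cell contains a comma")
    return problems


# Hypothetical file content with a space in a name, a duplicated name,
# and a quoted cell that contains a comma.
sample = (
    "Subject_ID,Date Birth,Weight_g,Weight_g\n"
    'rat_001,2020-01-05,312,"312, approx"\n'
)
problems = check_csv_text(sample)
for problem in problems:
    print(problem)
```

A script like this cannot tell whether one identifier was reused for different animals across experiments; that still has to be verified against your own records.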
