On this page you will learn how to upload data to the ODC.
We have a web app for checking the formatting standards of your pre-clinical dataset and data dictionary file. It is straightforward, and you can use it before uploading. This will save time if you want to publish your data!
Visit the ODC QC-EDA app to learn more!
There are three different types of files that you can upload to the ODC:
The dataset file in .csv format
A data dictionary file in .csv format
Supplementary files in .doc or .pdf
The ODC is organized around datasets, so we need to start by uploading a new dataset.
This page will teach you how to upload a new dataset to the ODC.
Check how to get your data ready
If you are going to request a DOI for a public release, check this first!
If your dataset is too large to upload, split your data file into smaller files, upload the first one, and use the Append Rows workflow to upload each remaining piece.
1. Log in to the ODC
2. Go to Upload a New Dataset in the left navigation bar
3. Verify that you are in the correct Lab and click Next
4. Select your .csv file and enter the dataset information. The dataset name is an internal name for you and your lab to remember what the dataset is about! The dataset description helps to describe the content of the dataset.
This is different from the information needed for a DOI. Check how to get a DOI
5. Preview the dataset and select the subject ID column. The ODC organizes the data around subjects! Make sure you include a unique subject ID column
6. Upload!
The page may show a small window with some messages. Read them and choose whether you want to continue.
Congratulations! You uploaded a dataset!
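Before selecting the subject ID column in step 5, it can help to confirm that the column is present and fully filled in. The following is a minimal sketch, not an ODC tool: the function name `check_subject_ids` and the column name `SubjectID` are hypothetical, and the check only looks for a missing column or blank IDs. It does not assume anything about how the ODC itself validates subject IDs.

```python
import csv
import io


def check_subject_ids(text, id_column):
    """Return a list of problems with the subject ID column in CSV text.

    Hypothetical helper: it only checks that the column exists and that
    no row leaves it blank. Row numbers count the header as row 1 and
    assume the file has no completely blank lines.
    """
    reader = csv.DictReader(io.StringIO(text, newline=""))
    if id_column not in (reader.fieldnames or []):
        return [f"column {id_column!r} not found in the header row"]
    problems = []
    for number, row in enumerate(reader, start=2):
        # Short rows give None for missing fields, so fall back to "".
        if not (row[id_column] or "").strip():
            problems.append(f"row {number} has an empty {id_column!r} value")
    return problems
```

To use it on a file, read the file with `open(path, newline="", encoding="utf-8")` and pass the text together with the name of your subject ID column. Note that the column name must match the header exactly, including capitalization.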
Methodology files are a great way to attach narrative and protocol descriptions associated with a dataset. You can upload a methodology file in either .doc or .pdf format, following the same process described for uploading a data dictionary.
Currently, the ODC does not require methodology documents for you to upload and publish your datasets. However, we encourage you to include them, as they improve the interpretability and reusability of your data.
Some recommendations for compiling a methodology document:
If the dataset comes from a published paper (and hence has the methodology already published), you can provide the paper citation and link as part of the Methodology doc/pdf. Due to potential copyright issues, do not copy/paste the methods from any published papers.
If you have multiple protocols you wish to upload for the same dataset, we recommend combining them all into a single Word document or PDF. The ODC only allows a single Methodology file to be attached to each dataset.
Supplementary files contain extra information that you may want to attach to a dataset. For instance, you may want to have an image of a graph saved in association with your data.
File Formats: .pdf, .csv, or images (.jpg, .png and .gif)
How: You can upload supplementary files following the same procedure for uploading a data dictionary
If you have not prepared an ODC-compatible data dictionary, check how to get one ready
You can upload a data dictionary in ODC format any time while you are working with your data. The steps are simple!
1. Log in to the ODC. In your dashboard, scroll down to the list of datasets. Identify the dataset you want to upload the data dictionary for and click on it.
2. In the dataset view page, select "Upload Files", and click on Data Dictionary.
If the upload fails, please ensure that the dataset variable names exactly match (spelling, caps) the variable names in the data dictionary; see more common errors.
3. In the Data dictionary and methodology page, follow the instructions and select the .csv file to upload your data dictionary.
Done! You have associated a data dictionary or methodology file with a dataset
After uploading a data dictionary or methodology file, you can track the files associated with a dataset
A few common errors are flagged during the data upload process. If you hit an error on the data preview page after selecting your datafile, check your dataset for the following errors:
Your datafile contains a column with an empty header (i.e. missing a value in the first row). You might have forgotten to type a name for a column or have an empty column in the middle of your dataset. Make sure every column of your dataset has a column name in the first row.
Your datafile contains duplicate header names. Make sure every column has a unique column header.
One of your column headers has a variable name that is longer than 64 characters. There is a 64-character limit for column headers; make sure every header is 64 characters or fewer.
You used a comma in your datafile, and the .csv file is reading the comma as a separator between one entry and the next. This can be a source of error for header names and data entries. If this is the error, you will find rows that have different numbers of columns and/or entries that have been shifted to other columns.
You can fix this error by removing commas from the entries in your datafile.
A second way to fix this is to open your .csv in a spreadsheet software like Excel, correct for any shifted cells, and save as a new .csv through the program. Many software programs will naturally treat cells with commas as a fixed sequence of characters and won't treat those within-cell commas as separators between cells. This treatment will be maintained when you upload the new .csv to the ODC.
Your datafile might be too large. If your dataset is larger than 100 MB or has more than 3,000,000 cells in total, see the section "What if my dataset is too large" below.
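If you would like to look for these problems on your own computer before uploading, the checks in the list above can be sketched in a few lines of Python. This is an illustrative sketch and not the ODC's own validation code: the function names are hypothetical, the size limit is assumed to mean 100 * 1024 * 1024 bytes, and the cell count is assumed to exclude the header row.

```python
import csv
import io
import os

MAX_HEADER_CHARS = 64          # header length limit from the list above
MAX_BYTES = 100 * 1024 * 1024  # assumption: the size limit is 100 * 1024 * 1024 bytes
MAX_CELLS = 3_000_000          # total cell limit from the list above


def check_dataset_text(text, size_bytes=None):
    """Return a list of problems found in CSV text (empty list: none found)."""
    rows = list(csv.reader(io.StringIO(text, newline="")))
    if not rows:
        return ["file has no rows"]
    header, data = rows[0], rows[1:]
    problems = []

    # Empty header cells, reported as 1-based column positions.
    empty = [i + 1 for i, name in enumerate(header) if not name.strip()]
    if empty:
        problems.append(f"empty header in column(s) {empty}")

    # Duplicate header names (empty headers are already reported above).
    seen, dupes = set(), []
    for name in header:
        if name.strip() and name in seen and name not in dupes:
            dupes.append(name)
        seen.add(name)
    if dupes:
        problems.append(f"duplicate header name(s) {dupes}")

    # Headers over the character limit.
    long_names = [name for name in header if len(name) > MAX_HEADER_CHARS]
    if long_names:
        problems.append(f"header(s) longer than {MAX_HEADER_CHARS} characters: {long_names}")

    # Rows with a different field count than the header, a typical
    # symptom of unquoted commas inside entries. Row 1 is the header.
    ragged = [n for n, row in enumerate(data, start=2) if len(row) != len(header)]
    if ragged:
        problems.append(f"row(s) with a different number of columns: {ragged[:10]}")

    # Size limits. Assumption: the header row is not counted as cells.
    cells = sum(len(row) for row in data)
    if cells > MAX_CELLS:
        problems.append(f"{cells} cells is larger than the {MAX_CELLS} cell limit")
    if size_bytes is not None and size_bytes > MAX_BYTES:
        problems.append(f"{size_bytes} bytes is larger than the {MAX_BYTES} byte limit")
    return problems


def check_dataset_file(path):
    """Run the same checks on a .csv file on disk (assumes UTF-8 encoding)."""
    with open(path, newline="", encoding="utf-8") as handle:
        text = handle.read()
    return check_dataset_text(text, size_bytes=os.path.getsize(path))
```

Because the sketch reads the file with Python's `csv` module, commas inside properly quoted entries are not treated as separators, so a row is only reported when its number of fields really differs from the header.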
A few possible errors are flagged during the data dictionary upload process. You will be notified directly on the upload page; for reference, the possible errors include:
Your data dictionary is not a .csv file. The data dictionary must be a .csv file for upload.
Your data dictionary is missing a column. The data dictionary must include all 9 required columns with exact spelling.
Your data dictionary does not include an entry for every column of your dataset. Every Variable (i.e. column header) in your dataset must have a respective row in your data dictionary. Please ensure that the dataset variable names exactly match (spelling, caps) the variable names in the data dictionary.
While not an error, if you have rows in your data dictionary that are missing values under Title or Description, this will flag a warning. This is not required for initial data dictionary upload, but during DOI request/dataset publication, we require that every row in your data dictionary have at least VariableName, Title, and Description columns filled out.
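The name-matching rule and the Title/Description warning can also be checked locally before upload. The sketch below is only an illustration under stated assumptions: the function name `check_dictionary` is hypothetical, and it relies only on the three column names mentioned above (VariableName, Title, Description). It does not check the full set of 9 required columns.

```python
import csv
import io


def check_dictionary(dataset_text, dictionary_text):
    """Compare dataset column headers with data dictionary rows.

    Returns (errors, warnings) as two lists of strings. Names are
    compared exactly, so differences in spelling or capitalization
    are reported as missing entries.
    """
    header = next(csv.reader(io.StringIO(dataset_text, newline="")), [])
    entries = list(csv.DictReader(io.StringIO(dictionary_text, newline="")))
    described = {(row.get("VariableName") or "") for row in entries}

    errors = [
        f"no data dictionary row for dataset column {name!r}"
        for name in header
        if name not in described
    ]

    warnings = []
    for row in entries:
        for field in ("Title", "Description"):
            if not (row.get(field) or "").strip():
                warnings.append(f"{row.get('VariableName')!r} is missing {field}")
    return errors, warnings
```

For example, a dataset column named `Weight_g` with a dictionary row named `weight_g` is reported as an error, because the two names differ in capitalization.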
Importantly, every chunk of your dataset must have the same column headers in the first row of each .csv file. Make sure you (1) split your dataset along the rows and not along the columns and (2) include the column headers in every file.
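One way to produce such chunks is sketched below: the dataset is split along the rows, and the header row is repeated at the top of every piece. The function names `split_rows` and `split_file`, the `_partN.csv` file naming, and the choice of rows per chunk are all hypothetical; pick a chunk size that keeps each piece under the upload limits.

```python
import csv
import io


def _to_csv(header, rows):
    """Write a header plus rows back to CSV text, quoting entries with commas."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def split_rows(text, rows_per_chunk):
    """Split CSV text along the rows; every chunk starts with the header row."""
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be at least 1")
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader)
    chunks, current = [], []
    for row in reader:
        current.append(row)
        if len(current) == rows_per_chunk:
            chunks.append(_to_csv(header, current))
            current = []
    if current:  # leftover rows form the last, shorter chunk
        chunks.append(_to_csv(header, current))
    return chunks


def split_file(path, rows_per_chunk):
    """Split a .csv file into numbered part files and return their names.

    Assumes UTF-8 encoding and reads the whole file into memory.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        chunks = split_rows(handle.read(), rows_per_chunk)
    stem = path[:-4] if path.lower().endswith(".csv") else path
    names = []
    for number, chunk in enumerate(chunks, start=1):
        name = f"{stem}_part{number}.csv"
        with open(name, "w", newline="", encoding="utf-8") as out:
            out.write(chunk)
        names.append(name)
    return names
```

You would then upload the first part as a new dataset and add the remaining parts one at a time with the append option.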
If you encounter issues uploading your data or data dictionary, check this page. If you cannot identify the error or do not know how to solve it,
Your datafile is not a .csv file. For more information, see our
For more information about how to prepare your data for upload, see the
For more information on how to prepare your data dictionary for upload, see the section.
If your dataset is too large (e.g., larger than 100 MB or with more than 3,000,000 cells in total), it can cause an error during the data upload process. The error can also happen when you try to replace your dataset using the function. In both cases, we recommend splitting the dataset you want to upload into chunks with fewer rows and using the append data option to add your dataset piece by piece.