Data Quality Checks for DOI

In ODC, both the dataset and its data dictionary undergo quality checks for proper formatting. These checks help ensure that the data are Interoperable and Reusable with other datasets. Some checks run when a dataset is uploaded, which guarantees a minimum level of quality for all private and public datasets in the ODC. The upload checks are automatic, with no human oversight, because uploads are handled privately within the data owner's account. When a dataset is released for publication, further checks are conducted to ensure that the released dataset meets FAIR standards. The checks are grouped as follows:

Source checks

(Checked at upload): ODC cannot read the data file. Possible reasons include:

  • The data file is not a *.csv. The ODC only accepts the upload of *.csv data files.

  • Reserved special characters were used in the column headers (the first row, which holds the variable names). See our recommendations in How to upload data.
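
The two source checks above can be approximated locally before uploading. The following is a minimal sketch, not ODC's actual implementation: the function name `source_check` is hypothetical, and the set `ASSUMED_RESERVED` is a placeholder, since the real list of reserved characters is given in the How to upload data recommendations.

```python
import csv
import io
from pathlib import PurePath

# ASSUMPTION: placeholder set of reserved characters. Replace it with the
# characters listed in ODC's "How to upload data" recommendations.
ASSUMED_RESERVED = set("#$%&*")


def source_check(filename, text):
    """Return a list of source-level problems for a file name and its contents."""
    problems = []

    # The ODC only accepts *.csv data files.
    if PurePath(filename).suffix.lower() != ".csv":
        problems.append("not-csv")

    # Read only the header row (the first row, holding the variable names).
    try:
        header = next(csv.reader(io.StringIO(text)), [])
    except csv.Error:
        problems.append("unreadable")
        return problems

    # Flag every header that contains an (assumed) reserved character.
    for name in header:
        bad = sorted(set(name) & ASSUMED_RESERVED)
        if bad:
            problems.append(f"reserved-character in {name!r}: {''.join(bad)}")
    return problems
```

An empty list means that neither of the two problems was found; it does not guarantee that the upload will succeed.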

Structure checks

  • Blank-header (Checked at upload): There is a blank variable name. All cells in the header row (first row) must have a value.

  • Duplicate-header (Checked at upload): Multiple columns with the same name. All column names must be unique.

  • Blank-row (Checked at upload): Rows must have at least one non-blank cell.

  • Duplicate-row: Two or more rows are identical. Each row must be unique.
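
The four structure checks can be sketched with Python's standard `csv` module. This is an illustrative approximation under stated assumptions, not ODC's own code: the function name `structure_check` is hypothetical, cells containing only whitespace are treated as blank, and blank rows are excluded from the duplicate-row comparison.

```python
import csv
import io
from collections import Counter


def structure_check(text):
    """Return the structure errors found in CSV text, using the names listed above."""
    rows = list(csv.reader(io.StringIO(text)))
    header = rows[0] if rows else []
    body = rows[1:]
    errors = []

    # Blank-header: every cell in the header row must have a value.
    # (Assumption: an empty file is reported as a blank header.)
    if not header or any(not name.strip() for name in header):
        errors.append("blank-header")

    # Duplicate-header: all non-blank column names must be unique.
    names = [name.strip() for name in header if name.strip()]
    if len(names) != len(set(names)):
        errors.append("duplicate-header")

    # Blank-row: each row needs at least one non-blank cell.
    if any(all(not cell.strip() for cell in row) for row in body):
        errors.append("blank-row")

    # Duplicate-row: no two non-blank rows may be identical.
    non_blank = [tuple(row) for row in body if any(cell.strip() for cell in row)]
    if any(count > 1 for count in Counter(non_blank).values()):
        errors.append("duplicate-row")

    return errors
```

For example, a file whose header is `id,age,age,` and that contains a row of empty cells plus a repeated row would trigger all four errors.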

Schema checks

In ODC, the schema is defined by the data dictionary. The errors below reflect conflicts between the data dictionary and the dataset.

  • Extra-header: The dataset contains at least one variable name not defined in the data dictionary.

  • Missing-header: The dataset is missing at least one variable name defined in the data dictionary.

  • Missing-definition: The definition of a variable in the data dictionary is missing.

  • Required-constraint (Checked at upload): A required field for the dataset contains no values or is not assigned to the dataset. Currently, the only required variable in a dataset is the subject identifier. As ODC develops additional data standards, more variables may become required in all datasets.

  • Value-constraint: Each value of a variable must either equal one of the permitted values enumerated in the data dictionary or fall within the permitted limits.
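
The schema checks compare a dataset against its data dictionary, which can be sketched as follows. Everything about the representation here is an assumption made for illustration: the data dictionary is modelled as a Python dict with hypothetical keys (`definition`, `permitted_values`, `min`, `max`), the subject identifier column is assumed to be named `subject_id`, and blank cells are skipped by the value check. ODC's actual data dictionary format and internal logic may differ.

```python
def value_allowed(value, spec):
    """Check one non-blank value against a (hypothetical) dictionary entry."""
    allowed = spec.get("permitted_values")
    if allowed is not None:
        return value in allowed
    low, high = spec.get("min"), spec.get("max")
    if low is None and high is None:
        return True  # no constraint declared for this variable
    try:
        number = float(value)
    except ValueError:
        return False  # limits are declared but the value is not numeric
    return (low is None or number >= low) and (high is None or number <= high)


def schema_check(header, rows, dictionary, required=("subject_id",)):
    """Return the sorted schema errors for a dataset given its data dictionary.

    header: list of column names; rows: list of dicts mapping column to value.
    """
    errors = set()
    if set(header) - set(dictionary):
        errors.add("extra-header")    # column not defined in the dictionary
    if set(dictionary) - set(header):
        errors.add("missing-header")  # dictionary variable absent from the data
    if any(not (spec.get("definition") or "").strip() for spec in dictionary.values()):
        errors.add("missing-definition")
    for name in required:
        has_value = any((row.get(name) or "").strip() for row in rows)
        if name not in header or not has_value:
            errors.add("required-constraint")
    for row in rows:
        for name, value in row.items():
            spec = dictionary.get(name)
            value = (value or "").strip()
            if spec is not None and value and not value_allowed(value, spec):
                errors.add("value-constraint")
    return sorted(errors)
```

A dataset with an undefined `notes` column, a dictionary entry lacking a definition, and a value outside its permitted set would be reported as `extra-header`, `missing-definition`, and `value-constraint`.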
