Common errors for Dataset and Data Dictionary

In ODC-SCI, the dataset and the data dictionary undergo quality checks for proper formatting, based on the Goodtables framework. These checks help ensure that the data are Interoperable and Reusable with other datasets. Some of the checks run during upload, guaranteeing a minimal level of quality for all private and public datasets in ODC-SCI. These upload checks are automatic, with no human oversight, because uploaded data remain private within the data owner's account. When data are released to the Community data space or submitted for publication, further checks are conducted to ensure that the released or published dataset meets FAIR standards:

  • Source errors (Checked at upload): ODC-SCI cannot read the data file. Possible reasons include:

    • The data file is not a *.csv file. ODC-SCI accepts only *.csv data files for upload.

    • Reserved special characters were used in the column headers (the first row, which contains the variable names). See our recommendations in How to upload data.

  • Structure errors:

    • Blank-header (Checked at upload): There is a blank variable name. All cells in the header row (first row) must have a value.

    • Duplicate-header (Checked at upload): There are multiple columns with the same name. All column names must be unique.

    • Blank-row (Checked at upload): Rows must have at least one non-blank cell.

    • Duplicate-row: No two rows may be identical.

  • Schema errors: In ODC-SCI, the schema is defined by the data dictionary. These errors reflect conflicts between the data dictionary and the dataset.

    • Extra-header: The dataset contains at least one variable name not defined in the data dictionary.

    • Missing-header: The dataset is missing at least one variable name defined in the data dictionary.

    • Missing-definition: The definition of a variable in the data dictionary is missing.

    • Required-constraint (Checked at upload): A required variable is absent from the dataset or has missing values. Currently, the only required variable is the subject identifier. As ODC-SCI develops additional data standards, more variables may become required on all datasets.

    • Value-constraint: The values of a variable must match one of the permitted values enumerated in the data dictionary, or fall within the permitted range.
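The structure checks listed above (blank-header, duplicate-header, blank-row, duplicate-row) can be approximated with Python's standard csv module, which helps when checking a file locally before upload. This is a minimal sketch, not the Goodtables or ODC-SCI implementation; the function name check_structure and the (error_code, detail) tuple format are hypothetical. Row numbers count CSV records, with the header as row 1.

```python
import csv
import io
from collections import Counter


def check_structure(csv_text):
    """Return (error_code, detail) tuples for common structure errors.

    Hypothetical helper for local pre-checks; not the ODC-SCI validator.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return [("blank-header", "file has no header row")]
    header, body = rows[0], rows[1:]
    errors = []

    # blank-header: every cell in the first row must hold a variable name
    for col, name in enumerate(header, start=1):
        if not name.strip():
            errors.append(("blank-header", f"column {col}"))

    # duplicate-header: non-blank variable names must be unique
    for name, count in Counter(n for n in header if n.strip()).items():
        if count > 1:
            errors.append(("duplicate-header", name))

    # blank-row and duplicate-row; the header is row 1
    first_seen = {}
    for row_no, row in enumerate(body, start=2):
        if all(not cell.strip() for cell in row):
            errors.append(("blank-row", f"row {row_no}"))
            continue
        key = tuple(row)
        if key in first_seen:
            errors.append(("duplicate-row", f"row {row_no} repeats row {first_seen[key]}"))
        else:
            first_seen[key] = row_no
    return errors


sample = "subject_id,age,,age\n1,34,x,34\n\n1,34,x,34\n"
for error in check_structure(sample):
    print(error)
```

For the sample above, the sketch reports a blank header in column 3, the duplicated "age" header, the empty row 3, and row 4 as a copy of row 2. Collecting every error instead of stopping at the first one lets a data owner fix all problems in a single pass.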
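The schema checks listed above can be sketched by comparing a dataset with a data dictionary held in memory. This is an illustrative sketch only, under several assumptions: the DICTIONARY layout (a "definition" string plus optional "permitted" set or inclusive numeric "range"), the column name subject_id for the subject identifier, and the function name check_schema are all hypothetical and do not reflect the actual ODC-SCI data dictionary format or validator.

```python
import csv
import io

# Hypothetical data dictionary: variable name -> definition plus optional
# constraints. "permitted" is a set of allowed values; "range" is an inclusive
# numeric (min, max). These keys are illustrative, not the ODC-SCI format.
DICTIONARY = {
    "subject_id": {"definition": "Unique subject identifier"},
    "sex": {"definition": "Biological sex", "permitted": {"M", "F"}},
    "age": {"definition": "", "range": (0, 120)},  # definition left blank on purpose
}
REQUIRED = {"subject_id"}  # assumed column name for the subject identifier


def check_schema(csv_text, dictionary, required):
    """Return (error_code, detail) tuples for schema conflicts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    errors = []

    # extra-header / missing-header: compare dataset columns with the dictionary
    errors += [("extra-header", n) for n in header if n not in dictionary]
    errors += [("missing-header", n) for n in dictionary if n not in header]

    # missing-definition: every dictionary entry needs a non-empty definition
    errors += [("missing-definition", n) for n, spec in dictionary.items()
               if not spec.get("definition", "").strip()]

    # required-constraint, part 1: required variables must exist as columns
    errors += [("required-constraint", f"{n} not in dataset")
               for n in sorted(required) if n not in header]

    for row in reader:
        line = reader.line_num  # physical line in the CSV text
        # required-constraint, part 2: required values must not be empty
        for name in sorted(required):
            if name in header and not (row.get(name) or "").strip():
                errors.append(("required-constraint", f"{name} empty on line {line}"))
        # value-constraint: check non-empty values against the dictionary
        for name in header:
            spec = dictionary.get(name)
            value = (row.get(name) or "").strip()
            if spec is None or not value:
                continue
            if "permitted" in spec and value not in spec["permitted"]:
                errors.append(("value-constraint", f"{name}={value!r} not permitted on line {line}"))
            if "range" in spec:
                low, high = spec["range"]
                try:
                    number = float(value)
                except ValueError:
                    errors.append(("value-constraint", f"{name}={value!r} is not numeric on line {line}"))
                    continue
                if not low <= number <= high:
                    errors.append(("value-constraint", f"{name}={value!r} outside {low}-{high} on line {line}"))
    return errors


sample = "subject_id,sex,age,weight\nS1,M,34,70\n,X,150,80\n"
for error in check_schema(sample, DICTIONARY, REQUIRED):
    print(error)
```

On the sample, the sketch flags "weight" as an extra header, "age" as lacking a definition, the empty subject identifier on line 3, and the out-of-dictionary values for "sex" and "age" on that same line.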
