Data Quality Checks for DOI

REFERENCE: Describes the automated, semi-automated, and manual Data Quality Checks performed by the ODC Data Team after a DOI Request has been made.

Members of the ODC Data Team (i.e., Curators) complete a comprehensive data quality review of every dataset submitted for publication. Once a DOI request has been made, the Curators use a series of automated and manual checks to review the Subject Data File, Data Dictionary, and Metadata, and to confirm that the dataset's formatting and contents meet the minimum requirements set out in the ODC Publishing Standard. These checks ensure that the published data are Interoperable with, and Reusable alongside, other datasets.

Automated Data Quality Checks

The automated quality checks are performed as file validations during the dataset upload process. They ensure a baseline level of quality for all private and public datasets in the ODC. Because data upload takes place privately within the data owner's account, these checks run without human oversight. If a file can be uploaded, it has met the Minimum Upload Specifications and passed the automated data quality checks.

GO TO: Minimum Upload Specifications

Subject Data Files

The file must pass the following automatic validations:

Source Checks. Purpose: to determine whether the file can be read and uploaded.

Structure Checks. Purpose: to determine whether the Subject Data File has been formatted correctly.

Other
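This page does not list the individual validations, so the following is only a minimal sketch of what source and structure checks on a Subject Data File can look like. It assumes a UTF-8 encoded CSV file and three hypothetical structure rules (a non-empty header row, unique column names, and the same number of fields in every record). The function names are invented for this illustration and are not part of the ODC upload pipeline.

```python
import csv
import io
from collections import Counter


def source_check(raw: bytes) -> list[str]:
    """Hypothetical source check: can the file be decoded and read as CSV?"""
    try:
        # Assumption: files are UTF-8; a leading byte-order mark is tolerated.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError as exc:
        return [f"file is not valid UTF-8: {exc.reason}"]
    if not text.strip():
        return ["file is empty"]
    try:
        list(csv.reader(io.StringIO(text)))
    except csv.Error as exc:
        return [f"file cannot be parsed as CSV: {exc}"]
    return []


def structure_check(raw: bytes) -> list[str]:
    """Hypothetical structure check: header row, unique names, consistent widths."""
    rows = list(csv.reader(io.StringIO(raw.decode("utf-8-sig"))))
    header, records = rows[0], rows[1:]
    if not header:
        return ["first row is empty; a header row is expected"]
    errors = []
    if any(not name.strip() for name in header):
        errors.append("header contains a blank column name")
    for name, count in Counter(header).items():
        if count > 1:
            errors.append(f"duplicate column name: {name!r}")
    # The header is record 1, so data records are numbered from 2.
    for number, row in enumerate(records, start=2):
        if len(row) != len(header):
            errors.append(
                f"record {number}: expected {len(header)} fields, found {len(row)}"
            )
    return errors


def validate_subject_file(raw: bytes) -> list[str]:
    """Run source checks first; structure checks only make sense on a readable file."""
    errors = source_check(raw)
    return errors if errors else structure_check(raw)
```

Running the source check first follows the order in which the checks are listed above: a file that cannot be read is rejected before its structure is examined. For example, a file whose header repeats a column name and whose first data record is one field short returns two errors, one for each problem.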

Data Dictionary Files

The file must pass the following automatic validations:

Source Checks. Purpose: to determine whether the file can be read and uploaded.

Schema Checks. Purpose: to determine whether the Data Dictionary File has been formatted correctly AND to identify any conflicts between it and the Subject Data File.
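To illustrate the kind of conflict a schema check looks for, the sketch below compares a Data Dictionary with a Subject Data File. It assumes both are UTF-8 CSV files and that the dictionary has one row per variable with hypothetical VariableName and DataType columns (type labels Integer, Decimal, and String). These names are assumptions made for the example; the actual required fields are defined in the ODC Publishing Standard and may differ.

```python
import csv
import io

# Assumption: illustrative dictionary layout, not the ODC Publishing Standard's.
REQUIRED_DICT_COLUMNS = ("VariableName", "DataType")
PARSERS = {"Integer": int, "Decimal": float, "String": str}


def read_csv(raw: bytes) -> tuple[list[str], list[dict]]:
    """Return (column names, records) for a UTF-8 CSV file."""
    reader = csv.DictReader(io.StringIO(raw.decode("utf-8-sig")))
    records = list(reader)
    return list(reader.fieldnames or []), records


def schema_check(dictionary_raw: bytes, data_raw: bytes) -> list[str]:
    """Hypothetical schema check: dictionary layout plus conflicts with the data file."""
    dict_columns, definitions = read_csv(dictionary_raw)
    missing = [c for c in REQUIRED_DICT_COLUMNS if c not in dict_columns]
    if missing:
        # Without these columns the two files cannot be compared.
        return [f"data dictionary is missing column(s): {', '.join(missing)}"]

    data_columns, records = read_csv(data_raw)
    defined = {row["VariableName"]: row["DataType"] for row in definitions}
    errors = []

    # Name conflicts, checked in both directions.
    for name in data_columns:
        if name not in defined:
            errors.append(f"{name!r} is in the data file but not in the dictionary")
    for name in defined:
        if name not in data_columns:
            errors.append(f"{name!r} is in the dictionary but not in the data file")

    # Type conflicts: each non-empty value must parse as its declared type.
    for number, record in enumerate(records, start=2):
        for name, value in record.items():
            parse = PARSERS.get(defined.get(name))
            if parse is None or not value:
                continue
            try:
                parse(value)
            except ValueError:
                errors.append(
                    f"record {number}: {name}={value!r} is not a valid {defined[name]}"
                )
    return errors
```

Returning early when the dictionary's own required columns are missing reflects the two-part purpose stated above: the Data Dictionary must be formatted correctly before it can be meaningfully compared with the Subject Data File.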

Semi-Automated Checks

The ODC Data Team uses an open-source tool developed in-house to automate the evaluation of many of the technical specifications related to structure and format. This tool, the ODC Data Quality App (ODCdqa), is freely available to the public. The ODC strongly recommends that all data producers run their datasets through the ODCdqa BEFORE submitting a DOI Request. Using the ODCdqa helps identify potential errors so they can be resolved before submission, reducing delays in the review and publication process.

Structure Checks. Purpose: to determine whether the Subject Data File has been formatted correctly.

Schema Checks. Purpose: to determine whether the Data Dictionary File has been formatted correctly AND to identify any conflicts between it and the Subject Data File.

Manual Checks

The final step of the data quality checks is performed manually by the ODC Data Team Curators. They read and review the contents of the entire dataset to ensure:
