Data Quality Checks for DOI
REFERENCE: describes the automated and manual Data Quality Checks performed by the ODC Data Team after a DOI Request has been made
Members of the ODC Data Team (i.e., Curators) complete a comprehensive data quality review of every dataset submitted for publication. Once a DOI request has been made, the Curators use a series of automated and manual checks to review the Subject Data File, Data Dictionary, and Metadata and confirm that the dataset formatting and contents meet the minimum requirements set out in the ODC Publishing Standard. These checks ensure that the data is Interoperable and Reusable with other datasets.
Automated Data Quality Checks
The automated quality checks are performed during the dataset upload process as automatic file validations. They ensure a baseline level of quality for all private and public datasets in the ODC. These checks run without human oversight, since data upload is handled privately within the account of the data owner. If a file can be uploaded, it has met the Minimum Upload Specifications and passed the automated data quality checks.
GO TO: Minimum Upload Specifications
Subject Data Files
The file must pass the following automatic validations:
Source Checks. Purpose: to determine whether the file can be read and uploaded
Structure Checks. Purpose: to determine whether the Subject Data File has been formatted correctly
Other
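As an illustration only, the source and structure checks listed above could be sketched in Python as follows. This is not the ODC's validation code: the function name check_subject_data and every rule in it (UTF-8 encoded CSV, non-blank unique headers, a consistent field count per row) are assumptions chosen for the example. The actual rules are defined in the Minimum Upload Specifications.

```python
import csv
import io


def check_subject_data(raw: bytes) -> list[str]:
    """Return a list of problems found; an empty list means the file passed.

    Hypothetical helper: the rules below are illustrative assumptions,
    not the ODC's published upload requirements.
    """
    # Source check (assumed rule): the bytes must decode as UTF-8 text.
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return ["source: file is not valid UTF-8 text"]

    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["source: file is empty"]

    problems = []
    header = rows[0]

    # Structure check (assumed rule): headers are non-blank and unique.
    if any(not name.strip() for name in header):
        problems.append("structure: blank column header")
    if len(set(header)) != len(header):
        problems.append("structure: duplicate column header")

    # Structure check (assumed rule): each row has as many fields as the
    # header. Row numbers count the header as row 1.
    for row_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"structure: row {row_no} has {len(row)} fields, "
                f"expected {len(header)}"
            )
    return problems
```

A file that returns an empty list would pass this sketch; any returned message names the kind of check that failed.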
Data Dictionary Files
The file must pass the following automatic validations:
Source Checks. Purpose: to determine whether the file can be read and uploaded
Schema Checks. Purpose: to determine whether the Data Dictionary File has been formatted correctly AND identify any conflicts between it and the Subject Data File
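One kind of conflict between a Data Dictionary File and a Subject Data File is a mismatch in variable names. The Python sketch below shows that idea under stated assumptions: the function find_schema_conflicts is hypothetical, and it assumes the column headers of the Subject Data File and the variable names listed in the Data Dictionary have already been read into two lists. It is not the ODC's implementation, and the real schema checks may cover more than name matching.

```python
def find_schema_conflicts(
    data_headers: list[str], dictionary_names: list[str]
) -> dict[str, list[str]]:
    """Compare data-file column headers with data-dictionary variable names.

    Hypothetical helper for illustration: exact, case-sensitive name
    matching is an assumption, not a documented ODC rule.
    """
    data = set(data_headers)
    described = set(dictionary_names)
    return {
        # Columns present in the data file but not described in the dictionary.
        "missing_from_dictionary": sorted(data - described),
        # Variables described in the dictionary but absent from the data file.
        "missing_from_data": sorted(described - data),
    }
```

For example, headers `["id", "age", "sex"]` checked against dictionary names `["id", "age", "weight"]` would report `sex` as missing from the dictionary and `weight` as missing from the data. Two empty lists would mean the names agree.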
Semi-Automated
The ODC Data Team uses an open-source tool developed in-house to automate the evaluation of many of the technical specifications related to structure and format. This tool, the ODC Data Quality App (ODCdqa), is freely available to the public. The ODC strongly recommends that all data producers run their dataset through the ODCdqa BEFORE submitting a DOI Request. Using the ODCdqa helps identify potential errors that can be resolved before submission and reduces delays in the review and publication process.
Structure Checks. Purpose: to determine whether the Subject Data File has been formatted correctly
Schema Checks. Purpose: to determine whether the Data Dictionary File has been formatted correctly AND identify any conflicts between it and the Subject Data File
Manual Checks
The final step of the data quality checks is performed manually by the ODC Data Team Curators. They read and review the contents of the entire dataset to ensure: