Quality Level 3

Introduction

Model output archives such as the IPCC and CMIP archives enable scientists to write papers based on runs performed by others. We propose a method by which credit can be properly assigned to the teams that perform model runs. The intent is to publish climate model data entities, each tied to a defined citation for the model output. The granularity of climate model data entities that suits scientific literature appears to be the level of experiments. This level of granularity has been successfully implemented in a publication scheme by the World Data Centre for Climate (WDC Climate) at the DKRZ (German Climate Computing Centre).

A DOI (ISO Standard 26324:2012 - http://www.iso.org/iso/pressrelease.htm?refid=Ref1561 ) is to be assigned to each CMIP5 data entity. The DOI points to a page hosted at WDCC (to be transitioned to the IPCC Data Distribution Centre), which will include the essential citation information along with a link to more detailed metadata (the CIM repository hosted at the British Atmospheric Data Centre, BADC). Data can be accessed via links from that page into the Earth System Grid Federation (ESGF) Gateway at PCMDI.

Data DOI implications

  • long term availability of data
  • long term availability of metadata
  • open access to data and metadata
  • published data entities are fixed and no longer subject to change
  • DataCite DOI metadata (doi:10.5438/0009), i.e. data identifiers (data citations), data format(s) and data size, are fixed

Criteria for QC L3/DOI publication

0. Precondition
  • Quality Control Level 2 is passed: Quality information available in CERA (quality.specification) - QC L2 assigned

Technical Quality Control passed (WDC Climate):

1. General Checks
a. number of datasets > 0
b. no hidden datasets
c. dataset size (CERA external_pointer) != 0 and size (CERA distribution) != 0
d. dataset size (CERA external_pointer) = size (CERA distribution)
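
The general checks above can be sketched in a few lines. This is a minimal illustration only, not the WDC Climate implementation: the `DatasetRecord` class and its field names are hypothetical stand-ins for values that would be read from the CERA `external_pointer` and `distribution` entries.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical view of one dataset's CERA entries (field names are assumptions)."""
    name: str
    hidden: bool
    external_pointer_size: int  # bytes, as recorded in CERA external_pointer
    distribution_size: int      # bytes, as recorded in CERA distribution


def general_checks(datasets):
    """Return failure messages for checks 1a to 1d; an empty list means all passed."""
    failures = []
    if len(datasets) == 0:
        failures.append("1a: no datasets")
    for ds in datasets:
        if ds.hidden:
            failures.append(f"1b: {ds.name} is hidden")
        if ds.external_pointer_size == 0 or ds.distribution_size == 0:
            failures.append(f"1c: {ds.name} has zero size")
        if ds.external_pointer_size != ds.distribution_size:
            failures.append(f"1d: {ds.name} size mismatch")
    return failures
```

Returning a list of messages instead of a single boolean keeps every failed criterion visible, which is useful when the data author or data node manager has to be contacted for clarification.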

2. CERA Dataset / Variable checks
a. number of chunks (CERA external_pointer) = number of files
b. format of chunks (CERA distribution) = file format
c. CF standard names (CERA topic) = file
d. Units (CERA topic) = file
e. spatial coverage: longitude, latitude, height/depth/level (CERA spatial_coverage) = file coverage
f. temporal coverage (CERA temporal_coverage) = file coverage = CERA blob_meta information
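
As an illustration of the standard-name and units comparison in this group, the sketch below compares two plain dictionaries. It assumes the CERA topic entry and the file's variable attributes have already been read into dictionaries keyed by the CF attribute names `standard_name` and `units`; the function name is hypothetical.

```python
def compare_variable_metadata(cera_topic, file_attrs, keys=("standard_name", "units")):
    """Return {key: (cera_value, file_value)} for every key on which the two disagree."""
    return {
        key: (cera_topic.get(key), file_attrs.get(key))
        for key in keys
        if cera_topic.get(key) != file_attrs.get(key)
    }
```

An empty result means the database entry and the file agree on the compared attributes; a missing attribute on either side is reported as `None`.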

3. CERA external_pointer / file checks / QCDB type='FILE'
(if no entry in QCDB: no check against QCDB)
a. file name / DRS_id (CERA external_pointer) = netcdf file = QCDB file id
b. tracking_id (CERA external_pointer) = netcdf file = QCDB file id
c. checksum (CERA external_pointer) = netcdf file = QCDB checksum
d. file size (CERA external_pointer) = netcdf file = QCDB size
e. temporal_coverage (CERA external_pointer) = CERA (blob_meta) = netcdf file
f. data access (CERA download) of first and last record of variable time series (CERA dataset)
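
The size and checksum comparisons in this group can be sketched as follows. This is an assumption-laden example: the dictionary keys `size` and `checksum`, the function names, and the default MD5 algorithm are illustrative choices, not taken from the CERA or QCDB schemas. A missing QCDB entry skips the QCDB comparison, as stated above.

```python
import hashlib
import os


def file_checksum(path, algorithm="md5", block_size=1 << 20):
    """Hash a file in blocks so large netCDF files need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def check_file(path, cera_entry, qcdb_entry=None, algorithm="md5"):
    """Compare the file on disk with CERA and, if an entry exists, with QCDB.

    cera_entry and qcdb_entry are hypothetical dicts with 'size' and 'checksum'.
    Returns a list naming each mismatching source and field.
    """
    actual = {
        "size": os.path.getsize(path),
        "checksum": file_checksum(path, algorithm),
    }
    mismatches = []
    for field, value in actual.items():
        if cera_entry[field] != value:
            mismatches.append(f"CERA {field}")
        # No entry in QCDB: no check against QCDB.
        if qcdb_entry is not None and qcdb_entry[field] != value:
            mismatches.append(f"QCDB {field}")
    return mismatches
```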

Data-Metadata cross-checks (CERA - CIM simulationRun):
(if no entry in CIM repository: no check against CIM simulationRun)
- model_id = reference of type 'model' and externalID of type 'QN_DRS' (CIM simulationRun)
- institute_id = abbreviation of contact with role 'contact' or of any other specified organisation contact type and externalID of type 'QN_DRS' (CIM simulationRun)
- experiment_id = name without leading number of reference of type 'experiment' and externalID of type 'QN_DRS' (CIM simulationRun)
- ensembleMember_id = ensemble member reference (CIM simulationRun externalID of type='DRS_CMIP5_ensembleType')
- minimum and maximum time netcdf file = startPoint and endPoint (CIM simulationRun) = start and end time (CERA coverage)
- number of ensemble members = number of ensemble members (CIM ensemble)
- name of ensemble members = name of ensemble members (CIM ensemble/ensembleMember)
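
A rough sketch of the identifier cross-checks follows. It assumes the relevant CIM simulationRun values have already been extracted into a flat dictionary with the hypothetical keys `model`, `institute`, `experiment` and `ensemble_member`, and that an experiment name carries its leading number as a whitespace-separated prefix (for example "3.2 historical"). Both are assumptions made for illustration.

```python
import re

# A leading number such as "3.2 " or "6.1 ", followed by whitespace.
_LEADING_NUMBER = re.compile(r"^\d+(?:\.\d+)*\s+")


def strip_leading_number(experiment_name):
    """Drop the numeric prefix of an experiment name, keeping the rest intact."""
    return _LEADING_NUMBER.sub("", experiment_name.strip())


def cross_check_ids(file_attrs, cim_run):
    """Return the names of file attributes that disagree with the CIM simulationRun.

    cim_run is None when the CIM repository has no entry; no check is made then.
    """
    if cim_run is None:
        return []
    expected = {
        "model_id": cim_run["model"],
        "institute_id": cim_run["institute"],
        "experiment_id": strip_leading_number(cim_run["experiment"]),
        "ensembleMember_id": cim_run["ensemble_member"],
    }
    return [key for key, value in expected.items() if file_attrs.get(key) != value]
```

Requiring whitespace after the numeric prefix keeps experiment names that themselves begin with a digit, such as 1pctCO2, from being truncated.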

CERA Dataset DOI / CIM cross-checks:
- title (DOI / CIM simulationRun longname)
- authorsList (DOI / CIM simulationRun authorsList)
- contacts (DOI / CIM simulationRun contacts)

4. Scientific or Semantic Quality Assurance passed (data author)
Data and metadata are approved by the author. Checks include:
a. check general metadata information, including approval of CMIP5 questionnaire metadata
b. check list of authors
c. check scientific contact
d. check list of contributors
e. check relations to other data entities or references to papers
f. check of data coverage
g. entry of additional scientific quality checks and their results
h. submission of SQA report to DataCite DOI Publication Agent by author

Scientific or Semantic Quality Assurance (SQA)

Quality assurance of the content of climate model data is mainly the responsibility of the data authors. It is documented in the metadata, and the quality flag assigned in the database system is “approved by author”. The author checks the metadata and can describe the SQA checks and document their results. Finally, the author confirms that the existing metadata and the documented data have been checked and are ready for DataCite DOI publication.

Technical Quality Assurance (TQA)

Technical Quality Control of metadata and data was developed and is performed by the WDC Climate. CERA is the database of the WDC Climate, and CIM is the metadata repository that stores the questionnaire metadata. Passing some of the checks is required (numbers 1 to 3 of the QC L3 criteria catalogue above), whereas the results of others are only logged or documented (the unnumbered warnings in the criteria catalogue above). TQA is carried out by the DataCite DOI publication agency “WDC Climate”. The checks include double-checks of the data and cross-checks between data and metadata. If inconsistencies are found, the data author or the data node manager is contacted for clarification.

DataCite DOI Publication for ESGF data

After the author’s approval has been received and the TQA checks have been passed, the DataCite DOI data publication process follows. It consists of the following main steps:

  1. The results of the QC L3 checks are documented in the WDCC and, for CMIP5, in the CIM metadata repository.
  2. CMIP data becomes visible in the IPCC DDC (IPCC Data Distribution Centre).
  3. Creation of DataCite DOI metadata (metadata for publication of electronic media according to ISO 690-2). A detailed definition of the DataCite DOI metadata profile can be obtained from doi:10.5438/0009, and the corresponding XML schema is located at doi:10.5438/0008.
  4. DataCite metadata registration via the DataCite API, i.e. insertion of the citation metadata into the DataCite catalog.
  5. Assignment of the landing page URL to the registered DOI via the DataCite API, i.e. registration of the DOI in the IDF DOI handle system.
  6. Notification of the corresponding data author about the finalization of the DOI data publication process.
  7. Additional data dissemination via DKRZ’s ESGF data node.
  8. Long-term accessibility of data and metadata is established together with a permanent data citation using the DataCite DOI.
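
To make the citation metadata step concrete, the sketch below holds the mandatory DataCite properties (identifier, creator, title, publisher, publication year) in an immutable record and formats a citation string from them. The class and function names, the example values and the DOI "10.1234/EXAMPLE" are hypothetical, and the citation layout is one common DataCite-style pattern rather than the format prescribed by WDC Climate. No DataCite API call is made here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: published citation metadata is no longer changed
class DoiMetadata:
    identifier: str        # DOI without the "doi:" prefix
    creators: tuple        # author names in "Family, Given" form
    title: str
    publisher: str
    publication_year: int


def missing_fields(meta):
    """Return the names of mandatory properties that are empty."""
    required = ("identifier", "creators", "title", "publisher", "publication_year")
    return [name for name in required if not getattr(meta, name)]


def format_citation(meta):
    """Build 'Creators (Year): Title. Publisher. doi:Identifier'."""
    creators = "; ".join(meta.creators)
    return (
        f"{creators} ({meta.publication_year}): {meta.title}. "
        f"{meta.publisher}. doi:{meta.identifier}"
    )


example = DoiMetadata(
    identifier="10.1234/EXAMPLE",          # hypothetical DOI
    creators=("Doe, Jane", "Roe, Richard"),  # hypothetical authors
    title="Example experiment",
    publisher="World Data Center for Climate (WDCC)",
    publication_year=2012,
)
print(format_citation(example))
```

Checking `missing_fields` before registration mirrors the requirement that the citation metadata is complete at the moment the DOI is published, since it is fixed afterwards.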