Quality Control Level 2

Introduction

QC Level 2 of CMIP5 is defined as subjective data and metadata controls (CMIP5 Quality Control Document). It consists of separate checks on data and on metadata. Whereas metadata checks are performed on all delivered data, data checks are restricted to the replicated part of the CMIP5 requested data. QC L2 checks on metadata are designed and performed at the British Atmospheric Data Centre (BADC). QC L2 checks on data and the QC L2 checker tool were developed at the World Data Center for Climate (WDCC). The data checks are carried out by all three Earth System Grid Federation (ESGF) members: PCMDI, BADC and WDCC/DKRZ.

Criteria for QC L2 assignment

QC Level 2 is assigned as soon as QC metadata checks and QC data checks are passed.
The assignment rules are described in QC L2 assignment criteria.
The QC tool exceptions are documented in the list of exceptions.

QC Level 2 on Metadata

The metadata checks consist of subjective controls of the metadata entered via the metafor questionnaire. These QC L2 checks on metadata were designed and are conducted by the BADC (see also the recently set up ES-DOC project at http://www.earthsystemcog.org/projects/cmip5qc/ ).

QC Level 2 on Data

The checks are based on WDCC's QC Tool (sometimes referred to as the statistical QC Tool or just the QC Tool). Checks are performed at PCMDI, BADC and WDCC/DKRZ. If problems occur with this QC Tool, or if it fails on a larger part of the data, a simple record threshold check (also referred to as range check or record check) is applied to these datasets.

The checks and the tools used are described in QC Level 2 Tools.

Installation Instructions

Quality results are in netcdf-3 format.

NOTE: To use an Oracle QCDB, set the environment variable QCDB_TYPE=ORACLE

Contacts

Run QC

1. Edit the qc.conf file using the example in the qcWrapper main directory

  • The options in the upper section are marked as mandatory and have to be set.
  • Use db 'test1' for testing. For production runs, use 'qc1' for output1 (replicated) and 'qc2' for output2 (requested and not replicated).
  • For the old DRS syntax (production DRS), set QCDB_DRS=drsold.txt. If it is not specified, the new DRS syntax (ESGF data node directory structure) is used, following the CMIP5 DRS syntax reference: http://cmip-pcmdi.llnl.gov/cmip/doc/cmip5_data_reference_syntax.pdf.
  • Note: By default, only the latest ESG-published version is checked. To check older versions, specify them with the qcWrapper.py option --version=v<date> (examples: --version='v20' checks all available versions; --version='v20110101' checks only version v20110101). The corresponding configure file key is QCDB_VERSION.
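
The two --version examples above suggest that the option value is matched as a prefix of the version name. The following Python sketch illustrates that reading only. It is an assumption drawn from the examples, not the actual qcWrapper.py implementation, and the function name select_versions is hypothetical.

```python
def select_versions(available, pattern=None):
    """Return the versions that would be checked.

    available: version names such as 'v20110101'
    pattern:   value of --version / QCDB_VERSION, or None for the default
    """
    if not available:
        return []
    if pattern is None:
        # Default behaviour: only the latest published version is checked.
        # 'vYYYYMMDD' names sort chronologically as plain strings.
        return [max(available)]
    # Assumed semantics: the option value acts as a prefix filter.
    return sorted(v for v in available if v.startswith(pattern))


versions = ["v20110101", "v20110615", "v20120203"]
print(select_versions(versions))               # default: latest version only
print(select_versions(versions, "v20"))        # all available versions
print(select_versions(versions, "v20110101"))  # exactly one version
```

Under this assumption, a shorter prefix such as 'v2011' would select every version published in 2011.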

2. Run QC with the wrapper (statistical QC Tool):

    qcWrapper.py --configure=<conffile>
 

2a. If unchecked variables remain, run the QC record checks for data that the statistical QC Tool cannot check:

    qcWrapper.py --configure=<conffile> --rcheck
  

3. Check the QC results and export CIM quality documents [and export configure and log files] with:

    qcDbselect.py --postgresdb=<database> --experiment=<DRS name of experiment> --cim [--log=<download dir for logfiles>] [--quiet]
 
  • Error flags (as specified in the exception list):
    • 'FATAL' (severe error), 'ERROR', 'WARNING', 'INFO' (successful QC check)
    • 'UNKNOWN': the error could not be classified because an error number is missing in the QC tool output or in the exception list
    • 'UNCHECKED': files were not checked by the QC tool
  • Output:
    • QCL2_statistics_<drs>.txt - exception list of all atomic datasets, ordered by exception code
    • qcDbselect_<drs>.log - list of detailed error messages for every atomic dataset (as in stdout)
  • Option --cim: exports CIM quality documents for ESG datasets (publication units) and experiments and writes them to the directory specified in --log (or to the current directory if --log is not given)
  • Option --log: downloads all input and log files to the specified directory
Notes:
  1. For more analysis and checking options, see the examples at qcDbselect.py.
  2. For more information on the exception messages, see the list of exceptions and the QC L2 assignment criteria.
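
When many atomic datasets are involved, it can help to condense the error flags into a short summary before consulting the assignment criteria. The Python sketch below counts the flags and reports the most severe classified one. The severity order (FATAL, ERROR, WARNING, INFO) is assumed from the order in which the flags are listed above, and the function name summarize_flags is hypothetical. It does not read the qcDbselect.py output files, whose exact layout is not described here.

```python
from collections import Counter

# Classified flags, assumed to be ordered from most to least severe.
SEVERITY = ["FATAL", "ERROR", "WARNING", "INFO"]
# Flags that signal a missing classification or a missing check.
UNRESOLVED = ("UNKNOWN", "UNCHECKED")


def summarize_flags(flags):
    """Count the flags and report the most severe classified one."""
    counts = Counter(flags)
    # First flag in severity order that occurs at least once, else None.
    worst = next((flag for flag in SEVERITY if counts[flag] > 0), None)
    unresolved = sum(counts[flag] for flag in UNRESOLVED)
    return {"counts": dict(counts), "worst": worst, "unresolved": unresolved}


# Illustrative input: one flag per atomic dataset.
summary = summarize_flags(["INFO", "WARNING", "INFO", "UNCHECKED", "ERROR"])
print(summary["worst"], summary["unresolved"])
```

A non-zero "unresolved" count points to datasets that still need the record check of step 2a or a look at the exception list.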

4. For a data update run the wrapper again:

    qcWrapper.py --configure=<conffile>  or
    qcWrapper.py --configure=<conffile> --rcheck
 
Notes:
  1. Data flagged as 'QCL2_excluded' are not updated (cf. qcExclude.py).
  2. Do not delete the QC_DATA_ROOT directory, because it is needed for data updates. If you delete these QC results, the QC tool will check the old (already checked) data again.
  3. In general, deletion should not be necessary. Use qcDbdelete.py very carefully, for example if the wrong experiment was processed. It is safer to use qcExclude.py in such cases.
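
The wrapper calls of steps 2, 2a and 4 differ only in the --rcheck option, so they are easy to script. The following Python sketch builds the argument list and runs it with subprocess. It assumes that qcWrapper.py is on the PATH and that it returns a non-zero exit status on failure, neither of which is stated above. The helper names are hypothetical.

```python
import subprocess


def wrapper_command(conffile, rcheck=False):
    """Build the argument list for one qcWrapper.py call."""
    cmd = ["qcWrapper.py", f"--configure={conffile}"]
    if rcheck:
        # Record checks for data the statistical QC Tool cannot check.
        cmd.append("--rcheck")
    return cmd


def run_wrapper(conffile, rcheck=False):
    """Run qcWrapper.py; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(wrapper_command(conffile, rcheck=rcheck), check=True)


# Show the two calls without executing them.
print(" ".join(wrapper_command("qc.conf")))
print(" ".join(wrapper_command("qc.conf", rcheck=True)))
```

Calling run_wrapper("qc.conf") followed by run_wrapper("qc.conf", rcheck=True) would then reproduce steps 2 and 2a, and the same two calls serve for a data update.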

5. Recommended option: Plot netcdf results with xmgrace

    qcDbselect.py --postgresdb=<database> --experiment=<DRS name of experiment> --outcera=<xmgrace template>
 
  • xmgrace template: QCWrapper/table/qc_xmgrace.par
  • QC results in netcdf-3 format are required.

Assignment of QC Level 2

6. Exclude ESG datasets from assignment (if necessary)

In general, QC L2 is assigned to all latest versions of every dataset present in the QC database. If one of these latest version datasets should not be part of the assignment, it has to be flagged as excluded.

Exclude an ESG dataset / publication unit with:

         qcExclude.py --postgresdb=<database> --esgds=<ESG dataset name> --qstatus=exclude

Example: qcExclude.py --postgresdb=test1 --qstatus=exclude 
                      --esgds=cmip5/output1/MPI-M/MPI-ESM-LR/sstClim/day/atmos/day/r1i1p1/v20111005

To include the ESG dataset again, use the same command with --qstatus=include.

If only part of the data belonging to an ESG dataset is to be excluded from QC L2 assignment, the whole ESG dataset in version x has to be excluded, and a new ESG dataset version x+1 has to be published for the data part to be included in the QC L2 assignment. The QC has to be applied again, before QC L2 can be finished by qcAssignL2.py.
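
The ESG dataset name passed to --esgds follows the CMIP5 DRS, so it can be split into its components before an exclusion command is assembled. The Python sketch below does this for the example name above. The field names are labels chosen for this illustration, and the rule that the DRS experiment name consists of the first five components is inferred from the experiment names shown elsewhere on this page. The helper functions are hypothetical and not part of the QC tools.

```python
# Component order of an ESG dataset name in the CMIP5 DRS.
DRS_FIELDS = [
    "activity", "product", "institute", "model", "experiment",
    "frequency", "realm", "mip_table", "ensemble", "version",
]


def parse_esg_dataset(name):
    """Split an ESG dataset name into its DRS components."""
    parts = name.strip("/").split("/")
    if len(parts) != len(DRS_FIELDS):
        raise ValueError(f"expected {len(DRS_FIELDS)} components, got {len(parts)}")
    return dict(zip(DRS_FIELDS, parts))


def experiment_name(name):
    """DRS experiment name, assumed to be the first five components."""
    return "/".join(name.strip("/").split("/")[:5])


def exclude_command(database, esgds, include=False):
    """Build the qcExclude.py call for excluding or re-including a dataset."""
    qstatus = "include" if include else "exclude"
    return ["qcExclude.py", f"--postgresdb={database}",
            f"--esgds={esgds}", f"--qstatus={qstatus}"]


esgds = "cmip5/output1/MPI-M/MPI-ESM-LR/sstClim/day/atmos/day/r1i1p1/v20111005"
print(parse_esg_dataset(esgds)["version"])
print(experiment_name(esgds))
print(" ".join(exclude_command("test1", esgds)))
```

Validating the number of components first catches truncated names before they reach the QC database.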

7. Create CIM documents for the experiment

    qcDbselect.py --postgresdb=<database> --experiment=<DRS name of experiment> --cim [--quiet]
 

8. Assignment of QC Level 2: update the QCDB, ingest the QC result plot (created in step 5, recommended) and the CIM metadata (created in step 7 with option --cim, mandatory; please do not rename the XML files), and send the CIM XML to the CIM repository (using pubEsgdsResult.py):

    qcAssignL2.py --postgresdb=<database> --experiment=<DRS name of experiment> 
                  --cimxml=<cim quality EXPERIMENT of 7.> --plot=<plot.pdf of 5.> --comment=<comment on QC L2 assignment>

  • Note on result evaluation:
    Please check the QC L2 assignment criteria before assigning QC L2. Run qcDbselect.py with option --outcera beforehand to create the QC result plot (pdf).
  • Notes on assignment procedure:
    • QC Level 2 is assigned at the DRS experiment level. It is thereby assigned to all atomic datasets of the highest version number belonging to this experiment. If there are newer versions with QC results in the QCDB, or if you want to exclude an ESG dataset from the QC L2 assignment, flag them as 'QCL2_excluded' (qcExclude.py) or delete them (qcDbdelete.py) before assignment, or send us a request to do it for you.
    • ESG datasets have to be complete, so you cannot exclude single atomic datasets from the experiment. In such a case you have to publish the ESG dataset under a newer version that excludes these atomic datasets, and run the QC again on this version / part of the data before assigning QC Level 2.
    • Any data exclusion has to be agreed with the modeling center.
    • Comment on your QC assignment procedure using the option --comment of qcAssignL2.py, e.g. on data exclusions or ignored exceptions. This information is needed during the QC L3 / DOI publication procedure.
    • We recommend providing a QC result plot along with your QC results (option --plot).
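
Because QC L2 is assigned to the highest version of every dataset, it can be useful to see beforehand which version that is. The Python sketch below groups ESG dataset names by everything except the version and keeps the highest version per group. It is an illustration only: the QC tools determine the versions from the QCDB, the function name latest_versions is hypothetical, and the second and third dataset names are made up for the example.

```python
def latest_versions(esg_datasets):
    """Map each dataset name (without version) to its highest version."""
    latest = {}
    for name in esg_datasets:
        # The version is the last path component, e.g. 'v20111005'.
        base, _, version = name.strip("/").rpartition("/")
        # 'vYYYYMMDD' names sort chronologically as plain strings.
        if base not in latest or version > latest[base]:
            latest[base] = version
    return latest


names = [
    "cmip5/output1/MPI-M/MPI-ESM-LR/sstClim/day/atmos/day/r1i1p1/v20111005",
    "cmip5/output1/MPI-M/MPI-ESM-LR/sstClim/day/atmos/day/r1i1p1/v20120101",   # made up
    "cmip5/output1/MPI-M/MPI-ESM-LR/sstClim/mon/atmos/Amon/r1i1p1/v20111005",  # made up
]
for base, version in sorted(latest_versions(names).items()):
    print(base, version)
```

Any version listed here that should not receive QC L2 is a candidate for qcExclude.py before the assignment.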

Flag single ESG Datasets as QC L2 passed

If you want to flag single ESG datasets / publication units as "QC L2 passed" before the overall QC L2 process for the whole experiment is finished, you can do so as follows:

1. Extract CIM QC documents from QCDB

a. CIM QC documents for all ESG datasets

qcDbselect.py --postgresdb=<database> --experiment=<DRS name of experiment> --cim

b. CIM QC document for a single ESG dataset:
qcDbselect.py --postgresdb=<database> --esgds=<DRS name of ESG dataset> --cim

2. Send CIM QC document for ESG dataset to CIM QC repository

pubEsgdsResult.py --postgresdb=<database> --xml=<cim quality ESG DATASET of 1.>

List status of QC L2 (QC_STATUS)

1. Check for experiment QC L2 status:

qcDbselect.py --postgresdb=<database> --list 

Output example:
# EXPERIMENT     QC_STATUS COMMENT       ERR_STATUS      QC_DATE         QC_CONDUCTOR    QC_TOOL_VERSION
cmip5/output/MPI-M/ECHAM6-MPIOM-TR/amip QCL2_started  ---       ERROR   2011-01-14 15:04:17     Martina Stockhause:martina.stockhause@zmaw.de   QC Tool in svn revision: 2679 by author: heinz-dieter on date: 2011-01-13 14:05:41 +0100 (Thu, 13 Jan 2011)
cmip5/output/MPI-M/ECHAM6-MPIOM-TR/historical   QCL2_assigned QC Level2 assigned at Wed Jan 12 08:35:12 CET 2011        ERROR   2010-11-08 15:21:39     Martina Stockhause:martina.stockhause@zmaw.de     QC Tool in svn revision: 2571 by author: heinz-dieter on date: 2010-12-02 14:22:14 +0100 (Thu, 02 Dec 2010)

2. List history for a given experiment

qcDbselect.py --postgresdb=<database> --list --experiment=<drs experiment>

3. Check status of atomic datasets of an ESG dataset /publication unit

qcDbselect.py --postgresdb=<database> --esgds=<drs esg publication unit>

QC_STATUS column in output

QC_STATUS | Meaning | Comment
QCL2_started | QC L2 run is finished and data ingested in QCDB |
QCL2_excluded | Datasets excluded from QC L2 | Assigned on ESG dataset / publication unit level
QCL2_assigned | QC L2 is assigned by ESGF partner | QC L2 results are write protected, QC L3 / DOI publication process has started
QCL1 | QC L3 failed or problems with the DRS experiment are found before DOI assignment, which require new data delivery and restart of the whole QC process | QCL1 is the first QC level, which is implicitly set during ESG publisher application
QCL3_assigned | QC L3 is assigned and DOI published by WDCC as publication agency of DataCite | Data and metadata are persistent and the data is citable by its DOI.

Problems

1. QCDB errors or internet connection problems:

  • Make sure you have sent us the IP address of the machine on which the QC is installed. If the error occurs after the QC Tool has finished, run qcWrapper again with the additional option --noqc to skip the QC tool checks for new data.

2. Statistical QC tool abort:

  • Reset additional options to the default values of the configure file and enable the option IGNORE_ERROR. Run qcWrapper again.
  • no error message available:
    • Check QCDB_LOGFILE for stderr messages from the QC tool.
    • Start the QC tool standalone by getting the call from the QCDB_LOGFILE (grep 'qcManager') and adding the option -E_DEBUG_C
  • session log with error message available:
    • Check it for errors in the processing of the last listed data file
  • session log without error message:
    • Start the QC tool standalone by getting the call from the QCDB_LOGFILE (grep 'qcManager') and adding the option -E_DEBUG_C
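
The "grep 'qcManager'" step above can also be done in a few lines of Python. The sketch below scans log lines for the qcManager call and appends -E_DEBUG_C to each call it finds. It assumes that every call is logged on a single line containing the string 'qcManager'; the real QCDB_LOGFILE layout may differ. The sample lines are placeholders, not real log output, and the function name debug_calls is hypothetical.

```python
def debug_calls(lines, marker="qcManager", debug_option="-E_DEBUG_C"):
    """Return the logged QC tool calls with the debug option appended."""
    calls = []
    for line in lines:
        if marker not in line:
            continue
        call = line.strip()
        # Do not add the option twice if it is already part of the call.
        if debug_option not in call.split():
            call = f"{call} {debug_option}"
        calls.append(call)
    return calls


# Placeholder lines standing in for the content of QCDB_LOGFILE.
sample_log = [
    "some other log message\n",
    "qcManager <options as logged>\n",
]
for call in debug_calls(sample_log):
    print(call)
```

With a real log file, pass the open file object instead of the list, e.g. debug_calls(open(path)) with path set to the QCDB_LOGFILE location.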

3. Statistical QC tool runs without writing any result data

  • Cancel qcWrapper.py and try again with the option IGNORE_ERROR enabled in the configure file.
  • Enable option NUM_EXEC_THREADS in the configure file and set it to a value > 1 and try again.
  • Delete output branch for this experiment and try again.
  • Update your QC tool and try again.
  • Open a request in redmine https://redmine.dkrz.de/collaboration/projects/cmip5-qc/issues/new or contact martina.stockhauseATzmaw.de and send the QCDB_LOGFILE.

4. QC Wrapper reports an error although the data have been replaced

  • Delete existing error and warning files in the output directory at dataset level and try again.