QC Meeting (BADC 24th/25th May)
More complete notes can be found in the attached document (see below); some of the key points are summarised in the following three flowcharts:
Overarching QC flow
Note that there are three key changes from previous versions (the first three points below), followed by some related notes:
- The requested data volume is now expected to be 3.4 PB.
- Replication within ESGF may not be uniform. It depends on QC level 2, but it is not the same as the data movement needed to actually carry out QC level 2.
- There is explicit recognition that not all requested data is likely to receive a formal WDCC DOI. Even though the WDCC can store more data on tape than on spinning disk, the human effort involved is unlikely to stretch to all of it in the meantime.
- The red box therefore indicates that we need to establish further criteria for how data gets into QC level 3. One obvious criterion is potential long-term interest (certainly true of any projections). Some criteria that BADC uses are available (pdf), but they pre-date the concept of DOIs and need updating even for use at BADC, let alone in this context.
- In storage terms, we now expect that DKRZ can put more data through the DOI process than they have spinning disk for (they have considerably more tape reserves); the constraint will be the human time and resources needed to actually do it.
- Data that is not in the IPCC-DDC can still be linked from the IPCC-DDC (under new rules being promulgated). To be explicit, data that reaches the bottom right-hand box will appear differently on the IPCC-DDC, but data in both categories will be visible.
- We need to come up with consistent "informal citation rules": use the same structure and rules as for DOI citations, except that the DOI is replaced with a URL (ideally in the same format as the data URL on the DOI page).
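The informal citation rule can be sketched as one formatter that builds the same citation structure either way and only swaps the identifier. This is a minimal illustration under assumed field names and ordering (`Citation` and `format_citation` are hypothetical); the actual DOI citation structure is not spelled out in these notes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Citation:
    # Hypothetical fields; the real DOI citation structure may differ.
    authors: str
    year: int
    title: str
    publisher: str
    doi: Optional[str] = None  # set only for data that has a formal DOI
    url: Optional[str] = None  # informal citation: the data URL replaces the DOI


def format_citation(c: Citation) -> str:
    """Build one citation string, using the DOI if present and the URL otherwise."""
    if c.doi:
        identifier = f"doi:{c.doi}"
    elif c.url:
        identifier = c.url
    else:
        raise ValueError("a citation needs either a DOI or a URL")
    # Identical structure for formal and informal citations; only the identifier differs.
    return f"{c.authors} ({c.year}): {c.title}. {c.publisher}. {identifier}"
```

Keeping a single formatter means informal and DOI citations cannot drift apart in structure; only the final identifier changes.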
QC level 2 detail
This diagram shows the expected workflow for carrying out QC at BADC, DKRZ and PCMDI (which will do the bulk of this). The key points to note are that we recognise the importance of publication units, and the need to publish QC level 2 information on both those and the experiment descriptions. The two orange boxes will result in the creation of CIM QC documents, which will be visible in the [[http://qc.cmip5.ceda.ac.uk|qcportal]] at BADC.
- The first one will result in a CIM document which can be used to trigger replication into the wider ESGF.
- The second one will result in QC documents which will be discoverable at the experiment level, and will trigger movement to QC level 3.
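The two triggers can be read as a dispatch from the scope of a published QC level 2 document to its follow-on step. A minimal sketch, assuming each document is tagged with its scope and a target identifier; the enum values and action names are hypothetical labels for the boxes in the diagram, not identifiers used by the BADC or ESGF tooling.

```python
from enum import Enum
from typing import Iterable, List, Tuple


class QCDocScope(Enum):
    # The two kinds of CIM QC document produced by the orange boxes.
    PUBLICATION_UNIT = "publication_unit"
    EXPERIMENT = "experiment"


# Hypothetical action names for the follow-on step each scope triggers.
ACTIONS = {
    QCDocScope.PUBLICATION_UNIT: "replicate_to_esgf",
    QCDocScope.EXPERIMENT: "promote_to_qc_level_3",
}


def actions_for(docs: Iterable[Tuple[QCDocScope, str]]) -> List[Tuple[str, str]]:
    """Map each (scope, target id) QC document to (action, target id)."""
    return [(ACTIONS[scope], target) for scope, target in docs]
```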
Both sets of QC documents will need to be linked in the ESGF gateways.
Major action items:
- BADC to expedite the two pieces of code required to publish CIM documents
- BADC to operationalise the qcportal
- DKRZ to support a view on the QC database so that those doing QC level 2 can evaluate only the logfiles and plots of interest for a specific dataset (and so that these can be loaded into the CIM via the BADC tooling).
- NCAR to support harvesting and display of QC documents. (This could be direct harvesting of the XML artefacts, which could then be displayed easily; the hard part would be the linking algorithm, which is of course DRS-based.)
- This leads to a subsidiary requirement: both the questionnaire and qcportal feeds must make the DRS id information easily accessible.
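Because the linking algorithm is DRS-based, its core is parsing and comparing DRS identifiers. A minimal sketch, assuming both QC documents and datasets carry dot-separated CMIP5 DRS dataset identifiers down to the ensemble member (the publication-unit level), optionally followed by a version component such as `v20110101`; `parse_drs_id`, `link_key` and `match_qc_docs` are hypothetical helpers, not existing ESGF or BADC functions.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# CMIP5 DRS components down to the ensemble member, in order.
DRS_FIELDS = ("activity", "product", "institute", "model", "experiment",
              "frequency", "realm", "mip_table", "ensemble")
VERSION_RE = re.compile(r"^v\d+$")  # assumed version form, e.g. v20110101


def parse_drs_id(dataset_id: str) -> Tuple[Dict[str, str], Optional[str]]:
    """Split a DRS dataset id into named components plus an optional version."""
    parts = dataset_id.split(".")
    version = None
    if len(parts) == len(DRS_FIELDS) + 1 and VERSION_RE.match(parts[-1]):
        version = parts.pop()
    if len(parts) != len(DRS_FIELDS):
        raise ValueError(f"not a publication-unit DRS id: {dataset_id!r}")
    return dict(zip(DRS_FIELDS, parts)), version


def link_key(dataset_id: str) -> str:
    """Version-independent key for matching QC documents to datasets."""
    fields, _ = parse_drs_id(dataset_id)
    return ".".join(fields[name] for name in DRS_FIELDS)


def match_qc_docs(qc_doc_ids: Iterable[str], dataset_id: str) -> List[str]:
    """Return the QC document ids that refer to the same publication unit."""
    key = link_key(dataset_id)
    return [doc_id for doc_id in qc_doc_ids if link_key(doc_id) == key]
```

Matching here deliberately ignores the version; a QC level 3 link would also need to compare versions, since a DOI refers to specific versions of publication units.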
QC level 3 detail
The key worry here is that the DOI needs to point to the correct experiment data page (including version) in the gateway at PCMDI, using a URL that can be constructed from the DRS information (that is, it points to a set of specific versions of publication units). The gateway will have to support that!
If PCMDI don't have a copy of all the relevant data, the link should revert to DKRZ, who must have a copy of any data that has a DOI.
Likewise, the extended metadata link should point to the Metafor portal at BADC, but in the meantime it won't (and might never do so, if DKRZ takes it over under IS-ENES in some possible futures).
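The DOI target rule above (PCMDI if it holds all the relevant data, otherwise DKRZ) can be sketched as follows. The URL templates use placeholder `example.org` hosts and a made-up query layout, because the real gateway URL scheme is exactly what still has to be agreed; `doi_target_url` and the `holds_all` inventory check are hypothetical.

```python
from typing import Callable, Dict, Sequence
from urllib.parse import quote

# Placeholder landing-page templates; the real gateway URL scheme is not yet defined.
GATEWAYS: Dict[str, str] = {
    "PCMDI": "https://pcmdi-gateway.example.org/experiment/{experiment}?versions={versions}",
    "DKRZ": "https://dkrz-gateway.example.org/experiment/{experiment}?versions={versions}",
}


def doi_target_url(experiment: str,
                   unit_versions: Sequence[str],
                   holds_all: Callable[[str, Sequence[str]], bool]) -> str:
    """Choose the DOI landing page for a set of versioned publication units.

    PCMDI is preferred; DKRZ is the fallback because it must hold a copy of
    any data that has a DOI.
    """
    site = "PCMDI" if holds_all("PCMDI", unit_versions) else "DKRZ"
    # Sort so the same set of unit versions always yields the same URL.
    versions = quote(",".join(sorted(unit_versions)), safe="")
    return GATEWAYS[site].format(experiment=quote(experiment, safe=""),
                                 versions=versions)
```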
What artefacts are deployed to support the qc process?
- The QC database (at DKRZ)
- The QC portal (at BADC) at http://qc.cmip5.ceda.ac.uk
- The ESG gateways, which harvest Atom feeds from the QC portal
- The QC L2 checker tool and QC L2 manager support tool ("wrapper") at http://svn-mad.zmaw.de/svn/mad/Model/QualCheck (access restricted)
- The Atarrabi Publication System GUI for the QC L3 author approval step at http://cera-www.dkrz.de/atarrabi
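Since the Atom content structure is still to be investigated, this gateway-side harvesting sketch reads only standard Atom elements (`id`, `title`, `updated` and the first `link`) with Python's `xml.etree.ElementTree`, and deliberately ignores any QC-specific payload inside an entry. `parse_qc_feed` is a hypothetical helper, not part of the ESG gateway code.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

ATOM = "{http://www.w3.org/2005/Atom}"


def parse_qc_feed(xml_text: str) -> List[Dict[str, str]]:
    """Extract id, title, updated and the first link href from each Atom entry."""
    root = ET.fromstring(xml_text)
    entries: List[Dict[str, str]] = []
    for entry in root.iter(f"{ATOM}entry"):
        link = entry.find(f"{ATOM}link")  # first <link> element, if any
        entries.append({
            # Missing elements become empty strings rather than errors.
            "id": entry.findtext(f"{ATOM}id", default=""),
            "title": entry.findtext(f"{ATOM}title", default=""),
            "updated": entry.findtext(f"{ATOM}updated", default=""),
            "href": link.get("href", "") if link is not None else "",
        })
    return entries
```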
Actions arising from QC Technical Meeting:
ACTION Redraw CMIP5 QC flow diagrams with additional detail.[BNL]
ACTION What URL is used to match the QC Metadata to the data? [MS]
ACTION Need to resolve how DOIs will be issued. [MJ]
ACTION Make sure DOI information chain (harvested from Thredds) will not break if the Thredds profile changes. [SP]
ACTION Can CIM documents be generated directly from the QCDB at DKRZ? [MS]
ACTION Produce ‘CIM comment upload code’ at end of meeting [BNL]
ACTION Find out why PCMDI are not using 'full' DRS [SP]
ACTION qcDbSelect: BADC to test the analysis functionality to produce a summary report and give timings to DKRZ. [KM]
ACTION BNL to modify and distribute QC/DOI flowchart diagram [BNL]
ACTION qcDbSelect: send Jeff screenshots showing how to process the errors/warnings. [MS]
ACTION Investigate atom content structure. [BNL]
ACTION Produce RunBADCqc2release.py [BNL]
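For the action on keeping the DOI information chain robust to THREDDS profile changes, one defensive pattern is to read catalogue properties by name and report missing ones rather than fail. A minimal sketch using the THREDDS InvCatalog 1.0 namespace; the property names in `REQUIRED` (`dataset_id`, `dataset_version`) are assumptions about what the DOI chain needs, and `doi_properties` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

TDS = "{http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0}"
# Assumed property names needed by the DOI chain; adjust to the real profile.
REQUIRED = ("dataset_id", "dataset_version")


def doi_properties(catalog_xml: str) -> List[Dict[str, object]]:
    """Collect required <property> values per <dataset>, noting any that are missing."""
    root = ET.fromstring(catalog_xml)
    records: List[Dict[str, object]] = []
    for ds in root.iter(f"{TDS}dataset"):
        # Only direct <property> children of this dataset are considered.
        props = {p.get("name"): p.get("value") for p in ds.findall(f"{TDS}property")}
        found: Dict[str, Optional[str]] = {k: props.get(k) for k in REQUIRED}
        records.append({
            "ID": ds.get("ID"),
            "properties": found,
            "missing": [k for k, v in found.items() if v is None],
        })
    return records
```

Missing fields surface as data to be checked rather than as an exception, so a profile change shows up in the report instead of silently breaking the harvest.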
See the attached document "QC Technical Meeting at RAL.odt" for details. It contains the raw notes taken during the meeting as well as summary notes kindly provided by Martina.