The goal of this task is to design and implement a framework that appears to the execution environment as a single application and runs a number of EC-EARTH simulations concurrently as ensemble members. It is important to define the scientific rationale for the ensemble approach and to consider possible technical solutions in the context of the targeted PRACE systems.
It should be clarified that, in climate science, the result of a single simulation run provides only a partial answer. To obtain a scientifically sound result, an ensemble of simulations with properly defined variations is needed. Thus, in the context of HPC usage, the completion of multiple simulations should be considered one application. This type of application demands a large amount of computational resources; however, it offers nearly perfect parallelism, because the members run independently of each other. To make ensemble applications feasible, some data handling capabilities are needed that are not included in the current model implementations.
General Aspects
Level of ensemble member control:
- Wrapper script
- MPI rank
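To illustrate the wrapper-script level of control, the following is a minimal sketch, not an existing EC-EARTH tool. It assumes each ensemble member can be started as an independent command; the function name `run_members` and the placeholder commands are hypothetical. The wrapper starts all members first and only then waits, so they run concurrently inside what the queueing system sees as one job.

```python
import subprocess
import sys


def run_members(commands):
    """Start one process per ensemble member, then wait for all of them.

    `commands` maps a member name to its argument list.
    Returns a dict mapping each member name to its exit code.
    """
    # Start every member before waiting on any, so they run concurrently.
    procs = {name: subprocess.Popen(cmd) for name, cmd in commands.items()}
    return {name: proc.wait() for name, proc in procs.items()}


if __name__ == "__main__":
    # Placeholder commands stand in for the real per-member model launch.
    members = {
        f"member{i:02d}": [sys.executable, "-c", f"print('member {i} done')"]
        for i in range(3)
    }
    exit_codes = run_members(members)
    failed = [name for name, code in exit_codes.items() if code != 0]
    print("failed members:", failed)
```

With this approach the members stay separate executables, and the wrapper only sees their exit codes. Control at the MPI rank level would instead place all members inside one MPI job.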
Autosubmit (IC3)
Autosubmit is a suite of Python scripts that manages ensemble runs as a sequence of interdependent jobs. It handles multi-stage workflows (e.g. pre- and post-processing) and includes live monitoring.
- Existing software. Seems to do what's needed. Some(?) experience.
- Needs Python
- Handles ensemble members as individual jobs (in the sense of a queueing system). This may not be what's needed to convince the system operators/evaluation board.
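The idea of managing an ensemble as a sequence of interdependent jobs can be sketched generically as follows. This is not Autosubmit's API or configuration format; the job names (`m0_pre`, `m0_sim`, `m0_post`) and both functions are hypothetical, and the three-stage chain per member is an assumption for illustration. The sketch uses Python's standard `graphlib` module (Python 3.9 or later) to group jobs into waves that could be submitted together.

```python
from graphlib import TopologicalSorter


def build_workflow(n_members):
    """Return a dict mapping each job name to the set of jobs it depends on."""
    deps = {}
    for m in range(n_members):
        pre, sim, post = f"m{m}_pre", f"m{m}_sim", f"m{m}_post"
        deps[pre] = set()    # pre-processing has no prerequisites
        deps[sim] = {pre}    # the simulation needs its pre-processing
        deps[post] = {sim}   # post-processing needs the simulation output
    return deps


def submission_waves(deps):
    """Group jobs into waves; all jobs in one wave can be queued at once."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as finished
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(submission_waves(build_workflow(2))):
        print(number, wave)
```

Each job in a wave is still a separate job from the queueing system's point of view, which is the limitation noted in the list above.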
OASIS Ensemble Mode
This is just an idea, but could OASIS handle separate ensemble members in a way similar to its pseudo parallel mode?
- True one-application-approach (one mpirun for all members)
- If done carefully, the model components could run unmodified
- Seems a bit of design and implementation work ...
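One way to picture the one-`mpirun` approach is a mapping from global MPI ranks to ensemble members. The sketch below is not OASIS code, and it makes a simplifying assumption: every member gets the same number of consecutive ranks. A real coupled model would also need to divide each member's ranks among its components. The names `member_of_rank` and `layout` are hypothetical.

```python
def member_of_rank(world_rank, ranks_per_member):
    """Return (member index, rank within that member) for a global rank."""
    if ranks_per_member <= 0:
        raise ValueError("ranks_per_member must be positive")
    # Consecutive blocks of ranks_per_member ranks form one member.
    return divmod(world_rank, ranks_per_member)


def layout(n_members, ranks_per_member):
    """Return a dict mapping each member index to its list of global ranks."""
    members = {m: [] for m in range(n_members)}
    for rank in range(n_members * ranks_per_member):
        member, _ = member_of_rank(rank, ranks_per_member)
        members[member].append(rank)
    return members


if __name__ == "__main__":
    # Example: 3 members with 4 ranks each, i.e. one job with 12 ranks.
    for member, ranks in layout(3, 4).items():
        print(f"member {member}: ranks {ranks}")
```

In an actual MPI program, the member index could serve as the `color` argument of `MPI_Comm_split`, giving each member its own communicator in place of the global one. Whether the model components could then run unmodified depends on how they obtain their communicator, which this sketch does not address.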
Most of 2011 will be dedicated to requirements specification and design considerations. Evaluation of the Autosubmit tool, and possibly experimental modifications, can be undertaken in parallel. Implementation work is not expected before late 2011.