3. Create a ParallelCluster and run CMAQv5.3.3
3.1. Why might I need to use ParallelCluster?
AWS ParallelCluster can be configured as the equivalent of a High Performance Computing (HPC) environment: it supports job schedulers such as Slurm, runs code compiled with the Message Passing Interface (MPI) across multiple nodes, and reads and writes output to a high-performance, low-latency shared filesystem. The advantage of using the AWS ParallelCluster command line interface is that the compute nodes can easily be scaled up or down to match the compute requirements of a given simulation. In addition, the user can reduce costs by using Spot instances rather than On-Demand instances for the compute nodes. ParallelCluster also supports submitting multiple jobs to the job scheduler queue.
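As a rough illustration, the sketch below shows what a minimal ParallelCluster v3 configuration with a Slurm scheduler and a Spot-priced compute queue can look like, followed by the CLI call that creates the cluster. The region, subnet ID, key pair, instance types, and cluster name are placeholders rather than values prescribed by this guide; the tutorials below walk through the actual configurations used for CMAQ.

```bash
# Minimal ParallelCluster v3 configuration sketch: Slurm scheduler plus a
# Spot-priced compute queue that scales between 0 and 10 nodes.
# The region, subnet ID, key pair, and instance types are placeholders.
cat > demo-cluster.yaml << 'EOF'
Region: us-east-1
Image:
  Os: alinux2
HeadNode:
  InstanceType: c5n.large
  Networking:
    SubnetId: subnet-0123456789abcdef0
  Ssh:
    KeyName: my-key-pair
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: compute
      CapacityType: SPOT          # switch to ONDEMAND if Spot capacity is unavailable
      ComputeResources:
        - Name: nodes
          InstanceType: c5n.18xlarge
          MinCount: 0             # scale to zero when no jobs are queued
          MaxCount: 10
      Networking:
        SubnetIds:
          - subnet-0123456789abcdef0
SharedStorage:
  - MountDir: /shared
    Name: shared-ebs
    StorageType: Ebs
    EbsSettings:
      Size: 200                   # GiB
EOF

# Create the cluster from the configuration file and check its status.
pcluster create-cluster --cluster-name demo --cluster-configuration demo-cluster.yaml
pcluster describe-cluster --cluster-name demo
```

Setting `MinCount: 0` lets the compute fleet scale to zero when the queue is empty, which is the main cost-saving behavior described above.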
Our goal is to make this user guide to running CMAQ on a ParallelCluster as helpful and user-friendly as possible. Any feedback is both welcome and appreciated.
Additional information on AWS ParallelCluster:
- AWS ParallelCluster documentation
- AWS ParallelCluster training video
- 3.1.1. Introductory Tutorial
  - Step by Step Instructions to Build a Demo ParallelCluster (a CLI sketch of this workflow follows the outline below)
  - Establish Identity and Permissions
  - Configure a Demo Cluster
  - Create a Demo Cluster
  - Login and Examine Cluster
  - SSH into the cluster
  - Check what modules are available on the ParallelCluster
  - Check what version of the compiler is available
  - Check what version of OpenMPI is available
  - Verify that Slurm is available (if Slurm is not available, you may need to try a different OS)
  - Do not install software on this demo cluster
  - Exit the cluster
  - SSH into the cluster
  - Delete the Demo Cluster
- 3.1.2. CMAQv5.3.3 Intermediate Tutorial
  - Use a ParallelCluster pre-installed with software and data
  - Create CMAQ ParallelCluster with software/data pre-installed
  - Log into the new cluster
  - Change shell to use tcsh
  - Verify Software
  - Verify Input Data
  - Examine CMAQ Run Scripts
  - Submit Job to Slurm Queue (see the Slurm command sketch after this outline)
  - Check status of run
  - Successfully started run
  - Once the job is successfully running
  - If you repeatedly see that the job is not successfully provisioned, cancel the job
  - Try submitting a smaller job to the queue
  - Check status of run
  - Check to view any errors in the log on the ParallelCluster
  - If the job will not run using SPOT pricing, update the compute nodes to use ONDEMAND pricing
  - Submit a new job using the updated ONDEMAND compute nodes
  - Submit a 72 PE job (2 nodes x 36 CPUs)
  - Submit a minimum of 2 benchmark runs
- 3.1.3. CMAQv5.3.3 ParallelCluster Benchmark on hpc6a.48xlarge with EBS and Lustre (optional)
  - Use a ParallelCluster pre-installed with CMAQv5.3.3 software and the 12US2 Benchmark
  - Create CMAQ ParallelCluster with software/data pre-installed
  - Log into the new cluster
  - Resize the EBS Volume
  - Change shell to use tcsh
  - Verify Software
  - Verify Input Data
  - Examine CMAQ Run Scripts
  - To run on the EBS volume, a code modification is required
  - Build the code by running the makefile
  - Submit Job to Slurm Queue to run CMAQ on Lustre
  - Submit a run script to run on the EBS volume
  - Modify YAML and then Update the ParallelCluster (see the cluster update sketch after this outline)
  - Submit a minimum of 2 benchmark runs
  - Upgrade the pcluster version to try the Persistent 2 Lustre filesystem
  - Query the stack formation log messages