CMAQv5.3.3 on AWS Tutorials (Single VM and ParallelCluster)#
Scripts and code to run CMAQ on a Single Virtual Machine or a Parallel Cluster (multiple VMs).
To obtain this code use the following command.#
git clone -b CMAQv5.3.3 https://github.com/CMASCenter/pcluster-cmaq pcluster-cmaq-533
Warning
This documentation is under continuous development. The latest version is available here: CMAQ on AWS Tutorials Latest Version
Overview#
This document provides tutorials and information on how users can create High Performance Computers (Single Virtual Machine (VM) or ParallelCluster) on Amazon Web Services (AWS) using the AWS Command Line Interface. The tutorials are aimed at users with cloud computing experience who are already familiar with AWS. For those with no cloud computing experience, we recommend reviewing the Additional Resources listed in chapter 8 of this document.
Format of this documentation#
This document provides several hands-on tutorials that are designed to be read in order.
The Introductory Tutorial will walk you through creating a demo ParallelCluster. You will learn how to set up your AWS Identity and Access Management Roles, configure and create a demo cluster, and exit and delete the cluster.
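As a rough orientation for the lifecycle the Introductory Tutorial covers, the sketch below lists the AWS ParallelCluster version 3 CLI subcommands for configuring, creating, checking, logging in to, and deleting a cluster. The cluster name, region, and configuration file name are hypothetical placeholders, not values taken from the tutorial. By default the script only prints each command; it executes them only when DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch of a ParallelCluster lifecycle. All names are hypothetical placeholders.
set -euo pipefail

CLUSTER_NAME="${CLUSTER_NAME:-cmaq-demo}"   # assumed cluster name
REGION="${REGION:-us-east-1}"               # assumed AWS region
CONFIG="${CONFIG:-demo-cluster.yaml}"       # assumed configuration file name

# Print the command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Map a lifecycle step name to the corresponding pcluster subcommand.
cluster_step() {
  case "$1" in
    configure) run pcluster configure --config "$CONFIG" --region "$REGION" ;;
    create)    run pcluster create-cluster --cluster-name "$CLUSTER_NAME" --cluster-configuration "$CONFIG" --region "$REGION" ;;
    status)    run pcluster describe-cluster --cluster-name "$CLUSTER_NAME" --region "$REGION" ;;
    login)     run pcluster ssh --cluster-name "$CLUSTER_NAME" --region "$REGION" ;;
    delete)    run pcluster delete-cluster --cluster-name "$CLUSTER_NAME" --region "$REGION" ;;
    *)         echo "unknown step: $1" >&2; return 1 ;;
  esac
}

# With no argument, print the whole sequence as a plan; otherwise run one step.
if [ "$#" -eq 0 ]; then
  for step in configure create status login delete; do
    DRY_RUN=1 cluster_step "$step"
  done
else
  cluster_step "$1"
fi
```

If saved under a name such as lifecycle.sh (also a placeholder), running it with no argument prints the plan, and `DRY_RUN=0 bash lifecycle.sh create` would execute a single step. Wait for the cluster status to report that creation has completed before logging in, and delete the cluster when finished so it stops accruing charges.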
Single VM Tutorials#
The Single VM Intermediate Tutorial will show you how to create a single virtual machine from an AMI that has the software and data pre-loaded. It gives instructions for creating the virtual machine on EC2 instance types that have different numbers of cores and are matched to the benchmark domain. The Single VM Advanced Tutorial will show you how to install the CMAQv5.3.3 software and libraries, and how to create custom environment modules.
Parallel Cluster Tutorials#
The CMAQv5.3.3 Parallel Cluster Intermediate chapter will show you how to run CMAQv5.3.3 using the 12US2 benchmark. The CMAQv5.3.3 Advanced Tutorial explains how to scale the ParallelCluster for larger compute jobs and how to install CMAQv5.3.3 and the required libraries from scratch on the cloud. The chapter “Benchmark on HPC6a-48xlarge with EBS and Lustre” runs CMAQv5.3.3 on advanced HPC6a compute nodes that are only available in the us-east-2 region.
The remaining sections provide instructions on post-processing CMAQ output, comparing output and runtimes from multiple simulations, and copying output from ParallelCluster to an AWS Simple Storage Service (S3) bucket.
Why might I need to use ParallelCluster?#
The AWS ParallelCluster may be configured to be the equivalent of a High Performance Computing (HPC) environment, including using job schedulers such as Slurm, running on multiple nodes using code compiled with Message Passing Interface (MPI), and reading and writing output to a high performance, low latency shared disk. The advantage of using the AWS ParallelCluster command line interface is that the compute nodes can be easily scaled up or down to match the compute requirements of a given simulation. In addition, the user can reduce costs by using Spot instances rather than On-Demand for the compute nodes. ParallelCluster also supports submitting multiple jobs to the job submission queue.
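To make the link between MPI decomposition and node scaling concrete, the following sketch computes how many compute nodes a run needs and prints matching Slurm batch directives. It assumes the total number of MPI tasks is the product of the column and row processor counts (for example a 12x9 run), and that each node offers 36 cores; the core count and the function names are illustrative assumptions, not values from this tutorial.

```shell
#!/usr/bin/env bash
# Sketch: size a Slurm request from an NPCOL x NPROW decomposition.
# The cores-per-node value used below is an assumption for illustration.
set -euo pipefail

# Nodes needed for npcol*nprow MPI tasks, rounded up to whole nodes.
nodes_needed() {
  local npcol="$1" nprow="$2" cores_per_node="$3"
  local tasks=$(( npcol * nprow ))
  echo $(( (tasks + cores_per_node - 1) / cores_per_node ))
}

# Print the #SBATCH directives for the request.
sbatch_header() {
  local npcol="$1" nprow="$2" cores_per_node="$3"
  printf '#SBATCH --nodes=%s\n' "$(nodes_needed "$npcol" "$nprow" "$cores_per_node")"
  printf '#SBATCH --ntasks=%s\n' "$(( npcol * nprow ))"
}

# Example: a 12x9 decomposition (108 tasks) on nodes with 36 cores each.
sbatch_header 12 9 36
```

On the head node, a run script carrying such directives would typically be submitted with `sbatch`, monitored with `squeue`, and the state of the compute nodes inspected with `sinfo`.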
Our goal is to make this user guide to running CMAQ on a ParallelCluster as helpful and user-friendly as possible. Any feedback is both welcome and appreciated.
Additional information on AWS ParallelCluster:
AWS ParallelCluster documentation
AWS ParallelCluster training video
- 1. System Requirements
- 2. Create Single VM and run CMAQv5.3.3 (software pre-installed)
- 3. Create a Parallel Cluster and run CMAQv5.3.3
- 4. Performance and Cost Optimization
- 4.1. Right-sizing Compute Nodes for the ParallelCluster Configuration
- 4.2. An explanation of why a scaling analysis is required for Multinode or Parallel MPI Codes
- 4.3. Slurm Compute Node Provisioning
- 4.4. Spot versus On-Demand Pricing
- 4.5. Benchmark Timings for CMAQv5.3.3 12US2 Benchmark
- 4.6. Benchmark Scaling Plots for CMAQv5.3.3 12US2 Benchmark
- 4.7. Cost Information
- 4.8. Recommended Workflow for extending to annual run
- 4.9. Side by Side Comparison of the information in the log files for 12x9 pe run compared to 9x12 pe run.
- 5. Developer Guide to install and run CMAQv5.3.3 on Single VM or Parallel Cluster
- 6. Post-process and qa
- 7. Logout and Delete ParallelCluster
- 8. Additional Resources
- 8.1. FAQ
- 8.2. Free Training
- 8.3. Another workshop to learn the AWS CLI 3.0
- 8.4. Youtube video
- 8.5. Intro to AWS for HPC People - HPC Tech Shorts
- 8.6. Benchmarking
- 8.7. Help Resources for CMAQ
- 8.8. Computing on the Cloud References
- 8.9. Resources from AWS for diagnosing issues with running the Parallel Cluster
- 8.10. Instructions on how to create Parallel Cluster Amazon Machine Image (AMI) from the command line
- 8.11. ParallelCluster Update
- 8.12. Use Elastic Fabric Adapter/Elastic Network Adapter for better performance
- 8.13. VPC Management
- 8.14. Using Cost Allocation Tags with ParallelCluster
- 9. Future Work
- 10. Contribute to this Tutorial