2.5. Use ParallelCluster with Software and Data pre-installed for c6a.48xlarge#
Step by step instructions to configuring and running a ParallelCluster for the CMAQ 12US1 benchmark
Notice
The CMAQ libraries were installed using the gcc compiler on c6a.large.
2.5.1. Create CMAQ Cluster using SPOT pricing#
Use an existing yaml file from the git repo to create a ParallelCluster
cd /your/local/machine/install/path/
Use a configuration file from the github repo that was cloned to your local machine
git clone -b main https://github.com/CMASCenter/pcluster-cmaq.git pcluster-cmaq
Edit the c6a.large-48xlarge.ebs_unencrypted_installed_public_ubuntu2004.fsx_import.yaml
cd pcluster-cmaq/yaml
vi c6a.large-48xlarge.ebs_unencrypted_installed_public_ubuntu2004.fsx_import.yaml
Note
the c6a.large-48xlarge*.yaml is configured to use SPOT instance pricing for the compute nodes.
the c6a.large-48xlarge*.yaml is configured to the the c6a-48xlarge as the compute node, with up to 10 compute nodes, specified by MaxCount: 10.
the c6a.large-48xlarge*.yaml is configured to disable multithreading (This option restricts the computing to CPUS rather than allowing the use of all virtual CPUS. (192 virtual cpus reduced to 96 cpus)
the c6a.large-48xlarge*.yaml is configured to enable the setting of a placement group to allow low inter-node latency
the c6a.large-48xlarge*.yaml is configured to enables the elastic fabric adapter
given this yaml configuration, the maximum number of PEs that could be used to run CMAQ is 96 cpus x 10 = 960, the max settings for NPCOL, NPROW is NPCOL = 24, NPROW = 40 or NPCOL=40, NPROW=24 in the CMAQ run script. Note: CMAQ does not scale well beyond 2-3 compute nodes.
Replace the key pair and subnet ID in the c6a.large-48xlarge*.yaml file with the values created when you configured the demo cluster
Region: us-east-1
Image:
Os: ubuntu2004
HeadNode:
InstanceType: c6a.large
Networking:
SubnetId: subnet-xx-xx-xx << replace
DisableSimultaneousMultithreading: true
Ssh:
KeyName: your_key << replace
LocalStorage:
RootVolume:
Encrypted: true
Scheduling:
Scheduler: slurm
SlurmQueues:
- Name: queue1
CapacityType: SPOT
Networking:
SubnetIds:
- subnet-xx-xx-x x << replace
PlacementGroup:
Enabled: true
ComputeResources:
- Name: compute-resource-1
InstanceType: c6a.48xlarge
MinCount: 0
MaxCount: 10
DisableSimultaneousMultithreading: true
Efa:
Enabled: true
GdrSupport: false
SharedStorage:
- MountDir: /shared
Name: ebs-shared
StorageType: Ebs
EbsSettings:
Encrypted: false
SnapshotId: snap-05a36eeec1f5267bd
- MountDir: /fsx
Name: name2
StorageType: FsxLustre
FsxLustreSettings:
StorageCapacity: 1200
ImportPath: s3://cmas-cmaq/
The Yaml file for the c6a.large-48xlarge contains the settings as shown in the following diagram.
Figure 1. Diagram of YAML file used to configure a ParallelCluster with a c6a.large head node and c6a.48xlarge compute nodes using SPOT pricing
(to do!)
2.5.2. Create the c6a.48xlarge pcluster#
pcluster create-cluster --cluster-configuration c6a.large-48xlarge.ebs_unencrypted_installed_public_ubuntu2004.fsx_import.yaml --cluster-name cmaq --region us-east-1
Check on status of cluster
pcluster describe-cluster --region=us-east-1 --cluster-name cmaq
After 5-10 minutes, you see the following status: “clusterStatus”: “CREATE_COMPLETE”
If the cluster fails to start, use the following command to check for an error
pcluster get-cluster-stack-events --cluster-name cmaq --region us-east-1 --query 'events[?resourceStatus==
CREATE_FAILED]'
Proceed to the next chapter to login and run CMAQv5.4 on the Parallel Cluster.