CMAQv5.4 on Parallel Cluster Advanced Tutorial (optional)
4.2.1. Configure Parallel Cluster#
Use ParallelCluster with the default Ubuntu OS, a c7g.large head node, and hpc7g.16xlarge compute nodes.
Step-by-step instructions for configuring and running a ParallelCluster for the CMAQ 12US1 benchmark, including instructions to install the libraries and software.
Notice
Skip this tutorial if you successfully completed the CMAQv5.4 on Parallel Cluster Intermediate Tutorial, unless you need to build the CMAQ libraries and code and run on a different family of compute nodes, such as the c6gn.16xlarge (AWS Graviton2) compute nodes.
Activate the virtual environment to use the ParallelCluster command line
source ~/apc-ve/bin/activate
source ~/.nvm/nvm.sh
Upgrade to get the latest version of ParallelCluster
python3 -m pip install --upgrade "aws-parallelcluster"
Verify that the AWS ParallelCluster CLI is installed by checking the version
pcluster version
Output:
{
"version": "3.6.0"
}
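If you only need the version string itself (for example, in a script), one way to extract it from the JSON output shown above (an optional sketch using the Python json module):
pcluster version | python3 -c 'import json,sys; print(json.load(sys.stdin)["version"])'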
Use an existing YAML file from the git repo to create a ParallelCluster
cd /your/local/machine/install/path/
Use a configuration file from the GitHub repo that was cloned to your local machine
git clone -b main https://github.com/CMASCenter/pcluster-cmaq.git pcluster-cmaq
cd pcluster-cmaq/yaml
Edit the hpc7g.16xlarge.ebs_unencrypted_installed_public_ubuntu2004.fsx_import.yaml
vi hpc7g.16xlarge.ebs_unencrypted_installed_public_ubuntu2004.fsx_import.yaml
Note
the hpc7g.16xlarge*.yaml is configured to use ONDEMAND instance pricing for the compute nodes.
the hpc7g.16xlarge*.yaml is configured to use the hpc7g.16xlarge as the compute node, with up to 10 compute nodes, specified by MaxCount: 10.
the hpc7g.16xlarge*.yaml is configured to disable multithreading. This option restricts computing to physical cores rather than allowing the use of all virtual CPUs (each hpc7g.16xlarge node provides 64 physical cores).
the hpc7g.16xlarge*.yaml is configured to enable a placement group to allow low inter-node latency.
the hpc7g.16xlarge*.yaml is configured to enable the Elastic Fabric Adapter.
given this YAML configuration, the maximum number of PEs that could be used to run CMAQ is 64 cpus x 10 nodes = 640, so the largest settings for NPCOL, NPROW in the CMAQ run script are NPCOL = 32, NPROW = 20 or NPCOL = 20, NPROW = 32 (see the decomposition check sketched below). Note: CMAQ does not scale well beyond 2-3 compute nodes.
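A quick sanity check of the decomposition arithmetic can be scripted. The sketch below is not part of any CMAQ or ParallelCluster script; the values for NPCOL, NPROW, the per-node core count, and MaxCount are taken from the note above.
#!/bin/bash
# decomposition_check.sh - verify that NPCOL x NPROW fits on the requested compute nodes
NPCOL=32              # columns in the CMAQ domain decomposition
NPROW=20              # rows in the CMAQ domain decomposition
CORES_PER_NODE=64     # physical cores per hpc7g.16xlarge node
MAX_NODES=10          # MaxCount in the YAML configuration
NPROCS=$(( NPCOL * NPROW ))
MAX_PROCS=$(( CORES_PER_NODE * MAX_NODES ))
if [ "$NPROCS" -le "$MAX_PROCS" ]; then
    echo "OK: NPCOL x NPROW = $NPROCS processes fit within $MAX_PROCS available cores"
else
    echo "ERROR: NPCOL x NPROW = $NPROCS processes exceed $MAX_PROCS available cores"
fi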
Replace the key pair and subnet ID in the hpc7g.16xlarge*.yaml file with the values created when you configured the demo cluster (a scripted alternative using sed is sketched after the listing below):
Region: us-east-1
Image:
  Os: ubuntu2004
HeadNode:
  InstanceType: c7g.large
  Networking:
    SubnetId: subnet-xx-xx-xx   << replace
  DisableSimultaneousMultithreading: true
  Ssh:
    KeyName: your_key   << replace
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: queue1
      CapacityType: ONDEMAND
      Networking:
        SubnetIds:
          - subnet-xx-xx-xx   << replace
        PlacementGroup:
          Enabled: true
      ComputeResources:
        - Name: compute-resource-1
          InstanceType: hpc7g.16xlarge
          MinCount: 0
          MaxCount: 10
          DisableSimultaneousMultithreading: true
          Efa:
            Enabled: true
            GdrSupport: false
SharedStorage:
  - MountDir: /shared
    Name: ebs-shared
    StorageType: Ebs
  - MountDir: /fsx
    Name: name2
    StorageType: FsxLustre
    FsxLustreSettings:
      StorageCapacity: 1200
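If you prefer to script the key pair and subnet substitutions instead of editing the file in vi, a hedged sed sketch (it assumes the placeholder strings match those shown above; subnet-0abc1234 and my-key-pair are example values to be replaced with your own):
# substitute the subnet ID and key pair name placeholders in the YAML file (GNU sed on Linux)
sed -i 's/subnet-xx-xx-xx/subnet-0abc1234/g' hpc7g.16xlarge.ebs_unencrypted_installed_public_ubuntu2004.fsx_import.yaml
sed -i 's/your_key/my-key-pair/g' hpc7g.16xlarge.ebs_unencrypted_installed_public_ubuntu2004.fsx_import.yaml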
The YAML file for the hpc7g.16xlarge contains the settings shown in the following diagram.
Figure 1. Diagram of the YAML file used to configure a ParallelCluster with a c7g.large head node and hpc7g.16xlarge compute nodes using ONDEMAND pricing
4.2.2. Create the hpc7g.16xlarge pcluster#
pcluster create-cluster --cluster-configuration hpc7g.16xlarge.ebs_unencrypted_installed_public_ubuntu2004.fsx_import.yaml --cluster-name cmaq --region us-east-1
Check the status of the cluster#
pcluster describe-cluster --region=us-east-1 --cluster-name cmaq
After 5-10 minutes, you should see the following status: "clusterStatus": "CREATE_COMPLETE"
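If you would rather wait on the command line than re-run the describe-cluster command manually, a simple polling sketch (cluster name and region match the create-cluster command above):
# poll the cluster status every 60 seconds until creation completes or fails
while true; do
  STATUS=$(pcluster describe-cluster --region us-east-1 --cluster-name cmaq | grep clusterStatus)
  echo "$STATUS"
  case "$STATUS" in
    *CREATE_COMPLETE*|*CREATE_FAILED*) break ;;
  esac
  sleep 60
done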
Login to cluster#
Note
Replace your-key.pem with your key pair.
pcluster ssh -v -Y -i ~/your-key.pem --region=us-east-1 --cluster-name cmaq
Note
Notice that the hpc7g.16xlarge YAML configuration file contains a setting for PlacementGroup.
PlacementGroup:
  Enabled: true
A placement group is used to get the lowest inter-node latency; it guarantees that your instances are on the same networking backbone.
Check to make sure the Elastic Network Adapter (ENA) is enabled#
modinfo ena
lspci
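The full lspci listing can be long; a hedged way to narrow the output (this assumes the adapters appear with "Elastic" in their PCI device descriptions, as they typically do on EC2 instances):
# show only the Elastic Network/Fabric Adapter entries from the PCI listing
lspci | grep -i elastic
# confirm the ena kernel module reports a version
modinfo ena | grep -i '^version'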
Check what modules are available on the cluster#
module avail
Output:
module avail
------------------------------------------------------------------------------------------------ /usr/share/modules/modulefiles -------------------------------------------------------------------------------------------------
armpl/21.0.0 dot libfabric-aws/1.17.1 module-git module-info modules null openmpi/4.1.5 use.own
Load the openmpi module#
module load openmpi/4.1.5
Load the Libfabric module#
module load libfabric-aws/1.17.1
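To confirm that both modules are loaded, list the currently loaded modules:
module list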
Verify the gcc compiler version is greater than 8.0#
gcc --version
Output:
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See also