Run CMAQ#
Verify that you have an updated set of run scripts from the pcluster-cmaq repo#
To ensure you have the correct directory specified
cd /shared/pcluster-cmaq/run_scripts/cmaq533/
ls -lrt run*pcluster*
Compare with
ls -lrt /shared/build/openmpi_gcc/CMAQ_v533/CCTM/scripts/run*pcluster*
If they are not identical, then copy the set from the repo
cp /shared/pcluster-cmaq/run_scripts/cmaq533/run*pcluster* /shared/build/openmpi_gcc/CMAQ_v533/CCTM/scripts/
Verify that the input data is imported to /fsx from the S3 Bucket#
cd /fsx/12US2
Need to make this directory and then link it to the path created when the data is copied from the S3 Bucket.
This is to make the paths consistent between the two methods of obtaining the input data.
mkdir -p /fsx/data/CONUS
cd /fsx/data/CONUS
ln -s /fsx/12US2 .
Create the output directory#
mkdir -p /fsx/data/output
Run the CONUS Domain on 180 pes#
cd /shared/build/openmpi_gcc/CMAQ_v533/CCTM/scripts/
sbatch run_cctm_2016_12US2.180pe.5x36.pcluster.csh
Note, it will take about 3-5 minutes for the compute notes to start up. This is reflected in the Status (ST) of CF (configuring)
Check the status in the queue#
squeue -u ubuntu
Output:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
2 queue1 CMAQ ubuntu CF 3:00 5 queue1-dy-computeresource1-[1-5]
After 5 minutes the status will change once the compute nodes have been created and the job is running
squeue -u ubuntu
Output:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
2 compute CMAQ ubuntu R 16:50 5 compute-dy-c5n18xlarge-[1-5]
The 180 pe job should take 60 minutes to run (30 minutes per day)
check on the status of the cluster using CloudWatch#
(optional)
<a href="https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards:name=cmaq-us-east-1">Cloudwatch Dashboard</a>
<a href="https://aws.amazon.com/blogs/compute/monitoring-dashboard-for-aws-parallelcluster/">Monitoring Dashboard for ParallelCluster</a>
check the timings while the job is still running using the following command#
grep 'Processing completed' CTM_LOG_001*
Output:
Processing completed... 8.8 seconds
Processing completed... 7.4 seconds
When the job has completed, use tail to view the timing from the log file.#
tail run_cctmv5.3.3_Bench_2016_12US2.10x18pe.2day.pcluster.log
Output:
==================================
***** CMAQ TIMING REPORT *****
==================================
Start Day: 2015-12-22
End Day: 2015-12-23
Number of Simulation Days: 2
Domain Name: 12US2
Number of Grid Cells: 3409560 (ROW x COL x LAY)
Number of Layers: 35
Number of Processes: 180
All times are in seconds.
Num Day Wall Time
01 2015-12-22 2481.55
02 2015-12-23 2225.34
Total Time = 4706.89
Avg. Time = 2353.44
Submit a request for a 288 pe job ( 8 x 36 pe) or 8 nodes instead of 5 nodes#
`sbatch run_cctm_2016_12US2.288pe.8x36.pcluster.csh``
Check on the status in the queue#
squeue -u ubuntu
Note, it takes about 5 minutes for the compute nodes to be initialized, once the job is running the ST or status will change from CF (configure) to R
Output:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
6 queue1 CMAQ ubuntu R 24:57 8 queue1-dy-computeresource1-[1-8]
Check the status of the run#
tail CTM_LOG_025.v533_gcc_2016_CONUS_16x18pe_20151222
Check whether the scheduler thinks there are cpus or vcpus#
sinfo -lN
Output:
Wed Jan 05 19:34:05 2022
NODELIST NODES PARTITION STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
queue1-dy-computeresource1-1 1 queue1* mixed 72 72:1:1 1 0 1 dynamic, none
queue1-dy-computeresource1-2 1 queue1* mixed 72 72:1:1 1 0 1 dynamic, none
queue1-dy-computeresource1-3 1 queue1* mixed 72 72:1:1 1 0 1 dynamic, none
queue1-dy-computeresource1-4 1 queue1* mixed 72 72:1:1 1 0 1 dynamic, none
queue1-dy-computeresource1-5 1 queue1* mixed 72 72:1:1 1 0 1 dynamic, none
queue1-dy-computeresource1-6 1 queue1* mixed 72 72:1:1 1 0 1 dynamic, none
queue1-dy-computeresource1-7 1 queue1* mixed 72 72:1:1 1 0 1 dynamic, none
queue1-dy-computeresource1-8 1 queue1* mixed 72 72:1:1 1 0 1 dynamic, none
queue1-dy-computeresource1-9 1 queue1* idle~ 72 72:1:1 1 0 1 dynamic, Scheduler health che
queue1-dy-computeresource1-10 1 queue1* idle~ 72 72:1:1 1 0 1 dynamic, Scheduler health che
Note: on a c5n.18xlarge, the number of virtual cpus is 72.
If the YAML contains the Compute Resources Setting of DisableSimultaneousMultithreading: false, then all 72 vcpus will be used
If DisableSimultaneousMultithreading: true, then the number of cpus is 36 and there are no virtual cpus.
edit run script to use#
SBATCH –exclusive
Edit the yaml file to use DisableSimultaneousMultithreading: true#
Confirm that there are only 36 cpus available to the slurm scheduler#
sinfo -lN
Output:
Wed Jan 05 20:54:01 2022
NODELIST NODES PARTITION STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
queue1-dy-computeresource1-1 1 queue1* idle~ 36 36:1:1 1 0 1 dynamic, none
queue1-dy-computeresource1-2 1 queue1* idle~ 36 36:1:1 1 0 1 dynamic, none
queue1-dy-computeresource1-3 1 queue1* idle~ 36 36:1:1 1 0 1 dynamic, none
queue1-dy-computeresource1-4 1 queue1* idle~ 36 36:1:1 1 0 1 dynamic, none
queue1-dy-computeresource1-5 1 queue1* idle~ 36 36:1:1 1 0 1 dynamic, none
queue1-dy-computeresource1-6 1 queue1* idle~ 36 36:1:1 1 0 1 dynamic, none
queue1-dy-computeresource1-7 1 queue1* idle~ 36 36:1:1 1 0 1 dynamic, none
queue1-dy-computeresource1-8 1 queue1* idle~ 36 36:1:1 1 0 1 dynamic, none
queue1-dy-computeresource1-9 1 queue1* idle~ 36 36:1:1 1 0 1 dynamic, none
queue1-dy-computeresource1-10 1 queue1* idle~ 36 36:1:1 1 0 1 dynamic, none
Re-run the CMAQ CONUS Case#
cd /shared/build/openmpi_gcc/CMAQ_v533/CCTM/scripts/
Submit a request for a 288 pe job ( 8 x 36 pe) or 8 nodes instead of 10 nodes with full output#
sbatch run_cctm_2016_12US2.288pe.full.pcluster.csh
squeue -u ubuntu
Output:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
7 queue1 CMAQ ubuntu CF 3:06 8 queue1-dy-computeresource1-[1-8]
Note, it takes about 5 minutes for the compute nodes to be initialized, once the job is running the ST or status will change from CF (configure) to R
squeue -u ubuntu
Output:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
7 queue1 CMAQ ubuntu R 24:57 8 queue1-dy-computeresource1-[1-8]
Check the status of the run#
tail CTM_LOG_025.v533_gcc_2016_CONUS_16x18pe_full_20151222
Once you have submitted a few benchmark runs and they have completed successfully, proceed to the next chapter.