4.2.8. Run CMAQ using hpc7g.8xlarge compute nodes#

Verify that you have an updated set of run scripts from the pcluster-cmaq repo#

Run the 12US1 Domain on 32 pes on hpc7g.8xlarge#

cd /shared/build/openmpi_gcc/CMAQ_v54+/CCTM/scripts/

sbatch run_cctm_2018_12US1_v54_cb6r5_ae6.20171222.1x32.ncclassic.c7g.8xlarge.csh
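
Optionally, you can capture the Slurm job ID at submission time for later status checks. A minimal sketch (assuming a bash login shell) using sbatch's --parsable option, which prints only the job ID:

# submit the run script and keep the job ID for later squeue/scontrol queries
JOBID=$(sbatch --parsable run_cctm_2018_12US1_v54_cb6r5_ae6.20171222.1x32.ncclassic.c7g.8xlarge.csh)
echo "Submitted job $JOBID"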

When the job has completed, use tail to view the timing from the log file.#

cd /shared/build/openmpi_gcc/CMAQ_v54+/CCTM/scripts/

tail run_cctm5.4+_Bench_2018_12US1_cb6r5_ae6_20200131_MYR.32.4x8pe.2day.20171222start.1x32.hpc7g.8xlarge.log

Output:

==================================
  ***** CMAQ TIMING REPORT *****
==================================
Start Day: 2017-12-22
End Day:   2017-12-23
Number of Simulation Days: 2
Domain Name:               12US1
Number of Grid Cells:      4803435  (ROW x COL x LAY)
Number of Layers:          35
Number of Processes:       32
   All times are in seconds.

Num  Day        Wall Time
01   2017-12-22   6266.1
02   2017-12-23   6868.5
     Total Time = 13134.60
      Avg. Time = 6567.30

Submit a request for a 64 pe job (2 x 32 pe) using 2 nodes on hpc7g.8xlarge#

sbatch run_cctm_2018_12US1_v54_cb6r5_ae6.20171222.2x32.ncclassic.c7g.8xlarge.csh

Check on the status in the queue#

squeue -u ubuntu

Note: it takes about 5 minutes for the compute nodes to be initialized. Once the job is running, the ST (status) column will change from CF (configuring) to R (running).

Output:

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                 4    queue1     CMAQ   ubuntu  R    1:11:48      2 queue1-dy-compute-resource-2-[3-4]
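
Rather than rerunning squeue by hand, you can let watch refresh the listing until the status flips to R; press Ctrl-C to exit:

# refresh the queue listing every 30 seconds
watch -n 30 squeue -u ubuntu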

Check the status of the run#

tail run_cctm5.4+_Bench_2018_12US1_cb6r5_ae6_20200131_MYR.64.8x8pe.2day.20171222start.2x32.hpc7g.8xlarge.log

The 64 pe job takes about 109 minutes to run (roughly 55 minutes per simulated day), per the timing report below.

Check whether the scheduler sees physical CPUs or vCPUs#

sinfo -lN

Output:

Fri Jun 30 16:39:48 2023
NODELIST                        NODES PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON              
queue1-dy-compute-resource-1-1      1   queue1*       idle~ 64     64:1:1 124518        0      1 dynamic, none                
queue1-dy-compute-resource-1-2      1   queue1*       idle~ 64     64:1:1 124518        0      1 dynamic, none                
queue1-dy-compute-resource-1-3      1   queue1*       idle~ 64     64:1:1 124518        0      1 dynamic, none                
queue1-dy-compute-resource-1-4      1   queue1*       idle~ 64     64:1:1 124518        0      1 dynamic, none                
queue1-dy-compute-resource-1-5      1   queue1*       idle~ 64     64:1:1 124518        0      1 dynamic, none                
queue1-dy-compute-resource-2-1      1   queue1*       idle~ 32     32:1:1 124518        0      1 dynamic, none                
queue1-dy-compute-resource-2-2      1   queue1*       idle~ 32     32:1:1 124518        0      1 dynamic, none                
queue1-dy-compute-resource-2-3      1   queue1*   allocated 32     32:1:1 124518        0      1 dynamic, none                
queue1-dy-compute-resource-2-4      1   queue1*   allocated 32     32:1:1 124518        0      1 dynamic, none                
queue1-dy-compute-resource-2-5      1   queue1*   allocated 32     32:1:1 124518        0      1 dynamic, none                
queue1-dy-compute-resource-2-6      1   queue1*   allocated 32     32:1:1 124518        0      1 dynamic, none                
queue1-dy-compute-resource-2-7      1   queue1*   allocated 32     32:1:1 124518        0      1 dynamic, none            
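
To see how the scheduler interprets a single node's core layout in more detail, scontrol reports the socket/core/thread breakdown that sinfo summarizes in the S:C:T column; a sketch using one of the node names listed above:

# show the scheduler's CPU accounting for one compute node
scontrol show node queue1-dy-compute-resource-2-3 | grep -i -E 'CPUTot|CoresPerSocket|ThreadsPerCore'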

When multiple jobs are submitted to the queue they will be dispatched to different compute nodes.#

squeue

Output:

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                 4    queue1     CMAQ   ubuntu  R    1:13:21      2 queue1-dy-compute-resource-2-[3-4]
                 7    queue1     CMAQ   ubuntu  R      57:51      3 queue1-dy-compute-resource-2-[5-7]

When the job has completed, use tail to view the timing from the log file.#

cd /shared/build/openmpi_gcc/CMAQ_v54+/CCTM/scripts/

tail run_cctm5.4+_Bench_2018_12US1_cb6r5_ae6_20200131_MYR.64.8x8pe.2day.20171222start.2x32.hpc7g.8xlarge.log

Output:

==================================
  ***** CMAQ TIMING REPORT *****
==================================
Start Day: 2017-12-22
End Day:   2017-12-23
Number of Simulation Days: 2
Domain Name:               12US1
Number of Grid Cells:      4803435  (ROW x COL x LAY)
Number of Layers:          35
Number of Processes:       64
   All times are in seconds.

Num  Day        Wall Time
01   2017-12-22   3122.1
02   2017-12-23   3419.1
     Total Time = 6541.20
      Avg. Time = 3270.60


Based on the Total Time, adding a second node gave a speedup of 13134.60 / 6541.20 = 2.008, essentially the ideal 2x scaling.
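
This calculation can be reproduced directly from the log files; a minimal sketch that extracts the Total Time from each timing report and divides:

# pull the Total Time (seconds) out of each log's timing report
t32=$(grep 'Total Time' run_cctm5.4+_Bench_2018_12US1_cb6r5_ae6_20200131_MYR.32.4x8pe.2day.20171222start.1x32.hpc7g.8xlarge.log | awk -F= '{print $2}')
t64=$(grep 'Total Time' run_cctm5.4+_Bench_2018_12US1_cb6r5_ae6_20200131_MYR.64.8x8pe.2day.20171222start.2x32.hpc7g.8xlarge.log | awk -F= '{print $2}')
# divide the single-node time by the two-node time to get the speedup
awk -v a="$t32" -v b="$t64" 'BEGIN { printf "speedup: %.3f\n", a/b }'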

Submit a job to run on 96 pes (3 nodes x 32 pes) on hpc7g.8xlarge#

sbatch run_cctm_2018_12US1_v54_cb6r5_ae6.20171222.3x32.ncclassic.c7g.8xlarge.csh

Verify that it is running on 3 nodes#

squeue -u ubuntu

Output:

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                 7    queue1     CMAQ   ubuntu  R      59:47      3 queue1-dy-compute-resource-2-[5-7]

Check the log for how quickly the job is running#

grep 'Processing completed' run_cctm5.4+_Bench_2018_12US1_cb6r5_ae6_20200131_MYR.96.12x8pe.2day.20171222start.3x32.hpc7g.8xlarge.log

Output:

            Processing completed...       5.6952 seconds
            Processing completed...       8.3384 seconds
            Processing completed...       8.2416 seconds
            Processing completed...       5.7230 seconds
            Processing completed...       5.6911 seconds
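
Each "Processing completed" line reports the wall time for one model time step, so averaging them gives a rough measure of throughput while the run is still in progress; a sketch (field 3 holds the seconds value):

# average the per-step wall times reported so far
grep 'Processing completed' run_cctm5.4+_Bench_2018_12US1_cb6r5_ae6_20200131_MYR.96.12x8pe.2day.20171222start.3x32.hpc7g.8xlarge.log | \
  awk '{ sum += $3; n++ } END { printf "average step: %.4f seconds over %d steps\n", sum/n, n }'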

When the job has completed, use tail to view the timing from the log file.#

cd /shared/build/openmpi_gcc/CMAQ_v54+/CCTM/scripts/

tail -n 20 run_cctm5.4+_Bench_2018_12US1_cb6r5_ae6_20200131_MYR.96.12x8pe.2day.20171222start.3x32.hpc7g.8xlarge.log

Output:

==================================
  ***** CMAQ TIMING REPORT *****
==================================
Start Day: 2017-12-22
End Day:   2017-12-23
Number of Simulation Days: 2
Domain Name:               12US1
Number of Grid Cells:      4803435  (ROW x COL x LAY)
Number of Layers:          35
Number of Processes:       96
   All times are in seconds.

Num  Day        Wall Time
01   2017-12-22   2141.9
02   2017-12-23   2384.6
     Total Time = 4526.50
      Avg. Time = 2263.25

Based on the Total Time, running on three nodes gave a speedup of 13134.60 / 4526.50 = 2.902, close to the ideal 3x (a parallel efficiency of 2.902 / 3, or about 97%).
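
As a final cross-check before moving on, you can pull the Total Time line out of every benchmark log at once and compare the three runs side by side:

# print the Total Time from each completed benchmark log, prefixed by filename
grep -H 'Total Time' run_cctm5.4+_Bench_2018_12US1_cb6r5_ae6_20200131_MYR.*.log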

Once you have submitted a few benchmark runs and they have completed successfully, proceed to the next chapter.