2.2.1. Learn how to use the AWS CLI to launch a c6a.2xlarge EC2 instance using a public AMI#

The public AMI contains the software and data to run the 2016_12SE1 benchmark using CMAQv5.3.3#

The software was pre-installed and saved to a public AMI.

The input data was also transferred from the AWS Open Data Program and installed on the EBS volume.

This chapter describes the process used to test and configure a c6a.2xlarge EC2 instance to run CMAQv5.3.3 for the 2016_12SE1 domain.

Todo: Need to create command-line options to copy a public AMI to a different region.
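One possible way to address this (a sketch, not tested as part of this workflow) is the aws ec2 copy-image command, which copies an AMI that you can access into another region; the destination region and AMI name below are placeholders:

aws ec2 copy-image --source-region us-east-1 --source-image-id ami-065049c5c78e6c6a5 --region us-west-2 --name "cmaqv5.4_c6a.48xlarge.io2.iops.100000-copy"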

Verify that you can see the public AMI in the us-east-1 region.#

aws ec2 describe-images --region us-east-1 --image-id ami-065049c5c78e6c6a5

Output:

{
    "Images": [
        {
            "Architecture": "x86_64",
            "CreationDate": "2023-06-24T00:17:02.000Z",
            "ImageId": "ami-065049c5c78e6c6a5",
            "ImageLocation": "440858712842/cmaqv5.4_c6a.48xlarge.io2.iops.100000",
            "ImageType": "machine",
            "Public": true,
            "OwnerId": "440858712842",
            "PlatformDetails": "Linux/UNIX",
            "UsageOperation": "RunInstances",
            "State": "available",
            "BlockDeviceMappings": [
                {
                    "DeviceName": "/dev/sda1",
                    "Ebs": {
                        "DeleteOnTermination": true,
                        "Iops": 100000,
                        "SnapshotId": "snap-08b8608dca836ef2e",
                        "VolumeSize": 500,
                        "VolumeType": "io2",
                        "Encrypted": false
                    }
                },
                {
                    "DeviceName": "/dev/sdb",
                    "VirtualName": "ephemeral0"
                },
                {
                    "DeviceName": "/dev/sdc",
                    "VirtualName": "ephemeral1"
                }
            ],
            "EnaSupport": true,
            "Hypervisor": "xen",
            "Name": "cmaqv5.4_c6a.48xlarge.io2.iops.100000",
            "RootDeviceName": "/dev/sda1",
            "RootDeviceType": "ebs",
            "SriovNetSupport": "simple",
            "VirtualizationType": "hvm",
            "DeprecationTime": "2025-06-24T00:17:02.000Z"
        }
    ]
}

Press q to exit the pager that displays the output.

Note, the AMI's root volume uses the maximum IOPS value available for io2 volumes: 100000.
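If the full 100000 IOPS are not needed for a smaller instance such as the c6a.2xlarge, the AMI's root-volume settings can be overridden at launch time by adding the --block-device-mappings option to the run-instances commands shown later in this chapter. The gp3 volume type below is an illustrative assumption, not a tested configuration:

--block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeType":"gp3","VolumeSize":500}}]'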

AWS resources for the AWS CLI method to launch EC2 instances.#

aws cli examples

aws cli run instances command

Tutorial: Launch Spot Instances

(note, this tutorial discourages the use of run-instances for launching Spot Instances, but it does provide an example method)

Launching EC2 Spot Instances using Run Instances API

Additional resources for spot instance provisioning.

Spot Instance Requests

To launch a Spot Instance with the RunInstances API, create a configuration file as shown below:

cat <<EoF > ./runinstances-config.json
{
    "DryRun": false,
    "MaxCount": 1,
    "MinCount": 1,
    "InstanceType": "c6a.2xlarge",
    "ImageId": "ami-065049c5c78e6c6a5",
    "InstanceMarketOptions": {
        "MarketType": "spot"
    },
    "TagSpecifications": [
        {
            "ResourceType": "instance",
            "Tags": [
                {
                    "Key": "Name",
                    "Value": "EC2SpotCMAQv54"
                }
            ]
        }
    ]
}
EoF
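The Spot request can then be submitted by pointing run-instances at this file. The key name and security group below are placeholders that must be replaced with your own values; this is a sketch following the same pattern as the on-demand command later in this section, shown with --dry-run for safety:

aws ec2 run-instances --key-name your-pem --security-group-ids your-security-group-id --region us-east-1 --dry-run --cli-input-json file://runinstances-config.json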

Use the publicly available AMI to launch an on-demand c6a.2xlarge EC2 instance using an io2 volume with 100000 IOPS and with hyperthreading disabled#

Note, we will be using a JSON file that has been preconfigured to specify the ImageId.

Obtain the code using git#

git clone -b main https://github.com/CMASCenter/pcluster-cmaq

cd pcluster-cmaq/json

Note, you will need to obtain a security group ID from your IT administrator that allows ssh login access. If ssh access is enabled by default, then you can remove the --security-group-ids launch-wizard-with-tcp-access option.

Example command: note that launch-wizard-with-tcp-access needs to be replaced by your security group ID, and your-pem needs to be replaced by the name of your .pem key.

aws ec2 run-instances --debug --key-name your-pem --security-group-ids launch-wizard-with-tcp-access --dry-run --region us-east-1 --cli-input-json file://runinstances-config.json

Command that works for UNC’s security group and pem key:

aws ec2 run-instances --debug --key-name cmaqv5.4 --security-group-ids launch-wizard-179 --region us-east-1 --dry-run --ebs-optimized --cpu-options CoreCount=4,ThreadsPerCore=1 --cli-input-json file://runinstances-config.io2.c6a.2xlarge.json

Once you have verified that the command above works with the --dry-run option, rerun it without that option, as follows.

aws ec2 run-instances --debug --key-name cmaqv5.4 --security-group-ids launch-wizard-179 --region us-east-1 --ebs-optimized --cpu-options CoreCount=4,ThreadsPerCore=1 --cli-input-json file://runinstances-config.io2.c6a.2xlarge.json

Example of the security group inbound and outbound rules required to connect to the EC2 instance via ssh.

Inbound Rule

Outbound Rule

Additional resources

CLI commands to create Security Group
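As a minimal sketch of those commands (the group name, VPC ID, security group ID, and CIDR range below are placeholders that must be replaced with your own values), a security group that allows inbound ssh could be created with:

aws ec2 create-security-group --region us-east-1 --group-name cmaq-ssh-access --description "ssh access for CMAQ EC2 instance" --vpc-id vpc-xxxx

aws ec2 authorize-security-group-ingress --region us-east-1 --group-id sg-xxxx --protocol tcp --port 22 --cidr your.ip.address/32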

Use the following command to obtain the public IP address of the machine.#

Note, run this command only after the instance has been created; it is included here for documentation purposes.

aws ec2 describe-instances --region=us-east-1 --filters "Name=image-id,Values=ami-065049c5c78e6c6a5" | grep PublicIpAddress
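An alternative that avoids grep is the --query option with a JMESPath expression (this variant is a sketch, not part of the original instructions):

aws ec2 describe-instances --region=us-east-1 --filters "Name=image-id,Values=ami-065049c5c78e6c6a5" --query 'Reservations[].Instances[].PublicIpAddress' --output text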

Log in to the EC2 instance#

Note, the following command must be modified to specify your key and the IP address obtained from the previous command:

ssh -v -Y -i ~/downloads/your-pem.pem ubuntu@ip.address

Log in to the EC2 instance again, so that you have two windows logged into the machine.#

ssh -Y -i ~/downloads/your-pem.pem ubuntu@your-ip-address

Load the environment modules#

module avail

module load ioapi-3.2/gcc-11.3.0-netcdf  mpi/openmpi-4.1.2  netcdf-4.8.1/gcc-11.3
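Confirm that the modules were loaded:

module list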

Update the pcluster-cmaq repo using git#

cd /shared/pcluster-cmaq

git pull

Run CMAQv5.4 for the 12US1 Listos Training 3-day benchmark case on 4 PEs#

Input data is available for a subdomain of the 12km 12US1 case.

GRIDDESC

'2018_12Listos'
'LamCon_40N_97W'   1812000.000    240000.000     12000.000     12000.000   25   25    1
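With 25 columns, 25 rows, and 35 vertical layers, this subdomain contains 25 x 25 x 35 = 21,875 grid cells, which matches the grid cell count reported in the CMAQ timing report below.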

Use the command line to submit the job; this single virtual machine does not have a job scheduler such as Slurm installed.#

cd /shared/build/openmpi_gcc/CMAQ_v54+/CCTM/scripts
./run_cctm_2018_12US1_listos.csh | & tee ./run_cctm_2018_12US1_listos.c6a.2xlarge.log

Use HTOP to view performance.#

htop

Screenshot of htop output

Successful output#

==================================
  ***** CMAQ TIMING REPORT *****
==================================
Start Day: 2018-08-05
End Day:   2018-08-07
Number of Simulation Days: 3
Domain Name:               2018_12Listos
Number of Grid Cells:      21875  (ROW x COL x LAY)
Number of Layers:          35
Number of Processes:       4
   All times are in seconds.

Num  Day        Wall Time
01   2018-08-05   166.7
02   2018-08-06   167.0
03   2018-08-07   171.3
     Total Time = 505.00
      Avg. Time = 168.33

Note, this run took longer than the run done on the c6a.48xlarge, where 32 cores were used. The c6a.2xlarge also has smaller cache sizes than the c6a.48xlarge, which you can see by comparing the output of the lscpu command.

Change to the scripts directory#

cd /shared/build/openmpi_gcc/CMAQ_v54+/CCTM/scripts/

Use lscpu to confirm that there are 4 cores on the c6a.2xlarge EC2 instance that was created with hyperthreading turned off.#

lscpu

Output:

lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         48 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0-3
Vendor ID:               AuthenticAMD
  Model name:            AMD EPYC 7R13 Processor
    CPU family:          25
    Model:               1
    Thread(s) per core:  1
    Core(s) per socket:  4
    Socket(s):           1
    Stepping:            1
    BogoMIPS:            5299.98
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdt
                         scp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x
                         2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext invpcid_s
                         ingle ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clze
                         ro xsaveerptr rdpru wbnoinvd arat npt nrip_save vaes vpclmulqdq rdpid
Virtualization features: 
  Hypervisor vendor:     KVM
  Virtualization type:   full
Caches (sum of all):     
  L1d:                   128 KiB (4 instances)
  L1i:                   128 KiB (4 instances)
  L2:                    2 MiB (4 instances)
  L3:                    16 MiB (1 instance)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-3
Vulnerabilities:         
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, IBPB conditional, IBRS_FW, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected

Edit the 12NE3 benchmark run script to use the gcc compiler and to output all species to the CONC output file.#

cd /shared/build/openmpi_gcc/CMAQ_v54+/CCTM/scripts/

vi run_cctm_Bench_2018_12NE3.c6a48xlarge.csh

change

   setenv compiler intel

to

   setenv compiler gcc

Comment out the CONC_SPCS setting that limits the CONC output to only 12 species

   # setenv CONC_SPCS "O3 NO ANO3I ANO3J NO2 FORM ISOP NH3 ANH4I ANH4J ASO4I ASO4J" 

Change the NPCOL, NPROW to run on 4 cores

   @ NPCOL  =  2; @ NPROW =  2
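Note, the run step below uses a script named run_cctm_Bench_2018_12NE3.c6a.2xlarge.csh. If that script does not already exist in the scripts directory, one option (an assumption, not part of the original instructions) is to save the edited script under that name:

cp run_cctm_Bench_2018_12NE3.c6a48xlarge.csh run_cctm_Bench_2018_12NE3.c6a.2xlarge.csh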

Run the 12NE3 benchmark case#

./run_cctm_Bench_2018_12NE3.c6a.2xlarge.csh |& tee ./run_cctm_Bench_2018_12NE3.c6a.2xlarge.4pe.log

Use HTOP to view performance.#

htop

Screenshot of htop output

Note, this 12NE3 domain uses more memory and takes longer to run than the 12LISTOS-Training domain. It also takes longer to run using 4 cores on the c6a.2xlarge instance than using 32 cores on the c6a.48xlarge instance.

Successful output: 12-species output in the 3-D CONC file took 56 minutes to run 1 day#

==================================
  ***** CMAQ TIMING REPORT *****
==================================
Start Day: 2018-07-01
End Day:   2018-07-01
Number of Simulation Days: 1
Domain Name:               2018_12NE3
Number of Grid Cells:      367500  (ROW x COL x LAY)
Number of Layers:          35
Number of Processes:       4
   All times are in seconds.

Num  Day        Wall Time
01   2018-07-01   3410.99
     Total Time = 3410.99
      Avg. Time = 3410.99

Compared to the timing for running on 32 processors, which took 444.34 seconds, this is a factor of 7.67 (3410.99 / 444.34), close to the ideal speedup of 8x from using 8x as many cores.

Find the InstanceID using the following command on your local machine.#

aws ec2 describe-instances --region=us-east-1 | grep InstanceId

Output

i-xxxx
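If there are several instances in the region, the instance launched from the Spot configuration above can also be identified by its Name tag (a sketch using the --query option; the tag value matches the TagSpecifications shown earlier):

aws ec2 describe-instances --region=us-east-1 --filters "Name=tag:Name,Values=EC2SpotCMAQv54" --query 'Reservations[].Instances[].InstanceId' --output text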

Stop the instance#

aws ec2 stop-instances --region=us-east-1 --instance-ids i-xxxx

Note, if the instance was launched as a Spot Instance with a one-time request, you will get the following error message:

aws ec2 stop-instances --region=us-east-1 --instance-ids i-041a702cc9f7f7b5d

An error occurred (UnsupportedOperation) when calling the StopInstances operation: You can’t stop the Spot Instance ‘i-041a702cc9f7f7b5d’ because it is associated with a one-time Spot Instance request. You can only stop Spot Instances associated with persistent Spot Instance requests.

Not sure how to do a persistent Spot Instance request.
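Based on the EC2 InstanceMarketOptions parameters, one untested way to make the request persistent would be to extend the InstanceMarketOptions block in runinstances-config.json before launching, for example:

    "InstanceMarketOptions": {
        "MarketType": "spot",
        "SpotOptions": {
            "SpotInstanceType": "persistent",
            "InstanceInterruptionBehavior": "stop"
        }
    }

A persistent request with stop interruption behavior would allow the resulting Spot Instance to be stopped and restarted rather than only terminated.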

Terminate Instance#

aws ec2 terminate-instances --region=us-east-1 --instance-ids i-xxxx

Verify that the instance is being shut down.#

aws ec2 describe-instances --region=us-east-1
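To check just the instance state rather than the full description (a sketch using the --query option; replace i-xxxx with your InstanceId):

aws ec2 describe-instances --region=us-east-1 --instance-ids i-xxxx --query 'Reservations[].Instances[].State.Name' --output text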