Install Input Data on ParallelCluster#

Verify that the AWS CLI is available to obtain data from the AWS S3 Bucket#

Check whether the AWS command line interface (CLI) is installed:

which aws
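In a script, the same check can be made with the shell built-in command -v, which returns a non-zero status when the command is missing. A minimal sketch (the helper name have_cmd is hypothetical and not part of the tutorial scripts):

```shell
# have_cmd is a hypothetical helper: succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd aws; then
    echo "aws found at: $(command -v aws)"
else
    echo "aws not found; install it before continuing"
fi
```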

If it is installed, skip to the next step.

If it is not available, follow the instructions below to install it.

See also

https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html

cd /shared

curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"

unzip awscliv2.zip

sudo ./aws/install

Verify you can run the aws command#

aws --help

If the command is not found, you may need to log out and log back in.

Note

If you do not have credentials, skip the aws configure step below. The data is in a public bucket, so credentials are not required.

Set up your credentials for the S3 copy (skip this step if you do not have credentials):

aws configure
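Because the bucket is public, the AWS CLI can also read it without any configured credentials by passing the --no-sign-request option, which sends unsigned requests. A sketch, assuming the bucket path shown in the YAML example later on this page; the helper name list_public is hypothetical:

```shell
# Path taken from the ImportPath in the YAML example on this page.
BUCKET_URL="s3://cmas-cmaq-conus2-benchmark/data/CMAQ_Modeling_Platform_2016/CONUS"

# list_public is a hypothetical helper: list an S3 prefix anonymously.
list_public() {
    aws s3 ls --no-sign-request "$1/"
}

# Uncomment to run on the cluster (requires network access):
# list_public "$BUCKET_URL"
```

If the listing succeeds, the copy can proceed without running aws configure.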

Copy Input Data from S3 Bucket to lustre filesystem#

Verify that the /fsx directory exists; this is a Lustre file system, where I/O is fastest:

ls /fsx

If you are unable to use the Lustre file system, the data can be installed on the /shared volume instead, provided you have resized the volume to be large enough to store the input and output data.
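The choice between the two locations can be scripted. A minimal sketch, assuming /fsx and /shared are the only candidates; the helper name pick_data_root is hypothetical:

```shell
# pick_data_root is a hypothetical helper: print the first argument that is
# an existing, writable directory; fail if none qualifies.
pick_data_root() {
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

# Prefer the Lustre file system, fall back to the shared volume.
if DATA_ROOT=$(pick_data_root /fsx /shared); then
    echo "Using $DATA_ROOT"
    df -h "$DATA_ROOT"    # confirm there is room for input and output data
else
    echo "Neither /fsx nor /shared is available"
fi
```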

Install the parallel cluster scripts using the commands:

cd /shared

git clone -b main https://github.com/CMASCenter/pcluster-cmaq.git pcluster-cmaq

Use the S3 script to copy the CONUS input data from the CMAS s3 bucket#

Data will be saved to the /fsx file system

/shared/pcluster-cmaq/s3_scripts/s3_copy_nosign_conus_cmas_opendata_to_fsx.csh

Check that the resulting directory structure matches the paths used in the run script.

Note

The CONUS 12US2 input data requires 44 GB of disk space. If you use the YAML file to import the data to the Lustre file system rather than copying it, you save this space.

cd /fsx/data/CMAQ_Modeling_Platform_2016/CONUS/12US2/

du -sh

output:

44G     .
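The same check can be done in a script by comparing the measured size against the expected value. A rough sketch using the POSIX du -sk form, which reports kibibytes; the helper name dir_size_kib and the 40 GiB threshold are illustrative assumptions rather than values from the tutorial scripts:

```shell
# dir_size_kib is a hypothetical helper: print a directory's disk usage in KiB.
dir_size_kib() {
    du -sk "$1" | awk '{print $1}'
}

DATA_DIR=/fsx/data/CMAQ_Modeling_Platform_2016/CONUS/12US2
# Illustrative threshold: 40 GiB, just under the 44G reported above.
MIN_KIB=$((40 * 1024 * 1024))

if [ -d "$DATA_DIR" ]; then
    size=$(dir_size_kib "$DATA_DIR")
    if [ "$size" -ge "$MIN_KIB" ]; then
        echo "OK: $DATA_DIR holds $size KiB"
    else
        echo "Incomplete copy? $DATA_DIR holds only $size KiB"
    fi
else
    echo "$DATA_DIR does not exist"
fi
```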

The CMAQ ParallelCluster is configured with 1.2 terabytes of space on the /fsx file system (the minimum size allowed for Lustre /fsx), so that output from multiple runs can be stored.

For ParallelCluster: Import the Input data from a public S3 Bucket#

A second method imports the data onto the Lustre file system by specifying the S3 bucket location in the YAML file, rather than using the aws s3 copy commands above.

See also

Example available in c5n-18xlarge.ebs_shared.fsx_import.yaml

cd /shared/pcluster-cmaq/
vi c5n-18xlarge.ebs_shared.fsx_import.yaml   

Section of the YAML file that specifies the name of the S3 bucket:

  - MountDir: /fsx
    Name: name2
    StorageType: FsxLustre
    FsxLustreSettings:
      StorageCapacity: 1200
      ImportPath: s3://cmas-cmaq-conus2-benchmark/data/CMAQ_Modeling_Platform_2016/CONUS    # specify the name of the S3 bucket

This requires that the specified S3 bucket is publicly available.
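Before creating the cluster, it can help to confirm which bucket path the YAML file points at. A small sketch that pulls the ImportPath value out of the file with awk; the helper name import_path_from_yaml is hypothetical, and the parsing assumes the key and its value sit on one line, as in the excerpt above:

```shell
# import_path_from_yaml is a hypothetical helper: print the value of the first
# ImportPath key in a YAML file (the second whitespace-separated field).
import_path_from_yaml() {
    awk '$1 == "ImportPath:" {print $2; exit}' "$1"
}

YAML=/shared/pcluster-cmaq/c5n-18xlarge.ebs_shared.fsx_import.yaml
if [ -f "$YAML" ]; then
    import_path_from_yaml "$YAML"
else
    echo "$YAML not found"
fi
```

The printed path can then be passed to aws s3 ls with the --no-sign-request option; if the listing succeeds without credentials, the bucket is publicly readable.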