CCNI GPFS

Best Practices for Using the CCNI File System

AKA: WHERE THE HECK DID MY FILES GO?!


ALERT: Some directories are frequently purged! Use /data at your own risk!


The new file system at the CCNI was very carefully designed, so be sure to understand how it is set up. The following reiterates what is said on the CCNI Wiki, then lists some best practices you can use to make your life easier.

The File System

The GPFS (IBM’s General Parallel File System) implemented at the CCNI is split into two portions: large and small. The large side uses 4 MB blocks, while the small side uses 128 KB blocks.
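
If you are ever unsure which side of the file system a directory lives on, the block size gives it away. A minimal check, assuming GNU coreutils (the exact values reported depend on how GPFS presents them):

# %S prints the fundamental block size of the backing file system:
# expect 4194304 (4 MB) under /gpfs/lb and 131072 (128 KB) under /gpfs/sb
stat -f --format="%S  %n" /gpfs/sb/home /gpfs/lb/data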

Hierarchy

- /gpfs
	- /sb
		- /home
		- /data
	- /lb
		- /data
		- /provisioned

Within each of these folders is a PROJ directory for your group's project, consisting of the following sub-directories:

- /PROJ
	- /shared
	- /PROJuser

This means your personal ~ home directory is at /gpfs/sb/home/PROJ/PROJuser/.

Home

These directories are shared by the group, and replicated to ensure data reliability. This portion of the file system has an effective quota of 5 GB per project.
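
To see how close the group is to the 5 GB limit, du works anywhere; if the GPFS user tools are on your path, mmlsquota reports what the file system itself will enforce (the exact invocation depends on how quotas are configured on this system):

# Rough usage check (substitute your project's name for PROJ)
du -sh /gpfs/sb/home/PROJ/

# Ask GPFS directly; flags may differ on this system
mmlsquota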

Data

Please Note: These directories are for temporary storage. Files not touched in at least 14 days are purged regularly.

These directories are intended to be staging areas for performing computation. They exist on both the small and large block partitions of the system, for flexibility and optimum usage. Again, they are shared by the group and have a much larger quota.
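
Because the purge keys off when files were last touched, find can warn you about files nearing the 14-day limit. A sketch using access time (substitute your own PROJ and PROJuser; the purge may also consider modification time):

# List files in the large-block data area not accessed in more than 10 days,
# i.e. likely victims of the next purge
find /gpfs/lb/data/PROJ/PROJuser -type f -atime +10 -print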

Provisioned

This directory exists for long-term storage of unimportant data (anything you could regenerate or recover from elsewhere). It is not replicated or backed up, but your data should persist. This only exists on the large portion of the file system and has a quota of 5 GB per project.

Best Practices

First and foremost, remove all of your experimental results and code from /data now.
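
A minimal sketch of that cleanup (the results/ directory name is just an example): rsync anything worth keeping into the replicated home area, then clear the staging copy.

# Pull results into the replicated home area...
rsync -av /gpfs/lb/data/PROJ/PROJuser/results/ /gpfs/sb/home/PROJ/PROJuser/results/

# ...then remove the staging copy before the purge does
rm -rf /gpfs/lb/data/PROJ/PROJuser/results/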

Depending on your project, you will have several files to organize. The following uses a ROSS project (part of the RSNT CCNI group) for reference. It also assumes that some type of version control is being used.

Source

/gpfs/lb/provisioned/RSNT/RSNTgons/rossnet/

This directory holds the source code for all of my experiments. The rossnet folder is managed through version control. This ensures that if a cataclysmic event occurs and the CCNI loses this data, a recent version is backed up somewhere else.
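
Given the trunk/ layout, this was presumably checked out from Subversion. A sketch of the initial checkout, with a placeholder repository URL:

cd /gpfs/lb/provisioned/RSNT/RSNTgons/
mkdir -p rossnet
svn checkout https://svn.example.org/rossnet/trunk rossnet/trunk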

Build

/gpfs/lb/data/RSNT/RSNTgons/rossnet-build/

This directory is where my source code is built. The executable lives here, and not much else. If I go on vacation for two weeks, I expect everything in these directories to disappear.

Test

/gpfs/sb/home/RSNT/RSNTgons/ROSS-tests/

This is within my home directory. All of my experimental results and run scripts are kept here. These are the core important files that, if destroyed, would delay my PhD by years.
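
Since even home is only replicated within the CCNI, it is worth pushing periodic snapshots somewhere else entirely. A sketch, with a placeholder backup host:

# Snapshot the test directory, stamped with today's date
cd /gpfs/sb/home/RSNT/RSNTgons/
tar czf ROSS-tests-$(date +%F).tar.gz ROSS-tests/

# Copy it off-site (backup.example.org is a placeholder)
scp ROSS-tests-$(date +%F).tar.gz user@backup.example.org:backups/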

Sample Scripts

build.sh: Interactively build the project and perform a run:

#!/bin/bash

# Parameter checking / help
if [ $# -eq 1 ]; then
    if [ "$1" == "-h" ]; then
        echo "Usage: `basename $0` [order]"
        exit 65
    fi
fi

# Check build environment
if [ "$ARCH" = "" ]; then
    module load xl
    export ARCH=bgq
    export CC=mpixlc
fi

# Configure Build Directory
if [ $# -eq 1 ]; then
    ORDER=$1
else
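    # No order given: derive a build number from the current queue length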
    ORDER=`squeue | wc -l`
fi

BUILD=/gpfs/lb/data/RSNT/RSNTgons

# Create the build directory if it does not already exist
mkdir -p $BUILD/rossnet-build-$ORDER

cd $BUILD/rossnet-build-$ORDER/

# gather run settings
echo -n "NP = "
read NP

echo -n "X = "
read X

echo -n "Y = "
read Y

echo "cmake -Dnp=$NP -Dx=$X -Dy=$Y rossnet/trunk"

# cmake
cmake28 -Dnp=$NP -Dx=$X -Dy=$Y -DROSS_BUILD_MODELS=ON /gpfs/lb/provisioned/RSNT/RSNTgons/rossnet/trunk

# make
cd ross/models/gates/
make -j 12

# configure the run
cd /gpfs/sb/home/RSNT/RSNTgons/ROSS-tests

echo -n "nodes = "
read NODES

echo -n "synch = "
read SYNCH

# Fill in the template; note the | delimiter for @BUILD@, since $BUILD contains slashes
sed -e "s/@NP@/$NP/g" \
    -e "s|@BUILD@|$BUILD|g" \
    -e "s/@ORDER@/$ORDER/g" \
    -e "s/@SYNCH@/$SYNCH/g" \
    -e "s/@NODES@/$NODES/g" \
    srun.sh.in > $BUILD/rossnet-build-$ORDER/srun-$ORDER.sh

chmod u+x $BUILD/rossnet-build-$ORDER/srun-$ORDER.sh

# get the allocation and run
salloc --nodes $NODES --bell $BUILD/rossnet-build-$ORDER/srun-$ORDER.sh
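
Run as ./build.sh 3, the script builds in rossnet-build-3 and generates srun-3.sh; with no argument it derives the build number from the current length of the SLURM queue, presumably so successive builds land in distinct directories.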

srun.sh.in: run script template:

#!/bin/bash

echo "testing: $SLURM_JOB_ID"

srun --ntasks @NP@ \
    @BUILD@/rossnet-build-@ORDER@/ross/models/gates/gates \
    --batch=2 --gvt-interval=1024 --source_interval=2 --sink_interval=100 \
    --synch=@SYNCH@ --end=3000 \
    > srun-$SLURM_JOB_ID-n@NODES@.@NP@-s@SYNCH@.out
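
For reference, here is what a generated script might look like after the substitutions, with hypothetical values (NP=32, ORDER=3, SYNCH=5, NODES=2):

#!/bin/bash

echo "testing: $SLURM_JOB_ID"

srun --ntasks 32 \
    /gpfs/lb/data/RSNT/RSNTgons/rossnet-build-3/ross/models/gates/gates \
    --batch=2 --gvt-interval=1024 --source_interval=2 --sink_interval=100 \
    --synch=5 --end=3000 \
    > srun-$SLURM_JOB_ID-n2.32-s5.out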