Introduction
These are general instructions for running the BIEN range map computations on Longhorn, TACC's visualization cluster. Longhorn is needed primarily because the maxent/R package requires the X11 libraries for its computations.
Resources
General Instructions
- (optional) place scripts into the $HOME/bien directory
- place data into $SCRATCH/bien directory
- generate the paramlist file(s) for the run (see genparamlist.pl and the example paramlist files)
- (optional) if necessary, create or modify the launcher submit script (see the launcher1.sge file as an example)
- load the launcher module and submit the job:
module load launcher
qsub launcher1.sge
- After the job is submitted with qsub, the output will be sent to the file named by the -o setup parameter in the launcher1.sge script.
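Before submitting, it can help to confirm that the paramlist exists and actually contains tasks. The following is a minimal sketch of such a check; check_paramlist is a hypothetical helper, not one of the original scripts.

```shell
#!/bin/sh
# check_paramlist (hypothetical helper): print the number of non-blank task
# lines in a paramlist; return 1 if the file is missing or has no tasks.
check_paramlist() {
    paramlist=$1
    if [ ! -f "$paramlist" ]; then
        echo "ERROR: paramlist '$paramlist' not found" >&2
        return 1
    fi
    # count lines that contain at least one non-whitespace character
    ntasks=$(grep -c '[^[:space:]]' "$paramlist" || true)
    if [ "$ntasks" -eq 0 ]; then
        echo "ERROR: paramlist '$paramlist' has no tasks" >&2
        return 1
    fi
    echo "$ntasks"
}

# Possible use on Longhorn:
#   check_paramlist paramlist1 && module load launcher && qsub launcher1.sge
```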
Other Instructions
- Even if a particular species computation fails, there will be artifacts left from the computation. You can use the delete_bad.sh script to remove the bad runs (see below)
- For the first bien computations, the data was so large that I submitted a parametric job to compress the files (see the paramlist-tar file for an example)
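One possible way to build the bad.txt list used by delete_bad.sh is to scan the logs named in a paramlist. The sketch below assumes that each paramlist line has the form "script input.csv >& logfile" and that a failed run leaves the text "Error" in its log; both the helper name list_bad_inputs and the "Error" marker are assumptions, so adjust them to what your logs actually contain.

```shell
#!/bin/sh
# list_bad_inputs (hypothetical helper): print the input .csv of every task
# whose log is missing or contains the text "Error" (an assumed failure marker).
list_bad_inputs() {
    paramlist=$1
    # fields: <script> <input.csv> >& <logfile>
    while read -r script input redirect logfile; do
        # skip blank or malformed lines
        [ -n "$logfile" ] || continue
        if [ ! -f "$logfile" ] || grep -q 'Error' "$logfile"; then
            echo "$input"
        fi
    done < "$paramlist"
}

# Possible use:
#   list_bad_inputs paramlist1 > bad.txt
```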
Example scripts
The following section contains the scripts used for the first bien computation.
script bien.sh
This script was used to execute the bien R scripts
#!/bin/sh
###
# this turns on debugging
###
DEBUG=1
###
#
# This is the command to use to execute R scripts, if you'd rather
# use R directly, just replace the values
#
###
RSCRIPT=/scratch/01267/edwin/local/bin/Rscript
###
#
# These are the names of the R scripts, relative (to the caller) or absolute path should be fine
#
###
RANGE_AREAS=/scratch/01267/edwin/bien/ComputeRangeAreas.r
MAXENT1=/scratch/01267/edwin/bien/CreateSDM_Bioclim.r
MAXENT2=/scratch/01267/edwin/bien/CreateSDM_BioclimSpatial.r
MAXENT3=/scratch/01267/edwin/bien/CreateSDM_Spatial.r
#environment
export JAVA_HOME=/scratch/01267/edwin/local/jdk1.6.0_27
export TMP=/scratch/01267/edwin/bien
export TEMP=/scratch/01267/edwin/bien
if [ $# != 1 ]; then
echo ""
echo " usage: $0 <input-file>"
echo ""
exit 1
fi
INPUTFILE=$1
# this is a function for debug printing
debug_print() {
if [ $DEBUG = 1 ]; then
echo "debug: $1"
fi
}
# validating each of the R scripts
debug_print "validating each of the R scripts"
if [ ! -f $RANGE_AREAS ]; then
echo "ERROR: R script '$RANGE_AREAS' does not exist or is not valid... aborting"
exit 1
fi
if [ ! -f $MAXENT1 ]; then
echo "ERROR: R script '$MAXENT1' does not exist or is not valid... aborting"
exit 1
fi
if [ ! -f $MAXENT2 ]; then
echo "ERROR: R script '$MAXENT2' does not exist or is not valid... aborting"
exit 1
fi
if [ ! -f $MAXENT3 ]; then
echo "ERROR: R script '$MAXENT3' does not exist or is not valid... aborting"
exit 1
fi
# AGG is not set in this script; only validate it when it is defined
if [ -n "$AGG" ] && [ ! -f "$AGG" ]; then
echo "ERROR: R script '$AGG' does not exist or is not valid... aborting"
exit 1
fi
debug_print " ...scripts are good"
# run each R script on the input file
debug_print " processing file $INPUTFILE"
debug_print " range areas"
$RSCRIPT $RANGE_AREAS $INPUTFILE
debug_print " maxent1"
$RSCRIPT $MAXENT1 $INPUTFILE
debug_print " maxent2"
$RSCRIPT $MAXENT2 $INPUTFILE
debug_print " maxent3"
$RSCRIPT $MAXENT3 $INPUTFILE
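The repeated existence checks in bien.sh could be folded into a single loop. This is only a sketch; validate_scripts is a hypothetical helper name, and the error message is copied from the script above.

```shell
#!/bin/sh
# validate_scripts (hypothetical helper): return 1 at the first argument
# that is not a regular file, printing the same error text as bien.sh.
validate_scripts() {
    for script in "$@"; do
        if [ ! -f "$script" ]; then
            echo "ERROR: R script '$script' does not exist or is not valid... aborting" >&2
            return 1
        fi
    done
    return 0
}

# In bien.sh this could stand in for the individual checks:
#   validate_scripts "$RANGE_AREAS" "$MAXENT1" "$MAXENT2" "$MAXENT3" || exit 1
```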
script genparamlist.pl
This script was used to generate an example paramlist that called the bien.sh script
#!/bin/perl
use File::Basename;
## These are the parameters to initialize
$INPUTDIR = "/ranger/work/01267/edwin/bien/RERUN";
$BIEN_SCRIPT = "/scratch/01267/edwin/bien/bien.sh";
$PARAMLIST = "/scratch/01267/edwin/bien/paramlist1";
$LOGDIR = "/scratch/01267/edwin/bien/Output/Logs";
open(FH, ">$PARAMLIST") or die $!;
@files = <$INPUTDIR/*>;
foreach $f (@files) {
$bn = basename ($f, ".csv");
print FH "$BIEN_SCRIPT $f >& $LOGDIR/$bn.out\n";
}
close(FH);
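For readers who prefer shell, the same paramlist could be produced roughly as follows. This is a sketch, not the script that was used: gen_paramlist is a hypothetical name, and unlike the Perl version it only picks up files ending in .csv.

```shell
#!/bin/sh
# gen_paramlist (hypothetical helper): print one launcher task per .csv file.
# Arguments: input directory, path to bien.sh, log directory.
gen_paramlist() {
    inputdir=$1
    bien_script=$2
    logdir=$3
    for f in "$inputdir"/*.csv; do
        # an unmatched glob stays literal, so skip anything that is not a file
        [ -f "$f" ] || continue
        bn=$(basename "$f" .csv)
        # same task format as genparamlist.pl: <script> <input> >& <log>
        printf '%s\n' "$bien_script $f >& $logdir/$bn.out"
    done
}

# Possible use, with the paths from genparamlist.pl:
#   gen_paramlist /ranger/work/01267/edwin/bien/RERUN \
#       /scratch/01267/edwin/bien/bien.sh \
#       /scratch/01267/edwin/bien/Output/Logs > /scratch/01267/edwin/bien/paramlist1
```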
file paramlist1
This file, invoked from the launcher submit script, lists all the serial jobs to be executed, one task per line. The example below has been truncated for the purposes of documentation.
/scratch/01267/edwin/bien/bien.sh /ranger/work/01267/edwin/bien/temp/FIXED_UTM/Tetracoccus_fasciculatus_UTM.csv >& /scratch/01267/edwin/bien/Output/Logs/Tetracoccus_fasciculatus_UTM.out
/scratch/01267/edwin/bien/bien.sh /ranger/work/01267/edwin/bien/temp/FIXED_UTM/Tetracoccus_hallii_UTM.csv >& /scratch/01267/edwin/bien/Output/Logs/Tetracoccus_hallii_UTM.out
/scratch/01267/edwin/bien/bien.sh /ranger/work/01267/edwin/bien/temp/FIXED_UTM/Tetradenia_clementiana_UTM.csv >& /scratch/01267/edwin/bien/Output/Logs/Tetradenia_clementiana_UTM.out
/scratch/01267/edwin/bien/bien.sh /ranger/work/01267/edwin/bien/temp/FIXED_UTM/Tetradenia_cordata_UTM.csv >& /scratch/01267/edwin/bien/Output/Logs/Tetradenia_cordata_UTM.out
/scratch/01267/edwin/bien/bien.sh /ranger/work/01267/edwin/bien/temp/FIXED_UTM/Tetradenia_falafa_UTM.csv >& /scratch/01267/edwin/bien/Output/Logs/Tetradenia_falafa_UTM.out
/scratch/01267/edwin/bien/bien.sh /ranger/work/01267/edwin/bien/temp/FIXED_UTM/Tetradenia_fruticosa_UTM.csv >& /scratch/01267/edwin/bien/Output/Logs/Tetradenia_fruticosa_UTM.out
/scratch/01267/edwin/bien/bien.sh /ranger/work/01267/edwin/bien/temp/FIXED_UTM/Tetradenia_goudotii_UTM.csv >& /scratch/01267/edwin/bien/Output/Logs/Tetradenia_goudotii_UTM.out
/scratch/01267/edwin/bien/bien.sh /ranger/work/01267/edwin/bien/temp/FIXED_UTM/Tetradenia_herbacea_UTM.csv >& /scratch/01267/edwin/bien/Output/Logs/Tetradenia_herbacea_UTM.out
/scratch/01267/edwin/bien/bien.sh /ranger/work/01267/edwin/bien/temp/FIXED_UTM/Tetradenia_hildeana_UTM.csv >& /scratch/01267/edwin/bien/Output/Logs/Tetradenia_hildeana_UTM.out
/scratch/01267/edwin/bien/bien.sh /ranger/work/01267/edwin/bien/temp/FIXED_UTM/Tetradenia_isaloensis_UTM.csv >& /scratch/01267/edwin/bien/Output/Logs/Tetradenia_isaloensis_UTM.out
/scratch/01267/edwin/bien/bien.sh /ranger/work/01267/edwin/bien/temp/FIXED_UTM/Tetradenia_nervosa_UTM.csv >& /scratch/01267/edwin/bien/Output/Logs/Tetradenia_nervosa_UTM.out
/scratch/01267/edwin/bien/bien.sh /ranger/work/01267/edwin/bien/temp/FIXED_UTM/Tetradenia_riparia_UTM.csv >& /scratch/01267/edwin/bien/Output/Logs/Tetradenia_riparia_UTM.out
/scratch/01267/edwin/bien/bien.sh /ranger/work/01267/edwin/bien/temp/FIXED_UTM/Tetradenia_tanganyikae_UTM.csv >& /scratch/01267/edwin/bien/Output/Logs/Tetradenia_tanganyikae_UTM.out
/scratch/01267/edwin/bien/bien.sh /ranger/work/01267/edwin/bien/temp/FIXED_UTM/Tetradymia_axillaris_UTM.csv >& /scratch/01267/edwin/bien/Output/Logs/Tetradymia_axillaris_UTM.out
/scratch/01267/edwin/bien/bien.sh /ranger/work/01267/edwin/bien/temp/FIXED_UTM/Tetradymia_canescens_UTM.csv >& /scratch/01267/edwin/bien/Output/Logs/Tetradymia_canescens_UTM.out
script launcher1.sge
This is the file submitted to the SGE engine. If you have multiple jobs, copy this file and change the reference to paramlist1 to the appropriate paramlist file.
If you would like to change the output location, modify the -o setup parameter in this file.
In this example, I am submitting to the normal queue with a 6-hour time limit, 8 cores per node, and 128 cores total. You could try the long queue (24 hours), but this may affect your priority in the queue, though I'm not 100% certain.
Other setup parameters to change are:
-M the notification email address
-o the output file
-N job name
#!/bin/csh
#
# Simple SGE script for submitting multiple serial
# jobs (e.g. parametric studies) using a script wrapper
# to launch the jobs.
#
# To use, build the launcher executable and your
# serial application(s) and place them in your WORKDIR
# directory. Then, edit the CONTROL_FILE to specify
# each executable per process.
#-------------------------------------------------------
#-------------------------------------------------------
#
# <------ Setup Parameters ------>
#
#$ -M administrator@johndonoghue.net
#$ -m be
#$ -N ParametricBienR
#$ -pe 8way 128
#$ -q normal
#$ -o Parametric.o$JOB_ID
#$ -l h_rt=6:00:00
#$ -V
#$ -cwd
# <------ You MUST Specify a Project String ----->
#$ -P hpc
#------------------------------------------------------
#
# Usage:
# #$ -pe <parallel environment> <number of slots>
# #$ -l h_rt=hours:minutes:seconds to specify run time limit
# #$ -N <job name>
# #$ -q <queue name>
# #$ -o <job output file>
# NOTE: The env variable $JOB_ID contains the job id.
#
module load launcher
setenv EXECUTABLE $TACC_LAUNCHER_DIR/launcher
setenv CONTROL_FILE paramlist1
setenv WORKDIR .
#
# Variable description:
#
# EXECUTABLE = full path to the job launcher executable
# CONTROL_FILE = text input file which specifies
# executable for each process
# (should be located in WORKDIR)
# WORKDIR = location of working directory
#
# <------ End Setup Parameters ------>
#--------------------------------------------------------
#--------------------------------------------------------
#----------------
# Error Checking
#----------------
if ( ! -e $WORKDIR ) then
echo " "
echo "Error: unable to change to working directory."
echo " $WORKDIR"
echo " "
echo "Job not submitted."
exit
endif
if ( ! -f $EXECUTABLE ) then
echo " "
echo "Error: unable to find launcher executable $EXECUTABLE."
echo " "
echo "Job not submitted."
exit
endif
if ( ! -f $WORKDIR/$CONTROL_FILE ) then
echo " "
echo "Error: unable to find input control file $CONTROL_FILE."
echo " "
echo "Job not submitted."
exit
endif
#----------------
# Job Submission
#----------------
cd $WORKDIR/
echo " WORKING DIR: $WORKDIR/"
date
$TACC_LAUNCHER_DIR/paramrun $EXECUTABLE $CONTROL_FILE
date
echo " "
echo " Parametric Job Complete"
echo " "
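As noted in the description of launcher1.sge, running several jobs means copying the submit script and pointing each copy at a different paramlist. A minimal sketch of automating that copy is shown here; clone_launcher is a hypothetical helper, and it assumes the paramlist name contains no characters that are special to sed (such as | or &).

```shell
#!/bin/sh
# clone_launcher (hypothetical helper): copy a launcher submit script and
# rewrite its "setenv CONTROL_FILE" line to use a different paramlist.
clone_launcher() {
    src=$1        # e.g. launcher1.sge
    dest=$2       # e.g. launcher2.sge
    paramlist=$3  # e.g. paramlist2
    sed "s|^setenv CONTROL_FILE .*|setenv CONTROL_FILE $paramlist|" "$src" > "$dest"
}

# Possible use:
#   clone_launcher launcher1.sge launcher2.sge paramlist2
#   qsub launcher2.sge
```

Other setup parameters, such as -N and -o, are left unchanged by this sketch and may still need editing by hand.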
script delete_bad.sh
This script will delete all the bad outputs, assuming there is a file called bad.txt, which contains all the bad species .csv files, one per line. The script also assumes that the outputs have the same base name as the .csv file.
#!/bin/sh
# this script will delete bad runs
set -x
rerun="bad.txt"
batchdir="/scratch/01267/edwin/bien/batch2/Output"
for i in `cat $rerun`
do
bn=`basename $i .csv`
echo $bn
find $batchdir -name "${bn}*" -exec rm {} \;
done
This is an example bad.txt:
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Xerophyta_dasylirioides_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Xylosma_longipedicellata_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Zinnia_acerosa_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Zeyherella_mayumbensis_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Wyethia_mollis_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Zygia_englesingii_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Xerophyta_setosa_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Xanthosoma_granvillei_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Xylosma_tweediana_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Zygodon_peruvianus_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Ziziphus_cotinifolia_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Zinnia_juniperifolia_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Zeugites_munroanus_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Zygogynum_pauciflorum_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Zehneria_thwaitesii_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Zornia_harmsiana_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Xanthostemon_laurinus_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Zanthoxylum_verrucosum_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Yucca_madrensis_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Xylopia_rubescens_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Zygodon_papillatus_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Ziziphus_taylorii_UTM.csv
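Because delete_bad.sh removes files without confirmation, it may be worth listing what would be deleted first. The following dry-run sketch uses the same base-name matching as delete_bad.sh but only prints the matches; list_bad_outputs is a hypothetical helper name.

```shell
#!/bin/sh
# list_bad_outputs (hypothetical helper): dry-run counterpart to delete_bad.sh.
# Prints the output files matching each bad species instead of removing them.
list_bad_outputs() {
    badlist=$1    # file with one bad .csv path per line, e.g. bad.txt
    batchdir=$2   # output directory to search
    while read -r csv; do
        # skip blank lines
        [ -n "$csv" ] || continue
        bn=$(basename "$csv" .csv)
        find "$batchdir" -type f -name "${bn}*" -print
    done < "$badlist"
}

# Possible use:
#   list_bad_outputs bad.txt /scratch/01267/edwin/bien/batch2/Output
```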
file paramlist-tar
This is an example paramlist file used to tar the files. These will take a while.
tar -C /scratch/01267/edwin/bien/batch1/Output -czf /scratch/01267/edwin/bien/batch1/batch1-boundingbox.tgz BoundingBox
tar -C /scratch/01267/edwin/bien/batch1/Output -czf /scratch/01267/edwin/bien/batch1/batch1-convexhull.tgz ConvexHull
tar -C /scratch/01267/edwin/bien/batch1/Output -czf /scratch/01267/edwin/bien/batch1/batch1-latextent.tgz LatExtent
tar -C /scratch/01267/edwin/bien/batch1/Output -czf /scratch/01267/edwin/bien/batch1/batch1-maxent.tgz Maxent
tar -C /scratch/01267/edwin/bien/batch1/Output -czf /scratch/01267/edwin/bien/batch1/batch1-points.tgz Points
tar -C /scratch/01267/edwin/bien/batch2/Output -czf /scratch/01267/edwin/bien/batch2/batch2-boundingbox.tgz BoundingBox
tar -C /scratch/01267/edwin/bien/batch2/Output -czf /scratch/01267/edwin/bien/batch2/batch2-convexhull.tgz ConvexHull
tar -C /scratch/01267/edwin/bien/batch2/Output -czf /scratch/01267/edwin/bien/batch2/batch2-latextent.tgz LatExtent
tar -C /scratch/01267/edwin/bien/batch2/Output -czf /scratch/01267/edwin/bien/batch2/batch2-maxent.tgz Maxent
tar -C /scratch/01267/edwin/bien/batch2/Output -czf /scratch/01267/edwin/bien/batch2/batch2-points.tgz Points
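The tar tasks above follow a regular pattern, so a paramlist like this could be generated rather than typed by hand. This is a sketch under that assumption; gen_tar_paramlist is a hypothetical helper, and it lower-cases each directory name to form the archive name, as in the lines above.

```shell
#!/bin/sh
# gen_tar_paramlist (hypothetical helper): print one tar task per output
# subdirectory, in the same form as the paramlist-tar example.
gen_tar_paramlist() {
    base=$1    # e.g. /scratch/01267/edwin/bien
    batch=$2   # e.g. batch1
    shift 2
    for dir in "$@"; do
        # archive names use the lower-cased directory name
        lower=$(echo "$dir" | tr '[:upper:]' '[:lower:]')
        echo "tar -C $base/$batch/Output -czf $base/$batch/$batch-$lower.tgz $dir"
    done
}

# Possible use:
#   gen_tar_paramlist /scratch/01267/edwin/bien batch1 \
#       BoundingBox ConvexHull LatExtent Maxent Points > paramlist-tar
```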