Range Modeling Documentation from Edwin

Introduction

These are very general instructions for running the BIEN range map computations on Longhorn, TACC's visualization cluster. Longhorn is needed primarily because the maxent/R package requires the X11 libraries for its computations.

Resources

General Instructions

  1. (optional) place scripts into the $HOME/bien directory
  2. place data into $SCRATCH/bien directory
  3. generate the paramlist file(s) for the run (see genparamlist.pl and the sample paramlist files below)
  4. (optional) if necessary, create or modify the launcher submit script (see the launcher1.sge file as an example)
  5. module load launcher
  6. qsub launcher1.sge
  7. After the job is submitted with qsub, the output will be sent to the filename specified in the launcher1.sge script's -o setup parameter (see launcher1.sge).
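
Before running qsub, it can be worth confirming that the pieces named in the steps above are in place. The following is a minimal sketch, not part of the original scripts: the function name preflight and its three arguments are hypothetical, and it assumes the paramlist holds one task per line.

```shell
#!/bin/sh
# preflight DATA_DIR PARAMLIST SGE_SCRIPT
# Hypothetical helper: checks that the data directory, the paramlist and the
# submit script exist before a job is submitted. It does not call qsub itself.
preflight() {
    data_dir=$1
    paramlist=$2
    sge_script=$3

    [ -d "$data_dir" ] || { echo "missing data directory: $data_dir"; return 1; }
    [ -s "$paramlist" ] || { echo "missing or empty paramlist: $paramlist"; return 1; }
    [ -f "$sge_script" ] || { echo "missing submit script: $sge_script"; return 1; }

    # one task per line, so the line count is the number of serial jobs
    echo "ok: $(wc -l < "$paramlist" | tr -d ' ') tasks listed in $paramlist"
}
```

One possible use is `preflight "$SCRATCH/bien" paramlist1 launcher1.sge && qsub launcher1.sge`, so that the job is only submitted when all three checks pass.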

Other Instructions

  • Even if a particular species computation fails, it will leave artifacts behind. You can use the delete_bad.sh script to remove the bad runs (see below).
  • For the first bien computations, the data was so large that I submitted a parametric job to compress the files (see the paramlist-tar file for an example).

Example scripts

The following section contains the scripts used for the first bien computation.

script bien.sh

This script was used to execute the bien R scripts

bien.sh
#!/bin/sh

###
# this turns on debugging
###
DEBUG=1

###
#
# This is the command to use to execute R scripts, if you'd rather
# use R directly, just replace the values
#
###
RSCRIPT=/scratch/01267/edwin/local/bin/Rscript

###
#
# These are the names of the R scripts, relative (to the caller) or absolute path should be fine
#
###
RANGE_AREAS=/scratch/01267/edwin/bien/ComputeRangeAreas.r
MAXENT1=/scratch/01267/edwin/bien/CreateSDM_Bioclim.r
MAXENT2=/scratch/01267/edwin/bien/CreateSDM_BioclimSpatial.r
MAXENT3=/scratch/01267/edwin/bien/CreateSDM_Spatial.r

#environment
export JAVA_HOME=/scratch/01267/edwin/local/jdk1.6.0_27
export TMP=/scratch/01267/edwin/bien
export TEMP=/scratch/01267/edwin/bien

# exactly one argument (the input file) is required; exit non-zero on misuse
if [ $# -ne 1 ]; then
	echo ""
	echo "	usage: $0 <input-file>"
	echo ""
	exit 1
fi
INPUTFILE=$1

# this is a function for debug printing
debug_print() {
	if [ $DEBUG = 1 ]; then
		echo "debug: $1"
	fi
}


# validating each of the R scripts
debug_print "validating each of the R scripts"
if [ ! -f $RANGE_AREAS ]; then
	echo "ERROR: R script '$RANGE_AREAS' does not exist or is not valid... aborting"
	exit 1
fi
if [ ! -f $MAXENT1 ]; then
	echo "ERROR: R script '$MAXENT1' does not exist or is not valid... aborting"
	exit 1
fi
if [ ! -f $MAXENT2 ]; then
	echo "ERROR: R script '$MAXENT2' does not exist or is not valid... aborting"
	exit 1
fi
if [ ! -f $MAXENT3 ]; then
	echo "ERROR: R script '$MAXENT3' does not exist or is not valid... aborting"
	exit 1
fi
# AGG is not defined in this script, so it is only checked if the caller set it
if [ -n "$AGG" ] && [ ! -f "$AGG" ]; then
	echo "ERROR: R script '$AGG' does not exist or is not valid... aborting"
	exit 1
fi
debug_print "	...scripts are good"

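# validate the input file
if [ ! -f "$INPUTFILE" ]; then
	echo "ERROR: input file '$INPUTFILE' does not exist... aborting"
	exit 1
fi
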
debug_print "	processing file $INPUTFILE"
debug_print "	range areas"
$RSCRIPT $RANGE_AREAS $INPUTFILE

debug_print "	maxent1"
$RSCRIPT $MAXENT1 $INPUTFILE
debug_print "	maxent2"
$RSCRIPT $MAXENT2 $INPUTFILE
debug_print "	maxent3"
$RSCRIPT $MAXENT3 $INPUTFILE

script genparamlist.pl

This script was used to generate an example paramlist that called the bien.sh script

genparamlist.pl
#!/bin/perl
use File::Basename;

## These are the parameters to initialize
$INPUTDIR = "/ranger/work/01267/edwin/bien/RERUN";
$BIEN_SCRIPT = "/scratch/01267/edwin/bien/bien.sh";
$PARAMLIST = "/scratch/01267/edwin/bien/paramlist1";
$LOGDIR = "/scratch/01267/edwin/bien/Output/Logs";

open(FH, ">$PARAMLIST") or die $!;
@files = <$INPUTDIR/*>;

foreach $f (@files) {
        $bn = basename ($f, ".csv");
        print FH "$BIEN_SCRIPT $f >& $LOGDIR/$bn.out\n";
}

close(FH);
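
For readers who would rather stay in shell, the same paramlist can probably be produced without Perl. This is a sketch under assumptions, not the script that was used: the function name gen_paramlist is hypothetical, and unlike the Perl version it only picks up files ending in .csv.

```shell
#!/bin/sh
# gen_paramlist INPUTDIR BIEN_SCRIPT LOGDIR
# Hypothetical shell counterpart of genparamlist.pl: prints one task line per
# .csv file in INPUTDIR, in the same "script input >& log" form.
gen_paramlist() {
    inputdir=$1
    bien_script=$2
    logdir=$3

    for f in "$inputdir"/*.csv; do
        # when no .csv files exist the pattern stays unexpanded; skip it
        [ -f "$f" ] || continue
        bn=$(basename "$f" .csv)
        echo "$bien_script $f >& $logdir/$bn.out"
    done
}
```

Redirecting the output writes the file, for example `gen_paramlist "$INPUTDIR" "$BIEN_SCRIPT" "$LOGDIR" > paramlist1`, with the three variables set to the same values as in the Perl script.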

file paramlist1

This file, invoked from the launcher submit script, lists all the serial jobs to be executed, one task per line. The example below has been truncated for documentation purposes.

paramlist1
/scratch/01267/edwin/bien/bien.sh /ranger/work/01267/edwin/bien/temp/FIXED_UTM/Tetracoccus_fasciculatus_UTM.csv >& /scratch/01267/edwin/bien/Output/Logs/Tetracoccus_fasciculatus_UTM.out
/scratch/01267/edwin/bien/bien.sh /ranger/work/01267/edwin/bien/temp/FIXED_UTM/Tetracoccus_hallii_UTM.csv >& /scratch/01267/edwin/bien/Output/Logs/Tetracoccus_hallii_UTM.out
/scratch/01267/edwin/bien/bien.sh /ranger/work/01267/edwin/bien/temp/FIXED_UTM/Tetradenia_clementiana_UTM.csv >& /scratch/01267/edwin/bien/Output/Logs/Tetradenia_clementiana_UTM.out
/scratch/01267/edwin/bien/bien.sh /ranger/work/01267/edwin/bien/temp/FIXED_UTM/Tetradenia_cordata_UTM.csv >& /scratch/01267/edwin/bien/Output/Logs/Tetradenia_cordata_UTM.out
/scratch/01267/edwin/bien/bien.sh /ranger/work/01267/edwin/bien/temp/FIXED_UTM/Tetradenia_falafa_UTM.csv >& /scratch/01267/edwin/bien/Output/Logs/Tetradenia_falafa_UTM.out
/scratch/01267/edwin/bien/bien.sh /ranger/work/01267/edwin/bien/temp/FIXED_UTM/Tetradenia_fruticosa_UTM.csv >& /scratch/01267/edwin/bien/Output/Logs/Tetradenia_fruticosa_UTM.out
/scratch/01267/edwin/bien/bien.sh /ranger/work/01267/edwin/bien/temp/FIXED_UTM/Tetradenia_goudotii_UTM.csv >& /scratch/01267/edwin/bien/Output/Logs/Tetradenia_goudotii_UTM.out
/scratch/01267/edwin/bien/bien.sh /ranger/work/01267/edwin/bien/temp/FIXED_UTM/Tetradenia_herbacea_UTM.csv >& /scratch/01267/edwin/bien/Output/Logs/Tetradenia_herbacea_UTM.out
/scratch/01267/edwin/bien/bien.sh /ranger/work/01267/edwin/bien/temp/FIXED_UTM/Tetradenia_hildeana_UTM.csv >& /scratch/01267/edwin/bien/Output/Logs/Tetradenia_hildeana_UTM.out
/scratch/01267/edwin/bien/bien.sh /ranger/work/01267/edwin/bien/temp/FIXED_UTM/Tetradenia_isaloensis_UTM.csv >& /scratch/01267/edwin/bien/Output/Logs/Tetradenia_isaloensis_UTM.out
/scratch/01267/edwin/bien/bien.sh /ranger/work/01267/edwin/bien/temp/FIXED_UTM/Tetradenia_nervosa_UTM.csv >& /scratch/01267/edwin/bien/Output/Logs/Tetradenia_nervosa_UTM.out
/scratch/01267/edwin/bien/bien.sh /ranger/work/01267/edwin/bien/temp/FIXED_UTM/Tetradenia_riparia_UTM.csv >& /scratch/01267/edwin/bien/Output/Logs/Tetradenia_riparia_UTM.out
/scratch/01267/edwin/bien/bien.sh /ranger/work/01267/edwin/bien/temp/FIXED_UTM/Tetradenia_tanganyikae_UTM.csv >& /scratch/01267/edwin/bien/Output/Logs/Tetradenia_tanganyikae_UTM.out
/scratch/01267/edwin/bien/bien.sh /ranger/work/01267/edwin/bien/temp/FIXED_UTM/Tetradymia_axillaris_UTM.csv >& /scratch/01267/edwin/bien/Output/Logs/Tetradymia_axillaris_UTM.out
/scratch/01267/edwin/bien/bien.sh /ranger/work/01267/edwin/bien/temp/FIXED_UTM/Tetradymia_canescens_UTM.csv >& /scratch/01267/edwin/bien/Output/Logs/Tetradymia_canescens_UTM.out

script launcher1.sge

This is the file submitted to the SGE engine. If you have multiple jobs, copy this file and change its reference to paramlist1 so that it points to the correct paramlist file.

If you would like to change the output location, modify the -o setup parameter in this file.

In this example, I am submitting to the normal queue with a 6-hour time limit, 8 cores per node, 128 cores total. You could try the long queue (24 hours), though this may lower your priority in the queue; I'm not 100% certain.

Other setup parameters to change are:
-M the notification email address
-o the output file
-N job name
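
Copying the submit script for a second paramlist can be done by hand, or roughly as sketched here. The function name retarget_launcher is hypothetical, and the sketch assumes the template contains a single line of the form "setenv CONTROL_FILE <name>", as launcher1.sge does.

```shell
#!/bin/sh
# retarget_launcher TEMPLATE PARAMLIST
# Hypothetical helper: prints a copy of the submit script TEMPLATE with the
# CONTROL_FILE setting pointed at PARAMLIST. All other lines are unchanged.
retarget_launcher() {
    template=$1
    paramlist=$2

    # keep "setenv CONTROL_FILE" and its spacing, replace only the file name
    sed "s|^\(setenv CONTROL_FILE  *\)[^ ]*|\1$paramlist|" "$template"
}
```

For example, `retarget_launcher launcher1.sge paramlist2 > launcher2.sge` would give a second submit script; the -N and -o parameters still need to be edited separately if the jobs should be told apart.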

launcher1.sge
#!/bin/csh
#
# Simple SGE script for submitting multiple serial
# jobs (e.g. parametric studies) using a script wrapper
# to launch the jobs.
#
# To use, build the launcher executable and your
# serial application(s) and place them in your WORKDIR
# directory.  Then, edit the CONTROL_FILE to specify
# each executable per process.
#-------------------------------------------------------
#-------------------------------------------------------
#
#         <------ Setup Parameters ------>
#
#$ -M administrator@johndonoghue.net
#$ -m be
#$ -N ParametricBienR
#$ -pe 8way 128
#$ -q normal
#$ -o Parametric.o$JOB_ID
#$ -l h_rt=6:00:00
#$ -V
#$ -cwd
#   <------ You MUST Specify a Project String ----->
#$ -P hpc
#------------------------------------------------------
#
# Usage:
#	#$ -pe <parallel environment> <number of slots>
#	#$ -l h_rt=hours:minutes:seconds to specify run time limit
# 	#$ -N <job name>
# 	#$ -q <queue name>
# 	#$ -o <job output file>
#	   NOTE: The env variable $JOB_ID contains the job id.
#
module load launcher
setenv EXECUTABLE     $TACC_LAUNCHER_DIR/launcher
setenv CONTROL_FILE   paramlist1
setenv WORKDIR        .
#
# Variable description:
#
#  EXECUTABLE     = full path to the job launcher executable
#  CONTROL_FILE   = text input file which specifies
#                   executable for each process
#                   (should be located in WORKDIR)
#  WORKDIR        = location of working directory
#
#      <------ End Setup Parameters ------>
#--------------------------------------------------------
#--------------------------------------------------------

#----------------
# Error Checking
#----------------

if ( ! -e $WORKDIR ) then
        echo " "
	echo "Error: unable to change to working directory."
	echo "       $WORKDIR"
	echo " "
	echo "Job not submitted."
	exit
endif

if ( ! -f $EXECUTABLE ) then
	echo " "
	echo "Error: unable to find launcher executable $EXECUTABLE."
	echo " "
	echo "Job not submitted."
	exit
endif

if ( ! -f $WORKDIR/$CONTROL_FILE ) then
	echo " "
	echo "Error: unable to find input control file $CONTROL_FILE."
	echo " "
	echo "Job not submitted."
	exit
endif


#----------------
# Job Submission
#----------------

cd $WORKDIR/
echo " WORKING DIR:   $WORKDIR/"
date

$TACC_LAUNCHER_DIR/paramrun $EXECUTABLE $CONTROL_FILE

date
echo " "
echo " Parametric Job Complete"
echo " "

script delete_bad.sh

This script will delete all the bad outputs, assuming there is a file called bad.txt, which contains all the bad species .csv files, one per line. The script also assumes that the outputs have the same base name as the .csv file.

delete_bad.sh
#!/bin/sh

# this script will delete bad runs
set -x

rerun="bad.txt"
batchdir="/scratch/01267/edwin/bien/batch2/Output"

for i in `cat $rerun`
do
        bn=`basename $i .csv`
        echo $bn
        find $batchdir -name "${bn}*" -exec rm {} \;
done

This is an example bad.txt:

bad.txt
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Xerophyta_dasylirioides_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Xylosma_longipedicellata_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Zinnia_acerosa_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Zeyherella_mayumbensis_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Wyethia_mollis_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Zygia_englesingii_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Xerophyta_setosa_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Xanthosoma_granvillei_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Xylosma_tweediana_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Zygodon_peruvianus_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Ziziphus_cotinifolia_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Zinnia_juniperifolia_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Zeugites_munroanus_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Zygogynum_pauciflorum_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Zehneria_thwaitesii_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Zornia_harmsiana_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Xanthostemon_laurinus_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Zanthoxylum_verrucosum_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Yucca_madrensis_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Xylopia_rubescens_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Zygodon_papillatus_UTM.csv
/ranger/work/01267/edwin/bien/temp/FIXED_UTM/Ziziphus_taylorii_UTM.csv
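
Because delete_bad.sh removes files without asking, it may be safer to preview what it would match first. The sketch below is not one of the original scripts: the function name list_bad_outputs is hypothetical, and it uses the same base-name prefix matching as delete_bad.sh but only prints the matches.

```shell
#!/bin/sh
# list_bad_outputs BADLIST BATCHDIR
# Hypothetical dry run for delete_bad.sh: for every .csv path listed in
# BADLIST, prints the files under BATCHDIR whose names start with the same
# base name. Nothing is deleted.
list_bad_outputs() {
    badlist=$1
    batchdir=$2

    while IFS= read -r csv; do
        [ -n "$csv" ] || continue          # ignore blank lines
        bn=$(basename "$csv" .csv)
        # -type f is an addition here: only regular files are listed
        find "$batchdir" -type f -name "${bn}*"
    done < "$badlist"
}
```

A call such as `list_bad_outputs bad.txt /scratch/01267/edwin/bien/batch2/Output` lists the candidates; if the list looks right, delete_bad.sh can then be run against the same directory.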

file paramlist-tar

This is an example paramlist file used to tar the files. These will take a while.

paramlist-tar
tar -C /scratch/01267/edwin/bien/batch1/Output -czf /scratch/01267/edwin/bien/batch1/batch1-boundingbox.tgz BoundingBox
tar -C /scratch/01267/edwin/bien/batch1/Output -czf /scratch/01267/edwin/bien/batch1/batch1-convexhull.tgz ConvexHull
tar -C /scratch/01267/edwin/bien/batch1/Output -czf /scratch/01267/edwin/bien/batch1/batch1-latextent.tgz LatExtent
tar -C /scratch/01267/edwin/bien/batch1/Output -czf /scratch/01267/edwin/bien/batch1/batch1-maxent.tgz Maxent
tar -C /scratch/01267/edwin/bien/batch1/Output -czf /scratch/01267/edwin/bien/batch1/batch1-points.tgz Points

tar -C /scratch/01267/edwin/bien/batch2/Output -czf /scratch/01267/edwin/bien/batch2/batch2-boundingbox.tgz BoundingBox
tar -C /scratch/01267/edwin/bien/batch2/Output -czf /scratch/01267/edwin/bien/batch2/batch2-convexhull.tgz ConvexHull
tar -C /scratch/01267/edwin/bien/batch2/Output -czf /scratch/01267/edwin/bien/batch2/batch2-latextent.tgz LatExtent
tar -C /scratch/01267/edwin/bien/batch2/Output -czf /scratch/01267/edwin/bien/batch2/batch2-maxent.tgz Maxent
tar -C /scratch/01267/edwin/bien/batch2/Output -czf /scratch/01267/edwin/bien/batch2/batch2-points.tgz Points
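
The lines above follow a regular pattern, so a paramlist like this could also be generated rather than typed. This is only a sketch: the function name gen_tar_paramlist is hypothetical, and the five output directory names are taken from the example above and assumed to be the same for every batch.

```shell
#!/bin/sh
# gen_tar_paramlist BASE BATCH...
# Hypothetical generator for a paramlist-tar file: prints one tar task per
# output directory for each batch named on the command line.
gen_tar_paramlist() {
    base=$1
    shift

    for batch in "$@"; do
        for dir in BoundingBox ConvexHull LatExtent Maxent Points; do
            # archive names use the lowercased directory name
            lower=$(echo "$dir" | tr '[:upper:]' '[:lower:]')
            echo "tar -C $base/$batch/Output -czf $base/$batch/$batch-$lower.tgz $dir"
        done
    done
}
```

For instance, `gen_tar_paramlist /scratch/01267/edwin/bien batch1 batch2 > paramlist-tar` prints the same ten tar commands as listed above, without the blank line between batches.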