2012.10.11 Range Maps


Range Models

October 11, 2012

Meeting Objectives

  • Discuss how the various data files will be used.
    • List output files needed for the BIEN wg meeting. (PIs, John, Brad)
    • List output files the community is likely to want to access fairly often. (PIs, Mark, John)
    • List output files that can be archived. (PIs, Mark, John)
    • List applications needed to analyze and view the relevant output files. (PIs, Mark, John, Brad)
  • Develop a (preliminary) data management plan for the BIEN species range model data.
    • Identify the approximate size of each of those groups of files. (John)
    • Discuss management of data that will be actively used versus data to be archived. (Nirav, Mark, PIs)
    • Identify where and how the various files can/will be stored for use and for archiving. (Nirav, Mark, PIs)
    • Briefly discuss the longer term goals for range modeling as they impact computing and data management needs. (PIs, Mark, Nirav)
    • Discuss the estimated useful lifespan of these data (the various files). (PIs, Mark)
  • Plan of action
    • What needs to be done? (Martha)
    • Assign tasks (Martha)

Participants

John Donoghue, Brad Boyle, Brian Enquist, Nirav Merchant, Edwin Skidmore, Mark Schildhauer, Jim Regetz, Martha Narro

Current location of range model files
  • Edwin to contact Paul re leaving files on Longhorn for a while longer.
  • John to get a rough timeframe for completing the computations (about 1 week).
Preliminary list of what’s needed for the BIEN wg meeting in November
  • Products
    • Summary file for geographic ranges (Completed by JCD)
    • Tables of the species modeled, outputs and sample sizes (Completed by JCD)
    • Table of sources of raw data for acknowledgements (To be done by Brad)
    • Table of range areas and basic statistics (Completed by JCD)
    • Map products: best probability map and thresholded map (Completed by JCD)
    • Diversity map for the New World: a raster map displaying the number of species per cell.
      • We have 88,000+ range maps to stack. What is the best way to accomplish this?
      • Nirav suggested gridding the maps and computing the diversity of each cell. (To be done by iPlant - Nirav; or to be done by John with iPlant's assistance)
    • Gridded maps (want to be able to access the same climate data – bioclim)
    • JPEGs for publications, raw shapefiles, and a simple visual display of each range map on the BIEN website (To be done by John)
    • Descriptive statistics of range-specific climate (and climate variability) for each species. (In Process by JCD)
  • Need new R scripts to generate many/most of these products.
  • Analyses
    • Doing any sort of mapping, e.g. creating diversity maps (superimposing all range maps to create an overall biodiversity map)
    • Diversity of different groups, different habits.
    • Correlations on range size, conservatism
    • GIS work mapping out diversity
    • Doing spatial joins on maps
      • Overlay range predictions and threshold them to presence/absence
      • Ability to look at the maps
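Nirav's suggestion (grid the maps, then count species per cell) amounts to thresholding each species' probability surface to presence/absence and summing across species. A minimal Python/NumPy sketch, assuming every range model has already been rasterized onto one common New World grid; the function name `richness_map` and the single 0.5 cutoff are hypothetical (the thresholded maps may well use species-specific thresholds instead).

```python
import numpy as np

def richness_map(probability_maps, threshold=0.5):
    """Stack per-species probability rasters into a species-richness raster.

    probability_maps: iterable of 2-D arrays on the same grid (values 0..1,
    NaN for no data). Maps are consumed one at a time, so only the running
    total is held in memory.
    threshold: hypothetical presence cutoff applied to every species.
    """
    richness = None
    for prob in probability_maps:
        prob = np.asarray(prob, dtype=float)
        # Threshold to presence/absence; NaN compares False, so no-data cells count as absent.
        presence = prob >= threshold
        if richness is None:
            richness = np.zeros(prob.shape, dtype=np.int32)
        elif prob.shape != richness.shape:
            raise ValueError("all range maps must share the same grid")
        richness += presence  # add 1 in every cell where the species is present
    if richness is None:
        raise ValueError("no range maps supplied")
    return richness

# Usage with two toy 2x2 "range maps"
maps = [
    np.array([[0.9, 0.2], [0.6, np.nan]]),
    np.array([[0.7, 0.8], [0.1, 0.4]]),
]
print(richness_map(maps))
# [[2 1]
#  [1 0]]
```

Because the maps are streamed, memory use stays at one grid regardless of how many of the 88,000 ranges are stacked; in practice the inputs would be read one file at a time rather than held in a list.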
Discussion of formats for the models
  • Grid the ranges.
  • Store both raster and vector versions.
  • Should be able to grid the vector versions.
  • Then the gridded maps will be tiled.
  • 88,000 range models
  • For each tile, link to the identifiers of the data.
  • The shapefiles carry the projection information.
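The "for each tile, link to identifiers of the data" idea can be sketched as an index from tile to the range models that touch it. A minimal Python sketch, assuming a hypothetical global grid of 10-degree tiles and a lon/lat bounding box per range model; tile size, identifiers, and function names are all illustrative, not the group's chosen scheme.

```python
import math
from collections import defaultdict

TILE_DEG = 10.0  # hypothetical tile size, in degrees of longitude/latitude

def tiles_for_bbox(min_lon, min_lat, max_lon, max_lat, tile_deg=TILE_DEG):
    """Return the (col, row) index of every global tile a lon/lat bounding box touches."""
    n_cols = int(360 / tile_deg)
    n_rows = int(180 / tile_deg)
    # Shift to 0..360 / 0..180 so indices start at 0; clamp so +180 / +90 stay in range.
    c0 = max(0, math.floor((min_lon + 180) / tile_deg))
    c1 = min(n_cols - 1, math.floor((max_lon + 180) / tile_deg))
    r0 = max(0, math.floor((min_lat + 90) / tile_deg))
    r1 = min(n_rows - 1, math.floor((max_lat + 90) / tile_deg))
    return [(c, r) for c in range(c0, c1 + 1) for r in range(r0, r1 + 1)]

def build_tile_index(ranges, tile_deg=TILE_DEG):
    """Map each tile to the sorted identifiers of range models whose bounding box overlaps it.

    ranges: dict of range-model identifier -> (min_lon, min_lat, max_lon, max_lat).
    Bounding boxes are a coarse filter: a listed species may still be absent
    from parts of the tile, so exact overlap needs the actual geometry.
    """
    index = defaultdict(set)
    for model_id, bbox in ranges.items():
        for tile in tiles_for_bbox(*bbox, tile_deg=tile_deg):
            index[tile].add(model_id)
    return {tile: sorted(ids) for tile, ids in index.items()}

# Usage with hypothetical identifiers
ranges = {
    "species_A": (-95.0, 29.0, -70.0, 46.0),
    "species_B": (-60.0, -10.0, -55.0, -5.0),
    "species_C": (-75.0, 40.0, -72.0, 44.0),
}
index = build_tile_index(ranges)
print(index[(10, 13)])  # ['species_A', 'species_C']
```

Querying a tile then returns only the handful of models worth opening, instead of scanning all 88,000.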
Discussion of how to make the range models accessible for analysis and viewing
  • iPlant can provide access to 15 TB of storage now.
    • An additional 300 TB of new storage is coming online at UA.
  • On average, 2-4 TB of new data come into the iPlant Data Store from all over the country every day, so moving this amount of data is not a problem. Moving 15 TB is doable, though we would not want to move that much data around often.
  • Could keep the subset you’ll compute on spinning (on disk) at TACC.
  • For now, keep all 15 TB of data on disk and don’t worry about it (i.e., in the iPlant Data Store which is replicated at UA and TACC).
    • Revisit annually to see what can be archived.
  • Currently, data are organized by model.
    • They need to be organized by species.
  • iRODS can create different views into the files using symbolic links, so we can create different directory structures.
    • Can create metadata about the files.
    • There’s a tutorial on how to do that using iRODS.
    • iRODS has clients.
  • MS suggested that the BIEN group learn to use the iRODS clients.
    • Want scientists to learn how to access the data while at the BIEN meeting.
    • Mark and Jim will pave the way by learning to use iRODS to see if it meets the group’s needs in terms of features and ease of use.
  • Metadata are attribute/value pairs: not sophisticated, but a start.
  • Send the Getting Started with iPlant link.
  • Move data to iPlant Data Store at TACC then let iRODS sync it to iPlant Data Store at Tucson.
    • There’s a flag that can be set to move the data this way.
  • Suggestion to provide the output files for 10 species to iPlant immediately so everyone can see how things will work (be accessed) and can start playing around.
    • Will help people decide whether the available solutions meet the group’s needs, and help everyone make informed recommendations on what else is needed in terms of file formats and access.
    • John to send Nirav some shapefiles and GeoTIFFs.
  • Want to have JPEGs of range maps available from the BIEN website by Nov.
  • How to interface with Map of Life
    • They can pull the range maps from either the iPlant Data Store (any/all of them, any/all formats) or from iPlant’s geoserver (the maps of interest as shapefiles and possibly a few other formats).
  • ESA data publication – so people can access maps.
    • What format and can ESA handle this size of data?
Returning to discussion of data products needed for the next meeting
  • Don’t make 88,000 JPEGs (yet).
  • Climatic variability for each species. (computation will be completed in about a week)
  • Even large GeoTIFFs are not that big. Could put QGIS on a machine at NCEAS and let people look at the maps that way.
    • Others suggested participants would prefer to access them from their own laptops.

Decisions 

  • Archive all the files on Ranch at TACC
  • For analyses, for now, keep all 15 TB of data on disk and don’t worry about it (i.e., in the iPlant Data Store which is replicated at UA and TACC).
    • Annually revisit to determine which data are not being used and can be archived. 

Action Items

  • Edwin: Contact Paul re leaving files on Longhorn for a week longer (done).
  • John: Compute climatic variability for each species. (In Process)
  • John: Compute a JPEG for each of the 88,000 species ranges. (Correction: not on hold.)
  • John: Send Nirav range model output files for 10 species (Completed).
  • Nirav, Martha: Work with Smaran to load shape files into geoserver. (week of Oct. 15)
  • John: Use sync to copy all files on Longhorn to Ranch for archiving. (Halted during maintenance of Longhorn. Will restart soon.)
  • Mark and Jim: Learn to use iRODS.
  • Martha: Send Mark and Jim the getting started with iPlant link. (done)
  • Brad: Create table of sources of raw data for acknowledgements.
  • John and/or iPlant: Diversity map of all New World species (to be done by iPlant (Nirav), or by John with iPlant's assistance).
  • John: Schedule meeting in about 2 weeks. 

Actions on hold

  • John with Edwin’s guidance: Move data to iPlant Data Store at TACC then let iRODS sync it to iPlant Tucson data store. (When computations are complete.)