Datasphere Workshop Report
Purpose
The workshop, sponsored by the National Science Foundation (NSF), was intended to explore the software and cyberinfrastructure needs of the biodiversity and environmental sciences and to address how a software institute might advance these fields. Results of the workshop will be synthesized in a publication to be co-authored by all participants. The workshop's conclusions about the role of software and cloud hardware in enabling work in biodiversity and environmental science will form the foundation of a proposal for a major NSF investment in an institute. That institute would sustain critical community software that could transform researchers' ability to address major problems in the biodiversity, environmental, and Earth sciences, and in science more broadly.
iPlant Participants
- Martha Narro
- Nirav Merchant
Workshop Organizers
- Bryan Heidorn, University of Arizona
- Christine Borgman, University of California at Los Angeles
- Bill Howe, University of Washington
- Carl Kesselman, University of Southern California
- Ian Foster, University of Chicago (didn't attend)
- See the full team list at https://sites.google.com/site/ieltrconcept/the-ieltr-team
Relevance to iPlant
Most, but not all, participants were directors of ecological field stations across the U.S.; as such, they work with numerous scientists who conduct field work. Environmental data are important to this group because many studies involve the impact of climate change. Essentially all analyses include spatial data, and participants gave many examples of the need for infrastructure to support spatial data. The group described needing general data storage, ways to manage and use metadata, a platform for image data, an easy way to make data publicly available after papers are published, and a way to share data with distributed researchers. They also need TNRS (the Taxonomic Name Resolution Service), extended beyond plants, and tools for scrubbing geospatial data, such as those developed with the BIEN/NCEAS group.
In both breakout groups Nirav attended, participants were looking for some form of "spatial data infrastructure" and data-driven collaborations.
Data discovery is a big issue for these scientists:
- There is no common repository for field data. The data are scattered across many places and are therefore difficult to find.
- Several participants said they need to be able to draw a box on a map to discover and fetch any relevant data from that area (field station data and other data).
- Legacy data that have not been digitized (especially data in field notebooks) are crucial historical records for documenting and studying the impact of climate change.
- Interesting suggested CI feature: A data discovery feature analogous to Amazon's "People who bought this book also bought this one" that would be "People who accessed this dataset also accessed these datasets".
Some of these scientists (especially Tosha Comendant and Michael Benard) and their colleagues are good candidates to review the Ecological Modeling straw proposal Ramona drafted.
Follow Up
Derek Masaki (USGS, BISON)
- Schedule a call to discuss how iPlant could help support their efforts, which may require up to 1 petabyte of data storage.
Michael Benard (Case Western)
- Has data in an old FoxPro database that needs to be made accessible, first to the consortium of faculty who created it, and second to the broader research community.
- The data consist of about 100K observations of amphibians (and possibly other organisms) collected at 30 field sites over decades.
- Martha to discuss with Nirav if there's any way iPlant could help make the data more accessible.
- It's worth seeing if he will review the Ecological Modeling straw proposal Ramona drafted.
Ian Billick (Rocky Mountain Biological Laboratory)
- Interested in whether iPlant could provide what he refers to as support for "workflows". After discussion between Ian and Nirav, it became clear that the workflows Ian has in mind are not complex.
- His field season is about to begin, so he'll look into what iPlant has to offer in the fall.
- Martha will send an email with some relevant links and information, and will check in with Ian in September.
Becca Fenwick (UC Natural Reserve System; UC Merced)
- Martha will send an email introducing Becca and Naim, since Becca is based at the UC Merced campus. (DONE)
- Works at the Conservation Biology Institute, a nonprofit that conducts ecological research for other organizations.
- Very interested in having a call with iPlant people to see if there are areas in which iPlant could alleviate some of their infrastructure challenges.
- Use ESRI for geospatial data, but are interested in other options.
- Need access to additional computational power for ecological modeling.
- Offered to help get scientists she works with to review the Ecological Modeling straw proposal Ramona drafted.
- Martha will schedule a call.
Workshop organizers
- Martha and Nirav will discuss how best to follow up with the organizers and, if appropriate, send them descriptions of the CI solutions iPlant already has in place. These solutions were mentioned during the workshop, both in breakout sessions and with the whole group, but a written summary could be useful. It was a bit awkward at times, since the purpose of the workshop was to gather information supporting a proposal to NSF to build computational support for this group of scientists.