Using Metadata with the CyVerse Data Store
What is metadata?
Metadata is data about a data file or folder that describes its contents and their context, for the purpose of locating and sharing that item. It is a useful way to recall specific information about a file or folder, such as a tool name, version, or settings you used when submitting an analysis. That information helps when you want to retrieve the item, help others retrieve it, or help collaborators with whom you share the item understand the context of its contents (e.g., when and how it was collected, who collected it, or which input parameters or specimens were used).
Metadata consists of attributes (a changeable characteristic of the item, e.g., filetype), values (the value of that characteristic, e.g., text), and units. Together these are commonly referred to as AVUs, or Attribute-Value-Unit triples.
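As a minimal sketch, an AVU can be modeled as a three-field record. The class name AVU and the sample values below are illustrative assumptions, not part of any CyVerse or iRODS API.

```python
from typing import NamedTuple, Optional


class AVU(NamedTuple):
    """Hypothetical model of one Attribute-Value-Unit triple."""
    attribute: str               # the characteristic being described
    value: str                   # the value of that characteristic for this item
    unit: Optional[str] = None   # unit of the value; None when no unit applies


# Illustrative metadata for a single file (sample values are made up).
file_metadata = [
    AVU("filetype", "text"),
    AVU("temperature", "37", "celsius"),
]

for avu in file_metadata:
    print(avu.attribute, avu.value, avu.unit or "")
```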
About the metadata databases
Prior to the 2.7 release, all metadata, whether created within the Discovery Environment (DE) or via iCommands, was written to and accessed from the iRODS metadata database. Because the iRODS metadata database allows only three fields (A, V, and U), it is not well suited to the denser needs of semantic data (e.g., metadata linked to ontology terms).
As a result, CyVerse launched the CyVerse metadata database with the 2.7 release of the DE. This new database can store an unlimited number of metadata fields, providing a much more scalable framework for the needs of semantic data used in the CyVerse Data Store. Metadata created in the DE after the 2.7 release is stored in the new CyVerse metadata database.
Metadata created before the 2.7 release as part of a metadata template (e.g., SRA submission or DOI request) was already stored in and remains in the DE metadata database.
Metadata that was created before the 2.7 release without a template remains safely stored in the iRODS metadata database. That metadata can still be read within the DE, and can be both written and read using the iCommands imeta command. Users can transfer this metadata from the iRODS metadata database to the CyVerse metadata database, but because it is no longer possible to create iRODS metadata from within the DE, the operation cannot be undone. Users who require command-line access to metadata should continue to use iCommands until command-line access to the DE metadata database is available.
Both databases are indexed, searchable, and accessible through the DE.
Using metadata in the Discovery Environment
The CyVerse metadata database is the primary repository for DE metadata storage. Based on the familiar AVU-triples (Attributes-Values-Units) foundation of its predecessor, the iRODS metadata database, the new CyVerse metadata database allows an unlimited number of AVU combinations. For example, users can now use the same attribute with more than one value or more than one type of unit, significantly expanding the amount of metadata that can be stored for data analysis and retrieval.
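To illustrate one attribute carrying several values or units, the sketch below groups AVU triples by attribute name. The function name group_by_attribute and the sample data are hypothetical, and this is not a description of how the CyVerse metadata database is implemented.

```python
from collections import defaultdict


def group_by_attribute(avus):
    """Group (attribute, value, unit) triples by attribute name."""
    grouped = defaultdict(list)
    for attribute, value, unit in avus:
        grouped[attribute].append((value, unit))
    return dict(grouped)


# Made-up sample: "length" appears twice, with different values and units.
avus = [
    ("length", "12", "cm"),
    ("length", "4.7", "in"),
    ("collector", "J. Smith", ""),
]

grouped = group_by_attribute(avus)
print(grouped["length"])
```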
In addition to the new metadata database, we have expanded the capabilities of metadata templates. Previously, users could attach only one template to a data item; now users can attach an unlimited number of templates to the same data item. Once attached, any values or units that match an attribute in the AVU populate the fields in the template.
Command-line access to DE metadata through the DE API will be available in a future release.
For information on using metadata and bulk metadata in the DE, see Using Metadata in the DE.
About metadata templates in the DE
A metadata template allows you to apply the same attribute to different files and folders you own or have write permissions to, greatly simplifying metadata entry. Metadata templates let users enter and view metadata, which is then stored as individual attributes based on their identifiers. With the 2.7 release, you can now attach multiple templates to the same data file or folder.
You also can use the bulk metadata feature to add metadata to multiple files in the same or different folders.
For more information on using metadata templates, see Using Metadata in the DE - Using metadata templates.
Using metadata via command-line with iRODS imeta commands
All metadata that was created prior to the 2.7 release without a metadata template remains in the iRODS metadata database. After the 2.7 release, it can still be read and written with imeta iCommands, and read through the DE. You can import that metadata to the CyVerse metadata database in the DE; however, be aware that this operation cannot be undone. For instructions on importing iRODS metadata to the CyVerse metadata database, see Using Metadata in the DE.
For information on adding metadata to a file or folder using imeta, see Adding Metadata to a File Using iRODS imeta (Metadata) Commands.
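As a rough sketch of what such commands look like, the Python helpers below assemble argument lists for imeta add and imeta ls, using the -d flag for a file (data object) and -C for a folder (collection). The helper names and the sample path are hypothetical. Actually running the commands assumes iCommands is installed and an iRODS session has already been set up.

```python
import shlex


def imeta_add_args(path, attribute, value, unit=None, is_folder=False):
    """Build the argument list for `imeta add` (hypothetical helper)."""
    target_flag = "-C" if is_folder else "-d"  # -C: folder, -d: file
    args = ["imeta", "add", target_flag, path, attribute, value]
    if unit:
        args.append(unit)  # the unit is optional
    return args


def imeta_ls_args(path, is_folder=False):
    """Build the argument list for `imeta ls` (hypothetical helper)."""
    return ["imeta", "ls", "-C" if is_folder else "-d", path]


# Placeholder path; substitute the path of your own file in the Data Store.
args = imeta_add_args("/zone/home/username/sample.txt", "temperature", "37", "celsius")
print(shlex.join(args))
```

To execute a command rather than print it, the list could be passed to subprocess.run(args, check=True) on a machine where iCommands is configured.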