CharaParser Markup


CharaParser Learn and CharaParser Markup together form a complete text-mining pipeline. The pipeline converts morphological descriptions of various taxa into a structured format (XML) that can be used to generate taxon-character matrices for biological research. The user first runs CharaParser Learn. The user is then directed to a website (OTOLite) to review and categorize the terms the software learned. Finally, the user runs CharaParser Markup to produce the final results. The current setup parses only plant morphological descriptions (see Test Data below).

Quick Start

  • Complete a CharaParser Learn run and categorize the learned terms on OTOLite
  • Start CharaParser Markup
    • Select the {Analysis Name}.learn file generated by a previous CharaParser Learn run on the same input data
    • Select your input data
    • Select the taxon group that best describes your input data
    • Launch analysis
  • Open the output of the analysis
  • The result/*.xml files contain the marked-up taxon descriptions
  • Optionally, check the charaparser_log/*.log files for any errors that occurred during execution
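The last two steps can also be scripted once the output has been downloaded. The following Python sketch assumes a local copy of the output directory that uses the result/ and charaparser_log/ layout described above. Some details are hypothetical choices, not part of CharaParser:

  • The function name summarize_output.
  • The keywords it looks for in the logs ("ERROR" and "Exception").

The XML check only confirms that each file is well-formed. It does not validate the markup itself.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def summarize_output(output_dir):
    """Summarize a locally downloaded CharaParser Markup output directory.

    Hypothetical helper. Assumes the layout result/*.xml and
    charaparser_log/*.log. Returns (well_formed, malformed, log_hits).
    """
    root = Path(output_dir)

    # Check that each result XML file parses (well-formedness only).
    well_formed, malformed = [], []
    for xml_file in sorted((root / "result").glob("*.xml")):
        try:
            ET.parse(xml_file)
            well_formed.append(xml_file.name)
        except ET.ParseError:
            malformed.append(xml_file.name)

    # Assumed error markers; adjust after inspecting real log entries.
    keywords = ("ERROR", "Exception")
    log_hits = []
    for log_file in sorted((root / "charaparser_log").glob("*.log")):
        with log_file.open(encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if any(k in line for k in keywords):
                    log_hits.append((log_file.name, lineno, line.rstrip()))

    return well_formed, malformed, log_hits
```

For example, summarize_output("path/to/analysis_output") returns three things:

  • The well-formed XML files.
  • Any XML files that failed to parse.
  • The log lines that matched the assumed keywords, each with its file name and line number.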

Test Data

Test data for this app is available directly in the Discovery Environment. In the Data window, go to Community Data -> iplantcollaborative -> example_data -> CharaParser.

The other required input file, {Analysis Name}.learn, can be generated by running CharaParser Learn on the test data.

Input File(s)

  • Use the {Analysis Name}.learn file from a previous run of CharaParser Learn
  • Use the input directory found in the CharaParser example data directory above as test input

Parameters Used in App

When you run the app in the Discovery Environment, use the following parameters with the input file(s) above. These settings produce the output described in the next section.

  • Taxon group: Plant

Output File(s)

The analysis produces a result directory as output. For the test case, one possible result directory is provided in the example_data directory.
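The example is only one possible result, so a simple sanity check is to compare which XML files your run produced with those in the example result directory. The following Python sketch assumes both result directories have been downloaded locally. The helper compare_result_dirs is hypothetical. It compares file names only, not file contents, because contents may legitimately differ between runs.

```python
from pathlib import Path


def compare_result_dirs(mine, example):
    """Compare the XML file names in two local result directories.

    Hypothetical helper. Returns sorted lists:
    (only_in_mine, only_in_example, in_both).
    """
    mine_names = {p.name for p in Path(mine).glob("*.xml")}
    example_names = {p.name for p in Path(example).glob("*.xml")}
    return (
        sorted(mine_names - example_names),
        sorted(example_names - mine_names),
        sorted(mine_names & example_names),
    )
```

Large differences in the file lists can suggest that the wrong input directory or .learn file was selected.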

Tool Source for App