CharaParser Learn

CharaParser Learn and CharaParser Markup together form a complete text-mining pipeline that converts morphological descriptions of various taxa into a structured format (XML), which can then be used to generate taxon-character matrices for biology research. Run CharaParser Learn first; you are then directed to a website to review and categorize the set of terms the software learned. Finally, run CharaParser Markup to produce the final results. The current setup parses only plant morphological descriptions (see Test Data below).

Quick Start

  • To use CharaParser Learn, import your data as UTF-8-encoded XML that is valid against the input schema
    • The description text in your data is expected to have correct sentence syntax, e.g. matched brackets
  • Start CharaParser Learn
    • Give a meaningful analysis name
    • Select your input data
    • Select the taxon group that best describes your input data
    • Optionally, choose to skip terminology categorization if you are willing to accept inferior results *
    • Launch analysis
  • Open the output data of the analysis
  • Open the created nextStep.html file and categorize the learned terminology, unless you chose to skip this step
  • Use the {Analysis Name}.learn file in a subsequent run of CharaParser Markup on the same input taxon descriptions; that run uses the terminology categorized via nextStep.html
  • Optionally check charaparser_log/*.log files for any errors that may have occurred during execution

* If this option is selected, CharaParser Learn is launched and then followed automatically by CharaParser Markup, so the output contains the results of both apps.
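
The input requirements in the Quick Start (UTF-8 encoding, XML, matched brackets in the description text) can be checked locally before uploading. The following is a minimal Python sketch, not part of CharaParser itself. The function names are hypothetical, and the sketch only tests that a file is well-formed XML; it does not validate against the CharaParser input schema, and it only inspects the text directly inside each element.

```python
import xml.etree.ElementTree as ET

# Closing bracket -> matching opening bracket.
PAIRS = {")": "(", "]": "[", "}": "{"}


def brackets_balanced(text: str) -> bool:
    """Return True if (), [] and {} in text are properly nested and closed."""
    stack = []
    for ch in text:
        if ch in PAIRS.values():
            stack.append(ch)
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    return not stack


def check_input(raw: bytes) -> list[str]:
    """Return a list of problems found in one input file (empty if none).

    Assumption: this is only a pre-flight check; schema validation is
    left to CharaParser Learn.
    """
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8: {exc}"]
    try:
        root = ET.fromstring(raw)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    problems = []
    for elem in root.iter():
        text = elem.text or ""
        if not brackets_balanced(text):
            problems.append(
                f"unmatched brackets in <{elem.tag}>: {text.strip()[:60]!r}"
            )
    return problems
```

To use it, read each file of the input directory as bytes (for example with `pathlib.Path.read_bytes()`) and pass the result to `check_input`; an empty list means no problem was detected by this sketch.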

Test Data

Test data for this app appears directly in the Discovery Environment in the Data window under Community Data -> iplantcollaborative -> example_data -> CharaParser.

Input File(s)

Use the input directory found inside the CharaParser example directory above as test input.

Parameters Used in App

When the app is run in the Discovery Environment, use the following parameters with the above input file(s) to get the output provided in the next section below.

  • Taxon group: Plant

Output File(s)

Expect a nextStep.html file and an {Analysis Name}.learn file in the output directory.

  • nextStep.html presents terminology that can be categorized in order to improve CharaParser's results on the provided input data
  • {Analysis Name}.learn can be used to obtain the marked-up result files through a subsequent run of CharaParser Markup
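
After downloading the analysis output, the expected files and the logs can be checked with a short script. The Python sketch below is an illustration only: the function names are hypothetical, and the log format is not documented here, so treating any line that contains the word "error" as a problem is an assumption.

```python
from pathlib import Path


def missing_outputs(output_dir, analysis_name):
    """Return the expected output files that are absent from output_dir."""
    expected = ["nextStep.html", f"{analysis_name}.learn"]
    return [name for name in expected if not (Path(output_dir) / name).is_file()]


def log_lines_mentioning_errors(output_dir):
    """Collect (file, line number, text) for log lines containing 'error'.

    Assumption: errors are marked by the word 'error' in any letter case;
    the real log format of charaparser_log/*.log may differ.
    """
    log_dir = Path(output_dir) / "charaparser_log"
    if not log_dir.is_dir():
        return []
    hits = []
    for log in sorted(log_dir.glob("*.log")):
        lines = log.read_text(encoding="utf-8", errors="replace").splitlines()
        for number, line in enumerate(lines, start=1):
            if "error" in line.lower():
                hits.append((log.name, number, line.strip()))
    return hits
```

An empty list from `missing_outputs` means both nextStep.html and the .learn file are present; any entries returned by `log_lines_mentioning_errors` are worth reading before launching CharaParser Markup.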

Tool Source for App