Validate

Validate: Known-Truth Analysis and Simulation Testing Workflow/Pipeline

Understanding how Genome-Wide Association Study (GWAS) and Quantitative Trait Loci (QTL) analysis tools perform under various conditions is crucial to deciding which tool best suits a particular problem. Validate returns classification and estimation performance metrics for large numbers of tool outputs generated from known-truth simulations. We also provide solutions for aggregating hundreds or thousands of outputs into a single folder on the iPlant data store so that Validate can be run on them.
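As a rough illustration of what classification and estimation metrics mean in a known-truth setting, the sketch below scores one hypothetical tool output against a known set of causal SNPs. It is not Validate's implementation: the function names, the 0.05 threshold, and the in-memory data layout are all assumptions made for the example.

```python
def classification_counts(p_values, causal_snps, threshold=0.05):
    """Count true/false positives and negatives for one tool output.

    p_values:    dict mapping SNP name -> p-value reported by the tool
    causal_snps: set of SNP names that truly have an effect (the known truth)
    threshold:   hypothetical significance cutoff, assumed for this sketch
    """
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for snp, p in p_values.items():
        called = p < threshold          # the tool flags this SNP
        is_causal = snp in causal_snps  # the simulation says it is real
        if called and is_causal:
            counts["tp"] += 1
        elif called:
            counts["fp"] += 1
        elif is_causal:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def sensitivity_specificity(counts):
    """Return (sensitivity, specificity); None where a rate is undefined."""
    pos = counts["tp"] + counts["fn"]
    neg = counts["tn"] + counts["fp"]
    sensitivity = counts["tp"] / pos if pos else None
    specificity = counts["tn"] / neg if neg else None
    return sensitivity, specificity


def mean_abs_error(estimated, true_effects):
    """Mean absolute error of estimated effects over SNPs present in both dicts."""
    shared = [snp for snp in estimated if snp in true_effects]
    if not shared:
        raise ValueError("no SNPs in common between estimates and truth")
    return sum(abs(estimated[s] - true_effects[s]) for s in shared) / len(shared)
```

Counting the four outcomes first and deriving rates afterwards keeps the sketch easy to extend to other classification metrics.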

Quick Start

  • To use Validate, merge all GWAS or QTL tool result files into a single folder in your data store, then select the Validate application in the Discovery Environment.
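If the result files start out scattered across per-run subfolders on a local disk, the merge step can be sketched as below before uploading the folder to the data store. This is only a local illustration under stated assumptions, not the aggregation solution mentioned above: the function name gather_results, the "*.txt" pattern, and the "__" naming scheme are hypothetical.

```python
import shutil
from pathlib import Path


def gather_results(source_root, dest_dir, pattern="*.txt"):
    """Copy every file matching `pattern` under source_root into one flat folder.

    File names are built from the path relative to source_root, joined with
    "__", so run1/out.txt and run2/out.txt do not overwrite each other.
    """
    source_root = Path(source_root).resolve()
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest_dir = dest_dir.resolve()

    copied = []
    # sorted() materializes the file list before any copying starts
    for path in sorted(source_root.rglob(pattern)):
        if not path.is_file() or dest_dir in path.parents:
            continue  # skip directories and files already in the destination
        flat_name = "__".join(path.relative_to(source_root).parts)
        target = dest_dir / flat_name
        shutil.copy2(path, target)
        copied.append(target)
    return copied
```

For example, gather_results("simulations", "all_results") would leave a single all_results folder that can then be uploaded and selected as the input folder.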

Test Data

Test data for this app appears directly in the Discovery Environment in the Data window under Community Data -> iplantcollaborative -> example_data -> Validate -> Validate_Test_Data.

Input File(s)

Use the entire Validate_Test_Data folder from above as the input folder for Validate. Within Community Data -> iplantcollaborative -> example_data -> Validate there are two other files to use as inputs, titled truth.txt and betas.txt. Both contain information about the "known truths" of the simulations used in the analysis.

Parameters Used in App

When the app is run in the Discovery Environment, use the following parameters with the above input file(s) to get the output described in the next section.

  • Use these parameters within the DE app interface under Inputs:
    • Folder containing GWAS/QTL app outputs - SELECT Community Data -> iplantcollaborative -> example_data -> Validate -> Validate_Test_Data 
    • Truth file - SELECT Community Data -> iplantcollaborative -> example_data -> Validate -> truth.txt
    • Effect size file - SELECT Community Data -> iplantcollaborative -> example_data -> Validate -> betas.txt
  • Use these parameters within the DE app interface under Column Names and Additional Options:
    • Column name of variable measuring SNP importance - p_wald
    • Column name of variable containing SNP names - rs
    • Column name of variable containing estimated SNP effect - beta
  • Only the first severity ratio should be used. For best results, set it either to 1 or, preferably, to the number of positive cases divided by the number of negative cases.
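The preferred severity ratio above is simple arithmetic, sketched here for clarity. The sketch assumes that "positive cases" are SNPs with a true effect and "negative cases" are SNPs without one, and that the truth is available as a list of 0/1 labels; both are assumptions for illustration, since the layout of truth.txt is not described here.

```python
def severity_ratio(truth_labels):
    """Return (number of positive cases) / (number of negative cases).

    truth_labels: iterable of truthy/falsy values, one per SNP
                  (assumed layout, chosen for this sketch)
    """
    labels = list(truth_labels)
    positives = sum(1 for label in labels if label)
    negatives = len(labels) - positives
    if negatives == 0:
        raise ValueError("severity ratio is undefined with no negative cases")
    return positives / negatives
```

For instance, 10 positive cases among 1,000 SNPs leave 990 negative cases, which gives a ratio of 10 / 990, or about 0.0101.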

Output File(s)

The output file should be a text file named Results.txt.