Introduction and Overview

...

  • Query lincRNA file: The query lincRNA file can be taken directly from Evolinc-I, or it can be obtained from a published or private resource. Depending on your downstream goals, you may want to screen your lincRNA file against known RNAs using the Rfam batch BLAST tool located here. The Rfam RNAs that typically show up in our datasets are snoRNAs, miRNAs, and occasionally rRNAs. These will skew the apparent conservation of lincRNA loci because, in our experience, they are on average more conserved at the sequence level than most lincRNAs. It may also be useful to merge lincRNA isoforms derived from the same locus into one longer lincRNA, as keeping multiple isoforms per locus can also bias the final results.

  • Folder of files: This folder can be anywhere in your Data Store directory. However, all files must be in the same folder and must be named exactly as they appear in the BLASTing list.

  • Species list: This file is a single-column list of the four-letter species IDs mentioned above. The list should be ordered roughly according to phylogenetic relationships. If the exact relationships are not known, make an educated guess; for most published genomes, relationships are known at least at the genus level.

    Tip

    Make sure the species list does not contain any carriage returns (\r characters, such as those in Windows-style line endings).

    For the example above, the species list would be:

    Atha
    Alyr
    Crub
    Brap
    Aara

  • BLAST e-value: Lastly, an e-value cutoff is required for all pairwise comparisons. We recommend determining this empirically, starting at 1e-20. We have found that relaxing the cutoff substantially (e.g., to 1e-5) does not pull out a large number of false positives, because of the reciprocal BLAST step: weak hits typically do not BLAST back to the same locus and are therefore removed. The only issue that may arise with a relaxed cutoff is that the job may take longer to complete, since more sequences are processed at each step.
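
The reciprocal filtering idea behind the e-value recommendation can be sketched in Python. This is not Evolinc-II's implementation; it is a minimal illustration assuming BLAST hits have already been parsed into simple records (the `Hit` class, `best_hits`, and `reciprocal_filter` are hypothetical names), and assuming each reverse hit is labeled with the ID of the query lincRNA whose locus it lands in. A forward hit survives only if it passes the e-value cutoff and its target locus BLASTs back, as its best hit, to the same query lincRNA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Minimal BLAST hit record (hypothetical; real tabular output has more columns)."""
    query: str
    subject: str
    evalue: float


def best_hits(hits, cutoff):
    """Map each query to the subject of its lowest-e-value hit at or below `cutoff`."""
    best = {}
    for hit in hits:
        if hit.evalue > cutoff:
            continue  # too weak to consider at this cutoff
        current = best.get(hit.query)
        if current is None or hit.evalue < current.evalue:
            best[hit.query] = hit  # on ties, the first hit seen is kept
    return {query: hit.subject for query, hit in best.items()}


def reciprocal_filter(forward, reverse, cutoff=1e-20):
    """Keep forward hits whose target locus BLASTs back to the same query lincRNA.

    Assumption: reverse hits are labeled with the query lincRNA ID whose
    locus they land in, so "same locus" reduces to an ID comparison.
    """
    forward_best = best_hits(forward, cutoff)
    reverse_best = best_hits(reverse, cutoff)
    return {
        query: target
        for query, target in forward_best.items()
        if reverse_best.get(target) == query
    }


if __name__ == "__main__":
    # Hypothetical hits: query lincRNAs vs. Alyr loci, and Alyr loci back again.
    forward = [Hit("lnc1", "Alyr_locA", 1e-30), Hit("lnc2", "Alyr_locB", 1e-8)]
    reverse = [Hit("Alyr_locA", "lnc1", 1e-28), Hit("Alyr_locB", "lnc7", 1e-9)]
    print(reciprocal_filter(forward, reverse, cutoff=1e-20))  # {'lnc1': 'Alyr_locA'}
    print(reciprocal_filter(forward, reverse, cutoff=1e-5))   # {'lnc1': 'Alyr_locA'}
```

With this made-up data, relaxing the cutoff from 1e-20 to 1e-5 admits the weaker `lnc2` hit in the forward direction, but it is still removed because `Alyr_locB` BLASTs back to a different query (`lnc7`). The relaxed cutoff adds work at each step without adding a false positive to the result.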

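The species list requirements (one four-letter ID per line, no carriage returns) can be checked before a job is submitted. The sketch below is a hypothetical helper, not part of Evolinc; it assumes IDs consist of exactly four ASCII letters, as in the example list, and it takes the file content as bytes because Python's text mode would silently translate `\r\n` into `\n` and hide the problem.

```python
import re

# Assumption: a species ID is exactly four ASCII letters (e.g., "Atha").
SPECIES_ID = re.compile(r"[A-Za-z]{4}")


def validate_species_list(raw: bytes) -> list[str]:
    """Return human-readable problems found in a species list (empty list = OK).

    `raw` is the file content as bytes, so carriage returns are not
    normalized away as they would be by Python's text-mode newline handling.
    """
    problems = []
    if b"\r" in raw:
        problems.append("contains carriage returns (\\r); save with Unix line endings")
    lines = raw.decode("utf-8").replace("\r", "").split("\n")
    # A single trailing newline leaves an empty last element; ignore it.
    if lines and lines[-1] == "":
        lines.pop()
    if not lines:
        problems.append("file is empty")
    for number, line in enumerate(lines, start=1):
        # Blank lines, extra columns, or IDs of the wrong length all fail here.
        if not SPECIES_ID.fullmatch(line):
            problems.append(f"line {number}: {line!r} is not a four-letter species ID")
    return problems


if __name__ == "__main__":
    example = b"Atha\nAlyr\nCrub\nBrap\nAara\n"  # the example list above
    print(validate_species_list(example))  # []
    print(validate_species_list(b"Atha\r\nAlyr\r\n"))  # one problem: carriage returns
```

To check a file on disk, pass the bytes from `open("species_list.txt", "rb").read()`; the file name here is only a placeholder.
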
...