Author(s): Dr. Upendra Kumar Devisetty, CyVerse/University of Arizona and Dr. Andrew D. L. Nelson, School of Plant Sciences, University of Arizona

Introduction

Evolinc is a two-part pipeline that identifies long intergenic non-coding RNAs (lincRNAs) from an assembled transcriptome file (the .gtf output from Cufflinks) and then determines the extent to which those lincRNAs are conserved in the genomes and transcriptomes of other species.

...

Info
Note: Evolinc currently identifies only intergenic non-coding RNAs. Identification of other long non-coding RNAs (lncRNAs), including natural antisense, overlapping, and intragenic/intronic transcripts, will be incorporated in a later version. This tutorial covers the first part of the pipeline, which is provided as a distinct Atmosphere image.

Accessing Evolinc

This tutorial takes users through the following steps:

...

Warning

Learn about allocations

Learn about CyVerse's allocation policies here.

Part 1: Connect to an instance of an Atmosphere Image (Virtual Machine)

Step 1. Go to https://atmo.iplantcollaborative.org and log in with your CyVerse credentials.

...

Note: Instances can be configured with different amounts of CPU, memory, and storage depending on user needs. This tutorial can be completed with a small instance size, medium1 (4 CPUs, 8 GB memory, 80 GB root disk).

Part 2: Set up an Evolinc run using the Terminal window

Step 1. Open the Terminal. Enter ssh followed by your username and the instance's IP address to connect to the instance from the terminal.
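A minimal sketch of the connection step, assuming the standard `ssh user@host` form. `ATMO_USER` and `ATMO_IP` are hypothetical variable names, and their defaults are placeholders (192.0.2.10 is a reserved documentation address), so substitute your own username and the IP address Atmosphere lists for your instance. By default the script only prints the command; set `CONNECT=1` to actually connect.

```shell
#!/usr/bin/env bash
# Placeholders (hypothetical): override with your own username and the
# IP address shown for your instance in Atmosphere.
ATMO_USER="${ATMO_USER:-your_username}"
ATMO_IP="${ATMO_IP:-192.0.2.10}"

ssh_target="${ATMO_USER}@${ATMO_IP}"

if [ "${CONNECT:-0}" = "1" ]; then
    # Open an interactive SSH session on the instance.
    ssh "$ssh_target"
else
    # Dry run: show the command that would be executed.
    echo "ssh $ssh_target"
fi
```

For example, `ATMO_USER=<username> ATMO_IP=<instance IP> CONNECT=1 bash connect.sh`, where connect.sh is whatever name the script was saved under.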

...

```bash
$ ./evolinc-part-I.sh -h

Usage : sh evolinc-part-I.sh -c cuffcompare -g genome -r CDS [-b TE_RNA] [-t CAGE_RNA] [-x Known_lincRNA]
	-c </path/to/cuffcompare output file>
	-g </path/to/reference genome file>
	-r </path/to/cDNA reference file>
	-b </path/to/Transposable Elements file>
	-t </path/to/TSS file>
	-x </path/to/Known lincRNA file>
	-h Show this usage information
```

Explanation of the options

-c: Cuffcompare (or Cuffmerge) output file in gtf format (required)
-g: Reference genome file in fasta format (required)
-r: Reference cDNA file in fasta format (required)
-b: Transposable elements file in fasta format (optional)
-t: Transcription start site (TSS) file in gff format (optional)
-x: Known long non-coding RNA file in gff format (optional)
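The required/optional split above can be captured in a small wrapper. This is a hypothetical helper, not part of Evolinc: `run_evolinc_part1` and `DRY_RUN` are illustrative names. It checks that the three required inputs (-c, -g, -r) exist, adds -b, -t and -x only when a path is given, and either prints or runs the resulting command. It assumes evolinc-part-I.sh is in the current directory, as in the help example.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Evolinc): check inputs, then build and
# run an evolinc-part-I.sh command from the current directory.
# Arguments: cuffcompare_gtf genome_fasta cdna_fasta [te_fasta] [tss_gff] [known_gff]
# Pass "" to skip an optional input. Set DRY_RUN=1 to print instead of run.
run_evolinc_part1() {
    local gtf="$1" genome="$2" cdna="$3"
    local te="${4:-}" tss="${5:-}" known="${6:-}"
    local f

    # -c, -g and -r are required: stop before running if any file is missing.
    for f in "$gtf" "$genome" "$cdna"; do
        if [ ! -f "$f" ]; then
            echo "missing required input: ${f:-<none given>}" >&2
            return 1
        fi
    done

    # Optional inputs must exist if they were given.
    for f in "$te" "$tss" "$known"; do
        if [ -n "$f" ] && [ ! -f "$f" ]; then
            echo "optional input not found: $f" >&2
            return 1
        fi
    done

    # Build the argument list; -b, -t and -x are added only when supplied.
    set -- ./evolinc-part-I.sh -c "$gtf" -g "$genome" -r "$cdna"
    if [ -n "$te" ]; then set -- "$@" -b "$te"; fi
    if [ -n "$tss" ]; then set -- "$@" -t "$tss"; fi
    if [ -n "$known" ]; then set -- "$@" -x "$known"; fi

    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

For example, with hypothetical file names, `DRY_RUN=1 run_evolinc_part1 my_data/merged.gtf my_data/genome.fa my_data/cdna.fa "" "" my_data/known_lncRNA.gff` prints the command with -x but without -b or -t. Building the argument list with `set --` keeps paths containing spaces intact when the command is actually run.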

Part 3: Running sample data

The staged example data are in two folders within the "Evolinc" folder: "Evolinc/sample.data.arabi" and "Evolinc/sample.data.brapa". List their contents with the ls command:

...
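Both sample folders can be listed in one pass. This is a small sketch: `list_sample_data` is a hypothetical name, and it assumes it is run from the directory that contains the "Evolinc" folder unless a different base directory is passed as the first argument.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list both staged sample-data folders.
# Optional argument: the directory that contains "Evolinc" (default: .).
list_sample_data() {
    local base="${1:-.}" dir
    for dir in sample.data.arabi sample.data.brapa; do
        echo "== Evolinc/$dir"
        if [ -d "$base/Evolinc/$dir" ]; then
            ls -lh "$base/Evolinc/$dir"
        else
            echo "not found: $base/Evolinc/$dir"
        fi
    done
}

list_sample_data
```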

lincRNA_final_transcripts.fa - Final long intergenic ncRNA (lincRNA) transcripts in fasta format
lincRNA_final_transcripts.bed - Final lincRNA transcripts in bed format
lincRNA_final_transcripts.promoters.fa - Promoter sequences of the final lincRNA transcripts in fasta format
lincRNA_final_transcripts_counts.txt - Number of transcripts remaining after each step of the pipeline
lincRNA_final_transcripts_demographics.txt - Demographics of the final lincRNA transcripts
lincRNA_CAGE_final_transcripts.fa - Final lincRNA transcripts that overlap the TSS data (generated only when a TSS file is supplied with -t)
lincRNA_overlapping_known_final_transcripts.fa - Final lincRNA transcripts that overlap known lincRNAs (generated only when a known lincRNA file is supplied with -x)
lincRNA_final_transcripts_updated.gtf - Cuffcompare output updated with the final lincRNA transcripts
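A quick way to sanity-check a finished run is to count the sequences in the FASTA outputs listed above and print the per-step counts file. This is a sketch: `summarize_evolinc_output` is a hypothetical name, and it assumes the output files sit together in one folder passed as the first argument. It makes no assumption about the format of the counts file and simply prints it as-is.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Evolinc): summarize a part I output folder
# using the output file names listed above. The TSS/CAGE and known-lincRNA
# files are optional, so they are reported as "not present" when absent.
summarize_evolinc_output() {
    local out="${1:-.}" fa n

    for fa in lincRNA_final_transcripts.fa \
              lincRNA_CAGE_final_transcripts.fa \
              lincRNA_overlapping_known_final_transcripts.fa; do
        if [ -f "$out/$fa" ]; then
            # Every FASTA record begins with a '>' header line.
            n=$(grep -c '^>' "$out/$fa" || true)
            echo "$fa: $n sequences"
        else
            echo "$fa: not present"
        fi
    done

    # Per-step transcript counts written by the pipeline, shown as-is.
    if [ -f "$out/lincRNA_final_transcripts_counts.txt" ]; then
        echo "--- lincRNA_final_transcripts_counts.txt"
        cat "$out/lincRNA_final_transcripts_counts.txt"
    fi
}
```

Usage: `summarize_evolinc_output path/to/output_folder`, where the path is a placeholder for wherever your run wrote its results.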

Part 4: Trying out your data

Create a folder within the Evolinc folder, upload your files into that folder, and run the script as above. Either Cuffcompare or Cuffmerge output files are acceptable. The genome fasta file should be the same genome to which you aligned your transcriptomic data. The transposable element data set can come either from your species of interest or from a family of closely related species; for example, a maintained data set of Brassicaceae transposable elements can be used when analyzing A. thaliana lncRNAs. If you have not generated TSS data yourself, publicly available transcription start site data sets may be useful, although they exist for only a limited number of species. If there are multiple public data sets of known lncRNAs for your species that you would like to compare your set against, merge them into a single gff file.
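One way to merge several known-lncRNA gff files is to concatenate them, drop the `#` header and comment lines, and sort the features by sequence name and start coordinate. This is a sketch under stated assumptions, not an Evolinc tool: `merge_gff` is a hypothetical name, the inputs are assumed to be tab-delimited feature lines without an embedded `##FASTA` section, and overlapping entries from different sources are kept as separate lines rather than collapsed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge several gff files of known lncRNAs into one.
# Usage: merge_gff merged.gff input1.gff [input2.gff ...]
merge_gff() {
    local out="$1" tab
    if [ "$#" -lt 2 ]; then
        echo "usage: merge_gff merged.gff input1.gff [input2.gff ...]" >&2
        return 1
    fi
    shift
    tab=$(printf '\t')
    # Drop '#' header/comment lines from every input, then sort by
    # sequence name (column 1) and numerically by start (column 4).
    # LC_ALL=C makes the name ordering byte-wise and reproducible.
    grep -hv '^#' "$@" | LC_ALL=C sort -t "$tab" -k1,1 -k4,4n > "$out"
}
```

For example, with hypothetical names: `mkdir -p Evolinc/my_data` followed by `merge_gff Evolinc/my_data/known_lncRNAs.gff source1.gff source2.gff`, then pass the merged file with -x.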

...

Version history: version 11 saved by kkennedy on Apr 19, 2016; current version saved by Upendra Kumar Devisetty on May 03, 2016.
