D) Rename transcripts

Rename transcripts (app: Linux stream editor)

Description: The names of the coding sequences and peptide sequences within the "best_candidates.eclipsed_orfs_removed" files are long and awkward for downstream processing, but they do contain useful information. Renaming them is not essential, but it shortens the names while keeping the information most useful to the user in each sequence name. This step uses sed, a general Linux stream editor for text editing and file reformatting, to rename the sequences. Documentation: http://pipeline.lbl.gov/code/3rd_party/licenses.win/sed-4.1.5/sed.html.

Log into the Discovery Environment: https://de.iplantcollaborative.org/de/.
Open the Linux stream editor app (Public Applications > General Utilities > Text and Tabular Data > Linux stream editor).
1. Change 'Analysis Name' to Rename_Transcripts, add a 'Description' (optional), and use the default 'output folder'.
Click on the Inputs and Settings tab.
1. Click on the 'Command script' field and enter: s/ .*) /_/ (the space before .* and the space after ) are part of the expression). On each sequence name line, this replaces everything from the first space through the last ")" that is followed by a space with a single underscore.
2. Click on the 'Input File' field. Browse to the folder that holds the coding sequences found in the transcripts (best_candidates_eclipsed_orfs_removed.cds) and their matching peptide sequences (best_candidates_eclipsed_orfs_removed.pep) (Sample data: Community Data > iplant_training > rna-seq_without_genome > D_rename_transcripts).
3. Select the file best_candidates_eclipsed_orfs_removed.cds, then click on OK.
4. Click on the 'Output File' field. Set output file name to 'BA_transcripts_cds.fa'.
Click on "Launch Analysis".
Repeat this analysis with best_candidates_eclipsed_orfs_removed.pep as the input file, changing the 'Analysis Name' to Rename_PEP_Sequences and the output file name to 'BA_transcripts_peptides.fa'. (Naming consistency is helpful because the results from mapping the peptides for annotation will be used to rename the coding sequence files.)
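To preview what the command script s/ .*) /_/ does before launching, the substitution can be sketched in Python. This is only an illustration, not part of the DE app: the function name rename_line and the example header are hypothetical, and the header layout (ID, descriptive fields, a strand marker such as "(+)", then coordinates) is an assumption about what the input files look like. In sed's basic regular expressions an unescaped ")" is a literal character, whereas Python needs it escaped.

```python
import re

# Python equivalent of the sed script  s/ .*) /_/
# sed treats an unescaped ")" as a literal; Python requires "\)".
PATTERN = re.compile(r" .*\) ")


def rename_line(line: str) -> str:
    """Replace only the first match on the line, as sed does without the g flag."""
    return PATTERN.sub("_", line, count=1)


# Hypothetical header, made up for illustration only.
header = (
    ">cds.comp1_c0_seq1|m.1 comp1_c0_seq1|g.1 ORF "
    "type:complete len:101 (+) comp1_c0_seq1:2-304(+)"
)

print(rename_line(header))    # header line is shortened
print(rename_line("ATGGCC"))  # sequence lines contain no match and are unchanged
```

Because .* is greedy, the match runs from the first space to the last ")" that is followed by a space, so only the leading ID and the trailing field survive, joined by an underscore. Lines without a space followed later by ") ", such as sequence lines, pass through untouched.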
Click on 'Analyses' from the DE workspace and monitor the 'Status' of the analysis (e.g., Idle, Submitted, Pending, Running, Completed, Failed).
1. Once launched, an analysis will continue whether the user remains logged in or not.
2. Email notifications report the progress of the analysis; they can be switched off under 'Preferences'.
3. If the analysis fails or does not proceed in the anticipated timeline, check these tips for troubleshooting. (Using the sample data, the analysis should be complete in less than 5 minutes.)
4. To re-run an analysis, click the analysis "App" in the 'Analyses' window.
Access analysis results in one of two ways:
1. In the 'Analyses' window click on the analysis "Name" to open the output folder.
2. In the 'Data' window, click on user name, then navigate to the folder that holds the output of the analysis. (Find the output for the sample at Community Data > iplant_training > rna-seq_without_genome > D_rename_transcripts > output_from_sample_data.)
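After opening an output file, one quick sanity check is to confirm that no sequence name still contains a space, which would suggest the substitution did not match that line. The sketch below assumes the file contents have been read into a string; the helper names fasta_headers and headers_with_spaces and the sample text are hypothetical, not part of the DE.

```python
from typing import List


def fasta_headers(text: str) -> List[str]:
    """Return the FASTA header lines of `text`, without the leading '>'."""
    return [line[1:].rstrip() for line in text.splitlines() if line.startswith(">")]


def headers_with_spaces(text: str) -> List[str]:
    """Return headers that still contain a space, i.e. were probably not renamed."""
    return [h for h in fasta_headers(text) if " " in h]


# Hypothetical file contents: the first record is renamed, the second is not.
sample = """>cds.comp1_c0_seq1|m.1_comp1_c0_seq1:2-304(+)
ATGGCC
>cds.comp2_c0_seq1|m.2 comp2_c0_seq1|g.2 ORF type:complete len:88 (-) comp2_c0_seq1:10-273(-)
ATGTTT
"""

print(len(fasta_headers(sample)), "sequences")
print(headers_with_spaces(sample))
```

An empty list from headers_with_spaces would indicate that every header was shortened.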
The output file 'BA_transcripts_peptides.fa' will be used to annotate the transcript protein sequences in Sections F through I. The output file 'BA_transcripts_cds.fa' will be brought into the workflow again in Section J (Annotate transcripts), when the annotations are mapped back from the peptide sequences to the transcriptome nucleotide sequences. Then, in Section K (Map RNA-Seq reads to transcripts), the differentially expressed reads will be mapped to the transcriptome, which at that point will consist of annotated transcripts and transcripts for which annotations could not be determined.