F) Map transcripts

Map transcripts (app: Blat (with options))

Description: Use Blat (with options) to map the translated transcripts in the renamed peptide sequence file from Section D (Rename transcripts) against the RefSeq protein database in FASTA format. (Blat (with options) does not require a pre-built index and lets you set many alignment options.) As described in Section E (Split RefSeq file), the RefSeq FASTA file was split into 3 smaller files to reduce the memory Blat uses and to let the mapping finish in a couple of hours. Documentation: http://www.kentinformatics.com/products.html.
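For readers who want to reproduce this step with a standalone blat binary, the sketch below builds one command line per RefSeq chunk, mirroring the file names used in this section. It assumes that the DE checkboxes 'protein alignment', 'output has no header', 'fine mapping', and 'extend through N' correspond to the command-line blat flags -prot, -noHead, -fine, and -extendThroughN. That mapping is an assumption, not something the app documents here. The script only prints the commands; it does not run blat.

```python
import shlex

# ASSUMPTION: DE "Options" checkboxes map to these standalone blat flags:
# protein alignment -> -prot, output has no header -> -noHead,
# fine mapping -> -fine, extend through N -> -extendThroughN
BLAT_FLAGS = ("-prot", "-noHead", "-fine", "-extendThroughN")

QUERY = "BA_transcripts_peptides.fa"
REFERENCES = ["RefseqProtein.0", "RefseqProtein.1", "RefseqProtein.2"]


def blat_commands(query, references, flags=BLAT_FLAGS):
    """Build one blat command per reference chunk (database, query, output order)."""
    commands = []
    for ref in references:
        # Reuse the chunk suffix (.0, .1, .2) so output names match the section
        suffix = ref.rsplit(".", 1)[-1]
        out = f"BA_trnsPep_v_refseq{suffix}.psl"
        commands.append(["blat", *flags, ref, query, out])
    return commands


if __name__ == "__main__":
    for cmd in blat_commands(QUERY, REFERENCES):
        print(shlex.join(cmd))
```

Keeping each chunk as a separate job mirrors the reason for the split in Section E: each blat process only loads one third of the reference into memory.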

Log into the Discovery Environment: https://de.iplantcollaborative.org/de/.
Open the Blat (with options) app (Public Applications > NGS > Aligners > Blat (with options)).
1. Change 'Analysis Name' to Map_Transcripts_0, add a 'Description' (optional), and use the default 'output folder'.
Click on the Input sequences files tab.
1. Click on the 'reference' field. Browse to the folder that holds the reference sequence (Sample data: Community Data > iplant_training > rna-seq_without_genome > F_map_transcripts > RefseqProtein.0). Select the file, then click on OK.
2. Click on the 'query' field. Browse to the folder that holds the renamed .pep file from Section D (Rename transcripts) (Sample data: Community Data > iplant_training > rna-seq_without_genome > F_map_transcripts > BA_transcripts_peptides.fa). Select the file, then click on OK.
Click on the Output file tab.
1. Change the output file name to 'BA_trnsPep_v_refseq0.psl'.
Click on the Options tab.
1. Select 'protein alignment', 'output has no header', 'fine mapping', and 'extend through N'.
Click on "Launch Analysis".
Repeat this analysis with any remaining reference sequence files that were generated in Section E (Split RefSeq file) (Sample data: RefseqProtein.1, RefseqProtein.2).
1. Change 'Analysis Name' accordingly (e.g., Map_Transcripts_1, Map_Transcripts_2).
2. Change the output file name to match the inputs (e.g., BA_trnsPep_v_refseq1.psl, BA_trnsPep_v_refseq2.psl).
Click on 'Analyses' from the DE workspace and monitor the 'Status' of the analysis (e.g., Idle, Submitted, Pending, Running, Completed, Failed).
1. Once launched, an analysis will continue whether the user remains logged in or not.
2. Email notifications report the analysis progress; they can be switched off under 'Preferences'.
3. If the analysis fails or does not finish in the expected time, check these troubleshooting tips. (With the sample data, the analysis should complete in about 2 hours.)
4. To re-run an analysis, click the analysis "App" in the 'Analyses' window.
Access analysis results in one of two ways:
1. In the 'Analyses' window click on the analysis "Name" to open the output folder.
2. In the 'Data' window, click on user name, then navigate to the folder that holds the output of the analysis. (Find the output for the sample at Community Data > iplant_training > rna-seq_without_genome > F_map_transcripts > output_from_sample_data.)
Blat (BLAST-Like Alignment Tool) runs faster than Blastp for this step.
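Because each run produces a header-less PSL file, the three outputs can be read as one tab-separated stream. The sketch below assumes the standard 21-column PSL layout (matches in column 1, qName in column 10, qSize in column 11, tName in column 14) and keeps, for each transcript, the RefSeq hit with the most matching residues. That "best hit" rule is a simplification chosen for illustration, not the filtering method used in this tutorial.

```python
import csv
from pathlib import Path

# 0-based column positions in the standard 21-column PSL format
MATCHES, Q_NAME, Q_SIZE, T_NAME = 0, 9, 10, 13


def best_hits(psl_lines):
    """Return {query: (target, matches, fraction_of_query_matched)}.

    Keeps the hit with the highest 'matches' count per query (a simplifying
    assumption; other scoring rules are possible).
    """
    best = {}
    for fields in csv.reader(psl_lines, delimiter="\t"):
        if len(fields) < 21:
            continue  # skip blank or malformed lines
        query, target = fields[Q_NAME], fields[T_NAME]
        matches, q_size = int(fields[MATCHES]), int(fields[Q_SIZE])
        if query not in best or matches > best[query][1]:
            best[query] = (target, matches, matches / max(q_size, 1))
    return best


def read_psl_files(paths):
    """Yield lines from several header-less PSL files as one stream."""
    for path in paths:
        with open(path) as handle:
            yield from handle


if __name__ == "__main__":
    paths = [f"BA_trnsPep_v_refseq{i}.psl" for i in range(3)]
    existing = [p for p in paths if Path(p).exists()]
    hits = best_hits(read_psl_files(existing))
    print(f"{len(hits)} transcripts have at least one RefSeq hit")
```

Merging before selecting matters here: a transcript's best match may sit in any of the three RefSeq chunks, so choosing the best hit within each file separately could keep up to three competing hits per transcript.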