CLUSTALW_Workflow_Example
Purpose
- This example workflow demonstrates how to use CLUSTALW to do a multiple sequence alignment from the command line. It also demonstrates how to run the program in non-interactive mode, the first step toward programmatic wrapping.
- The starting point is a set of DNA sequences.
Prerequisites
- Access to a Linux/Unix shell
- This workflow assumes that you have the BioPerl libraries and the CLUSTALW binary executable compiled and installed in your path, either by putting them in /usr/local/bin or by editing your $PATH environment variable.
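A quick way to confirm that the CLUSTALW executable is reachable is to look it up on $PATH. The sketch below uses Python's standard `shutil.which`. The helper name `find_executable` is made up for illustration, and the binary name `clustalw` is an assumption (some installations name it differently, for example `clustalw2`).

```python
import shutil


def find_executable(name="clustalw"):
    """Return the full path of `name` if it is on $PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    if path is None:
        print("clustalw not found; install it or extend $PATH")
    else:
        print("clustalw found at", path)
```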
Note on CLUSTALW
Note that there are many ways to do multiple sequence alignments. This is just one example.
The DNA sequences
- I downloaded coding sequences (CDS) for actin genes from five metazoan species from NCBI.
- A complete CDS starts at the start codon (ATG; Methionine) and ends at the stop codon (TAG, TGA or TAA).
- Codons are three nucleotide units that encode particular amino acids or the stop-translation signal.
- This is a sample CDS for C. elegans, in FASTA format:
>c_elegans
ATGTGTGACGACGAGGTTGCCGCTCTTGTTGTAGACAATGGATCCGGAATGTGCAAGGCCGGATTCGCCG
GAGACGACGCTCCACGCGCCGTGTTCCCATCCATTGTCGGAAGACCACGTCATCAAGGAGTCATGGTCGG
TATGGGACAGAAGGACTCGTACGTCGGAGACGAGGCCCAATCCAAGAGAGGTATCCTTACCCTCAAGTAC
CCAATTGAGCACGGTATCGTCACCAACTGGGATGATATGGAGAAGATCTGGCATCACACCTTCTACAATG
AGCTTCGTGTTGCCCCAGAAGAGCACCCAGTCCTCCTCACTGAAGCCCCACTCAATCCAAAGGCTAACCG
TGAAAAGATGACCCAAATCATGTTCGAGACCTTCAACACCCCAGCCATGTATGTCGCCATCCAAGCTGTC
CTCTCCCTCTACGCTTCCGGACGTACCACCGGAGTCGTCCTCGACTCTGGAGATGGTGTCACCCACACCG
TCCCAATCTACGAAGGATATGCCCTCCCACACGCCATCCTCCGTCTTGACTTGGCTGGACGTGATCTTAC
TGATTACCTCATGAAGATCCTTACCGAGCGTGGTTACTCTTTCACCACCACCGCTGAGCGTGAAATCGTC
CGTGACATCAAGGAGAAGCTCTGCTACGTCGCCCTCGACTTCGAGCAAGAAATGGCCACCGCCGCTTCTT
CCTCTTCCCTCGAGAAGTCCTACGAACTTCCTGACGGACAAGTCATCACCGTCGGAAACGAACGTTTCCG
TTGCCCAGAGGCTATGTTCCAGCCATCCTTCTTGGGTATGGAGTCCGCCGGAATCCACGAGACTTCTTAC
AACTCCATCATGAAGTGCGACATTGATATCCGTAAGGACTTGTACGCCAACACTGTTCTTTCCGGAGGAA
CCACCATGTACCCAGGAATTGCTGATCGTATGCAGAAGGAAATCACCGCTCTTGCCCCATCAACCATGAA
GATCAAGATCATCGCCCCACCAGAGCGCAAGTACTCCGTCTGGATCGGAGGATCTATCCTCGCTTCCCTC
TCCACCTTCCAACAGATGTGGATCTCCAAGCAAGAATACGACGAGTCCGGCCCATCCATCGTTCACCGCA
AGTGCTTCTAA
- View the whole FASTA file
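The definition of a complete CDS given above (starts with ATG, ends with TAG, TGA or TAA, and consists of whole three-nucleotide codons) can be checked mechanically before aligning. This is a minimal sketch, not part of the original workflow: the function names `read_fasta` and `is_complete_cds` and the toy records are hypothetical, and the parser only handles plain FASTA text.

```python
STOP_CODONS = {"TAG", "TGA", "TAA"}


def read_fasta(text):
    """Parse FASTA text into a dict mapping record name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word after '>' as the record name.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def is_complete_cds(seq):
    """True if seq has whole codons, starts with ATG and ends with a stop."""
    return (
        len(seq) % 3 == 0
        and seq.startswith("ATG")
        and seq[-3:] in STOP_CODONS
    )


# Toy input: one complete CDS and one truncated sequence.
toy = ">good\nATGTGTGAC\nTAA\n>truncated\nATGTGTGA\n"
for name, seq in read_fasta(toy).items():
    print(name, is_complete_cds(seq))
```

To check the real input, pass the contents of actin.fa to `read_fasta` instead of the toy string.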
Doing the multiple sequence alignment with CLUSTALW
Menu-driven interface
- CLUSTALW can be run from the command line
- It is a binary executable that uses interactive menus
- A basic multiple sequence alignment starts with loading the file (select option 1, then enter the filename, actin.fa)
$ clustalw
**************************************************************
 ******** CLUSTAL 2.0.9 Multiple Sequence Alignments ********
**************************************************************
     1. Sequence Input From Disc
     2. Multiple Alignments
     3. Profile / Structure Alignments
     4. Phylogenetic trees
     S. Execute a system command
     H. HELP
     X. EXIT (leave program)
Your choice:
- Then open the multiple alignment menu by choosing option 2.
 ****** MULTIPLE ALIGNMENT MENU ******
     1. Do complete multiple alignment now Slow/Accurate
     2. Produce guide tree file only
     3. Do alignment using old guide tree file
     4. Toggle Slow/Fast pairwise alignments = SLOW
     5. Pairwise alignment parameters
     6. Multiple alignment parameters
     7. Reset gaps before alignment? = OFF
     8. Toggle screen display = ON
     9. Output format options
     I. Iteration = NONE
     S. Execute a system command
     H. HELP
     or press [RETURN] to go back to main menu
Your choice:
- Then select option 1, and accept the default output file names (shown in square brackets) when prompted. The alignment will be performed, saved to a file, and printed to the screen.
Enter a name for the CLUSTAL output file [actin.aln]:
Start of Pairwise alignments
Aligning...
Sequences (1:2) Aligned. Score: 85
Sequences (1:3) Aligned. Score: 88
Sequences (1:4) Aligned. Score: 90
Sequences (1:5) Aligned. Score: 89
Sequences (2:3) Aligned. Score: 83
Sequences (2:4) Aligned. Score: 86
Sequences (2:5) Aligned. Score: 85
Sequences (3:4) Aligned. Score: 86
Sequences (3:5) Aligned. Score: 86
Sequences (4:5) Aligned. Score: 94
Enter name for new GUIDE TREE file [actin.dnd]:
Guide tree file created: [actin.dnd]
There are 4 groups
Start of Multiple Alignment
Aligning...
Group 1: Sequences: 2 Score:19741
Group 2: Sequences: 2 Score:20738
Group 3: Sequences: 4 Score:19601
Group 4: Sequences: 5 Score:19209
Alignment Score 74162
Consensus length = 1131
CLUSTAL-Alignment file created [actin.aln]
CLUSTAL 2.0.9 multiple sequence alignment

b_xylophilus    ATGTGTGACGAAGAAGTTGCCGCTCTTGTTGTGGACAATGGCTCCGGTATGTGCAAAGCC
p_magellanicus  ATGTGTGACGACGAGGTAGCAGCTTTAGTAGTAGACAATGGCTCCGGTATGTGCAAGGCC
c_elegans       ATGTGTGACGACGAGGTTGCCGCTCTTGTTGTAGACAATGGATCCGGAATGTGCAAGGCC
c_briggsae      ATGTGTGACGACGAGGTTGCAGCTCTCGTAGTGGACAATGGCTCCGGAATGTGCAAGGCC
c_oncophora     ATGTGTGACGACGAGGTTGCTGCTCTTGTGGTTGACAATGGATCCGGAATGTGCAAAGCC
                *********** ** ** ** *** * ** ** ******** ***** ******** ***

b_xylophilus    GGTTTCGCCGGAGATGATGCCCCACGTGCCGTCTTCCCCTCCATTGTCGGAAGACCCCGT
p_magellanicus  GGGTTCGCCGGAGACGATGCTCCACGCGCTGTGTTCCCCTCCATTGTTGGAAGGCCCCGT
c_elegans       GGATTCGCCGGAGACGACGCTCCACGCGCCGTGTTCCCATCCATTGTCGGAAGACCACGT
c_briggsae      GGATTTGCCGGAGACGATGCTCCACGCGCCGTCTTCCCATCCATCGTTGGACGCCCAAGA
c_oncophora     GGATTTGCCGGAGATGACGCTCCTCGAGCTGTCTTCCCCTCCATCGTCGGCCGACCCCGT
                ** ** ******** ** ** ** ** ** ** ***** ***** ** **  * **  *

b_xylophilus    CATCAAGGTGTCATGGTCGGTATGGGACAGAAGGACTCCTATGTCGGAGACGAGGCCCAG
p_magellanicus  CACCAGGGTGTCATGGTTGGTATGGGTCAGAAAGACAGCTACGTAGGAGATGAAGCTCAG
c_elegans       CATCAAGGAGTCATGGTCGGTATGGGACAGAAGGACTCGTACGTCGGAGACGAGGCCCAA
c_briggsae      CATCAAGGAGTCATGGTCGGTATGGGACAGAAGGACTCGTACGTCGGAGACGAGGCTCAA
c_oncophora     CACCAGGGTGTCATGGTTGGTATGGGACAGAAGGACTCGTACGTAGGAGACGAGGCTCAG
Press [RETURN] to continue or X to stop:
- You are done, the alignment file is named actin.aln
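In the CLUSTAL output, the line of `*` characters under each block marks the columns where every sequence has the same nucleotide. The sketch below applies that rule to the first 20 columns of the alignment shown above. The function name `conservation_line` is hypothetical, and the code only reproduces the `*` marking, not CLUSTALW's alignment or scoring.

```python
def conservation_line(rows):
    """Return '*' for columns where all rows agree, ' ' elsewhere."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("aligned rows must all have the same length")
    marks = []
    for column in zip(*rows):
        # A column counts as conserved only if it holds one residue, not a gap.
        same = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if same else " ")
    return "".join(marks)


block = [
    "ATGTGTGACGAAGAAGTTGC",  # b_xylophilus
    "ATGTGTGACGACGAGGTAGC",  # p_magellanicus
    "ATGTGTGACGACGAGGTTGC",  # c_elegans
    "ATGTGTGACGACGAGGTTGC",  # c_briggsae
    "ATGTGTGACGACGAGGTTGC",  # c_oncophora
]
print(conservation_line(block))
```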
Using CLUSTALW non-interactively
- A menu-driven interface is not useful for pipelines or programmatic access.
- Fortunately, we can run the application by passing the commands via STDIN
- This is accomplished by creating a text file with the sequence of commands in it.
1
actin.fa
2
1
actin.aln
actin.dnd
X
X
X
- Annotated version:
- Select main menu option 1 and give the name of the input file:
1
actin.fa
- Select main menu option 2 (multiple alignments), then option 1 to run the alignment:
2
1
- Provide output file names for the alignment and guide tree files:
actin.aln
actin.dnd
- Exit from the alignment display, the alignment menu, and the main menu:
X
X
X
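When the file names change from run to run, the command file can be generated rather than typed by hand. The sketch below is one way to do that, assuming each menu answer goes on its own line as each prompt reads one line of input. The function name `build_commands` is hypothetical.

```python
from pathlib import Path


def build_commands(infile, alignment_out, tree_out):
    """Return the menu keystrokes, one per line, as a single string."""
    steps = [
        "1",            # main menu: sequence input from disc
        infile,         # name of the FASTA file to load
        "2",            # main menu: multiple alignments
        "1",            # alignment menu: do complete alignment now
        alignment_out,  # name for the CLUSTAL output file
        tree_out,       # name for the new guide tree file
        "X",            # leave the alignment display
        "X",            # leave the alignment menu
        "X",            # leave the main menu (exit program)
    ]
    return "\n".join(steps) + "\n"


commands = build_commands("actin.fa", "actin.aln", "actin.dnd")
Path("clustalw_commands.txt").write_text(commands)
```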
- To run the program non-interactively, save the commands as clustalw_commands.txt, then run CLUSTALW using this incantation:
$ clustalw <clustalw_commands.txt
- Program output will scroll rapidly on screen, and the multiple sequence alignment will be saved in actin.aln
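The same STDIN redirection can be done from a script, which is the step toward programmatic wrapping mentioned in the Purpose section. This is a minimal sketch using Python's standard `subprocess.run`. The helper name `run_with_stdin` is hypothetical, and the demonstration uses a stand-in child process that echoes its input so that it runs without CLUSTALW installed. It also assumes the wrapped program exits with status 0 on success, since `check=True` raises an error otherwise.

```python
import subprocess
import sys


def run_with_stdin(argv, commands):
    """Run argv, send `commands` to its STDIN, and return its STDOUT text."""
    result = subprocess.run(
        argv,
        input=commands,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Stand-in child process so the sketch runs without CLUSTALW installed:
# it just echoes whatever arrives on STDIN.
echo = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
print(run_with_stdin(echo, "1\nactin.fa\n"), end="")

# With CLUSTALW on $PATH, the call would look like this instead:
# log = run_with_stdin(["clustalw"], open("clustalw_commands.txt").read())
```

Capturing STDOUT instead of letting it scroll past makes it possible to inspect the log, for example to read the pairwise scores.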
Other ways to access CLUSTALW non-interactively
- Assembling low-level commands into a file can be tedious
- There are pre-rolled wrappers available, such as BioPerl and BioJava.
- See an example of how to do the above alignments using BioPerl
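CLUSTALW also accepts command-line options, which avoids the menus altogether. The sketch below only builds the argument list. The option names (`-INFILE`, `-ALIGN`, `-OUTFILE`, `-NEWTREE`) are as I recall them from the program's help text, so treat them as assumptions and check them against `clustalw -help` on your installation. The function name `clustalw_argv` is hypothetical.

```python
def clustalw_argv(infile, outfile, treefile, exe="clustalw"):
    """Build an argument list for a one-shot, menu-free alignment run."""
    return [
        exe,
        f"-INFILE={infile}",     # input sequences
        "-ALIGN",                # do a full multiple alignment
        f"-OUTFILE={outfile}",   # alignment file name
        f"-NEWTREE={treefile}",  # guide tree file name
    ]


argv = clustalw_argv("actin.fa", "actin.aln", "actin.dnd")
print(" ".join(argv))
```

The resulting list can be passed to `subprocess.run` directly, with no command file needed.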