Transposable element annotation on JeStream
Tool prerequisites:
scripts and intermediate files used to annotate TEs in Jiao et al. 2016
LTR Retrotransposons
Scripts in ltr/
Software needed:
ncbi blast+
genometools (download); pass 64bit=yes with-hmmer=yes threads=yes to both make and make install so that the ltrdigest HMM searches run in parallel. I also had to pass cairo=no because I didn't have the right cairo libraries and it wouldn't compile otherwise.
silix (download); compile with --enable-mpi and --enable-verbose
hmmer (genometools will download and compile hmmer2 if you run make with-hmmer=yes)
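The build options listed above can be kept in one place so that make and make install receive the same flags. This is a minimal sketch, not the author's build script: the source directory argument is hypothetical, the use of ./configure for silix is an assumption, and cairo=no is only needed when the cairo development libraries are missing. As written it only prints the commands; nothing is built unless build_genometools is called.

```shell
#!/bin/sh
# make options quoted from the notes above
GT_FLAGS="64bit=yes with-hmmer=yes threads=yes cairo=no"
# silix options quoted from the notes above
SILIX_FLAGS="--enable-mpi --enable-verbose"

# Build and install genometools from an unpacked source tree.
# $1 is a hypothetical path such as ./genometools
build_genometools() {
    ( cd "$1" && make $GT_FLAGS && make $GT_FLAGS install )
}

# Dry run: print what would be executed.
echo "make $GT_FLAGS"
echo "make $GT_FLAGS install"
# Assumption: silix is configured with a standard ./configure script.
echo "./configure $SILIX_FLAGS"
```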
Files needed (can be downloaded with get_tRNA_hmm_dbs.sh in the ltr directory):
download HMMs of TE protein-coding domains from GyDB into the directory gydb_hmms; these are used to identify the protein-coding domains of TE models. One HMM is named ty1/copia, and ltrdigest uses that name as a filename, so the forward slash has to be removed:
sed -i "s#ty1/copia#ty1-copia#g" gydb_hmms/GyDB_collection/profiles/AP_ty1copia.hmm
download tRNAs of all eukaryotes
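To confirm that the sed fix above took effect, one can list any profile whose NAME line still contains a slash. This is a small sketch that assumes the profiles are HMMER text files with a NAME field at the start of a line; the function name is made up for illustration.

```shell
#!/bin/sh
# Print every NAME line that still contains a forward slash,
# prefixed by the file it came from. Prints nothing when all names are safe.
hmm_names_with_slash() {
    # $1: directory holding the *.hmm profiles
    grep -H '^NAME.*/' "$1"/*.hmm 2>/dev/null
}
```

For example, hmm_names_with_slash gydb_hmms/GyDB_collection/profiles should print nothing once the ty1/copia name has been rewritten.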
SINEs
Scripts in sine/
Software needed:
SINE-Finder, download (This is a supplemental file at The Plant Cell; need to make executable, and rename to sine_finder.py)
- I could not get SINE-Finder to work on reverse-strand sequences, so I report SINEs only on the forward strand here and will pick up copies on the reverse strand with RepeatMasker.
LINEs
Scripts in line/
Software needed:
- MGEScanNonLTR; I use the version generated for Galaxy (here).
TIR including MITEs
Scripts in tir/
Software needed:
mTEA, genometools (see above, already installed for ltr annotation)
- mTEA needs fasta36 (specifically ggsearch36), bioperl, blast, muscle, and the supplied blogo directories, all of which must be reachable through PERL5LIB and PATH
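The PERL5LIB and PATH setup for mTEA might look like the following. This is only a sketch: MTEA_HOME and the location of the blogo directory inside it are hypothetical and need to be adjusted to the actual install.

```shell
#!/bin/sh
# Hypothetical install location; adjust to wherever mTEA was unpacked.
MTEA_HOME=${MTEA_HOME:-$HOME/src/mTEA}
# Assumed location of the supplied blogo directory inside the mTEA tree.
BLOGO_DIR=$MTEA_HOME/blogo

# Prepend blogo to PERL5LIB and PATH, keeping any existing entries.
PERL5LIB=$BLOGO_DIR${PERL5LIB:+:$PERL5LIB}
PATH=$BLOGO_DIR:$PATH
export PERL5LIB PATH
```

Sourcing these lines from a shell startup file keeps the settings available in later sessions.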
Helitrons
Scripts in helitron/
Software needed:
Finding Homologous Fragments from Degraded TEs
Software needed:
- RepeatMasker, with prerequisites here
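Before starting the steps below, it can help to check which of the tools listed above are actually on PATH. The sketch below does that; the executable names are assumptions about a typical install and may differ on your system.

```shell
#!/bin/sh
# Print the name of each listed command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Executable names are assumptions about a typical install.
missing_tools blastn gt silix ggsearch36 muscle RepeatMasker
```

An empty result means every listed executable was found.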
Step 1: Git clone the repo
$ git clone https://github.com/mcstitzer/maize_v4_TE_annotation.git
$ cd maize_v4_TE_annotation/
$ ls
helitron line ltr README.md sine tir
Step 2: To predict structural LTRs
2.1 predict LTRs:
Download the tRNA and GyDB HMMs with get_tRNA_hmm_dbs.sh; both are needed for ltrdigest.
$ cd ltr
$ sh get_tRNA_hmm_dbs.sh
This will download the tRNA database for all eukaryotes.
run_ltrharvest.sh runs ltrharvest and ltrdigest on the genome. LTR TEs are nested, however, so the copies found in this pass need to be removed and the search rerun. This is done in mask_subtract.
$ sh run_ltrharvest.sh
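For orientation, a basic genometools LTR search usually has the shape below. This is a sketch of the general technique, not the contents of run_ltrharvest.sh: the genome filename and output names are placeholders, and ltrharvest is left on its default parameters. The function only prints the commands, so nothing is run.

```shell
#!/bin/sh
# Print the genometools commands for a basic LTR candidate search.
# $1: genome FASTA (placeholder name); the index name is derived from it.
ltr_commands() {
    genome=$1
    index=${genome%.*}
    # Build the enhanced suffix array that ltrharvest reads.
    echo "gt suffixerator -db $genome -indexname $index -tis -suf -lcp -des -ssp -sds -dna"
    # Predict candidate LTR retrotransposons and write them as GFF3.
    echo "gt ltrharvest -index $index -gff3 $index.ltrharvest.gff3"
    # Sort the GFF3 by position before passing it to ltrdigest.
    echo "gt gff3 -sort $index.ltrharvest.gff3 > $index.ltrharvest.sorted.gff3"
}

ltr_commands genome.fa
```

ltrdigest would then take the sorted GFF3 together with the tRNA and HMM files downloaded earlier. Its exact invocation depends on the genometools version, so it is left out here.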