GACT

GACT: Genome Assembly Challenge Team

Objectives

The iPlant Collaborative is keenly interested in supporting draft-level genome assembly through a combination of our resources and some of the machines available on the TeraGrid. Members of this group already run assemblies on a range of architectures, using a variety of algorithms and approaches, to assemble the genomes of a veritable garden of species. The group will discuss your current and future sequencing projects and will work to identify:

  • Best practices and strategies for general sequencing bioinformatics
  • Best practices and strategies for genome assembly
  • An application stack that will be broadly useful on the various TeraGrid systems
  • Updates to TACC's operations or system configurations that would facilitate genome assembly tasks
  • The optimal systems on which to support genome assembly
  • Potential ways to present basic assembly in the iPlant Discovery Environment

Plans and Progress

Deliverable

Timeframe

Abyss PE assembler

  • Installed on Ranger
  • Foundational API
  • Integrated into DE (dev site)
  • Available in DE (live site)
  • Documentation

progress:

  • done
  • done
  • done (8/3/11)
  • done
  • done (9/2011)

FastQC tool for pre-filtering reads

  • Installed on Ranger
  • Foundational API
  • Integrated into DE (live site)
  • Documentation

progress:

  • done
  • done
  • done
  • done

Tools to support multiple, mixed size libraries for input into Abyss

  • Combine all files into tarball (DE dev site)
  • Interleave paired-end files into single file (DE dev site)
  • Available on DE live site
  • Documentation

progress:

  • done (8/3/11)
  • done (8/3/11)
  • done
  • done
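
The interleaving step listed above can be illustrated with a minimal sketch. This is not the tool deployed in the DE; it assumes unwrapped four-line FASTQ records and that both files list mates in the same order. The function names `read_fastq` and `interleave` are hypothetical.

```python
from itertools import islice


def read_fastq(handle):
    """Yield FASTQ records as lists of four lines (assumes unwrapped 4-line records)."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def interleave(forward, reverse, out):
    """Write mate 1 then mate 2 for each pair; return the number of pairs written."""
    pairs = 0
    fwd_records = read_fastq(forward)
    rev_records = read_fastq(reverse)
    for fwd in fwd_records:
        rev = next(rev_records, None)
        if rev is None:
            raise ValueError("reverse file has fewer records than forward file")
        out.write("\n".join(fwd) + "\n")
        out.write("\n".join(rev) + "\n")
        pairs += 1
    # Any leftover reverse record means the two files are out of sync.
    if next(rev_records, None) is not None:
        raise ValueError("reverse file has more records than forward file")
    return pairs
```

The function takes open file handles, so it could be called with three `open(...)` handles for the forward, reverse, and output files. It does not check that read names match between mates.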

Stand-alone k-mer analysis: Tallymer or custom script

Fall 2011 (?)
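
For the "custom script" option, a minimal k-mer counting sketch is shown here. It is an illustration only, not Tallymer's algorithm and not a planned implementation: it holds all counts in memory, does not merge reverse-complement k-mers, and the function names are hypothetical.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every k-mer of length k across an iterable of sequence strings."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:  # skip k-mers containing ambiguous bases
                counts[kmer] += 1
    return counts


def abundance_histogram(counts):
    """Map each occurrence count to the number of distinct k-mers seen that many times."""
    return Counter(counts.values())
```

The abundance histogram is the usual starting point for judging coverage and error content in a read set. For real read volumes, a dedicated counter would be needed instead of an in-memory dictionary.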

Basic statistics analysis on FASTA files

  • seqstats2.pl (FASTA counts and length distributions table)
    • Integrated into DE (dev site)
    • Available on DE live site
    • Documentation
  • Graphical output (using R package?)

progress:

  • done (7/29/11)
  • done (7/29/11)
  • done
  • done
  • Fall 2011 (?)
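
The kind of summary described for seqstats2.pl (sequence counts and a length distribution) can be sketched as follows. This is not the seqstats2.pl code and makes no claim about its output format; the function names and the default bin size of 100 are assumptions chosen for illustration.

```python
from collections import Counter


def read_fasta_lengths(handle):
    """Yield (sequence_id, length) for each record in a FASTA file handle."""
    seq_id, length = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, length
            parts = line[1:].split()
            seq_id, length = (parts[0] if parts else ""), 0
        elif seq_id is not None:
            length += len(line)
    if seq_id is not None:
        yield seq_id, length


def length_table(lengths, bin_size=100):
    """Return (sequence count, total bases, Counter of length bins keyed by bin start)."""
    lengths = list(lengths)
    bins = Counter((n // bin_size) * bin_size for n in lengths)
    return len(lengths), sum(lengths), bins
```

A caller would pass `(n for _, n in read_fasta_lengths(handle))` to `length_table` and print the bins in sorted order to get a simple distribution table.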

External advisers/users/testers
Candidates include

  • Members of Ware lab
  • Stacy Smith Lab (U. Nebraska)

Ongoing

Test RNA-seq transcriptome read data

  • Solanum peruvianum raw reads and test assembly
    • User agreement and permission letter (Zach Lippman)
  • Solanum lycopersicum raw reads and test assembly
    • User agreement and permission letter (Zach Lippman)
  • Additional data sets
    • model and non-model transcriptomes
    • different sequencing technologies

progress:

  • done (7/28/11)
  •  
  • done (7/28/11)
  •  
  • Fall 2011

Test Genomic data (model/non-model, sequence technologies)

SRA? urgent

Gene Space Discovery & Annotation

  • NCBI BLASTX SGE pipeline & parser
    • Top hit option on DE dev site
      • Available on DE live site
    • All hits option
      • Developed
      • Integrated in DE
  • Reference Blast Databases
    • NCBI RefSeq v47 Plants/plastids
    • Additional (e.g. RefSeq plastid/mitochondria)
  • Homology Analysis Package
    • Per contig report of top hit (e.g. coverage data)
    • Summary quality output (coverage, query-hit length ratios, etc)
    • Diversity (Proportion of complete reference genome found)
    • Script generated README
    • Integrated into DE dev site
    • Available on DE live site
    • Documentation
  • Support for Functional Annotation
    • InterPro/GO on reference genomes: Rice, Arabidopsis,
    • Context with other projects

progress:

  •  
  • done (7/28/11)
  • done
  • done (7/28/11)
  • done (7/28/11)
  • ?
  •  
  • done (7/28/11)
  • Aug '11?
  •  
  • done (7/28/11)
  • done (7/28/11)
  • done (7/28/11)
  • done (8/3/11)
  • done
  • done
  • done
  •  
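
The "top hit" option of the parser can be illustrated with a short sketch. It assumes BLAST's standard 12-column tab-delimited output (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); the page does not state which output format the actual pipeline parses, and the function name `top_hits` is hypothetical.

```python
import csv


def top_hits(handle):
    """Return {query_id: row}, keeping the highest-bit-score row for each query.

    Assumes the standard 12-column tabular layout, with the bit score in
    the last of those columns. Comment lines and short rows are skipped.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#") or len(row) < 12:
            continue
        query, bitscore = row[0], float(row[11])
        # On ties the first row seen is kept.
        if query not in best or bitscore > float(best[query][11]):
            best[query] = row
    return best
```

An "all hits" variant would collect a list of rows per query instead of a single row. Coverage and query-hit length ratios need sequence lengths that are not in the 12 standard columns, so they are left out here.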

Additional Assembly tools

  • Trinity
    • Built on Ranger & Lonestar
    • Jobs API
  • SOAPdenovo?
  • Velvet?
  • ALLPATHS
    • Built and running on Ranger
  • Euler?

progress:

  •  
  • Done (7/28/11)
  • Aug '11
  • Fall 2011?
  •  
  •  
  • Done Oct '11

Read filters quality control

  • decGPU
    • In-Progress
    • DE integration
  • Quake

progress:

  •  
  • ongoing
  • Oct. 2011
  • Fall 2011?

Annotation Planning

Resources

Meeting Agendas and Notes

Potential Outside Advisers

Outstanding Action Items

  • None