MAKER Genome Annotation and gene editing using Apollo

Rationale and background:

MAKER-P is a flexible and scalable genome annotation pipeline that automates the many steps necessary for the detection of protein-coding genes (Campbell et al. 2013). MAKER identifies repeats, aligns ESTs and proteins to a genome, produces ab initio gene predictions, and automatically synthesizes these data into gene annotations with evidence-based quality indices. MAKER was developed by the Yandell Lab and is described in several publications (Cantarel et al. 2008; Holt & Yandell 2011). Additional background is available in the MAKER Tutorial at GMOD, which is highly recommended reading.

Apollo is the first instantaneous, collaborative genomic annotation editor available on the web. With Web Apollo, researchers can use any common browser (for example, Chrome or Firefox) to jointly analyze and precisely describe the features of a genome in real time, whether they are in the same room or working from opposite sides of the world. The task of manual curation is spread among many hands and eyes, enabling virtual research networks of researchers linked by a common interest in a particular organism or population.

This tutorial takes users through the following steps:

  1. Running MAKER on the Jetstream cloud
  2. Running downstream quality control tools on the predicted genes
  3. Running the Apollo gene editing tool to produce highly curated gene annotations

Considerations

Sounds great, what do I need to get started?

  1. XSEDE account
  2. A startup XSEDE allocation (you can request one later on)
  3. Your data (or you can run example data)

What kind of data do I need?

  1. Mandatory requirements
    1. Genome assembly (fasta file)
    2. Organism type
      1. Eukaryotic (default, set as: organism_type=eukaryotic)
      2. Prokaryotic (set as: organism_type=prokaryotic)
  2. Additional data that can be used to improve the annotation (Highly recommended)
    1. RNA evidence (at least one of the following is needed)
      1. Assembled mRNA-seq transcriptome (fasta file)
      2. Expressed sequence tags (ESTs) data (fasta file)
      3. Aligned EST or transcriptome GFF3 from your organism
      4. Aligned EST or transcriptome GFF3 from a closely related organism
    2. Protein evidence
      1. Protein sequence file in fasta format (e.g. from multiple organisms)
      2. Protein GFF (aligned protein homology evidence from an external GFF3 file)

  3. For this particular tutorial we will use maize-specific test data.
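The inputs listed above are passed to MAKER through its control file, maker_opts.ctl (generated with maker -CTL). The following is a minimal sketch of filling in those options from the shell. The helper set_ctl_option and the file names genome.fasta, est.fasta and protein.fasta are hypothetical, and the control file here is a four-line stand-in written to a temporary directory so the example runs without MAKER installed.

```shell
# Hypothetical helper: set KEY=VALUE in a MAKER control file,
# keeping the trailing "#comment" that MAKER writes on each line.
set_ctl_option() {
    ctl_file="$1"; ctl_key="$2"; ctl_value="$3"
    # Replace everything between "key=" and the start of the comment.
    sed -i.bak -E "s|^(${ctl_key}=)[^#]*|\1${ctl_value} |" "$ctl_file"
}

# Stand-in control file (in a real run, use the one created by `maker -CTL`).
workdir="$(mktemp -d)"
ctl="$workdir/maker_opts.ctl"
printf '%s\n' \
    'genome= #genome sequence (fasta file)' \
    'organism_type=eukaryotic #eukaryotic or prokaryotic' \
    'est= #assembled transcripts or ESTs (fasta file)' \
    'protein= #protein sequences (fasta file)' > "$ctl"

# Assumed input file names; substitute your own paths.
set_ctl_option "$ctl" genome genome.fasta
set_ctl_option "$ctl" est est.fasta
set_ctl_option "$ctl" protein protein.fasta

# Show the resulting settings.
grep -E '^(genome|organism_type|est|protein)=' "$ctl"
```

Editing the control file by hand in a text editor works just as well; a helper like this is mainly useful when the same settings are applied to several runs.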

What kind of resources will I need for my project?

  1. Enough storage space on the MAKER-P Jetstream instance for both input and output files
    1. Creating and mounting an external volume on the running MAKER-P instance is recommended
  2. Enough AUs to run your computation
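Before starting a run, it can help to confirm that the volume actually has room for the input and output files. The sketch below is one way to check this from the terminal. The helper has_free_space and the 1 GiB threshold are illustrative assumptions, not requirements stated by MAKER; pick a threshold that matches your own data.

```shell
# Hypothetical helper: succeed if DIR has at least NEEDED_KB kilobytes free.
has_free_space() {
    check_dir="$1"; needed_kb="$2"
    # `df -Pk` prints POSIX-format output in 1024-byte blocks;
    # column 4 of the second line is the available space.
    avail_kb="$(df -Pk "$check_dir" | awk 'NR==2 {print $4}')"
    [ "$avail_kb" -ge "$needed_kb" ]
}

# Example threshold: 1 GiB (arbitrary; adjust for your genome and evidence files).
required_kb=$((1024 * 1024))

# "." is used so the example runs anywhere; on the instance you would
# point this at the mounted volume, for example /vol_b.
if has_free_space . "$required_kb"; then
    echo "enough free space"
else
    echo "not enough free space"
fi
```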

Part 1: Connect to an instance of a MAKER Jetstream image (virtual machine)

Step 1. Go to https://use.jetstream-cloud.org/application and log in with your XSEDE credentials.

...

We will use iCommands, a client from iRODS, to transfer evidence data from the CyVerse Data Commons repository. iCommands is a collection of commands for Linux and Mac OS operating systems that are used in the iRODS system to interact with the CyVerse Data Store. iCommands can be used to transfer large amounts of data from the CyVerse Data Store to the running Jetstream instance. A complete list of iCommands and their usage is here.

The first time you use iCommands, you must initiate the connection to iRODS.

 

  1. In a terminal window, enter iinit to initialize iCommands and your Data Store connection. For example, here's what you would do if your iRODS user name is cyverse-user:

    Code Block
    kap12@js-156-187:/vol_b/run_data$ iinit
    One or more fields in your iRODS environment file (irods_environment.json) are
    missing; please enter them.
    Enter the host name (DNS) of the server to connect to: data.cyverse.org
    Enter the port number: 1247
    Enter your irods user name: cyverse-user
    Enter your irods zone: iplant
    Those values will be added to your environment file (for use by
    other i-commands) if the login succeeds.
    
    Enter your current iRODS password:
    kap12@js-156-187:/vol_b/run_data$

     

     

  2. Once iinit has finished, type ils to check that iCommands is working. You should see your home directory at /iplant/home/your_user_name.

  3. Download the evidence set required for annotation:

    Code Block
    $ iget -PVr /iplant/home/shared/commons_repo/curated/MaizeCode_annotation_evidence_data_2017 .
    $ mv MaizeCode_annotation_evidence_data_2017/* .
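The values entered at the iinit prompts in step 1 are stored in irods_environment.json. As a sketch, the same file can be written ahead of time so that iinit only asks for the password. The helper write_irods_env is hypothetical, and the example writes to a temporary directory rather than the real location (normally ~/.irods) so that it does not overwrite an existing configuration.

```shell
# Hypothetical helper: write an iRODS environment file for the CyVerse
# Data Store into ENV_DIR for the given user name.
write_irods_env() {
    env_dir="$1"; irods_user="$2"
    mkdir -p "$env_dir"
    cat > "$env_dir/irods_environment.json" <<EOF
{
    "irods_host": "data.cyverse.org",
    "irods_port": 1247,
    "irods_user_name": "$irods_user",
    "irods_zone_name": "iplant"
}
EOF
}

# Demo: write to a temporary directory; for real use the target would be ~/.irods.
irods_demo="$(mktemp -d)"
write_irods_env "$irods_demo" cyverse-user
cat "$irods_demo/irods_environment.json"
```

The host, port, user name and zone are the ones shown in the iinit session above; replace cyverse-user with your own CyVerse user name.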

 

Part 3: Set up a MAKER run using the Terminal window

 

Step 1. Get oriented. You will find your test data within your mounted volume "/vol_b/run_data". List its contents with the ls command:

...

Code Block
test.all.gff- 'MAKER generated annotation file'
test.all.maker.augustus_masked.proteins.fasta
test.all.maker.augustus_masked.transcripts.fasta
test.all.maker.non_overlapping_ab_initio.proteins.fasta
test.all.maker.non_overlapping_ab_initio.transcripts.fasta
test.all.maker.proteins.fasta- 'MAKER generated proteins file'
test.all.maker.transcripts.fasta- 'MAKER generated transcripts file'
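A quick sanity check on these output files is to count the features in the GFF3 file and the records in the FASTA files. The following is a minimal sketch; the helpers count_fasta_records and count_gff_features are hypothetical, and the example runs on tiny stand-in files created in a temporary directory rather than on real MAKER output.

```shell
# Count FASTA records (lines starting with ">").
count_fasta_records() {
    grep -c '^>' "$1"
}

# Count GFF3 features of a given type (column 3), e.g. gene or mRNA.
count_gff_features() {
    awk -F '\t' -v type="$2" '$3 == type {n++} END {print n + 0}' "$1"
}

# Stand-in files that mimic the names in the listing above.
qc_demo="$(mktemp -d)"
printf '##gff-version 3\n' > "$qc_demo/test.all.gff"
printf 'chr1\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1\n' >> "$qc_demo/test.all.gff"
printf 'chr1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=g1-mRNA-1;Parent=g1\n' >> "$qc_demo/test.all.gff"
printf 'chr1\tmaker\tgene\t2000\t2900\t.\t-\t.\tID=g2\n' >> "$qc_demo/test.all.gff"
printf '>g1-mRNA-1\nMKT\n>g2-mRNA-1\nMAA\n>g2-mRNA-2\nMAV\n' > "$qc_demo/test.all.maker.proteins.fasta"

echo "genes:    $(count_gff_features "$qc_demo/test.all.gff" gene)"
echo "proteins: $(count_fasta_records "$qc_demo/test.all.maker.proteins.fasta")"
```

On a real run, the number of mRNA features in test.all.gff would be expected to match the number of records in test.all.maker.transcripts.fasta.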

Part 4: Quality control of annotated genes

Once the MAKER run is finished, the next step is to filter out misannotated gene models and gene models with little supporting evidence. The section below describes how to filter out such gene models.
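As an illustration of evidence-based filtering, MAKER records an Annotation Edit Distance (AED) for each transcript in the _AED attribute of its GFF3 output, where 0 means full agreement with the evidence and 1 means no evidence support. The sketch below lists mRNA features at or below an AED cutoff. It is an assumption-laden example, not necessarily the filtering procedure used in this tutorial: the helper list_mrna_by_aed and the 0.5 cutoff are hypothetical, the ID attribute is assumed to come first in column 9, and the input is a small stand-in file.

```shell
# Hypothetical helper: print "ID<TAB>AED" for mRNA features whose _AED
# is at or below MAX_AED. Assumes ID= is the first attribute in column 9.
list_mrna_by_aed() {
    awk -F '\t' -v max="$2" '
        $3 == "mRNA" && match($9, /_AED=[0-9.]+/) {
            # Skip the 5 characters of "_AED=" to get the numeric value.
            aed = substr($9, RSTART + 5, RLENGTH - 5)
            if (aed + 0 <= max + 0) {
                id = $9
                sub(/^ID=/, "", id)
                sub(/;.*/, "", id)
                print id "\t" aed
            }
        }' "$1"
}

# Stand-in GFF3 with one well-supported and one unsupported transcript.
aed_demo="$(mktemp -d)/demo.gff"
printf 'chr1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=g1-mRNA-1;Parent=g1;_AED=0.10;_eAED=0.10\n' > "$aed_demo"
printf 'chr1\tmaker\tmRNA\t2000\t2900\t.\t-\t.\tID=g2-mRNA-1;Parent=g2;_AED=1.00;_eAED=1.00\n' >> "$aed_demo"

# Example cutoff of 0.5 (arbitrary; choose one appropriate for your data).
list_mrna_by_aed "$aed_demo" 0.5
```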

...