Create BLAST database-2.6.0+

Rationale and Background

The makeblastdb application produces BLAST databases from FASTA files. In the simplest case, the FASTA definition lines are not parsed by makeblastdb and may be completely unstructured. The text in each definition line is stored in the BLAST database and displayed in the BLAST report.
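To illustrate what a definition line is, the short Python sketch below extracts the text that follows each ">" in FASTA content. The sample records and the function name definition_lines are made up for this example; makeblastdb reads the input file itself and does not need this step.

```python
def definition_lines(fasta_text):
    """Return the definition line of each FASTA record, without the leading '>'."""
    return [
        line[1:].strip()
        for line in fasta_text.splitlines()
        if line.startswith(">")
    ]


# Hypothetical records: the definition lines are free text with no fixed structure.
sample = """>seq1 any free text can follow the identifier
ACGTACGT
>my second sequence, no particular format
GGCCTTAA
"""

for defline in definition_lines(sample):
    print(defline)
```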

Mandatory arguments

  • Input file: Path to the input file containing nucleotide or amino acid sequences in FASTA format
  • Input Sequence Format: Type of sequences in the input file (Nucleotide or Protein)
  • Input type: Type of data specified in the input file (Fasta, ASN1 (txt), or Blastdb)
  • Prefix to use for database: Database name
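For readers who also run makeblastdb on the command line, the Python sketch below shows how the four mandatory fields could correspond to makeblastdb options. The options -in, -dbtype, -input_type and -out belong to makeblastdb; the mapping from the app's form labels to those options is an assumption made for this example, and the function name build_makeblastdb_command is hypothetical. The sketch only builds and prints the command; it does not run it.

```python
import shlex

# Assumed mapping from the app's form values to makeblastdb option values.
DBTYPE = {"nucleotide": "nucl", "protein": "prot"}
INPUT_TYPE = {"fasta": "fasta", "asn1 (txt)": "asn1_txt", "blastdb": "blastdb"}


def build_makeblastdb_command(input_file, sequence_format, input_type, prefix):
    """Build the makeblastdb argument list for the four mandatory fields."""
    return [
        "makeblastdb",
        "-in", input_file,
        "-dbtype", DBTYPE[sequence_format.lower()],
        "-input_type", INPUT_TYPE[input_type.lower()],
        "-out", prefix,
    ]


# Values taken from the nucleotide test run described on this page.
cmd = build_makeblastdb_command(
    "plant.118.1.genomic.fna", "Nucleotide", "Fasta", "blastdb_n"
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building the command as a list rather than a single string keeps file names with spaces intact if the list is later passed to a process launcher.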

Parameters

  • Title for the database: Title for BLAST database (Default = input file name provided)
  • File containing masking data (csv format): Comma-separated list of input files containing masking data as produced by NCBI masking applications (e.g. dustmasker, segmasker, windowmasker)
  • Max per file size: Maximum file size for BLAST database files (Default = 1GB)
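The optional parameters can be sketched in the same way. The Python example below assumes that the three fields correspond to the makeblastdb options -title, -mask_data and -max_file_sz; the function name optional_makeblastdb_args and the file names in the usage lines are hypothetical. Any field left empty is simply omitted, so makeblastdb applies its own default.

```python
def optional_makeblastdb_args(title=None, mask_files=None, max_file_size=None):
    """Return extra makeblastdb arguments; omitted values keep the tool's defaults."""
    args = []
    if title:
        args += ["-title", title]
    if mask_files:
        # -mask_data takes a single comma-separated list of file names.
        args += ["-mask_data", ",".join(mask_files)]
    if max_file_size:
        args += ["-max_file_sz", max_file_size]
    return args


# Hypothetical values, for illustration only.
extra = optional_makeblastdb_args(
    title="Plant genomic sequences",
    mask_files=["dust_mask.asnb", "window_mask.asnb"],
    max_file_size="2GB",
)
print(extra)
```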

Test Run

All files are located in the Community Data directory of the CyVerse Discovery Environment at the following path:

Community Data > iplantcollaborative > example_data > makeblastdb (/iplant/home/shared/iplantcollaborative/example_data/makeblastdb)

Mandatory arguments: 

  • Input file: plant.118.1.genomic.fna/plant.118.protein.faa
  • Input Sequence Format: Nucleotide/Protein
  • Input type: Fasta
  • Prefix to use for database: blastdb_n/blastdb_p

Parameters:

  • Leave these as default

Output

  • With plant.118.1.genomic.fna as input file and nucleotide as sequence format
    • blastdb_n.nhr
    • blastdb_n.nin
    • blastdb_n.nsq
  • With plant.118.protein.faa as input file and protein as sequence format
    • blastdb_p.phr
    • blastdb_p.pin
    • blastdb_p.psq
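The naming pattern of the output files can be summarised in a few lines of Python. This is a sketch for checking results, not part of the app: the function name expected_db_files is hypothetical, and it covers only the three core files of a single-volume database as listed above. Input that exceeds the maximum file size is split into several volumes with additional files, which the sketch does not handle.

```python
# Core file extensions for each database type.
EXTENSIONS = {
    "nucl": ("nhr", "nin", "nsq"),
    "prot": ("phr", "pin", "psq"),
}


def expected_db_files(prefix, dbtype):
    """List the core file names of a single-volume BLAST database."""
    return [f"{prefix}.{ext}" for ext in EXTENSIONS[dbtype]]


print(expected_db_files("blastdb_n", "nucl"))
print(expected_db_files("blastdb_p", "prot"))
```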

The Blastp-2.6.0+ and Blastn-2.6.0+ apps take a folder as the database input because a BLAST database consists of multiple files. When creating a database, it is best to give the database and the output file the same name, e.g. "mygenome". After the analysis has finished, create a new folder inside the output directory, name it "mygenome", and move all the database files into it, leaving out the logs directory. You can then move the "mygenome" folder to one of your other directories so that it is easy to find. When you run Blastp or Blastn, drag and drop this database folder into the database input field.
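If the output has been downloaded to a local directory, the same tidy-up can be scripted. The Python sketch below moves the database files for a given name into a folder of that name and leaves everything else, including the logs directory, in place. The function name collect_database_files is hypothetical, the sketch handles only the six core extensions shown in the Output section, and it assumes that no plain file called exactly "mygenome" already exists in the output directory. Inside the Discovery Environment itself, the drag-and-drop steps described above remain the way to do this.

```python
import shutil
import tempfile
from pathlib import Path

# Core BLAST database extensions (nucleotide and protein).
DB_EXTENSIONS = {".nhr", ".nin", ".nsq", ".phr", ".pin", ".psq"}


def collect_database_files(output_dir, name):
    """Move the database files for `name` into output_dir/name; leave the rest."""
    output_dir = Path(output_dir)
    # Select the files first, before the target folder is created.
    db_files = [
        p for p in output_dir.iterdir()
        if p.is_file() and p.stem == name and p.suffix in DB_EXTENSIONS
    ]
    target = output_dir / name
    target.mkdir(exist_ok=True)
    for path in db_files:
        shutil.move(str(path), str(target / path.name))
    return sorted(p.name for p in target.iterdir())


# Demonstration on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    for filename in ("mygenome.nhr", "mygenome.nin", "mygenome.nsq"):
        (out / filename).touch()
    (out / "logs").mkdir()
    print(collect_database_files(out, "mygenome"))
```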

Please work through the documentation and add your comments at the bottom of this page, email them to support@cyverse.org, or click the intercom button on this page. Thank you.

References

For more options of makeblastdb visit this page