GeneMANIA query runner

How to download the tool or source code including installation and usage instructions as well as any source code that might be associated with the executable. This should also include a listing of any dependencies for this tool or script.

Minimum system requirements

Java 1.5+ JVM
Queries that use more than ~1.5 GB of RAM require Cytoscape to be launched with a 64-bit JVM
2 GB RAM

Installation Instructions

The GeneMANIA plugin is distributed as a JAR file. It is available through Cytoscape's main plugin repository under the Network Inference category:

http://chianti.ucsd.edu/cyto_web/plugins/displayplugininfo.php?name=GeneMANIA

A complete list of networks currently in the GeneMANIA system is available at http://www.genemania.org/pages/networkList.jsf. Network data can be downloaded from within the GeneMANIA plugin.

Required version of the program necessary to perform the desired task
Version: 2.0
Release Date: 2010-12-01
Verified to work in: Cytoscape 2.6, 2.7, 2.8

Sample dataset and expected results to be output

USAGE (32-bit JVM):

java -Xmx1800M -cp GeneMANIA.jar org.genemania.plugin.apps.QueryRunner options query-file-1 [ query-file-2 ... ]

USAGE (64-bit JVM):

java -d64 -Xmx3G -cp GeneMANIA.jar org.genemania.plugin.apps.QueryRunner options query-file-1 [ query-file-2 ... ]

Each query file is a tab-delimited flat file that contains the gene list and the other query parameters, one parameter per line.

Flat Query File Format:

organism name
query-gene-1 [ \t query-gene-2 ... ]
networks
related-gene-limit
weighting method
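
The five-line layout above can be generated programmatically. The following is a minimal sketch, not part of GeneMANIA: the function name `format_query` is hypothetical, and joining several network types with tabs is an assumption, since this page only states that the file is tab delimited.

```python
def format_query(organism, genes, networks, related_gene_limit, weighting):
    """Return the text of a flat query file: one parameter per line."""
    lines = [
        organism,                 # line 1: organism name
        "\t".join(genes),         # line 2: tab-delimited query genes
        "\t".join(networks),      # line 3: networks (tab separator is an assumption)
        str(related_gene_limit),  # line 4: related-gene limit
        weighting,                # line 5: weighting method
    ]
    return "\n".join(lines) + "\n"


query_text = format_query(
    "A. thaliana", ["HY5", "ELF3", "PHYB", "PHYA"], ["coexp"], 50, "bp"
)
print(query_text, end="")
```

The returned text can be written to a file such as query1.txt and passed to the query runner as a query file.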

Organism options

Organism name	Taxonomy ID
A. thaliana	3702
C. elegans	6239
D. melanogaster	7227
H. sapiens	9606
M. musculus	10090
S. cerevisiae	4932

Available network types

coexp	Co-expression
coloc	Co-localization
gi	Genetic interactions
pi	Physical interactions
predict	Predicted
spd	Shared protein domains
other	Networks that don't belong to any of the above types
all	All available networks
preferred	Shorthand for coexp gi pi

Weighting methods

automatic	Networks are weighted such that the query genes interact as much as possible (default).
average	All networks are weighted equally.
average_category	Networks are weighted such that each type of network has the same overall weight.

For organisms with GO annotations:

bp	Networks are weighted in an attempt to reproduce Gene Ontology Biological Process co-annotation patterns.
mf	Networks are weighted in an attempt to reproduce Gene Ontology Molecular Function co-annotation patterns.
cc	Networks are weighted in an attempt to reproduce Gene Ontology Cellular Component co-annotation patterns.
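
Because a query file carries these codes as plain text, it can help to check them before launching a run. The sketch below is an illustration only: `check_query_options` is a hypothetical helper, and it accepts only the codes listed on this page; the tool itself may accept other values.

```python
# Codes copied from the lists on this page.
NETWORK_TYPES = {
    "coexp", "coloc", "gi", "pi", "predict", "spd", "other", "all", "preferred",
}
WEIGHTING_METHODS = {"automatic", "average", "average_category", "bp", "mf", "cc"}


def check_query_options(networks, weighting):
    """Return a list of problems found; an empty list means no problem."""
    problems = []
    for network in networks:
        if network not in NETWORK_TYPES:
            problems.append(f"unknown network type: {network}")
    if weighting not in WEIGHTING_METHODS:
        problems.append(f"unknown weighting method: {weighting}")
    return problems


ok = check_query_options(["coexp", "pi"], "bp")
bad = check_query_options(["coexpr"], "avg")
print(ok)
print(bad)
```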

Example query file: query1.txt

A. thaliana
HY5 ELF3 PHYB PHYA
coexp
50
bp

Excerpt from the output file query1.txt-results.report:

Gene Score Description
ELF3 ELF3 (EARLY FLOWERING 3); protein C-terminus binding / transcription factor
PHYB PHYB (PHYTOCHROME B); G-protein coupled photoreceptor/ protein histidine kinase/ red or far-red light photoreceptor/ signal transducer
HY5 HY5 (ELONGATED HYPOCOTYL 5); DNA binding / double-stranded DNA binding / transcription factor
PHYA PHYA (PHYTOCHROME A); G-protein coupled photoreceptor/ protein histidine kinase/ red or far-red light photoreceptor/ signal transducer
ATBETAFRUCT4 0.18 ATBETAFRUCT4; beta-fructofuranosidase/ hydrolase, hydrolyzing O-glycosyl compounds
TOC1 0.12 TOC1 (TIMING OF CAB EXPRESSION 1); transcription regulator/ two-component response regulator
SCL3 0.12 SCL3; transcription factor
F3H 0.12 F3H (FLAVANONE 3-HYDROXYLASE); naringenin 3-dioxygenase
AKIN11 0.11 AKIN11 (Arabidopsis SNF1 kinase homolog 11); protein binding / protein kinase
AtMYB32 0.11 AtMYB32 (myb domain protein 32); DNA binding / transcription factor
GA3 0.11 GA3 (GA REQUIRING 3); ent-kaurene oxidase/ oxygen binding
AT3G19100 0.09 calcium-dependent protein kinase, putative / CDPK, putative
COL9 0.09 COL9 (CONSTANS-LIKE 9); transcription factor/ zinc ion binding
ELIP2 0.09 ELIP2 (EARLY LIGHT-INDUCIBLE PROTEIN 2); chlorophyll binding
SPL7 0.09 SPL7 (SQUAMOSA PROMOTER BINDING PROTEIN-LIKE 7); DNA binding / transcription factor
AT2G25730 0.09 hypothetical protein
ATMRP4 0.09 ATMRP4 (ARABIDOPSIS THALIANA MULTIDRUG RESISTANCE-ASSOCIATED PROTEIN 4); ATPase, coupled to transmembrane movement of substances / folic acid transporter
AT5G48250 0.09 zinc finger (B-box type) family protein
AAE17 0.09 AAE17 (ACYL-ACTIVATING ENZYME 17); catalytic/ ligase
RHA2B 0.08 RHA2B (RING-H2 FINGER PROTEIN 2B); protein binding / ubiquitin-protein ligase/ zinc ion binding
MP 0.08 MP (MONOPTEROS); transcription factor
FLS 0.08 FLS (FLAVONOL SYNTHASE); flavonol synthase
AT3G61580 0.08 delta-8 sphingolipid desaturase (SLD1)
AT4G37180 0.08 myb family transcription factor
RPL23AA 0.08 RPL23AA (RIBOSOMAL PROTEIN L23AA); RNA binding / nucleotide binding / structural constituent of ribosome
POP1 0.08 POP1; transporter
GDH1 0.08 GDH1 (GLUTAMATE DEHYDROGENASE 1); ATP binding / glutamate dehydrogenase NAD(P)+/ oxidoreductase
DRT102 0.08 DRT102 (DNA-DAMAGE-REPAIR/TOLERATION 2)
ACC1 0.08 ACC1 (ACETYL-COENZYME A CARBOXYLASE 1); acetyl-CoA carboxylase
PHV 0.08 PHV (PHAVOLUTA); DNA binding / protein binding / transcription factor

One prediction report is produced per query file.
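
The gene rows of such a report can be read back into structured records. This is a rough sketch under stated assumptions: the rows are taken to be tab-delimited with the three columns Gene, Score and Description, and query genes are taken to have an empty Score field, as the excerpt above suggests. `parse_gene_rows` is a hypothetical helper, and the sample rows are rebuilt from the excerpt rather than copied from a real file.

```python
def parse_gene_rows(lines):
    """Parse tab-delimited gene rows into (gene, score, description) tuples.

    The score is None when the Score field is empty (assumed for query genes).
    """
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] == "Gene":
            continue  # skip the header and anything that is not a gene row
        gene, score_text = fields[0], fields[1].strip()
        description = "\t".join(fields[2:])
        try:
            score = float(score_text) if score_text else None
        except ValueError:
            continue  # not a gene row after all
        rows.append((gene, score, description))
    return rows


sample = [
    "Gene\tScore\tDescription",
    "HY5\t\tHY5 (ELONGATED HYPOCOTYL 5); DNA binding / double-stranded DNA binding / transcription factor",
    "TOC1\t0.12\tTOC1 (TIMING OF CAB EXPRESSION 1); transcription regulator/ two-component response regulator",
]
rows = parse_gene_rows(sample)
for gene, score, description in rows:
    print(gene, score)
```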

Set of parameters and command line switches that match the expected execution of the tool including the possible command line definitions according to the occurrence of optional parameters. Also, validation instructions for parameters are requested.

actual command-line parameter	name and brief description of the parameter	required	default value	text, number, or name of file	description of validation rules
--data	directory of the GeneMANIA dataset; file path	yes		text, e.g. /Users/username/genemania_plugin/gmdata-2010-12-01
--in	input-format; format of the query file	optional	flat (tab delimited)	text
--out	output-format; format of the output files. genes: list of result genes ordered by score, one per line. flat: tab-delimited report containing details of prediction results and query parameters. xml: XML-formatted report containing details of prediction results and query parameters. scores: list of result genes with scores, ordered by score, for the entire genome (ignores the related-gene limit), one per line.	optional	genes	text
--scoring-method	method used to compute the gene scores. Discriminant: GeneMANIA's classic scoring method. Z: Z-scores.	optional	discriminant	text	discriminant, z
--ids	gene identifier types. A comma-separated list of gene identifier types in descending order of preference. If the most preferred identifier is not available for a given gene, the next most preferred identifier is selected. ID types, listed in the default order of preference: Ensembl Gene Name, Entrez Gene Name, Ensembl Gene ID, Refseq mRNA ID, TAIR ID, Uniprot ID, Refseq Protein ID, Ensembl Protein ID, Entrez Gene ID.	optional		text	Ensembl Gene Name, Entrez Gene Name, Ensembl Gene ID, Refseq mRNA ID, TAIR ID, Uniprot ID, Refseq Protein ID, Ensembl Protein ID, Entrez Gene ID
--results	output file directory. Path to where the prediction result files will be created (one per input query file).	optional	working directory	text
--threads	The maximum number of parallel predictions. Ideally this should be set to the number of processing cores.	optional	1	number
--verbose	print more details about what's happening.	optional		text
--list-networks organism name	Lists the available networks for the given organism. Put quotes around the organism name.	optional		text
--list-genes organism name	Lists the genes that are recognized for the given organism. Put quotes around the organism name. Each line in the output contains a gene and all its synonyms, if any.	optional		text

Example invocation of the command line application and its associated parameters such that it can perform an analysis.

java -Xmx1800M -cp GeneMANIA.jar org.genemania.plugin.apps.QueryRunner --data /Users/username/genemania_plugin/gmdata-2010-12-01 --out flat query1.txt
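
When the runner is driven from a script, the same invocation can be assembled as an argument list. The following sketch uses only switches documented in the table above (--data, --out, --threads, --results). The helper name `build_command` and its keyword arguments are hypothetical, and the list is only built here, not executed.

```python
OUTPUT_FORMATS = {"genes", "flat", "xml", "scores"}  # values listed for --out


def build_command(jar, data_dir, query_files, out_format="genes",
                  results_dir=None, threads=1, max_heap="1800M"):
    """Build the java argument list for one QueryRunner invocation."""
    if out_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output format: {out_format}")
    if not query_files:
        raise ValueError("at least one query file is required")
    cmd = [
        "java", f"-Xmx{max_heap}", "-cp", jar,
        "org.genemania.plugin.apps.QueryRunner",
        "--data", data_dir,
        "--out", out_format,
        "--threads", str(threads),
    ]
    if results_dir is not None:
        cmd += ["--results", results_dir]
    cmd += list(query_files)  # query files come last, after the options
    return cmd


cmd = build_command(
    "GeneMANIA.jar",
    "/Users/username/genemania_plugin/gmdata-2010-12-01",
    ["query1.txt"],
    out_format="flat",
)
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` on a machine where Java and GeneMANIA.jar are available.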

Reference

J. Montojo, K. Zuberi, H. Rodriguez, F. Kazi, G. Wright, S. L. Donaldson, Q. Morris and G. D. Bader (2010). GeneMANIA Cytoscape plugin: fast gene function predictions on the desktop. Bioinformatics, 26(22):2927-2928.

Mostafavi, S. et al. (2008). GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function. Genome Biol., 9, S4.

Warde-Farley D, Donaldson SL, Comes O, Zuberi K, Badrawi R, Chao P, Franz M, Grouios C, Kazi F, Lopes CT, Maitland A, Mostafavi S, Montojo J, Shao Q, Wright G, Bader GD, Morris Q (2010). The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function. Nucleic Acids Res., 38 Suppl:W214-20.
http://nar.oxfordjournals.org/cgi/content/abstract/38/suppl_2/W214
