GEMMA


What is GEMMA?

GEMMA (Genome-wide Efficient Mixed Model Association) is an analysis tool designed primarily for linear mixed models and variations thereof. More specifically, GEMMA handles three types of mixed models: a linear mixed model for testing marker associations with a single phenotype, a multivariate linear mixed model for testing marker associations with multiple phenotypes, and a Bayesian sparse linear mixed model for estimating the proportion of variance in phenotypes explained (PVE) by typed genotypes, predicting phenotypes, and identifying associated markers.

How to Get Started

Like the FaST-LMM program above, the GEMMA software is located in the /usr/bin directory, so the gemma executable can be called from any directory on the computer. To see all available options for GEMMA, type gemma -h at the command line.

To run a basic mixed model analysis with GEMMA, you will need your inputs in either PLINK binary format (BED/BIM/FAM extensions) or BIMBAM format (a mean genotype file, a phenotype file, and an optional annotation file). GEMMA also requires a relatedness matrix to fit the mixed model; if you do not already have one, GEMMA can calculate it from your data.

Calculating a Relatedness Matrix

With PLINK binary files

Type your commands in as shown:

gemma -bfile <filesetName/prefix> -gk <1 or 2> -o <outputPrefix>

These file options indicate:

  • -bfile: A character string. The name of the PLINK binary file set, given without the extension. For example, if your files are named dat.bed, dat.bim, and dat.fam, you would just type dat after the -bfile flag.

  • -gk: An integer, either 1 or 2. Tells GEMMA the type of relatedness matrix to calculate. Option 1 calculates the centered relatedness matrix, while option 2 calculates the standardized relatedness matrix.

  • -o: A character string. Your designated name for the analysis output.
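As a concrete illustration of the options above, the following sketch builds the command for a hypothetical file set named dat (dat.bed, dat.bim, dat.fam) and a hypothetical output prefix dat_kinship. Both names are assumptions chosen for the example. The command is printed first and is only executed if gemma is on the PATH and all three files exist.

```shell
# Hypothetical file set prefix: expects dat.bed, dat.bim, dat.fam in the current directory.
prefix="dat"
# Hypothetical output prefix for the relatedness matrix.
out="dat_kinship"

# -gk 1 requests the centered relatedness matrix (use 2 for standardized).
cmd="gemma -bfile $prefix -gk 1 -o $out"
echo "$cmd"

# Run only if gemma is installed and the whole PLINK file set is present.
if command -v gemma >/dev/null 2>&1 &&
   [ -f "$prefix.bed" ] && [ -f "$prefix.bim" ] && [ -f "$prefix.fam" ]; then
  $cmd
fi
```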

With BIMBAM files

Type your commands in as shown:

gemma -g <filename> -p <filename> -gk <num> -o <output prefix>

These file options indicate:

  • -g: A character string. Indicates the mean genotype file in your set. The full name, including extension, is required.

  • -p: A character string. Indicates the phenotype file for your set. Again, the full name is required.

  • -gk and -o options are the same as above.

Once the relatedness matrix calculation is finished, GEMMA will create a folder called output in the current directory. Here, you will find the relatedness matrix file: <output prefix>.cXX.txt for the centered matrix (-gk 1) or <output prefix>.sXX.txt for the standardized matrix (-gk 2).
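The small helper below is one way to work out where the matrix file should appear and to sanity-check it. The function name kinship_path and the prefix dat_kinship are hypothetical; the cXX/sXX suffixes follow the GEMMA naming for the centered and standardized matrices. The squareness check assumes the file is a plain whitespace-delimited n x n matrix with no header.

```shell
# Print the expected relatedness matrix path for an output prefix and -gk value.
kinship_path() {
  prefix=$1
  gk=$2
  case "$gk" in
    1) echo "output/$prefix.cXX.txt" ;;   # centered matrix
    2) echo "output/$prefix.sXX.txt" ;;   # standardized matrix
    *) echo "unknown -gk value: $gk" >&2; return 1 ;;
  esac
}

matrix=$(kinship_path dat_kinship 1)
echo "$matrix"

# If the file exists, check that it has as many rows as columns.
if [ -f "$matrix" ]; then
  if awk 'NR == 1 { first = NF } NF != first { bad = 1 }
          END { exit (bad || NR != first) }' "$matrix"; then
    echo "relatedness matrix is square"
  else
    echo "relatedness matrix is not square; check the run log" >&2
  fi
fi
```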

Running univariate and multivariate analysis

Now that you have a relatedness matrix, you can use that file in either a univariate mixed model analysis or a multivariate mixed model analysis.

Univariate with PLINK binary files

Type out your commands in the terminal like so:

gemma -bfile <filesetName/prefix> -k <filename> -lmm <num, 1-4> -o <outputPrefix>

These file options indicate:

  • -bfile: A character string. The PLINK binary file set name. As before, only the prefix is required; do not type any of the file extensions for this option.

  • -k: A character string. The name of the previously calculated relatedness matrix file. Full name, including file extension, is required.

  • -lmm: An integer between 1 and 4 inclusive. Specifies which frequentist test to use and which corresponding p-value to list in the output. Option 1 gives the Wald test, option 2 gives the likelihood ratio test, option 3 gives the score test, and option 4 performs all three tests.

  • -o: Character string. Specifies your desired output prefix for the analysis file.

Once the analysis is complete, check the output folder in your current directory for your mixed model results: <outputPrefix>.assoc.txt.
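To get a quick look at the strongest associations in the results file, a sketch like the following can help. The function name top_hits and the file name output/dat_lmm.assoc.txt are hypothetical. It assumes the file is tab-delimited with a header row and that the Wald test p-values are in a column named p_wald; check the header of your own file, since the column names depend on the -lmm option used.

```shell
# Print the n rows with the smallest values in the named column (header excluded).
# Usage: top_hits <assoc file> <column name> [n]
top_hits() {
  file=$1
  col=$2
  n=${3:-10}
  awk -F'\t' -v col="$col" '
    NR == 1 {
      # Locate the requested column by its header name.
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      if (!c) { print "column not found: " col > "/dev/stderr"; exit 1 }
      next
    }
    # Prefix each row with its p-value so it can be sorted on field 1.
    { print $c "\t" $0 }
  ' "$file" |
    sort -g -k1,1 |   # -g (GNU/BSD sort) handles values such as 1e-6
    head -n "$n" |
    cut -f2-          # drop the temporary sort key
}

# Example call, run only if the hypothetical results file exists.
if [ -f output/dat_lmm.assoc.txt ]; then
  top_hits output/dat_lmm.assoc.txt p_wald 5
fi
```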

Univariate with BIMBAM files

gemma -g <filename> -p <filename> -a <filename> -k <filename> -lmm <num, 1-4> -o <output prefix>

These file options indicate:

  • -g: Character string; the mean genotype file for your file set. The full filename, including extension, is required.

  • -p: Character string; the phenotype file for your set. Again, the full name is required.

  • -a: Character string; the annotation file of the set (optional).

  • -k: A character string. The name of the previously calculated relatedness matrix file. Full name, including file extension, is required.

  • -lmm: An integer between 1 and 4 inclusive. Specifies which frequentist test to use and which corresponding p-value to list in the output. Option 1 gives the Wald test, option 2 gives the likelihood ratio test, option 3 gives the score test, and option 4 performs all three tests.

  • -o: Character string. Specifies your desired output prefix for the analysis file.
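The two BIMBAM steps, relatedness matrix first and association test second, can be chained as in the sketch below. The file names geno.txt, pheno.txt, and anno.txt and the output prefixes are hypothetical placeholders. The sketch assumes a centered matrix (-gk 1), so the second step reads output/<prefix>.cXX.txt, and it uses the Wald test (-lmm 1). The commands are printed and only executed if gemma and the input files are available.

```shell
# Hypothetical BIMBAM inputs; replace with your own file names.
geno="geno.txt"
pheno="pheno.txt"
anno="anno.txt"
out="bimbam_kinship"

# Step 1: centered relatedness matrix, written to output/$out.cXX.txt.
step1="gemma -g $geno -p $pheno -gk 1 -o $out"
# Step 2: univariate mixed model with the Wald test, reusing that matrix.
step2="gemma -g $geno -p $pheno -a $anno -k output/$out.cXX.txt -lmm 1 -o bimbam_lmm"

echo "$step1"
echo "$step2"

# Run both steps only if gemma is installed and the inputs exist;
# step 2 runs only when step 1 succeeds.
if command -v gemma >/dev/null 2>&1 &&
   [ -f "$geno" ] && [ -f "$pheno" ] && [ -f "$anno" ]; then
  $step1 && $step2
fi
```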

Multivariate

For the multivariate mixed model, use the same command as for the univariate analysis with one additional command line argument:

  • -n: One or more integers separated by spaces. Indicates which phenotype columns in the phenotype file (whether PLINK binary or BIMBAM format) are included in the association analysis.
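For example, a multivariate run on the first two phenotypes of a PLINK file set might look like the sketch below. The prefix dat, the matrix file output/dat_kinship.cXX.txt, and the output prefix dat_mvlmm are hypothetical. For PLINK input, phenotype 1 is taken here to be the sixth column of the .fam file and phenotype 2 the seventh, which is the numbering described in the GEMMA manual; confirm this against your own data before relying on it.

```shell
# Hypothetical inputs: PLINK file set "dat" and a previously computed centered matrix.
prefix="dat"
kinship="output/dat_kinship.cXX.txt"
# Phenotype columns to analyze jointly, separated by spaces.
phenos="1 2"

# Same options as the univariate run, plus -n listing the phenotype columns.
cmd="gemma -bfile $prefix -k $kinship -lmm 1 -n $phenos -o dat_mvlmm"
echo "$cmd"

# Run only if gemma is installed and the inputs exist.
if command -v gemma >/dev/null 2>&1 &&
   [ -f "$prefix.bed" ] && [ -f "$kinship" ]; then
  $cmd
fi
```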

Further Information

User Manual: GEMMA_user_manual.pdf

Developer’s Website: http://www.xzlab.org/software.html

Example Data: the same as the PLINK example data, available at the Example data link.

This tool is still in development and we are currently testing it. If you notice any issues or have any comments, we would greatly appreciate hearing them!
Please contact us at labstapleton@gmail.com. Thank you for using our tools!
