FaST-LMM



What is FaST-LMM?

FaST-LMM (Factored Spectrally Transformed Linear Mixed Models) is a GWAS analysis tool from Microsoft Research designed for large data sets. It has been tested on data sets with over 120,000 individuals. A linear mixed model is a thorough way to analyze such data, but fitting one is computationally demanding and may not be feasible at all on especially large data sets. FaST-LMM reduces the runtime needed to fit such a model. A standard analysis of SNP data forms a genetic similarity matrix between individuals. FaST-LMM instead obtains the spectral decomposition of this similarity matrix without computing the matrix itself, and then uses that decomposition to test every SNP in the data set for statistical significance. According to the FaST-LMM paper (linked under Further Information), this lets run time and memory use scale linearly with the number of individuals.
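
The idea of working without the similarity matrix can be illustrated with a toy linear-algebra sketch. It assumes the similarity matrix has the factored form K = X X^T, where X has one row per individual and one column per SNP (an assumption made for this illustration). In that case the nonzero eigenvalues of the large n x n matrix K equal those of the small s x s matrix X^T X, so they can be found without ever building K. The helper names and the tiny genotype matrix are hypothetical, and this is not FaST-LMM's actual implementation.

```python
# Toy illustration only: hypothetical helpers and made-up data, not FaST-LMM code.

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def top_eigenvalue(m, iterations=200):
    """Largest eigenvalue of a symmetric positive semidefinite matrix (power iteration)."""
    v = [1.0] * len(m)
    for _ in range(iterations):
        w = [sum(x * y for x, y in zip(row, v)) for row in m]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    mv = [sum(x * y for x, y in zip(row, v)) for row in m]
    return sum(x * y for x, y in zip(v, mv))  # Rayleigh quotient

# 6 individuals x 2 SNPs, genotypes coded 0/1/2.
X = [[0, 1], [1, 0], [2, 1], [0, 2], [1, 1], [2, 2]]

big = matmul(X, transpose(X))    # 6 x 6 similarity matrix, built here only for comparison
small = matmul(transpose(X), X)  # 2 x 2 matrix; the similarity matrix is never needed

eig_big = top_eigenvalue(big)
eig_small = top_eigenvalue(small)
print(eig_big, eig_small)
```

Both calls print the same leading eigenvalue, but the second one only ever touches a matrix whose size depends on the number of SNPs, not on the number of individuals.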

This tutorial relates specifically to FaST-LMM as run through a Validate Workflow instance. For information about the FaST-LMM image see here.

FaST-LMM can be computationally demanding, and if an instance lacks sufficient memory, the process may be "killed" mid-computation. In our experience, filesets larger than 1 GB need at least 4 GB of memory to process reliably.
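
As a rough pre-flight check for the rule of thumb above, the total size of a fileset can be measured before launching a run. This is a minimal sketch: the function names are hypothetical, and treating 1 GB as 1024^3 bytes is an assumption.

```python
import os

ONE_GB = 1024 ** 3  # assumed meaning of "1 GB" in the rule of thumb

def fileset_size(paths):
    """Total size in bytes of the files in `paths` that exist on disk."""
    return sum(os.path.getsize(p) for p in paths if os.path.isfile(p))

def needs_large_instance(paths, threshold=ONE_GB):
    """True if the fileset is over the threshold, i.e. plan for at least 4 GB of memory."""
    return fileset_size(paths) > threshold
```

For example, `needs_large_instance(["mydata.ped", "mydata.map"])` (hypothetical file names) returns True only when the two files together exceed the threshold.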


How to Get Started

To run the FaST-LMM executable, simply type fastlmmc into the command line. This will bring up a help menu with all the possible input options.

Input files

  1. A PEDMAP set of files

  2. A phenotype file corresponding to the PEDMAP set

  3. A set of PLINK formatted files to compute the genetic similarity matrix decomposition. This can be the same fileset as in item 1

  4. A set of corresponding covariates (optional)

Input flags

  • -file : Denotes the file name for the PLINK .ped/.map files

  • -bfile : Denotes the name for PLINK .bed/.bim/.fam files

  • -tfile : Denotes the name for PLINK .tped/.tfam files

Note: The files in a set share the same base name, so only that one name is required, given with the flag that matches the file format (-file, -bfile, or -tfile)

  • -pheno : Denotes the name of the phenotype file (including extension)  

  • -out : The name of your final output file, which is placed into the same directory as your program and data unless otherwise specified

  • -fileSim : The name of the PLINK fileset used for computing the genetic similarity matrix and its decomposition (a fileset name as with the flags above, so no file extension is needed)
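
Because each fileset flag implies a fixed set of extensions, it can help to confirm that every file in the set is present before starting a run. The sketch below uses only the flag-to-extension pairs listed above; the function name and the example base name are hypothetical.

```python
import os

# Extensions expected for each fileset flag, as listed above.
FILESET_EXTENSIONS = {
    "-file": (".ped", ".map"),
    "-bfile": (".bed", ".bim", ".fam"),
    "-tfile": (".tped", ".tfam"),
}

def missing_files(flag, base_name):
    """Return the files expected for `flag` that are not found on disk."""
    expected = [base_name + ext for ext in FILESET_EXTENSIONS[flag]]
    return [path for path in expected if not os.path.isfile(path)]
```

Calling `missing_files("-bfile", "mydata")` returns an empty list when mydata.bed, mydata.bim, and mydata.fam are all present.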

Additional options

  • -verboseOutput : Shows more detailed output; this flag takes no argument

  • -covar : Denotes the name of the covariate file (including file extension)

  • -pValuePrintThreshold : Restricts the output file to only include SNPs with a p-value less than or equal to the specified threshold

Example command line for running FaST-LMM

fastlmmc -verboseOutput -file <filename> -fileSim <filename> -pheno <filename.txt or filename.pheno> -covar <filename.covar.txt> -out MyResults.csv -pValuePrintThreshold 0.05
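
When the command is assembled from a script, building it as an argument list avoids quoting mistakes. The following is a minimal sketch that covers only the -file form shown in the example and uses only the flags documented on this page; the helper name and the file names are hypothetical.

```python
def build_command(fileset, pheno, out, file_sim=None, covar=None,
                  p_threshold=None, verbose=False):
    """Assemble a fastlmmc argument list for a .ped/.map fileset."""
    cmd = ["fastlmmc"]
    if verbose:
        cmd.append("-verboseOutput")
    cmd += ["-file", fileset]
    # Reuse the main fileset for the similarity matrix unless another is given.
    cmd += ["-fileSim", file_sim or fileset]
    cmd += ["-pheno", pheno]
    if covar:
        cmd += ["-covar", covar]
    cmd += ["-out", out]
    if p_threshold is not None:
        cmd += ["-pValuePrintThreshold", str(p_threshold)]
    return cmd

cmd = build_command("mydata", "mydata.pheno", "MyResults.csv",
                    covar="mydata.covar.txt", p_threshold=0.05, verbose=True)
print(" ".join(cmd))
# To run it (requires fastlmmc on the PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

The optional arguments are simply left out of the list when not supplied, which mirrors how the covariate file and p-value threshold are optional on the command line.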

Further Information

User Manual: http://nbviewer.ipython.org/github/MicrosoftGenomics/FaST-LMM/blob/master/doc/ipynb/FaST-LMM.ipynb

Paper on FaST-LMM from Microsoft Research: http://www.nature.com/nmeth/journal/v8/n10/abs/nmeth.1681.html

Source code from Microsoft Research Github: https://github.com/MicrosoftGenomics/FaST-LMM

Example Input and Output data can be found as an attachment to this page.

This tool is still in development and we are currently testing it. If you notice any issues or have any comments, we would greatly appreciate hearing about them.
Please contact us at labstapleton@gmail.com. Thank you for using our tools!
