FaST-LMM
What is FaST-LMM?
FaST-LMM (Factored Spectrally Transformed Linear Mixed Models) is a GWAS analysis tool from Microsoft Research designed for large data sets. It has been tested on data sets with over 120,000 individuals. Fitting a linear mixed model is thorough but computationally demanding, and it may not be feasible at all on especially large data sets. FaST-LMM addresses this by reducing the runtime needed to fit such a model. A linear mixed model for SNP data normally requires forming a genetic similarity matrix between individuals. FaST-LMM instead obtains the spectral decomposition of this similarity matrix without computing the matrix itself. The decomposition is then used to test all SNPs in the data set for statistical significance. This approach keeps computation time substantially lower than that of comparable programs.
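As a sketch of why the decomposition can be obtained without the matrix, assume the similarity matrix has the factored form K = X Xᵀ, where X is the n × s matrix of SNP values for n individuals and the s SNPs used to build the similarity, with s < n. The symbols K, X, U, S and V are introduced here for illustration and do not appear elsewhere on this page. Taking the thin singular value decomposition of X gives:

```latex
X = U S V^{\mathsf{T}}, \qquad V^{\mathsf{T}} V = I
\quad\Longrightarrow\quad
K = X X^{\mathsf{T}} = U S V^{\mathsf{T}} V S U^{\mathsf{T}} = U S^{2} U^{\mathsf{T}}
```

Under this assumption the columns of U are eigenvectors of K and the diagonal of S² holds the corresponding nonzero eigenvalues, so the n × n matrix K never has to be formed or stored.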
This tutorial relates specifically to FaST-LMM as run through a Validate Workflow instance. For information about the FaST-LMM image see here.
How to Get Started
To run the FaST-LMM executable, type fastlmmc at the command line. Run without arguments, it prints a help menu listing all available input options.
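Before starting a long analysis it can help to confirm that the executable is reachable from a script. The following is a minimal sketch, assuming fastlmmc is expected on the PATH; the function names find_fastlmmc and show_help are hypothetical helpers, not part of FaST-LMM.

```python
import shutil
import subprocess


def find_fastlmmc(executable="fastlmmc"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


def show_help(executable="fastlmmc"):
    """Run the executable with no arguments and return whatever it prints."""
    path = find_fastlmmc(executable)
    if path is None:
        raise FileNotFoundError(f"{executable} was not found on PATH")
    result = subprocess.run([path], capture_output=True, text=True)
    # The help menu may be written to stdout or stderr, so return both.
    return result.stdout + result.stderr
```

Calling find_fastlmmc() first gives a clear answer about the installation before any data files are involved.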
Input files
1. A PEDMAP set of files
2. A phenotype file corresponding to the PEDMAP set
3. A set of PLINK-formatted files used to compute the genetic similarity matrix decomposition. This set can be the same as the one in item 1
4. A set of corresponding covariates (optional)
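Whether the phenotype file really corresponds to the PEDMAP set can be checked before a run. The sketch below assumes that each row of the phenotype file starts with a family ID and an individual ID, as the rows of a .ped file do; that layout is an assumption about the phenotype file, and read_ids and compare_ids are hypothetical helper names.

```python
def read_ids(path):
    """Collect (family ID, individual ID) pairs from the first two columns."""
    ids = []
    with open(path) as handle:
        for line in handle:
            # Split off only the first two fields; .ped rows can be very long.
            fields = line.split(None, 2)
            if len(fields) < 2 or fields[0].startswith("#"):
                continue  # skip blank, malformed or comment lines
            ids.append((fields[0], fields[1]))
    return ids


def compare_ids(ped_path, pheno_path):
    """Return (IDs missing from the phenotype file, IDs missing from the .ped file)."""
    ped_ids = set(read_ids(ped_path))
    pheno_ids = set(read_ids(pheno_path))
    # A header row in the phenotype file, if present, shows up in the second list.
    return sorted(ped_ids - pheno_ids), sorted(pheno_ids - ped_ids)
```

Two empty lists indicate that both files describe the same individuals. The same check can be pointed at a covariate file if it follows the same two-column ID layout.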
Input flags
-file : Denotes the file name for the PLINK .ped/.map files
-bfile : Denotes the name for PLINK .bed/.bim/.fam files
-tfile : Denotes the name for PLINK .tped/.tfam files
Note: Because the files in a set share the same base name, only that one name is required, as long as the matching flag is used (i.e. -file, -bfile or -tfile)
-pheno : Denotes the name of the phenotype file (including extension)
-out : The name of your final output file, which is placed into the same directory as your program and data unless otherwise specified
-fileSim : The name of the PLINK file set used for computing the genetic similarity matrix and its decomposition (named in the same way as the input file set, so no file extension is necessary)
Additional options
-verboseOutput : Shows more detailed output; this flag takes no argument
-covar : Denotes the name of the covariate file (including file extension)
-pValuePrintThreshold : Restricts the output file to only include SNPs with a p-value less than or equal to the specified threshold
Example command line for running FaST-LMM
fastlmmc -verboseOutput -file <filename> -fileSim <filename> -pheno <filename.txt or filename.pheno> -covar <filename.covar.txt> -out MyResults.csv -pValuePrintThreshold 0.05
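When the same analysis is run on several data sets, the command line above can be assembled in a script. This is a minimal sketch that uses only the flags listed on this page; build_fastlmmc_command and run_fastlmmc are hypothetical helper names, and any file names passed in are placeholders.

```python
import subprocess


def build_fastlmmc_command(ped_prefix, sim_prefix, pheno, out, covar=None,
                           p_value_print_threshold=None, verbose=False,
                           executable="fastlmmc"):
    """Assemble the argument list for a run on a PLINK .ped/.map file set."""
    command = [executable]
    if verbose:
        command.append("-verboseOutput")
    command += ["-file", ped_prefix, "-fileSim", sim_prefix, "-pheno", pheno]
    if covar is not None:
        command += ["-covar", covar]  # covariates are optional
    command += ["-out", out]
    if p_value_print_threshold is not None:
        command += ["-pValuePrintThreshold", str(p_value_print_threshold)]
    return command


def run_fastlmmc(command):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces. For example, build_fastlmmc_command("mydata", "mydata", "mydata.pheno", "MyResults.csv", verbose=True) produces the argument list for a run without covariates, which can then be handed to run_fastlmmc.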
Further Information
User Manual: http://nbviewer.ipython.org/github/MicrosoftGenomics/FaST-LMM/blob/master/doc/ipynb/FaST-LMM.ipynb
Paper on FaST-LMM from Microsoft Research: http://www.nature.com/nmeth/journal/v8/n10/abs/nmeth.1681.html
Source code from Microsoft Research Github: https://github.com/MicrosoftGenomics/FaST-LMM
Example Input and Output data can be found as an attachment to this page.
This tool is still in development and we are testing it currently! If you notice any issues or have any comments we would greatly appreciate them!
Please contact us at labstapleton@gmail.com. Thank you for using our tools!