FaST-LMM-2.07

 

FaST-LMM (Factored Spectrally Transformed Linear Mixed Models) is a program for performing genome-wide association studies (GWAS) on large data sets. It runs on both Windows and Linux systems, and has been tested on data sets with over 120,000 individuals. FaST-LMM is described more fully in the paper at http://www.nature.com/nmeth/journal/v8/n10/abs/nmeth.1681.html. [Description from the Codex page.]

Command Line Usage

FaST-LMM has many options; see user-manual.pdf for the full list.

Many of the standard FaST-LMM options take filename prefixes rather than actual filenames. A separate wrapper has been developed to handle those options. The wrapper adds one option, --folder=<foldername>, and automatically prepends that folder name to the FaST-LMM options that require a name prefix: -file, -tfile, -bfile, -dosage, -pheno, -filesim, -bfilesim, -tfilesim, -dosagesim, -sim, and -covar.
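
The prepending behavior described above can be sketched in a few lines of Python. This is only an illustration, not the wrapper's actual source: the function name apply_folder and the use of "/" as the path separator are assumptions, and the option list is copied from the paragraph above.

```python
import posixpath

# Options named in the text above as taking a filename prefix.
PREFIX_OPTIONS = {
    "-file", "-tfile", "-bfile", "-dosage", "-pheno", "-filesim",
    "-bfilesim", "-tfilesim", "-dosagesim", "-sim", "-covar",
}

def apply_folder(args, folder):
    """Prepend `folder` to the value that follows each prefix-style option.

    Hypothetical helper; the real wrapper may work differently.
    """
    out = []
    prepend_next = False
    for arg in args:
        if prepend_next:
            out.append(posixpath.join(folder, arg))
            prepend_next = False
        else:
            out.append(arg)
            prepend_next = arg in PREFIX_OPTIONS
    return out

args = ["-tfile", "geno_test", "-pheno", "pheno.txt", "-mpheno", "1"]
print(apply_folder(args, "input"))
```

In this sketch, options outside the list (such as -mpheno) pass through unchanged, so only prefix-style values gain the folder name.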

Test Data

Test data for this app appears directly in the Discovery Environment in the Data window under Community Data -> iplantcollaborative -> example_data -> fastlmm -> input.

Input Files

  • SNP data to be tested. (required)

    There are three possible input formats; only one may be used at a time. Follow the directions in the application when entering input data. For example, if using the regular fileset (.map/.ped), enter the inputs in the .map and .ped boxes respectively and leave all other input boxes empty. If using the transposed fileset (.tfam/.tped), enter the inputs in the labeled .tfam and .tped boxes and leave the other boxes empty. In addition, the user must go to the parameters section and check the box labeled 'Transposed files are inputs'. The same must be done when entering the binary fileset.

    1. Regular Fileset (.map/.ped).
    2. Transposed Fileset (.tfam/.tped)
    3. Binary Fileset (.bed/.bim/.fam)
  • SNP dosage files. (optional) If dosage files are used, the first two files should be omitted. SNP dosages are specified using a .dat file. Example from the documentation:

    SNP     A1  A2   Fam1 Ind1   Fam1 Ind2   Fam2 Ind3
    rs0001   A   C   0.98 0.02   1.00 0.00   0.00 0.01
    rs0002   G   A   0.00 1.00   0.00 0.00   0.99 0.01
    


    From the documentation: This file represents data for two SNPs on three individuals. The first three columns list the SNP, first nucleotide, and second nucleotide. The minor allele is coded A1 and the major allele is coded A2. Each genotype is represented by two numbers. Here, the two numbers for the first SNP represent the probability of an A/A, then an A/C genotype. The probability of a C/C is 1 minus the sum of these. The header row is optional, but if used, it must start with ‘SNP A1 A2’ and have a Family Id / Individual Id pair for each genotype probability pair. If there is no header, the genotype entries must be in the same order as found in the .fam file. Dosage files typically do not contain missing data, but -9 -9 may be used to specify a missing entry.  Additionally, a .fam file must be specified if a .dat file is used; the entries in both must correspond. An optional .map file can be used to provide additional SNP location information.

  • SNP similarity data. (optional) Used to determine the genetic similarities between individuals. It may be different from the first file but MUST have the same format as the first file.
  • Phenotype data. (optional) Uses the PLINK alternate phenotype file format. Includes at least three columns: family ID, individual ID, and phenotype value. The default missing value is -9, but it can be changed. The first two fields form a unique ID that matches the first two SNP files. An optional header row is allowed, and any number of phenotypes can be specified in different columns. An example with two phenotypes and the optional header, from the documentation:

    FID IID MyPheno YourPheno
    1 IND0 2 3.05043
    1 IND1 2 1.72797
    1 IND2 -9 4.19592
    1 IND3 2 3.4492
    1 IND4 1 -8.99843
    1 IND5 1 -0.768613
    1 IND6 2 6.73734
    ...
    
  • Set of covariates (optional). Must have at least three tab-delimited columns: family ID, individual ID, and covariate value, and may have any number of covariate values. The same missing value signifier from the third file must be used. Example from the documentation; the file should not have a header row:

    1 IND0 1
    1 IND1 1
    1 IND2 1
    1 IND3 1
    1 IND4 1
    1 IND5 -9
    1 IND6 1
    ...
    
  • Direct genetic similarities file (optional). The file must be tab-delimited and have both row and column labels for individual IDs; the value of the top-left corner of the file (first column header) should be “var”. Example from the documentation:

    var IND0 IND1 IND2  IND3 ...
    IND0 1.0 0.5 0.5 0.25 ...
    IND1 0.5 1.0 0.5 0.5 ...
    IND2 0.25 0.5 1.0 0.5 ...
    ...
    
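As an illustration of the dosage (.dat) layout described in the input list above, the following Python sketch reads one data line and recovers the third genotype probability as 1 minus the sum of the two listed values. The function name parse_dosage_line is hypothetical, and rounding the derived value to 10 decimal places is a choice made here for readability, not something FaST-LMM is known to do.

```python
def parse_dosage_line(line, missing=-9.0):
    """Parse one .dat data line.

    Returns (snp, a1, a2, genotypes). Each genotype is a tuple
    (P(A1/A1), P(A1/A2), P(A2/A2)), or None when the entry is the
    missing marker pair (-9 -9).
    """
    fields = line.split()
    snp, a1, a2 = fields[:3]
    values = [float(v) for v in fields[3:]]
    if len(values) % 2:
        raise ValueError("expected two probabilities per individual")
    genotypes = []
    for p_hom1, p_het in zip(values[0::2], values[1::2]):
        if p_hom1 == missing and p_het == missing:
            genotypes.append(None)
        else:
            # The third probability is implied by the first two.
            p_hom2 = round(1.0 - p_hom1 - p_het, 10)
            genotypes.append((p_hom1, p_het, p_hom2))
    return snp, a1, a2, genotypes

snp, a1, a2, genos = parse_dosage_line("rs0001 A C 0.98 0.02 1.00 0.00 0.00 0.01")
for g in genos:
    print(snp, a1, a2, g)
```

For the third individual of rs0001, the listed pair 0.00 0.01 implies a C/C probability of 0.99.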

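The phenotype layout from the input list above can be read with a short Python sketch such as the one below. It is a sketch under stated assumptions: the function name read_phenotypes is hypothetical, the header row is detected only when it starts with FID IID (as in the example), and the missing marker is compared as text, so "-9.0" would not be treated as missing.

```python
import io

def read_phenotypes(handle, missing="-9"):
    """Map (family ID, individual ID) to a list of phenotype values.

    Missing entries become None. Header detection is an assumption:
    only a first row starting with 'FID IID' is skipped.
    """
    table = {}
    for line_no, line in enumerate(handle):
        fields = line.split()
        if not fields:
            continue
        if line_no == 0 and fields[:2] == ["FID", "IID"]:
            continue
        fid, iid, *values = fields
        table[(fid, iid)] = [None if v == missing else float(v) for v in values]
    return table

sample = io.StringIO(
    "FID IID MyPheno YourPheno\n"
    "1 IND0 2 3.05043\n"
    "1 IND2 -9 4.19592\n"
)
phenos = read_phenotypes(sample)
print(phenos[("1", "IND2")])
```

Keying on the (family ID, individual ID) pair mirrors how the first two fields are used to match individuals across files.
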
Output Files

The expected output from the command line:

fastlmmc -tfile geno_test -tfilesim geno_cov -pheno pheno.txt -covar covariate.txt -mpheno 1

can be found at:

Community Data / iplantcollaborative / example_data / fastlmm / output

From the program documentation, the possible output fields, both the standard columns and those added by -verboseOut, are:

SNP

The rs# or SNP identifier for the SNP tested. Taken from the PLINK file.

Chromosome

The chromosome identifier for the SNP tested or 0 if unplaced. Taken from the PLINK file.

GeneticDistance

The location of the SNP on the chromosome. Taken from the PLINK file. Any units are allowed, but typically centimorgans or morgans are used.

Position

The base-pair position of the SNP on the chromosome (bp units). Taken from the PLINK file.

Phenotype

[under --verboseOut] The name of the phenotype as specified in the header of the phenotype file. NoName means that no header row was specified.

Pvalue

The p-value computed for the SNP tested.

Qvalue

The q-value computed for the SNP tested, estimated from the p-values of all test SNPs in the PLINK file using the procedure of Benjamini and Hochberg.

N

The sample size, or number of individuals that have been used for this analysis.

NumSNPsExcluded

[under --excludeByGeneticDistance]

IndexExclusionStart

[under --excludeByGeneticDistance]

DOF

[under --verboseOut] The degrees of freedom of the statistical test

NullLogLike

The log likelihood of the null model

AltLogLike

The log likelihood of the alternative model

SnpWeight

The fixed-effect weight of the SNP

SnpWeightSE

The standard error of the SnpWeight

WaldStat

The Wald statistic of the SnpWeight

NullLogDelta

The ratio between the residual variance and the genetic variance on the null model

NullGeneticVar

The genetic variance on the null model

NullResidualVar

The residual variance on the null model

NullBias

The offset term in the null model

LogDelta

[under --verboseOut] The ratio between the residual variance and the genetic variance on the alternative model

geneticVar

[under --verboseOut] The genetic variance on the alternative model

ResidualVar

[under --verboseOut] The residual variance on the alternative model

NullBias_

[under --verboseOut] The offset term in the alternative model

SNPIndex

The column index of the SNP tested in the PLINK file starting at 1

SNPCount

The number of SNPs tested
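
For readers unfamiliar with the Benjamini and Hochberg procedure mentioned under Qvalue, the following Python sketch shows the standard textbook computation of adjusted p-values. It is not taken from the FaST-LMM source; the function name bh_qvalues is hypothetical, and FaST-LMM's own implementation may differ in details such as tie handling.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues

print(bh_qvalues([0.01, 0.04, 0.03, 0.20]))
```

Each p-value is scaled by the number of tests divided by its rank, and the running minimum ensures that a SNP with a smaller p-value never receives a larger q-value.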

Rounding Issues

FaST-LMM sometimes shows slightly different results on different platforms due to rounding issues. The example data available in the Discovery Environment was generated in the Discovery Environment and will likely vary in a very minor way from results run locally.
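
When checking a local run against the example output, a tolerance-based comparison is usually more useful than an exact one. The Python sketch below compares the Pvalue column of two result tables. It assumes the tables are tab-delimited with a header row containing SNP and Pvalue; that layout, the function name compare_pvalues, and the tolerance values are all assumptions to adjust for the files at hand.

```python
import csv
import io
import math

def compare_pvalues(local, reference, rel_tol=1e-6, abs_tol=1e-12):
    """Return the SNP ids whose Pvalue differs beyond the tolerance."""
    def load(handle):
        # Assumption: tab-delimited, header row with SNP and Pvalue columns.
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["SNP"]: float(row["Pvalue"]) for row in reader}

    a, b = load(local), load(reference)
    return sorted(
        snp for snp in a.keys() & b.keys()
        if not math.isclose(a[snp], b[snp], rel_tol=rel_tol, abs_tol=abs_tol)
    )

local = io.StringIO("SNP\tPvalue\nrs1\t0.0123456789\nrs2\t0.5\n")
reference = io.StringIO("SNP\tPvalue\nrs1\t0.0123456790\nrs2\t0.4\n")
print(compare_pvalues(local, reference))
```

Here rs1 differs only in the tenth decimal place and is treated as equal, while rs2 is reported as a genuine mismatch.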

Tool Source