Qxpak

"Mixed models have a long and fruitful history in statistics. They are pertinent to genomics problems because they are highly versatile, accommodating a wide variety of situations within the same theoretical and algorithmic framework.

Qxpak is a package for versatile statistical genomics, specifically designed for sophisticated quantitative trait loci and association analyses. Multiple loci, multiple trait, infinitesimal genetic effects, imprinting, epistasis or sex linked loci can be fitted. The new version (v. 5) allows us, among other new features, to include either relationship matrices obtained with molecular information or user defined matrices that can be read from an input file. This feature can be used for genome selection or - more importantly - to correct for population structure in association studies. In crosses, two parental lines, not necessarily inbred, can be accommodated."

[Overview from Pérez-Enciso and Misztal; see that publication for a detailed overview of Qxpak. The software and full documentation are available at Institució Catalana de Recerca i Estudis Avançats.]

Qxpak is also available in the Atmosphere image Validate Workflow v.07. Launch an instance from that image and call qxpak from its command line.


Command Line Usage

Normally in Qxpak you edit the parameter file so that it names the specific input files for each data analysis run. This tutorial is built around a wrapper script written to make analyses with many different inputs easier. Instead of altering the parameter file each time, you simply change the inputs on the command line.

(Development note: the Validate image does not currently have the qxpakwrapper.py script installed. To install it, download a copy of the script from the links below and upload it to your instance.)

1. Download qxpakwrapper.py and copy it to your instance.

2. Download and copy the input files from the Discovery Environment, which can be found here.

3. Run the following command, making sure that all the input files you name are in your current working directory.

(This example uses all the options and all the example inputs, and gives the output results file a .csv extension so that it is easier to read.)

(Note: output is written directly to your current working directory, and many files are produced.)

Code Block
qxpakwrapper.py -p parameterFile.par -d dataFile.dat -g pedigreeFile.ped -m markerFile.mkr -i userInverse.inverse -t UserDirect.direct -h haplotypes.haplo -o results.csv


4. After you paste this line into your terminal and press Enter, a large amount of output scrolls through the terminal. When the Qxpak analysis has finished, the terminal prompt returns.


5. An example of the output that will be created can be found in the Discovery Environment here.

Options

-p, --par

Name of the parameter file, which uses the $ markers described under Parameter File.

-d, --data

Name of the data file; the replacement for $data.

-g, --ped

Name of the pedigree file; the replacement for $pedigree.

-m, --mkr

Name of the marker file; the replacement for $marker.

-i, --uInv

Name of the user-defined inverse file; replacement for $userInverse.

-t, --uDir

Name of the user-defined direct file; replacement for $userDirect.

-h, --haplo

Name of the haplotype file; replacement for $haplotype.

-o, --output

Name of the output file; defaults to result.txt. Several other files may be generated depending upon the parameter file.
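
Because each option above maps onto one file, the command can be assembled programmatically when many analyses are run. The following is a minimal sketch and is not part of qxpakwrapper.py: the helper names build_command and missing_inputs and the dictionary keys are hypothetical, and it assumes the wrapper accepts a subset of the options. Only the short flags listed above are used.

```python
import os

# Short flags from the Options section, in the order used by the example command.
FLAGS = [
    ("par", "-p"),
    ("data", "-d"),
    ("ped", "-g"),
    ("mkr", "-m"),
    ("uInv", "-i"),
    ("uDir", "-t"),
    ("haplo", "-h"),
    ("output", "-o"),
]


def build_command(files, script="qxpakwrapper.py"):
    """Return an argument list for the wrapper; options not supplied are skipped."""
    command = [script]
    for key, flag in FLAGS:
        if key in files:
            command.extend([flag, files[key]])
    return command


def missing_inputs(files, directory="."):
    """Return the input files (everything except the output) not found in directory."""
    return [
        name
        for key, name in sorted(files.items())
        if key != "output" and not os.path.isfile(os.path.join(directory, name))
    ]


files = {"par": "parameterFile.par", "data": "dataFile.dat", "output": "results.csv"}
print(" ".join(build_command(files)))
# prints: qxpakwrapper.py -p parameterFile.par -d dataFile.dat -o results.csv
```

Calling missing_inputs(files) before launching the run gives an early warning when an input file is not in the current working directory.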

Example Data

Test data can be found at:

Code Block
/iplant/home/shared/iplantcollaborative/example_data/qxpak/input

Results produced with the example parameter file and the relevant input files can be found at:

Code Block
/iplant/home/shared/iplantcollaborative/example_data/qxpak/output

Input Files

Parameter File

The Qxpak documentation includes all the necessary details for specifying the parameter file, which, as the name implies, supplies the program parameters. Prepare a parameter file and upload it to the Data Store along with the other input files. The wrapper program replaces some special markers, so filenames do not need to be (and should not be) hardcoded into the parameter file. Use the following markers; they are replaced with the command line arguments.

$data

Data file

$pedigree

Pedigree file

$marker

Marker file

$userInverse

User inverse - user-defined covariance matrix file

$userDirect

User direct - user-defined covariance matrix file

$haplotype

Haplotype file

$output

Output file
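
To illustrate the idea of marker replacement, here is a minimal sketch using Python's string.Template. It is an assumption about how such a substitution could be done, not the actual code of qxpakwrapper.py; the function name fill_markers is hypothetical, and the template lines are placeholders rather than real Qxpak parameter syntax.

```python
from string import Template


def fill_markers(parameter_text, filenames):
    """Replace $data, $pedigree, etc. with filenames.

    Markers without a supplied filename are left untouched.
    """
    return Template(parameter_text).safe_substitute(filenames)


# Placeholder text only; consult the Qxpak manual for real parameter file syntax.
template_text = "placeholder line using $data\nplaceholder line using $haplotype\n"
filled = fill_markers(template_text, {"data": "dataFile.dat"})
print(filled)
```

In this sketch $data becomes dataFile.dat, while $haplotype stays as written because no haplotype file was supplied.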

Data File

The data file (.dat) is a free-format file without a header. The first column always contains the individual, which may be numeric or alphanumeric. Record order is not important. Subsequent columns include traits and effects and may include more than are used in the model.

Missing values must be coded as 0. If the actual value is 0, recode it as 0.000001.
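
The recoding rule above is easy to get wrong by hand, so a small helper can apply it while records are written. This is a sketch under stated assumptions: the function names are hypothetical, missing values are represented as None in the caller's data, and fields are separated by single spaces.

```python
def encode_value(value):
    """Encode one trait or effect value for the data file.

    None (missing) becomes "0"; a true zero becomes "0.000001";
    anything else is written unchanged.
    """
    if value is None:
        return "0"
    if isinstance(value, (int, float)) and value == 0:
        return "0.000001"
    return str(value)


def format_record(individual, values):
    """Build one space-delimited data record with the individual first."""
    return " ".join([str(individual)] + [encode_value(v) for v in values])


print(format_record("A101", [12.5, None, 0, 3]))
# prints: A101 12.5 0 0.000001 3
```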

Pedigree File

A pedigree file is required for quantitative trait locus (QTL) analysis. This file includes the individual, father, mother, sex, and breed. The last two are optional if analyzing non-sex chromosomes or within-breed populations. Individuals do not have to be coded; missing parents should be indicated with a 0. Breed is irrelevant unless at least one parent is unknown.
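
As a sketch of how pedigree records might be written, the helper below fills unknown parents with 0. It assumes the columns appear in the order listed above (individual, father, mother, sex, breed) and are space-delimited; the function name and the sex and breed codes in the usage line are hypothetical, so check the Qxpak manual for the accepted codes.

```python
def pedigree_record(individual, father=None, mother=None, sex=None, breed=None):
    """Return one space-delimited pedigree record; unknown parents are written as 0."""
    if breed is not None and sex is None:
        # Breed follows sex, so it cannot be written without a sex column.
        raise ValueError("sex must be supplied when breed is given")
    fields = [
        individual,
        father if father is not None else 0,
        mother if mother is not None else 0,
    ]
    if sex is not None:
        fields.append(sex)
    if breed is not None:
        fields.append(breed)
    return " ".join(str(field) for field in fields)


print(pedigree_record("calf7", father="sire2"))
# prints: calf7 sire2 0
```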

Marker File

Two formats are available: “usual” and “transposed”. In the usual format, the first record contains the chromosome name and successive records contain: individual, allele1_mkr1, allele2_mkr1, etc. Missing alleles are specified by 0.

The transposed format is appropriate when there are many more markers than individuals. In this format, the first row is a list of individual codes, and successive rows contain: SNP_name, chr_number, ind1_allele1, ind1_allele2, ind2_allele1, ind2_allele2, etc. Unknown markers should be coded as 0, and chromosomes must have numbers rather than names; markers should be arranged by chromosome 1, 2, etc.
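
The transposed layout can be checked with a short parser before a run. The sketch below assumes whitespace-delimited fields; the function name read_transposed and the allele codes in the example are hypothetical.

```python
def read_transposed(lines):
    """Parse transposed marker records into {individual: {snp: (allele1, allele2)}}."""
    rows = [line.split() for line in lines if line.strip()]
    individuals = rows[0]  # first row: list of individual codes
    genotypes = {individual: {} for individual in individuals}
    for fields in rows[1:]:
        snp, alleles = fields[0], fields[2:]
        int(fields[1])  # chromosomes must be numbers; raises ValueError otherwise
        if len(alleles) != 2 * len(individuals):
            raise ValueError(
                "%s: expected %d alleles, found %d"
                % (snp, 2 * len(individuals), len(alleles))
            )
        for k, individual in enumerate(individuals):
            genotypes[individual][snp] = (alleles[2 * k], alleles[2 * k + 1])
    return genotypes


example = ["ind1 ind2", "snpA 1 1 2 0 0", "snpB 1 2 2 1 2"]
genotypes = read_transposed(example)
print(genotypes["ind2"]["snpA"])
# prints: ('0', '0')
```

Here ind2 has an unknown genotype at snpA, coded as 0 for both alleles.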

User-Defined Covariance Matrix Files

One or two files can be included to allow for the inclusion of random effects distributed as N(0,V), where V can be any positive definite matrix stored in the file. The matrix is then inverted to obtain random effects predictions; the inverse can also be supplied to save computation. The parameter file must be modified appropriately to apply the effects to specific columns.

The format of these files is: row, column, value in space-delimited form like the other files.
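
A matrix held in memory can be written in this row, column, value layout with a few lines of code. The sketch below makes several assumptions that should be checked against the Qxpak manual: indices are 1-based, every element is written (rather than one triangle only), and the function name matrix_to_triplets is hypothetical.

```python
def matrix_to_triplets(matrix):
    """Yield 'row column value' lines, using 1-based indices, for every element."""
    for i, row in enumerate(matrix, start=1):
        for j, value in enumerate(row, start=1):
            yield "%d %d %s" % (i, j, value)


# A small symmetric positive definite example matrix.
V = [[1.0, 0.5],
     [0.5, 1.0]]
for line in matrix_to_triplets(V):
    print(line)
```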

Haplotype File

Contains known haplotypes, if any. The first record contains the name of the chromosome. Successive records include the individual and the order of markers where phases are known. If several chromosomes are analyzed, the format should be repeated for each.

Output Files

q.0

Contains running output that might be useful for, among other things, checking convergence.

Primary Output File

A variety of different results are reported in the same file.

Haplotype Output File

If the applicable section in the parameter file is specified, the haplotypes sampled at each MCMC iteration are written. The format is: chromosome, MCMC_iteration, individual, phase, alleles.
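
For downstream summaries, each record of this file can be split into named fields. This is a sketch that assumes whitespace-delimited records in which the alleles occupy all fields after the fourth; the names HaplotypeSample and parse_haplotype_line and the example record are hypothetical.

```python
from collections import namedtuple

HaplotypeSample = namedtuple(
    "HaplotypeSample", "chromosome iteration individual phase alleles"
)


def parse_haplotype_line(line):
    """Split one record into chromosome, MCMC iteration, individual, phase, alleles."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError("expected at least 5 fields, found %d" % len(fields))
    chromosome, iteration, individual, phase = fields[:4]
    return HaplotypeSample(chromosome, int(iteration), individual, phase, fields[4:])


sample = parse_haplotype_line("1 250 ind3 1 1 2 2 1")
print(sample.iteration, sample.alleles)
```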

Z Files

Z files contain the IBD (identity by descent) probabilities or SNP configurations.

Other Output Files

There are numerous other undocumented output files.

Tool Source

              Qxpak           Wrapper
Binary        Qxpak           N/A
Source        Qxpak           qxpakwrapper.py (Python 2.7)
Version       5.0.5           1.1
User Guide    Qxpak Manual    qxpakwrapper.py_usage.txt
Requires      N/A             N/A
JSON          N/A
