QTL Reaper

What is the QTL Reaper?

QTL Reaper is command line software for rapidly mapping quantitative trait loci from large datasets with many phenotypes per genotype (per subject or inbred strain) such as those generated using microarrays, where every gene expression level measurement may be treated as a phenotype. This program is a batch-mode version of WebQTL and is written in C and compiled as a Python module.

QTL Reaper requires as input a phenotype data set and genotypes for a mapping population such as set of recombinant inbred strains, an F2 intercross, or a backcross. The program computes linkage between each trait and all genotypes, evaluating the P-value of linkage by permutation. For each trait, QTL Reaper ramps up the number of permutations as necessary to define the empirical P-value to a reasonable level of precision. It also performs bootstrap resampling to estimate the confidence region for the location of putative QTLs.

How to Setup and Use QTL Reaper

1. Download QTL Reaper from SourceForge.

2. Building and installing QTL Reaper in Mac, Linux, or Windows (using cygwin with Python installed):
    a) Open a terminal or command prompt.
    b) Gunzip and untar the package file, then change directories into the qtlreaper directory in your terminal.
    c) To compile qtlreaper module, execute:python setup.py build. Administrator privileges are needed to install qtlreaper, as it is a global module for python. To compile qtlreaper on a PC, Visual Studio 7.1+ (or its Python-component libraries) is required. The setup.py script may be modified if you wish to use your own blas and lapack libraries. To install the qtlreaper module, execute: sudo python setup.py install (followed by entering your administrator password).
    d) Check that the library reaper.so has been installed in a location that is included in the Python path. You can check the Python path when Python is running by first starting Python (simply type python in your command prompt and hit enter). Once in Python, which you can tell by a >>> prompt, enter the Python commands import sys and sys.path. Under Mac OS X 10.12, the standard location will be something like /Library/Python/2.7/site-packages. QTLreaper should be installed in that directory (i.e. it should be located at /Library/Python/2.7/site-packages/qtlreaper/).
    e) Run Python, if you have not already, by typing python at the terminal prompt. If reaper.so is installed properly, it is not necessary to change to a specific directory before running Python. Alternately, the user may navigate to the directory with reaper.so, then start python, then type 'import reaper' for the same functionality, without requiring the module be installed with administrator privileges.

3. Instructions for Using QTL Reaper
    a) Once your terminal is running Python (denoted by >>> at the command prompt), import qtlreaper by typing import reaper in the terminal.
         From here, the user can create a data set into which data can be placed by executing the following command. This command creates a Dataset object which, in this example, we call mydataset. In place of mydataset, you can use any name that begins with a letter.
         mydataset = reaper.Dataset().
         At this point a properly formated file can be read into the newly created Dataset object using the command:
         mydataset.read("path/to/file").
    b) The user now may examine a number of data fields in the Dataset object by simply typing their names in the format mydataset.command. This usage is standard Python syntax for displaying the data of an object.
          The names of these data fields are:

type: Displays the type of dataset (RI Set or Intercross or Backcross, etc.)
name: Displays the name of the mapping population (BXD, B6D2F2, CXBRIX, etc.)
prgy: Lists the names of the individuals or strains
nprgy: Lists the number of individuals or strains
pat: Shows the paternal parent type or strain
mat: Shows the maternal parent type or strain
nloci: Displays the number of marker loci used to map

c) Data related to a specific chromosome or a specific marker on a chromosome can be displayed by using the following syntax, where chr# represents a chromosome number and marker# represents a marker number. The syntax, which is standard Python sytax applied to a Dataset object is:
mydataset[chr#].name or mydataset[chr#][marker#].name, where the possible data field names include:

mydataset[chr#].name: Displays the name of the selected chromosome
[0]: Indicates the first chromosome
[-1]: Used as a shortcut to the last chromosome
mydataset[chr#][marker#].name: Displays the marker at the selected array location
[0][0]: Indicates the first chromosome and first marker location
mydataset[chr#][marker#].cM: Displays the location of the marker in cM
mydataset[chr#][marker#].Mb: Displays the location of the marker in Mb (if available)
mydataset[chr#][marker#].genotype: Shows the genotype (scored as - 1, 0, and +1) of the elements in the array
mydataset[chr#][marker#].genotext: Describes the selected elements in the array as maternal, heterozygous, or paternal genotypes
len(mydataset[chr#]): Displays the number of markers in the selected chromosome

d) Manually putting data into QTL Reaper

NewData = ['String1', 12.5, 'String2']: Creates a list containing any number of strings and/or floats - quotes are needed between each string element, but not around a numerical element.

NewData2 = 'Thing1', 3.141, 'Thing2': Creates a tuple containing any number of strings and/or floats - the same as above, but without square brackets around the sides.

e) Generating new tables and data

NewVariableName = YourVariableName.addinterval(): Creates a new object, calculating intervals at every 1cM

NewVariableName.nloci: Displays the number of loci in the newly calculated interval.

Chrom_No_Variable = reaper.Dataset(): Creates a new object that is a duplicate of the original YourVariableName.

Chrom_No_Variable.chromosome = [YourVariableName[#]]: Extracts a specific chromosome out of YVN and sets ChromVar equal to it.

f) Calculating regression, QTLs, bootstrap tests, permutations, and group variance
(All variable names with 'x' below are examples and the entire variable name can be changed.)

QTLx = YourVariableName.regression(strains = NewData, trait = NewData2, variance = NewData3, control="NewData4": Sets up a QTL with data to be analyzed. Variance and control are optional controls.

QTLx.sort(): Sorts the QTL data by absolute value of the LRS.

max(QTLx): Displays the data for the highest QTL

max(QTLx).lrs: Gives the exact LRS value for the highest QTL

max(QTLx).locus: Shows the locus data

min(QTLx): Displays the data for the lowest QTL

min(QTLx).lrs: Gives the exact LRS value for the lowest QTL

min(QTLx).locus: Shows the locus data

bootX = YourVariableName.bootstrap(strains = newData, trait = NewData2, variance = newData 3, nboot = #): Does a bootstrap test on selected data. Variance is an optional control, nboot can be set to any positive integer.

permuX = YourVariableName.permutation(straints = newData, trait = NewData2, variance = newData3, nperm = #, thresh = #): Returns a list of highest LRS from each permutation, ascending. Variance, nperm, and thresh are all optional. Thresh and nperm should be set as positive integers if either is used.

pvalueX = reaper.pvalue(max(QTLx).lrs, permuX): Calculates the p-value for the selected permutation.

ANOVA: ANalysis Of VAriance between groups, a two-step function.

listvariable = [3, 2.1, 5, 3.2, 1.3, 3.6, 1.6, 3.9, 4.1, 2]: Creates the list that will be analyzed.

mean, median, variance, stdev, stderr, N = reaper.anova(listvariable): Does the group calculations. The calculations are then displayed by entering them as an individual command (i.e. mean or median).

g) Reading from an Input file, such as trait data. (Sample trait.txt is included in the QTLReaper download's Example folder) Below: This pulls the first line of File.txt and puts it into 'header' and the second line into 'fileData.'

import string
newFile = open("..Path/..To/File.txt")
header = newFile.readline()
header = string.split(header)
header = map(string.strip, header)
strains = header[1:]
fileData = newFile.readline()
fileData = string.split(fileData)
fileData = map(string.strip, fileData[1:])
fileData = map(float, fileData)

4. Formatting Input Files:

The primary input files for QTLReaper must be in an exact, specific format to work properly. Before exporting this data, ensure that your first three columns of data are in this order: Chromosome, Locus, and cM. Mb location may be added as an optional fourth column, though it cannot be a substitute for cM. It must be a plain text, tab-delimited document with Unix line breaks. Microsoft Excel, BBEdit, and other applications will have the option to export as a tab-delimited text document, most likely under the "save as" option. Many applications have the option to change the line breaks to Unix (in BBEdit, for example, in the Options section of "Save As"). This can also be conveniently accomplished in Windows by using fromdos.exe, a small free utility available here.
Once the file is exported, the data may be pushed down a few lines to make room for (optional) header information. (Remember to convert these line breaks to Unix style if adding the header information on a Windows or Macintosh machine!) At the top, above all the data, the user may add the information that is called by the .type, .name, and .mat/.pat commands listed above. This is done by: @type:, @name:, and @mat/@pat: with stating the necessary information following the colon (no space). All these declarations should be placed on separate lines.
BXD.txt and BXD2.txt are examples of properly formatted files, and are contained in the QTLReaper download's Example folder. A larger, non-sample-only dataset is available for download (722k, its Macintosh line breaks will have to be converted to Unix line breaks to function properly.)

5. Example:

#Import the reaper module
import reaper
#Initiate a Python Dataset object
bxdGeno = reaper.Dataset()
#read genotype information from a file
#argument is a string containing the filename and location
#file is a tab-delimited text file
bxdGeno.read("../../Example/BXD.txt")
################
# Dataset Object
################
#dataset type (RI Set or Intercross)
bxdGeno.type
#dataset name
bxdGeno.name
#progeny
bxdGeno.prgy
#number of progeny
bxdGeno.nprgy
#parent 1
bxdGeno.pat
#parent 2
bxdGeno.mat
#Number of Loci
bxdGeno.nloci
#Number of Chromosome
len(bxdGeno[0])
#1st Chromosome
bxdGeno[0].name
#last Chromosome
bxdGeno[-1].name
#1st Locus
bxdGeno[0][0].name
#last Locus
bxdGeno[-1][-1].name
#Locus cM
bxdGeno[-1][-1].cM
#Whether physical info (Mb) are available
bxdGeno[-1][-1].Mbmap
#Locus Mb
bxdGeno[-1][-1].Mb, if physical info are available
#Locus genotype in numbers
bxdGeno[-1][-1].genotype
#Locus genotype in abbrevs
bxdGeno[-1][-1].genotext
################
# Interval Map
################
#Calculate intervals at 1cM, generate a new dataset object 
bxdIntervalGeno = bxdGeno.addinterval()
#Number of Loci
bxdIntervalGeno.nloci
################
# individual chromosome
################
#You can create a new dataset object which contains
only one chromosome for individual chromosome analyses 
chr_1_bxdGeno = reaper.Dataset()
chr_1_bxdGeno.chromosome = [bxdGeno[0]]
#Number of Loci
chr_1_bxdGeno.nloci
################
# Member functions
################
#Most reaper functions require two lists as inputs
#the first list is the case or strain list; the second is the trait values list
#the two lists should have the same number of members
#the strain list should have the same cases or strains as those listed in the header 

 of the genotype file or should be a subset of those
strains =['BXD1', 'BXD2', 'BXD5', 'BXD6', 'BXD8', 'BXD9', 'BXD11', 'BXD12', 'BXD13',
          'BXD14', 'BXD15', 'BXD16', 'BXD18', 'BXD19', 'BXD20', 'BXD21', 'BXD22',
          'BXD23', 'BXD24', 'BXD25', 'BXD27', 'BXD42']
trait =  [53.570 ,63.885 ,56.700 ,61.750 ,66.325 ,65.150 ,60.400 ,57.920 ,
          51.925 ,62.350 ,67.175 ,65.850 ,52.425 ,60.925 ,65.350 ,56.750 ,
          59.750 ,57.888 ,60.250 ,64.433 ,57.125 ,63.600]
#vaiance list will be used in weighted regression (using 1/variance as the weights)
variance = [0.777, 0.108, 1.78, 1.18, 0.370, 0.808, 1.549, 0.710, 0.257, 1.482, 1.816, 
            0.711, 1.204, 0.059, 0.182, 0.591, 0.357, 0.072, 0.490, 0.239, 0.905, 1.327]
#genotypes of a control locus as a cofactor in the regression
control = "D1Mit1"
################
# Regression
################
#the result is a list of QTL Objects
#simple regression (no variance and no control cofactor)
qtl1 = bxdGeno.regression(strains = strains, trait = trait)
#weighted regression (variance weighting but no control cofactor)
qtl2 = bxdGeno.regression(strains = strains, trait = trait, variance = variance)
#composite regression (control cofactor with or without variance weighting)
qtl3 = bxdGeno.regression(strains = strains, trait = trait, control = "D1Mit1")
qtl4 = bxdGeno.regression(strains = strains, trait = trait, variance = variance, control = "D1Mit1")
#maximum QTL
max(qtl1)
max(qtl2)
max(qtl3)
max(qtl4)
#maximum LRS
max(qtl1).lrs
max(qtl2).lrs
max(qtl3).lrs
max(qtl4).lrs
#Locus with maximum LRS
max(qtl1).locus
max(qtl1).locus.name
#sort the qtl list
qtl1.sort()
qtl2.sort()
qtl3.sort()
qtl4.sort()
################
# Permutation
################
#returns a list of highest LRS value for each permutation in ascending order
#fixed number permutations
permu1 = bxdGeno.permutation(strains = strains, trait = trait,nperm=1000)
permu2 = bxdGeno.permutation(strains = strains, trait = trait, variance = variance,nperm=1000)
#progressive permutation
#keep on doing permutations until the threshold LRS is not in the top 10 
#or the total number of permutations reaches 1,000,000 
permu3 = bxdGeno.permutation(strains = strains, trait = trait, thresh = 23)
#calculate p-value
pv1 = reaper.pvalue(max(qtl1).lrs, permu1)
################
# Bootstrap
################
#returns a list of counts of times that a locus has the highest LRS score
#the length of the list equal to the total number of markers
boot1 = bxdGeno.bootstrap(strains = strains, trait = trait, nboot=1000)
boot2 = bxdGeno.bootstrap(strains = strains, trait = trait, variance = variance, nboot=1000)
################
# Anova 
################
# A simple ANOVA (ANalysis Of VAriance between groups) function
list = [1, 2, 4.1, 2, 4, 3.2, 1.1, 5, 5.6, 7.1, 2.3, 4.3, 3.6]
mean, median, variance, stdev, stderr, N = reaper.anova(list)
#######################
# Read from Input file
#######################
#It is easy to read from tab-delimited input file to generate the strain and trait value lists
#use the included trait.txt as example
import string
fp = open("../../Example/trait.txt")
header = fp.readline()
header = string.split(header)
header = map(string.strip, header)
#strip any blank characters
strains = header[1:]
#the header here is the strain list
trait = fp.readline()
trait = string.split(trait)
trait = map(string.strip, trait[1:])
#strip any blank characters
trait = map(float, trait)
#the trait here is the trait valuelist

About this text file

QTL Reaper Tutorial written by Evan Williams, July 2005. Edits by EGW, Aug 4. 2005; KFM, March 20, 2006. Last edit by KFM, March 20, 2006.