Download datasets and supplementary data files.
BXD genotype current revision 050423
BXD genotype 2017
BXD genotype 2001-2016
(Updated July 1, 2022 by D. Ashbrook)
All variants are publicly available, so anyone can extract whatever type and frequency of variant they need. The variant VCF is listed under the analysis files of project PRJEB45429: https://www.ebi.ac.uk/ena/browser/view/PRJEB45429?show=analyses
(Updated March 15, 2018 by RW Williams)
BXD Genotypes file status (January 2017): From September 2016 to January 2017, Robert Williams, Jesse Ingels, Lu Lu, and Danny Arends released a new genotype file for the original BXD strains (BXD1 through BXD102) and for all of the new strains (BXD104 to BXD220). Version 1 of this genotype file (used from January 2017 to March 13, 2018) contained data for 7324 markers and 198 strains. Version 2, released March 14, 2018, fixed some marker location errors detected by Karl Broman (five markers were out of order in the latest mouse genome assembly). We deleted three markers and retained a final set of 7321 markers, now all in correct order based on SNP position in the mm10 assembly.
Of the 198 BXD strains, 191 are independent, whereas 7 are substrains (e.g., BXD48 and BXD48a). The file provides approximate locations of 10300 recombinations, an average of 52 per strain. Genotypes were generated using Affymetrix, MUGA, MegaMUGA, and GigaMUGA Illumina platforms. Microsatellites and eQTL genotypes were generated by the Williams/Lu laboratory. Unknown genotypes were imputed as B or D, or were called as H (heterozygous) if the genotype was uncertain. Genotypes were manually curated by RW Williams. Genotypes were smoothed to remove unlikely recombination events. Almost all recombinations are supported by multiple markers, although only one or two representative markers may be provided in this file. The original parent file (BXD_El_Grande_Master_Used_to_Proof_Final_Genotypes_2016.xlxs) contains data for approximately 37000 markers. Genotypes for Chr Y and Chr M are provisional and will be verified in 2017. As of 2016, many strains with higher numbers (BXD100 and above) are not fully inbred.
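The recombination counts quoted above (about 10300 in total, or roughly 52 per strain across 198 strains) can be approximated for any one strain by counting genotype switches between consecutive markers on each chromosome. The following is a minimal sketch, not the procedure used by the curators. The function name and the example genotype string are hypothetical, and skipping every call other than B or D is an assumption made for illustration.

```python
def count_recombinations(genotypes):
    """Count B<->D switches along one chromosome for one strain.

    `genotypes` is a sequence of calls ordered by marker position.
    Calls other than 'B' or 'D' (for example 'H') are skipped,
    which is an assumption made for this sketch.
    """
    switches = 0
    previous = None
    for call in genotypes:
        if call not in ("B", "D"):
            continue
        if previous is not None and call != previous:
            switches += 1
        previous = call
    return switches


# Hypothetical calls for one strain on one chromosome.
example = "BBBBDDDHDDBBB"
print(count_recombinations(example))
```

Summing this count over all chromosomes gives a per-strain total that can be compared with the average reported above.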
A link to the genotype file is provided here
Genotypes were generated at GeneSeek (Neogen Inc) with financial support from the University of Tennessee Center for Integrative and Translational Genomics. We thank Drs. Fernando Pardo-Manuel de Villena (University of North Carolina) and Gary Churchill (The Jackson Laboratory) for developing the GigaMUGA array.
The new genotypes are now available in GeneNetwork as the 2017 Genotype file. All SNPs were mapped to the newer Dec 2011, mm10, GRCm38 assembly.
As of Jan 2017 GeneNetwork uses mm10 coordinates for mapping functions. Older mm9 versions of GeneNetwork are available on the GN TimeMachine (see upper right side of Search page).
BXD Genotype: The state of a gene or DNA sequence, usually used to describe a contrast between two or more states, such as that between the normal state (wildtype) and a mutant state (mutation) or between the alleles inherited from two parents. All species that are included in GeneNetwork are diploid (derived from two parents) and have two copies of most genes (genes located on the X and Y chromosomes are exceptions). As a result, the genotype of a particular diploid individual is actually a pair of genotypes, one from each parent. For example, the offspring of a mating between strain A and strain B will have one copy of the A genotype and one copy of the B genotype and therefore have an A/B genotype. In contrast, offspring of a mating between a female strain A and a male strain A will inherit only A genotypes and have an A/A genotype.
Genotypes can be measured or inferred in many different ways, even by visual inspection of animals (e.g., as Gregor Mendel did long before DNA was discovered). The typical method now is to directly test DNA that has a well-defined chromosomal location and that has been obtained from one or, usually, many cases, using molecular tests that often rely on polymerase chain reaction steps and sequence analysis. Each case is genotyped at many chromosomal locations (loci, markers, or genes). The entire collection of genotypes (as many as 1 million for a single case) is also sometimes referred to as the case's genotype, but the word "genometype" might be more appropriate to highlight the fact that we are now dealing with a set of genotypes spanning the entire genome (all chromosomes) of the case.
For gene mapping purposes, genotypes are often translated from letter codes (A/A, A/B, and B/B) to simple numerical codes that are more suitable for computation. A/A might be represented by the value -1, A/B by the value 0, and B/B by the value +1. This recoding makes it easy to determine if there is a statistically significant correlation between genotypes across a set of cases (for example, an F2 population or a Genetic Reference Panel) and a variable phenotype measured in the same population. A locus at which genotypes correlate sufficiently strongly with a phenotype is referred to as a quantitative trait locus (QTL). If the correlation is almost perfect (r > 0.9), then the locus is usually referred to as a Mendelian locus. Despite the fact that we use the term "correlation" in the preceding sentences, the genotype is actually the cause of the phenotype. More precisely, variation in the genotypes of individuals in the sample population causes the variation in the phenotype. The statistical confidence of this assertion of causality is often estimated using LOD and LRS scores and permutation methods. If the LOD score is above 10, then we can be extremely confident that we have located a genetic cause of variation in the phenotype. While the location (the locus) is usually defined with a precision ranging from 10 million to 100 thousand base pairs, the individual sequence variant that is responsible may be quite difficult to identify. Think of this in terms of police work: we may know the neighborhood where the suspect lives, we may have clues as to identity and habits, but we still may have a large list of suspects.
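The recoding and correlation described above can be illustrated with a short sketch. It uses the -1, 0, +1 codes from the text, computes a Pearson correlation between genotype codes and a phenotype, and converts it to LRS and LOD using the standard single-marker regression relationships LRS = -n ln(1 - r^2) and LOD = LRS / (2 ln 10). These formulas are textbook relationships and are not quoted from this page. The function names and the example data are hypothetical.

```python
import math

# Numerical codes suggested in the text above.
CODES = {"A/A": -1.0, "A/B": 0.0, "B/B": 1.0}


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def marker_regression(genotypes, phenotypes):
    """Return (r, LRS, LOD) for one marker and one phenotype."""
    x = [CODES[g] for g in genotypes]
    r = pearson_r(x, phenotypes)
    n = len(x)
    lrs = -n * math.log(1.0 - r * r)      # likelihood ratio statistic
    lod = lrs / (2.0 * math.log(10.0))    # LRS is about 4.61 times LOD
    return r, lrs, lod


# Hypothetical genotypes and phenotype values for eight cases.
genotypes = ["A/A", "A/A", "B/B", "B/B", "A/A", "B/B", "A/B", "B/B"]
phenotypes = [10.1, 9.8, 12.3, 12.9, 10.4, 11.8, 11.0, 12.5]
r, lrs, lod = marker_regression(genotypes, phenotypes)
print(f"r = {r:.3f}, LRS = {lrs:.2f}, LOD = {lod:.2f}")
```

A permutation test would repeat this calculation many times with shuffled phenotype values to obtain an empirical significance threshold.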
The BXD genotype file was initially upgraded in 2010-2011 using the new high density Affymetrix array (580,000 high quality SNPs) developed in the laboratories of Drs. Fernando Pardo-Manuel de Villena (University of North Carolina) and Gary Churchill (The Jackson Laboratory, see Yang H, Ding Y, Hutchins LN, Szatkiewicz J, Bell TA, Paigen BJ, Graber JH, Pardo-Manuel de Villena, F, Churchill GA (2009) A customized and versatile high density genotyping array for the mouse. Nat Methods 6:663-666).
The BXD genotype file used from June 2005 through December 2016 exploits a set of approximately 3796 markers typed across 88 extant and extinct BXD strains (BXD1 through BXD102). The mean interval between informative markers is about 0.7 Mb. This genotype file includes all markers, both SNPs and microsatellites, with unique strain distribution patterns (SDPs), as well as pairs of markers for those SDPs represented by two or more markers. In those situations where three or more markers had the same SDP, we retained only the most proximal and distal markers in the genotype file. This particular file has also been smoothed to eliminate genotypes that are likely to be erroneous. We have also conservatively imputed a small number of missing genotypes (usually over very short intervals). Smoothing genotypes in this way reduces the total number of SDPs and also lowers the rate of false discovery. However, this procedure may also eliminate some genuine SDPs.
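The marker selection rule described above (keep only the most proximal and most distal marker when several share one SDP) can be sketched as follows. This is an illustration only. It assumes that markers sharing an SDP are adjacent along one chromosome and are already sorted by position. The function name, marker names, and SDP strings are hypothetical.

```python
from itertools import groupby


def prune_redundant_markers(markers):
    """Keep the first and last marker of each run of identical SDPs.

    `markers` is a list of (name, sdp) tuples for one chromosome,
    ordered from proximal to distal. The SDP is a string with one
    genotype call per strain.
    """
    kept = []
    for _sdp, run in groupby(markers, key=lambda marker: marker[1]):
        run = list(run)
        kept.append(run[0])            # most proximal marker of the run
        if len(run) > 1:
            kept.append(run[-1])       # most distal marker of the run
    return kept


# Hypothetical markers typed in four strains.
markers = [
    ("m1", "BBDD"), ("m2", "BBDD"), ("m3", "BBDD"),
    ("m4", "BDDD"), ("m5", "BDDD"),
    ("m6", "DDDD"),
]
print([name for name, _sdp in prune_redundant_markers(markers)])
```

Keeping both ends of each run preserves the boundaries of the interval over which the linkage statistics are constant.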
The new smoothed BXD genotype data file (2017) can be downloaded from GeneNetwork at the URL http://www.genenetwork.org/genotypes/BXD.geno.
Please Note: For a limited number of markers and strains, the genotypes of BXDs have been called heterozygous. This is usually done over comparatively short intervals in some of the newer strains that may not have been fully inbred when they were initially genotyped. Use of the genotype file above in external software packages such as R/qtl requires careful treatment of this issue to prevent bias in empirical significance thresholds. It is recommended to treat these rare heterozygous loci as missing data and to ensure that only the additive effects of B vs. D alleles are estimated by these packages. (Note by Elissa Chesler, Dec 2010.)
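One way to follow this recommendation before handing genotypes to an external package is to recode the calls so that only B and D carry a numeric value and everything else, including H, becomes missing. The sketch below is a hedged illustration in plain Python and does not use any R/qtl function. The function names are hypothetical, and the use of "U" for an unknown call in the example is an assumption.

```python
def additive_codes(calls):
    """Map B to -1.0 and D to +1.0; any other call becomes None (missing)."""
    codes = {"B": -1.0, "D": 1.0}
    return [codes.get(call) for call in calls]


def complete_pairs(codes, phenotypes):
    """Drop strains whose genotype or phenotype is missing at this marker."""
    return [
        (genotype, value)
        for genotype, value in zip(codes, phenotypes)
        if genotype is not None and value is not None
    ]


# Hypothetical calls at one marker for five strains, with one H and one U.
calls = ["B", "D", "H", "U", "B"]
phenotypes = [10.2, 12.1, 11.0, 11.4, 9.9]
codes = additive_codes(calls)
print(codes)
print(complete_pairs(codes, phenotypes))
```

With heterozygous calls masked in this way, a regression on the remaining codes estimates only the additive B vs. D contrast.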
Source of Genotypes:
In collaboration with members of the CTC (Richard Mott, Jonathan Flint, and colleagues), we have helped genotype a total of 480 strains using a panel of 13,377 SNPs. These SNPs were combined with our previous microsatellite genotypes to produce the older "classic" consensus maps for the expanded set of BXD using the older mouse assemblies (Mouse Build 36 - UCSC mm8 and then mm9). (Files were updated from mm6 to mm8 in January 2007, and from mm9 to mm10 in January 2017).
A total of 198 strains have been genotyped as of Jan 2017 using the full set of SNPs, and about 7324 of these are informative. Informative in this sense simply means that the C57BL/6J and DBA/2J parental strains have different alleles. To reduce false positive errors when mapping using this ultra dense map, we have eliminated most single genotypes that generate double-recombinant haplotypes, which are most commonly produced by typing errors ("smoothed" genotypes). For this reason, the genotypes used in GeneNetwork differ from those downloaded directly from Richard Mott's web site at the Wellcome Trust, Oxford or from the Jackson Laboratory.
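The idea behind this smoothing can be sketched with a simple rule: a single B or D call flanked on both sides by the opposite call implies a double recombinant and is replaced by the flanking call. This is a deliberately simplified illustration and not the curation procedure actually used, which removed most but not all such genotypes and relied on manual review. The function name and example strings are hypothetical.

```python
def smooth_single_calls(calls):
    """Replace isolated B or D calls that differ from both neighbors.

    `calls` is a string or list of calls for one strain, ordered along
    one chromosome. Decisions are based on the original calls, so one
    replacement never triggers another. Calls other than B or D are
    left untouched.
    """
    smoothed = list(calls)
    for i in range(1, len(calls) - 1):
        left, middle, right = calls[i - 1], calls[i], calls[i + 1]
        if (
            left == right
            and left in ("B", "D")
            and middle in ("B", "D")
            and middle != left
        ):
            smoothed[i] = left
    return "".join(smoothed)


print(smooth_single_calls("BBBDBBB"))  # isolated D is replaced
print(smooth_single_calls("BBBDDDD"))  # a genuine recombination is kept
```

A stricter version would also take marker spacing into account, because a double recombinant across a long interval is more plausible than one across a few kilobases.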
We have genotyped all available BXD strains from The Jackson Laboratory. BXD1 through BXD32 were produced by Benjamin Taylor starting in the late 1970s. BXD33 through BXD42 were produced by Taylor in the 1990s (Taylor et al., 1999). All BXD strains with numbers higher than BXD42 (BXD43 through BXD100) were generated by Lu Lu and Robert Williams at UTHSC, and by Jeremy Peirce and Lee Silver at Princeton University. We thank Guomin Zhou for generating the advanced intercross stock used to produce most of these advanced RI strains both at UTHSC and Princeton. There are approximately 48 of these advanced BXD strains, each of which archives approximately twice the recombinations present in a typical F2-derived recombinant inbred strain (Peirce et al. 2003).
Due to the very high density of markers, the mapping algorithm used to map BXD data sets has been modified and is a mixture of simple marker regression, linear interpolation, and standard Haley-Knott interval mapping. When two adjacent markers have identical SDPs, they will have identical linkage statistics, as will the entire interval between these two markers (assuming complete and error-free haplotype data for all strains). On a physical map the LRS and the additive effect values will therefore be constant over this physical interval. Between neighboring markers that have different SDPs and that are separated by 1 cM or more, we use a conventional interval mapping method (Haley-Knott) combined with a Haldane estimate of genetic distance. When the interval is less than 1 cM, we simply interpolate linearly between markers based on a physical scale between those markers. The result of this mixture mapping algorithm is a linkage map of a trait that has an unusual profile that is particularly striking on a physical (Mb) scale, with many plateaus, abrupt linear transitions between plateaus, and a few regions with the standard graceful curves typical of interval maps.
Archival BXD Genotype file: Prior to July 2005, the marker genotypes used to map all BXD data sets consisted of a set of 779 markers described by Williams and colleagues (2001) that also included a small number of additional SNPs from Tim Wiltshire and Mathew Pletcher (GNF, La Jolla), new microsatellite markers generated by Grant Morahan and Jing Gu (Msw type markers), and a few CTC markers by Jing Gu. This old marker data set was made obsolete by the ultra high density Illumina SNP genotype data generated in spring 2005.
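The three cases described above can be outlined in code. The sketch below is not the GeneNetwork implementation. It shows the Haldane map function, the decision rule for each marker interval, linear interpolation of LRS on the Mb scale, and the Haley-Knott expected genotype at a position between two flanking markers. All function names are hypothetical, genotypes are coded -1 (B) and +1 (D), and the recombination model is the simple two-genotype case without the map expansion that applies to recombinant inbred strains.

```python
import math


def haldane_r(distance_cm):
    """Haldane map function: recombination fraction for a distance in cM."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


def choose_method(sdp_left, sdp_right, distance_cm):
    """Pick the mapping approach for one interval, following the text."""
    if sdp_left == sdp_right:
        return "plateau"                # identical SDPs, constant statistics
    if distance_cm < 1.0:
        return "linear interpolation"   # short interval, physical scale
    return "Haley-Knott"                # 1 cM or more


def interpolate_lrs(lrs_left, lrs_right, mb_left, mb_right, mb_query):
    """Linear interpolation of LRS between two markers on the Mb scale."""
    fraction = (mb_query - mb_left) / (mb_right - mb_left)
    return lrs_left + fraction * (lrs_right - lrs_left)


def expected_genotype(left, right, cm_to_left, cm_to_right):
    """Expected genotype (-1 or +1 coding) given the flanking markers.

    Assumes no interference and a total flanking distance above zero.
    """
    r1 = haldane_r(cm_to_left)
    r2 = haldane_r(cm_to_right)
    r12 = haldane_r(cm_to_left + cm_to_right)
    if left == right:
        p_same_as_left = (1.0 - r1) * (1.0 - r2) / (1.0 - r12)
    else:
        p_same_as_left = (1.0 - r1) * r2 / r12
    return left * (2.0 * p_same_as_left - 1.0)


print(choose_method("BBDD", "BDDD", 0.4))
print(interpolate_lrs(10.0, 20.0, 1.0, 2.0, 1.5))
print(expected_genotype(-1, 1, 0.5, 1.5))
```

In Haley-Knott mapping, the phenotype is regressed on these expected genotypes at each test position, which produces the smooth curves between distant markers mentioned above.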
The entire BXD genotype data set used for mapping traits can be downloaded at BXD.geno.