Genomics Institute of the Novartis Research Foundation (GNF) and Groningen Hematopoietic Stem Cell mRNA U74Av2 Database (March/04 Freeze)
Accession number: GN7
The original March 2004 data freeze provides estimates of mRNA expression in hematopoietic stem cells (HSC) from adult female BXD recombinant inbred mice measured using Affymetrix U74Av2 microarrays (Bystrykh et al., 2005). Data were generated at the Genomics Institute of the Novartis Research Foundation (Cooke and colleagues) and at the University of Groningen (de Haan and colleagues). Samples from 30 strains were hybridized to 60 arrays in two batches (Mar03 includes only the first batch). Data were processed using the RMA protocol.
REFERENCE: Bystrykh L, Weersing E, Dontje B, Sutton S, Pletcher MT, Wiltshire T, Su AI, Vellenga E, Wang J, Manly KF, Lu L, Chesler EJ, Alberts R, Jansen RC, Williams RW, Cooke M, de Haan G (2005) Uncovering regulatory pathways affecting hematopoietic stem cell function using "genetical genomics." Nature Genetics, 37:225-232
About the mice used to map microarray data:
BXD recombinant inbred mice were purchased from the Jackson Laboratory and upon arrival were housed under clean conventional conditions in the Central Animal Facility of the University of Groningen, Netherlands. We used female mice between 3 and 6 months of age.
Stem cells (described below) were isolated from pooled bone marrow obtained from three BXD animals per strain. Pooled RNA samples were split into two aliquots, and each aliquot was independently amplified and hybridized to the U74Av2 array (3 mice x 2 arrays).
About the tissue used to generate these data:
Bone marrow cells were flushed from the femurs and tibiae of three mice and pooled. After standard erythrocyte lysis, nucleated cells were incubated with normal rat serum for 15 min at 4 degrees Celsius. Subsequently, cells were stained with a panel of biotinylated lineage-specific antibodies (murine progenitor enrichment cocktail, containing anti-CD5, anti-CD45R, anti-CD11b, anti-TER119, anti-Gr-1, and anti-7-4, Stem Cell Technologies, Vancouver, Canada), FITC-anti-Sca-1 and APC-anti-c-kit (Pharmingen). Cells were washed twice and incubated for 30 minutes with streptavidin-PerCP (Pharmingen). After two washes, cells were resuspended in PBS with 1% BSA and purified using a MoFlo flow cytometer. The lineage-depleted bone marrow cell population was defined as the 5% of cells showing the lowest PerCP fluorescence intensity. Stem cell yield across all BXD samples varied from 16,000 to 118,000 Lin-Sca-1+ c-kit+ cells. A small aliquot of each sample of purified cells was functionally tested for stem cell activity by directly depositing single cells in a cobblestone area forming cell assay.
The remainder of the cells was immediately collected in RNA lysis buffer. Total RNA was isolated using the StrataPrep Total RNA Microprep kit (Stratagene) as described by the manufacturer. RNA pellets were resolved in 500 microliters absolute ethanol and sent on dry ice by courier to GNF, La Jolla, CA.
The March 2004 data set was processed in two batches. The first batch consisted of samples from 22 strains: BXD1, 2, 5, 6, 8, 9, 11, 12, 14, 16, 18, 19, 21, 28, 31, 32, 33, 34, 38, 39, 40, 42. The second batch included 8 strains: BXD15, 22, 24, 25, 27, 29, 30, 36.
How to Download these Data:
Array data files are available on the NCBI GEO site using the accession identifier GDS1077. Individual U74Av2 arrays are GEO IDs GSM36673 through GSM36716. The single most appropriate reference is: Bystrykh L, Weersing E, Dontje B, Sutton S, Pletcher MT, Wiltshire T, Su AI, Vellenga E, Wang J, Manly KF, Lu L, Chesler EJ, Alberts R, Jansen RC, Williams RW, Cooke M, de Haan G (2005) Uncovering regulatory pathways affecting hematopoietic stem cell function using "genetical genomics". Nature Genetics 37:225-232.
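The individual array accessions listed above can be enumerated with a short script. This is only a sketch: it assumes the GSM numbers run contiguously from GSM36673 to GSM36716 with no gaps, as the word "through" suggests, and the helper name gsm_range is hypothetical.

```python
def gsm_range(first, last):
    """Return GEO sample accessions GSM<first> .. GSM<last>, inclusive.

    Assumes the accession numbers are contiguous (an assumption, not
    something this sketch checks against GEO).
    """
    return [f"GSM{n}" for n in range(first, last + 1)]


# Accession bounds taken from the text above.
array_ids = gsm_range(36673, 36716)
print(len(array_ids), array_ids[0], array_ids[-1])
```

The resulting list can be pasted into the GEO search box or used to label downloaded files.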
About amplification and hybridization:
Total RNA was quantified using RiboGreen and split into equal aliquots of approximately 10 ng, representing RNA from approximately 10,000 cells, and labeled using a total of three rounds of RNA amplification, exactly as described previously (Scherer et al. 2003). Labeled cRNA was fractionated and hybridized to the U74Av2 microarray from Affymetrix according to the manufacturer's protocol.
About data processing:
Probe (cell) level data from the CEL file: These CEL values are the 75% quantiles from a set of 36 pixel values per cell.
- Step 1: We added an offset of 1 to the CEL expression values for each cell to ensure that all values could be logged without generating negative values.
- Step 2: We took the log2 of each cell.
- Step 3: We computed the Z score for each cell within array.
- Step 4: We multiplied all Z scores by 2.
- Step 5: We added 8 to the value of all Z scores. The consequence of this simple set of transformations is to produce a set of Z scores that have a mean of 8, a variance of 4, and a standard deviation of 2. The advantage of this modified Z score is that a two-fold difference in expression level corresponds approximately to a 1 unit difference.
- Step 6: We computed the arithmetic mean of the values for the set of microarrays for each of the individual strains.
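The six steps above can be sketched in a few lines of Python. This is an illustration rather than the code actually used for the database: the function names and the example values are hypothetical, and the use of the population standard deviation in Step 3 is an assumption, since the text does not say which estimator was used.

```python
import math
from statistics import mean, pstdev


def modified_z_scores(cel_values):
    """Apply Steps 1-5 to the probe (cell) values of one array."""
    # Steps 1-2: add an offset of 1, then take log2.
    logged = [math.log2(v + 1.0) for v in cel_values]
    # Step 3: Z score within the array (population SD is an assumption).
    mu = mean(logged)
    sigma = pstdev(logged)
    # Steps 4-5: multiply by 2 and add 8.
    return [2.0 * (x - mu) / sigma + 8.0 for x in logged]


def strain_means(arrays_by_strain):
    """Step 6: average the transformed values, cell by cell, within strain."""
    result = {}
    for strain, arrays in arrays_by_strain.items():
        transformed = [modified_z_scores(a) for a in arrays]
        result[strain] = [mean(cells) for cells in zip(*transformed)]
    return result


# Made-up CEL values for two arrays of one strain (illustration only).
example = {"BXD1": [[120.0, 480.0, 1950.0, 60.0],
                    [100.0, 500.0, 2100.0, 75.0]]}
means = strain_means(example)
```

Each transformed array has a mean of 8 and a standard deviation of 2 by construction. A two-fold difference is 1 unit on the log2 scale and becomes 2/sigma units after the transformation, so the "1 unit per two-fold" rule of thumb holds to the extent that the standard deviation of the log2 values on an array is close to 2.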
Probe set data: Probe set expression data were processed by Ritsert Jansen. The original CEL files produced by the Affymetrix analysis software were read into the R environment (Ihaka and Gentleman 1996). Data were normalized using the Robust Multichip Average (RMA) method of background correction, quantile normalization, and summarization of signal intensity (Irizarry et al. 2003). Probe set intensities were log2 transformed. Probe set data are averages of two technical replicates after batch correction (see below) and were treated as single samples. Please see Bolstad and colleagues (2003) for a helpful comparison of RMA and two other common methods of processing Affymetrix array data sets.
Samples were processed in two batches. To adjust for the effect of technical batch processing differences, a linear model was applied to RMA normalized expression data. The following ANOVA model fitting the processing batch was applied for each set of perfect match probes:
PMij = M + Bi + eij
in which PMij are the RMA probe intensities for arrays i = 1,...,30 and probe j = 1,...,J. M is the overall mean; Bi represents the batch effect, and eij is the error term. The batch effect parameter was estimated and subtracted from PM probe expression values. Probe level intensities were averaged for each probe set to produce the batch corrected expression.
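As a rough illustration of the batch model, the sketch below estimates a batch effect for one probe as the batch mean minus the overall mean and subtracts it from each array's value. This is one common least-squares parameterization and is an assumption here; the original analysis was carried out in R and its exact constraints are not stated. The function names and the example numbers are hypothetical.

```python
from statistics import mean


def remove_batch_effect(values, batches):
    """Subtract the estimated batch effect B from one probe's values.

    values  : intensities of one PM probe across arrays
    batches : batch label of each array, in the same order
    """
    overall = mean(values)  # estimate of M
    by_batch = {}
    for v, b in zip(values, batches):
        by_batch.setdefault(b, []).append(v)
    # Estimate of B for each batch: batch mean minus overall mean.
    effect = {b: mean(vs) - overall for b, vs in by_batch.items()}
    return [v - effect[b] for v, b in zip(values, batches)]


def probe_set_average(corrected_probes):
    """Average the corrected probes of one probe set, array by array."""
    return [mean(column) for column in zip(*corrected_probes)]


# Made-up intensities for one probe on five arrays in two batches.
corrected = remove_batch_effect([7.0, 7.4, 7.2, 8.1, 8.3], [1, 1, 1, 2, 2])
```

After the correction, both batches share the same mean for that probe, while the overall mean and the differences between arrays within a batch are unchanged.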
Affymetrix U74Av2 GeneChip: The expression data were generated using U74Av2 arrays. The chromosomal locations of U74Av2 probe sets were determined by BLAT analysis of concatenated probe sequences using the Mouse Genome Sequencing Consortium May 2004 (mm5) assembly. This BLAT analysis is performed periodically by Yanhua Qu as each new build of the mouse genome is released (see http://genome.ucsc.edu/cgi-bin/hgBlat?command=start&org=mouse). We thank Yan Cui (UTHSC) for allowing us to use his Linux cluster to perform this analysis. It is possible to confirm the BLAT alignment results yourself simply by clicking on the Verify link in the Trait Data and Editing Form (right side of the Location line).
About the array probe set names:
Most probe sets on the U74Av2 array consist of a total of 32 probes, divided into 16 perfect match probes and 16 mismatch controls. Each set of these 25-nucleotide-long probes has an identifier code that includes a unique number, an underscore character, and several suffix characters that highlight design features. The most common probe set suffix is "at". This code indicates that the probes should hybridize relatively selectively with the complementary anti-sense target (i.e., the complementary RNA) produced from a single gene. Other codes include:
f_at (sequence family): Some probes in this probe set will hybridize to identical and/or slightly different sequences of related gene transcripts.
s_at (similarity constraint): All probes in this probe set target common sequences found in transcripts from several genes.
g_at (common groups): Some probes in this set target identical sequences in multiple genes and some target unique sequences in the intended target gene.
r_at (rules dropped): Probe sets for which it was not possible to pick a full set of unique probes using the Affymetrix probe selection rules. Probes were picked after dropping some of the selection rules.
i_at (incomplete): Designates probe sets for which there are fewer than the standard number of unique probes specified in the design (16 perfect match for the U74Av2).
st (sense target): Designates a sense target; almost always generated in error.
Descriptions for the probe set extensions were taken from the Affymetrix GeneChip Expression Analysis Fundamentals.
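The suffix codes listed above can be read off a probe set identifier programmatically. The sketch below is a minimal example; the function name and the identifiers used in the usage lines are hypothetical, and suffixes not listed in this document are reported as unrecognized or fall back to the plain "at" meaning.

```python
# Suffix meanings as summarized in the list above.
SUFFIX_MEANINGS = {
    "f_at": "sequence family",
    "s_at": "similarity constraint",
    "g_at": "common groups",
    "r_at": "rules dropped",
    "i_at": "incomplete",
    "st": "sense target",
    "at": "anti-sense target from a single gene",
}


def classify_probe_set(probe_set_id):
    """Return (suffix, meaning) for a probe set ID such as '12345_g_at'."""
    parts = probe_set_id.split("_")
    # Try the longer two-part suffix (e.g. 'g_at') before the plain one.
    for n in (2, 1):
        if len(parts) > n:
            suffix = "_".join(parts[-n:])
            if suffix in SUFFIX_MEANINGS:
                return suffix, SUFFIX_MEANINGS[suffix]
    return None, "unrecognized"


# Hypothetical identifiers, used only to show the call.
for pid in ("12345_at", "12345_g_at", "12345_st"):
    print(pid, classify_probe_set(pid))
```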
Data source acknowledgment:
Cells and samples were generated by Leonid V. Bystrykh, Ellen Weersing, Bert Dontje, and Gerald de Haan, Department of Stem Cell Biology, University of Groningen, the Netherlands.
RNA amplification and array processing were carried out by Michael Cooke, John Hogenesch, Andrew Su and colleagues at GNF.
The batch correction of this March 2004 data set was carried out by Ritsert Jansen and his student Rudy Alberts in the Department of Bioinformatics (University of Groningen). Conversion for WebQTL was carried out by Robert W. Williams, Kenneth Manly, Jintao Wang, and Yanhua Qu at UTHSC.
Bystrykh L, Weersing E, Dontje B, Sutton S, Pletcher MT, Wiltshire T, Su AI, Vellenga E, Wang J, Manly KF, Lu L, Chesler EJ, Alberts R, Jansen RC, Williams RW, Cooke M, de Haan G (2005) Uncovering regulatory pathways affecting hematopoietic stem cell function using "genetical genomics." Nature Genetics 37:225-232.
Ihaka R, Gentleman R (1996) R: A Language for Data Analysis and Graphics. Journal of Computational and Graphical Statistics 5:299-314.
Irizarry R, Hobbs B, Collin F, Beazer-Barclay Y, Antonellis K, Scherf U, Speed T (2003) Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4:249-264.
Gautier L, Cope L, Bolstad B, Irizarry R (2004) affy -- analysis of Affymetrix GeneChip data at the probe level. Bioinformatics 20:307-315.
Scherer A, Krause A, Walker JR, Sutton SE, Seron D, Raulf F, Cooke MP (2003) Optimized protocol for linear RNA amplification and application to gene expression profiling of human renal biopsies. Biotechniques 34:546-550, 552-554, 556.
de Haan G, Bystrykh LV, Weersing E, Dontje B, Geiger H, Ivanova N, Lemischka IR, Vellenga E, Van Zant G (2002) A genetic and genomic analysis identifies a cluster of genes associated with hematopoietic cell turnover. Blood 100:2056-2062.
Wang J, Williams RW, Manly KF (2003) WebQTL: Web-based complex trait analysis. Neuroinformatics 1:299-308.
Williams RW, Manly KF, Shou S, Chesler E, Hsu HC, Mountz J, Wang J, Threadgill DW, Lu L (2002) Massively parallel complex trait analysis of transcriptional activity in mouse brain. International Mouse Genome Conference 16:46.
Information about this text file:
This text file was originally generated by GdH and RWW, March 2004. Updated by RWW, Oct 30, 2004, Dec 6, 2004. EJC Apr 25, 2005.