Standard Operating Procedures for Entering Phenotype Data

The following standard operating procedures (SOPs) are intended to help investigators who would like to enter new data sets into GeneNetwork. These instructions apply both to standard genetic reference panels such as the BXD strains and to standard F2 or backcross populations. Please follow these instructions.

If your data are for a genetic reference population, such as a set of recombinant inbred strains, please prepare a single Excel workbook with two major spreadsheets.

  1. Case_Data spreadsheet: this spreadsheet should contain the individual cases (or case pools) that were phenotyped. It contains the core data along with cofactors and other information. In the case of an F2 or N2, this may be the only spreadsheet in the Excel workbook. In the case of a genetic reference population, you can skip this spreadsheet if you do not want to make individual case data available, but in that case you must provide us with the second spreadsheet.
  2. Line_Data spreadsheet: this spreadsheet contains the strain or line averages, the standard errors of the means, and the numbers of samples per line (usually independent replicates).
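If you keep individual case values, the three Line_Data quantities (average, standard error of the mean, and number of samples) can be derived from them. The following is a minimal sketch, assuming the case data are available as (strain, value) pairs; the function name `summarize_lines` and the example values are hypothetical and not part of GeneNetwork.

```python
import math
import statistics
from collections import defaultdict

def summarize_lines(cases):
    """Collapse (strain, value) pairs into per-strain (mean, SEM, N)."""
    by_strain = defaultdict(list)
    for strain, value in cases:
        by_strain[strain].append(value)

    summary = {}
    for strain, values in by_strain.items():
        n = len(values)
        mean = statistics.fmean(values)
        # SEM = sample standard deviation / sqrt(N); undefined for one sample.
        sem = statistics.stdev(values) / math.sqrt(n) if n > 1 else None
        summary[strain] = (mean, sem, n)
    return summary

# Made-up example values, for illustration only.
cases = [("BXD1", 10.0), ("BXD1", 12.0), ("BXD1", 14.0), ("BXD2", 20.0)]
summary = summarize_lines(cases)
```

A strain with a single sample gets no SEM here (`None`), which corresponds to leaving that SEM cell blank.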

The top part of both the Case_Data and the Line_Data spreadsheets should be organized as follows:

Please use the first 26 rows (1 to 26) to enter information pertaining to the data source (metadata). Include information for as many as possible of the following types of data, each in its own row.

If you do not have data for a row, please leave it blank. (A single asterisk * marks data we would like to have; a double asterisk ** marks data we really need.)

  • **#1. name of experiment and data set (no more than 60 letters)
  • *#2. description of this particular spreadsheet of data (for example: Brain Weight of Males, or, Strain Mean Data)
  • **#3. full names of investigators (for example, Joanna Q. Kent, Jr.)
  • *#4. mailing address of one lead investigator (Department of Zoology, University of Tennessee, 880 Washington Street, Knoxville, TN, 38144, USA)
  • *#5. telephone number of one lead investigator (901-448-7040)
  • *#6. current email contacts for questions (myemail@institution.edu)
  • #7. URL addresses of laboratories involved in study or URLs/URIs of supplementary data (http://aURLgoesHere.html)
  • #8. the purpose of the study (no more than 100 words)
  • *#9. short summary of the experimental design (no more than 100 words)
  • #10. methods of statistical analyses
  • #11. any other information about the design, experiment, or data
  • #12. any other information about the design, experiment, or data
  • #13. any other information about the design, experiment, or data
  • #14. any other information about the design, experiment, or data
  • **#15. information about publication status (published, unpublished, submitted, in press, confidential)
  • *#16. PubMed identifier numbers for this data set (can be added later)
  • #17. PubMed identifiers for recent relevant papers from same group of investigators for background information
  • #18. lists of abbreviations and units in this general format: Abbrev1=First abbreviation, units; Abbrev2=Second abbreviation, units. For example: OF=Open Field, percentage; PM=Plus Maze, count of beam interruptions; LD=Light Dark, seconds; BW=body weight, gm
  • *#19. year or range of dates over which data were acquired
  • *#20. places at which data were acquired and processed
  • #21. general methods used to acquire data
  • *#22. source of material (animal or plant stock source). Example: The Jackson Laboratory for mice, Med Associates (Light:Dark box), Ethovision (open field and plusmaze)
  • *#23. geographical location where source material was raised or processed (e.g., Annex 2, The Jackson Laboratory, Bar Harbor, ME)
  • *#24. funding sources
  • **#25. name and email address of person who prepared this data file
  • **#26. date that this data file was created and last modified (Created: 2/6/06; Modified: 2/8/06)
  • **#27. Preferred name of Species (binomial) and common name (Mus musculus, mouse; Rattus norvegicus, rat). This will be used in the CHOOSE SPECIES pull-down menu of GeneNetwork
  • *#28. Preferred name of group (e.g., BXD, B6BTBRF2, BayXSha) to be used in the GROUP menu of GeneNetwork
  • #29. Preferred name of Data Type. Please look at the TYPE pulldown menus in GeneNetwork to see common options such as "Genotypes", "Phenotypes", "Eye mRNA"
  • #30. Date when the data should be public (unless there are compelling reasons, agreed to in advance, usually no later than 2 years after data receipt)
  • #31. blank
  • #32. blank
  • #33. blank
  • #34. blank
  • #35. blank
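Before sending a workbook, it can help to confirm that none of the items marked ** has been left blank. The following is a minimal sketch, assuming the metadata have been read into a dictionary keyed by item number; that representation and the name `missing_required` are hypothetical.

```python
# Item numbers marked ** in the list above.
REQUIRED_ITEMS = {1, 3, 15, 25, 26, 27}

def missing_required(metadata):
    """Return the sorted required item numbers whose value is absent or blank.

    `metadata` maps item number -> cell text (a hypothetical representation).
    """
    return sorted(
        item for item in REQUIRED_ITEMS
        if not str(metadata.get(item) or "").strip()
    )

# Made-up example entries, for illustration only.
example = {
    1: "BXD Brain Weight",
    3: "Joanna Q. Kent, Jr.",
    15: "unpublished",
    25: "",  # blank: preparer name and email not yet entered
    27: "Mus musculus, mouse",
}
missing = missing_required(example)  # items 25 and 26 are reported
```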

Below the metadata section, you now need to place the data for either individual cases, strains, or lines.

For the CASE_DATA use the following structure:

  1. The header information for all columns of data should be in ROW 30.
  2. The first column head (A30) should be labeled "Case_Index" and this column should start at 00001 and continue in increments of 1 to the last case listed in this spreadsheet.
  3. The second column (B30) should be labeled "Case_ID" and this column should contain a unique identifier that begins with any trio of letters and has at least 6 additional digits, such as EJS099912. This unique identifier should NOT encode sex or strain or age of cases. However, it may encode the laboratory or investigator. The identifier must be a single word and may not contain punctuation marks. Note that a "case" may include data from more than one individual; for example, a pool of three individuals may be treated as a single case.
  4. The third column (C30) should contain your laboratory ID in whatever form you have used.
  5. The fourth column should contain the official strain or line background insofar as you know it. For example, C57BL/6J, or B6D2F2, or BXD12, or HXB12, or AXB1xBXA3F1. Please make every effort to use the recommended and complete nomenclature for the strain, hybrid, intercross, backcross, recombinant inbred strain, congenic, consomic, accession, or line. However, please do not use hyphens for recombinant inbred strains (use BXD2, not BXD-2/Ty). We use these data to determine alleles segregating in a cross.
  6. The fifth column should contain the age of the case if applicable. If not applicable, just leave the column blank. The sixth column should contain the sex of the case if applicable. If not applicable, just leave the column blank.
  7. The seventh through sixteenth columns (10 columns) should contain any cofactors that you wish. These may include treatment variables, conditions, time points, batches, operator, machine, concentrations, etc. If not applicable, just leave these ten columns blank.
  8. The seventeenth column should contain the value of trait 1. The head of the column should be labeled T1_YourTextHere, where "T1_" (or "T2_" for the second trait, and so on) is just an indicator that we will use to confirm that this is the first trait, and "YourTextHere" will be used as the short name of that trait. The short name of the trait must be under 21 characters long. Please start with unique features of the trait, such as T1_HeartWeightMale rather than T1_MaleWeightHeart. You may add indicators of the units at the end, but please do so after an underscore, such as HeartWeightMale_mg. These short labels are used on graphs, which is why we impose the length limit. For legibility, please use a mix of UpperAndLowerCase to label traits rather than ALLUPPERCASE or alllowercase.
  9. The eighteenth column should contain the standard error of the mean of trait 1, if applicable.
  10. The nineteenth column should contain the number of samples in the case. For example, if the case is actually a pool of data from four individuals, then 4 should be entered in column 19. In most cases, the value will be 1.
  11. Repeat the value, SEM, and N columns in trios across the columns for each additional trait that you have for that case.
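The Case_ID and trait-header rules above lend themselves to a quick automated check before submission. The following is a minimal sketch, not a GeneNetwork tool. It assumes that "numbers" in the Case_ID rule means digits, that the units suffix counts toward the short-name limit, and that short names use only letters, digits, and underscores. All function and constant names are hypothetical.

```python
import re

# Three letters followed by at least six digits, with no punctuation.
CASE_ID_RE = re.compile(r"[A-Za-z]{3}[0-9]{6,}")
# T<number>_<ShortName>; letters, digits, and underscores are assumed.
TRAIT_HEADER_RE = re.compile(r"T([0-9]+)_([A-Za-z0-9_]+)")
MAX_SHORT_NAME = 20  # "under 21 characters"

def is_valid_case_id(case_id):
    """True if the identifier is a trio of letters plus six or more digits."""
    return CASE_ID_RE.fullmatch(case_id) is not None

def check_trait_header(header, expected_index):
    """Return a list of problems found in a trait column header (empty if none)."""
    match = TRAIT_HEADER_RE.fullmatch(header)
    if match is None:
        return ["header does not look like T<number>_<ShortName>"]
    problems = []
    index, short_name = int(match.group(1)), match.group(2)
    if index != expected_index:
        problems.append(f"expected prefix T{expected_index}_, found T{index}_")
    if len(short_name) > MAX_SHORT_NAME:
        problems.append(
            f"short name has {len(short_name)} characters (limit {MAX_SHORT_NAME})"
        )
    if short_name.isupper() or short_name.islower():
        problems.append("short name should mix upper and lower case")
    return problems
```

For example, `is_valid_case_id("EJS099912")` is true, and `check_trait_header("T1_HeartWeightMale_mg", 1)` returns an empty list.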

For LINE_DATA use the following structure. Please note that if you have male and female strain means, or young and old strain means, it will be easiest to prepare a separate LINE_DATA spreadsheet for each data type.

  1. The header information for all columns of data should be in ROW 30.
  2. The first column head (A30) should be labeled "Line_Index" and this column should start at 00001 and continue in increments of 1 to the last line listed in this spreadsheet.
  3. The second column (B30) should be labeled "Line_ID" and this column should either remain blank or contain a unique identifier that begins with any trio of letters and has at least 6 additional digits, such as ECG199912. However, it may NOT be the same as the ID used in the CASE_DATA. This unique identifier should NOT encode sex or strain or age of cases. It may encode the laboratory or investigator. The identifier must be a single word and may not contain punctuation marks. Note that a "line" will often include data from more than one individual; for example, a pool of three individuals may be used as a single line.
  4. The third column (C30) should contain your laboratory ID in whatever form you have used. You can leave this blank.
  5. The fourth column should contain the official strain or line name insofar as you know it. For example, C57BL/6J, or B6D2F2, or BXD12, or HXB12, or AXB1xBXA3F1. Please make every effort to use the recommended and complete nomenclature for the strain, hybrid, intercross, backcross, recombinant inbred strain, congenic, consomic, accession, or line. However, please do not use hyphens (use BXD2, not BXD-2/Ty). We use these data to determine the alleles segregating in a cross. Please put these strains in some reasonable order. BXD strains should be in proper numerical order, not in alphanumerical order (example of an incorrect order: BXD1, BXD11, BXD12, BXD2, BXD21, BXD5).
  6. The fifth column should contain the average age of the line if applicable. If not applicable, just leave the column blank. The sixth column should contain the sex of the data if applicable: M, F, or MF when combined. If not known, just leave the column blank.
  7. The seventh through sixteenth columns (10 columns) should contain any cofactors that you wish. These may include treatment variables, conditions, time points, batches, operator, machine, concentrations, etc. If not applicable, just leave these ten columns blank.
  8. The seventeenth column should contain the value of trait 1. The head of the column should be labeled T1_YourTextHere, where "T1_" (or "T2_" for the second trait, and so on) is just an indicator that we will use to confirm that this is the first trait, and "YourTextHere" will be used as the short name of that trait. The short name of the trait must be under 21 characters long. Please start with unique features of the trait, such as T1_HeartWeightMale rather than T1_MaleWeightHeart. You may add indicators of the units at the end, but please do so after an underscore, such as HeartWeightMale_mg. These short labels are used on graphs, which is why we impose the length limit. For legibility, please use a mix of UpperAndLowerCase to label traits rather than ALLUPPERCASE or alllowercase.
  9. The eighteenth column should contain the standard error of the mean of trait 1, if available.
  10. The nineteenth column should contain the number of samples used to generate the line data. For example, if the line is actually based on four samples, then 4 should be entered in column 19.
  11. Repeat the mean, SEM, and N columns in trios across the columns for each additional trait that you have for that line.
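The numerical (rather than alphanumerical) strain order requested for the fourth column can be produced with a natural-sort key. The following is a minimal sketch, with a hypothetical helper named `strain_sort_key`, that splits each name into text and integer chunks so that the numeric parts compare as numbers.

```python
import re

def strain_sort_key(name):
    """Split a strain name into text and integer chunks for numerical ordering."""
    # With a capturing group, re.split alternates text (even positions)
    # and digit runs (odd positions), so chunk types line up across names.
    parts = re.split(r"([0-9]+)", name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

# The incorrect (alphanumerical) order given as an example above.
names = ["BXD1", "BXD11", "BXD12", "BXD2", "BXD21", "BXD5"]
ordered = sorted(names, key=strain_sort_key)
# ordered is ["BXD1", "BXD2", "BXD5", "BXD11", "BXD12", "BXD21"]
```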

       

       

We hope to eventually provide an on-line template for this "Moderate Amount of Information About Design, Experiment, and Data" (MAIDED), but for the time being, if you can at least fill in these lines, we will be in good shape.

File started Sept 2005 by RWW. This version by RWW, Jan 31, 2006.