Advanced Searching and General Advice
Enter one or more terms into the ANY or ALL fields. The ANY field will typically retrieve more records (logical OR), whereas the ALL field will find only records that match all terms (logical AND). You can search using standard text, gene symbols, GenBank IDs, mRNA reference sequence IDs (NM_*), probe/probe set IDs or even Gene Ontology IDs (for example GO:16798). These fields are not case-sensitive; app and APP are equivalent. Terms can be separated by a space, comma, slash, colon or semicolon.
* or ? can be used to represent any of several characters. Use * for one or more characters and ? for single characters such as hyphens or periods.
When in doubt, start with short terms and use an asterisk at the start or end of the term (e.g., *enkephalin or Hoxb*). When searching for probes or probe sets such as 1436869_at_B, it is easiest to enter 1436869*.
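The wildcard rules above can be sketched in Python as follows. This is an illustration only, not GeneNetwork's implementation: the function names are hypothetical, and the translation follows the description given here (* for one or more characters, ? for exactly one character, matching without regard to case).

```python
import re

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a search term containing * and ? into a case-insensitive regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".+")  # one or more characters, as described above
        elif ch == "?":
            parts.append(".")   # exactly one character (e.g. a hyphen or period)
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)

def matches(pattern: str, text: str) -> bool:
    """Return True if the whole text matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(text) is not None

print(matches("1436869*", "1436869_at_B"))
print(matches("receptor?binding", "receptor-binding"))
```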
To search for a term or word that is in GeneWiki, please just enter "wiki=xxx", for example, wiki=GENSAT to list all genes and transcripts for which there is a GeneWiki entry that includes the text string "GENSAT." These searches are not case sensitive.
A maximum of 500 characters is allowed in either search field. Approximately 60 GenBank, RefSeq, Unigene, or probe set IDs or other IDs will fit. It is a good idea to enter the full string, for example Mm.57202 including the period for Unigene IDs. Although *57202 will work, this search may also pick up unintended records. You can enter the reference mRNA sequence (RefSeq) for a gene, such as NM_007467. Enter RefSeq IDs with the underscore character (_).
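If you have a long list of IDs, one way to stay within the 500-character limit is to split the list into several searches. The helper below is a hypothetical sketch (the name batch_ids and the choice of a space as separator are assumptions); it simply packs IDs into strings that do not exceed the limit.

```python
def batch_ids(ids, max_chars=500, sep=" "):
    """Group IDs into search strings of at most max_chars characters each.

    A single ID longer than max_chars is still returned as its own batch.
    """
    batches, current = [], ""
    for id_ in ids:
        candidate = id_ if not current else current + sep + id_
        if len(candidate) > max_chars and current:
            batches.append(current)  # current batch is full; start a new one
            current = id_
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches
```

Each returned string can then be pasted into the ANY field as a separate search.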
As mentioned, the ANY field will retrieve records that match any of the terms in any order (logical OR). A search string such as amyloid beta may generate too many records (over 1000 in some databases) because beta is so common. In contrast, the ALL field performs a logical AND operation and retrieves only records that intersect all terms. Searching for amyloid beta or beta amyloid in this field yields fewer than 50 hits.
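The difference between the two fields can be illustrated with a small Python sketch. This is not how the server is implemented; the function names, the substring matching, and the sample records are assumptions chosen only to show OR versus AND behavior. Note that a real parser would have to treat terms such as GO IDs specially, because the colon is also listed as a separator.

```python
import re

def split_terms(query: str) -> list[str]:
    """Split a query on spaces, commas, slashes, colons and semicolons."""
    return [t.lower() for t in re.split(r"[\s,/:;]+", query) if t]

def search(records, query, mode="ANY"):
    """Return records matching any term (ANY) or every term (ALL)."""
    terms = split_terms(query)
    if not terms:
        return []
    combine = any if mode.upper() == "ANY" else all
    return [r for r in records if combine(t in r.lower() for t in terms)]

# Made-up record descriptions for illustration only.
records = [
    "amyloid beta (A4) precursor protein",
    "beta actin",
    "hemoglobin beta chain",
    "serum amyloid A 1",
]
print(len(search(records, "amyloid beta", mode="ANY")))  # every record mentions one term
print(search(records, "beta amyloid", mode="ALL"))       # term order does not matter
```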
A single Search Results page lists up to 40 records, and provides links to as many as 12 other pages and a maximum of 500 records. If a search generates more than 500 matches, you will need to make the search more selective. Try using the ? wildcard to retain a specific sequence and order of words such as in receptor?binding.
All Published Phenotypes databases can be searched by the last names of authors. These databases cannot yet be reliably searched using general terms such as morphology or neuropharm* or year of publication.
Multiple Phenotype databases can be searched in a single operation by selecting the All Species option in the pull-down selection menus ("Choose Species"). You can then enter a phrase such as "body weight" in the ALL field to generate a Search Results list of phenotypes in multiple groups (AXB, BXD, BXH, CXB, etc.).
Genotype databases can usually be searched by the name, chromosome, or location of markers. To find all markers on Chr 7, type the number 7 into either entry field. To find all markers on Chr X between 50 and 80 Mb, type this string into either entry field: Mb=(ChrX 50 80).
Set To Default: Please use the option labeled Set To Default. This allows you to change the initial database displayed when you begin a search. For this option to work, permission for cookies needs to be enabled on your browser. A cookie is a small text file stored on your computer used by our server to keep track of preferred settings. (If you are logged in for special projects, the cookie also keeps your user name and password.) To test that the Default option works properly, change the settings and reload the search page. If this does not work as expected, check the preference settings of your browser.
In some cases you may need more data than is available from a standard GeneNetwork output page. Please review the FAQ page and get familiar with the Simple Query Interface (note that this complex page may load slowly in some browsers).
Advanced Search Methods
More complex searches of some databases are possible using controlled syntax. Gene expression databases can be searched by the chromosomal locations of genes, by the average expression of their transcripts, by the range of values among cases or strains, by the peak linkage values (LRS scores), or by Gene Ontology membership. These search parameters can be combined. For example, to find all transcripts that are transcribed from genes located on chromosome 1 between 98 and 104 megabases use this search format:
- Position=(Chr1 98 104) [Note: No space between Chr and the number or letter of the chromosome. As usual, the search string is case insensitive. Commas may be added between elements for visual clarity.]
- Pos=(Chr1 98 104)
- Mb=(Chr1 98 104)
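For readers who generate or validate such strings in their own scripts, the three equivalent forms above can be parsed with one regular expression. This is a sketch under the rules stated here (case-insensitive keyword, no space between Chr and the chromosome name, optional commas); the names POSITION_RE and parse_position are hypothetical.

```python
import re

# Accepts Position=, Pos= or Mb=, followed by (ChrN low high) with optional commas.
POSITION_RE = re.compile(
    r"^(?:position|pos|mb)\s*=\s*\(\s*chr(\w+)[\s,]+([\d.]+)[\s,]+([\d.]+)\s*\)$",
    re.IGNORECASE,
)

def parse_position(term: str):
    """Return (chromosome, low_Mb, high_Mb) for a position search term."""
    m = POSITION_RE.match(term.strip())
    if m is None:
        raise ValueError(f"not a position search term: {term!r}")
    return m.group(1).upper(), float(m.group(2)), float(m.group(3))

print(parse_position("Position=(Chr1 98 104)"))
print(parse_position("mb=(chrX, 50, 80)"))
```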
To find all transcripts with expression that average between 15.0 and 16.0 units, use this format:
- Mean=(15 16)
To find all transcripts that vary 10-fold to 100-fold among strains or cases, use this format:
To search for a term or word that is in GeneWiki or GeneRIF, enter one of the following:
- rif=XXX wiki=XXX, for example, rif=autism wiki=autism to access all genes and transcripts with either a RIF entry or a WIKI entry that includes "autism."
- WIKI=xxx, for example, WIKI=GENSAT to access all genes and transcripts for which there is a GeneWiki entry that includes the text string "GENSAT."
- RIF=xxx, for example, RIF=autism to access all genes and transcripts for which there is a GeneRIF entry in the GeneWiki for the term "autism."
In the examples above, the search terms are not case sensitive.
Many of the GeneNetwork databases have been exhaustively analyzed using QTL Reaper, a high throughput mapping program designed to handle large array data sets. It is possible to search most array databases to find those transcripts that have QTLs with peak LRS or LOD scores within a particular range of values. Genome-wide P values are computed using a permutation test.
To find traits by peak LRS value or by p value range, the search syntax needs to follow these rules:
- LRS=(Low_LRS_limit, High_LRS_limit): for example, LRS=(20 30) will find all traits that have a best QTL with a peak genome-wide LRS value between 20 and 30 (LOD = LRS/4.61). It will not tell you where these QTLs are located, but it will instead provide you with a list of the traits that meet this condition.
- pvalue=(Low_limit, High_limit): for example, pvalue=(0.0001 0.001) where the P value is the genome-wide significance level established by permutation. This is very similar to the LRS search above but uses permutation P values rather than LRS or LOD scores.
- CisLRS=(Low_LRS_limit, High_LRS_limit, Mb_buffer): This command will find all expression traits that have a single best QTL that is located close to the gene from which it is expressed. The inclusion buffer value (in megabases) is used to set the limits on how close the QTL peak must be to the gene location. The inclusion buffer should usually be set to a value of 10 Mb or less, depending on the mapping population. Commas are not required between parameter values.
- TransLRS=(Low_LRS_limit, High_LRS_limit, Mb_buffer): This command will find all transcripts that have a single best QTL that is not located close to the gene from which the transcript is expressed, that is, a QTL located more than the exclusion buffer value (in megabases) from that gene. The exclusion buffer should usually be set at greater than 10 to 20 Mb. Commas are not required.
- LRS=(Low_LRS_limit, High_LRS_limit, ChrNN, Mb_Low_Limit, Mb_High_Limit): for example, LRS=(20, 900, Chr12, 0, 130). This command will find all transcripts that have a single best QTL that is located on Chr 12 between 0 Mb and 130 Mb in the LRS range of 20 to 900. Commas are not required.
- LRS=(Low_LRS_limit, High_LRS_limit, ChrNN, Mb_Low_Limit, Mb_High_Limit) and TransLRS=(Low_LRS_limit, High_LRS_limit, Mb_buffer): for example, LRS=(20, 900, Chr12, 0, 130) transLRS=(20, 900, 25). This combination of commands will find all transcripts that have a single best trans-QTL that is located on Chr 12 between 0 Mb and 130 Mb in the LRS range of 20 to 900 with a 25 Mb exclusion buffer. Commas are not required.
You need to replace the text such as "Low_LRS_limit" with a real value such as "15". But do not use the quotes.
For example, you might type this string into the ALL field to find CisQTLs that map to Chr 1 between 170 and 180 Mb with LRS values between 100 and 500:
CisLRS=(100, 500, 10) LRS=(100, 500 chr1 170 180)
The search strings above require a database of values that we precompute using QTL Reaper. If QTL Reaper has not yet been used, then these searches will not return records.
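The cis and trans definitions and the LRS-to-LOD conversion used in this section can be sketched as follows. The function names are hypothetical, and the handling of a peak that lies exactly at the buffer distance is an assumption; the server may treat that boundary differently.

```python
LRS_PER_LOD = 4.61  # conversion factor given above: LOD = LRS / 4.61

def lrs_to_lod(lrs: float) -> float:
    """Convert an LRS score to a LOD score."""
    return lrs / LRS_PER_LOD

def classify_qtl(gene_chr, gene_mb, peak_chr, peak_mb, buffer_mb):
    """Label a best QTL as 'cis' or 'trans' relative to its parent gene.

    cis: same chromosome and within buffer_mb megabases of the gene.
    trans: a different chromosome, or farther away than buffer_mb.
    """
    same_chr = str(gene_chr).upper() == str(peak_chr).upper()
    if same_chr and abs(gene_mb - peak_mb) <= buffer_mb:
        return "cis"
    return "trans"

print(classify_qtl("1", 175.0, "1", 178.0, buffer_mb=10))
print(classify_qtl("1", 175.0, "12", 60.0, buffer_mb=25))
```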
These search strings can be combined to generate more complex queries. For example, enter these search phrases into the ALL (intersection of) field:
- Mb=(Chr1 50 100) LRS=(20 200) to find all transcripts with genes on Chr 1 between 50 and 100 Mb that also have top LRS scores in the range from 20 to 200 anywhere in the genome.
- MB=(ChrX 0 20) Mean=(10 25) to find all transcripts with genes on Chr X between 0 and 20 Mb that also have mean expression in the range from 10 to 25.
- transLRS=(9.2 1000 20) LRS=(9.2 1000 Chr11 50 80) will find all transcripts with best trans QTLs (LRS > 9.2) that map to Chr 11 between 50 and 80 Mb (with a 20 Mb trans exclusion buffer).
- Mb=(Chr2 100 200) GO:0007268 to find any transcripts on Chr 2 between 100 and 200 Mb that belong to the Gene Ontology category GO:0007268 "synaptic transmission." More below on GO searches.
- Mb=(Chr1 0 210) Mean=(12 20) TransLRS=(15 300 25) in the ALL field to find all transcripts located on Chr 1 (the Mb values of 0 and 210 cover the entire chromosome) that have mean expression above a value of 12 (quite high) and that have a major trans-acting QTL located at least 25 Mb away from the location of the transcript's "parent" gene. If this search fails, then confirm that it works when used in combination with the Hippocampus Consortium Dec05 PDNN database. You should get 13 returns, including Psmc6, Offrl1, and Ptp4a1. Start with lenient criteria to ensure that the search works with the database that interests you, and if it does, then increase the selectivity.
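When combined queries like those above are produced by a script, it can help to assemble them from parts rather than typing them by hand. The helper below is a hypothetical sketch that only formats text for pasting into the ALL field; it does not contact GeneNetwork.

```python
def term(name, *values):
    """Format one search term, e.g. term('LRS', 20, 200) gives 'LRS=(20 200)'."""
    return f"{name}=({' '.join(str(v) for v in values)})"

def combine(*terms):
    """Join terms with spaces for entry into the ALL field."""
    return " ".join(terms)

query = combine(term("Mb", "Chr1", 50, 100), term("LRS", 20, 200))
print(query)
```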
Searches for Categories of Genes
Gene Ontology term searches: This search feature allows you to find transcripts related to particular categories using appropriate GO identifiers. For example, to extract all transcripts associated with "synapse" enter the string GO:0045202, or for more specificity, enter the string GO:0016079 for "synaptic vesicle exocytosis" in the ANY field. Similarly, to review all transcripts associated with transcriptional control AND that have high LRS scores, enter the string GO:0003700 in the ALL field along with a string such as "LRS=(30 300)". This combination will retrieve all transcription factor-associated genes with QTL scores between 30 and 300.
To browse or find GO terms and classes, use AmiGO.
Or use GoPubMed and a set of search terms such as "visual transduction photoreceptor" to extract the correct GO term and identifier "phototransduction" = GO:0007602.
As of September 2005, the GO contains approximately 20,000 terms of which 6,300 terms are associated with genes/transcripts in one or more of the GeneNetwork databases. Approximately 700 high level GO terms will return well over 200 hits. It is therefore useful to select more specific GO terms that return 100 or fewer transcripts or genes. GO search ID numbers can be used together with other search parameters (OR and AND Booleans by using the ANY and ALL fields).
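GO identifiers are conventionally written with seven digits and leading zeros (GO:0007268), although shortened forms such as GO:16798 also appear on this page. If you collect GO IDs from several sources, a small normalizing step like the hypothetical one below keeps them in a single form; whether the search field requires the padded form is not stated here, so treat this as a convenience only.

```python
import re

def normalize_go_id(term: str) -> str:
    """Return a GO ID in the canonical seven-digit form, e.g. GO:0016798."""
    m = re.fullmatch(r"GO:(\d{1,7})", term.strip(), re.IGNORECASE)
    if m is None:
        raise ValueError(f"not a GO identifier: {term!r}")
    return f"GO:{int(m.group(1)):07d}"

print(normalize_go_id("GO:16798"))
```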
Multiple Database Searches, GET commands, and Scriptable Interface Queries
Multiple database searches: It is possible to retrieve expression estimates for a single gene from many databases simultaneously by pasting a Search command into the URL entry field of your browser. Use the syntax below but replace **** with the gene symbol, and decide whether you want to search rat or mouse databases (default is mouse). You can also specify a tissue (default is all tissues). The final &alias=1 argument will search for the official symbol AND all known aliases.
- http://gn1.genenetwork.org/webqtl/main.py?cmd=sch&species=(rat or mouse)&tissue=(cerebellum or striatum or brain or hsc or fat or kidney)&gene=****&alias=1
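The URL above can also be assembled in a script. The sketch below uses only the parameters shown in the template (cmd, species, tissue, gene, alias); the function name is hypothetical, and it assumes that leaving out the tissue parameter gives the all-tissues default described above. It builds the string only and does not send a request.

```python
from urllib.parse import urlencode

BASE_URL = "http://gn1.genenetwork.org/webqtl/main.py"

def multi_db_search_url(gene, species="mouse", tissue=None, include_aliases=True):
    """Build the multiple-database search URL for one gene symbol."""
    params = {"cmd": "sch", "species": species}
    if tissue is not None:
        params["tissue"] = tissue  # omitted: assumed to search all tissues
    params["gene"] = gene
    if include_aliases:
        params["alias"] = "1"      # also search all known aliases
    return f"{BASE_URL}?{urlencode(params)}"

print(multi_db_search_url("App", tissue="cerebellum"))
```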
Other GET commands
A GET command is a simple data request that takes the form of an odd looking URL address. For more details on the many allowed GET commands used by the GeneNetwork please see the Scriptable Interface overview. The Scriptable Interface is designed primarily to handle queries from other databases and web services, but you can also use this method as a quick way to generate more comprehensive output files. For example, if you need to review the complete list of correlations of Huntingtin (probe set 1425969_a_at_A) with all 45137 expression traits in the INIA Brain mRNA M430 (Apr05) PDNN database then you would paste this particular GET command in the URL box of your browser:
&sort=pvalue (Please note that this type of query may take several minutes and will not be accompanied by a progress bar.)
To obtain a complete list of the database abbreviations, including databases listed on the BETA site, see http://gn1.genenetwork.org/cgi-bin/beta/main.py?cmd=help.
To completely avoid learning the structure of GET commands, the GeneNetwork also has a Simple Query Interface mentioned already once above (look under the Search menu). This interface assembles the GET command for you. All you need to do is select parameters.
The size of a GeneNetwork database can be determined by entering a single * in either search field.
Administrators and Curators: The command Flag=N searches for records in which a review request flag value has been entered. 0 = unmodified due to data conflict or overwrite risk, 1 = excellent BLAT score of probe set and no known problem, 2 = poor BLAT score requiring verification, 3 = potentially serious probe set position or identification problem requiring further curation or caution.