What is a GO subset?
GO subsets (also known as GO slims) are cut-down versions of the GO ontologies containing a subset of the terms in GO. They give a broad overview of the ontology content without the detail of the specific fine-grained terms.
How are GO subsets used?
GO subsets are particularly useful for giving a summary of the results of GO annotation of a genome, microarray, or cDNA collection when broad classification of gene product function is required. Some groups annotate to GO subsets that are relevant to their domain of interest, rather than using the full GO.
Who creates and maintains GO subsets?
GO subsets are created by users according to their needs, and may be specific to species or to particular areas of the ontologies. GO provides a generic GO subset which, like GO itself, is not species-specific, and which should be suitable for most purposes. Alternatively, users can create their own GO subsets or use one of the model organism-specific subsets integrated into GO. Please email the GO helpdesk for more information about creating and submitting your GO subsets.
GO subsets available
Maintained GO subsets
The GO subsets in this list are maintained as part of the GO flat file. The files available below for download are generated by script from that file.
|Organism or Usage||Download|
|Generic GO subset Developed by GO Consortium||OBO format|
|Plant subset Developed by The Arabidopsis Information Resource||OBO format|
|Candida albicans Developed by Candida Genome Database||OBO format|
|Protein Information Resource subset Developed by Darren Natale, PIR||OBO format|
|Schizosaccharomyces pombe subset Developed by Val Wood, PomBase||OBO format|
|Yeast subset Developed by Saccharomyces Genome Database||OBO format|
|Aspergillus subset Developed by Aspergillus Genome Database||OBO format|
|Metagenomics subset Developed by Jane Lomax and the InterPro group||OBO format|
|Virus subset Developed by Jane Lomax and Rebecca Foulger||OBO format|
|ChEMBL Drug Target subset Developed by Prudence Mutowo and Jane Lomax||OBO format|
For internal checking purposes we also provide two "anti-slims":
- Do not annotate -- the set of high level terms that are useful for grouping, but should have no direct annotations
- Do not manually annotate -- as above, but it's permitted for automated tools to make direct annotations to these
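The two rules above can be expressed as a simple check over a list of annotations. The following is a minimal sketch only, not the GO Consortium's own checking code: the term identifiers, the two sets, and the function name find_violations are hypothetical, and the sketch assumes that an annotation counts as automated when its evidence code is IEA.

```python
# Hypothetical placeholder sets; real anti-slim contents come from the GO files.
DO_NOT_ANNOTATE = {"TERM:grouping_1"}
DO_NOT_MANUALLY_ANNOTATE = {"TERM:grouping_2"}


def find_violations(annotations, do_not_annotate, do_not_manually_annotate):
    """Return (gene_product, term, reason) for each annotation that breaks a rule.

    Each annotation is a (gene_product, term, evidence_code) tuple.
    Assumption: only the evidence code "IEA" marks an automated annotation.
    """
    violations = []
    for gene_product, term, evidence in annotations:
        if term in do_not_annotate:
            violations.append((gene_product, term, "no direct annotations allowed"))
        elif term in do_not_manually_annotate and evidence != "IEA":
            violations.append((gene_product, term, "manual annotation not allowed"))
    return violations


EXAMPLE_ANNOTATIONS = [
    ("geneA", "TERM:grouping_1", "IEA"),  # violates "do not annotate"
    ("geneB", "TERM:grouping_2", "IEA"),  # allowed: automated annotation
    ("geneC", "TERM:grouping_2", "IDA"),  # violates "do not manually annotate"
    ("geneD", "TERM:specific_1", "IDA"),  # allowed: term is in neither set
]

for violation in find_violations(
    EXAMPLE_ANNOTATIONS, DO_NOT_ANNOTATE, DO_NOT_MANUALLY_ANNOTATE
):
    print(violation)
```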
Archived GO slims
There is also an archive of deprecated GO slims that are no longer maintained or updated. These files have been deposited for two reasons: first, to give easy access to the GO slim used in a particular publication or analysis; second, to allow reuse by others in the community.
Users should note that the majority of these GO slims are no longer maintained by the authors, and they may contain GO terms which are now obsolete. All archival GO slims are in the deprecated GO flat file format.
|Topic / Usage||Information||Download|
|Generic GO slim||Suparna Mundodi and Amelia Ireland Aug 2002||old GO format|
|Honey bee ESTs||C.W. Whitfield, M.R. Band, M.F. Bonaldo, C.G. Kumar, L. Liu, J.R. Pardinas, H.M. Robertson, M.B. Soares, G.E. Robinson, PMID:11923340 Apr 2002||old GO format|
|Drosophila||M. Adams, M. Ashburner, G.M. Rubin, S.E. Lewis et al.; Adams et al., PMID:10731132 Mar 2000||old GO format|
|Glossina ESTs||M. Berriman Sep 2002||old GO format|
|UniProtKB-GOA||N. Mulder, M. Pruess PMID:12230037 Nov 2002||old GO format|
|Mouse||The RIKEN Genome Exploration Group Phase II Team and the FANTOM Consortium PMID:11217851 Feb 2001||old GO format|
|P. falciparum||M. Berriman July 2002||old GO format|
|Plant||Suparna Mundodi Dec 2002||old GO format|
|Rice (Beijing)||J. Yu et al. PMID:11935017 Apr 2002||old GO format|
|Rice (Syngenta)||J. Yu et al. PMID:11935018 Apr 2002||old GO format|
|Yeast||SGD curators Aug 2003||old GO format|
Map2Slim option in OWLTools
Given a GO subset file and a current ontology (in one or more files), the Map2Slim script maps a gene association file (containing annotations to the full GO) to the terms in the GO subset. The script is an option of OWLTools and can be used in two ways: to create a new gene association file containing the most pertinent GO slim accessions, or, in count mode, to give distinct gene product counts for each subset term.
Background information and details on how to download, install, and implement OWLTools, as well as instructions on how to run the Map2Slim script, are available from the OWLTools Wiki at https://github.com/owlcollab/owltools/wiki/Map2Slim.
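The idea behind mapping annotations to a subset can be illustrated with a small sketch. This is not the OWLTools implementation: the toy ontology, the term identifiers, and the function names below are all hypothetical, and only simple parent links are followed. The first function maps a term to the most specific subset terms among its ancestors; the second counts distinct gene products under every subset term that is an ancestor of an annotated term, which is one plausible reading of count mode.

```python
from collections import defaultdict

# Hypothetical toy ontology: each term maps to its direct parents.
PARENTS = {
    "T:C": {"T:B"},
    "T:B": {"T:A"},
    "T:D": {"T:A"},
    "T:A": set(),
}
SLIM = {"T:A", "T:B"}  # hypothetical subset terms
ANNOTATIONS = [("gene1", "T:C"), ("gene2", "T:D"), ("gene1", "T:B")]


def ancestors(term, parents):
    """Return the term itself plus every term reachable through parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def map_to_slim(term, parents, slim):
    """Return the most specific subset terms among the term's ancestors."""
    candidates = ancestors(term, parents) & slim
    # Drop a candidate if it is an ancestor of another, more specific candidate.
    return {
        c
        for c in candidates
        if not any(c != other and c in ancestors(other, parents) for other in candidates)
    }


def slim_counts(annotations, parents, slim):
    """Count distinct gene products under each subset term."""
    products = defaultdict(set)
    for gene_product, term in annotations:
        for slim_term in ancestors(term, parents) & slim:
            products[slim_term].add(gene_product)
    return {term: len(found) for term, found in products.items()}


print(map_to_slim("T:C", PARENTS, SLIM))
print(slim_counts(ANNOTATIONS, PARENTS, SLIM))
```

In this toy data, gene1 is annotated twice beneath T:B but is counted only once there, because the counts are of distinct gene products rather than of annotations.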
On the web
There are also a couple of online tools that may be of use: the Princeton slimming tool and the legacy AmiGO slimmer. Note that online tools often have limitations and timeouts.
In addition to GO subsets, GO provides a prokaryote-specific subset of GO terms. This subset contains only terms that are applicable to prokaryotes; for example, the terms nucleus and mitochondrion are excluded, while membrane and cytoplasm are included. At approximately 7000 terms, the set is much larger than the GO slims. It is intended as a viewing tool that reduces the complexity of the ontologies for users not interested in eukaryotic terms, rather than as a way to get a high-level view of an annotation set, which is how GO slims are used.
Unlike GO subsets, the prokaryotic subset is not generated as a separate file, but is stored as a category in the GO ontology file. It is possible to view the terms in this subset by searching for gosubset_prok in Protégé.
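Because subset membership is stored inside the ontology file, it can also be read programmatically. In OBO format, a term's membership is recorded with subset tags in its [Term] stanza. The sketch below assumes that layout and uses a short hand-written sample rather than the real GO file; the sample stanzas, the tag example_subset, and the function names are illustrative only.

```python
# Hand-written sample in OBO style; not copied from the real GO file.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: GO:0005737
name: cytoplasm
subset: gosubset_prok

[Term]
id: GO:0005634
name: nucleus
subset: example_subset

[Typedef]
id: part_of
name: part of
"""


def parse_term_stanzas(obo_text):
    """Collect the id, name and subset tags of every [Term] stanza."""
    stanzas = []
    current = None
    for raw_line in obo_text.splitlines():
        line = raw_line.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are kept.
            current = None
            if line == "[Term]":
                current = {"id": None, "name": None, "subset": set()}
                stanzas.append(current)
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            value = value.split(" !")[0].strip()  # drop trailing "! comment"
            if key == "subset":
                current["subset"].add(value)
            elif key in ("id", "name"):
                current[key] = value
    return stanzas


def terms_in_subset(obo_text, subset_name):
    """Return (id, name) pairs for terms tagged with the given subset."""
    return [
        (term["id"], term["name"])
        for term in parse_term_stanzas(obo_text)
        if subset_name in term["subset"]
    ]


print(terms_in_subset(SAMPLE_OBO, "gosubset_prok"))
```

For a real ontology file, the same function could be given the file's contents read as text; a dedicated OBO parsing library would be more robust than this line-based sketch.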