- Ontology Structure
- GO as a Graph
- One Ontology... or Three?
- Obsolete Terms
- Term Structure
- Essential Elements
- Optional Extras
- Sample GO Term
- Cross-Products and Logical Definitions
- Cross-Products with external ontologies
What is an Ontology?An ontology is a formal representation of a body of knowledge, within a given domain. Ontologies usually consist of a set of classes or terms with relations that operate between them. The domains that GO represents are biological processes, functions and cellular components.
GO as a GraphThe structure of GO can be described in terms of a graph, where each GO term is a node, and the relationships between the terms are edges between the nodes. GO is loosely hierarchical, with 'child' terms being more specialized than their 'parent' terms, but unlike a strict hierarchy, a term may have more than one parent term (note that the parent/child model does not hold true for all types of relation, see the relations documentation). For example, the biological process term hexose biosynthetic process has two parents, hexose metabolic process and monosaccharide biosynthetic process. This is because biosynthetic process is a subtype of metabolic process and a hexose is a subtype of monosaccharide.
The following diagram is a screenshot from the ontology editing software OBO-Edit, showing a small set of terms from the ontology.
A set of terms under the biological process node pigmentation.
In the diagram, relations between the terms are represented by the colored arrows; the letter in the box midway along each arrow is the relationship type. Note that the terms get more specialized going down the graph, with the most general terms—the root nodes, cellular component, biological process and molecular function—at the top of the graph. Terms may have more than one parent, and they may be connected to parent terms via different relations. The GO relations documentation describes these relations in greater detail.
One Ontology... or Three?As the diagram above suggests, the three GO domains (cellular component, biological process, and molecular function) are each represented by a root ontology term. All terms in a domain can trace their parentage to the root term, although there may be numerous different paths via varying numbers of intermediary terms to the ontology root. The three root nodes are unrelated and do not have a common parent node, and hence GO is three ontologies. Some graph-based software may require a single root node; in these cases, a "fake" term can be added as a parent of the three existing root nodes.
The three GO ontologies are is_a disjoint, meaning that no is_a relations operate between terms from the different ontologies. However, other relationships such as part_of and regulates do operate between the GO ontologies. For example the molecular function term 'cyclin-dependent protein kinase activity' is part_of the biological process 'cell cycle'.
Unique identifier and term nameEvery term has a term name—e.g. mitochondrion, glucose transport, amino acid binding—and a unique zero-padded seven digit identifier (often called the term accession or term accession number) prefixed by GO:, e.g. GO:0005125 or GO:0060092. The numerical portion of the ID has no inherent meaning or relation to the position of the term in the ontologies. Ranges of GO IDs are assigned to individual ontology editors or editing groups, and can thus be used to trace who added the term.
NamespaceDenotes which of the three sub-ontologies—cellular component, biological process or molecular function—the term belongs to.
DefinitionA textual description of what the term represents, plus reference(s) to the source of the information. All new terms added to the ontology must have a definition; there remains a very small set of terms from the original ontology that lack definitions, but the vast majority of terms are defined.
Relationships to other termsOne or more links that capture how the term relates to other terms in the ontology. All terms (other than the root terms representing each namespace, above) have an is a sub-class relationship to another term; for example, GO:0015758 : glucose transport is a GO:0015749 : monosaccharide transport. The Gene Ontology employs a number of other relations, including part of (e.g. GO:0031966 : mitochondrial membrane part of GO:0005740 : mitochondrial envelope) and regulates (e.g. GO:0006916 : anti-apoptosis regulates GO:0012501 : programmed cell death). The relations documentation has more information on the relations used in the ontology.
Secondary IDs (also known as Alternate ID)Alternate IDs that refer to a term. Secondary IDs come about when two or more terms are identical in meaning, and are merged into a single term. All terms IDs are preserved so that no information (for example, annotations to the merged IDs) is lost.
SynonymsAlternative words or phrases closely related in meaning to the term name, with indication of the relationship between the name and synonym given by the synonym scope. The scopes for GO synonyms are:
- an exact equivalent; interchangeable with the term name
- e.g. ornithine cycle is an exact synonym of urea cycle
- the synonym is broader than the term name
- e.g. cell division is a broad synonym of cytokinesis
- the synonym is narrower or more precise than the term name
- e.g. pyrimidine-dimer repair by photolyase is a narrow synonym of photoreactive repair
- the terms are related in some way not covered above
- e.g. cytochrome bc1 complex is a related synonym of ubiquinol-cytochrome-c reductase activity virulence is a related synonym of pathogenesis