Cellular Component Ontology Guidelines

The cellular component ontology describes locations, at the levels of subcellular structures and macromolecular complexes. Examples of cellular components include 'nuclear inner membrane', with the synonym 'inner envelope', and the 'ubiquitin ligase complex', with several subtypes of these complexes represented.

Generally, a gene product is located in or is a subcomponent of a particular cellular component. The cellular component ontology includes multi-subunit enzymes and other protein complexes, but not individual proteins or nucleic acids. Cellular component also does not include multicellular anatomical terms.

The Cell

What is in a cell?

The cell is defined in GO as all components within and including the plasma membrane and any external encapsulating structures, such as the cell wall and the cell envelope. intracellular ; GO:0005622 is defined as the contents of the cell excluding the plasma membrane and any structures outside the plasma membrane. For this reason, cell projection ; GO:0042995 is a direct child of cell ; GO:0005623; cell projection membrane ; GO:0031253 is part of cell projection ; GO:0042995 and plasma membrane ; GO:0005886.

Membrane Proteins

As GO cellular component terms describe locations where a gene product may act, rather than physical features of proteins or RNAs, the terms integral membrane protein and peripheral membrane protein are present only as non-exact synonyms. GO distinguishes classes of membrane-related location:

extrinsic component of membrane ; GO:0019898 refers to gene products that are associated with membranes, but are neither directly embedded in the membrane nor anchored by covalent bonds to any moiety embedded in the membrane.

intrinsic component of membrane ; GO:0031224 refers to gene products that have some covalently attached moiety embedded in the membrane, and is further split into integral component of membrane ; GO:0016021 and anchored component of membrane ; GO:0031225. The former refers to proteins in which some part of the peptide sequence spans all or part of the membrane (in theory, it could also be used for RNAs embedded in a membrane, if any such exist). A subclass of this covers transmembrane proteins - those that completely span the membrane (see examples in figure 2). The latter refers to gene products tethered to a membrane by a covalently attached anchor, such as a lipid moiety, which is embedded in the membrane.

Each of these terms can have child terms referring to specific membranes, for example intrinsic component of plasma membrane ; GO:0031226 or extrinsic component of vacuolar membrane ; GO:0000306.
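
The distinctions above can be read as a small decision procedure. The following is a minimal sketch, not part of GO tooling: the `MembraneAssociation` record and its field names are hypothetical, and the label "spanning component of membrane" follows the definition pattern given later in this document rather than a confirmed term name.

```python
from dataclasses import dataclass


@dataclass
class MembraneAssociation:
    """Hypothetical record of how a membrane-associated gene product contacts the bilayer."""
    peptide_embedded: bool          # some part of the peptide sequence sits in the membrane
    spans_membrane: bool            # the peptide completely spans the membrane
    covalent_anchor_embedded: bool  # e.g. a covalently attached lipid moiety is embedded


def classify_membrane_component(assoc: MembraneAssociation) -> list[str]:
    """Return class names from most general to most specific."""
    if assoc.peptide_embedded:
        classes = ["intrinsic component of membrane", "integral component of membrane"]
        if assoc.spans_membrane:
            # Transmembrane proteins are a subclass of integral components.
            classes.append("spanning component of membrane")
        return classes
    if assoc.covalent_anchor_embedded:
        # Tethered only by an anchor: intrinsic, but not integral.
        return ["intrinsic component of membrane", "anchored component of membrane"]
    # Associated with the membrane, but neither embedded nor covalently anchored.
    return ["extrinsic component of membrane"]


print(classify_membrane_component(MembraneAssociation(True, True, False)))
print(classify_membrane_component(MembraneAssociation(False, False, True)))
print(classify_membrane_component(MembraneAssociation(False, False, False)))
```

The sketch assumes the gene product is already known to be membrane-associated; it does not decide whether a membrane annotation is appropriate in the first place.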

Additionally, monotopic proteins (non-membrane-spanning proteins, as illustrated in figure 3) can be assigned to a side of membrane: "A cellular component consisting of one leaflet of a membrane bilayer and any proteins embedded or anchored in it or attached to its surface." As for whole membranes, proteins can be more specifically annotated as integral (Fig 3. 1 & 2), anchored (Fig 3. 3) or extrinsic components (Fig 3. 4) of a side of membrane.

Figure 1: Membrane component types.

Figure 2: Schematic representation of transmembrane proteins: 1. a single transmembrane α-helix (bitopic membrane protein) 2. a polytopic transmembrane α-helical protein 3. a polytopic transmembrane β-sheet protein. The membrane is represented in light-brown. Source for image and legend: Wikipedia. See link for image credit.

Figure 3: Schematic representation of the different types of interaction between monotopic membrane proteins and the cell membrane: 1. interaction by an amphipathic α-helix parallel to the membrane plane (in-plane membrane helix) 2. interaction by a hydrophobic loop 3. interaction by a covalently bound membrane lipid (lipidation) 4. electrostatic or ionic interactions with membrane lipids (e.g. through a calcium ion). Source for image and legend: Wikipedia. See link for image credit.

Unsupported assertions about membrane proteins

The cellular component ontology does not include terms for type I, II, etc., membrane proteins, because these classifications are not locations, but instead describe a different feature of the proteins, namely topological orientation with respect to the membrane and other cellular components. Furthermore, the wording "type I integral membrane protein" describes a class of gene products.

Definition patterns

extrinsic component of X membrane: "The component of the X membrane consisting of gene products that are loosely bound to one of its surfaces, but not integrated into the hydrophobic region."

intrinsic component of X membrane: "The component of the X membrane consisting of gene products and protein complexes that have some covalently attached part (e.g. peptide sequence or GPI anchor), which spans or is embedded in one or both leaflets of the membrane."

integral component of X membrane: "The component of the X membrane consisting of the gene products and protein complexes having at least some part of their peptide sequence embedded in the hydrophobic region of the membrane."

anchored component of X membrane: "The component of the X membrane consisting of the gene products and protein complexes that are tethered to the membrane only by a covalently attached anchor, such as a lipid group embedded in the membrane. Gene products with peptide sequences that are embedded in the membrane are excluded from this grouping."

spanning component of X membrane: "The component of the X membrane consisting of gene products and protein complexes that have some part that spans both leaflets of the membrane."

X side of Y membrane: "The side (leaflet) of the Y membrane that faces the X."

extrinsic component of X side of Y membrane: "The component of the Y membrane consisting of gene products and protein complexes that are loosely bound to its X surface, but not integrated into the hydrophobic region."

intrinsic component of X side of Y membrane: "The component of the Y membrane consisting of gene products and protein complexes that penetrate the X side of the membrane only, either directly or via some covalently attached hydrophobic anchor."
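
Because these definitions are templates, a term name and its definition can be filled in mechanically. The sketch below is illustrative only: the dictionary and function names are hypothetical, only two of the patterns are included, and the pattern text is copied from the definitions above with "X membrane" replaced by a placeholder.

```python
# Definition patterns copied from the guidelines; "{membrane}" stands in for "X membrane".
DEFINITION_PATTERNS = {
    "extrinsic": (
        "The component of the {membrane} consisting of gene products that are "
        "loosely bound to one of its surfaces, but not integrated into the "
        "hydrophobic region."
    ),
    "integral": (
        "The component of the {membrane} consisting of the gene products and "
        "protein complexes having at least some part of their peptide sequence "
        "embedded in the hydrophobic region of the membrane."
    ),
}


def make_component_term(kind: str, membrane: str) -> tuple[str, str]:
    """Build a (term name, definition) pair for one membrane component class."""
    name = f"{kind} component of {membrane}"
    definition = DEFINITION_PATTERNS[kind].format(membrane=membrane)
    return name, definition


name, definition = make_component_term("integral", "plasma membrane")
print(name)
print(definition)
```

Keeping the patterns in one table means a wording change to a pattern propagates to every generated definition.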

Ontology structure

Using plasma membrane as an example, each membrane has intrinsic and extrinsic components:

intrinsic_vs_extrinsic_membrane_comp_0.png

Each membrane has 2 sides - in this case cytoplasmic and external, each with their own intrinsic and extrinsic components:

sides_of_pm_0.png

Intrinsic and extrinsic components of the sides are subclasses (is_a children) of the corresponding terms for the whole membrane:

side_of_mem_and_in_vs_ex_0.png

Logical definitions

side of membrane terms

We have general classes for cytoplasmic and lumenal membrane sides, defined using adjacent_to, e.g.:

'cytoplasmic side of membrane' EquivalentTo 'side of membrane' that adjacent_to some cytoplasm

This general pattern is sufficient for classification of side of membrane terms:

'side of membrane' that (part_of some X) and (adjacent_to some Y)

For example, this allows inferred classification as 'early endosome membrane part' and 'cytoplasmic side of membrane':

'cytoplasmic side of early endosome membrane' EquivalentTo: 'side of membrane' that (part_of some 'early endosome membrane') and (adjacent_to some cytoplasm)
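
The side-of-membrane pattern is regular enough to be generated from three strings. The following is a rough sketch that only assembles the axiom text in the style used above; the function names are hypothetical and nothing here calls an OWL library or a reasoner.

```python
def quote(label: str) -> str:
    """Quote labels that contain spaces, as in the examples above."""
    return f"'{label}'" if " " in label else label


def side_of_membrane_axiom(side: str, membrane: str, adjacent_structure: str) -> str:
    """Render the logical definition for '<side> side of <membrane>'."""
    term = f"{side} side of {membrane}"
    return (
        f"{quote(term)} EquivalentTo: 'side of membrane' that "
        f"(part_of some {quote(membrane)}) and "
        f"(adjacent_to some {quote(adjacent_structure)})"
    )


axiom = side_of_membrane_axiom("cytoplasmic", "early endosome membrane", "cytoplasm")
print(axiom)
```

The classification itself (for example, inferring that the term is a 'cytoplasmic side of membrane') is still the reasoner's job; the sketch only produces the text of the definition.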

extrinsic, intrinsic, integral and anchored components

'X component' that part_of some 'Y membrane/side of membrane' e.g.:

'integral component of membrane' that (part_of some 'Golgi membrane')
'integral component of membrane' that (part_of some 'lumenal side of endoplasmic reticulum membrane')

These patterns automate classification, but part_of relationships (see the plasma membrane example above for a guide) still need to be added by hand. This could be automated in future by the use of GCIs in OWL, but these are cumbersome to write by hand and so will probably be added via TermGenie templates following these patterns, or via some script-based support mechanism.

Membranes And Envelopes

Terms and structure

GO distinguishes single and double membranes surrounding organelles: an organelle envelope ; GO:0031967 is defined as two lipid bilayers plus the space, or lumen, between them, whereas an organelle membrane ; GO:0031090 is defined as a single bilayer. For double-membrane organelles, the membrane term refers to either of the lipid bilayers, but excludes the intermembrane space. The envelope is part of the organelle and is_a organelle envelope ; GO:0031967; the membrane is part of the envelope, and inner membrane and outer membrane terms can be included:
  • chloroplast
    • [p] chloroplast envelope
      • [p] chloroplast membrane
        • [i] chloroplast inner membrane
        • [i] chloroplast outer membrane

...

  • organelle envelope
    • [i] chloroplast envelope
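
To make the [p] (part_of) and [i] (is_a) links in this example concrete, the sketch below stores them as edges and computes everything a term is part of. It is a toy illustration: the edge list and function name are hypothetical, and it assumes the usual composition rule that a path containing at least one part_of link, mixed with any number of is_a links, yields part_of.

```python
# (child, relation, parent) edges copied from the chloroplast example above.
EDGES = [
    ("chloroplast envelope", "part_of", "chloroplast"),
    ("chloroplast membrane", "part_of", "chloroplast envelope"),
    ("chloroplast inner membrane", "is_a", "chloroplast membrane"),
    ("chloroplast outer membrane", "is_a", "chloroplast membrane"),
    ("chloroplast envelope", "is_a", "organelle envelope"),
]


def wholes_of(term: str) -> set[str]:
    """Return every term that `term` is (transitively) part_of."""
    found: set[str] = set()
    visited: set[tuple[str, bool]] = set()
    stack = [(term, False)]  # (node, has the path crossed a part_of link yet?)
    while stack:
        node, crossed = stack.pop()
        if (node, crossed) in visited:
            continue
        visited.add((node, crossed))
        for child, relation, parent in EDGES:
            if child != node:
                continue
            now_crossed = crossed or relation == "part_of"
            if now_crossed:
                found.add(parent)
            stack.append((parent, now_crossed))
    return found


print(sorted(wholes_of("chloroplast inner membrane")))
```

Under that assumption, the inner membrane is part of the chloroplast envelope, of the chloroplast, and of an organelle envelope, even though it has no direct part_of link of its own.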

History

Prior to December 2005, nuclear envelope ; GO:0005635 was named 'nuclear membrane', with 'nuclear envelope' as a synonym; this reflected a usage fairly common in the literature. For consistency with other organelle envelope and membrane terms, GO:0005635 is now named 'nuclear envelope', consistent with its definition, and a separate term, nuclear membrane ; GO:0031965, has been added.

Standard Definitions

organelle envelope: "The double lipid bilayer enclosing the organelle and separating its contents from the rest of the cytoplasm; includes the intermembrane space."

organelle membrane (organelle with a single membrane): "The lipid bilayer surrounding a(n) organelle."

organelle membrane (organelle with a double membrane): "Either of the lipid bilayers that enclose the organelle and form the organelle envelope."

organelle inner membrane: "The inner, i.e. lumen-facing, lipid bilayer of the organelle envelope."

organelle outer membrane: "The outer, i.e. cytoplasm-facing, lipid bilayer of the organelle envelope."

organelle membrane lumen: "The region between the inner and outer lipid bilayers of the organelle envelope."

Standard synonyms

The following synonym can be added to terms as long as the synonym string makes sense and does not have alternative meanings. Note that the term name and synonym can be switched depending on typical usage.
organelle membrane lumen
exact_synonym: organelle intermembrane space

Protein Complexes

Definition of a Protein Complex

A protein complex represented in the cellular component ontology should include more than one gene product; complexes of one gene product with a cofactor, e.g. heme or chlorophyll, should not be included. Homomultimeric proteins, e.g. the homodimeric alcohol dehydrogenase, may be included as cellular component terms, as should heteromultimeric proteins, e.g. hemoglobin with alpha and beta chains. All complexes in the component ontology should be given parentage under the general term protein complex ; GO:0043234. To distinguish cellular components from functions, use 'complex' in the term name of a component, and append the word 'activity' to enzyme names in function terms. For example, the molecular function term pyruvate dehydrogenase activity ; GO:0004738 describes the enzyme activity, whereas the cellular component term pyruvate dehydrogenase complex ; GO:0045254 describes the multi-subunit structure in which the enzyme activity resides.
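
The inclusion rule can be sketched as a simple check. This is one reading of the guideline, not an official test: it assumes that "more than one gene product" is satisfied by two or more polypeptide chains, so that homodimers qualify, and that cofactors never count. The `Assembly` record and the name "protein P" are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Assembly:
    """Hypothetical record: `subunits` maps a gene product name to its copy number."""
    subunits: dict[str, int]
    cofactors: list[str] = field(default_factory=list)


def qualifies_as_protein_complex(assembly: Assembly) -> bool:
    """True if the assembly contains at least two polypeptide chains.

    Cofactors such as heme or chlorophyll are ignored on purpose.
    """
    return sum(assembly.subunits.values()) > 1


hemoglobin = Assembly({"alpha chain": 2, "beta chain": 2}, cofactors=["heme"])
alcohol_dehydrogenase = Assembly({"alcohol dehydrogenase": 2})
single_chain_with_heme = Assembly({"protein P": 1}, cofactors=["heme"])

for example in (hemoglobin, alcohol_dehydrogenase, single_chain_with_heme):
    print(example.subunits, qualifies_as_protein_complex(example))
```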

Receptor-ligand complexes

As a rule, GO terms indicating the association of a receptor with its ligand should not be created, because such complexes may not always be stable and the number of terms could grow explosively. However, exceptions are allowed. The IntAct database does not curate receptor-ligand complexes that consist of a single chain of each, but it does curate complexes in which the ligand is not monomeric and the receptors oligomerize upon ligand binding. An example of this is GO:1990270 'platelet-derived growth factor receptor-ligand complex', where the ligands are always dimeric and the receptor dimerizes upon ligand binding. (Also, in the case of GO:1990270, the complex has been shown to exist in a variety of experiments, including crystals, pull downs, comigrations, competition assays and more.)

Integration with SAO (Subcellular Anatomy Ontology)

The primary use of the GO Cellular Component Ontology is for GO annotation, but it has also been used for phenotype annotation, and for the annotation of images. Another ontology with similar scope is the Subcellular Anatomy Ontology (SAO), part of the Neuroscience Information Framework Standard (NIFSTD) suite of ontologies. The SAO also covers cell components, but in the domain of neuroscience.

Recently, the GO Cellular Component Ontology was enriched in content and links to the Biological Process and Molecular Function branches of GO as well as to other ontologies. This was achieved in several ways, one of which was amalgamation of SAO terms with GO Cellular Component ones. As a result, nearly 100 new neuroscience-related terms were added to the GO.

A recent paper describes this effort, along with other recent developments in the GO Cellular Component Ontology.

Maintaining complete 'is_a' and 'part_of' trees in cellular component

The cellular component ontology is is_a complete, meaning that every term has a path to the root node which passes solely through is_a relationships. This should be preserved; the following guidelines should help maintain this structure.

Make sure the term has an is_a path to the root, i.e. there are is_a parent terms by at least one path all the way to 'cellular component'.

Make sure the term has at least one part_of relation in its ancestry, to ensure that there are no part_of orphans. It does not need to be an immediate part_of parent, but every term has to be part_of something. So, for complex Y, this would be okay for example:

  • cell
    • [p] complex X
      • [i] complex Y

because complex Y is transitively part_of cell.

Ensure that all logical is_a parents are added. So, for example, if your term is a protein complex, make sure it has the parent 'protein complex'. Or if your term, or one of its parents, is part_of cell, it will need to be is_a 'cell part', or have 'cell part' in its ancestry.

Check none of the relations you create are redundant. You can check for this in OBO-Edit by using the reasoner, and then using the link filter [self] [self] [is redundant]. As an added check, a weekly job runs every Monday night (US West Coast time) to remove any redundant relationship.
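
The checks described in this section can be sketched outside OBO-Edit as plain graph traversals. The example below is a simplified illustration, not the OBO-Edit reasoner or the weekly job: the toy ontology, the deliberately broken term 'orphan term', and all function names are hypothetical, and redundancy is only checked for direct is_a links.

```python
ROOT = "cellular component"

# Hypothetical toy ontology: term -> list of (relation, parent).
ONTOLOGY = {
    "cellular component": [],
    "cell": [("is_a", "cellular component")],
    "protein complex": [("is_a", "cellular component")],
    "complex X": [("is_a", "protein complex"), ("part_of", "cell")],
    # The second is_a link is redundant: it is implied via complex X.
    "complex Y": [("is_a", "complex X"), ("is_a", "protein complex")],
    # Deliberately broken: no is_a parent at all.
    "orphan term": [("part_of", "cell")],
}


def is_a_ancestors(term: str) -> set[str]:
    """All terms reachable from `term` through is_a links only."""
    seen: set[str] = set()
    stack = [p for rel, p in ONTOLOGY[term] if rel == "is_a"]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(p for rel, p in ONTOLOGY[node] if rel == "is_a")
    return seen


def has_is_a_path_to_root(term: str) -> bool:
    """Check is_a completeness for a single term."""
    return term == ROOT or ROOT in is_a_ancestors(term)


def has_part_of_in_ancestry(term: str) -> bool:
    """True if the term or any of its ancestors has a part_of link."""
    seen: set[str] = set()
    stack = [term]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for rel, parent in ONTOLOGY[node]:
            if rel == "part_of":
                return True
            stack.append(parent)
    return False


def redundant_is_a_links(term: str) -> list[str]:
    """Direct is_a parents that are already implied by another direct is_a parent."""
    direct = [p for rel, p in ONTOLOGY[term] if rel == "is_a"]
    return [
        parent
        for parent in direct
        if any(parent in is_a_ancestors(other) for other in direct if other != parent)
    ]


for term in ("complex Y", "orphan term"):
    print(term, has_is_a_path_to_root(term), has_part_of_in_ancestry(term),
          redundant_is_a_links(term))
```

In this toy data, complex Y passes both the is_a and part_of checks but carries one redundant is_a link, while 'orphan term' fails is_a completeness.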