Guide to GO Evidence Codes

This document is a guide to the standard usage of the GO evidence codes. Annotators may also find the evidence code decision tree useful in selecting the correct evidence code for an annotation.

Introduction

A GO annotation consists of a GO term associated with a specific reference that describes the work or analysis upon which the association between a specific GO term and gene product is based. Each annotation must also include an evidence code to indicate how the annotation to a particular term is supported. Although evidence codes do reflect the type of work or analysis described in the cited reference which supports the GO term to gene product association, they are not necessarily a classification of types of experiments/analyses. Note that these evidence codes are intended for use in conjunction with GO terms, and should not be considered in isolation from the terms. If a reference describes multiple methods that each provide evidence to make a GO annotation to a particular term, then multiple annotations with identical GO identifiers and reference identifiers but different evidence codes may be made.
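As a minimal sketch of the structure described above, an annotation can be modelled as a small record. The class name Annotation, its field names, and the identifiers used here are hypothetical placeholders chosen for illustration; they are not part of any GO file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record for one GO annotation (illustration only)."""
    object_id: str       # gene product being annotated
    go_id: str           # GO term identifier
    reference: str       # reference supporting the association, e.g. a PMID
    evidence: str        # evidence code, e.g. "IDA"
    with_from: str = ""  # optional with/from entry


# A single reference that describes two methods supporting the same term
# yields two annotations that differ only in their evidence code.
# All identifiers below are placeholders.
a = Annotation("DB:0000001", "GO:0000001", "PMID:0000000", "IDA")
b = Annotation("DB:0000001", "GO:0000001", "PMID:0000000", "IMP")
distinct = {a, b}
```

Because the evidence code is part of the record, the two annotations remain distinct even though their GO identifier and reference identifier are identical.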

Out of all the evidence codes available, only Inferred from Electronic Annotation (IEA) is not assigned by a curator. Manually-assigned evidence codes fall into four general categories: experimental, computational analysis, author statements, and curatorial statements.

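Use of an experimental evidence code in a GO annotation indicates that the cited paper displayed results from a physical characterization of a gene or gene product that has supported the association of a GO term. The Experimental Evidence codes are:

  • EXP: Inferred from Experiment
  • IDA: Inferred from Direct Assay
  • IPI: Inferred from Physical Interaction
  • IMP: Inferred from Mutant Phenotype
  • IGI: Inferred from Genetic Interaction
  • IEP: Inferred from Expression Pattern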

Use of the computational analysis evidence codes indicates that the annotation is based on an in silico analysis of the gene sequence and/or other data as described in the cited reference. The evidence codes in this category also indicate a varying degree of curatorial input. The Computational Analysis evidence codes are:

Author statement codes indicate that the annotation was made on the basis of a statement made by the author(s) in the reference cited. The Author Statement evidence codes used by GO are:

Use of the curatorial statement evidence codes indicates an annotation made on the basis of a curatorial judgement that does not fit into one of the other evidence code classifications. The Curatorial Statement codes are:

All of the above evidence codes are assigned by curators. However, GO also uses one evidence code that is assigned by automated methods, without curatorial judgement. The Automatically-Assigned evidence code is:

  • IEA: Inferred from Electronic Annotation

Evidence codes are not statements of the quality of the annotation. Within each evidence code classification, some methods produce annotations of higher confidence or greater specificity than other methods. In addition, the way in which a technique has been applied or interpreted in a paper will also affect the quality of the resulting annotation. Thus evidence codes cannot be used as a measure of the quality of the annotation.

Experimental Evidence Codes

The experimental evidence codes are:

EXP: Inferred from Experiment

This code is used in an annotation to indicate that an experimental assay has been located in the cited reference, whose results indicate a gene product's function, process involvement, or subcellular location (indicated by the GO term). The EXP code is the parent code for the IDA, IPI, IMP, IGI and IEP experimental codes.

The EXP evidence code can be used where any of the assays described for the IDA, IPI, IMP, IGI, or IEP evidence codes is reported. However, groups are strongly encouraged to annotate to one of the more specific experimental codes (IDA, IPI, IMP, IGI, or IEP) instead of EXP, and all curators directly involved in the GO Reference Genome annotation effort are obliged to use these and not EXP.

The EXP code exists for groups who would like to contribute high-quality GO annotations that are produced from directly associating GO terms to gene products by citing experimental published results, but where the group is unable to fit the appropriate specific experimental GO evidence code to each annotation.

A published reference should always be cited in the reference column, and no value should be entered into the with/from column of EXP annotations. 
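The EXP rules above can be sketched as a small check. This is an illustrative sketch only: the names EXP_CHILD_CODES, is_experimental and check_exp_annotation are hypothetical and do not come from any GO tool.

```python
from typing import List

# The specific experimental codes that have EXP as their parent code.
EXP_CHILD_CODES = frozenset({"IDA", "IPI", "IMP", "IGI", "IEP"})


def is_experimental(code: str) -> bool:
    """True for EXP itself or for one of its more specific child codes."""
    return code == "EXP" or code in EXP_CHILD_CODES


def check_exp_annotation(reference: str, with_from: str) -> List[str]:
    """Return a list of problems found in an EXP annotation (empty if none)."""
    problems = []
    if not reference.strip():
        problems.append("EXP annotations should cite a published reference")
    if with_from.strip():
        problems.append("EXP annotations should leave with/from empty")
    return problems
```

For example, check_exp_annotation("PMID:0000000", "") (a placeholder reference) reports no problems, while an empty reference combined with a with/from entry reports two.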

IDA: Inferred from Direct Assay

Updated November 9, 2007

  • Enzyme assays
  • In vitro reconstitution (e.g. transcription)
  • Immunofluorescence (for cellular component)
  • Cell fractionation (for cellular component)
  • Physical interaction/binding assay (sometimes appropriate for cellular component or molecular function)

The IDA evidence code is used to indicate a direct assay was carried out to determine the function, process, or component indicated by the GO term. Curators therefore need to be careful, because an experiment considered as a direct assay for a term from one ontology may be a different kind of evidence for a term from another of the ontologies. In particular, there are more kinds of direct assays for cellular component than for function or process. For example, a fractionation experiment might provide "direct assay" evidence that a gene product is in the nucleus, but "protein interaction" (IPI) evidence for its function or process.

For transfection experiments or other experiments where a gene from one organism or tissue is put into a system that is not its normal environment, the annotator should use the author's intent and interpretation of the experiment as a guide as to whether IMP or IDA is appropriate. When the author is comparing differences between alleles, regardless of the simplicity or complexity of the assay, IMP is appropriate. When the author is using an expression system as a way to investigate the normal function of a gene product, IDA is appropriate.

Examples where the IDA evidence code should be used:

  • Binding assays can provide direct assay evidence for annotating to the xxx binding molecular function terms. (Use IDA only when no identifier can be placed in the with/from column; when there is an appropriate ID for the with/from column, use IPI).
  • Assays describing the isolation of a complex by immunoprecipitation of a tagged subunit should use IDA, not IPI. Thus this type of assay can provide IDA for annotation to a component term for the specific complex because it is a direct assay for a complex.
  • Transfections into a cell line, overexpression, or ectopic expression of a gene when the expression system used is considered to be an assay system to address basic, normal functions of gene product even if it would not normally be expressed in that cell type or location. If the experiments were conducted to assess the normal function of the gene and the assay system is believed to reproduce this function, i.e., the authors would consider their experiment to be a direct assay, and not a comparison between various alleles of a gene, then the IDA code should be used. This is in contrast with a situation where overexpression affects the function or expression of the gene and that difference from normal is used to make an inference about the normal function; in this case use the IMP evidence code.

Examples where the IDA evidence code should not be used:

  • Binding assays where it is possible to put an ID corresponding to the specific binding partner that was shown to interact directly with the gene product being annotated should be annotated with the IPI code, not with IDA.
  • Transfection into a cell line, overexpression, or ectopic expression of a gene where the effects of various alleles of a gene are compared to each other or to wild-type. For this type of experiment, annotate using IMP.
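The two recurring decisions in the IDA examples above can be summarised in code. This is a simplified sketch under stated assumptions: the function names are hypothetical, and the sketch deliberately ignores special cases such as complex isolation by immunoprecipitation of a tagged subunit, which the guide treats as IDA for a component term.

```python
def code_for_binding_assay(partner_id: str) -> str:
    """IPI when the binding partner has an identifier for with/from, else IDA."""
    return "IPI" if partner_id.strip() else "IDA"


def code_for_expression_system(compares_alleles: bool) -> str:
    """IMP when alleles are compared to each other or to wild-type;
    IDA when the expression system is used to assay normal function."""
    return "IMP" if compares_alleles else "IDA"
```

In practice the second decision depends on the author's intent and interpretation of the experiment, so the boolean argument stands in for a curatorial judgement rather than something that can be computed.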

IPI: Inferred from Physical Interaction

Updated October 10, 2014

  • 2-hybrid interactions
  • Co-purification
  • Co-immunoprecipitation
  • Ion/protein binding experiments

The IPI evidence code covers physical interactions between the entity of interest and another molecule (such as a protein, ion or complex). IPI can be thought of as a type of IDA in which the actual binding partner or target can be specified, using "with" in the with/from field.

Often it is difficult to tell from the evidence presented in a paper whether an interaction is direct or not. Any in vivo/cell lysate method always has the possibility of a third 'bridge' protein - there are many examples of this happening in, for example, yeast 2-hybrid experiments when yeast proteins have proven essential for interactions between two human proteins to occur. The only methods that show direct evidence of two proteins binding are those in which the two proteins have been isolated and pre-purified. Ideally, curators should capture only direct interactions; however, it is acceptable to curate interactions even if it is not known whether they are direct or not.

Examples where the IPI evidence code should be used:

  • Binding assays where it is possible to put an ID corresponding to the specific binding partner that was shown to interact with the entity being annotated should be annotated with the IPI code, not with IDA.
  • Annotations to the GO term ‘binding’ (GO:0005488) or ‘protein complex’ (GO:0043234), or their child terms, which are supported by the isolation of a complex by co-immunoprecipitation or pull-down assays may use IPI with the ID corresponding to the ‘antibody target’ or ‘tagged’ subunit in the with/from column.
  • The GO term ‘protein binding’ (GO:0005515) should only be used with the evidence code IPI and an identifier in the ‘with’ field. A reciprocal annotation must also be made to indicate the interaction in the opposite direction.
  • Annotations to Molecular Functions (except ‘catalytic activity’ GO:0003824 or its child terms, see below) or Biological Processes may be made using IPI and an entry in the ‘with/from’ field in order to indicate the inference that the annotated entity is involved in the process or function because it interacts with another entity that was shown experimentally to be involved in that process or function.
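The reciprocal annotation required for ‘protein binding’ (GO:0005515) can be sketched as follows. The BindingAnnotation type, the reciprocal function, and the identifiers DB:A, DB:B and PMID:0000000 are hypothetical placeholders. The sketch assumes a single identifier in the with/from field and that the same identifier can serve as an annotated object, which may not hold when the two databases differ.

```python
from typing import NamedTuple


class BindingAnnotation(NamedTuple):
    """Hypothetical record for a 'protein binding' annotation."""
    object_id: str
    go_id: str
    reference: str
    evidence: str
    with_from: str


def reciprocal(annotation: BindingAnnotation) -> BindingAnnotation:
    """Swap the annotated object and its binding partner."""
    return annotation._replace(
        object_id=annotation.with_from,
        with_from=annotation.object_id,
    )


forward = BindingAnnotation("DB:A", "GO:0005515", "PMID:0000000", "IPI", "DB:B")
backward = reciprocal(forward)
```

Applying reciprocal twice returns the original annotation, which is a convenient sanity check on the swap.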

Examples where the IPI evidence code should not be used:

  • The GO term ‘protein binding’ (GO:0005515) should not be used to describe an antibody binding to another protein. However, an effect of an antibody on an activity or process can support a function or process annotation, using the IMP code.
  • Annotations to the GO term ‘binding’ (GO:0005488), or its child terms, which are supported by binding assays where it is NOT possible to put an ID corresponding to the specific binding partner that was shown to interact with the gene product being annotated should be annotated with the IDA code, not with IPI (see table 1).
  • Annotations to the GO term ‘catalytic activity’ (GO:0003824), or its child terms, should not use the IPI evidence code. It is unlikely that enough information can be obtained from a binding interaction to support such an annotation.

Table 1. Example annotation where it is not possible to add the interacting partner.

Object_ID    Object_Symbol  GO term                              Reference      Evidence  With/From
MGI:2137706  Actn1          GO:0051015 (actin filament binding)  PMID:15465019  IDA       -

Usage of the With/From Column for IPI.

All proteins or gene products annotated using the IPI evidence code should include an identifier in the with/from column, identifying the other protein, macromolecule or chemical involved in the interaction. When multiple entries are placed in the with/from field, they are separated by pipes. Consider using IDA when no identifier can be entered in the with/from column.

The exception to this rule is when annotating a macromolecular complex identifier to an equivalent complex GO term, e.g. annotating the IntAct Complex Portal identifier ‘EBI-706546’ (human BCL-2 homodimer) to ‘BCL-2 complex’ (GO:0097148). In this case, it is acceptable to use the IPI evidence code without an entry in the with/from field, since the member subunits of the complex are encoded in the complex identifier.

For example, in Table 2 the protein kinase CK2 variant 2 complex (EBI-1253636) has been annotated with the GO term “protein kinase CK2 complex”. The complex was identified by co-purification of the subunits, so there is no need to indicate the subunits in the With/From field, as these are already encoded in the object identifier.

Table 2. Example of an annotation to a complex identifier.

Object_ID    Object_Symbol                 GO term                                  Reference      Evidence  With/From
EBI-1253636  protein kinase CK2 variant 2  GO:0005956 (protein kinase CK2 complex)  PMID:12242279  IPI       -

Note that when a complex binds, for example, a protein or chemical that is not an integral part of the complex, it is necessary to indicate in the With/From field the entity to which the complex is binding.

Two examples of how the with/from column is used with IPI are shown in Table 3. Abcd3, a mouse protein, is annotated to protein binding ; GO:0005515, based on Liu et al., 1999 (PMID:10551832). The with/from field holds the UniProt protein IDs of the proteins that Abcd3 binds to. In the second example, Alb, a rat protein, is annotated to drug binding based on Harada et al., 2002 (PMID:12458670). In this case the CHEBI ID (chemical ID) of the drug that Alb binds to is provided in the with/from column.

Table 3. Example annotations that use entries in the with/from field.

Object_ID    Object_Symbol  GO term                       Reference      Evidence  With/From
MGI:1349216  Abcd3          GO:0005515 (protein binding)  PMID:10551832  IPI       UniProt:P33897 (human ABCD1)|UniProt:Q61285 (mouse ABCD2)
RGD:2085     Alb            GO:0008144 (drug binding)     PMID:12458670  IPI       CHEBI:28939 (N-acetyl-L-cysteine)

Note: For an interacting protein, a protein ID is recommended in the with/from column for an IPI annotation, but a gene ID may be used if the database does not have identifiers for individual gene products. A gene ID may also be used if the cited reference provides enough information to determine which gene ID should be used, but not enough to establish which protein ID is correct, for example, in cases where there is a one-to-many relationship between a gene and its protein products. Note that there has been some discrepancy between groups as to the use of the with/from column; please see the note on usage of the with/from column for more details.
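The with/from rules for IPI described in this section can be sketched as a simple check. The function names split_with_from and ipi_with_from_ok, and the flag complex_id_to_complex_term, are hypothetical names used only for illustration. Whether an annotation falls under the complex exception is assumed to be decided by the caller.

```python
from typing import List


def split_with_from(value: str) -> List[str]:
    """Split a with/from field into its pipe-separated entries."""
    return [entry.strip() for entry in value.split("|") if entry.strip()]


def ipi_with_from_ok(with_from: str,
                     complex_id_to_complex_term: bool = False) -> bool:
    """An IPI annotation needs a with/from entry, unless a complex
    identifier is annotated to its equivalent complex GO term."""
    if complex_id_to_complex_term:
        return True
    return len(split_with_from(with_from)) > 0


# The two binding partners listed for Abcd3 in the with/from column.
abcd3_partners = split_with_from("UniProt:P33897|UniProt:Q61285")
```

When ipi_with_from_ok returns False, the guidance above suggests considering IDA instead.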

IMP: Inferred from Mutant Phenotype

Updated November 9, 2007

  • mutations, natural or introduced, that result in partial or complete impairment or alteration of the function of that gene
  • polymorphism or allelic variation (including where no allele is designated wild-type or mutant)
  • any procedure that disturbs the expression or function of the gene, including RNAi, anti-sense RNAs, antibody depletion, or the use of any molecule or experimental condition that may disturb or affect the normal functioning of the gene, including: inhibitors, blockers, modifiers, any type of antagonists, temperature jumps, changes in pH or ionic strength.
  • overexpression or ectopic expression of wild-type or mutant gene that results in aberrant behavior of the system or aberrant expression where the resulting mutant phenotype is used to make a judgment about the normal activity of that gene product.

The IMP evidence code covers those cases when the function, process or cellular localization of a gene product is inferred based on differences in the function, process, or cellular localization between two different alleles of the corresponding gene. The IMP code is used for cases where one allele may be designated 'wild-type' and another as 'mutant'. It is also used in cases where allelic variation occurs naturally and no specific allele is designated as wild-type or mutant. Caution should be used when making annotations from gain-of-function mutations as it may be difficult to infer a gene's normal function from a gain of function mutation, although it is sometimes possible.

For transfection experiments or other experiments where a gene from one organism or tissue is put into a system that is not its normal environment, the annotator should use the author's intent and interpretation of the experiment as a guide as to whether IMP or IDA is appropriate. When the author is comparing differences between alleles, regardless of the simplicity or complexity of the assay, IMP is appropriate. When the author is using an expression system as a way to investigate the normal function of a gene product, IDA is appropriate.

Examples where the IMP code should be used:

  • use of an inhibitor of a gene product's activity in order to see the effect of absence, or significant depletion, of that gene product. For example, an experiment in which baicalein, used to inhibit the activity of 12-LOX in a murine bladder cancer cell line, inhibits cell proliferation in a concentration-dependent manner (see PMID:15161019) results in an annotation to the GO term cell proliferation using the IMP evidence code for the 12-LOX gene.
  • transfection into a cell line, overexpression, or ectopic expression of a gene where the effects of various alleles of a gene are compared to each other or to wild-type. For this type of experiment, annotate using IMP.
  • In situations where a mutation in gene A provides information about the function, process, or component of gene B, do not use IGI. Use the IMP evidence code and use column-16 or the Annotation Extension column to provide additional data. For example, if a mutation in gene A causes a mislocalization of gene B, gene A is annotated to protein localization using IMP and the gene B identifier is added to the Annotation Extension column with the appropriate relationship.

Examples where the IMP code should not be used:

  • mutation in gene B provides information about gene A being annotated. For this type of experiment, use the IGI code.
  • complementation of a mutation in one organism by a gene from a different organism.
  • Transfections into a cell line, overexpression, or ectopic expression of a gene when the expression system used is considered to be an assay system to address basic, normal functions of gene product even if it would not normally be expressed in that cell type or location. If the experiments were conducted to assess the normal function of the gene and the assay system is believed to reproduce this function, i.e., the authors would consider their experiment to be a direct assay, and not a comparison between various alleles of a gene, then the IDA code should be used. This is in contrast with a situation where overexpression affects the function or expression of the gene and that difference from normal is used to make an inference about the normal function; in this case use the IMP evidence code.

Usage of the With column for IMP

We recommend making a "with" entry in the with/from column when using this evidence code to indicate the identifier for the allele in which the phenotype was observed. When multiple entries are placed in the with/from field, they are separated by pipes.

Example of how the with/from column should be filled in:

  • The mouse gene product Actc1 (actin, alpha, cardiac ; MGI:87905), has a GO annotation to muscle thin filament assembly ; GO:0030240, inferred from mutant phenotype, IMP of MGI:2180072 (symbol: Actc1tm1Jll; name: targeted mutation 1, James Lessard), from PMID:9114002. MGI:2180072 is entered in the with/from column for this annotation.

IGI: Inferred from Genetic Interaction

Updated March 1, 2016

  • Genetic interactions involving two or more mutations that result in suppression or enhancement of a given phenotype, also synergistic (synthetic) interactions
  • Co-transfection experiments in which two or more genes are expressed in a heterologous system to assess functional interaction
  • Expression of one gene alters the phenotypic outcome of a mutation in another gene; the two genes may or may not be from the same species. In the literature, these types of experiments are variably referred to as: functional complementation, rescue experiments, or suppression

The IGI evidence code is used for annotations based on experiments reporting the effects of perturbations in the sequence or expression of one or more genes or gene products. IGI is also used for experiments that interrogate functional interactions between two or more genes or gene products when co-expressed, for example, in a cell line. Additional uses of IGI include experiments in which the expression of one gene affects the phenotypic outcome of a mutation in another gene.

Key to deciding whether or not to use the IGI or IMP (Inferred from Mutant Phenotype) evidence code is consideration of the point of reference (i.e., what is being compared) to determine a possible interaction. If experiments interrogate the effects of multiple mutations or differences from the control, then use IGI. If experiments interrogate the effects of a single mutation or difference from the control, then use IMP.

The IGI evidence code requires that curators enter a stable database identifier for the interacting entity in the With/From field of the Gene Association File (GAF). Independent interactors may be captured in the With/From field by separating each entry with a pipe. If the interaction experiment involves multiple perturbations simultaneously, e.g. triply mutant strains, then the respective interactors are separated with a comma.
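The IGI versus IMP decision and the pipe and comma conventions described above can be sketched as follows. The function names and the DB:A, DB:B, DB:C identifiers are hypothetical placeholders, and the count passed to igi_or_imp stands in for a curatorial reading of what the experiment compares.

```python
from typing import List


def igi_or_imp(differences_from_control: int) -> str:
    """IGI for multiple mutations or differences from the control,
    IMP for a single one."""
    if differences_from_control < 1:
        raise ValueError("at least one difference from the control is needed")
    return "IGI" if differences_from_control > 1 else "IMP"


def parse_igi_with_from(value: str) -> List[List[str]]:
    """Pipes separate independent interactors; commas group interactors
    that were perturbed simultaneously in one experiment."""
    groups = []
    for part in value.split("|"):
        ids = [item.strip() for item in part.split(",") if item.strip()]
        if ids:
            groups.append(ids)
    return groups
```

For example, parse_igi_with_from("DB:A|DB:B,DB:C") yields one independent interactor, DB:A, and one group of two simultaneous interactors, DB:B and DB:C.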

Examples where the IGI evidence code should be used:

Genetic interactions such as suppression, enhancement, synergistic (synthetic) interactions, etc.

This use of the IGI evidence code refers to the more “traditional” genetic interaction experiments performed in model organisms, such as Saccharomyces cerevisiae, as well as more recent approaches adopted in a number of different systems such as RNA-mediated knockdown or genome editing techniques. Note that genetic interaction experiments may be performed with both loss- and gain-of-function mutations. Consequently, curators will need to use their expertise to determine whether interaction phenotypes resulting from gain-of-function mutations are informative about the normal, wild type role of a gene or gene product.

  • Example 1: Double loss-of-function mutations resulting in enhancement of a mutant phenotype
  • Localized cell wall degradation is essential for proper cell fusion in the fission yeast, Schizosaccharomyces pombe. This process is accomplished by the localized action of degradative enzymes including several distinct glucanases that act on different polysaccharides. Deletion of multiple glucanases in S. pombe results in decreasing efficiency of cell fusion, indicating that each enzyme contributes additively to this process.
  • exg3 fungal-type cell wall disassembly involved in conjugation with cellular fusion (GO:1904541) PMID:25825517 IGI agn2
  • agn2 fungal-type cell wall disassembly involved in conjugation with cellular fusion (GO:1904541) PMID:25825517 IGI exg3
  • Example 2: Gain-of-function mutation
  • The response to axonal injury requires the activities of MAP kinase and cAMP signaling pathways that are required, for example, for signaling growth cone formation. In C. elegans, the activity of the upstream-most kinase in one of the MAPK signaling pathways, DLK-1, is stimulated by Ca2+ influx mediated by the EGL-19 voltage-gated calcium channel. EGL-19’s regulatory role in the MAPK-mediated axon regeneration pathway was determined, in part, through doubly mutant animals containing an egl-19 hypermorphic mutation that results in occasional action potentials with significantly prolonged plateau phases and a dlk-1 loss-of-function mutation that showed a reduced axon regenerative response when compared to egl-19 alone.
  • EGL-19 positive regulation of MAPK cascade involved in axon regeneration (GO:1904922) PMID:20203177 IGI DLK-1
  • Note that in this example, reciprocal IGI annotations are not made, as the GO term selected for EGL-19 does not make sense for DLK-1.
  • Example 3: Synergistic (synthetic) interactions
  • Disruption of the MSB2 gene in S. cerevisiae has no appreciable effects on the cell's ability to activate the High-Osmolarity Glycerol (HOG) pathway upon osmotic stress, or on cellular growth on high-osmolarity media. To identify potential osmosensors in the SHO1 branch of the HOG pathway, the authors screened for a mutant that is osmosensitive only in an msb2Δ background and recovered mutations in the HKR1 gene. Like MSB2, mutations in HKR1 alone confer no osmosensitivity to the cells.
  • MSB2 hyperosmotic response (GO:0006972) PMID:17627274 IGI HKR1
  • HKR1 hyperosmotic response (GO:0006972) PMID:17627274 IGI MSB2

Co-transfection experiments

Co-transfection experiments include those experiments where two or more gene products are expressed in a heterologous system, such as a cell line, for the purposes of interrogating a functional interaction between them.

  • Example 1: Co-transfection of G protein-coupled receptors (GPCRs)
  • In C. elegans, the response to dauer pheromone, a mixture of small molecules, is mediated by G protein-coupled receptors (GPCRs). Genetic analysis has implicated two GPCRs, SRBC-64 and SRBC-66, in a signaling pathway that responds to specific components of dauer pheromone. To assess the biochemical role of SRBC-64 and SRBC-66, the gene products were expressed singly or in combination in HEK293 cells. Only when expressed in combination were the GPCRs able to enhance forskolin-stimulated cAMP production.
  • SRBC-64 G-protein coupled receptor signaling pathway (GO:0007186) PMID:19797623 IGI SRBC-66
  • SRBC-66 G-protein coupled receptor signaling pathway (GO:0007186) PMID:19797623 IGI SRBC-64

Expression of one gene affects the phenotype of a mutation in another gene

These types of experiments are described in various ways in the published literature, but generally involve expressing a wild-type copy of one gene in the background of a mutation in a second, different gene to determine if the expressed gene can mask the phenotype of the mutated gene. The two genes may or may not be from the same species. When genes from different species are analyzed it is often with the intent of demonstrating functional conservation between species.

  • Example 1: Genes from different species
  • C. elegans contains two genes, lgg-1 and lgg-2, with sequence similarity to the Saccharomyces cerevisiae ubiquitin-like protein Atg8 that is required for autophagosome biogenesis. Transformation of lgg-1, but not lgg-2, into atg8 deletion mutants in nitrogen starvation medium results in increased survival compared to atg8 mutants alone, indicating that lgg-1 can functionally complement budding yeast atg8.
  • lgg-1 (C. elegans) macroautophagy (GO:0016236) PMID:20523114 IGI atg8 (S. cerevisiae)
  • For these annotations, the With/From column should list the identifier for the endogenous gene that is complemented by the heterologously expressed gene being annotated. In annotations from cross-species functional complementation experiments, the gene referred to in the With/From column will thus be from a different species than the gene being annotated.
  • Example 2: Different genes from the same species
  • The planar cell polarity pathway is critical for a number of biological processes including epidermal wound repair. Activity of the GRHL3 transcription factor is essential for efficient wound repair in mice and human cell lines. Wound repair requires activation of the RhoA small GTPase to effect the cellular polarization, actin polymerization and epidermal migration critical to wound closure. The gene encoding the RhoGEF RhoGEF19, a RhoA GTPase activator, is a transcriptional target of GRHL3, and RhoGEF19 activity is also required for wound repair. Expression of human RhoGEF19 in human GRHL3-kd cell lines rescues the actin polymerization defects resulting from loss of GRHL3, indicating a role for RhoGEF19 in regulation of actin cytoskeletal organization during wound repair.
  • ARHGEF19 positive regulation of actin cytoskeleton organization (GO:0032956) PMID:20643356 IGI GRHL3
  • GRHL3 positive regulation of actin cytoskeleton organization (GO:0032956) PMID:20643356 IGI ARHGEF19
  • Note that rescue experiments may be used to help determine the order in which gene products act within a biological pathway or process.
  • Example 3: Different genes from the same species
  • Localized assembly of a filamentous actin (F-actin) network at the leading edge of D. discoideum cells is required for proper chemotaxis towards the cAMP chemoattractant. The organization of actin filaments is regulated by intracellular pH; an increase in pH is necessary for chemotaxis and requires the Na+/H+ exchanger Ddnhe1. Expression of DdAip1, the D. discoideum ortholog of Actin-interacting protein 1, suppresses the chemotaxis defect of Ddnhe1 mutants by restoring the F-actin network, thus illustrating DdAip1's role in actin filament polymerization.
  • aip1 actin filament polymerization (GO:0030041) PMID:20668166 IGI nhe1

When NOT to use IGI

A mutation in one gene affects some property of another gene

Some experiments assess a functional interaction between one or more gene products by examining the effects that mutations in one gene have on the properties of another. These types of experiments are annotated using the IMP (Inferred from Mutant Phenotype) evidence code and the target, or affected gene product, may be captured as an Annotation Extension. The key here is that the genetic perturbation is directed at only one of the gene products in the experiment.

  • For example, treatment of cells with GSK3B antagonists results in nuclear accumulation of the GATA6 transcription factor. This experiment indicates that GSK3B negatively regulates GATA6 localization.
  • GSK3B negative regulation of protein localization to nucleus (GO:1900181) PMID:23624080 IMP transports_or_maintains_localization_of GATA6

Expression of a gene is used to restore the normal function of the same gene

Evidence for a gene's role in a given biological process can be evaluated by expression of a wild-type copy of the gene to "rescue" the phenotype of a mutation in that gene. These experiments, since they involve the same gene, are not considered genetic interactions and may instead be used to support an IMP annotation.

  • For example, loss-of-function mutations in the C. elegans phosphoinositol-5-phosphatase inpp-1 exhibit defective Ca2+ signaling in the AWA chemosensory neuron in response to odorant stimulus. Expression of inpp-1 from a genomic fosmid clone or from an AWA-specific promoter restores the wild-type AWA-mediated odorant response.
  • inpp-1 response to odorant (GO:1990834) IMP

Expression of a miRNA affects expression of a target gene as determined via a reporter assay

A reporter assay is a common way to determine the target(s) of a miRNA. The 3'UTR of an mRNA containing specific miRNA binding sites, fused to a reporter gene, is transfected into cells together with the miRNA. If the mRNA is a bona fide target, the miRNA binds to it and reduces the expression of the reporter gene. The assay assesses the action of the miRNA on the target, not how the two entities work together to affect some process; therefore the evidence code should not be IGI. Since the effect of the miRNA can be determined without any perturbation, the evidence code used is IDA. Often the authors will perform a perturbation experiment as well, but this is not required to see the effect of the miRNA on the target.

  • For example, in luciferase reporter assays with a construct containing a full-length Snai1 3′UTR sequence, miR-133 transfection strongly repressed the luciferase activity by 60%. Mutations of either predicted miR-133-binding site in the Snai1 3′UTR reduced the responsiveness to miR-133, which was almost absent with mutations of both sites, suggesting direct binding of miR-133 to both sites (Fig. 4C).
  • Human miR-133a mRNA binding involved in posttranscriptional gene silencing (GO:1903231) PMID:24920580 IDA has_direct_input SNAI1
  • Human miR-133a gene silencing by miRNA (GO:0035195) PMID:24920580 IDA regulates_expression_of SNAI1

IEP: Inferred from Expression Pattern

Updated November 9, 2007

  • Transcript levels or timing (e.g. Northerns, microarray data)
  • Protein levels (e.g. Western blots)

The IEP evidence code covers cases where the annotation is inferred from the timing or location of expression of a gene, particularly when comparing a gene that is not yet characterized with the timing or location of expression of genes known to be involved in a particular process. Use this code with caution! It may be difficult to determine whether the expression pattern really indicates that a gene plays a role in a given process, so the IEP evidence code is usually used in conjunction with high level GO terms in the biological process ontology.

Note that we have not yet encountered any examples where we feel it is valid to make annotations to terms from the cellular component or molecular function ontologies on the basis of expression pattern data. Thus we currently recommend that this code be restricted to annotations to terms from the biological process ontology. Also, different annotating groups use different identifiers (gene or protein or gene_product) and no inference should be made as to whether an annotation made using IEP concerns a gene, RNA or protein.
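
The restriction above can be expressed as a simple check. The following is a minimal sketch, not an official GO tool: the function name is hypothetical, and the single-letter aspect codes (P, F, C for biological process, molecular function, and cellular component, as used in GO annotation files) are an assumption introduced for illustration.

```python
# Hypothetical helper: flags IEP annotations made outside the biological
# process ontology. Aspect letters are assumed to follow the annotation-file
# convention (P = biological process, F = molecular function,
# C = cellular component).

def iep_aspect_ok(evidence_code, aspect):
    """Return False only for IEP annotations to a non-process term."""
    if evidence_code != "IEP":
        return True  # the recommendation only concerns IEP
    return aspect == "P"


if __name__ == "__main__":
    print(iep_aspect_ok("IEP", "P"))  # process annotation: acceptable
    print(iep_aspect_ok("IEP", "F"))  # function annotation: flagged
```

A check of this kind could be run over an annotation set before submission to catch IEP annotations to function or component terms.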

Examples where the IEP evidence code should be used:

  • genes upregulated during a stress condition may be annotated to the process of stress response (for example, heat shock proteins)
  • genes selectively expressed at specific developmental stages in specific organs may be annotated to xxx development

Example annotations:

  • PMID:10748035. Both mRNA and protein levels of Atp2a2 (SERCA2) are increased upon ER stress in a pattern highly similar to BiP, a well-characterized endoplasmic reticulum (ER) chaperone with a role in the ER stress response. Therefore Atp2a2 may be annotated to 'response to endoplasmic reticulum stress' with IEP.
  • PMID:17627301. Primate IRF2BPL (EAP1) expression increases selectively at puberty in the hypothalamus, and IRF2BPL is expressed in neurons involved in the inhibitory and facilitatory control of reproduction. Ideally there should be additional support for a role of the gene product in the process. In this example, the paper also shows that human IRF2BPL (EAP1) activates genes required for reproductive function, and represses inhibitory genes. Therefore primate IRF2BPL may be annotated to 'development of secondary female sexual characteristics' with IEP.

Examples where the IEP evidence code should not be used:

  • Function and component annotations should not be made with IEP.
  • Exogenous expression or overexpression of a gene should not be annotated using IEP; only the normal expression pattern should lead to an IEP annotation.
  • Overexpression of a gene causing increased activity of an enzyme should be annotated to IDA or IMP (see IDA documentation)
  • Overexpression (wild type or mutated) of a gene causing an abnormal phenotype should be annotated to IMP
  • Exogenous expression of a gene and assaying of its function should be annotated to IDA (like a transcription factor)
  • Binding assays with overexpressed proteins or exogenously expressed proteins should be annotated to IPI for protein binding or IDA for binding to other molecules.
  • Observation of protein localization for a component annotation should be made using the IDA evidence code.
  • Annotation to the molecular function term transcription factor activity where the experimental evidence is that introduction of the gene to be tested into an in vitro assay system leads to expression of the appropriate reporter gene. Annotate using the IDA evidence code.
  • Annotation to a binding molecular function term, e.g. calmodulin binding, where the experiment was to screen an expression library (a library expressing various proteins) to identify which of the library proteins interact with a particular protein of interest. Annotate using the IPI evidence code with the accession number of the interacting protein (or its corresponding gene) in the with/from field.
  • Annotating an enzymatic function to a molecular function term based on an overexpression experiment. Since this is not the normal expression pattern, the IEP code does not apply. For example, guanylate cyclase 2f from rat (GC-F) may be annotated to the molecular function term guanylate cyclase activity, based on the experimental result that over-production of GC-E and GC-F in COS cells resulted in production of, or an increase in, guanylyl cyclase activity (PMID:7831337). IDA would be the appropriate evidence code for this annotation.

Computational Analysis Evidence Codes

The Computational Analysis Evidence Codes are:

ISS: Inferred from Sequence or Structural Similarity

Updated April 1, 2008

The ISS evidence code or one of its sub-categories should be used whenever a sequence-based analysis forms the basis for an annotation and review of the evidence and annotation has been done manually. If the annotation has not been reviewed manually, the correct evidence code is IEA, even if the evidence supporting the annotation is all sequence based. ISS should be used if a combination of sequence-based tools or methods is used. If only one particular type of sequence-based evidence is used then one of the more specific sub-categories of ISS may be more appropriate for the annotation. There are three specific sub-categories of ISS, mentioned briefly here and described in more detail below:

  • ISA: If the primary piece of evidence is a pairwise or multiple alignment then ISA (Inferred from Sequence Alignment) would likely be the appropriate evidence code to use.
  • ISO: If the primary piece of evidence is the assertion of orthology between the gene product and a gene product in another organism, ISO (Inferred from Sequence Orthology) would likely be the appropriate evidence code to use.
  • ISM: If any kind of sequence modeling method (e.g. Hidden Markov Models) is the primary piece of evidence then the ISM (Inferred from Sequence Model) code is the most appropriate.
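
The choice among these codes can be summarized as a small decision function. This is only a sketch of the rules stated above; the function name and the evidence-type labels ("alignment", "orthology", "model") are hypothetical and chosen for illustration.

```python
# Hypothetical decision helper for the ISS family of codes.
# The evidence-type labels are illustrative, not official vocabulary.
SUBCATEGORY = {"alignment": "ISA", "orthology": "ISO", "model": "ISM"}


def similarity_code(evidence_types, manually_reviewed):
    """Suggest an evidence code for a sequence-based annotation.

    evidence_types: set of labels describing the sequence-based evidence.
    manually_reviewed: whether a curator (or the paper's authors) reviewed
    the evidence and the resulting annotation.
    """
    if not evidence_types:
        raise ValueError("at least one type of sequence-based evidence is required")
    unknown = set(evidence_types) - set(SUBCATEGORY)
    if unknown:
        raise ValueError(f"unknown evidence type(s): {sorted(unknown)}")
    if not manually_reviewed:
        return "IEA"  # unreviewed annotations are IEA even if sequence based
    if len(evidence_types) == 1:
        (only,) = evidence_types
        return SUBCATEGORY[only]  # a single evidence type: specific sub-category
    return "ISS"  # a combination of methods: the general code
```

For example, `similarity_code({"alignment", "model"}, True)` suggests ISS, while the same evidence without manual review yields IEA. The final choice still rests on curator judgment, as described in the sections below.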

ISS can also be used for structural similarity with experimentally characterized gene products, as determined by crystallography, nuclear magnetic resonance, or computational prediction. In practice, ISS annotations are rarely, if ever, made purely from structural information. When included, structural information is generally at the level of secondary structure modeling or prediction derived from sequence information. Secondary structure information is particularly useful as one component of RNA gene predictions and in some domain models.

Population of the with field is important when using the ISS code or one of its sub-categories. The entry in with is the accession of the object or model to which your query has similarity. It is mandatory for annotators to make an entry in the with field when using the ISS code or one of its sub-categories if the annotation is based on an alignment with other proteins (e.g. UniProt) or a sequence model contained in a database (e.g. Pfam, InterPro). If the annotation is based on a method such as tRNAscan, which cannot be referred to with an accession number, the with field may be left empty. Entries in the with field should be in the format database:accession, where database is one of the abbreviations listed in the GO database abbreviations collection and accession is the accession number of the object the sequence similarity is with. Multiple entries in the with field should be separated by pipes.
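
The database:accession format with pipe separators can be illustrated with a short sketch. The helper names below are hypothetical, and the validation is deliberately minimal: it does not check the database abbreviation against the GO database abbreviations collection.

```python
# Sketch of with-field handling: each entry is "database:accession" and
# multiple entries are joined by pipes. Function names are illustrative.

def format_with_field(entries):
    """Join (database, accession) pairs into a with-field string."""
    parts = []
    for database, accession in entries:
        if not database or not accession:
            raise ValueError("both database and accession are required")
        parts.append(f"{database}:{accession}")
    return "|".join(parts)


def parse_with_field(text):
    """Split a with-field string back into (database, accession) pairs."""
    if not text:
        return []  # the field may legitimately be empty in some cases
    pairs = []
    for entry in text.split("|"):
        # Split on the first colon only, so accessions that themselves
        # contain a colon are kept whole.
        database, sep, accession = entry.partition(":")
        if not sep or not database or not accession:
            raise ValueError(f"malformed with-field entry: {entry!r}")
        pairs.append((database, accession))
    return pairs
```

Using two accessions that appear elsewhere in this guide, `format_with_field([("UniProtKB", "O00217"), ("Pfam", "PF05426")])` gives `UniProtKB:O00217|Pfam:PF05426`, and `parse_with_field` reverses the operation.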

If the searches and evaluation of the sequence-based data are described in a published paper, the ID (either one assigned by PubMed or one assigned by another database such as a Model Organism Database) of the paper should be placed in the reference column. However, if the group that is doing the GO annotation performed the searches and evaluation of the sequence-based data, and there is no published reference, a reference can be used from the GO Consortium's collection of GO references; if there is nothing appropriate in this set, the annotating group should write a description of the methods of data collection and evaluation used and submit it to the GO Consortium. This will be added to the reference collection and will receive a GO_REF accession number for use in annotations. In all cases, the ID of the reference describing the methodology of the sequence analysis should be placed in the reference column.

Examples of when to use ISS:

  • An ISS annotation is often based on more than just one type of sequence-based evidence. Often, a host of searches are performed for any given query protein. These searches might include BLAST, profile HMMs, TMHMM, SignalP, PROSITE, InterPro, etc. Evaluation of output from these search tools (bear in mind that every search may not yield results for every protein) leads an annotator to a particular ISS annotation for a particular protein. For example, a BLAST search might reveal that a query protein matches an experimentally characterized protein from another species at 50% identity over the full lengths of both proteins. After reading literature about the match protein, the curator sees that the match protein is known to contain a domain located in the plasma membrane and another domain that extends into the cytoplasm. It is also known from the literature that the experimentally characterized match protein requires the binding of ATP to function. TMHMM analysis of the query protein predicts several membrane spanning regions in one half of the protein (consistent with location in a membrane). In addition there are PROSITE and Pfam results which reveal the presence of an ATP-binding domain in the other half of the protein which TMHMM predicts to be cytoplasmic. These four search results taken together point to a probable identification of the query protein as having the function of the match protein.
  • PMID:8674114 describes comparative analysis of several newly identified and previously characterized snoRNAs. They list a number of sequence features, both conserved sequence elements and a region of complementarity to rRNA, and spacings that are characteristic of box C/D snoRNAs. As the authors don't develop a predictive method, the analysis they describe isn't considered to be a model, so ISM is not appropriate. As being a member of the box C/D snoRNA family is predictive for being a methylation guide, one could make annotations for a number of snoRNAs based on this paper. Note that the yeast U24 gene (snR24) is also experimentally characterized in this paper. Thus, for snR24 from S. cerevisiae, it is possible to make annotations using both the ISS and the IMP evidence codes, or one might choose not to make the ISS-based annotation for snR24 since experimental evidence is available.

ISA: Inferred from Sequence Alignment

  • Sequence similarity with experimentally characterized gene products, as determined by alignments, either pairwise or multiple (tools such as BLAST, ClustalW, MUSCLE).
  • An entry in the with field is mandatory.

The ISA code is a sub-category of the ISS code. It should be used whenever a sequence alignment is the basis for making an annotation, but only when a curator has manually reviewed the alignment and choice of GO term or, if the information is in a published paper, when the authors have manually reviewed the evidence. Such alignments may be pairwise alignments (the alignment of two sequences to one another) or multiple alignments (the alignment of 3 or more sequences to one another). BLAST produces pairwise alignments and any annotations based solely on the evaluation of BLAST results should use this code. GO policy states that in order to assert that a query protein has the same function as a match protein, the match protein MUST be experimentally characterized. This prevents transitive annotation errors. A transitive annotation error occurs when a protein gets its annotation by virtue of a match to an uncharacterized protein that may itself have gotten its annotation from yet another uncharacterized protein, and so on. With the high number of genome sequences currently in the public databases, the risk of transitive annotation errors is high. However, by requiring that every alignment used for a GO annotation contain an experimentally characterized protein, transitive annotation errors can be significantly reduced.
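
The guard against transitive annotation can be sketched as a check on the match protein's existing annotations. This is an illustration only: the function name is hypothetical, and the set of codes treated as experimental (EXP, IDA, IPI, IMP, IGI, IEP) is an assumption that each group should adjust to its own policy.

```python
# Assumed set of experimental evidence codes; adjust to the codes your
# group treats as experimental.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def can_transfer(match_annotations, go_term):
    """Return True if the match protein has experimental support for go_term.

    match_annotations: iterable of (go_term, evidence_code) pairs already
    made for the match protein.
    """
    return any(
        term == go_term and code in EXPERIMENTAL_CODES
        for term, code in match_annotations
    )
```

With this check, a match protein annotated to a term only by ISS or IEA would not qualify as a source for transferring that term, which is what breaks the chain of inference from one uncharacterized protein to the next.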

The process of evaluating a sequence alignment involves checking that the length of the matching region and the percent identity with the matching sequence are sufficient to infer shared function. Residues or secondary structures that are important for function should be conserved. The guiding principle in making sequence similarity based annotations should be that there is a good reason to believe that the comparison is relevant. This evaluation may be carried out by the curator, when sequence analysis is performed by the curators, or by authors of a published paper, when the curator is making annotations based on literature. In literature-based annotation it is incumbent upon the curator to identify which of the proteins in the sequence analysis are experimentally characterized so as to populate the with field.

A note about when to use ISO (inferred from sequence orthology) instead of ISA: If it is known that the experimentally characterized match protein in question is the functional ortholog of the query protein, then the code ISO (Inferred from Sequence Orthology) may be used (see the ISO section below). Orthologs are generally determined from phylogenetic analysis using algorithms such as maximum likelihood or nearest neighbor joining. The presumption is that orthologs often have the same/similar biological function and/or engage in the same or similar biological processes. It can sometimes be difficult to determine when proteins are orthologs of each other, but if one is confident of orthology the orthology specific code should be used.

Note that we have not set definitive numerical cutoffs for the extent or percentage identity of sequence similarity comparisons because groups annotating very different organisms from the current MODs / reference genomes may find that a given arbitrarily selected numerical cutoff does not work when applied to a new organism. It is up to each annotating group to use judgment as to what sequence similarity comparisons are relevant for the purpose of making GO annotations.

It is mandatory to make an entry in the with column when using ISA. The entry in with is the accession number of the experimentally characterized sequence(s) that match the query sequence. Multiple entries in the with field should be separated by pipes. Annotations made with ISA without an entry in the with field will be filtered out by the Annotation File Format Quality Control script which is run monthly.

If the generation and evaluation of the alignment was described in a published paper and then curated by a GO annotator, a reference to the paper should be placed in the reference column. However, if the same group that is doing the GO annotation performed the generation and evaluation of the alignment, then a reference should be placed in the reference column that describes the methodology used. If there is no publication for this methodology, a reference can be used from the GO Consortium's collection of GO references; if there is nothing appropriate in this set, the annotating group should write a description of the methods of data collection and evaluation used and submit it to the GO Consortium. This will be added to the reference collection and will receive a GO_REF accession number for use in annotations.

Examples of when to use ISA:

  • A curator generates a pairwise alignment between a query Haemophilus influenzae protein that he/she is trying to annotate and a Vibrio marinus protein. The curator sees that the Vibrio protein is experimentally characterized. The curator evaluates the alignment and sees that the two proteins match over nearly their entire lengths at 68% identity. Furthermore, after reading information on the characterized Vibrio protein the curator looks for the important residues needed for catalysis and binding in the Vibrio protein and finds that they are conserved in the Haemophilus protein. The curator reads the available literature on the Vibrio protein to determine what is known about that protein. The curator can then assign GO terms to the Haemophilus protein based on what has been experimentally determined in the Vibrio protein. The code for this annotation is ISA, and the accession number of the Vibrio protein should be placed in the with field. If the process used by the curator for evaluation of the sequence alignments is not in a published paper, the curator should refer to a GO standard reference, for example GO_REF:0000012.
  • A curator performs sequence similarity analysis on a group of genes (e.g. sequence similarity alignments of the human NDUFS8 gene (UniProtKB accession: O00217) with several other genes) and identifies several genes with very high sequence identity to the experimentally characterized human NDUFS8 gene: orangutan and chimpanzee (both 100% sequence identity), crab-eating macaque (95% identity), and gorilla (92% identity). The curator judged that these high sequence matches to the human sequence meant that all proteins possessed a similar function; therefore, annotations were made for the related genes in orangutan (UniProt:Q5RC7), macaque (UniProt:Q60HE3), chimpanzee (UniProt:Q0MQI3), and gorilla (UniProt:Q0MQI2) by ISS with the experimentally characterized human NDUFS8 protein, and the accession number of the human NDUFS8 gene was included in the with column for each of these annotations. As there is no published paper describing this sequence analysis, the ID of the GO_REF (e.g. GO_REF:0000024) that describes the process the curator carried out to make this judgment is placed in the REF_DB_ID field.
  • PMID:2165073 identifies a new gene, AAC3, that is similar to two known genes of the same species (S. cerevisiae) based on Southern hybridization. Cloning and sequencing of the new AAC3 gene indicates that it is similar to the previously characterized ADP/ATP translocators AAC1 and PET9. For the AAC3 gene, an annotation may be made to the function term ATP:ADP antiporter activity using the evidence code ISA; the reference is the paper which performed the analysis, and the accession numbers of the experimentally characterized genes with which AAC3 was aligned (AAC1 and PET9) should be placed in the with field.
  • PMID:12507466 describes a set of proteins containing both experimentally confirmed and predicted N-terminal acetyltransferases (NATs) that were collected and assigned to orthologous groups based on phylogenetic analysis. Three of the groups, Ard1, Mak3, and Nat3, were named based on the well characterized gene by that name from S. cerevisiae that is a member of the group. In addition, a previously unknown group with unknown substrate specificity was identified, called Nat5 based on the name of the S. cerevisiae member of the group. About the Nat5 family, the authors make this statement: "Nat5p represents a family of the putative NATs with orthologous proteins identified in yeast, S. pombe, C. elegans, D. melanogaster, A. thaliana and H. sapiens. The finding of this new family is only based on sequence similarity of Nat5p (YOR253Wp) to other NATs. Our attempts to detect any Nat5p substrates in yeast by 2D-gel electrophoresis has been so far unsuccessful, but this may reflect the rarity of the substrates in vivo or that Nat5p is acting on the smaller polypeptides with mobility parameters undetectable by our regular 2D-gel procedure." As a protein with sequence similarity to other NATs, NAT5 may be annotated to the function term peptide alpha-N-acetyltransferase activity. Although this paper clearly discusses orthology relationships, the evidence code for this annotation for NAT5 is ISA because it is not based on the orthology relationship, but merely on similarity with the other experimentally characterized NATs in yeast, MAK3, ARD1, and NAT3; the accession numbers of these three genes should be placed in the with field. The reference is the paper which performed the analysis. Note that this paper may also be used for annotations using the ISO code when the annotation is based on the orthology relationships described in the paper.

ISO: Inferred from Sequence Orthology

  • Pairwise or multiple alignments between a query protein and experimentally characterized match proteins when the proteins are established to be orthologs of each other.
  • Phylogenetic analysis of a set of proteins to define orthologous groups.
  • An entry in the with field is mandatory.

The ISO code is a sub-category of the ISS code. Orthology is a relationship between genes in different species indicating that the genes derive from a common ancestor. Orthology is established by multiple criteria generally including amino acid and/or nucleotide sequence comparisons and one or more of the following:

  • phylogenetic analysis
  • coincident expression
  • conserved map location
  • functional complementation
  • immunological cross-reaction
  • similarity in subcellular localization
  • subunit structure
  • substrate specificity
  • response to specific inhibitors

It should be noted that there are known cases where a gene in one organism is significantly different in size from its ortholog(s) in other species. For example, the U2 snRNA in S. cerevisiae is much larger than vertebrate U2 snRNAs due to several additional domains. However it has been shown that both S. cerevisiae and vertebrate U2 snRNAs have the same conserved core and perform the same basic role in the spliceosome, even though a simplistic sequence comparison might miss this due to the large size difference between U2 in S. cerevisiae and U2 in mammalian species.

When making an annotation using the ISO evidence code, an entry in the with field is mandatory. This entry will be the accession number of an experimentally characterized orthologous gene product. The matching orthologous gene product must have substantiating experimental evidence to support the annotation. In addition, there will be cases where a gene product in one species is the ortholog of several closely related paralogous genes in another species. In these cases, the ID for all of these paralogs should be included in the with field. Annotations made with ISO without an entry in the with field will be filtered out by the Annotation File Format Quality Control script.
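
The filtering rule stated for ISA and ISO can be sketched as follows. This is a hypothetical re-implementation of the rule only; the actual Annotation File Format Quality Control script is not shown in this guide, and the dictionary keys used here ("evidence", "with") are assumptions made for the example.

```python
# Codes for which this guide states that an empty with field causes the
# annotation to be filtered out.
WITH_MANDATORY = {"ISA", "ISO"}


def filter_annotations(annotations):
    """Split annotations into (kept, rejected) by the with-field rule.

    Each annotation is a dict with at least an "evidence" key and,
    optionally, a "with" key holding the with-field string.
    """
    kept, rejected = [], []
    for ann in annotations:
        # Treat a missing, None, or whitespace-only value as empty.
        missing_with = not (ann.get("with") or "").strip()
        if ann["evidence"] in WITH_MANDATORY and missing_with:
            rejected.append(ann)
        else:
            kept.append(ann)
    return kept, rejected
```

Running such a check locally before submission lets an annotating group find and repair ISA or ISO annotations that would otherwise be dropped.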

If the paper being used to make the annotation demonstrates the orthology, then that paper is used as the reference for that annotation. However, if the group doing the annotation is establishing orthology and there is no published reference, a reference can be used from the GO Consortium's collection of GO references; if there is nothing appropriate in this set, the annotating group should write a description of the methods of data collection and evaluation used and submit it to the GO Consortium. This will be added to the reference collection and will receive a GO_REF accession number for use in annotations. For example, GO_REF:0000096 describes MGI's practice of transferring experimental GO annotations from rat and human to mouse genes based on orthology evidence (i.e. ISO).

It is important to note that if revised predictions on orthologous protein sets are produced at a later time than the original annotation, annotations should be updated accordingly.

Example of when to use ISO:

  • PMID:12507466 describes a set of proteins containing both experimentally confirmed and predicted N-terminal acetyltransferases (NATs) that were collected and assigned to orthologous groups based on phylogenetic analysis. Three of the groups, Ard1, Mak3, and Nat3, were named based on the well characterized gene by that name from S. cerevisiae that is a member of the group. Proteins in these orthologous groups without experimental characterization can be assigned the function term peptide alpha-N-acetyltransferase activity based on orthology to the experimentally characterized proteins within the orthologous group. The evidence code for this annotation is ISO, the reference is the paper which performed the analysis, and the accession numbers of the experimentally characterized members of the orthologous group should be placed in the with field. The paper also makes it clear that the genes, ARD1, MAK3, and NAT3 are well characterized experimentally, thus one could use the relevant one of these genes in the with field for annotations of members of their orthology groups without further reading. There may be additional characterized genes in each group, but it is not obvious from the paper. Also note that this paper also describes a putative Nat5 family only based on sequence similarity of Nat5p (YOR253Wp) to other NATs. As there is no experimentally characterized member of the Nat5 family, no annotations may be made based on the Nat5 orthology grouping, though see the ISA section for a description of the annotation which may be made for NAT5.

ISM: Inferred from Sequence Model

  • Prediction methods for non-coding RNA genes such as tRNAscan-SE, Snoscan, and Rfam
  • Predicted presence of recognized functional domains or membership in protein families, as determined by tools such as profile Hidden Markov Models (HMMs), including Pfam and TIGRFAM
  • Predicted protein features using tools such as TMHMM (transmembrane regions), SignalP (signal peptides on secreted proteins), and TargetP (subcellular localization)
  • Any other kind of domain modeling tool or collections of them such as SMART, PROSITE, PANTHER, InterPro, etc.
  • An entry in the with field is required when the model used is an object with an accession number (as found with Pfam, TIGRFAM, InterPro, PROSITE, Rfam, etc.). The with field may be left blank for tools such as tRNAscan and Snoscan where there is not an object with an accession to point to.

The ISM code is a sub-category of the ISS code. The ISM code should be used any time that evidence from some kind of statistical model of a sequence or group of sequences is used to make a prediction about the function of a protein or RNA. Generally, when searching sequences with these modeling tools, the results include statistical scores (such as E-values and cutoff scores) that help curators decide when a result is significant enough to warrant making an annotation. If an annotator manually checks these scores, determines whether the result makes sense in the context of other information known about the sequence, and decides that the evidence warrants a particular annotation, then the evidence code is ISM. However, if a tool that looks only at the scores makes annotations automatically and there is no manual review, the evidence code should be IEA.

It is important to note that some models are more functionally specific than others. In particular this is seen in the profile HMMs and somewhat in PROSITE motifs. Some HMMs are built so that all of the proteins used in building the model and all of the proteins that score well to the model have the exact same function. These models can therefore be used to predict precise functions in match proteins. Other models are built to reflect the shared sequence found among members of superfamilies or subfamilies. These can be used to predict varying levels of functional specificity and may often only provide very general annotations such as identification of a protein as an oxidoreductase. Finally, many models predict the presence of particular domains in a protein which may or may not provide information on the function of a protein; for example, the CUB domain is found in a functionally diverse set of proteins and does not allow annotations to function to be made based on its presence alone. Therefore it is very important during the manual annotation process to assess what information it is safe to conclude from a match to any given model.

Some of the sequence-based modeling techniques result in models specific to individual sequence families. The profile HMMs, PROSITE motifs, and InterPro are in this group. In such cases, the with field should be populated with the accession number of the model specific for the functional domain or protein in question. Other sequence-based modeling techniques such as tRNAscan and Snoscan are methods that result in the prediction of a set of sequences within a particular class (e.g. tRNAs, snoRNAs) and there are not specific models that one can link to each ncRNA. In these cases the with field may be left blank.
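
The distinction between the two groups of tools can be captured in a small lookup. The grouping below is taken from the tools named in this section; the set names and the function are hypothetical, and the sets are not an official or complete list.

```python
# Sources whose models carry accession numbers (with field required) and
# tools that predict a class of sequences without a per-model accession
# (with field may be blank). Illustrative only.
ACCESSIONED_SOURCES = {"Pfam", "TIGRFAM", "InterPro", "PROSITE", "Rfam"}
CLASS_PREDICTORS = {"tRNAscan", "Snoscan"}


def ism_with_required(source):
    """Return True if an ISM annotation from this source needs a with entry."""
    if source in ACCESSIONED_SOURCES:
        return True
    if source in CLASS_PREDICTORS:
        return False
    # Unknown tools are not guessed at; the curator must decide.
    raise ValueError(f"unknown model source: {source!r}")
```

For a Pfam match the function returns True, so an entry such as Pfam:PF05426 would be expected in the with field; for tRNAscan it returns False.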

If the search for, and evaluation of, the sequence-based model data was described in a published paper, a reference to the paper should be placed in the reference column. However, if the search for and evaluation of the data was performed by the same group that is doing the GO annotation, then a reference should be placed in the reference column that describes the methodology used. If there is no publication for this methodology, a reference can be used from the GO Consortium's collection of GO references; if there is nothing appropriate in this set, the annotating group should write a description of the methods of data collection and evaluation used and submit it to the GO Consortium. This will be added to the reference collection and will receive a GO_REF accession number for use in annotations.

Examples of when to use ISM:

  • A curator performs an HMM search for a query protein. The result is that the query protein scores above the trusted cutoff to the HMM PF05426 alginate lyase. This HMM describes a family of alginate lyases. After review of all documentation associated with the HMM to determine functional specificity, or lack thereof, of the HMM and review of the scores that the query protein received, if the curator is confident that the query protein is indeed an alginate lyase, the appropriate annotations should be made using ISM as the evidence code, and putting Pfam:PF05426 in the with column. Since this search and evaluation was performed by the curator, a GO standard reference should be used to describe the search and evaluation methods (e.g. GO_REF:0000011).
  • A paper describes using PROSITE searches with the protein of interest and concludes the protein has a particular binding activity based on a match to a particular PROSITE motif. The curator would make the appropriate GO annotations, using ISM as the evidence code, putting the accession number of the PROSITE motif that provided the evidence in the with column, and the PMID number of the paper that described the work in the reference column.
  • A curator runs the program tRNAscan (Lowe, T.M. and Eddy, S.R. NAR, 1997) on a newly sequenced bacterial genome to find the tRNAs. tRNAscan produces a list of the tRNA genes contained within that genome. A curator checks the results of the analysis to make sure that the predictions make sense and are consistent with what is known about the organism. Each of these genes is given appropriate annotations for a tRNA. The evidence code is ISM, and a reference describing the process the curator used (either a published paper or a GO standard reference) should be placed in the reference column. The with column may be left blank.
  • PMID:10024243 describes the use of a probabilistic model to predict snoRNA genes in yeast. Each of these genes may be given appropriate annotations for a snoRNA. The evidence code is ISM, and the reference is the paper describing the work. The with column may be left blank.

IGC: Inferred from Genomic Context

Updated November 9, 2007

  • operon structure
  • syntenic regions
  • pathway analysis
  • genome scale analysis of processes

This evidence code can be used whenever information about the genomic context of a gene product forms part of the evidence for a particular annotation. Genomic context includes, but is not limited to, such things as identity of the genes neighboring the gene product in question (i.e. synteny), operon structure, and phylogenetic or other whole genome analysis.

IGC may be used in situations where part of the evidence for the function of a protein is that it is present in a putative operon for which the other members of the operon have strong sequence or literature based evidence for function. The presence of the gene in an operon specific for a particular function, pathway, complex, etc. is itself a form of evidence. When using this code on the basis of operon structure, curators are encouraged to put the ID numbers of the genes in the operon in the with/from field.

The IGC evidence code can also be used to annotate gene products encoded by genes within a region of conserved synteny. For instance, sequence similarity alone may be too low to make an inference, but orthology can often be predicted based on the position of a gene within a region of synteny, and this can be used to strengthen the assertion. In these cases the with/from field should be used to store the identity of the positional ortholog.

In the area of process annotations, in order for us to assert that a gene product is involved in a particular process in the cell, that process itself must be happening in that cell. The only way to know if a process is happening is to determine if all of the elements required for that process are present. This is often accomplished by looking to see if there are genes in the genome which can complete every step in the process in question. The same holds true for subunits of protein complexes. This often entails examining many different gene products and many different evidence types found all around the genome of an organism to reach a particular conclusion.

When the method used to make annotations using the IGC code is performed internally by the annotating group and is not published, a short description of the method should be written and added to the GO Consortium's collection of GO references, where it will be given a GO_REF ID which can be used to cite the reference in gene association files.

Usage of the With/From Column for IGC

We recommend making an entry in the with/from column when using this evidence code. In cases where operon structure or synteny is the compelling evidence, include identifier(s) for the neighboring genes in the with/from column. In cases where metabolic reconstruction is the compelling evidence, and there is an identifier for the pathway or system, that should be entered in the with/from column. When multiple entries are placed in the with/from field, they are separated by pipes.

Note that there has been some discrepancy between groups as to the use of the with/from column; please see the Note on Usage of the With/from Column for more details.

...  2. DB Object ID     3. DB Object Symbol  4. Qualifier  5. GO ID    6. DB:Reference  7. Evidence Code  8. With/From  ...
...  TIGR_CMR:gene_B_ID  gene B                             GO:0009231  GO_REF:0000025   IGC               operon_geneA_ID|operon_geneC_ID (from operon in annotated organism)  ...
...  TIGR_CMR:gene_A_ID  gene A                             GO:0009102  PMID:15347579    IGC               TIGR_GenProp:GenProp0036  ...
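
The with/from convention for IGC can be illustrated with a minimal sketch. It assumes that columns 2 through 8 of a gene association line are tab-separated; the function name `igc_columns` is hypothetical and is not part of any GO Consortium tool. It only shows how several neighboring-gene identifiers end up pipe-separated in one with/from value.

```python
def igc_columns(object_id, symbol, go_id, reference, supporting_ids, qualifier=""):
    """Build columns 2-8 of an IGC annotation line (illustrative helper only).

    supporting_ids holds the identifiers of neighboring genes, a positional
    ortholog, or a pathway/system; they are joined with pipes in with/from.
    Tab-separated output is an assumption of this sketch.
    """
    with_from = "|".join(s.strip() for s in supporting_ids if s.strip())
    return "\t".join([object_id, symbol, qualifier, go_id, reference, "IGC", with_from])


# Operon example from the table above.
line = igc_columns(
    "TIGR_CMR:gene_B_ID",
    "gene B",
    "GO:0009231",
    "GO_REF:0000025",
    ["operon_geneA_ID", "operon_geneC_ID"],
)
```

An empty list of supporting identifiers yields an empty with/from value, which matches the text: the entry is recommended for IGC rather than required.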

IBA: Inferred from Biological aspect of Ancestor

Updated May 3, 2011

  • A type of phylogenetic evidence whereby an aspect of a descendant is inferred through the characterization of an aspect of an ancestral gene.

IBD: Inferred from Biological aspect of Descendant

Updated May 3, 2011

  • A type of phylogenetic evidence whereby an aspect of an ancestral gene is inferred through the characterization of an aspect of a descendant gene.

IKR: Inferred from Key Residues

Updated May 2, 2012

  • A type of manually-curated evidence derived from sequence analysis, characterized by the lack of key sequence residues. All annotations that apply this evidence code should use the 'NOT' qualifier. This evidence code is used to annotate a gene product when, although homologous to a particular protein family, it has lost essential residues and is very unlikely to be able to carry out an associated function, participate in the expected associated process, or be found in a certain location. This annotation statement can be supported by a published literature reference (e.g. a PubMed identifier) that has described the sequence analysis efforts, or by a GO Reference that describes the process a curator undertook to become sufficiently convinced of the sequence mutation. Where an IKR annotation statement is made using a GO Reference, inclusion of an identifier in the 'with/from' column of the annotation format that can indicate to the user the lacking residues (e.g. an alignment, domain or annotation rule identifier) is required. In contrast, when an IKR annotation statement is supported by a published literature reference, a value in the 'with/from' field is highly recommended although not required. This evidence code is also referred to as IMR (Inferred from Missing Residues).

Examples where the IKR evidence code should be used:

  • Curator-Determined IKR Annotation Example: Rat HPT (P06866) is homologous to serine proteases and contains a match to the peptidase S1 domain. However, further sequence analysis by a curator looking at the Peptidase S1B active site established that it has lost all essential catalytic residues, making it unable to carry out serine protease activity.
  • Curator-Determined IKR Annotation Example, Using PAINT: Curators determined that Drosophila neuroligin protein does not have carboxylesterase activity, based on phylogeny-based evidence. The Panther identifier in the 'with/from' field links out to an evidence record citing annotation data from orthologous gene products, supporting the annotation statement.
  • Paper-Curated IKR Annotation Example: Ross,J., Jiang,H., Kanost,M.R. and Wang,Y. (2003) Serine proteases and their homologs in the Drosophila melanogaster genome: an initial analysis of sequence conservation and phylogenetic relationships. Gene 30;304:117-31 (PMID:12568721). The authors describe the determination of serine protease activity of proteins from the D. melanogaster S1 serine protease gene family, by determining the presence of conserved His, Asp, Ser catalytic triad residues in retrieved sequences. If all three residues were present in the conserved TAAHC, DIAL, and GDSGGP motifs, the sequence was considered to have serine protease activity. Any sequence lacking one of the key residues was identified as a serine protease homolog, lacking proteolytic activity.

...  2. DB Object ID  3. DB Object Symbol  4. Qualifier  5. GO ID                                       6. DB:Reference  7. Evidence Code  8. With/From             ...
...  P06866           RatHPT               NOT           GO:0004252 serine-type endopeptidase activity  GO_REF:0000047   IKR               InterPro:IPR000126       ...
...  P06866           neuroligin           NOT           GO:0004091 carboxylesterase activity           GO_REF:0000033   IKR               PANTHER:PTHR11559_AN146  ...
...  FB:FBgn0033192   gene S1              NOT           GO:0004252 serine-type endopeptidase activity  PMID:12568721    IKR                                        ...

Examples where the IKR evidence code should not be used:

  • If there is experimental evidence available from a publication to support a NOT-qualified annotation. In such instances, the curator should make the IDA, IMP or EXP NOT-qualified annotation based on the experimental evidence. If a paper supplies data showing that the active site is missing and additionally reports an experimental assay showing lack of activity, it would be correct to create two annotation statements from this paper: both NOT IKR and NOT IDA.
  • CAUTION: Where curators make judgements about function using the IKR evidence code, they should be able to draw on some level of expertise regarding the protein family, as there will always be exceptions to the rule. For instance, Q9H4A3 (WNK1_HUMAN) is a good example where nature has confounded prediction; Cys-250 is present instead of the conserved Lys which is expected to be an active site residue. However, Lys-233 appears to fulfill the required catalytic function.
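
The IKR rules described above (NOT qualifier always; with/from required with a GO Reference and recommended with a literature reference) can be sketched as a small check. The function name `check_ikr` and the message wording are hypothetical. The sketch also assumes that multiple qualifiers are pipe-separated and that GO References are cited with a `GO_REF:` prefix, as in the example rows.

```python
def check_ikr(qualifier, reference, with_from):
    """Return a list of problems found in an IKR annotation (empty if none).

    Illustrative only; it encodes the prose rules of this section and is
    not an official validation routine.
    """
    problems = []
    # IKR annotations should always be negated.
    if "NOT" not in qualifier.split("|"):
        problems.append("error: IKR annotations should use the NOT qualifier")
    if not with_from.strip():
        if reference.startswith("GO_REF:"):
            # Curator-determined case: the identifier showing the lacking residues is required.
            problems.append("error: with/from is required when the reference is a GO Reference")
        else:
            # Paper-curated case: with/from is recommended but optional.
            problems.append("warning: with/from is recommended for literature-supported IKR")
    return problems


curator_case = check_ikr("NOT", "GO_REF:0000047", "InterPro:IPR000126")
paper_case = check_ikr("NOT", "PMID:12568721", "")
```

Such a check cannot replace the expertise called for in the CAUTION above; it only catches missing fields.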

IRD: Inferred from Rapid Divergence

Updated May 3, 2011

  • A type of phylogenetic evidence characterized by rapid divergence from ancestral sequence. Annotating with this evidence code implies a NOT annotation.

RCA: Inferred from Reviewed Computational Analysis

Updated November 9, 2007

Note: Annotations using the RCA code should be reviewed after one year; any annotations older than this will be deleted.

  • Predictions based on computational analyses of large-scale experimental data sets
  • Predictions based on computational analyses that integrate datasets of several types, including experimental data (e.g. expression data, protein-protein interaction data, genetic interaction data, etc.), sequence data (e.g. promoter sequence, sequence-based structural predictions, etc.), or mathematical models

The RCA code should be used for annotations made from predictions based on computational analyses of large-scale experimental data sets, or on computational analyses that integrate multiple types of data into the analysis. Acceptable experimental data types include protein-protein interaction data (e.g. two-hybrid results, mass spectroscopic identification of proteins identified by affinity tag purifications, etc.), synthetic genetic interactions, and microarray expression results. Data based on the sequence of the gene product, including structural predictions based on sequence, may be included provided that the analysis included non-sequence-based data as well. Sequence information related to promoter sequence features may also be included as a data type within these analyses. Predictions based on mathematical modelling which attempts to duplicate existing experimental results are also appropriate for use of this evidence code.

Analyses based purely on comparisons of the gene product sequence should use the ISS evidence code (or the IEA code if the analysis is not reviewed by a curator). These include:

  • sequence similarity with experimentally characterized gene products, as determined by pairwise or multiple alignment
  • prediction methods for non-coding RNA genes
  • recognized functional domains, as determined by tools such as InterPro, Pfam, SMART, etc., including the use of files such as interpro2go, pfam2go, smart2go to convert the domain hits to GO terms
  • predicted protein features, e.g., transmembrane regions, signal sequence, etc.
  • structural similarity with experimentally characterized gene products, as determined by crystallography, nuclear magnetic resonance, or computational prediction
  • analyses combining multiple types of data based on the gene product sequence

Similarly for experimental data, if the annotation was made purely on the basis of an experimental result, e.g. a protein-protein interaction with a characterized protein, a genetic interaction with a characterized gene, or having a similar microarray expression pattern as a characterized gene, then the appropriate experimental evidence code, IPI, IGI, or IEP, respectively, should be used instead.

Examples where the RCA evidence code should be used:

  • Samanta and Liang, 2003 (PMID:14566057) analyzed all interactions for S. cerevisiae present in the Database of Interacting Proteins (DIP) and made predictions about the roles of genes that were uncharacterized at the time. GO Annotations resulting from this publication include the process term 'rRNA processing' for both UTP30 and NOP6, neither of which was experimentally characterized at the time. A role for NOP6 in the biogenesis of the small ribosomal subunit has subsequently been indicated via a genetic interaction with the experimentally characterized gene EMG1.
  • Troyanskaya et al., 2003 (PMID:12826619) ...

Examples where the RCA evidence code should not be used:

  • Annotations based on more than one type of gene product sequence based evidence, including such things as BLAST, profile HMMs, TMHMM, SignalP, PROSITE, InterPro, mapping files such as interpro2go etc. should use the ISS code.
  • Annotations based on integrated computational analyses, if they have not been reviewed by a curator, should receive the IEA code.
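
The distinctions drawn in this section between RCA, ISS, IEA and the single-experiment codes can be summarized as a rough decision sketch. This is a simplification under stated assumptions, not a replacement for the evidence code decision tree. The data-type labels (`"sequence"`, `"protein_interaction"`, and so on) and the function name `suggest_code` are hypothetical names introduced for illustration.

```python
# Single experimental results that have their own evidence codes (from the text above).
SINGLE_RESULT_CODES = {
    "protein_interaction": "IPI",
    "genetic_interaction": "IGI",
    "expression_pattern": "IEP",
}


def suggest_code(data_types, curator_reviewed, large_scale=False):
    """Suggest an evidence code for a computational prediction (simplified sketch).

    data_types: set of labels for the kinds of data the analysis used;
    "sequence" stands for any gene-product-sequence-based evidence.
    large_scale: True when the analysis covers a large-scale experimental data set.
    """
    kinds = set(data_types)
    if not kinds:
        raise ValueError("at least one data type is needed")
    if not curator_reviewed:
        return "IEA"  # unreviewed computational predictions
    if kinds == {"sequence"}:
        return "ISS"  # purely sequence-based, however many sequence methods were combined
    if len(kinds) == 1 and not large_scale:
        (only,) = kinds
        if only in SINGLE_RESULT_CODES:
            return SINGLE_RESULT_CODES[only]  # a single experimental result
    return "RCA"  # large-scale or integrated analyses


integrated = suggest_code({"sequence", "expression_pattern"}, curator_reviewed=True)
```

The fall-through to RCA is deliberately coarse; borderline cases still need a curator's judgement.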

Author Statement Evidence Codes

The Author Statement Evidence Codes are:

TAS: Traceable Author Statement

Updated November 9, 2007

  • Any statement in an article where the original evidence (experimental results, sequence comparison, etc.) is not directly shown, but is referenced in the article and therefore can be traced to another source.

The TAS evidence code covers author statements that are attributed to a cited source. Typically this type of information comes from review articles. Material from the introductions and discussion sections of non-review papers may also be suitable if another reference is cited as the source of experimental work or analysis.

When annotating with this code the curator should use caution and be aware that authors often cite papers dealing with experiments that were performed in organisms different from the one being discussed in the paper at hand. Thus a problem with the TAS code is that it may turn out from following up the references in the paper that no experiments were performed on the gene in the organism actually being characterized in the primary paper. For this reason we recommend (when time and resources allow) that curators track down the cited paper and annotate directly from the experimental paper using the appropriate experimental evidence code. When this is not possible and it is necessary to annotate from reviews, the TAS code is the appropriate code to use for statements that are associated with a cited reference.

Once an annotation has been made to a given term using an experimental evidence code, we recommend removing any annotations made to the same term using the TAS evidence code.
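
The recommendation above can be sketched as a filter over a list of annotations. The tuple layout and the function name `drop_superseded_tas` are hypothetical. The set of experimental evidence codes is assumed here to be EXP, IDA, IPI, IMP, IGI and IEP.

```python
# Assumed set of experimental evidence codes for this sketch.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def drop_superseded_tas(annotations):
    """Remove TAS annotations whose gene product and term also have experimental support.

    annotations: iterable of (object_id, go_id, evidence_code) tuples.
    Returns a new list; the order of the remaining annotations is preserved.
    """
    annotations = list(annotations)
    supported = {(obj, go) for obj, go, code in annotations if code in EXPERIMENTAL_CODES}
    return [
        (obj, go, code)
        for obj, go, code in annotations
        if not (code == "TAS" and (obj, go) in supported)
    ]


# Hypothetical identifiers, for illustration only.
kept = drop_superseded_tas([
    ("DB:gene1", "GO:0005634", "TAS"),
    ("DB:gene1", "GO:0005634", "IDA"),
    ("DB:gene2", "GO:0005634", "TAS"),
])
```

Only exact matches on the same term are removed in this sketch; whether a TAS annotation to a parent or child term should also go is left to the curator.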

Note that prior to July 2006, the TAS evidence code could be used for annotations based on information found in a textbook or dictionary, as textbook material has often become common knowledge (e.g. "everybody" knows that enolase is a glycolytic enzyme). However, at the 2006 GO Annotation Camp, it was concluded that this sort of information is not traceable to its source and is thus not suitable for the TAS evidence code. When annotating on the basis of common knowledge possessed by the curator, consider the IC code. When annotating an author statement that is not associated with a cited reference, use the NAS code.

Examples where the TAS evidence code should be used:

  • Annotating the twelve S. cerevisiae genes (RPO21, RPB2, RPB3, RPB4, RPB5, RPO26, RPB7, RPB8, RPB9, RPB10, RPC10, and RPB11) that are part of the core complex of RNA polymerase II to the GO term DNA-directed RNA polymerase II, core complex ; GO:0005665 based on a table in Meyer and Young, 1998 (PMID:9774381) listing each of these genes as encoding a subunit of the enzyme and giving one or more references for each subunit.
  • Annotating the human myo9b gene to the GO term Rho GTPase activator activity ; GO:0005100 based on this statement in the introduction of a research article, Post et al., 2002 (PMID:11801597):
    "Biochemical characterization of both bacterially expressed Myr5 and Myr7 tail domains and tissue-purified human Myo9b demonstrate that these myosins IX are active GAPs for Rho but not Rac or CDC 42 (3,4,7)."

Examples where the TAS evidence code should not be used:

  • In Ladd et al., 2001 (PMID:11158314), the authors state:
    "All of the CELF proteins contain multiple potential protein kinase C and casein kinase II phosphorylation sites. All are predicted to have predominantly nuclear localization, and CELF3, CELF4, and CELF5 each possess a consensus nuclear localization signal sequence near the C terminus."
    As this paper provided no reference to support the authors' assertion that CELF3 is located in the nucleus (nor any presentation of sequence analyses related to this statement), and in the absence of better published data at the time of curation, CELF3 has been annotated to the GO term nucleus with the NAS evidence code and not the TAS evidence code.

    ...  2. DB Object ID  3. DB Object Symbol  4. Qualifier  5. GO ID    6. DB:Reference  7. Evidence Code  8. With/From  ...
    ...  UniProt:Q5SZQ8   CELF3_HUMAN                        GO:0005634  PMID:11158314    NAS                             ...
  • When an annotator makes an annotation based on a combination of another GO annotation and common knowledge. For example, if a curator makes an annotation to the cellular component term nucleus on the basis that the gene product is already annotated to the molecular function term general RNA polymerase II transcription factor activity and the common knowledge that transcription factors interacting with RNA polymerase II act in the nucleus, then the IC evidence code should be used with the GO ID for the GO term from which the annotation was derived in the with/from field and the same reference should be cited as was used for the annotation to the term whose GO ID is placed in the with/from field.

NAS: Non-traceable Author Statement

Updated November 9, 2007

  • Database entries that don't cite a paper (e.g. UniProt Knowledgebase records, YPD protein reports)
  • Statements in papers (abstract, introduction, or discussion) that a curator cannot trace to another publication

The NAS evidence code should be used in all cases where the author makes a statement that a curator wants to capture but for which there are neither results presented nor a specific reference cited in the source used to make the annotation. The source of the information may be peer reviewed papers, textbooks, or database records. For some annotations using the NAS code, there will not be an entry in the with/from field.

The NAS code is also used for making annotations from database entries when a curator reviews the annotations that result. Typically such annotations will refer to an unpublished reference describing what was done, either a reference with a GO_REF id or an internal reference from the specific annotating database.

Cases where the NAS code should be used:

  • In Ladd et al., 2001 (PMID:11158314), the authors state that:
    "All of the CELF proteins contain multiple potential protein kinase C and casein kinase II phosphorylation sites. All are predicted to have predominantly nuclear localization, and CELF3, CELF4, and CELF5 each possess a consensus nuclear localization signal sequence near the C terminus."
    As this paper provided no reference to support the authors' assertion that CELF3 is located in the nucleus (nor any presentation of sequence analyses related to this statement), and in the absence of better published data at the time of curation, CELF3 has been annotated to the GO term nucleus with the NAS evidence code.

    ...  2. DB Object ID  3. DB Object Symbol  4. Qualifier  5. GO ID    6. DB:Reference  7. Evidence Code  8. With/From  ...
    ...  UniProt:Q5SZQ8   CELF3_HUMAN                        GO:0005634  PMID:11158314    NAS                             ...

Cases where the NAS code should not be used:

  • When an author makes a statement that is attributed to a source cited in the reference list, use the TAS evidence code.
  • When an annotator makes an annotation based on a combination of another GO annotation and common knowledge. For example, if a curator makes an annotation to the cellular component term nucleus on the basis that the gene product is already annotated to the molecular function term general RNA polymerase II transcription factor activity and the common knowledge that transcription factors interacting with RNA polymerase II act in the nucleus, then the IC evidence code should be used with the GO ID for the GO term from which the annotation was derived in the with/from field and the same reference should be cited as was used for the annotation to the term whose GO ID is placed in the with/from field.

Curatorial Statement Evidence Codes

The Curatorial Statement Evidence Codes are:

IC: Inferred by Curator

Updated September 22, 2011 

The IC evidence code is to be used for those cases where an annotation is not supported by any direct evidence, but can be reasonably inferred by a curator from other GO annotations, for which evidence is available.

An example would be when there is evidence (be it direct assay, sequence similarity or even from electronic annotation) that a particular gene product has the function RNA polymerase II transcription factor activity ; GO:0003702. There is no direct evidence showing that this gene product is located in the nucleus, but this would be a perfectly reasonable inference for a curator to make since the curator is annotating a eukaryotic gene product that is associated with a specific nuclear RNA polymerase. This inference will be linked to the annotation to the term RNA polymerase II transcription factor activity ; GO:0003702 in two ways: both annotations will share the same reference; and the annotation inferred by a curator will include one or more with/from statements pointing to the GO term(s) used by the curator for the inference.

In many cases a GO term can be inferred from just one other annotation as described above. Occasionally, a curator has to infer the GO term from multiple sources of evidence/GO annotations. The 'with/from' field in these annotations will therefore supply more than one GO identifier, obtained from the set of supporting GO annotations assigned to the same gene/gene product identifier which cite publicly-available references. In addition, such IC annotations will use reference GO_REF:0000036.

Usage of the With/From Column for IC

Note that the with/from field must always be filled in with a GO ID when using this evidence code.

For example, Noel et al., 1998 (PMID:9651335) provides evidence that the protein encoded by the S. cerevisiae UGA3 gene has the function "specific RNA polymerase II transcription factor activity" ; GO:0003704. From this, the curator deduces it is located in the nucleus and thus makes an annotation to the cellular component term "nucleus" ; GO:0005634, with the GO ID for the function term in the with/from field of the component annotation.

The second example shown below illustrates the use of IC with GO_REF:0000036. In this case, a curator has inferred an annotation for the CUP9 gene to the GO term "RNA polymerase II transcription factor activity, sequence-specific transcription regulatory region DNA binding" ; GO:0001133 based on evidence from PMID:9427760 that CUP9 is involved in "RNA polymerase II core promoter proximal region sequence-specific DNA binding" (GO:0000978), as well as evidence from PMID:18708352 that CUP9 is involved in "negative regulation of transcription from RNA polymerase II promoter" (GO:0000122). The with/from column supplies the GO IDs derived from these two publications separated by commas (meaning AND) because both of these GO terms are required to support the inferred annotation to GO:0001133. If either of the GO terms alone could support the inference, they should be separated with a pipe (meaning OR).

...  2. DB Object ID    3. DB Object Symbol  4. Qualifier  5. GO ID    6. DB:Reference  7. Evidence Code  8. With/From  ...
...  SGDID:S000002329  UGA3                               GO:0003704  PMID:9651335     IPI                             ...
...  SGDID:S000002329  UGA3                               GO:0005634  PMID:9651335     IC                GO:0003704    ...

...  2. DB Object ID    3. DB Object Symbol  4. Qualifier  5. GO ID    6. DB:Reference  7. Evidence Code  8. With/From           ...
...  SGDID:S000006098  CUP9                               GO:0000122  PMID:18708352    IMP                                      ...
...  SGDID:S000006098  CUP9                               GO:0000978  PMID:9427760     IDA                                      ...
...  SGDID:S000006098  CUP9                               GO:0001133  GO_REF:0000036   IC                GO:0000122,GO:0000978  ...

Where:

  • GO:0003704 specific RNA polymerase II transcription factor activity
  • GO:0005634 nucleus
  • GO:0000122 negative regulation of transcription from RNA polymerase II promoter
  • GO:0000978 RNA polymerase II core promoter proximal region sequence-specific DNA binding
  • GO:0001133 RNA polymerase II transcription factor activity, sequence-specific transcription regulatory region DNA binding
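
The comma (AND) and pipe (OR) conventions for the IC with/from column can be made concrete with a short parsing sketch. The function names are hypothetical. The sketch also assumes that a pipe separates alternative groups and that a comma joins the members of one group, which is one reasonable reading of the text rather than a documented grammar. GO IDs are matched as "GO:" followed by seven digits.

```python
import re


def parse_with_from(value):
    """Split a with/from value into alternatives (OR), each a list of required IDs (AND)."""
    groups = []
    for alternative in value.split("|"):
        items = [item.strip() for item in alternative.split(",") if item.strip()]
        if items:
            groups.append(items)
    return groups


def ic_with_from_is_valid(value):
    """True if with/from is non-empty and every entry looks like a GO ID."""
    groups = parse_with_from(value)
    return bool(groups) and all(
        re.fullmatch(r"GO:\d{7}", item) for group in groups for item in group
    )


both_required = parse_with_from("GO:0000122,GO:0000978")
either_suffices = parse_with_from("GO:0000122|GO:0000978")
```

For the CUP9 example, `both_required` holds one group of two GO IDs, whereas the pipe-separated form yields two groups of one GO ID each.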

    ND: No Biological Data Available

    Updated November 9, 2007

    Used for annotations when information about the molecular function, biological process, or cellular component of the gene or gene product being annotated is not available.

    Use of the ND evidence code indicates that the annotator at the contributing database found no information that allowed making an annotation to any term indicating specific knowledge from the ontology in question (molecular function, biological process, or cellular component) as of the date indicated. This code should be used only for annotations to the root terms, molecular function ; GO:0003674, biological process ; GO:0008150, or cellular component ; GO:0005575, which, when used in annotations, indicate that no knowledge is available about a gene product in that aspect of GO.

    Annotations made with the ND evidence code should be accompanied by a reference that explains that curators looked but found no information. Note that some groups check only published literature while other groups also make sequence comparisons to see if an annotation can be made on the basis of a sequence comparison. The GO Reference collection includes a reference that can be used with ND when both literature and sequence have been checked; to use it, put "GO_REF:0000015" in the reference column of a gene association file.

    Note that use of the ND evidence code with an annotation to one of the root nodes to indicate lack of knowledge in that aspect makes a statement about the lack of knowledge only with respect to that particular aspect of the ontology. Use of the ND evidence code to indicate lack of knowledge in one particular aspect does not make any statement about the availability of knowledge or evidence in the other GO aspects.

    Even if an author states in a paper that there is no data available or nothing is known about the gene product in a particular GO aspect, annotation to the corresponding root node should be made with ND evidence code citing either the annotating group's internal reference or the GOC's reference on use of the ND evidence code, not a specific paper.

    Note: The ND evidence code, unlike other evidence codes, should be considered as a code that indicates curation status/progress rather than as a method used to derive an annotation.

    When a gene product is annotated to a GO term using the NOT qualifier, this is a statement that it is not appropriate to associate that specific GO term with that particular gene product. However, such a negative annotation does not make any positive statements about the role of that gene product. Thus, there should always be a positive annotation, in addition to the NOT annotation. If nothing is known about the role of the gene product in a given aspect (molecular function, biological process, or cellular component) of GO, then the positive annotation should be made to the root node for that aspect using the ND evidence code.
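
The constraints on ND annotations described in this section can be sketched as a simple check. The function name `check_nd` and the messages are hypothetical. Treating any reference that starts with `PMID:` as "a specific paper" is an assumption of this sketch.

```python
# Root terms named in the text above.
ROOT_TERMS = {
    "GO:0003674": "molecular function",
    "GO:0008150": "biological process",
    "GO:0005575": "cellular component",
}


def check_nd(go_id, reference):
    """Return a list of problems with an ND annotation (empty if none). Illustrative only."""
    problems = []
    if go_id not in ROOT_TERMS:
        problems.append("ND should only be used for annotations to a root term")
    if reference.startswith("PMID:"):
        problems.append("ND should cite an internal or GO reference, not a specific paper")
    return problems


ok = check_nd("GO:0003674", "GO_REF:0000015")
```

Because ND speaks only for one aspect, a check like this would be run per annotation line and says nothing about the other two aspects.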

    Automatically-assigned Evidence Codes

    The Automatically-assigned Evidence Code is:

    IEA: Inferred from Electronic Annotation

    Note: Annotations using the IEA code should be reviewed after one year; any annotations older than this will be deleted.

    • Annotations based on "matches" in sequence similarity comparisons if they have not been reviewed by a curator
    • Annotations transferred from database records, if not reviewed by a curator
    • Annotations made on the basis of keyword mapping files, if not reviewed by a curator
    • If annotations based on sequence similarity based methods have been reviewed by a curator, use ISS instead and change the reference from the one that describes the computational analysis to one that says that the curator reviewed the sequence similarity and approved it.

    Used for annotations that depend directly on computation or automated transfer of annotations from a database, particularly when the analysis is performed internally and not published. A key feature that distinguishes this evidence code from others is that it is not made by a curator; use IEA when no curator has checked the specific annotation to verify its accuracy. The actual method used (BLAST search, Swiss-Prot keyword mapping, etc.) doesn't matter.

    When the method used to make annotations using the IEA code is performed internally by the annotating group and is not published, a short description of the method should be written and added to the GO Consortium's collection of GO references, where it will be given a GO_REF ID which can be used to cite the reference in gene association files.

    Examples where the IEA evidence code should be used:

    • Annotations based on "matches" in sequence similarity comparisons if they have not been reviewed by a curator. If annotations based on sequence similarity based methods have been reviewed by a curator, use ISS instead.
    • Annotations transferred from database records, if not reviewed by a curator. If such annotations are reviewed by a curator and the database record has no linked publication, consider the NAS code.
    • Annotations made on the basis of keyword mapping files, if not reviewed by a curator

    Examples where the IEA evidence code should not be used:

    • Annotations based on "matches" in sequence similarity comparisons and which have been reviewed by a curator should be made with ISS code.
    • Annotations transferred from database records, where the annotation is reviewed by a curator should not receive the IEA code. If the source is not traceable and the annotation is worth making, NAS should be used.

    Usage of the With/From Column for IEA

    At the January 2007 GOC meeting, it was agreed that it will be required to make an entry in the with/from column for all annotations made after May 1, 2007 when using this evidence code to indicate what individual sequences, sequence objects, methods, keyword mapping files, etc. are the basis of the annotation. When multiple entries are placed in the with/from field, they are separated by pipes.

    ...  2. DB Object ID  3. DB Object Symbol  4. Qualifier  5. GO ID    6. DB:Reference              7. Evidence Code  8. With/From  ...
    ...  UniProt:A0A7W6   A0A7W6_9PARI                       GO:0006118  GOA:interpro|GO_REF:0000002  IEA               InterPro:IPR005797  ...
    ...  UniProt:A0A7W4   A0A7W6_9PARI                       GO:0006118  GOA:spkw|GO_REF:0000004      IEA               SP_KW:KW-0496  ...
    ...  UniProt:A0K8M1   A0K8M1_BURCH                       GO:0004830  GOA:spec|GO_REF:0000003      IEA               EC:6.1.1.2  ...
    ...  UniProt:A0KAB8   Y2695_BURCH                        GO:0008237  GOA:hamap|GO_REF:0000020     IEA               HAMAP:MF_00009  ...
    ...  UniProt:O77797   AKAP3_BOVIN                        GO:0009434  GOA:compara|GO_REF:0000019   IEA               Ensembl:ENSMUSP00000093091  ...
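
Two of the IEA rules stated in this section (the with/from requirement for annotations made after May 1, 2007, and the one-year review note) can be sketched as a date-based check. The function name `check_iea` is hypothetical, and reading "one year" as 365 days is an assumption of this sketch.

```python
from datetime import date

# Cut-off agreed at the January 2007 GOC meeting, as stated above.
WITH_FROM_REQUIRED_AFTER = date(2007, 5, 1)


def check_iea(annotation_date, with_from, today):
    """Return a list of problems with an IEA annotation (empty if none). Illustrative only."""
    problems = []
    if annotation_date > WITH_FROM_REQUIRED_AFTER and not with_from.strip():
        problems.append("with/from is required for IEA annotations made after May 1, 2007")
    if (today - annotation_date).days > 365:
        problems.append("annotation is more than one year old and is due for review")
    return problems


fresh = check_iea(date(2008, 1, 1), "InterPro:IPR005797", today=date(2008, 6, 1))
```

The `today` argument is passed in explicitly so that the check gives the same answer whenever it is run.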

    Obsolete Evidence Codes

    The Obsolete Evidence Code is:

    NR: Not Recorded

    Updated November 9, 2007

    Used for annotations done before curators began tracking evidence types (appears in some legacy annotations). It may not be used for new annotations.

    With/From Column Usage

    We are aware that there has been some variability in usage of the with/from column. Some groups have used an annotation in combination with the IDs in the with/from field in the same line to indicate specific interactions that occur in pairwise or other specific combinations, while others have used the with/from field to indicate all interactions with that gene that are described in a paper, without any indication as to whether they occur at the same time or not. This issue has been placed on the agenda of the next GO Consortium meeting for resolution.