Annotation-related questions (e.g. evidence codes, ID mapping).

Can a gene or gene product be annotated to more than one term from an ontology?

Yes, a gene product can be annotated to zero or more nodes of each ontology, at any level within each ontology.

See the GO annotation guide for more information.

What is an evidence code?

Every annotation must be attributed to a source, which may be a literature reference, another database or a computational analysis. The annotation must also indicate what kind of evidence the cited source provides to support the association between the gene product and the GO term. A simple controlled vocabulary is used to record evidence; the evidence codes are the short (two- or three-letter) codes that signify the type of evidence cited. More information on the meaning and use of the evidence codes can be found in the GO evidence codes documentation.

What criteria are used to annotate genes with GO terms?

A variety of criteria are used for each annotation including experimental results, sequence similarity and curator judgement.

See the GO annotation guide for more information.

How are gene products associated with GO terms?

A gene product can be annotated to zero or more nodes of each ontology, at any level within each ontology; annotation of a gene product to one ontology is independent of its annotation to other ontologies. Annotations should reflect the normal function, process, or localization (component) of the gene product; an activity or location observed only in a mutant or disease state is therefore not usually included.

What is a 'gene product'?

GO uses the term 'gene product' to refer collectively to a gene and any entities encoded by it, e.g. proteins and functional RNAs.

What is annotation?

What does it mean to do GO annotation of genes or proteins?

How is the GO used in genome analysis?

Functional annotation of newly sequenced genomes: Genome and full-length cDNA sequencing projects often include computational (putative) assignments of molecular function based on sequence similarity to annotated genes or sequences. A common approach is to search computationally for a SWISS-PROT sequence that meets some threshold of sequence similarity to the gene model. The GO associations of that SWISS-PROT sequence can then be retrieved and associated with the gene model. Under the GO guidelines, the evidence code for an annotation made this way is 'inferred from electronic annotation' (IEA).
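
The transfer step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, the shape of the similarity hits, the identifiers and the 40% identity cut-off are all hypothetical, and real pipelines choose their own similarity measure and threshold.

```python
# Hypothetical sketch: copy GO terms from similar reference sequences
# to new gene models, recording the evidence code IEA for each transfer.

IEA = "IEA"  # 'inferred from electronic annotation'


def transfer_annotations(hits, go_by_reference, min_identity=40.0):
    """Return (gene_model, go_id, evidence_code, reference) tuples.

    hits            -- iterable of (gene_model_id, reference_id, percent_identity)
    go_by_reference -- dict mapping a reference sequence ID to a set of GO IDs
    min_identity    -- assumed cut-off; not a value prescribed by GO
    """
    transferred = []
    for gene_model, reference, identity in hits:
        if identity < min_identity:
            continue  # hit is too weak to justify transferring annotations
        # sorted() only makes the output order reproducible
        for go_id in sorted(go_by_reference.get(reference, ())):
            transferred.append((gene_model, go_id, IEA, reference))
    return transferred


# Made-up identifiers, for illustration only.
example_hits = [("model1", "REF_A", 85.0), ("model2", "REF_B", 20.0)]
example_go = {"REF_A": {"GO:0000002", "GO:0000001"}, "REF_B": {"GO:0000003"}}
example_result = transfer_annotations(example_hits, example_go)
```

Keeping the reference sequence ID alongside each transferred term preserves the source of the annotation, in line with the requirement that every annotation be attributed to a source.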

Where can I find GO annotations of proteins and ESTs?

Gene objects in model organism databases typically have multiple nucleotide sequences from the public databases associated with them, including expressed sequence tags (ESTs) and one or more protein sequences. There are two ways to obtain sets of sequences with GO annotations:

  • from the model organism databases
  • from the annotation sets for transcripts and proteins contributed to the GO by Compugen and UniProt

What gene or protein IDs should I use?

The list of authoritative database groups names, for certain species, the database group that assumes sole responsibility for collecting and submitting annotations for that species. If you can convert your IDs into the IDs used by that database group, you will be able to find the data you are looking for far more quickly and efficiently.

We maintain a list of suggested resources for mapping gene and protein IDs.

What is the best way to obtain the GO annotations for a list of UniProt Accession Numbers in batch?

With UniProt accession numbers, you can obtain all GO annotations by parsing a GOA gene association file. These files are provided in a simple tab-delimited format and are available from the GOA FTP site.
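
As a minimal sketch of such a parser, the Python below filters a gene association file for a set of accessions. It assumes the usual GAF column layout (accession in column 2, GO ID in column 5, evidence code in column 7, ontology aspect in column 9) and that comment lines begin with '!'; check the header of the file you download before relying on this. The function name and the sample rows are made up for illustration and do not reflect real annotations.

```python
import csv
from collections import defaultdict


def annotations_for(accessions, gaf_lines):
    """Map each requested accession to a set of (GO ID, evidence, aspect)."""
    wanted = set(accessions)
    found = defaultdict(set)
    reader = csv.reader(
        (line for line in gaf_lines if not line.startswith("!")),  # skip comments
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters as ordinary text
    )
    for row in reader:
        if len(row) < 9:
            continue  # blank or malformed line
        accession, go_id, evidence, aspect = row[1], row[4], row[6], row[8]
        if accession in wanted:
            found[accession].add((go_id, evidence, aspect))
    return dict(found)


# Invented rows in the assumed layout, for illustration only.
SAMPLE_GAF = [
    "!gaf-version: 2.2\n",
    "UniProtKB\tACC001\tgeneA\t\tGO:0000001\tPMID:0\tIDA\t\tP\t\t\tprotein\ttaxon:0\t20000101\tExample\n",
    "UniProtKB\tACC001\tgeneA\t\tGO:0000002\tGO_REF:0\tIEA\t\tF\t\t\tprotein\ttaxon:0\t20000101\tExample\n",
    "UniProtKB\tACC002\tgeneB\t\tGO:0000003\tPMID:0\tIMP\t\tC\t\t\tprotein\ttaxon:0\t20000101\tExample\n",
]
sample_result = annotations_for(["ACC001", "ACC999"], SAMPLE_GAF)
```

Because the function accepts any iterable of lines, an open file handle (for example one returned by gzip.open in text mode for a compressed download) can be passed in place of the sample list, so the whole file is streamed rather than loaded into memory.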