Gene objects in model organism databases typically have multiple nucleotide sequences from the public databases associated with them, including expressed sequence tags (ESTs), as well as one or more protein sequences. There are two ways to obtain sets of sequences with GO annotations:
- from the model organism databases
- from the annotation sets for transcripts and proteins contributed to the GO by Compugen and UniProt
Obtaining GO annotations for model organism sequence sets
In the gene association files, each GO term is associated with an accession ID for a gene or gene product from the contributing data resource. Files that map these gene accession IDs to sequence IDs are usually also available from the contributing model organism database. For example, the Mouse Genome Informatics FTP site includes the gene association files contributed to the GO, as well as other reports that list official mouse gene symbols and names and all curated gene-to-sequence ID associations.
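The join described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a tool provided by the GO or by any model organism database. It assumes the gene association file is tab-delimited, with comment lines starting with "!", the gene accession ID in column 2 and the GO ID in column 5. It also assumes a hypothetical two-column, tab-delimited mapping file (gene accession ID, sequence ID); real reports from a given database will have their own layouts and would need their own column choices. The function names are made up for this example.

```python
import csv
from collections import defaultdict


def read_go_terms(gaf_lines):
    """Map each gene accession ID (column 2) to its set of GO IDs (column 5)."""
    terms = defaultdict(set)
    reader = csv.reader(gaf_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank lines, "!" comment/header lines, and truncated rows.
        if not row or row[0].startswith("!") or len(row) < 5:
            continue
        terms[row[1]].add(row[4])
    return terms


def annotate_sequences(gaf_lines, gene_to_seq_lines):
    """Transfer GO IDs from genes to the sequence IDs mapped to them.

    gene_to_seq_lines is assumed (hypothetically) to hold two tab-separated
    columns per line: gene accession ID, then sequence ID.
    """
    terms = read_go_terms(gaf_lines)
    seq_terms = defaultdict(set)
    reader = csv.reader(gene_to_seq_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue
        gene_id, seq_id = row[0], row[1]
        # Only genes that actually carry GO annotations contribute.
        if gene_id in terms:
            seq_terms[seq_id] |= terms[gene_id]
    return dict(seq_terms)
```

Both functions accept any iterable of text lines, so open file handles can be passed in directly. Sequences whose gene has no GO annotation are left out of the result.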
Obtaining GO annotations for transcripts and proteins in general
Large transcript and protein sequence data sets are annotated to the GO by Compugen and UniProt, respectively. These files can be downloaded directly from the GO web site. The species of origin for each sequence is included in the association files.
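Because the species of origin is recorded on each annotation line, a large multi-species file can be reduced to one organism. The sketch below assumes the tab-delimited gene association layout in which column 13 holds the taxon as "taxon:" followed by an NCBI taxonomy ID (several values may be separated by "|"), column 2 holds the sequence accession, and column 5 holds the GO ID. The function name is hypothetical, and the column positions should be checked against the header of the file actually downloaded.

```python
def annotations_by_taxon(gaf_lines, taxon_id):
    """Yield (accession ID, GO ID) pairs for rows annotated to taxon_id.

    Assumes column 13 holds values such as "taxon:10090", possibly several
    separated by "|".
    """
    wanted = f"taxon:{taxon_id}"
    for line in gaf_lines:
        # "!" marks comment and header lines in association files.
        if line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 13:
            continue  # blank or truncated row
        if wanted in cols[12].split("|"):
            yield cols[1], cols[4]
```

For example, passing an open association file and the taxonomy ID 10090 (mouse) would yield only the mouse annotations, which can then be collected into a dictionary or written out as a smaller file.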