Current Annotations

Annotation Details and Downloads

The gene association files submitted by GO Consortium members are shown in the tables below. Files are in the GO annotation file format and are compressed using the UNIX gzip utility. Please see the appropriate README file for further details on the annotation set. Any errors or omissions in annotations should be reported by writing to the GO helpdesk.
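As a rough illustration of working with these gzip-compressed files, the sketch below reads a GO annotation file (GAF) line by line. It assumes the usual GAF layout: tab-separated columns, with header and comment lines starting with "!". The function names and the sample record are made up for this example.

```python
import gzip
import io


def read_gaf(stream):
    """Yield each annotation line of a GAF text stream as a list of columns."""
    for line in stream:
        line = line.rstrip("\n")
        # Skip blank lines and "!" header/comment lines.
        if not line or line.startswith("!"):
            continue
        yield line.split("\t")


def read_gaf_gz(source):
    """Read a gzip-compressed GAF file from a path or a binary file object."""
    with gzip.open(source, "rt", encoding="utf-8") as handle:
        yield from read_gaf(handle)


# Made-up sample record, compressed in memory so the sketch runs standalone.
SAMPLE = (
    "!gaf-version: 2.0\n"
    "SGD\tS000000001\tABC1\t\tGO:0005739\tPMID:1\tIDA\t\tC\t\t\tgene"
    "\ttaxon:559292\t20130518\tSGD\n"
)
rows = list(read_gaf_gz(io.BytesIO(gzip.compress(SAMPLE.encode("utf-8")))))
print(rows[0][4], rows[0][6])  # GO ID and evidence code of the first record
```

In practice, `read_gaf_gz` would be given the path of a downloaded file instead of the in-memory sample.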

Ontology and annotation data are integrated in the MySQL and XML files. See the GO database guide for more information.

These files can also be downloaded via FTP; we recommend this method for the larger files, such as the UniProt dataset, as the web-based download may not work correctly.

Filtered Files

These files are taxon-specific and reflect the work of specific projects, primarily the model organism database groups, to provide comprehensive, non-redundant annotation files for their organism. All the files in this table have been filtered using the annotation file QC checks script. A major component of the filtering is the requirement that particular taxon IDs can only be included within the association files provided by specific projects; please see the list of the authoritative groups for the major model organisms.
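The taxon rule described above can be sketched as follows. This is not the QC script itself, only a minimal illustration: the `AUTHORITATIVE` mapping is a small hypothetical stand-in for the real list of authoritative groups, and the taxon is assumed to sit in GAF column 13 in the form "taxon:NNNN" (with an optional second taxon after "|").

```python
# Hypothetical excerpt of the authoritative-group list (illustration only).
AUTHORITATIVE = {
    "7227": "FlyBase",   # Drosophila melanogaster
    "6239": "WormBase",  # Caenorhabditis elegans
}


def keep_row(cols, submitting_group):
    """Return True if this annotation may stay in the submitting group's file.

    cols is one GAF record split on tabs; column 13 (index 12) holds the taxon.
    """
    primary_taxon = cols[12].split("|")[0].replace("taxon:", "")
    owner = AUTHORITATIVE.get(primary_taxon)
    # Taxa without an authoritative group are not restricted.
    return owner is None or owner == submitting_group


def make_row(taxon):
    """Build a made-up 15-column record with the given taxon field."""
    cols = [""] * 15
    cols[12] = taxon
    return cols


fly_row = make_row("taxon:7227")
other_row = make_row("taxon:9999")
print(keep_row(fly_row, "FlyBase"), keep_row(fly_row, "UniProt"))
```

A fly annotation is kept in the FlyBase file and dropped from any other group's file, while taxa with no entry in the mapping pass through unchanged.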

Statistics as of May 20, 2013

Filtered Annotation File Downloads
Species | Database | Gene products annotated | Annotations | Non-IEA annotations | Submission date (MM/DD/YYYY)
Agrobacterium tumefaciens str. C58 | PAMGO | 82 | 248 | 248 | 9/24/2012
Arabidopsis thaliana | TAIR | 30329 | 180470 | 180286 | 5/14/2013
Aspergillus nidulans | AspGD | 47607 | 187595 | 99167 | 5/19/2013
Comprehensive Microbial Resource [multispecies] | JCVI | 61686 | 154444 | 154444 | 2/1/2013
Bos taurus | GO Annotations @ EBI | 19928 | 121350 | 13593 | 5/10/2013
Caenorhabditis elegans | WormBase | 17659 | 114497 | 65175 | 5/10/2013
Candida albicans | CGD | 22007 | 126829 | 27154 | 5/19/2013
Canis lupus familiaris | GO Annotations @ EBI | 19008 | 81807 | 4213 | 5/10/2013
Danio rerio | ZFIN | 18447 | 135802 | 31489 | 5/20/2013
Dickeya dadantii | PAMGO | 124 | 296 | 296 | 9/24/2012
Dictyostelium discoideum | dictyBase | 8244 | 57717 | 19868 | 5/10/2013
Drosophila melanogaster | FlyBase | 13749 | 91751 | 78210 | 5/10/2013
Escherichia coli | EcoCyc & EcoliHub | 3748 | 35034 | 10734 | 4/26/2013
Gallus gallus | GO Annotations @ EBI | 16430 | 85431 | 6494 | 5/13/2013
Homo sapiens | GO Annotations @ EBI | 46145 | 368965 | 194790 | 5/13/2013
Leishmania major | Sanger GeneDB | 354 | 897 | 897 | 2/8/2013
Magnaporthe grisea | PAMGO | 11274 | 27618 | 27618 | 2/15/2013
Mus musculus | MGI | 25500 | 294857 | 198718 | 5/10/2013
Oomycetes | PAMGO | 30 | 126 | 126 | 9/24/2012
Oryza sativa | Gramene | 41142 | 49296 | 49296 | 10/12/2012
Protein Data Bank [multispecies] | GO Annotations @ EBI | 119940 | 1196998 | 51779 | 4/30/2013
Plasmodium falciparum | Sanger GeneDB | 2182 | 4546 | 4546 | 2/15/2013
Pseudomonas aeruginosa PAO1 | PseudoCAP | 1519 | 6896 | 6896 | 9/24/2012
Rattus norvegicus | RGD | 22537 | 270011 | 166350 | 5/18/2013
Reactome [multispecies] | CSHL & EBI | 402 | 2148 | 2148 | 4/8/2013
Saccharomyces cerevisiae | SGD, Stanford University | 6381 | 91087 | 46404 | 5/18/2013
Schizosaccharomyces pombe | PomBase, University of Cambridge, UK | 5456 | 39584 | 34239 | 5/3/2013
Solanaceae | SGN | 309 | 562 | 562 | 9/24/2012
Sus scrofa | GO Annotations @ EBI | 19389 | 99337 | 4504 | 5/10/2013
Trypanosoma brucei | Sanger GeneDB | 2098 | 3519 | 3519 | 3/29/2013
UniProt [multispecies] (IEA annotations have been removed) | GO Annotations @ EBI | 67021 | 216161 | 216161 | 4/30/2013
gonuts.gz |  | 194 | 285 | 285 | 2/1/2013

Unfiltered Files

These files have not been filtered with the annotation file QC checks script. The most important difference between these files and the filtered files above is that gene products from certain taxa are not stripped out of the file; they may also contain annotations to obsolete terms or outdated IEA annotations. Please see the annotation file QC script documentation for full details of the checks performed.

Please note that if you use unfiltered files in conjunction with filtered files, there may be duplicated annotations.
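One way to guard against such duplicates when combining files is to merge records on a key. The sketch below is only an assumption about what counts as a duplicate: it treats two records as the same annotation when the database, object ID, qualifier, GO ID, reference, evidence code and with/from columns all match. The function names and sample records are made up.

```python
def annotation_key(cols):
    """Identity of an annotation (assumed): GAF columns 1, 2, 4, 5, 6, 7 and 8."""
    return (cols[0], cols[1], cols[3], cols[4], cols[5], cols[6], cols[7])


def merge_unique(*row_sets):
    """Concatenate several iterables of GAF records, dropping repeated keys."""
    seen = set()
    merged = []
    for rows in row_sets:
        for cols in rows:
            key = annotation_key(cols)
            if key not in seen:
                seen.add(key)
                merged.append(cols)
    return merged


# Made-up records: the second file repeats the first file's annotation.
filtered = [["MGI", "MGI:1", "Abc", "", "GO:0005739", "PMID:1", "IDA", "", "C"]]
unfiltered = [
    ["MGI", "MGI:1", "Abc", "", "GO:0005739", "PMID:1", "IDA", "", "C"],
    ["MGI", "MGI:1", "Abc", "", "GO:0005515", "PMID:2", "IPI", "", "F"],
]
combined = merge_unique(filtered, unfiltered)
print(len(combined))
```

Records from the first file win, so listing the filtered file first keeps its version of any shared annotation.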

Statistics as of May 20, 2013

Unfiltered Annotation File Downloads
Species | Database | Gene products annotated | Annotations | Non-IEA annotations | Submission date (MM/DD/YYYY)
Protein Data Bank [multispecies] | GO Annotations @ EBI | 188520 | 2849784 | 802482 | 4/30/2013
Reactome [multispecies] | CSHL & EBI | 13789 | 98638 | 98638 | 4/8/2013
UniProt [multispecies] | GO Annotations @ EBI | 22429149 | 145864274 | 1284090 | 4/30/2013
Arabidopsis thaliana | GO Annotations @ EBI | 27650 | 152616 | 101010 | 4/30/2013
Danio rerio | GO Annotations @ EBI | 28757 | 88404 | 19949 | 4/30/2013
Mus musculus | GO Annotations @ EBI | 40792 | 353658 | 243463 | 4/30/2013
Rattus norvegicus | GO Annotations @ EBI | 22055 | 161786 | 64666 | 4/30/2013
Canis lupus familiaris | GO Annotations @ EBI | 19008 | 81818 | 4214 | 4/30/2013
Sus scrofa | GO Annotations @ EBI | 19389 | 99356 | 4505 | 4/30/2013

In the tables above, gene association counts are provided for all evidence codes and, separately, for all codes except IEA (Inferred from Electronic Annotation). The IEA code means there has been no human involvement in the assignment of the association; see the GO evidence code documentation for more details.
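Counts of this kind could be reproduced from an annotation file roughly as sketched below. The sketch assumes the evidence code is in GAF column 7 and that a gene product is identified by the pair of columns 1 and 2 (database and object ID); how the published statistics were actually computed is not stated here. The function name and sample records are made up.

```python
def annotation_stats(rows):
    """Return (gene products annotated, annotations, non-IEA annotations)."""
    gene_products = set()
    total = 0
    non_iea = 0
    for cols in rows:
        gene_products.add((cols[0], cols[1]))  # database + object ID
        total += 1
        if cols[6] != "IEA":                   # evidence code column
            non_iea += 1
    return len(gene_products), total, non_iea


# Made-up records: two gene products, three annotations, one of them IEA.
sample_rows = [
    ["ZFIN", "ZDB-GENE-1", "abc", "", "GO:0005739", "PMID:1", "IDA"],
    ["ZFIN", "ZDB-GENE-1", "abc", "", "GO:0005515", "GO_REF:1", "IEA"],
    ["ZFIN", "ZDB-GENE-2", "def", "", "GO:0005634", "PMID:2", "IMP"],
]
products, annotations, non_iea_count = annotation_stats(sample_rows)
print(products, annotations, non_iea_count)
```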


gp2protein files

The gp2protein directory contains files that map between model organism database object IDs and UniProt accessions.
