Current Annotations

Annotation Details and Downloads

The gene association files submitted by GO Consortium members are shown in the tables below. Files are in the GO annotation file format and are compressed using the UNIX gzip utility. Please see the appropriate README file for further details on the annotation set. Any errors or omissions in annotations should be reported by writing to the GO helpdesk.
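As an illustration of working with these files, here is a minimal Python sketch that reads a gzip-compressed annotation file. It assumes the tab-delimited GO annotation file layout, in which comment lines start with "!" and each record has 15 columns (GAF 1.0) or 17 columns (GAF 2.0). The function names, the column keys, and the sample record are made up for this example and are not part of any GO tool.

```python
import gzip
import io

# Column keys chosen for this sketch; they follow the GAF 2.0 column order.
# GAF 1.0 records only fill the first 15.
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type",
    "taxon", "date", "assigned_by", "annotation_extension",
    "gene_product_form_id",
]


def read_gaf(stream):
    """Yield one dict per annotation record from an open text stream."""
    for line in stream:
        line = line.rstrip("\r\n")
        # Skip blank lines and "!" comment/header lines.
        if not line or line.startswith("!"):
            continue
        yield dict(zip(GAF_COLUMNS, line.split("\t")))


def read_gaf_gz(path):
    """Yield annotation records from a gzip-compressed annotation file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from read_gaf(handle)


# Made-up sample record, used only to exercise the parser.
sample = (
    "!gaf-version: 1.0\n"
    "SGD\tS000000001\tABC1\t\tGO:0005575\tPMID:1\tIEA\t\tC\t\t\tgene"
    "\ttaxon:4932\t20120128\tSGD\n"
)
for annotation in read_gaf(io.StringIO(sample)):
    print(annotation["db_object_id"], annotation["go_id"],
          annotation["evidence_code"])
```

Reading line by line keeps memory use low, which matters for the larger files such as the UniProt dataset.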

Ontology and annotation data are integrated in the MySQL and XML files. See the GO database guide for more information.

These files can also be downloaded via FTP; we recommend this method for the larger files, such as the UniProt dataset, as the web-based download may not work correctly.

Filtered Files

These files are taxon-specific and reflect the work of specific projects, primarily the model organism database groups, to provide comprehensive, non-redundant annotation files for their organism. All the files in this table have been filtered using the annotation file QC checks script. A major component of the filtering is the requirement that annotations for particular taxon IDs may appear only in the association files provided by specific projects; please see the list of the authoritative groups for the major model organisms.
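The taxon rule described above can be sketched in Python as follows. This is not the actual QC checks script. The table of authoritative groups holds only two illustrative entries, and the function names and the group labels used as keys are assumptions made for this example. It also assumes that the first ID in a GAF taxon field identifies the organism of the gene product.

```python
# Illustrative subset only; the real list of authoritative groups is longer.
AUTHORITATIVE_SOURCE = {
    "taxon:10090": "MGI",  # Mus musculus
    "taxon:10116": "RGD",  # Rattus norvegicus
}


def passes_taxon_check(taxon_field, submitting_group):
    """Return True if this group is allowed to submit rows for this taxon."""
    # A taxon field may hold two IDs separated by "|"; use the first one.
    primary_taxon = taxon_field.split("|")[0]
    owner = AUTHORITATIVE_SOURCE.get(primary_taxon)
    # Taxa with no authoritative group are accepted from any submitter.
    return owner is None or owner == submitting_group


def filter_annotations(rows, submitting_group):
    """Keep only the rows that pass the taxon check for this submitter."""
    return [row for row in rows
            if passes_taxon_check(row["taxon"], submitting_group)]


rows = [
    {"taxon": "taxon:10090"},
    {"taxon": "taxon:9606"},
    {"taxon": "taxon:10116|taxon:10090"},
]
kept = filter_annotations(rows, "MGI")
```

With these inputs, a file submitted under the label "MGI" keeps the mouse and human rows and drops the rat row, because rat annotations are reserved for RGD in this sketch.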

Statistics as of February 2, 2012

Filtered Annotation File Downloads
Species | Database | Gene products annotated | Annotations | Non-IEA annotations | Submission date (MM/DD/YYYY)
Agrobacterium tumefaciens str. C58 | PAMGO | 82 | 248 | 248 | 12/31/2010
Arabidopsis thaliana | TAIR | 29560 | 147241 | 115042 | 1/31/2012
Aspergillus nidulans | AspGD | 43074 | 167654 | 67388 | 1/29/2012
Comprehensive Microbial Resource [multispecies] | JCVI | 61691 | 154709 | 154709 | 10/28/2011
Bos taurus | GO Annotations @ EBI | 21441 | 129733 | 10386 | 1/10/2012
Caenorhabditis elegans | WormBase | 16913 | 107736 | 58842 | 11/25/2011
Candida albicans | CGD | 11815 | 65395 | 20580 | 1/29/2012
Canis lupus familiaris | GO Annotations @ EBI | 17580 | 67594 | 3036 | 1/10/2012
Danio rerio | ZFIN | 15753 | 121569 | 29189 | 2/1/2012
Dickeya dadantii | PAMGO | 124 | 296 | 296 | 5/20/2011
Dictyostelium discoideum | dictyBase | 5436 | 20769 | 20769 | 1/29/2012
Drosophila melanogaster | FlyBase | 13190 | 82762 | 68278 | 1/24/2012
Escherichia coli | EcoCyc & EcoliHub | 2200 | 9083 | 9083 | 1/13/2012
Gallus gallus | GO Annotations @ EBI | 15884 | 106795 | 5498 | 1/23/2012
Homo sapiens | GO Annotations @ EBI | 41576 | 357891 | 161014 | 1/23/2012
Leishmania major | Sanger GeneDB | 10 | 27 | 27 | 8/12/2011
Magnaporthe grisea | PAMGO | 11274 | 27628 | 27628 | 10/28/2011
Mus musculus | MGI | 25112 | 272232 | 178853 | 1/25/2012
Oomycetes | PAMGO | 30 | 126 | 126 | 5/31/2010
Oryza sativa | Gramene | 41142 | 49322 | 49322 | 10/28/2011
Protein Data Bank [multispecies] | GO Annotations @ EBI | 100870 | 930960 | 34601 | 1/10/2012
Plasmodium falciparum | Sanger GeneDB | 2198 | 4606 | 4606 | 5/20/2011
Pseudomonas aeruginosa PAO1 | PseudoCAP | 1519 | 6902 | 6902 | 10/28/2011
Rattus norvegicus | RGD | 25030 | 260058 | 142815 | 1/28/2012
Reactome [multispecies] | CSHL & EBI | 534 | 13863 | 13863 | 12/27/2011
Saccharomyces cerevisiae | SGD, Stanford University | 6359 | 87875 | 44775 | 1/28/2012
Schizosaccharomyces pombe | PomBase, University of Cambridge, UK | 5291 | 36865 | 33350 | 11/29/2011
Solanaceae | SGN | 148 | 269 | 269 | 5/20/2011
Sus scrofa | GO Annotations @ EBI | 14974 | 93392 | 3519 | 1/10/2012
Trypanosoma brucei | Sanger GeneDB | 2977 | 10299 | 10299 | 10/28/2011
UniProt [multispecies] (IEA annotations have been removed) | GO Annotations @ EBI | 19438 | 87327 | 87327 | 1/11/2012

Unfiltered Files

These files have not been filtered with the annotation file QC checks script. The most important difference between these files and the filtered files above is that gene products from certain taxa are not stripped out of the file; they may also contain annotations to obsolete terms or outdated IEA annotations. Please see the annotation file QC script documentation for full details of the checks performed.

Please note that if you use unfiltered files in conjunction with filtered files, there may be duplicated annotations.
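One way to guard against such duplicates when combining files is sketched below in Python. Which columns make two annotations "the same" is an assumption of this example: it compares the DB, DB Object ID, Qualifier, GO ID, Reference, Evidence Code and With/From columns of the GO annotation file format. The function names are hypothetical.

```python
def annotation_key(fields):
    """Build a comparison key from the columns assumed to identify a record."""
    # 0-based column positions: DB, DB Object ID, Qualifier, GO ID,
    # DB:Reference, Evidence Code, With/From.
    return tuple(fields[i] for i in (0, 1, 3, 4, 5, 6, 7))


def merge_without_duplicates(*annotation_sets):
    """Concatenate annotation sets, keeping the first copy of each record."""
    seen = set()
    merged = []
    for annotations in annotation_sets:
        for fields in annotations:
            key = annotation_key(fields)
            if key not in seen:
                seen.add(key)
                merged.append(fields)
    return merged


# Made-up records: each is a list of the first eight GAF columns.
filtered = [
    ["MGI", "MGI:0000001", "Abc1", "", "GO:0005575", "PMID:1", "IDA", ""],
]
unfiltered = [
    ["MGI", "MGI:0000001", "Abc1", "", "GO:0005575", "PMID:1", "IDA", ""],
    ["MGI", "MGI:0000001", "Abc1", "", "GO:0005575", "PMID:2", "IEA", ""],
]
combined = merge_without_duplicates(filtered, unfiltered)
```

This only catches records that use the same identifiers. If the same gene product appears under a model organism database ID in one file and a UniProt accession in another, the identifiers would first need to be mapped to a common form.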

Statistics as of February 2, 2012

Unfiltered Annotation File Downloads
Species | Database | Gene products annotated | Annotations | Non-IEA annotations | Submission date (MM/DD/YYYY)
Protein Data Bank [multispecies] | GO Annotations @ EBI | 171725 | 2223213 | 537167 | 1/10/2012
Reactome [multispecies] | CSHL & EBI | 11451 | 88230 | 88230 | 12/27/2011
UniProt [multispecies] | GO Annotations @ EBI | 12600598 | 108246461 | 923092 | 1/10/2012
Arabidopsis thaliana | GO Annotations @ EBI | 27230 | 130585 | 57434 | 1/10/2012
Danio rerio | GO Annotations @ EBI | 27595 | 101736 | 13466 | 1/10/2012
Mus musculus | GO Annotations @ EBI | 43944 | 383757 | 228106 | 1/10/2012
Rattus norvegicus | GO Annotations @ EBI | 27373 | 198376 | 54144 | 1/10/2012
Canis lupus familiaris | GO Annotations @ EBI | 17580 | 67594 | 3036 | 1/10/2012
Sus scrofa | GO Annotations @ EBI | 14974 | 93393 | 3520 | 1/10/2012

In the tables above, gene association counts are given for all evidence codes and, separately, for all codes except IEA (Inferred from Electronic Annotation). The IEA code means that no human was involved in assigning the association; see the GO evidence code documentation for more details.
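Counts of this kind could be derived from an annotation file along the following lines. This Python sketch is an assumption about how such statistics might be computed, not the method used to produce the figures on this page. It treats each record as one annotation, identifies a gene product by its DB and DB Object ID columns, and reads the evidence code from the seventh column of the GO annotation file format.

```python
def summarize(annotations):
    """Count gene products, annotations and non-IEA annotations."""
    gene_products = set()
    total = 0
    non_iea = 0
    for fields in annotations:
        # Columns 1 and 2 (DB, DB Object ID) identify the gene product.
        gene_products.add((fields[0], fields[1]))
        total += 1
        # Column 7 holds the evidence code.
        if fields[6] != "IEA":
            non_iea += 1
    return {
        "gene_products": len(gene_products),
        "annotations": total,
        "non_iea": non_iea,
    }


# Made-up records: each is a list of the first seven GAF columns.
records = [
    ["SGD", "S000000001", "ABC1", "", "GO:0005575", "PMID:1", "IEA"],
    ["SGD", "S000000001", "ABC1", "", "GO:0008150", "PMID:2", "IDA"],
    ["SGD", "S000000002", "ABC2", "", "GO:0003674", "PMID:3", "IMP"],
]
stats = summarize(records)
```

For these three made-up records the summary reports two gene products, three annotations, and two non-IEA annotations.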

gp2protein files

The gp2protein directory contains files that map between model organism database object IDs and UniProt accessions.
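A mapping file of this kind could be loaded as sketched below in Python. The file layout is an assumption of this example and should be checked against an actual gp2protein file: one record per line, a database object ID and a semicolon-separated list of prefixed protein identifiers separated by a tab, and comment lines starting with "!". The function name and the sample identifiers are made up.

```python
def parse_gp2protein(lines):
    """Map each database object ID to its list of UniProt accessions."""
    mapping = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and "!" comment lines (assumed convention).
        if not line or line.startswith("!"):
            continue
        object_id, _, targets = line.partition("\t")
        # Keep only identifiers carrying the assumed "UniProtKB:" prefix.
        accessions = [
            target.split(":", 1)[1]
            for target in targets.split(";")
            if target.startswith("UniProtKB:")
        ]
        if accessions:
            mapping[object_id] = accessions
    return mapping


# Made-up sample lines in the assumed layout.
sample_lines = [
    "!example header",
    "MGI:MGI:0000001\tUniProtKB:P00001;NCBI_NP:NP_000001",
    "MGI:MGI:0000002\tNCBI_NP:NP_000002",
]
id_map = parse_gp2protein(sample_lines)
```

Objects with no UniProt accession are left out of the result, so a missing key signals that no mapping was found.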
