Current Annotations

Annotation Details and Downloads

The gene association files submitted by GO Consortium members are shown in the tables below. Files are in the GO annotation file format and are compressed using the UNIX gzip utility. Please see the appropriate README file for further details on the annotation set. Any errors or omissions in annotations should be reported by writing to the GO helpdesk.
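
As a minimal sketch of reading one of these files, the snippet below streams a gzip-compressed annotation file and yields one record per annotation line. It assumes the tab-delimited GAF layout, in which lines beginning with `!` are comments and the first seven columns are DB, DB object ID, DB object symbol, qualifier, GO ID, reference, and evidence code. The function name `read_gaf` and the field labels are illustrative and are not part of any GO tool.

```python
import gzip
from typing import Dict, Iterator

# Labels for the first seven GAF columns (illustrative names, assumed layout).
GAF_FIELDS = ["db", "object_id", "symbol", "qualifier", "go_id", "reference", "evidence"]


def read_gaf(path: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per annotation line of a gzip-compressed GAF file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            # Skip blank lines and '!' comment/header lines.
            if not line.strip() or line.startswith("!"):
                continue
            columns = line.rstrip("\n").split("\t")
            if len(columns) < len(GAF_FIELDS):
                continue  # malformed line: too few columns
            # zip() stops at the shorter input, so only the first seven columns are kept.
            yield dict(zip(GAF_FIELDS, columns))
```

Because the function is a generator, a large file such as the UniProt dataset can be processed line by line without being loaded into memory at once.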

Ontology and annotation data are integrated in the MySQL and XML files. See the GO database guide for more information.

These files can also be downloaded via FTP; we recommend this method for the larger files, such as the UniProt dataset, as the web-based download may not work correctly.

Filtered Files

These files are taxon-specific and reflect the work of specific projects, primarily the model organism database groups, to provide comprehensive, non-redundant annotation files for their organisms. All the files in this table have been filtered using the annotation file QC checks script. A major component of the filtering is the requirement that annotations for particular taxon IDs may be included only in the association files provided by specific projects; please see the list of the authoritative groups for the major model organisms.
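
The taxon restriction can be illustrated with a small sketch. It is not the QC script itself: the `AUTHORITY` table, its two entries, and the function name are hypothetical, and the sketch assumes that each annotation carries an NCBI taxon ID and the name of the submitting group. An annotation for a taxon that has a designated authoritative group is kept only when it comes from that group.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical authority table: NCBI taxon ID -> group allowed to submit for it.
AUTHORITY: Dict[int, str] = {
    10090: "MGI",      # Mus musculus
    7227: "FlyBase",   # Drosophila melanogaster
}

Annotation = Tuple[str, int, str]  # (gene product ID, taxon ID, submitting group)


def filter_by_authority(annotations: Iterable[Annotation]) -> List[Annotation]:
    """Keep annotations unless another group is authoritative for the taxon."""
    kept = []
    for product_id, taxon_id, group in annotations:
        owner = AUTHORITY.get(taxon_id)
        # Taxa with no designated authority pass through unchanged.
        if owner is None or owner == group:
            kept.append((product_id, taxon_id, group))
    return kept
```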

Statistics as of June 16, 2013

Filtered Annotation File Downloads
Species | Database | Gene products annotated | Annotations | Non-IEA annotations | Submission date (MM/DD/YYYY)
Agrobacterium tumefaciens str. C58 | PAMGO | 82 | 248 | 248 | 9/24/2012
Arabidopsis thaliana | TAIR | 30329 | 180388 | 180285 | 6/14/2013
Aspergillus nidulans | AspGD | 47607 | 206593 | 88514 | 6/16/2013
Comprehensive Microbial Resource [multispecies] | JCVI | 61682 | 154257 | 154257 | 6/14/2013
Bos taurus | GO Annotations @ EBI | 19926 | 121385 | 13637 | 6/14/2013
Caenorhabditis elegans | WormBase | 16715 | 103843 | 60738 | 6/6/2013
Candida albicans | CGD | 22780 | 156503 | 27195 | 6/16/2013
Canis lupus familiaris | GO Annotations @ EBI | 19013 | 81967 | 4224 | 6/14/2013
Danio rerio | ZFIN | 18436 | 135672 | 31312 | 6/14/2013
Dickeya dadantii | PAMGO | 124 | 296 | 296 | 9/24/2012
Dictyostelium discoideum | dictyBase | 8226 | 57799 | 19955 | 6/2/2013
Drosophila melanogaster | FlyBase | 13748 | 91728 | 78187 | 6/14/2013
Escherichia coli | EcoCyc & EcoliHub | 3745 | 34982 | 10709 | 6/14/2013
Gallus gallus | GO Annotations @ EBI | 11570 | 62658 | 6335 | 6/14/2013
Homo sapiens | GO Annotations @ EBI | 45240 | 365386 | 194177 | 6/14/2013
Leishmania major | Sanger GeneDB | 352 | 892 | 892 | 6/14/2013
Magnaporthe grisea | PAMGO | 11274 | 27618 | 27618 | 2/15/2013
Mus musculus | MGI | 25486 | 294676 | 198929 | 6/14/2013
Oomycetes | PAMGO | 30 | 126 | 126 | 9/24/2012
Oryza sativa | Gramene | 41142 | 49296 | 49296 | 10/12/2012
Protein Data Bank [multispecies] | GO Annotations @ EBI | 120985 | 1210159 | 52497 | 6/14/2013
Plasmodium falciparum | Sanger GeneDB | 2182 | 4546 | 4546 | 2/15/2013
Pseudomonas aeruginosa PAO1 | PseudoCAP | 1519 | 6884 | 6884 | 6/14/2013
Rattus norvegicus | RGD | 22642 | 271638 | 167175 | 6/15/2013
Reactome [multispecies] | CSHL & EBI | 402 | 2148 | 2148 | 4/8/2013
Saccharomyces cerevisiae | SGD, Stanford University | 6381 | 91470 | 46787 | 6/15/2013
Schizosaccharomyces pombe | PomBase, University of Cambridge, UK | 5456 | 39587 | 34242 | 6/6/2013
Solanaceae | SGN | 309 | 562 | 562 | 9/24/2012
Sus scrofa | GO Annotations @ EBI | 19385 | 99303 | 4504 | 6/14/2013
Trypanosoma brucei | Sanger GeneDB | 2096 | 3511 | 3511 | 6/14/2013
UniProt [multispecies] (IEA annotations have been removed) | GO Annotations @ EBI | 67149 | 216675 | 216675 | 5/30/2013
gonuts.gz | | 194 | 285 | 285 | 2/1/2013

Unfiltered Files

These files have not been filtered with the annotation file QC checks script. The most important difference between these files and the filtered files above is that gene products from certain taxa are not stripped out of the file; the unfiltered files may also contain annotations to obsolete terms or outdated IEA annotations. Please see the annotation file QC script documentation for full details of the checks performed.

Please note that if you use unfiltered files in conjunction with filtered files, there may be duplicated annotations.
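
One way to guard against such duplicates when merging files is to key each annotation on the fields that identify it. The sketch below assumes an annotation is identified by its gene product ID, GO ID, reference, and evidence code; that choice of key and the function name are assumptions, and a stricter or looser key may suit a given analysis.

```python
from typing import Iterable, List, Tuple

Annotation = Tuple[str, str, str, str]  # (object ID, GO ID, reference, evidence code)


def merge_without_duplicates(*sources: Iterable[Annotation]) -> List[Annotation]:
    """Concatenate annotation sets, keeping the first copy of each annotation."""
    seen = set()
    merged = []
    for source in sources:
        for annotation in source:
            if annotation not in seen:
                seen.add(annotation)
                merged.append(annotation)
    return merged
```

This exact-match key will not recognize the same gene product recorded under identifiers from two different databases; such cases would first need the identifiers mapped to a common namespace.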

Statistics as of June 16, 2013

Unfiltered Annotation File Downloads
Species | Database | Gene products annotated | Annotations | Non-IEA annotations | Submission date (MM/DD/YYYY)
Protein Data Bank [multispecies] | GO Annotations @ EBI | 190579 | 2893664 | 821753 | 5/30/2013
Reactome [multispecies] | CSHL & EBI | 13789 | 98638 | 98638 | 4/8/2013
UniProt [multispecies] | GO Annotations @ EBI | 23367340 | 155335562 | 1289308 | 5/30/2013
Arabidopsis thaliana | GO Annotations @ EBI | 27681 | 152893 | 101135 | 5/30/2013
Danio rerio | GO Annotations @ EBI | 28777 | 88688 | 20068 | 5/30/2013
Mus musculus | GO Annotations @ EBI | 40988 | 355794 | 245612 | 5/30/2013
Rattus norvegicus | GO Annotations @ EBI | 22075 | 162590 | 64826 | 5/30/2013
Canis lupus familiaris | GO Annotations @ EBI | 19013 | 81997 | 4231 | 5/30/2013
Sus scrofa | GO Annotations @ EBI | 19385 | 99345 | 4508 | 5/30/2013

In the tables above, gene association counts are given for all evidence codes and, separately, for all evidence codes except IEA (Inferred from Electronic Annotation). The IEA code means there has been no human involvement in the assignment of the association; see the GO evidence code documentation for more details.
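
The three figures reported for each file can be derived from the annotation lines themselves. The sketch below assumes each annotation has been reduced to a (gene product ID, evidence code) pair and counts distinct gene products, all annotations, and annotations whose evidence code is not IEA. The function name and the sample IDs are illustrative.

```python
from typing import Iterable, Tuple


def annotation_stats(pairs: Iterable[Tuple[str, str]]) -> Tuple[int, int, int]:
    """Return (gene products annotated, annotations, non-IEA annotations)."""
    products = set()
    total = 0
    non_iea = 0
    for product_id, evidence in pairs:
        products.add(product_id)
        total += 1
        if evidence != "IEA":
            non_iea += 1
    return len(products), total, non_iea


# Example: two products, three annotations, one of them non-IEA.
print(annotation_stats([("P1", "IEA"), ("P1", "IDA"), ("P2", "IEA")]))  # (2, 3, 1)
```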


gp2protein files

The gp2protein directory contains files that map between model organism database object IDs and UniProt accessions.
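
A mapping file of this kind could be loaded along the following lines. The layout is an assumption, since it is not described here: the sketch expects two tab-separated columns per line, a database object ID followed by one or more protein accessions separated by `;`, with `!` marking comment lines. The function name is illustrative.

```python
from typing import Dict, Iterable, List


def parse_gp2protein(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each database object ID to a list of protein accessions."""
    mapping: Dict[str, List[str]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # skip blanks and comment lines (assumed convention)
        parts = line.split("\t")
        if len(parts) < 2:
            continue  # no target accession on this line
        object_id, targets = parts[0], parts[1]
        accessions = [t for t in targets.split(";") if t]
        mapping.setdefault(object_id, []).extend(accessions)
    return mapping
```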
