File Format Guide

GO Formats

The GO File Format Guide documents the structure and syntax of the files available on the GO website, to assist users who need to read, write parsers for, or create these files. The following file formats are documented separately:

Annotation

Annotation is the process of assigning GO terms to gene products. The annotation data in the GO database is contributed by members of the GO Consortium, and the Consortium continually encourages new groups to contribute their annotations. The links below give details on the GO annotation policies and the annotation process, and direct users to other pages of interest on GO annotation conventions, the standard operating procedures used by some Consortium members, and the GO annotation file format guide.

Ontology Documentation

The Gene Ontology defines the universe of concepts relating to gene functions (‘GO terms’), and how these functions are related to each other (‘relations’). It is constantly revised and expanded as biological knowledge accumulates.

About

The Gene Ontology Project

Introduction to the GO resource

The Gene Ontology (GO) is a comprehensive resource of computable knowledge regarding the functions of genes and gene products. As such, it is extensively used by the biomedical research community for the analysis of -omics and related data. The structured knowledge in the ontology is a crucial part of the global biomedical informatics infrastructure.

Guide to GO Evidence Codes

A GO annotation consists of a GO term associated with a specific reference that describes the work or analysis upon which the association between a specific GO term and gene product is based. Each annotation must also include an evidence code to indicate how the annotation to a particular term is supported. Although evidence codes do reflect the type of work or analysis described in the cited reference which supports the GO term to gene product association, they are not necessarily a classification of types of experiments/analyses.

ID Mapping Files

This page documents the file formats used to store the mapping from database object IDs to the corresponding sequence IDs in UniProtKB or NCBI.
  • gp2protein file
  • gp2rna file
  • gp_unlocalized

gp2protein file

A gp2protein file is a tab-delimited file that provides a mapping between database object IDs and protein sequence IDs. gp2protein files contributed by annotation groups are available for download.
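As a rough illustration of reading such a tab-delimited mapping, the sketch below parses lines into a dictionary. It rests on assumptions that are not stated on this page: that column 1 holds the database object ID, that column 2 holds one or more sequence IDs separated by semicolons, and that lines starting with `!` are comments. The function name `parse_gp2protein` and the sample IDs are hypothetical.

```python
import io


def parse_gp2protein(handle):
    """Return {database object ID: [sequence IDs]} from a tab-delimited handle.

    Assumed layout (not confirmed by this page): column 1 is the object ID,
    column 2 is a semicolon-separated list of sequence IDs, and lines that
    start with '!' are comments.
    """
    mapping = {}
    for raw in handle:
        line = raw.rstrip("\r\n")
        # Skip blank lines and assumed comment lines.
        if not line.strip() or line.startswith("!"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # no sequence ID column on this line
        object_id = fields[0].strip()
        seq_ids = [s.strip() for s in fields[1].split(";") if s.strip()]
        mapping.setdefault(object_id, []).extend(seq_ids)
    return mapping


# Made-up sample content, used only to exercise the parser.
sample = io.StringIO(
    "! hypothetical sample\n"
    "DB:0001\tUniProtKB:P00001\n"
    "DB:0002\tUniProtKB:P00002;NCBI_NP:NP_000001\n"
)
mapping = parse_gp2protein(sample)
```

For a downloaded file, the same function could be called with an open text file handle in place of the in-memory sample.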

Need for gp2protein file

    Current Annotations
    • Annotation Details and Downloads
    • Filtered files
    • Unfiltered files
    • gp2protein files

    Annotation Details and Downloads

    The gene association files submitted by GO Consortium members are shown in the tables below. Files are in the GO annotation file format and are compressed using the UNIX gzip utility. Please see the appropriate README file for further details on the annotation set. Any errors or omissions in annotations should be reported by writing to the GO helpdesk.
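Because the files are gzip-compressed, tab-delimited text, they can be read without unpacking them first. The following is a minimal sketch using Python's standard `gzip` module. The helper name `iter_annotation_rows`, the file name, and the three-field sample row are hypothetical placeholders and not a complete annotation record. Lines beginning with `!` are treated as header or comment lines.

```python
import gzip
import os
import tempfile


def iter_annotation_rows(path):
    """Yield tab-split data rows from a gzip-compressed annotation file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            # Skip blank lines and '!' header/comment lines.
            if line.startswith("!") or not line.strip():
                continue
            yield line.rstrip("\r\n").split("\t")


# Write a tiny made-up file so the sketch can run end to end.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.gaf.gz")
    with gzip.open(sample_path, "wt", encoding="utf-8") as out:
        out.write("!gaf-version: 2.2\n")
        out.write("DB\tID0001\tSYM1\n")  # placeholder fields, not a full row
    rows = list(iter_annotation_rows(sample_path))
```

Reading in text mode (`"rt"`) lets the file be processed line by line, which keeps memory use low for large annotation sets.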

    Use and License

    Gene Ontology Consortium data and data products are licensed under the Creative Commons Attribution 4.0 Unported License. A human-readable version and explanation is available at the Creative Commons website. For information about how to properly credit data use, please review the Creative Commons FAQ or contact the GO Helpdesk.