the Gene Ontology

File Format Guide

The GO File Format Guide documents the structure and syntax of the GO files available on the GO website, to assist users who need to read, write parsers for, or create these files.

See also the GO annotation file format guide for the format used in the gene association files.

  • Anatomy of a GO Term
  • Ontology Flat File Formats
  • GO RDF-XML Format
  • OWL Format
  • OBO-XML Format
  • MySQL Format
  • FASTA Format
  • Mappings to Other Classification Systems

Anatomy of a GO Term

Terms and unique identifiers

The structure of a GO term is very simple. At a bare minimum, each GO entry consists of a term name (e.g. cell) and a zero-padded seven-digit identifier (or accession number) prefixed by GO: (e.g. GO:0005623), which is used as a unique identifier and database cross-reference. The same number range is used across all three ontologies. The numeric portion of a GO ID has no 'meaning' and bears no relation to the position of the term in the ontologies; instead, ranges of GO IDs are assigned to specific groups or individual curators, so a GO ID can be used to trace who added a term.
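
The identifier syntax described above can be checked mechanically. The following is a minimal Python sketch, assuming only the rule stated here (the prefix GO: followed by exactly seven digits); the names GO_ID_PATTERN, is_go_id and format_go_id are illustrative and not part of any GO tool.

```python
import re

# Assumption: a GO ID is "GO:" followed by exactly seven digits.
GO_ID_PATTERN = re.compile(r"GO:\d{7}")


def is_go_id(text):
    """Return True if text has the shape of a GO ID, e.g. GO:0005623."""
    return GO_ID_PATTERN.fullmatch(text) is not None


def format_go_id(number):
    """Build a GO ID string from its numeric portion, zero-padded to seven digits."""
    return f"GO:{number:07d}"
```

For example, format_go_id(5623) gives GO:0005623, while is_go_id("GO:5623") is False because the numeric portion is not zero-padded. The check is purely syntactic; it does not tell you whether a term with that ID exists.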

Secondary IDs

Terms may have one or more secondary IDs, which are alternate IDs that refer to the term. Secondary IDs come about when two or more terms are identical in meaning and are merged into a single term. All term IDs are preserved so that no information (for example, annotations to the merged IDs) is lost. More information on the protocols involved can be found in the documentation on term merges.
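
One way a program might handle secondary IDs is to keep a lookup table from each secondary ID to the primary ID of the merged term. The sketch below assumes such a table has already been built from the ontology file; the IDs shown are made up for illustration and the name resolve_go_id is hypothetical.

```python
# Hypothetical IDs, for illustration only: two terms merged into GO:9999901.
SECONDARY_TO_PRIMARY = {
    "GO:9999902": "GO:9999901",
    "GO:9999903": "GO:9999901",
}


def resolve_go_id(go_id, secondary_to_primary=SECONDARY_TO_PRIMARY):
    """Return the primary ID for go_id; IDs that are not secondary are returned unchanged."""
    return secondary_to_primary.get(go_id, go_id)
```

Resolving IDs this way before counting or grouping annotations keeps annotations made to a merged ID attached to the surviving term.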

Synonyms

Any term may, but does not need to, include one or more synonyms (e.g. type I programmed cell death is a synonym of apoptosis). Synonyms are assigned a relationship to the primary term string; see the documentation on synonyms for more information.

Database cross-references

A term may also have one or more general database cross-references (dbxrefs), which refer to an identical object in another database. For instance, the molecular function term retinal isomerase activity has the database cross-reference EC:5.2.1.3, which is the accession number of this enzyme activity in the Enzyme Commission database. A complete list of the database cross-references and database abbreviations used by GO is available.

Definition and Comment

GO terms should be equipped with a text definition, which includes an indication of the source of the definition. Terms may also have a comment, which gives more information about the term and its usage.

Ontology Flat File Formats

There are two ontology flat file formats: the older GO flat file format and the newer OBO flat file format. The GO flat file format is now deprecated but will continue to be provided alongside the new format.

See also the Java OBO parser guide, which gives details of the OBO parser implemented as part of OBO-Edit, and how to use it.

GO RDF-XML Format

The RDF-XML version of GO, which includes all three ontologies and the definitions, can be downloaded from the GO database archive. The document type definition (DTD) is available from the GO FTP site.

The GO RDF-XML file is built from the flat files and the gene association files on a monthly basis.

Here's a GO RDF-XML snapshot (with some lines wrapped for legibility):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE go:go>

<go:go xmlns:go="xml-dtd/go.dtd#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <go:version timestamp="Wed May 9 23:55:02 2001" />
  <rdf:RDF>
    <go:term rdf:about="go#GO:0003673">
      <go:accession>GO:0003673</go:accession>
      <go:name>Gene_Ontology</go:name>
      <go:definition></go:definition>
    </go:term>
    <go:term rdf:about="go#GO:0003674">
      <go:accession>GO:0003674</go:accession>
      <go:name>molecular_function</go:name>
      <go:definition>The action characteristic of a gene product.</go:definition>
      <go:part-of rdf:resource="go#GO:0003673" />
      <go:dbxref>
        <go:database_symbol>go</go:database_symbol>
        <go:reference>curators</go:reference>
      </go:dbxref>
    </go:term>
    <go:term rdf:about="go#GO:0016209">
      <go:accession>GO:0016209</go:accession>
      <go:name>antioxidant</go:name>
      <go:definition></go:definition>
      <go:isa rdf:resource="go#GO:0003674" />
      <go:association>
        <go:evidence evidence_code="ISS">
          <go:dbxref>
            <go:database_symbol>fb</go:database_symbol>
            <go:reference>fbrf0105495</go:reference>
          </go:dbxref>
        </go:evidence>
        <go:gene_product>
          <go:name>CG7217</go:name>
          <go:dbxref>
            <go:database_symbol>fb</go:database_symbol>
            <go:reference>FBgn0038570</go:reference>
          </go:dbxref>
        </go:gene_product>
      </go:association>
      <go:association>
        <go:evidence evidence_code="ISS">
          <go:dbxref>
            <go:database_symbol>fb</go:database_symbol>
            <go:reference>fbrf0105495</go:reference>
          </go:dbxref>
        </go:evidence>
        <go:gene_product>
          <go:name>Jafrac1</go:name>
          <go:dbxref>
            <go:database_symbol>fb</go:database_symbol>
            <go:reference>FBgn0040309</go:reference>
          </go:dbxref>
        </go:gene_product>
      </go:association>
    </go:term>
  </rdf:RDF>
</go:go>

The basic unit of the GO RDF-XML database is the go:term element. Owing to limitations of the XML id and idref attributes (for instance, multiple parentage cannot be represented), the linking mechanism is RDF, which provides a much more flexible system for representing trees. To follow the links, note that the term molecular function ; GO:0003674 has the attribute

rdf:about="go#GO:0003674"

This is roughly equivalent to

id="go#GO:0003674"

In RDF, URLs are used as IDs so that they are universally unique. Now, note that the term antioxidant ; GO:0016209 has the tag

<go:isa
rdf:resource="go#GO:0003674" />

This shows that its parent is molecular function ; GO:0003674. This tag represents the relationship "GO:0016209 isa GO:0003674" or, in plain English, "antioxidant is a molecular function". The other type of parentage relationship is go:part-of. The term molecular function ; GO:0003674 has the tag

<go:part-of
rdf:resource="go#GO:0003673" />

This shows the relationship "molecular function is part of the Gene Ontology".

In addition, each term can have one go:name, one go:accession and one go:definition, as well as multiple go:dbxref and go:association elements. go:name, go:accession and go:definition are self-explanatory. go:dbxref represents the term in an external database, and go:association represents a gene association of the term. Each go:association can contain both a go:evidence element, which holds a go:dbxref to the evidence supporting the association, and a go:gene_product element, which holds the gene symbol and a go:dbxref.
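
The link-following described above can be sketched with Python's standard xml.etree.ElementTree module. This is an illustrative sketch rather than an official parser: it assumes the namespace URIs shown in the snapshot, uses a cut-down sample with only two terms, and the names SAMPLE and parent_links are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the snapshot above.
GO_NS = "xml-dtd/go.dtd#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Cut-down sample based on the snapshot (two terms, no associations).
SAMPLE = """<go:go xmlns:go="xml-dtd/go.dtd#"
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:RDF>
    <go:term rdf:about="go#GO:0003674">
      <go:accession>GO:0003674</go:accession>
      <go:name>molecular_function</go:name>
      <go:part-of rdf:resource="go#GO:0003673" />
    </go:term>
    <go:term rdf:about="go#GO:0016209">
      <go:accession>GO:0016209</go:accession>
      <go:name>antioxidant</go:name>
      <go:isa rdf:resource="go#GO:0003674" />
    </go:term>
  </rdf:RDF>
</go:go>"""


def parent_links(xml_text):
    """Return (child ID, relationship, parent ID) tuples in document order."""
    root = ET.fromstring(xml_text)
    links = []
    for term in root.iter(f"{{{GO_NS}}}term"):
        child = term.findtext(f"{{{GO_NS}}}accession")
        for rel in ("isa", "part-of"):
            for element in term.findall(f"{{{GO_NS}}}{rel}"):
                resource = element.get(f"{{{RDF_NS}}}resource")
                # rdf:resource looks like "go#GO:0003674"; keep the part after '#'.
                parent = resource.split("#", 1)[1]
                links.append((child, rel, parent))
    return links
```

Because each go:term can carry any number of go:isa and go:part-of children, a term with several parents simply yields several tuples, which is the multiple parentage that the id and idref attributes could not express.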

OBO-XML Format

OBO-XML is a direct XML serialization of the OBO 1.2 Format specification. The schema is specified using RELAX-NG compact syntax: obo-xml.rnc. Currently, only the ontology is available as OBO-XML.

OWL Format

OWL is a standard for ontology languages, produced by the W3C. Details of the translation used for GO are available on the official OboInOwl page.

MySQL Format

The MySQL version of GO can be downloaded from the GO database archives. Four databases are built and made available for download:

  • termdb: ontologies, definitions and mappings to other dbs
  • assocdb: the above, plus associations to gene products
  • seqdb: the above, plus protein sequences for some of the gene products
  • seqdblite: the above, with IEA associations stripped out (this is the version that drives AmiGO)

There are two download options for each of these databases, giving eight files in total; you only need to download one of them. Do not attempt to parse these files yourself; they are meant to be loaded into a MySQL database. There is also a Perl API for advanced queries on the database. For full details, see the README file in the archive. To obtain documentation for the GO database, download either of the following two files from the archive:

  • go_YYYYMM-schema-mysql.sql: the MySQL table creation statements, plus documentation
  • go_YYYYMM-schema-html: designed for viewing with a web browser; does not contain full documentation

Further documentation on the GO database can be found in the GO database guide.

FASTA Format

There is a FASTA version of the gene products in the database available from the database archives.

Mappings to Other Classification Systems

Mappings of GO have been made to many other classification systems; a full list is available on the Mappings to GO page. The syntax of these files is as follows:

The source of the external file is given in the line beginning !Uses:

!Uses:http://www.tigr.org/docs/tigr-scripts/egad_scripts/role_reports.spl, 15 aug 2000.

The line syntax for mappings is

external database:term identifier (id/name) > GO:GO term name ; GO:id

For example:

TIGR_role:11030 73 Amino acid biosynthesis Glutamate family > GO:glutamine family amino-acid biosynthesis ; GO:0009084

Each mapping must be written on a single line. A term from an external system can also map to more than one GO term; each additional GO term is added after a further >. For example:

MultiFun:1.5.1.18 Isoleucine/valine > GO:isoleucine biosynthesis ; GO:0009097 > GO:valine biosynthesis ; GO:0009099

If no equivalent GO term exists for a term from another classification system, GO:. should be added as a mapping. For example:

MultiFun:1.5 Building block biosynthesis > GO:.
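
The line syntax above is simple enough to parse by splitting on the > separator. The following Python sketch is one possible reading of the rules given here, not an official parser: it assumes that lines beginning with ! (such as !Uses:) are header lines to be skipped, that fields are separated by " > " and " ; " exactly as in the examples, and that GO:. means no mapping. The function name parse_mapping_line is hypothetical.

```python
def parse_mapping_line(line):
    """Parse one mapping line into (database, external term, [(GO name, GO ID), ...]).

    Returns None for blank lines and for header lines beginning with '!'.
    """
    line = line.strip()
    if not line or line.startswith("!"):
        return None
    external, *targets = line.split(" > ")
    # The database abbreviation is everything before the first colon.
    database, _, term = external.partition(":")
    go_terms = []
    for target in targets:
        target = target.strip()
        if target == "GO:.":
            # Explicit marker that no equivalent GO term exists.
            continue
        name, _, go_id = target.rpartition(" ; ")
        if name.startswith("GO:"):
            name = name[len("GO:"):]
        go_terms.append((name, go_id))
    return database, term, go_terms
```

Applied to the MultiFun example with two targets, this returns the database MultiFun, the external term 1.5.1.18 Isoleucine/valine, and a list of two (name, ID) pairs; for a GO:. line the list is empty.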

Last modified Tuesday, 04-Dec-2007 16:40:11 PST
Copyright © 1999-2008 the Gene Ontology