These are questions and answers about the various file formats used by the Gene Ontology Consortium.

What is a GPI file?

A GPI (Gene Product Information) file is used to submit gene and gene product information to the GO Consortium. The specification is here. Please note that annotation information, that is, the relationships between gene products and the GO terms they are annotated to, is exchanged separately in GPAD format files.

What is OBO file format?

The OBO file format is one of the formats in which the Gene Ontology is made available. The most recent version is OBO 1.2. The OBO format is designed to be more human-readable than XML-based formats. GO can be accessed in this format here.
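
To give a feel for how readable the format is, here is a minimal sketch of reading [Term] stanzas with plain Python. The sample text is a small illustrative fragment, not a copy of the released ontology, and the function name parse_obo_terms is made up for this example. The comment handling is deliberately naive (it cuts at the first " ! "), so a real OBO parser should be used for production work.

```python
from collections import defaultdict

# Illustrative fragment only; not taken verbatim from a GO release.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution

[Term]
id: GO:0048308
name: organelle inheritance
namespace: biological_process
"""


def parse_obo_terms(text):
    """Return one dict per [Term] stanza, mapping each tag to a list of values."""
    terms = []
    current = None  # the stanza being filled, or None outside a [Term] stanza
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # A new stanza starts; only [Term] stanzas are collected here.
            current = defaultdict(list) if line == "[Term]" else None
            if current is not None:
                terms.append(current)
            continue
        if current is None:
            continue  # header line or a stanza type we are skipping
        tag, sep, value = line.partition(":")
        if not sep:
            continue  # not a "tag: value" line
        # Naive simplification: treat everything after " ! " as a comment.
        value = value.split(" ! ", 1)[0].strip()
        current[tag.strip()].append(value)
    return terms


if __name__ == "__main__":
    for term in parse_obo_terms(SAMPLE_OBO):
        print(term["id"][0], term["name"][0], term["is_a"])
```

Each tag maps to a list because tags such as is_a can occur more than once in a stanza.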

What gene or protein IDs should I use?

The list of authoritative database groups for certain species lists the database groups who assume sole responsibility for collecting and submitting annotations for one or more species. If you can convert your IDs into the IDs used by that database group, you will be able to find the data you are looking for far more quickly and efficiently.

We maintain a list of suggested resources for mapping gene and protein IDs.

Why are the ontologies initially produced in OBO flat file format instead of XML?

The ontologies are initially produced in the specially designed OBO flat file format. They are converted to XML once a month for the convenience of users who require this facility. Both formats and many others are available in the GO downloads section. We use the OBO flat file format because it is human-readable, and also because the file is much smaller without the XML tags.

Why won't the RDF-XML file parse using RDF parsers?

The GO RDF-XML format was originally developed some time ago, before the advent of OWL. It has a few unusual features that render it more of a pseudo-RDF format. The actual RDF is embedded within an enclosing XML element, which should be stripped out before the content is handed to RDF parsers. Note that the GO RDF-XML conforms to a DTD, something that is not normally a requirement of RDF. This is because most people parse the file using conventional XML parsers rather than RDF tools. We are working on a more up-to-date RDF representation of GO.
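
The stripping step can be sketched with Python's standard XML library. This is only an illustration under assumptions: the wrapper element name (go:go), the go namespace URI, and the sample content below are placeholders, and extract_rdf is a made-up helper name. The only fixed fact the code relies on is the standard RDF namespace URI.

```python
import xml.etree.ElementTree as ET

# Standard RDF namespace URI.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical sample: the wrapper element and go namespace are assumptions.
SAMPLE = """\
<go:go xmlns:go="http://www.geneontology.org/dtds/go.dtd#"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:RDF>
    <go:term rdf:about="http://www.geneontology.org/go#GO:0000001">
      <go:accession>GO:0000001</go:accession>
    </go:term>
  </rdf:RDF>
</go:go>
"""


def extract_rdf(xml_text):
    """Return the rdf:RDF element of xml_text, serialized as a standalone document."""
    rdf_tag = f"{{{RDF_NS}}}RDF"
    root = ET.fromstring(xml_text)
    # The RDF may already be the root, or be nested inside a wrapper element.
    rdf = root if root.tag == rdf_tag else root.find(f".//{rdf_tag}")
    if rdf is None:
        raise ValueError("no rdf:RDF element found")
    # Keep the conventional 'rdf' prefix in the serialized output.
    ET.register_namespace("rdf", RDF_NS)
    return ET.tostring(rdf, encoding="unicode")


if __name__ == "__main__":
    print(extract_rdf(SAMPLE))
```

The returned string has rdf:RDF as its root element and can then be passed to an RDF parser of your choice.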

How can I generate files in the old GO flat file format?

As of August 1, 2009, the original GO flat file format was deprecated and is no longer provided by the GO Consortium.

The OBO-Edit project, which was used to generate the flat file format, has been mothballed.

What is an OWL file?

OWL is the acronym for Web Ontology Language, a standard produced by the W3C. GO in OWL is based on a translation from OBO to OWL and is available for download here. OWL files can be opened in an editing tool such as Protege.

What are the file formats used by the Gene Ontology?

A general introduction to the project's file formats is available. This page provides information about the file formats relevant to the ontology and the files used to express ontology-gene product annotations.

What is a GPAD file?

The Gene Product Association Data (GPAD) format is an alternative to the Gene Association File (GAF) format for exchanging annotations. The GPAD format is designed to be more normalized than GAF, and is intended to work in conjunction with a separate format for exchanging gene product information. The GPAD specification is defined in detail here.
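
The idea behind the normalization can be sketched as a join between two tables: one holding gene product details and one holding annotations that refer to gene products only by identifier. The records below are made up and heavily reduced. They do not follow the real GPAD or GPI column layouts, which are defined in the respective specifications, and the names ExampleDB, GENE0001, load_products and join_annotations are all hypothetical.

```python
# Made-up, reduced records; NOT the real GPI/GPAD column layouts.
# Gene product table: database, object ID, symbol, object type.
PRODUCT_SAMPLE = (
    "ExampleDB\tGENE0001\tabc1\tprotein\n"
    "ExampleDB\tGENE0002\txyz2\tprotein\n"
)

# Annotation table: database, object ID, GO term ID.
ANNOTATION_SAMPLE = (
    "ExampleDB\tGENE0001\tGO:0005739\n"
    "ExampleDB\tGENE0001\tGO:0006412\n"
    "ExampleDB\tGENE0002\tGO:0005634\n"
)


def load_products(text):
    """Index gene product details by (database, object ID)."""
    products = {}
    for line in text.splitlines():
        db, obj_id, symbol, obj_type = line.split("\t")
        products[(db, obj_id)] = {"symbol": symbol, "type": obj_type}
    return products


def join_annotations(annotation_text, products):
    """Attach the symbol from the product table to each annotation row."""
    rows = []
    for line in annotation_text.splitlines():
        db, obj_id, go_id = line.split("\t")
        info = products.get((db, obj_id), {})
        rows.append((obj_id, info.get("symbol"), go_id))
    return rows


if __name__ == "__main__":
    products = load_products(PRODUCT_SAMPLE)
    for row in join_annotations(ANNOTATION_SAMPLE, products):
        print(row)
```

Because the symbol and type are stored once per gene product rather than repeated on every annotation line, a change to a gene product's details touches a single record.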

What is a GAF file?

A GAF file is a GO annotation file containing annotations made to the GO by a contributing resource such as FlyBase or PomBase. There are two versions of the file format; the most recent is GAF version 2.0. An explanation of the differences between versions 1.0 and 2.0 is available, and the 1.0 specification is described here.
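
As a rough illustration, a GAF 2.0 file can be read as tab-separated text with 17 columns per annotation line, skipping lines that begin with "!". The sketch below assumes that layout. The dictionary keys are my own labels for the columns rather than official names, the sample record is invented, and parse_gaf is a hypothetical helper. Lines from a GAF 1.0 file have fewer columns and would be rejected by the length check.

```python
# Labels for the 17 GAF 2.0 columns; the key names are this example's own.
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type", "taxon",
    "date", "assigned_by", "annotation_extension", "gene_product_form_id",
]

# Invented record for illustration; not real annotation data.
SAMPLE_GAF = "\n".join([
    "!gaf-version: 2.0",
    "\t".join([
        "ExampleDB", "GENE0001", "abc1", "", "GO:0005739", "PMID:0000000",
        "IDA", "", "C", "example protein", "", "protein", "taxon:9606",
        "20090801", "ExampleDB", "", "",
    ]),
]) + "\n"


def parse_gaf(lines):
    """Yield one dict per annotation line, skipping '!' header and comment lines."""
    for line in lines:
        # Strip only line endings: trailing tabs mark empty final columns.
        line = line.rstrip("\r\n")
        if not line or line.startswith("!"):
            continue
        fields = line.split("\t")
        if len(fields) != len(GAF_COLUMNS):
            raise ValueError(f"expected {len(GAF_COLUMNS)} columns, got {len(fields)}")
        yield dict(zip(GAF_COLUMNS, fields))


if __name__ == "__main__":
    for record in parse_gaf(SAMPLE_GAF.splitlines()):
        print(record["db_object_symbol"], record["go_id"], record["evidence_code"])
```

With a real file, the same function can be given an open file handle, since it only needs an iterable of lines.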