The Beginner's Guide to Modifying the Ontologies
This guide is intended for new GO curators. It assumes that you have a basic understanding of GO's structure and scope (see the GO introductory documentation) but no technical knowledge whatsoever.
The Basics
Accounts
Before you can assign SourceForge requests to yourself and edit the ontologies, you will need the following accounts set up:
- a UNIX account or an xterm or terminal emulator application at your work site
- GO CVS account - e-mail the GO helpdesk
- a user account on the SourceForge web site
Jane, Amelia and Midori are SourceForge administrators, so they can grant the privileges (tech and admin) that you need.
OBO-Edit
Editing the flat files using a text editor is inefficient and error-prone, so most of our editing is done using the ontology editing program OBO-Edit, which can be found on the GO SourceForge downloads page. For more information on how to use OBO-Edit, see the Gene Ontology OBO-Edit User Guide. There are two associated SourceForge trackers which you may find useful: the OBO-Edit feature request tracker and the OBO-Edit bug report tracker.
Mailing lists
There are a number of email lists available, depending on your field of interest and the work you intend to get involved in. See the mailing lists page for details of the lists available and how to subscribe.
Setting Up CVS Access
The CVS repository
The GO Consortium have set up a CVS repository to house the GO data. CVS (Concurrent Versions System) is a version control tool that lets multiple users edit the same files concurrently, each in their own working copy, and keeps a history of every change. Most of the GO project data, including the ontology files and the annotation data, can be downloaded directly from the GO CVS repository at Stanford. To learn more about CVS, have a look at the CVS manual.
The .tcshrc File
Open a terminal window (or, if you are using Windows, log in to your UNIX account). When you log in to a UNIX system, you are placed in a program called the shell, which interprets the commands you type and passes them to the kernel to carry out. Shells come in different flavors; we use either the c-shell (csh) or a variant of it called the tc-shell (tcsh). When tcsh starts, it reads your .tcshrc file. You can use this file to control your working environment by setting environment variables (for example, a default text editor or a default printer) and aliases, which are short text strings that stand in for longer commands.
You can name an alias anything you like. Adding a few more aliases to your .tcshrc file for accessing CVS will also save you a lot of typing in the future. If you are new to this kind of thing and plan to follow this series of guides from start to finish, add the aliases from the example .tcshrc file; the commands you type will then correspond exactly to those in the instructions that follow. If you prefer to make your own aliases, remember to set the environment variable CVS_RSH with the command setenv CVS_RSH ssh, and refer to the example .tcshrc file for the full commands behind the aliases discussed later.
The commands in the example .tcshrc listed under '# Basic GO cvs setup:' and '# To check and commit OBO flatfiles to cvs' are used in the subsequent guides. Later, you may also wish to add and experiment with the commands listed under '# Other useful GO cvs setup commands'; I have found these useful, but they are not needed for the rest of this guide.
To use the aliases, you will need to use a text editor to edit your .tcshrc configuration file. There are several UNIX text editors, but one of the easiest to use is emacs. To open a text file in emacs, type
emacs filename
The online emacs tutorial can be accessed from within the program by typing
[control]-h t
To edit your .tcshrc file, go to your home directory and then open the file using the following commands:
cd ~
emacs .tcshrc
Copy and paste the aliases into the file, then save it and exit emacs by typing
[control]-x [control]-s
[control]-x [control]-c
To put your changes into effect, either restart your terminal application and open a new window, or tell the shell to reread the .tcshrc file using the command
source .tcshrc
You are now ready to GO! The function of these aliases is explained below and in the Guide to Addressing a SourceForge Request.
Accessing the CVS repository
The first thing to do is to download the current version of the files in the GO CVS repository. To fit in with the aliases and directory structures described in this guide, make sure you are in your Documents directory when you do this.
To change your ssh password (first time only), open a terminal window and type
sshpassword
Change the password to something more secure.
You will be accessing CVS via ssh, which would normally require you to type your password for every command. You can avoid this by creating an ssh key pair, keeping the private key on your computer and installing the public key on the remote host. To do this, follow the instructions on the Apple developer site under the heading 'Secure CVS via ssh', using SSH protocol version 2. Once you have made the key, send the public key file, id_dsa.pub, to sysadmin@genome.stanford.edu (the Stanford Genetics system administrators). Make sure the key files have the following permissions:
-rw------- 1 <user> <group> 672 Jul 17 11:19 id_dsa
-rw-r--r-- 1 <user> <group> 616 Jul 17 11:19 id_dsa.pub
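If you would rather fix the permissions with a script than with chmod, the Python sketch below sets the two modes shown in the listing above (0600 for the private key, 0644 for the public key) and reports the result in ls -l style. It assumes the key files are named id_dsa and id_dsa.pub and that you pass in the directory holding them; the function name fix_key_permissions is hypothetical.

```python
import os
import stat
from pathlib import Path

# Modes from the listing above: -rw------- (0o600) and -rw-r--r-- (0o644).
REQUIRED_MODES = {
    "id_dsa": 0o600,      # private key: readable and writable by you only
    "id_dsa.pub": 0o644,  # public key: safe for others to read
}


def fix_key_permissions(ssh_dir: Path) -> dict[str, str]:
    """Set the required modes and return each file's resulting mode string."""
    result = {}
    for name, mode in REQUIRED_MODES.items():
        path = ssh_dir / name
        os.chmod(path, mode)
        # stat.filemode renders a mode the way `ls -l` does, e.g. '-rw-------'
        result[name] = stat.filemode(path.stat().st_mode)
    return result
```

The keys usually live in ~/.ssh (the OpenSSH default, assumed here), so a typical call would be fix_key_permissions(Path.home() / ".ssh").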
To check out the contents of the GO repository, type
cvsco
The directories and files will then start to download; depending on the speed of your internet connection, this may take an hour or more. The checkout puts the current GO files into a new directory called go, whose contents are described at the end of this guide.
To update your existing copy of the GO repository, use the command
cvsup
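The cvsco and cvsup aliases are defined in the example .tcshrc, which is not reproduced in this guide. Assuming they wrap a standard cvs checkout and cvs update, the Python sketch below builds the equivalent commands and runs them with CVS_RSH=ssh in the environment, as setenv CVS_RSH ssh does in tcsh. The CVSROOT value is a hypothetical placeholder (use the one from the example .tcshrc), and the helper names checkout_cmd, update_cmd and run are likewise hypothetical.

```python
import os
import subprocess

# Hypothetical placeholder: substitute the CVSROOT given in the example .tcshrc.
CVSROOT = ":ext:your_username@cvs.example.org:/path/to/cvsroot"


def cvs_env() -> dict[str, str]:
    """Copy of the current environment with CVS_RSH=ssh, like `setenv CVS_RSH ssh`."""
    env = dict(os.environ)
    env["CVS_RSH"] = "ssh"
    return env


def checkout_cmd(cvsroot: str = CVSROOT, module: str = "go") -> list[str]:
    """Roughly what a cvsco-style alias runs: check out the module into ./<module>."""
    return ["cvs", "-d", cvsroot, "checkout", module]


def update_cmd() -> list[str]:
    """Roughly what a cvsup-style alias runs inside an existing working copy.

    -d creates directories added to the repository since your last update;
    -P prunes directories that have become empty.
    """
    return ["cvs", "update", "-d", "-P"]


def run(cmd: list[str], cwd: str) -> None:
    # check=True raises CalledProcessError if cvs exits with an error.
    subprocess.run(cmd, cwd=cwd, env=cvs_env(), check=True)
```

For example, run(checkout_cmd(), cwd=os.path.expanduser("~/Documents")) would create the go directory, and run(update_cmd(), cwd=os.path.expanduser("~/Documents/go")) would refresh it. The module name go is inferred from the checkout creating a directory called go, and the ~/Documents path is an assumption about where your Documents directory lives.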
You will also need to create a directory called old within the ontology directory; this is where you will store a copy of the unedited ontology file to check your edits against. The ontology directory is located under the main go directory, so create the old directory using the command
mkdir go/ontology/old
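The old directory exists so that you have an unedited copy to compare your edits against. As one possible way to do this outside OBO-Edit, the Python sketch below copies the ontology file into old/ (creating the directory if needed) and later produces a unified diff of your changes with difflib. The ontology file name is left as a parameter because this guide does not name it, the helper names snapshot and changes_since_snapshot are hypothetical, and the later guides may use a different comparison method.

```python
import difflib
import shutil
from pathlib import Path


def snapshot(ontology_dir: Path, filename: str) -> Path:
    """Copy <ontology_dir>/<filename> into <ontology_dir>/old before editing."""
    old_dir = ontology_dir / "old"
    old_dir.mkdir(exist_ok=True)  # like `mkdir go/ontology/old`, but no error if it exists
    # copy2 also preserves the file's timestamps
    return Path(shutil.copy2(ontology_dir / filename, old_dir / filename))


def changes_since_snapshot(ontology_dir: Path, filename: str) -> list[str]:
    """Unified diff between the saved copy in old/ and the current, edited file."""
    before = (ontology_dir / "old" / filename).read_text().splitlines(keepends=True)
    after = (ontology_dir / filename).read_text().splitlines(keepends=True)
    return list(difflib.unified_diff(before, after,
                                     fromfile="old/" + filename, tofile=filename))
```

Lines starting with - in the diff were removed by your edits and lines starting with + were added; an empty list means the file is unchanged.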
Setting Your GO ID Range
The GO Numbers File
The go_numbers file in the CVS repository records who 'owns' each range of GO IDs. To assign yourself a range, edit your local copy of the file and then commit it back to the CVS repository. The file is plain text and can be opened in any text editor; to open it in emacs, type the command
emacs go/numbers/go_numbers
Scroll down to the text that reads "Allocated number ranges for additions" and add your chosen range of numbers in the format:
XXX: GO:0048001 to GO:0050000
where XXX are your initials. Then find the place in the file where your number range falls and add a header for your section in the format
[your name]'s GO numbers
You can then start your list of GO numbers and corresponding GO terms, using the following format:
GO:0048001 XXX erythrose-4-phosphate dehydrogenase
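Picking your next free ID by eye becomes error-prone as your list grows. The Python sketch below parses the two line formats shown above (an allocation line such as XXX: GO:0048001 to GO:0050000, and a used-ID line such as GO:0048001 XXX erythrose-4-phosphate dehydrogenase) and returns the first ID in your range that has no entry yet. It assumes those are the only line shapes that matter and ignores every other line in the file; the function names are hypothetical.

```python
import re

# Assumed line shapes, taken from the examples above; other lines are ignored.
ALLOCATION = re.compile(r"^(\w+):\s*GO:(\d{7})\s+to\s+GO:(\d{7})\s*$")
USED = re.compile(r"^GO:(\d{7})\s+(\w+)\s+(.+)$")


def parse_numbers(text: str):
    """Return ({initials: (start, end)}, {used ID number: (initials, term)})."""
    ranges, used = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if m := ALLOCATION.match(line):
            ranges[m.group(1)] = (int(m.group(2)), int(m.group(3)))
        elif m := USED.match(line):
            used[int(m.group(1))] = (m.group(2), m.group(3))
    return ranges, used


def next_free_id(text: str, initials: str) -> str:
    """First ID in the range allocated to `initials` that is not yet listed."""
    ranges, used = parse_numbers(text)
    start, end = ranges[initials]
    for n in range(start, end + 1):
        if n not in used:
            return f"GO:{n:07d}"  # GO IDs are zero-padded to seven digits
    raise ValueError(f"range for {initials} is exhausted")
```

For example, if text holds the allocation line and the single GO:0048001 entry shown above, next_free_id(text, "XXX") returns "GO:0048002". To check the real file, pass in the contents of go/numbers/go_numbers.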
Once you've finished editing the file, save and close it, then commit it back to the CVS repository:
cvs ci go/numbers/go_numbers
Setting Your Number Range in OBO-Edit
Once you have claimed a set of numbers in the go_numbers file, you must also set them in the OBO-Edit configuration. To do this, open OBO-Edit and choose 'OBO-Edit Configuration Manager' from the 'plugins' menu. In this window, enter the first number of your range in the 'start of id range' field and the last number in the 'End of id range' field, then press 'Save Configuration' to save your changes.
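Before saving the configuration, it is worth confirming that the range you typed into OBO-Edit lies within the numbers you claimed in go_numbers. The Python sketch below performs that check by hand; it accepts IDs either as GO:0048001 or as bare numbers, since this guide does not say which form OBO-Edit expects. The function names are hypothetical, and the sketch does not read OBO-Edit's own configuration file.

```python
import re

# Accept 'GO:0048001', '0048001' or '48001'.
GO_ID = re.compile(r"^(?:GO:)?(\d{1,7})$")


def parse_go_id(go_id: str) -> int:
    """Convert a GO ID or bare number to an int, rejecting anything else."""
    m = GO_ID.match(go_id.strip())
    if not m:
        raise ValueError(f"not a GO ID: {go_id!r}")
    return int(m.group(1))


def check_id_range(start: str, end: str, claimed: tuple[str, str]) -> list[str]:
    """List problems with the start/end values entered in OBO-Edit (empty = OK)."""
    problems = []
    lo, hi = parse_go_id(start), parse_go_id(end)
    claimed_lo, claimed_hi = parse_go_id(claimed[0]), parse_go_id(claimed[1])
    if lo > hi:
        problems.append("start of range is after end of range")
    if lo < claimed_lo or hi > claimed_hi:
        problems.append("range extends outside the numbers claimed in go_numbers")
    return problems
```

With the example allocation above, check_id_range("GO:0048001", "GO:0050000", ("GO:0048001", "GO:0050000")) returns an empty list, while an end value of GO:0050001 is flagged as falling outside your claim.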
The 'go' directory
- doc: documents, including abbreviations for database cross-references and their definitions, and other files with miscellaneous types of information
- external2go: files that map terms from external systems (e.g. EC classifications, InterPro, MetaCyc) to GO terms
- gene-associations: files created through curation by a group or project (e.g. SGD, FlyBase, GOA) that associate gene products with GO terms, including evidence codes, references, and other supporting information for each annotation
- gene-associations/readme: readme files for the gene-association files
- GO_slims: files containing a subset of GO terms, i.e. a selected set of high-level terms from one, two, or three of the Gene Ontologies, particularly useful for reporting the results of genome-level analyses; different GO slims have been created for various purposes
- gp2protein: index files mapping database object IDs to sequence IDs
- meeting: announcements, agendas, and meeting programs for GO Users Meetings
- meeting/minutes: minutes from GO Consortium meetings
- numbers: defines number ranges for particular projects and lists the numbers that have been used
- ontology: the OBO format ontology file, as well as the ontology and definitions files in the old GO format
- software: contributed software from the GO Consortium
- software/Python/mgibrowser: a GO term browser provided by the MGI group
- software/SGD: software from the SGD group
- software/SGD/geneAssociation: Perl scripts used by SGD to create their gene-association file
- software/SGD/goAnnotation: Perl scripts for storing GO associations in Oracle
- software/SGD/goPath: Perl scripts that parse ontology files and output a child/parent flat file
- software/utilities: various utilities, including the script that creates the GO Monthly Reports
- software/utilities/goview: Java code used to create the daily go-diff messages that show changes in the ontologies
- synonyms: files that define synonym types and list synonyms for GO terms
- teaching_resources: a collection of posters, presentations and tutorials contributed by members of the Consortium
- www: document root for the GO web site (www.geneontology.org); contains the HTML files used for the GO web site
- xml: archive of XML files created by the GOC group at LBNL (Berkeley, California)