Part 2: Training for Manual Curation of Research Literature - July 12-14, 2006
This page contains the reading list for Part 2 of the GO Consortium's 2006 Annotation Camp. We strongly encourage you to take a look at these papers in advance of arriving at the Annotation Camp to maximize your learning experience.
We are also using this same reading list for a comparative study of GO curation consistency between five of the nine designated reference genome groups of the GO Consortium. Participants in Part 2 of the Annotation Camp are invited to contribute to the study, though participation in the study is NOT required and declining to take part will NOT have any impact on your participation in the Annotation Camp.
If you are completely new to GO, we recommend that you read a little about it before arriving at the Annotation Camp. Here are some reviews:
A more extensive reading list of publications about GO or using GO is available from the Gene Ontology Bibliography page.
During the large group working session on Wednesday, July 12th, we will read two papers and discuss the GO annotations that may or may not be reasonably made from each paper. We suggest that you read these papers at least briefly before arriving at the camp.
supplementary material for Chang et al.
These are the 10 papers we will be reading for Part 2 of the Annotation Camp. You may get more out of the Annotation Camp if you read these papers in advance and make notes about any questions you may have. During the camp, we will be using this Excel workbook to record our annotations. Feel free to use it in advance to record any notes or questions about each paper.
GO Study: This same group of papers will also be used for the GO curation consistency study. If you fill out an Excel workbook with your annotations in advance of the camp, we would be interested in receiving an anonymous copy of your Excel workbook, if you are comfortable sharing it with us. More details about participating in the study are here.
Small Group Working Sessions: During Part 2 of the Annotation Camp, we will break into small groups to examine each of these papers carefully and discuss the GO annotations that can reasonably be made from the papers. Each group will contain at least one experienced GO annotator from one of the GO Consortium groups. These sessions will be a good opportunity for people to ask questions about the process of making GO annotations.
Group Discussions: Once the small groups have had a chance to discuss the papers and arrive at a group consensus, we will discuss the papers in the full group and can compare small group results to the consensus derived by the pair of experts for the relevant organism.
The main focus of the Annotation Camp is on determining appropriate GO terms to associate with a gene. However, sometimes determining the correct gene to which the GO annotation should be attached is itself a tricky issue. To facilitate your reading of the papers we have selected for the Annotation Camp, here are some of the basics of gene naming conventions in the organisms we will encounter in the reading list.
To get a feel for gene naming conventions in S. cerevisiae, let's start with a specific gene. SGD has a page for every gene in the database, for example the one for the gene DST1.
For names, a sequenced Arabidopsis gene will generally have an AGI name (#1 below), and may have one or both of the others.
Uncloned Arabidopsis genes will have #2 and/or #3 from above.
All Arabidopsis genes have a TAIR accession number, which you can find on the gene's detail page (right under Gene Model Type). For example, on the page for the gene called SQN, the TAIR accession is Gene:1945377.
The TAIR Nomenclature page provides more complete information about naming genes in A. thaliana.
Mouse gene names should be brief and specific and should convey the character or function of the gene. Mouse genes are often referred to by their gene symbols, which are typically 3-5 characters long and should not exceed 10 characters, e.g. Ash2l. The MGI Quick Guide to Nomenclature for Genes provides more detailed information.
Human genes tend to have both a descriptive gene name, e.g. iduronate 2-sulfatase, and a gene symbol that is a short form, or abbreviation, derived from the gene name, e.g. IDS. The HGNC Guidelines on Human Gene Nomenclature provide more detailed information about the naming of human genes.
The UniProt GOA group uses UniProtKB accession IDs for annotation. For example, to find angiomotin on the UniProt website, search for "angiomotin human". The results will be (mostly) human proteins that are angiomotin or have angiomotin somewhere in their description. In this example, several accessions claim to be human angiomotin, but some of these are fragments. The accession we use is the full-length one that has been given a protein name, i.e. AMOT_HUMAN, as opposed to Q8TEN8_HUMAN, which is only a fragment.
HGNC also gives the UniProtKB accessions.
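If you have a list of candidate UniProtKB entry names from a search, the AMOT_HUMAN versus Q8TEN8_HUMAN distinction above can be roughly sketched in code. The snippet below is only an illustrative heuristic, not the GOA group's procedure: it assumes that an entry whose prefix looks like an accession number (as in Q8TEN8_HUMAN) has not been given a protein name, and it uses the accession pattern published in the UniProt help pages. The function name `has_protein_mnemonic` is hypothetical. The check says nothing about whether a sequence is full length, so you still need to look at the entry itself.

```python
import re

# UniProtKB accession number pattern, as published in the UniProt help pages.
ACCESSION_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def has_protein_mnemonic(entry_name: str) -> bool:
    """Return True if the part before '_' is a name rather than an accession.

    Hypothetical helper for illustration only: it inspects the text of the
    entry name and does not query UniProt.
    """
    prefix, _, species = entry_name.partition("_")
    if not prefix or not species:
        return False  # not in PREFIX_SPECIES form
    return ACCESSION_RE.match(prefix) is None


# The two entry names mentioned in the angiomotin example.
candidates = ["AMOT_HUMAN", "Q8TEN8_HUMAN"]
named = [c for c in candidates if has_protein_mnemonic(c)]
print(named)  # ['AMOT_HUMAN']
```

A filter like this only narrows down a result list. Confirming that the remaining entry is the full-length protein is still a manual step.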