!version: $Revision: 1.79 $
!date: $Date: 2012/08/14 15:24:47 $
!
! Gene Ontology Reference Collection
!
! The GO reference collection is a set of abstracts that can be cited
! in the GO ontologies (e.g. as dbxrefs for term definitions) and
! annotation files (in the Reference column).
!
! The collection houses two main kinds of references: descriptions
! of the methods that groups use for ISS, IEA, and ND evidence codes,
! and abstract-style descriptions of "GO content" meetings at which
! substantial changes in the ontologies are discussed and made.
!
! data fields for this file:
!
! go_ref_id: [mandatory; cardinality 1; GO_REF:nnnnnnn]
! alt_id: [not mandatory; cardinality 0,1,>1; GO_REF:nnnnnnn]
! title: [mandatory; cardinality 1; free text]
! authors: [mandatory; cardinality 1; free text??
! or cardinality 1,>1 and one entry per author?]
! year: [mandatory; cardinality 1]
! external_accession: [not mandatory; cardinality 0,1,>1; DB:id]
! citation: [not mandatory; cardinality 0,1; use for published refs]
! abstract: [mandatory; cardinality 1; free text]
! comment: [not mandatory; cardinality 0,1; free text]
! is_obsolete: [not mandatory; cardinality 0,1; 'true';
! denotes a reference no longer used by the contributing database;
! if tag is not present, assume that the ref is not obsolete]
!
! If a database maintains its own internal reference collection, and
! has a record that is equivalent to a GO_REF entry, the database's
! internal ID should be included as an external_accession for the
! corresponding GO_REF.
!
! This data is available as a web page at
! http://www.geneontology.org/cgi-bin/references.cgi
!
go_ref_id: GO_REF:0000001
title: GO Consortium unpublished data
authors: GO curators
year: 1998
abstract: No abstract available.
comment: This reference will normally be replaced upon publication of the data supporting the annotation. Formerly GOC:unpublished.
go_ref_id: GO_REF:0000002
alt_id: GO_REF:0000007
alt_id: GO_REF:0000014
alt_id: GO_REF:0000016
alt_id: GO_REF:0000017
title: Gene Ontology annotation through association of InterPro records with GO terms.
authors: DDB, FB, MGI, GOA, ZFIN curators
year: 2001
external_accession: MGI:2152098
external_accession: J:72247
external_accession: ZFIN:ZDB-PUB-020724-1
external_accession: FB:FBrf0174215
external_accession: dictyBase_REF:10157
external_accession: SGD_REF:S000124036
abstract: Transitive assignment of GO terms based on InterPro classification. For any database entry (representing a protein or protein-coding gene) that has been annotated with one or more InterPro domains, the corresponding GO terms are obtained from a translation table of InterPro entries to GO terms (interpro2go) generated manually by the InterPro team at EBI. The mapping file is available at http://www.geneontology.org/external2go/interpro2go.
comment: Formerly GOA:interpro. Note that GO annotations based on InterPro-to-GO transitive assignment may undergo subsequent filtering, e.g. to remove annotations redundant with manual curation; consult documentation from the annotation providers for further information.
go_ref_id: GO_REF:0000003
alt_id: GO_REF:0000005
title: Gene Ontology annotation based on Enzyme Commission mapping.
authors: GOA curators, MGI curators
year: 2001
external_accession: MGI:2152096
external_accession: J:72245
external_accession: ZFIN:ZDB-PUB-031118-3
external_accession: SGD_REF:S000124037
citation: PMID:11374909
abstract: Transitive assignment using Enzyme Commission identifiers. This method is used for any database entry, such as a protein record in UniProtKB or TrEMBL, that has had an Enzyme Commission number assigned. The corresponding GO term is determined using the EC cross-references in the GO molecular function ontology. Also see Hill et al., Genomics (2001) 74:121-128. The mapping file is available at http://www.geneontology.org/external2go/ec2go.
comment: Formerly GOA:spec.
go_ref_id: GO_REF:0000004
alt_id: GO_REF:0000009
alt_id: GO_REF:0000013
title: Gene Ontology annotation based on UniProtKB keyword mapping.
authors: GOA curators
year: 2000
external_accession: MGI:1354194
external_accession: J:60000
external_accession: ZFIN:ZDB-PUB-020723-1
external_accession: SGD_REF:S000124038
abstract: Transitive assignments using UniProtKB keywords. The UniProtKB keyword controlled vocabulary has been created and used by the UniProt Knowledgebase (UniProtKB) to supply 10 different categories of information to UniProtKB entries. Further information on the UniProtKB keyword resource can be found at http://www.uniprot.org/docs/keywlist
UniProtKB keywords are assigned to UniProtKB/Swiss-Prot entries by UniProt curators as part of the UniProtKB manual curation process. In contrast, UniProtKB keywords are automatically assigned to UniProtKB/TrEMBL entries from the underlying nucleic acid databases and/or by the UniProt automatic annotation program.
Further information on the two different UniProt annotation methods is available at http://www.uniprot.org/faq/45 and http://www.uniprot.org/program/automatic_annotation .
When a UniProtKB keyword describes a concept that is within the scope of the Gene Ontology, it is investigated to determine whether it is appropriate to map the keyword to an equivalent term in GO. The mapping between UniProtKB keywords and GO terms is carried out manually. Definitions and hierarchies of the terms in the two resources are compared and the mapping generated will reflect the most correct correspondence. The translation table between GO terms and UniProtKB keywords is maintained by the UniProt-GOA team and available at http://www.geneontology.org/external2go/uniprotkb_kw2go .
comment: Formerly GOA:spkw.
go_ref_id: GO_REF:0000006
title: Gene Ontology annotation by the MGI curatorial staff, Mouse Locus Catalog
authors: Mouse Genome Informatics scientific curators
year: 2001
external_accession: MGI:2152097
external_accession: J:72246
citation: PMID:11374909
abstract: For annotations documented via this citation, curators used the information in the Mouse Locus Catalog in MGI to assign GO terms. The GO terms were assigned based on MLC textual descriptions of genes that could not be traced to the primary literature. Details of this strategy can be found in Hill et al., Genomics (2001) 74:121-128.
is_obsolete: true
go_ref_id: GO_REF:0000008
title: Gene Ontology annotation by the MGI curatorial staff, curated orthology
authors: Mouse Genome Informatics scientific curators
year: 2001
external_accession: MGI:2154458
external_accession: J:73065
abstract: The sequence conservation that permits the establishment of orthology between mouse and rat or mouse and human genes is a strong predictor of the conservation of function for the gene product across these species. Therefore, in instances where a mouse gene product has not been functionally characterized, but its human or rat orthologs have, Mouse Genome Informatics (MGI) curators append the GO terms associated with the orthologous gene(s) to the mouse gene. Only those GO terms assigned by experimental determination to the ortholog of the mouse gene will be adopted by MGI. GO terms that are assigned to the ortholog of the mouse gene computationally (i.e. IEA) will not be transferred to the mouse gene. The evidence code represented by this citation is Inferred from Sequence or Structural Similarity (ISS).
go_ref_id: GO_REF:0000010
title: Gene Ontology annotation by the MGI curatorial staff, mouse gene nomenclature
authors: Mouse Genome Informatics scientific curators
year: 1999
external_accession: MGI:1347124
external_accession: J:56000
citation: PMID:11374909
abstract: For annotations documented via this citation, curators designed queries based on their knowledge of mouse gene nomenclature to group genes that shared common molecular functions, biological processes or cellular components. GO annotations were assigned to these genes in groups. Details of this strategy can be found in Hill et al., Genomics (2001) 74:121-128.
go_ref_id: GO_REF:0000011
title: Hidden Markov Models (TIGR)
authors: Michelle Gwinn, TIGR curators
year: 2003
abstract: A Hidden Markov Model (HMM) is a statistical representation of patterns found in a data set. When using HMMs with proteins, the HMM is a statistical model of the patterns of the amino acids found in a multiple alignment of a set of proteins called the "seed". Seed proteins are chosen based on sequence similarity to each other. Seed members can be chosen with different levels of relationship to each other. They can be members of a superfamily (e.g. ABC transporter, ATP-binding proteins), they can all share the same exact specific function (e.g. biotin synthase) or they could share another type of relationship of intermediate specificity (e.g. subfamily, domain). New proteins can be scored against the model generated from the seed according to how closely the patterns of amino acids in the new proteins match those in the seed. There are two scores assigned to the HMM which allow annotators to judge how well any new protein scores to the model. Proteins scoring above the "trusted cutoff" score can be assumed to be part of the group defined by the seed. Proteins scoring below the "noise cutoff" score can be assumed to NOT be a part of the group. Proteins scoring between the trusted and noise cutoffs may or may not be part of the group. One of the important features of HMMs is that they are built from a multiple alignment of protein sequences, not a pairwise alignment. This is significant, since shared similarity between many proteins is much more likely to indicate shared functional relationship than sequence similarity between just two proteins. The usefulness of an HMM is directly related to the amount of care that is taken in choosing the seed members, building a good multiple alignment of the seed members, assessing the level of specificity of the model, and choosing the cutoff scores correctly.
In order to properly assess what functional relevance an above-trusted scoring HMM match has to a query, one must carefully determine what the functional scope of the HMM is. If the HMM models proteins that all share the same function then it is likely possible to assign a specific function to high-scoring match proteins based on the HMM. If the HMM models proteins that have a wide variety of functions, then it will not be possible to assign a specific function to the query based on the HMM match; however, depending on the nature of the HMM in question, it may be possible to assign a more general (family or subfamily level) function. In order to determine the functional scope of an HMM, one must carefully read the documentation associated with the HMM. The annotator must also consider whether the function attributed to the proteins in the HMM makes sense for the query based on what is known about the organism in which the query protein resides and in light of any other information that might be available about the query protein. After carefully considering all of these issues the annotator makes an annotation.
go_ref_id: GO_REF:0000012
title: Pairwise alignment (TIGR)
authors: Michelle Gwinn, TIGR curators
year: 2003
abstract: Pairwise alignments are generated by taking two sequences and aligning them so that the maximum number of amino acids in each protein match, or are similar to, each other. Tools such as BLAST work by comparing a protein-of-interest individually with every protein in a database of known protein sequences and retaining only those matches with a high probability of being significant. Basic BLAST generates local alignments between proteins for regions of high similarity. Other pairwise alignment tools attempt to generate global (full-length) protein alignments. A tool called Blast_Extend_repraze (BER, http://ber.sourceforge.net) has some benefits over basic BLAST. Input into the BER tool includes the underlying DNA sequence for each protein as well as 300 nucleotides upstream and downstream of the predicted boundaries of the protein coding sequence. This allows annotators to see the DNA sequence that underlies the query protein as part of the alignment. In addition, the BER tool is able to look for continuation of regions of similarity through frameshifts and in-frame stop codons. If such regions are found the alignment is continued. BER searches are done in a two-step process: step one is a BLAST search against a non-redundant protein database, with significant BLAST hits stored in a mini-database for each query protein; step two is a modified Smith-Waterman alignment between the query and the proteins in its mini-database. In order to assess whether a given BER alignment is good enough to assert that the query shares the function of the match protein, one must look at several factors. First of all, the match protein must itself be experimentally characterized in order to avoid transitive annotation errors. In addition, any residues or secondary structures known to be important for function in the match protein must be conserved in the query.
The alignment should be visually inspected to look for any areas of lesser quality that might indicate the two proteins do not share the same function. Although it is impossible to set cutoff values for percent identity and length of match that will apply for every alignment, there are some guidelines. In general at least 40% identity that extends over the full lengths of both proteins is required in order to even consider functional equivalence. However, this percentage is highly dependent on the length and complexity of the proteins. 40% identity between two proteins 500 amino acids long is much more significant than 40% identity between two proteins that are only 100 amino acids long. Therefore, the annotator's experience and knowledge of what is considered significant for the organism and protein family in question is very important. Some sets of proteins are much more highly conserved than others and therefore tolerances for percent identity may have to be adjusted. Finally, the alignment must be considered in the context of what else is known about the query protein and the organism as a whole.
go_ref_id: GO_REF:0000015
title: Use of the ND evidence code for Gene Ontology (GO) terms.
authors: GO Curators
year: 2002
external_accession: AspGD_REF:ASPL0000111607
external_accession: CGD_REF:CAL0125086
external_accession: dictyBase_REF:2
external_accession: dictyBase_REF:9851
external_accession: FB:FBrf0159398
external_accession: MGI:MGI:2156816
external_accession: RGD:1598407
external_accession: SGD_REF:S000069584
external_accession: TAIR:Communication:1345790
external_accession: ZFIN:ZDB-PUB-031118-1
external_accession: GO_REF:nd
abstract: The Gene Ontology (GO) Consortium created the evidence code "ND" to indicate "no biological data available". This code is used for annotations to any of the three terms 'molecular function unknown ; GO:0005554', 'biological process unknown ; GO:0000004' or 'cellular component unknown ; GO:0008372'. In GO member databases, the use of any of these three GO terms, attributed to this reference and supported by the ND evidence code, signifies that a curator has examined the available literature and/or sequence for this gene or protein and that as of the date of the annotation to the unknown term, there is no information supporting an annotation to any GO term in that ontology. (Note that ND can be used with any one (or two) of the 'unknown' terms, even if there is data available to support annotation to a term from one or both of the other ontologies; e.g., ND can be used with GO:0008372 if the function and process are known but component is not).
comment: From FlyBase.
go_ref_id: GO_REF:0000018
title: dictyBase 'Inferred from Electronic Annotation (BLAST method)'
authors: dictyBase curators
year: 2005
external_accession: dictyBase_REF:10158
abstract: Gene Ontology (GO) annotations with the evidence code 'Inferred from Electronic Annotation' (IEA) are assigned automatically to gene products in dictyBase. All Dictyostelium protein sequences are analyzed by BLAST against GO gene association sequence files, identifying proteins in other organisms that align with Dictyostelium proteins with an E value less than or equal to e-50. GO annotations that have been manually assigned to these proteins from other species are then imported and attached to the corresponding gene product in dictyBase. The proteins from which the annotations are derived are displayed in the 'Evidence' column on the Gene Ontology evidence and references page.
go_ref_id: GO_REF:0000019
title: Automatic transfer of experimentally verified manual GO annotation data to orthologs using Ensembl Compara
authors: Ensembl curators, GOA curators
year: 2006
abstract: GO terms from a source species are projected onto one or more target species based on gene orthology obtained from the Ensembl Compara system. Only one-to-one and apparent one-to-one orthologies are used for a restricted range of species. Only GO annotations with a manual experimental evidence type of IDA, IEP, IGI, IMP or IPI are projected. GO annotations projected using this technique receive the evidence code 'IEA' (Inferred from Electronic Annotation). The Ensembl protein identifier of the annotation source is indicated in the 'With' column of the GOA association file.
go_ref_id: GO_REF:0000020
title: Electronic Gene Ontology annotations created by transferring manual GO annotations between orthologous microbial proteins
authors: Swiss Institute of Bioinformatics (SIB) curators, GOA curators
year: 2006
abstract: GO terms are manually assigned to each HAMAP family rule. High-quality Automated and Manual Annotation of microbial Proteins (HAMAP) family rules are a collection of orthologous microbial protein families, from bacteria, archaea and plastids, generated manually by expert curators. The assigned GO terms are then transferred to all the proteins that belong to each HAMAP family. Only GO terms from the molecular function and biological process ontologies are assigned. GO annotations using this technique will receive the evidence code Inferred from Electronic Annotation (IEA). These annotations are updated monthly by HAMAP and are available for download on both GO and GOA EBI ftp sites. To report an annotation error or inconsistency, or for further information, please contact the GO Consortium at gohelp@genome.stanford.edu or submit a comment to the SourceForge Annotation Issues tracker (http://sourceforge.net/projects/geneontology/). HAMAP is a project based at the Swiss Institute of Bioinformatics (Gattiker et al. 2003, Comp. Biol and Chem. 27: 49-58). For further information, please see http://www.expasy.org/sprot/hamap/.
go_ref_id: GO_REF:0000021
title: Improving the representation of central nervous system development in the biological process ontology
authors: Judith Blake (1, 2), William Bug (3), Rex Chisholm (1, 4), Jennifer Clark (1, 5), Erika Feltrin (6), Jacqueline Finger (2), David Hill (1, 2), Midori Harris (1, 5), Terry Hayamizu (2), Doug Howe (9), Maryanne Martone (7), Kathleen Millen (8), Francis Sele (4) (1. The Gene Ontology Consortium, 2. Mouse Genome Informatics, Bar Harbor, ME, 3. Drexel University, Philadelphia, PA, 4. Northwestern University, Chicago, IL, 5. EMBL-EBI, Hinxton, Cambridgeshire, UK, 6. The University of Padua, Padua, Italy, 7. The University of California at San Diego, San Diego, CA, 8. The University of Chicago, Chicago, IL, 9. The Zebrafish Information Network, University of Oregon, Eugene, OR)
year: 2006
abstract: Current genetic and molecular studies in many model organisms are aimed at understanding formation and development of the nervous system. Up until this point, the GO has had a very shallow representation of processes pertaining to the nervous system. In June 2006, curators from MGI and ZFIN met with researchers studying central nervous system development to improve the representation of these processes in GO. In particular, emphasis was placed on three areas that are being addressed actively in current research: forebrain development, hindbrain development and neural tube development. This collaboration resulted in the addition of over 500 terms that reflect the development of the forebrain, the hindbrain, and the neural tube from the perspective of biological process and anatomical structure.
go_ref_id: GO_REF:0000022
title: Improving the representation of immunology in the biological process ontology
authors: Alison Deckhut Augustine (1), Alan Collmer (2), Judith A. Blake (3, 4), Candace W. Collmer (2, 3), Shane C. Burgess (5), Lindsay Grey Cowell (6), Jennifer I. Clark (3, 7), Bernard de Bono (7), Russell T. Collins (8), Alexander D. Diehl (3, 4), Michelle Gwinn Giglio (3, 9), Jamie A. Lee (10), Linda Hannick (3, 9), Jane Lomax (3, 7), Midori A. Harris (3, 7), Christopher J. Mungall (3, 11), David P. Hill (3, 4), Richard H. Scheuermann (10), Amelia Ireland (3, 7), Alessandro Sette (12) (1. NIAID, 2. Cornell University, 3. The GO Consortium, 4. Mouse Genome Informatics, 5. Mississippi State University, 6. Duke University, 7. EMBL-EBI, 8. University of Cambridge, 9. The Institute for Genomic Research, 10. U.T. Southwestern Medical Center, 11. HHMI, 12. La Jolla Institute for Allergy and Immunology)
year: 2005
abstract: GO terms describing processes, functions, and cellular components related to the immune system have existed in the GO from its beginning and been used extensively in the annotation of gene products. However, particularly in the biological process ontology, the initial set of terms relating to immunology failed to cover the breadth of known immunological processes, and in many cases diverged from current usage and understanding in their names, definitions, and ontological placement. As part of a larger effort to improve the representation of immunology in the GO, a GO Content Meeting was held November 15-16, 2005, at The Institute for Genomic Research, to discuss improvements to representation of immunology in the biological process ontology of the GO. As a result of the meeting, a number of high level terms for immunological processes were created, an overall structure for immunologically related terms was established, and certain existing terms were renamed or redefined as well to bring them in line with current usage.
go_ref_id: GO_REF:0000023
title: Gene Ontology annotation based on UniProtKB Subcellular Location vocabulary mapping.
authors: GOA curators, UniProt curators
year: 2007
external_accession: SGD_REF:S000125578
abstract: Transitive assignment of GO terms based on the UniProtKB Subcellular Location vocabulary. UniProtKB Subcellular Location is a controlled vocabulary used to supply subcellular location information to UniProtKB entries in the SUBCELLULAR LOCATION lines. Terms from this vocabulary are annotated manually to UniProtKB/Swiss-Prot entries but are automatically assigned to UniProtKB/TrEMBL entries from the underlying nucleic acid databases and/or by the UniProt automatic annotation program.
Further information on these two different annotation methods is available at http://www.uniprot.org/faq/45 and http://www.uniprot.org/program/automatic_annotation .
When a UniProtKB Subcellular Location term describes a concept that is within the scope of the Gene Ontology, it is investigated to determine whether it is appropriate to map the term to an equivalent term in GO. The mapping between UniProtKB Subcellular Location terms and GO terms is carried out manually. Definitions and hierarchies of the terms in the two resources are compared and the mapping generated will reflect the most correct correspondence. The translation table between GO terms and UniProtKB Subcellular Location terms is maintained by the UniProt-GOA team and is available at http://www.geneontology.org/external2go/spsl2go .
go_ref_id: GO_REF:0000024
title: Manual transfer of experimentally-verified manual GO annotation data to orthologs by curator judgment of sequence similarity.
authors: AgBase, BHF-UCL, dictyBase, HGNC, Roslin Institute and UniProtKB curators.
year: 2011
external_accession: dictyBase_REF:9
abstract: Method for transferring manual annotations to an entry based on a curator's judgment of its similarity to a putative ortholog that has annotations that are supported with experimental evidence. Annotations are created when a curator judges that the sequence of a protein shows high similarity to another protein that has annotation(s) supported by experimental evidence (and therefore display one of the evidence codes EXP, IDA, IGI, IMP, IPI or IEP). Annotations resulting from the transfer of GO terms display the 'ISS' evidence code and include an accession for the protein from which the annotation was projected in the 'with' field (column 8). This field can contain either a UniProtKB accession or an IPI (International Protein Index) identifier. Only annotations with an experimental evidence code and which do not have the 'NOT' qualifier are transferred. Putative orthologs are chosen using information combined from a variety of complementary sources. Potential orthologs are initially identified using sequence similarity search programs such as BLAST. Orthology relationships are then verified manually using a combination of resources including sequence analysis tools, phylogenetic and comparative genomics databases such as Ensembl Compara, INPARANOID and OrthoMCL, as well as other specialised databases such as species-specific collections (e.g. HGNC's HCOP). In all cases curators check each alignment and use their experience to assess whether similarity is considered to be strong enough to infer that the two proteins have a common function so that they can confidently project an annotation. While there is no fixed cut-off point in percentage sequence similarity, generally proteins which have greater than 30% identity that covers greater than 80% of the length of both proteins are examined further. For mammalian proteins this cut-off tends to be higher, with an average of 80% identity over 90% of the length of both proteins. 
Strict orthologs are desirable but not essential. When there is evidence of paralogs, annotations are transferred only to the most similar protein in each species. Further detailed information on this procedure, including how ISS annotations are made to protein isoforms, can be found at: http://www.ebi.ac.uk/GOA/ISS_method.html.
url: http://www.ebi.ac.uk/GOA/ISS_method.html
go_ref_id: GO_REF:0000025
title: Operon structure as IGC evidence
authors: Michelle Gwinn, TIGR curators
year: 2007
abstract: Genes in prokaryotic organisms are often arranged in operons. Genes in an operon are all transcribed into one mRNA. Generally the genes in the operons code for proteins that all have related functions. For example, they may be the steps in a biochemical pathway, or they may be the subunits of a protein complex. Often the genes in operons shared between organisms are syntenic; that is, the same genes are in the same order in the operon in different species. When assessing sequence-comparison-based evidence during the process of manual annotation of a genome, it is often the case that some of the genes in the operon will have strong sequence-based evidence while others will have weak evidence. If seen alone, not in the presence of an operon, the weak evidence in question may not be sufficient to make a functional annotation. However, in the presence of an operon in which there is strong evidence for some of the genes, the very presence of the gene in the operon is a strong indication that the gene shares in the process carried out by the operon. If the putative function is one expected to exist for the process in question and particularly if that function has been observed in the same operon in another species, then the annotation should be made. This type of evidence is inferred from the context of the gene in an operon, and therefore the evidence code is IGC "inferred from genomic context."
go_ref_id: GO_REF:0000026
title: Improving the representation of muscle biology in the biological process and cellular component ontologies.
authors: Jennifer Deegan nee Clark (1, 5), Alexander D. Diehl (1, 7), Elisabeth Ehler (2), Georgine Faulkner (3), Erika Feltrin (4), Jennifer Fordham (2), Midori Harris (1, 5), Ralph Knoell (6), David Hill (1, 7), Paolo Laveder (8), Alessandra Nori (8), Carlo Reggiani (8), Vincenzo Sorrentino (9), Giorgio Valle (4), Pompeo Volpe (8) (1. The Gene Ontology Consortium, 2. King's College, London, UK, 3. ICGEB, Trieste, Italy, 4. CRIBI - University of Padua, Padua, Italy, 5. EMBL-EBI, Hinxton, Cambridgeshire, UK, 6. University of Goettingen, Goettingen, Germany, 7. Mouse Genome Informatics, Bar Harbor, ME, 8. University of Padua, Padua, Italy, 9. University of Siena, Siena, Italy)
year: 2007
abstract: A meeting focused on the biology of skeletal and smooth muscle was held on 24-25 July 2007 at the University of Padua, Italy, as a collaboration between the GO Consortium and the CRIBI Biotechnology Center. The aims of this effort were to provide a comprehensive representation of muscle biology in the biological process and cellular component ontologies and to improve the organization of muscle-specific terms to better describe the current knowledge of biological mechanisms in muscle tissue. Thus, the collaboration brought together experts in several areas of muscle biology and physiology who carried out a thorough review of the existing GO muscle terms as these terms were largely created by non-muscle experts using older definitions. In particular, several areas are being addressed actively in current research: the biological processes of muscle contraction, muscle plasticity, muscle development, and muscle regeneration; and the sarcoplasmic reticulum and membrane delimited compartments. This work resulted in the addition of 159 new terms and in the modification of 57 terms to bring them in line with current usage. Funding for the meeting was provided by the Italian Telethon Foundation.
go_ref_id: GO_REF:0000027
title: BLAST search criteria for ISS assignment in PAMGO_GAT
authors: PAMGO_GAT curators
year: 2007
abstract: This GO reference describes the criteria used in assigning the evidence code of ISS via BLAST searches to annotate gene products from PAMGO_GAT. Standard BLASTP from NCBI was used (http://www.ncbi.nih.gov/blast) to query the non-redundant (NR) database. Hits are considered to be significant if the E-value is at or less than 10^-4. All other parameters are default according to http://www.ncbi.nih.gov/blast.
go_ref_id: GO_REF:0000028
title: Criteria for IDA, IEP, ISS, IGC, RCA, ND, and IEA assignment in PAMGO_MGG
authors: PAMGO_MGG curators
year: 2008
abstract: This GO reference describes the criteria used in assigning the evidence codes of IDA, IEP, ISS, IGC, RCA, ND and IEA to annotate gene products from PAMGO_MGG. Standard BLASTP from NCBI was used (http://www.ncbi.nih.gov/blast) to iteratively search reciprocal best hits and thus identify orthologs between predicted proteins of Magnaporthe grisea and GO proteins from multiple organisms with published association to GO terms (http://www.geneontology.org/GO.downloads.database.shtml). The alignments were manually reviewed for those hits with e-value equal to zero and with 80% or better coverage of both query and subject sequences, and for those hits with e<=10^-20, pid >=35 and sequence coverage >=80%. Furthermore, experimental or reviewed data from literature and other sources were incorporated into the GO annotation. IDA was assigned to an annotation if normal function of its gene was determined through transfections into a cell line and overexpression. IEP was assigned to an annotation if according to microarray experiments, its gene was upregulated in a biological process and the fold change was equal to or bigger than 10, or if according to Massively Parallel Signature Sequencing (MPSS), its gene was upregulated only in a certain biological process and the fold change was equal to or bigger than 10. ISS was assigned to an annotation if the entry at the With_column was experimentally characterized and the pairwise alignments were manually reviewed. IGC was assigned to an annotation if it based on comparison and analysis of gene location and structure, clustering of genes, and phylogenetic reconstruction of these genes. RCA was assigned to an annotation if it based on integrated computational analysis of whole genome microarray data, and matches to InterPro, pfam, and COG etc. 
When no knowledge (experimental/computational) was available about a gene product in any one of the GO aspects, the gene product was annotated to the root term (GO:0005575 for Cellular Component, GO:0003674 for Molecular Function, and GO:0008150 for Biological Process), and was assigned an ND evidence code. IEA was assigned to an annotation if its function assignment was based on computational work, and no manual review was done.
go_ref_id: GO_REF:0000029
title: Gene Ontology annotation based on information extracted from curated UniProtKB entries
authors: GOA-UniProt curators
year: 2001-2007
abstract: Method by which GO terms were manually assigned to UniProt KnowledgeBase accessions, using either a NAS or TAS evidence code, by applying information extracted from the corresponding publicly-available, manually curated UniProtKB entry. Such GO annotations were submitted by the GOA-UniProt group from 2001, but this annotation practice was discontinued in 2007.
go_ref_id: GO_REF:0000030
title: Portable Annotation Rules
authors: Daniel Haft, JCVI
year: 2008
abstract: The JCVI is developing a collection of mixed-evidence annotation rules, under the working name BrainGrab/RuleBase (BGRB). A rule has two parts. The first is the set of conditions that must be met for the rule to fire. The second is the set of actions to be taken for rules that have fired. BGRB rules are designed to serve as proxies for the annotators that create them. They have very high fidelity but may have low coverage. Types of evidence used in combination include HMM hits and BLAST matches, hits to neighboring genes, pathway reconstruction reports from the Genome Properties system, and species taxonomy. BLAST matches are described by a number of separate parameters for raw score, percent sequence identity, and coverage of total sequence length by the match region. These parameters are customized for each protein family in order to achieve high fidelity in automated annotation systems. The flexible syntax makes it possible to use existing protein family classifiers, such as Pfam and TIGRFAMs HMMs, in new ways. It is especially useful in assigning GO terms to proteins such as SelD (selenide, water dikinase) that have different roles in different contexts.
go_ref_id: GO_REF:0000031
title: NIAID Cell Ontology Workshop
authors: Alexander D. Diehl, Alison Deckhut Augustine, Judith A. Blake, Lindsay G. Cowell, Elizabeth S. Gold, Timothy A. Gondre-Lewis, Anna Maria Masci, Terrence F. Meehan, Penelope A. Morel, Anastasia Nijnik, Bjoern Peters, Bali Pulendran, Richard H. Scheuermann, Q. Alison Yao, Martin S. Zand, Christopher J. Mungall
year: 2008
abstract: The NIAID sponsored a Cell Ontology Workshop, May 13-14, 2008, in Bethesda, focusing on improving representation of immune cell types in the Cell Ontology. The participants in the workshop worked together to extend the current ontology in the area of immune cell types and to provide the necessary information for the upcoming restructuring of the Cell Ontology in single-inheritance form with genus-differentia definitions.
url: http://www.bioontology.org/wiki/index.php/NIAID_Cell_Ontology_Workshop_May_2008
go_ref_id: GO_REF:0000032
title: Inference of Biological Process annotations from inter-ontology links
authors: Christopher J. Mungall, Tanya Z. Berardini, David P. Hill
abstract: We use the GOBO library to propagate annotations from Molecular Function to Biological Process. This results in both increased numbers of annotations, and increased consistency between curators.
url: http://wiki.geneontology.org/index.php/GAF_Inference
go_ref_id: GO_REF:0000033
title: Annotation inferences using phylogenetic trees
authors: Pascale Gaudet, Michael Livstone, Paul Thomas, The Reference Genome Project
year: 2010
external_accession: SGD_REF:S000146947
external_accession: TAIR:Communication:501741973
external_accession: MGI:MGI:4459044
external_accession: PAINT_REF:[0-9]{7}
external_accession: ZFIN:ZDB-PUB-110330-1
abstract: The goal of the GO Reference Genome Project, described in PMID 19578431, is to provide accurate, complete and consistent GO annotations for all genes in twelve model organism genomes. To this end, GO curators are annotating evolutionary trees from the PANTHER database with GO terms describing molecular function, biological process and cellular component. GO terms based on experimental data from the scientific literature are used to annotate ancestral genes in the phylogenetic tree by sequence similarity (ISS), and unannotated descendants of these ancestral genes are inferred to have inherited these same GO annotations by descent. The annotations are done using a tool called PAINT (Phylogenetic Annotation and INference Tool).
url: http://gocwiki.geneontology.org/index.php/PAINT
go_ref_id: GO_REF:0000034
title: Phenoscape Skeletal Anatomy Jamboree
authors: Brian K. Hall (Dalhousie University), Matthew Vickaryous (Ontario Veterinary College, University of Guelph), David Blackburn, University of Kansas; Wasila Dahdul, University of South Dakota and NESCent; Alexander Diehl, Mouse Genome Informatics (MGI); Melissa Haendel, Oregon Health Sciences University; John G. Lundberg, Department of Ichthyology, Academy of Natural Sciences, Philadelphia; Paula Mabee, Department of Biology, University of South Dakota; Martin Ringwald, Mouse Genome Informatics (MGI); Erik Segerdell, Oregon Health Sciences University; Ceri Van Slyke, Zebrafish Information Network (ZFIN); Monte Westerfield, Zebrafish Information Network (ZFIN) and Institute of Neuroscience, University of Oregon.
year: 2010
abstract: Skeletal cell terms and relationships were added and revised at the Skeletal Anatomy Jamboree held by Phenoscape (NSF grant BDI-0641025) and hosted by the National Evolutionary Synthesis Center (NESCent), April 9-10, 2010.
go_ref_id: GO_REF:0000035
title: Automatic transfer of experimentally verified manual GO annotation data to plant orthologs using Ensembl Compara
authors: Ensembl, GRAMENE, GOA curators
year: 2011
abstract: GO terms from a source species are projected onto one or more target species based on gene orthology obtained from the Ensembl Compara system. One-to-one, one-to-many and many-to-many orthologies are used, but annotations are only projected between orthologs that have at least a 40% peptide identity to each other. Only GO annotations with an evidence type of IDA, IEP, IGI, IMP or IPI are projected; annotations with a 'NOT' qualifier and annotations to the GO:0005515 protein binding term are not projected. Projected GO annotations using this technique will receive the evidence code Inferred from Electronic Annotation (IEA). The model organism database identifier of the annotation source will be indicated in the 'With' column of the GOA association file.
go_ref_id: GO_REF:0000036
title: Manual annotations that require more than one source of functional data to support the assignment of the associated GO term
authors: GO Annotation working group
year: 2011
external_accession: SGD_REF:S000147045
abstract: The Gene Ontology Consortium uses the IC (Inferred by Curator) evidence code when an annotation cannot be supported by any direct evidence, but can be inferred by GO annotations that have been annotated to the same gene/gene product identifier in conjunction with the curator's knowledge of biology (supporting GO annotations must not be IC-evidenced). In many cases an IC-evidenced annotation simply applies the same reference that was used in the supporting GO annotation. The use of the IC evidence code in an annotation with reference GO_REF:0000036 signifies that a curator inferred the GO term based on multiple sources of evidence/GO annotations. The 'with/from' field in these annotations will therefore supply more than one GO identifier, obtained from the set of supporting GO annotations assigned to the same gene/gene product identifier which cite publicly-available references.
go_ref_id: GO_REF:0000037
title: Gene Ontology annotation based on manual assignment of UniProtKB keywords in UniProtKB/Swiss-Prot entries.
authors: UniProt-GOA
year: 2011
external_accession: SGD_REF:S000148669
abstract: Transitive assignments using UniProtKB keywords. The UniProtKB keyword controlled vocabulary has been created and used by the UniProt Knowledgebase (UniProtKB) to supply 10 different categories of information to UniProtKB entries. Further information on the UniProtKB keyword resource can be found at http://www.uniprot.org/docs/keywlist. UniProtKB keywords are manually applied to UniProtKB/Swiss-Prot entries by UniProt curators. Further information on the UniProtKB manual annotation process is available at http://www.uniprot.org/faq/45.
When a UniProtKB keyword describes a concept that is within the scope of the Gene Ontology, it is investigated to determine whether it is appropriate to map the keyword to an equivalent term in GO. The mapping between UniProtKB keywords and GO terms is carried out manually. Definitions and hierarchies of the terms in the two resources are compared and the mapping generated will reflect the most correct correspondence. The translation table between GO terms and UniProtKB keywords is maintained by the UniProt-GOA team and available at http://www.geneontology.org/external2go/uniprotkb_kw2go.
go_ref_id: GO_REF:0000038
title: Gene Ontology annotation based on automatic assignment of UniProtKB keywords in UniProtKB/TrEMBL entries.
authors: UniProt-GOA
year: 2011
external_accession: SGD_REF:S000148670
abstract: Transitive assignments using UniProtKB keywords. The UniProtKB keyword controlled vocabulary has been created and used by the UniProt Knowledgebase (UniProtKB) to supply 10 different categories of information to UniProtKB entries. Further information on the UniProtKB keyword resource can be found at http://www.uniprot.org/docs/keywlist. UniProtKB keywords are automatically assigned to UniProtKB/TrEMBL entries from the underlying nucleic acid databases and/or by the UniProt automatic annotation program. Further information on the prediction systems applied by UniProt is available here: http://www.uniprot.org/program/automatic_annotation.
When a UniProtKB keyword describes a concept that is within the scope of the Gene Ontology, it is investigated to determine whether it is appropriate to map the keyword to an equivalent term in GO. The mapping between UniProtKB keywords and GO terms is carried out manually. Definitions and hierarchies of the terms in the two resources are compared and the mapping generated will reflect the most correct correspondence. The translation table between GO terms and UniProtKB keywords is maintained by the UniProt-GOA team and available at http://www.geneontology.org/external2go/uniprotkb_kw2go.
go_ref_id: GO_REF:0000039
title: Gene Ontology annotation based on the manual assignment of UniProtKB Subcellular Location terms in UniProtKB/Swiss-Prot entries.
authors: UniProt-GOA
year: 2011
external_accession: SGD_REF:S000148671
abstract: Transitive assignment of GO terms based on the UniProtKB Subcellular Location vocabulary. UniProtKB Subcellular Location is a controlled vocabulary used to supply subcellular location information to UniProtKB entries in the SUBCELLULAR LOCATION lines. Terms from this vocabulary are annotated manually to UniProtKB/Swiss-Prot entries. Further information on the UniProtKB manual annotation method is available at http://www.uniprot.org/faq/45.
When a UniProtKB Subcellular Location term describes a concept that is within the scope of the Gene Ontology, it is investigated to determine whether it is appropriate to map the term to an equivalent term in GO. The mapping between UniProtKB Subcellular Location terms and GO terms is carried out manually. Definitions and hierarchies of the terms in the two resources are compared and the mapping generated will reflect the most correct correspondence. The translation table between GO terms and UniProtKB Subcellular Location terms is maintained by the UniProt-GOA team and available at http://www.geneontology.org/external2go/spsl2go.
go_ref_id: GO_REF:0000040
title: Gene Ontology annotation based on the automatic assignment of UniProtKB Subcellular Location terms in UniProtKB/TrEMBL entries.
authors: UniProt-GOA
year: 2011
external_accession: SGD_REF:S000148672
abstract: Transitive assignment of GO terms based on the UniProtKB Subcellular Location vocabulary. UniProtKB Subcellular Location is a controlled vocabulary used to supply subcellular location information to UniProtKB entries in the SUBCELLULAR LOCATION lines. Terms from this vocabulary are applied automatically to UniProtKB/TrEMBL entries from the underlying nucleic acid databases and/or by the UniProt automatic annotation program. Further information on the UniProtKB automatic annotation program is available at http://www.uniprot.org/faq/45.
When a UniProtKB Subcellular Location term describes a concept that is within the scope of the Gene Ontology, it is investigated to determine whether it is appropriate to map the term to an equivalent term in GO. The mapping between UniProtKB Subcellular Location terms and GO terms is carried out manually. Definitions and hierarchies of the terms in the two resources are compared and the mapping generated will reflect the most correct correspondence. The translation table between GO terms and UniProtKB Subcellular Location terms is maintained by the UniProt-GOA team and available at http://www.geneontology.org/external2go/spsl2go.
go_ref_id: GO_REF:0000041
title: Gene Ontology annotation based on UniPathway vocabulary mapping.
authors: UniProt-GOA
year: 2012
external_accession: ZFIN:ZDB-PUB-130131-1
abstract: Transitive assignment of GO terms based on the UniPathway pathway vocabulary. UniPathway is a manually curated resource of enzyme-catalyzed and spontaneous chemical reactions. It provides a hierarchical representation of metabolic pathways. Descriptions of the pathway(s) that a particular protein is involved in are included in UniProtKB records.
UniPathway data are cross-linked to existing pathway resources such as KEGG and MetaCyc. Further information on the UniPathway resource is available at http://www.unipathway.org/obiwarehouse/unipathway.
When a UniPathway pathway describes a concept that is within the scope of the Gene Ontology, it is investigated to determine whether it is appropriate to map the term to an equivalent term in GO. The mapping between UniPathway terms and GO terms is carried out manually. Definitions and hierarchies of the terms in the two resources are compared and the mapping generated will reflect the most correct correspondence. The translation table between GO terms and UniPathway pathways is maintained by the UniPathway team and is available at http://www.grenoble.prabi.fr/dev/obiwarehouse/download/unipathway/public/unipathway2go.tsv.
go_ref_id: GO_REF:0000042
title: Gene Ontology annotation through association of InterPro records with GO terms, accompanied by conservative changes to GO terms applied by UniProt.
authors: UniProt-GOA
year: 2012
abstract: Transitive assignment of GO terms based on InterPro classification. For any database entry (representing a protein or protein-coding gene) that has been annotated with one or more InterPro domains, the corresponding GO terms are obtained from a translation table of InterPro entries to GO terms (interpro2go) generated manually by the InterPro team at EBI. The mapping file is available at http://www.geneontology.org/external2go/interpro2go.
Please note that the GO term in the annotation assigned with this GO reference has been changed from that originally applied by the InterPro2GO mapping. This change has been carried out by the UniProt group to ensure the GO annotation obeys the GO Consortium’s ontology structure and taxonomic constraints. Further information on the rules used by UniProt to transform specific incorrect IEA annotations is available at http://www.ebi.ac.uk/QuickGO/AnnotationPostProcessing.html.
go_ref_id: GO_REF:0000043
title: Gene Ontology annotation based on UniProtKB/Swiss-Prot keyword mapping, accompanied by conservative changes to GO terms applied by UniProt.
authors: UniProt-GOA
year: 2012
abstract: Transitive assignments using UniProtKB/Swiss-Prot keywords. The UniProtKB keyword controlled vocabulary has been created and used by the UniProt Knowledgebase (UniProtKB) to supply 10 different categories of information to UniProtKB entries. Further information on the UniProtKB keyword resource can be found at http://www.uniprot.org/docs/keywlist.
UniProtKB keywords are assigned to UniProtKB/Swiss-Prot entries by UniProt curators as part of the UniProtKB manual curation process. In contrast, UniProtKB keywords are automatically assigned to UniProtKB/TrEMBL entries from the underlying nucleic acid databases and/or by the UniProt automatic annotation program.
Further information on the two different UniProt annotation methods is available at http://www.uniprot.org/faq/45 and http://www.uniprot.org/program/automatic_annotation.
When a UniProtKB keyword describes a concept that is within the scope of the Gene Ontology, it is investigated to determine whether it is appropriate to map the keyword to an equivalent term in GO. The mapping between UniProtKB keywords and GO terms is carried out manually. Definitions and hierarchies of the terms in the two resources are compared and the mapping generated will reflect the most correct correspondence. The translation table between GO terms and UniProtKB keywords is maintained by the UniProt-GOA team and available at http://www.geneontology.org/external2go/uniprotkb_kw2go.
Please note that the GO term in the annotation assigned with this GO reference has been changed from that originally applied by the UniProtKB keywords 2GO mapping. This change has been carried out by the UniProt group to ensure the GO annotation obeys the GO Consortium’s ontology structure and taxonomic constraints. Further information on the rules used by UniProt to transform specific incorrect IEA annotations is available at http://www.ebi.ac.uk/QuickGO/AnnotationPostProcessing.html.
go_ref_id: GO_REF:0000044
title: Gene Ontology annotation based on UniProtKB/Swiss-Prot Subcellular Location vocabulary mapping, accompanied by conservative changes to GO terms applied by UniProt.
authors: UniProt-GOA
year: 2012
abstract: Transitive assignment of GO terms based on the UniProtKB/Swiss-Prot Subcellular Location vocabulary. UniProtKB Subcellular Location is a controlled vocabulary used to supply subcellular location information to UniProtKB entries in the SUBCELLULAR LOCATION lines. Terms from this vocabulary are annotated manually to UniProtKB/Swiss-Prot entries but are automatically assigned to UniProtKB/TrEMBL entries from the underlying nucleic acid databases and/or by the UniProt automatic annotation program.
Further information on these two different annotation methods is available at http://www.uniprot.org/faq/45 and http://www.uniprot.org/program/automatic_annotation.
When a UniProtKB Subcellular Location term describes a concept that is within the scope of the Gene Ontology, it is investigated to determine whether it is appropriate to map the term to an equivalent term in GO. The mapping between UniProtKB Subcellular Location terms and GO terms is carried out manually. Definitions and hierarchies of the terms in the two resources are compared and the mapping generated will reflect the most correct correspondence. The translation table between GO terms and UniProtKB Subcellular Location terms is maintained by the UniProt-GOA team and available at http://www.geneontology.org/external2go/spsl2go.
Please note that the GO term in the annotation assigned with this GO reference has been changed from that originally applied by the UniProtKB Subcellular Location2GO mapping. This change has been carried out by the UniProt group to ensure the GO annotation obeys the GO Consortium’s ontology structure and taxonomic constraints. Further information on the rules used by UniProt to transform specific incorrect IEA annotations is available at http://www.ebi.ac.uk/QuickGO/AnnotationPostProcessing.html.
go_ref_id: GO_REF:0000045
title: Gene Ontology annotation based on UniProtKB/TrEMBL entries keyword mapping, accompanied by conservative changes to GO terms applied by UniProt.
authors: UniProt-GOA
year: 2012
abstract: Transitive assignments using UniProtKB/TrEMBL keywords. The UniProtKB keyword controlled vocabulary has been created and used by the UniProt Knowledgebase (UniProtKB) to supply 10 different categories of information to UniProtKB/TrEMBL entries. Further information on the UniProtKB keyword resource can be found at http://www.uniprot.org/docs/keywlist.
UniProtKB keywords are assigned to UniProtKB/Swiss-Prot entries by UniProt curators as part of the UniProtKB manual curation process. In contrast, UniProtKB keywords are automatically assigned to UniProtKB/TrEMBL entries from the underlying nucleic acid databases and/or by the UniProt automatic annotation program.
Further information on the two different UniProt annotation methods is available at http://www.uniprot.org/faq/45 and http://www.uniprot.org/program/automatic_annotation.
When a UniProtKB keyword describes a concept that is within the scope of the Gene Ontology, it is investigated to determine whether it is appropriate to map the keyword to an equivalent term in GO. The translation table between GO terms and UniProtKB keywords is maintained by the UniProt-GOA team and available at http://www.geneontology.org/external2go/uniprotkb_kw2go.
Please note that the GO term in the annotation assigned with this GO reference has been changed from that originally applied by the UniProtKB keywords 2GO mapping. This change has been carried out by the UniProt group to ensure the GO annotation obeys the GO Consortium’s ontology structure and taxonomic constraints. Further information on the rules used by UniProt to transform specific incorrect IEA annotations is available at http://www.ebi.ac.uk/QuickGO/AnnotationPostProcessing.html.
go_ref_id: GO_REF:0000046
title: Gene Ontology annotation based on UniProtKB/TrEMBL Subcellular Location vocabulary mapping, accompanied by conservative changes to GO terms applied by UniProt.
authors: UniProt-GOA
year: 2012
abstract: Transitive assignment of GO terms based on the UniProtKB/TrEMBL Subcellular Location vocabulary. UniProtKB Subcellular Location is a controlled vocabulary used to supply subcellular location information to UniProtKB entries in the SUBCELLULAR LOCATION lines. Terms from this vocabulary are annotated manually to UniProtKB/Swiss-Prot entries but are automatically assigned to UniProtKB/TrEMBL entries from the underlying nucleic acid databases and/or by the UniProt automatic annotation program.
Further information on these two different annotation methods is available at http://www.uniprot.org/faq/45 and http://www.uniprot.org/program/automatic_annotation.
The translation table between GO terms and UniProtKB Subcellular Location terms is maintained by the UniProt-GOA team and available at http://www.geneontology.org/external2go/spsl2go.
Please note that the GO term in the annotation assigned with this GO reference has been changed from that originally applied by the UniProtKB Subcellular Location2GO mapping. This change has been carried out by the UniProt group to ensure the GO annotation obeys the GO Consortium’s ontology structure and taxonomic constraints. Further information on the rules used by UniProt to transform specific incorrect IEA annotations is available at http://www.ebi.ac.uk/QuickGO/AnnotationPostProcessing.html.
go_ref_id: GO_REF:0000047
title: Gene Ontology annotation based on absence of key sequence residues.
authors: GO curators
year: 2012
abstract: This describes a method for supplying a NOT-qualified, IKR-evidenced GO annotation to a gene product, when general sequence homology considerations would suggest a function or location, or a role in a biological process, but where a curator has determined that the absence of key sequence residues, known to be required for an expected activity or location, indicates that the gene product is unlikely to be able to carry out the implied activity, involvement in a process or cellular component location. This reference should only be used when an IKR-evidenced annotation is made based on curator judgement from manually reviewing the sequence of the gene product and where no publication can be found to support the curator's conclusion. It is preferable to cite a peer-reviewed publication (such as a PubMed identifier) for IKR-evidenced annotations whenever possible. Curators will have carefully reviewed the sequence of the annotated protein, and established that the key residues known to be required for an expected activity or location are not present. Inclusion of an identifier in the 'with/from' field that highlights to the user the lacking residues (e.g. an alignment, domain or rule identifier) is absolutely required when annotating to IKR with this GO_REF. Documentation on the GOC website provides more details on the correct use of the IKR evidence code.
go_ref_id: GO_REF:0000048
title: TIGR's Eukaryotic Manual Gene Ontology Assignment Method
authors: TIGR Arabidopsis annotation team
year: 2005
external_accession: TAIR:Communication:501714663
abstract: This describes TIGR curators' interpretation of a combination of evidence. Our internal software tools present us with a great deal of evidence based on domains, sequence similarities, signal sequences, paralogous proteins, etc. The curator interprets the body of evidence to make a decision about a GO assignment when an external reference is not available. The curator places one or more accessions that informed the decision in the "with" field.
go_ref_id: GO_REF:0000049
title: Automatic transfer of experimentally verified manual GO annotation data to fungal orthologs using Ensembl Compara
authors: Ensembl Genomes
year: 2012
abstract: GO terms from a source species are projected onto one or more target species based on gene orthology obtained from the Ensembl Compara system. One-to-one, one-to-many and many-to-many orthologies are used, but annotations are only projected between orthologs that have at least a 40% peptide identity to each other. Only GO annotations with an evidence type of IDA, IEP, IGI, IMP or IPI are projected; annotations with a 'NOT' qualifier and annotations to the GO:0005515 protein binding term are not projected. Projected GO annotations using this technique will receive the evidence code Inferred from Electronic Annotation (IEA). The model organism database identifier of the annotation source will be indicated in the 'With' column of the GOA association file.
go_ref_id: GO_REF:0000050
title: Manual transfer of GO annotation data to genes by curator judgment of sequence model
authors: PomBase curators
year: 2012
abstract: Transitive assignment of GO terms to a gene based on a curator's judgment of its match to a sequence model, such as a Pfam or InterPro entry, that has manually curated GO annotations, mappings to GO terms, or a description from which GO terms can be inferred. A statistical model of a sequence or group of sequences is used to make a prediction about the function of a protein or RNA. Annotations are created when a curator evaluates the results, using criteria that include excluding false positives and ensuring that the annotation is accurate for all matches. Statistical scores (such as e values and cutoff scores) and the functional specificity of the model may also be (but are not always) considered. Annotations resulting from the transfer of GO terms use the 'ISM' evidence code and include an accession for the model from which the annotation was projected in the 'with' field (column 8).
go_ref_id: GO_REF:0000051
title: S. pombe keyword mapping
authors: PomBase curators
year: 2006-2012
abstract: Keywords derived from manually curated primary annotation, e.g. gene product descriptions, are mapped to GO terms. Annotations made by this method have the evidence code Non-traceable Author Statement (NAS), and are filtered from the PomBase annotation files wherever another annotation exists that is equally or more specific, and supported by experimental or manually evaluated comparative evidence (such as ISS and its subtypes). Formerly GOC:pombekw2GO.
go_ref_id: GO_REF:0000052
title: Gene Ontology annotation based on curation of immunofluorescence data
authors: Human Protein Atlas
year: 2013
abstract: GO Cellular Component terms are manually assigned by curators studying high resolution confocal microscopy images of immunohistochemically stained tissue. The methodology uses antibody-based proteomics which combines high-throughput generation of affinity-purified antibodies with protein profiling in a variety of cells and tissues. Further information on the annotation methods can be found at http://www.proteinatlas.org/about/assays+annotation
Annotations are only exported to the GO Consortium if the localizations are supported by literature, according to the following validation grading:
Supportive - Subcellular localization supported by literature.
1) One/multiple localizations supported by literature.
2) Multiple localizations partly supported (at least one) by literature.
3) One/multiple localizations in cytoplasm (i.e. Golgi, mitochondria, ER etc) with literature supporting cytoplasmic localization.
Prior to February 2013, all Human Protein Atlas annotations were referenced by PMID:18029348 (Barbe et al. 2008 Mol. Cell Proteomics. 7:499-508), a paper describing the protein localization pilot study and methodology used by the Human Protein Atlas. However, it has been decided that these annotations are more correctly described by a GO reference.
Resource URL: http://www.proteinatlas.org
Protein subcellular localization images can be viewed on the Human Protein Atlas website, e.g. http://www.proteinatlas.org/ENSG00000175899/summary#ifcelline
go_ref_id: GO_REF:0000053
title: Automatic classification of GO using the ELK reasoner
authors: GO ontology editors
year: 2013
abstract: We use the ELK reasoner as part of an ontology development and release pipeline to automatically construct and check a large portion of the GO graph. The editors' version of the GO (gene_ontology_write.obo) contains additional metadata, including provenance of graph links. Every week, the GO pipeline executes a process which first removes all links tagged as "is_inferred". The reasoner then generates a list of inferred links which are automatically added to the ontology with the "is_inferred" tag set. The pipeline generates a report describing which links have changed as a part of this process.
go_ref_id: GO_REF:0000054
title: Gene Ontology annotation based on curation of intracellular localizations of expressed fusion proteins in living cells.
authors: LIFEdb
year: 2013
abstract: LIFEdb is a database that was created to manage the experimental data produced by the German Cancer Research Institute (DKFZ) and its collaborators, from work on cDNAs contained in the German cDNA Consortium collection.
A novel cloning technology was used to rapidly generate N- and C-terminal green fluorescent protein fusions of cDNAs to examine the intracellular localizations of expressed fusion proteins in living cells. GO Cellular Component terms are manually assigned by curators studying fluorescence microscope images of cells labelled with GFP-fused cDNAs. Protein coding regions of novel full length cDNAs are tagged with the coding sequence of the green fluorescent protein, the fusion proteins are then expressed and analyzed for their subcellular localization.
Prior to February 2013, all LIFEdb annotations were referenced by PMID:11256614 (Simpson et al. 2000 EMBO Rep. 1:287-292), a paper describing the protein subcellular localization pilot study and methodology used by LIFEdb. However, it has been decided that these annotations are more correctly described by a GO reference.
Resource URL: http://www.dkfz.de/en/mga/Groups/LIFEdb-Database.html
Protein subcellular localization images can be viewed on the LIFEdb website, http://www.dkfz.de/gpcf/lifedb.php
go_ref_id: GO_REF:0000055
title: Gene Ontology Cellular Component annotation based on cellular fractionation.
authors: AgBase biocurators
year: 2013
abstract: Assignment of GO Cellular Component terms based on experimental evidence of cellular localization from Differential Detergent Fractionation (DDF). Cellular proteins are differentially fractionated and detected using mass spectrometry. Subcellular localization is based upon identification of proteins in different fractions and analysis of their predicted transmembrane domains. Proteins are assigned GO Cellular Component terms based upon a manually reviewed DDF2GO mapping file.
go_ref_id: GO_REF:0000056
title: Taxon constraints to detect inconsistencies in annotation and ontology structure.
authors: The GO Consortium
year: 2013
abstract: GO is intended to cover the full range of species; therefore, GO terms are defined to be taxon neutral, avoiding reliance on taxon information for full definition of the given process, function, or component. For certain terms, however, there is obvious implicit taxon specificity, such that the term should only be used to categorize gene products from particular species. Taxon specificity of GO terms is captured using relationships such as "only_in_taxon" and "never_in_taxon". All taxon constraints are inherited by sub-types and parts of the GO term they are applied to. Taxon constraints are used to prevent inappropriate annotations from being made by curators as well as to identify pre-existing annotations that violate the taxon constraints. Errors in annotations are automatically detected by looking for inconsistencies between the taxonomic origin of the annotated gene products and the implicit taxon specificity of the GO terms. The inconsistencies are passed on to curators for correction. In some cases, the constraints need to be tightened or relaxed, or the structure of the ontology needs to be adjusted. The taxon constraints are further described in this publication: Deegan, Dimmer and Mungall. BMC Bioinformatics (2010) Formalization of taxon-based constraints to detect inconsistencies in annotation and ontology development. (PMID:20973947).
go_ref_id: GO_REF:0000057
title: Gene Ontology annotations inferred by curators' judgment using experimental data and prior knowledge of apoptotic mechanisms.
authors: GO Apoptosis Working Group
year: 2013
abstract: This GO_REF is meant as a subtype of GO_REF:0000036, and its use is limited to annotation of gene products involved in apoptotic cell death. The Gene Ontology Consortium uses the IC (Inferred by Curator) evidence code when an annotation cannot be supported by any direct evidence, but can be inferred from GO annotations that have been made to the same gene/gene product identifier in conjunction with the curator's knowledge of biology (supporting GO annotations must not be IC-evidenced). In many cases an IC-evidenced annotation simply applies the same reference that was used in the supporting GO annotation. The use of the IC evidence code in an annotation with reference GO_REF:0000057 signifies that a curator inferred the GO term based on multiple sources of evidence/GO annotations. The 'with/from' field in these annotations will therefore supply more than one GO identifier, obtained from the set of supporting GO annotations assigned to the same gene/gene product identifier which cite publicly available references. In inferring a specific apoptotic mechanism, the curator may refer to the following publications: PMID:21760595, PMID:19373242, PMID:21415859.