GO Internal

Ontology meeting 2018-10-15

GO wiki (new pages) - Fri, 10/12/2018 - 13:26

David: Created page

= Conference Line =

*Zoom: https://stanford.zoom.us/j/828418143

= Agenda =

==Meetings==

===Montreal GOC Meeting===
*http://wiki.geneontology.org/index.php/2018_Montreal_GOC_Meeting_Agenda

===Ontology Developers' workshop===
[[Fall 2018 ontology editors meeting - potential agenda topics]]

== Editors Discussion ==


==GH project link==
https://github.com/geneontology/go-ontology/projects/1

= Minutes =
*On call:


[[Category: Ontology]]
[[Category: Meetings]] David
Categories: GO Internal

Ontology meeting 2018-10-08

GO wiki (new pages) - Fri, 10/12/2018 - 13:24

David: Created page

= Conference Line =

*Zoom: https://stanford.zoom.us/j/828418143

= Agenda =

==Meetings==

===Montreal GOC Meeting===
*http://wiki.geneontology.org/index.php/2018_Montreal_GOC_Meeting_Agenda

===Ontology Developers' workshop===
[[Fall 2018 ontology editors meeting - potential agenda topics]]

== Editors Discussion ==


==GH project link==
https://github.com/geneontology/go-ontology/projects/1

= Minutes =
*On call:


[[Category: Ontology]]
[[Category: Meetings]] David
Categories: GO Internal

Annotation Conf. Call 2018-10-09

GO wiki (new pages) - Mon, 10/08/2018 - 07:40

Vanaukenk: Created page

= Meeting URL =
*https://stanford.zoom.us/j/976175422

== GO Conference Calls ==
*Tuesdays at 8am PDT
**1st Tuesday: Alliance Biological Function
**2nd Tuesday: GO Consortium
**3rd Tuesday: Alliance Biological Function/GO-CAM Working Group
**4th Tuesday: GO-CAM Working Group
**5th Tuesday: ad hoc, as needed
*One Zoom URL for all - https://stanford.zoom.us/j/976175422

= Agenda =

== Montreal Meeting, Wednesday, October 17th - Friday, October 19th ==
*[http://wiki.geneontology.org/index.php/2018_Montreal_GOC_Meeting_Logistics Logistics]
*[http://wiki.geneontology.org/index.php/2018_Montreal_GOC_Meeting_Agenda Agenda]

== Molecular Function Refactor ==
=== Distal/Proximal Promoter Binding Terms ===

== Pipeline ==
http://wiki.geneontology.org/index.php/Release_Pipeline
===Annotation File Formats===
====GPAD/GPI for Internal Consortium Annotation Exchange====
*At the NYC meeting in May, a proposal was made that we eventually switch to the GPAD/GPI file format for data exchange within the consortium
*Reasons to switch to GPAD/GPI:
**Less redundancy
**Ability to use an expanded set of annotation relations (gp2term)
**Ability to use unique ECO identifiers instead of GO evidence code abbreviations (e.g. IDA)
**Annotation properties field - allows for capturing annotation metadata
*Many groups consume GPAD (any external group using Protein2GO; groups consuming derived GPAD from GO-CAM models)
*Not all groups produce a GPAD
*Any barriers to moving to GPAD/GPI?
*Timeline - by start of new year (2019)?
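The "less redundancy" point can be illustrated with a minimal Python sketch. It assumes a simplified GAF row holding only a handful of fields (not the full GAF column layout) and splits it into per-annotation records (GPAD-like) plus one record per gene product (GPI-like). The field selection and the `EXAMPLE`/`REF:` identifiers are hypothetical, and this is not the exact GPAD or GPI column layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GafRow:
    """Hypothetical, simplified subset of GAF fields."""
    db: str
    object_id: str
    symbol: str
    object_name: str
    taxon: str
    go_id: str
    reference: str
    evidence: str


def split_gaf(rows):
    """Separate per-annotation data (GPAD-like) from per-gene-product data (GPI-like)."""
    annotations = []
    products = {}
    for row in rows:
        # Annotation-level fields are kept once per annotation.
        annotations.append((row.db, row.object_id, row.go_id, row.reference, row.evidence))
        # Gene-product fields are stored once per product, not repeated on every row.
        products[(row.db, row.object_id)] = (row.symbol, row.object_name, row.taxon)
    return annotations, products


rows = [
    GafRow("EXAMPLE", "EX:0001", "exA", "example protein A", "taxon:9606",
           "GO:0003677", "REF:example1", "IDA"),
    GafRow("EXAMPLE", "EX:0001", "exA", "example protein A", "taxon:9606",
           "GO:0005634", "REF:example2", "IDA"),
]
annotations, products = split_gaf(rows)
print(len(annotations), len(products))  # 2 1
```

In GAF, the symbol, name and taxon above would be repeated on both rows; after the split they are stored once.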

= Minutes =
*On call:

[[Category:Annotation Working Group]] Vanaukenk
Categories: GO Internal

Species-Specific Terms

GO wiki (new pages) - Fri, 10/05/2018 - 13:46

Pascale:

From http://geneontology.org/page/species-specific-terms

Handling Species Specificity
One of the biggest problems for a controlled vocabulary is choosing term names and definitions that will unambiguously identify a component, function or process. If a word or phrase refers to different entities or processes depending upon the organism, subclasses are created based on differentiating characteristics, such as structure, physical composition or order of subprocesses, rather than by identifying the taxonomic group in which the component or process occurs.

A classic example of this is cell wall, which refers to the rigid or semi-rigid structure surrounding the cell membrane in plants, fungi and some prokaryotes. Its composition differs between these three sets of organisms, though, so to allow greater specificity of annotation, the ontology describes three subclasses of cell wall differentiated by their physical characteristics:

cell wall
[is_a] fungal-type cell wall
[is_a] peptidoglycan-based cell wall
[is_a] plant-type cell wall

The definitions for these terms are as follows:

cell wall
The rigid or semi-rigid envelope lying outside the cell membrane of plant, fungal, and most prokaryotic cells, maintaining their shape and protecting them from osmotic lysis. In plants it is made of cellulose and, often, lignin; in fungi it is composed largely of polysaccharides; in bacteria it is composed of peptidoglycan.
plant-type cell wall
A more or less rigid structure lying outside the cell membrane of a cell and composed of cellulose and pectin and other organic and inorganic substances.
peptidoglycan-based cell wall
A protective structure outside the cytoplasmic membrane composed of peptidoglycan (also known as murein), a molecule made up of a glycan (sugar) backbone of repetitively alternating N-acetylglucosamine and N-acetylmuramic acid with short, attached, cross-linked peptide chains containing unusual amino acids. An example of this component is found in Escherichia coli.
fungal-type cell wall
A rigid yet dynamic structure surrounding the plasma membrane that affords protection from stresses and contributes to cell morphogenesis, consisting of extensively cross-linked glycoproteins and carbohydrates. The glycoproteins may be modified with N- or O-linked carbohydrates, or glycosylphosphatidylinositol (GPI) anchors; the polysaccharides are primarily branched glucans, including beta-linked and alpha-linked glucans, and may also include chitin and other carbohydrate polymers, but not cellulose or pectin. Enzymes involved in cell wall biosynthesis are also found in the cell wall. Note that some forms of fungi develop a capsule outside of the cell wall under certain circumstances; this is considered a separate structure.

Note that the definitions use physical and structural characteristics to differentiate between cell wall types.
Terms with Taxon Restrictions
The Gene Ontology also provides an OBO-format file containing species-specific terms and the taxa for which they are or are not appropriate. The file can be viewed online, downloaded by FTP, or found in the GO SVN repository at go/quality_control/annotation_checks/taxon_checks/taxon_go_triggers.obo.
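A minimal Python sketch of how taxon restrictions like these could be checked against an annotation. It assumes each term's restrictions are available as (relation, taxon) pairs, using the Relation Ontology relation names only_in_taxon and never_in_taxon, and it walks a hypothetical toy taxonomy. The constraints shown are illustrative only; they are not copied from taxon_go_triggers.obo, whose actual encoding is not assumed here.

```python
# Hypothetical toy taxonomy: each taxon maps to its parent.
TAXON_PARENT = {
    "Arabidopsis thaliana": "Viridiplantae",
    "Saccharomyces cerevisiae": "Fungi",
    "Viridiplantae": "Eukaryota",
    "Fungi": "Eukaryota",
    "Eukaryota": "cellular organisms",
}

# Illustrative constraints only; not taken from the GO taxon file.
CONSTRAINTS = {
    "plant-type cell wall": [("only_in_taxon", "Viridiplantae")],
    "fungal-type cell wall": [("only_in_taxon", "Fungi")],
}


def lineage(taxon):
    """Return the taxon followed by all of its ancestors in TAXON_PARENT."""
    chain = [taxon]
    while chain[-1] in TAXON_PARENT:
        chain.append(TAXON_PARENT[chain[-1]])
    return chain


def violations(term, taxon, constraints=CONSTRAINTS):
    """List the taxon constraints on `term` that an annotation in `taxon` would break."""
    ancestors = set(lineage(taxon))
    problems = []
    for relation, constraint_taxon in constraints.get(term, []):
        if relation == "only_in_taxon" and constraint_taxon not in ancestors:
            problems.append(f"{term} only_in_taxon {constraint_taxon}")
        elif relation == "never_in_taxon" and constraint_taxon in ancestors:
            problems.append(f"{term} never_in_taxon {constraint_taxon}")
    return problems


print(violations("plant-type cell wall", "Saccharomyces cerevisiae"))
# ['plant-type cell wall only_in_taxon Viridiplantae']
```

The check works on the whole lineage, so a constraint stated on a high-level taxon applies to every species below it.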
Sensu Terms
The use of sensu, ‘in the sense of’, to designate a certain interpretation of a word or phrase has been deprecated.



[[Category:Ontology]][[Category:GO Editors]] Pascale
Categories: GO Internal

Manager Call 2018-10-04

GO wiki (new pages) - Tue, 10/02/2018 - 18:43

Vanaukenk: Vanaukenk moved page Manager Call 2018-10-04 to Manager Call 2018-10-03: Changed to have date coincide with date of the actual call

= Meeting URL =

*https://stanford.zoom.us/j/754529609

= Agenda =

== Montreal GOC Meeting ==
*Review [http://wiki.geneontology.org/index.php/2018_Montreal_GOC_Meeting_Agenda agenda]



[[Category:GO Consortium]] [[Category:GO Managers Meetings ]] Vanaukenk
Categories: GO Internal

QCQA call 2018-10-01

GO wiki (new pages) - Tue, 10/02/2018 - 07:03

Pascale: Created page






[[Category:Quality Control]] Pascale
Categories: GO Internal

Guidelines for checkpoints

GO wiki (new pages) - Mon, 10/01/2018 - 12:05

Pascale:

Gene products should not be annotated to checkpoint terms based on mutations that result in activation of the checkpoint (i.e., cause cell cycle arrest). Such mutations cause a problem which is detected by the checkpoint sensor, and cell cycle arrest indicates that the checkpoint is functioning normally. This does not represent positive regulation of the checkpoint, since the mutated gene is not necessarily part of the checkpoint in a normal cell. Gene products can be annotated to checkpoint terms when a mutation results in inactivation of the checkpoint. This includes gene products that recruit the checkpoint signalling components, that are part of the sensor machinery detecting the damage, or that are components of the signalling pathway to the effector.
Genes which are involved in correcting the problems detected by the checkpoint should also not be annotated to the checkpoint terms, because they act downstream of the checkpoint. These genes should instead be annotated to "response to checkpoint x" terms, which are in a separate branch of the ontology.


[[Category:Annotation]]
[[Category:Annotation Guidelines]] Pascale
Categories: GO Internal

Ontology meeting 2018-10-01

GO wiki (new pages) - Thu, 09/27/2018 - 01:38

Pascale: Created page

= Conference Line =

*Zoom: https://stanford.zoom.us/j/828418143

= Agenda =


== Editors Discussion ==


===Regulation inference chain===
For Molecular Function, the 'regulates' inference chain is a problem.

--- DNA binding transcription factor activity
is_a (has_part?) DNA binding

If we have regulation of DNA binding transcription factor activity,
we don't want to infer regulation of DNA binding
(the same would be true of DNA binding transcription factor '''regulator''' activity).

* This is true of all MFs except binding.
* has_part (or another relation) could work if the regulates inference chain stops at it
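The difference between is_a and has_part for the regulates inference can be shown with a minimal Python sketch. It assumes, for illustration only, that 'regulates' propagates over is_a and not over has_part; term labels stand in for GO IDs and the toy edges are hypothetical, not exported from the ontology.

```python
# Toy edges (subject, relation, object); labels stand in for GO IDs.
REG = "regulation of DNA binding transcription factor activity"
TF = "DNA binding transcription factor activity"

EDGES_IS_A = [(REG, "regulates", TF), (TF, "is_a", "DNA binding")]
EDGES_HAS_PART = [(REG, "regulates", TF), (TF, "has_part", "DNA binding")]

# Assumption for this sketch: 'regulates' propagates over is_a only.
PROPAGATES_OVER = {"is_a"}


def inferred_regulated(term, edges, propagates_over=PROPAGATES_OVER):
    """Return every class `term` is inferred to regulate."""
    result = {o for s, r, o in edges if s == term and r == "regulates"}
    frontier = list(result)
    while frontier:
        current = frontier.pop()
        for s, r, o in edges:
            if s == current and r in propagates_over and o not in result:
                result.add(o)
                frontier.append(o)
    return result


print(sorted(inferred_regulated(REG, EDGES_IS_A)))
# ['DNA binding', 'DNA binding transcription factor activity']
print(sorted(inferred_regulated(REG, EDGES_HAS_PART)))
# ['DNA binding transcription factor activity']
```

With is_a, regulation leaks down to DNA binding; with has_part, the chain stops at the transcription factor activity, which is the behaviour the second bullet asks for.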



[[Category: Ontology]]
[[Category: Meetings]] Pascale
Categories: GO Internal

GO-CAM Working Group Call 2018-09-25

GO wiki (new pages) - Mon, 09/24/2018 - 06:44

Vanaukenk: /* Relations between MF and Input(s) */

= Meeting URL =
https://stanford.zoom.us/j/976175422

=Agenda=

== Evidence Codes in Noctua ==
*Decision: leave evidence codes as is in Noctua
*Will we continue to use the full ECO?
*Autocomplete searches perform best when searching on the first word of the term label, e.g. 'direct' or 'mutant'

== Relations between MF and Input(s) ==
*'has input' vs 'has direct input'
*In GO-CAM models, we are using 'has input' for capturing the objects of MFs
*This is different from conventional annotation where curators sometimes made a distinction between 'has direct input' and 'has input'
*Proposal: replace 'has direct input' with 'has input'; obsolete 'has direct input'
*Need to review has_input annotations to remove any extensions that are inconsistent with GO-CAM usage, i.e. an indirect or unknown proximity for an input
*Seth retrieved, as of 2018-07-31, [https://drive.google.com/drive/folders/1TlwrEM2KjAzxIYiCGg0_oicMOYfhiGou all MF annotations] that use has_input in annotation extensions.
*Use cases to discuss:
**Enzyme-substrate
***Enzymatic modification of a substrate
*The Relations Ontology working group (broader than just GO) is also considering [https://github.com/oborel/obo-relations/issues/244 how to model participants in an MF] and [https://github.com/oborel/obo-relations/issues/171 documentation of has_input and child relations]
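If 'has direct input' is obsoleted, existing annotation extensions would need the relation renamed. A minimal Python sketch, assuming the GAF annotation-extension syntax in which '|' separates alternative groups and ',' separates relation(filler) units within a group; the identifiers in the example are placeholders.

```python
def rename_relation(extension, old="has_direct_input", new="has_input"):
    """Rename relation `old` to `new` in a GAF-style annotation extension string."""
    if not extension:
        return extension
    groups = []
    for group in extension.split("|"):
        units = []
        for unit in group.split(","):
            # Split 'relation(filler)' into the relation name and the rest.
            relation, paren, filler = unit.partition("(")
            if relation.strip() == old:
                relation = new
            units.append(relation + paren + filler)
        groups.append(",".join(units))
    return "|".join(groups)


print(rename_relation("has_direct_input(UniProtKB:EXAMPLE1),occurs_in(CL:0000000)"))
# has_input(UniProtKB:EXAMPLE1),occurs_in(CL:0000000)
```

Matching on the whole relation name (rather than a substring replace) leaves other relations untouched.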

= Minutes =
*On call:

[[Category: Annotation Working Group]] Vanaukenk
Categories: GO Internal

Ontology meeting 2018-09-24

GO wiki (new pages) - Mon, 09/24/2018 - 05:01

Pascale: /* Editors Discussion */

= Conference Line =

*Zoom: https://stanford.zoom.us/j/828418143

= Agenda =


== Editors Discussion ==
* Question for Chris: Can we use the Sequence Ontology in annotation extensions / has input for GO-CAM models? This is needed to continue work on transcription, e.g.: https://github.com/geneontology/go-ontology/issues/16152

==Meetings==

===Montreal GOC Meeting===
*Add name to attendees list if you're going
*http://wiki.geneontology.org/index.php/2018_Montreal_GOC_Meeting_Agenda

===Ontology Developers' workshop===
The meeting will be in Geneva December (8)9-13
* When to book flights and hotel?
====Topics====
* Filling in missing GO-Rhea-Reactome xrefs
** moving from reactions to biochemical pathways
** implementation of a method to keep in sync with all three resources
** GO-CAM templates for pathways??
* GH ticket work and mini-project planning
** Dealing with and migrating binding terms
** Dealing with cellular processes (https://github.com/geneontology/go-ontology/issues/12849)
** Taxon constraints: how do we handle them going forward?
* Attendees
** David H (Can make it)
** Kimberly (Can make it)
** Karen (can NOT make it)
** Harold (Can make it)
** Pascale (Can make it)
** Barbara (Maybe make it)
** Alan B (Rhea, can make any day)
** Anne (Rhea, prefers Monday or Tuesday)
** Chris
** Peter D (Can make it)
** Ben?
** Jim?
** Paul T (Can make it)

==GH project link==
https://github.com/geneontology/go-ontology/projects/1

= Minutes =
*On call:



[[Category: Ontology]]
[[Category: Meetings]] David
Categories: GO Internal

2018 Montreal GOC users workshop

GO wiki (new pages) - Wed, 09/19/2018 - 12:02

Pascale:

=Gene Ontology workshop 2018 =

The Gene Ontology (GO) is one of the most widely used bioinformatics resources in the world.
Because of the staggering complexity of biological systems and the ever-increasing size of datasets to analyze, biomedical research is becoming increasingly dependent on knowledge stored in computable form.
The GO project provides the most comprehensive resource currently available for computable knowledge regarding the functions of genes and gene products.

The GO Consortium develops an up-to-date, comprehensive, computational model of biological systems, from the molecular level to larger pathways, cellular and organism-level systems.
GO defines an ontology composed of classes (or concepts) that can be used to describe gene function, and of relationships between these concepts.
The highly structured nature of GO makes it ideally suited to computational analysis.

==Who should attend==
The meeting aims to bring together students and researchers who:
* use GO in any aspect of their work
* want to expand their knowledge of GO
* want to acquire hands-on experience with the GO resource
* want to exchange ideas on problems and solutions, and foster collaborations
The workshop will provide an opportunity to discuss projects with other users as well as with developers of the GO.

==Schedule==
The morning session will be a training session, with presentations on the GO resources and on specific tools by GO Consortium members. The afternoon session will be dedicated to oral and poster presentations from participants.

==Abstract submission==
We are inviting interested participants to submit abstracts on any project related to GO or using GO directly or indirectly, for example in data analysis applications, data display, text mining, etc.

==When==
October 16th, 2018, 9:00 to 18:00

==Where==
Pavillon JF Kennedy (PK), University of Quebec in Montreal (UQAM), Montreal, Canada

==Registration link==
http://eventbrite.com/e/gene-ontology-workshop-2018-tickets-49865293435

==Abstract submission link==
To submit an abstract: https://easychair.org/cfp/GOW2018

==Important dates==
Sept 28: Abstract submission deadline
Oct 4: Acceptance notification
Oct 16: GO workshop at UQAM

==Organizing committee==
===Chairs===
* Pascale Gaudet, GO Central/SIB Swiss Institute of Bioinformatics
* Laurent-Philippe Albou, University of Southern California

===Scientific committee===
* Paul Thomas, University of Southern California
* Chris Mungall, Lawrence Berkeley National Laboratory
* Judith Blake, The Jackson Laboratory
* Paul Sternberg, Caltech
* Michael Cherry, Stanford University
* Suzanna Lewis, Lawrence Berkeley National Laboratory

===Local chairs===
* Marie-Jean Meurs, UQAM
* Hayda Almeida, UQAM

[[Category:Meetings]] Pascale
Categories: GO Internal

Manager Call 2018-09-20

GO wiki (new pages) - Tue, 09/18/2018 - 06:20

David:

= Meeting URL =

*https://stanford.zoom.us/j/754529609

= Agenda =

== Montreal GOC Meeting ==
*Review [http://wiki.geneontology.org/index.php/2018_Montreal_GOC_Meeting_Agenda agenda]

== New Project management strategy ==
* ACTION: Each team (product owner / project lead) needs to prioritize the tickets and decide which will be done for the next Milestone (October GO meeting).

== Feedback form update ==
https://github.com/geneontology/go-site/issues/750
What's the timeline for deployment? TBD, but pretty much instantaneous when done. (Seth)

== New GAF Submissions ==
SuziA: Update on documentation for new groups to submit data: https://github.com/geneontology/go-annotation/issues/2067

== Ontology Editors' Meeting ==
Geneva, week of December 10th

The meeting will focus on (depending on who can attend):
* creating good logical definitions in both a general context and in the context of the GO-Reactome-Rhea alignment
** Filling in missing GO-Rhea xrefs
** moving from reactions to biochemical pathways
** implementation of a method to keep in sync with all three resources
** GO-CAM templates for pathways??
* potential implementation of Design Patterns and the revival of some type of TermGenie ability
* GH ticket work and mini-project planning
** Dealing with and migrating binding terms
** Dealing with cellular processes (https://github.com/geneontology/go-ontology/issues/12849)
* Attendees
** David H (Can make it)
** Kimberly (Can make it)
** Karen
** Harold (Can make it)
** Pascale (Can make it)
** Barbara (Maybe make it)
** Alan B (Rhea, can make any day)
** Anne (Rhea, prefers Monday or Tuesday)
** Chris
** Peter D (Can make it)
** Ben?
** Jim?
Pascale: needs to follow up with Paul regarding funds to hold this meeting

== Job descriptions for each role ==
Pascale and Kimberly started creating job descriptions for all manager roles:

https://drive.google.com/drive/folders/1F7e2D7T4hleIq8VaH7YW60D9wxQnRFRV

Every manager should add what they believe are their tasks, and then we will discuss them here.

* Please have a look.

== Noctua as a teaching tool?==

== Fate of pre-composed terms ==
* explicit binding terms
** X-binding terms will be decomposed as binding and has_input X.
*** Annotation
**** Moving forward
***** Groups that have the ability to use AEs in their own tools will make structured annotations.
***** Groups that do not have the ability to use AEs in their own tools will use Noctua.
**** Existing Annotations
***** On a predetermined date, all existing annotations to pre-composed X binding terms will be converted to binding and an AE.
***** For annotations that have an IPI evidence code and a value in the 'with' field, the evidence will be converted to IDA and the 'with' value will be moved over to the AE.
***** For annotations that do not have both an IPI evidence code and a value in the 'with' field, the AE will be derived from the term itself. For generic gene products we will use a PRO identifier. For gene families we will use a Panther ID???
* compound terms between two existing GO classes
** Compound terms will be decomposed into their elemental terms with appropriate relationships.
*** Annotation
**** Moving Forward
***** Groups that have the ability to use AEs in their own tools will make structured annotations or co-annotate.
***** Groups that do not have the ability to use AEs in their own tools will use Noctua.
**** Existing Annotations
***** Existing annotations will be converted to multiple simple conventional annotations (we need to derive rules for this) OR
***** Existing annotations will be converted to simple GO-CAM models from which annotations will be derived. OR
***** Existing annotations will be converted to conventional annotation with AEs, capturing the context of the original term.
* Considerations
** With respect to groups that have been using these terms and annotations, can we provide them with tools that will allow them to continue to conduct their analyses?
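The conversion rules for existing X binding annotations can be sketched in Python. This is a minimal illustration under stated assumptions: `GO:0005488` is the GO 'binding' term, but the lookup table from pre-composed terms to input entities (`GO:EXAMPLE_X_BINDING`, `PR:EXAMPLE_X`) is hypothetical, the 'with' field is assumed to hold a single identifier, and the annotation is assumed to have no existing extension.

```python
from dataclasses import dataclass, replace

BINDING = "GO:0005488"  # GO 'binding'

# Hypothetical lookup from a pre-composed "X binding" term to an entity for X
# (e.g. a PRO identifier for a generic gene product). IDs are placeholders.
BINDING_TERM_INPUT = {"GO:EXAMPLE_X_BINDING": "PR:EXAMPLE_X"}


@dataclass(frozen=True)
class Annotation:
    gene_product: str
    go_id: str
    evidence: str
    with_from: str = ""   # assumed to hold at most one identifier
    extension: str = ""   # assumed empty before conversion


def decompose_binding(ann, term_input=BINDING_TERM_INPUT):
    """Rewrite an 'X binding' annotation as binding + has_input(X)."""
    if ann.go_id not in term_input:
        return ann  # not a pre-composed binding term: leave unchanged
    if ann.evidence == "IPI" and ann.with_from:
        # IPI with a 'with' value: becomes IDA, and the 'with' value moves to the AE.
        return replace(ann, go_id=BINDING, evidence="IDA", with_from="",
                       extension=f"has_input({ann.with_from})")
    # Otherwise the input is derived from the term itself.
    return replace(ann, go_id=BINDING, extension=f"has_input({term_input[ann.go_id]})")


print(decompose_binding(Annotation("UniProtKB:EXAMPLE1", "GO:EXAMPLE_X_BINDING",
                                   "IPI", "UniProtKB:EXAMPLE2")))
```

Keeping the lookup table separate makes the open question (PRO versus Panther identifiers for gene families) a data decision rather than a code change.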

== NAR paper ==

https://docs.google.com/document/d/13rV4qIHcTJt7lTv_N5_6AbG3OlSU0jODjMU7-jP-e8c/edit#

Highlights:
* Substantial improvements to parts of the ontology, particularly the molecular function branch
* Addition of qualifiers for biological process annotations, allowing users for the first time to distinguish when a gene product functions as an integral part of a process, versus when it has a more indirect causal effect on a process
*New evidence codes that allow users to filter out annotations from high-throughput experiments
*A publicly accessible repository of connected GO annotations, called GO-CAMs (GO Causal Activity Models), that provide richer annotations and bridge GO annotations and pathway representations
* New production pipeline and QC effort, and examples of improvements

= Minutes =




[[Category:GO Consortium]] [[Category:GO Managers Meetings ]] Vanaukenk
Categories: GO Internal

GO-CAM Working Group Call 2018-09-18

GO wiki (new pages) - Mon, 09/17/2018 - 14:31

Vanaukenk: /* Evidence Codes in Noctua */

= Meeting URL =
https://stanford.zoom.us/j/976175422

=Agenda=

== Evidence Codes in Noctua ==
*Currently, Noctua allows for use of the full Evidence and Conclusion Ontology
*GO has typically only used a subset of ECO codes, e.g. IDA
**ECO:0000314 direct assay evidence used in manual assertion
**has_related_synonym 'IDA'
**database cross reference GOECO:IDA (inferred from direct assay)
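Because each ECO term used by GO carries a GOECO: cross-reference, the mapping from GO evidence abbreviations to ECO IDs can be read off the ontology. A minimal Python sketch, assuming an OBO rendering of the stanza roughly like the one below (built from the values listed above); it is not a full OBO parser, and if several ECO terms shared one GOECO xref it would keep the last one seen.

```python
ECO_STANZA = """[Term]
id: ECO:0000314
name: direct assay evidence used in manual assertion
synonym: "IDA" RELATED []
xref: GOECO:IDA "inferred from direct assay"
"""


def goeco_map(obo_text):
    """Map GO evidence abbreviations to ECO IDs via GOECO: xrefs in OBO stanzas."""
    mapping = {}
    current_id = None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            current_id = None  # a new stanza starts
        elif line.startswith("id:"):
            current_id = line[len("id:"):].strip()
        elif line.startswith("xref:") and current_id:
            parts = line[len("xref:"):].split()
            if parts and parts[0].startswith("GOECO:"):
                mapping[parts[0][len("GOECO:"):]] = current_id
    return mapping


print(goeco_map(ECO_STANZA))  # {'IDA': 'ECO:0000314'}
```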

== Relations between MF and Input(s) ==
*has_input vs has_direct_input
*Proposal: replace has_direct_input with has_input; obsolete has_direct_input
*Need to review has_input annotations to remove any extensions that are inconsistent with GO-CAM usage, i.e. an indirect or unknown proximity for an input
*Seth retrieved, as of 2018-07-31, [https://drive.google.com/drive/folders/1TlwrEM2KjAzxIYiCGg0_oicMOYfhiGou all MF annotations] that use has_input in annotation extensions.
**Initial review:
***used to capture a regulatory effect, e.g. protein kinase activator activity, when it was not known whether the effect was direct or indirect (e.g. expression of protein or complex X increases the activity of Y)
***used to capture a regulatory subunit whose presence is necessary for the activity to occur (e.g. cyclin-dependent protein kinase)
***used to capture an enzymatic activity when it was not known if the effect on a substrate was direct or indirect (e.g. caspase-dependent but not known if it was the caspase mutated)
***used to capture an enzymatic substrate where there wasn't also a direct binding assay in the paper (e.g. testing possible chemical substrates for glucuronosyltransferase activity)
***used to capture metal ion-dependence of protein binding (e.g. Ca2+-dependent protein binding)
***used (correctly) to capture the physiologically relevant input in a binding reaction (i.e. cross-species experiment where with/from captures experimental binding partner and AE the relevant binding partner)
*The Relations Ontology working group (broader than just GO) is also considering [https://github.com/oborel/obo-relations/issues/244 how to model participants in an MF] and [https://github.com/oborel/obo-relations/issues/171 documentation of has_input and child relations]
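A retrieval like Seth's (MF annotations whose extensions use has_input) could be approximated over a GAF file with a short Python sketch. It assumes GAF 2.x layout: tab-separated, '!' comment lines, aspect in column 9 ('F' for Molecular Function) and the annotation extension in column 16. This is an illustration, not the query that was actually run.

```python
def mf_rows_with_has_input(lines):
    """Return the columns of MF rows whose extension uses the has_input relation."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and GAF header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 16:
            continue  # no annotation-extension column
        aspect, extension = cols[8], cols[15]
        # Collect relation names from 'rel(filler),rel(filler)|...' units.
        relations = {
            unit.split("(", 1)[0].strip()
            for group in extension.split("|")
            for unit in group.split(",")
            if unit.strip()
        }
        if aspect == "F" and "has_input" in relations:
            hits.append(cols)
    return hits
```

Matching the exact relation name means has_direct_input rows are not counted; drop that check if both relations should be reviewed together.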


[[Category: Annotation Working Group]] Vanaukenk
Categories: GO Internal

GO Annotation Standard Operating Procedures

GO wiki (new pages) - Mon, 09/17/2018 - 13:05

Pascale:

From GO Annotation Standard Operating Procedures
TO BE REVIEWED

<p>This page documents some of the standard operating procedures used by members of the GO Consortium during the process of annotation. Please note that these do not represent the best or the only ways to carry out annotation, but are simply a guide to how some groups currently annotate. More information on annotation can be found in the <a href="/page/go-annotation-policies">GO annotation guide</a> and in the <a href="/page/go-annotation-conventions">GO annotation conventions</a>; if you have any questions on the guidelines given below, please contact the <a href="/form/contact-go">GO helpdesk</a>.</p>
<ul>
<li>
<h4><a href = "#tell">Tell us about your requirements</a></h4>
</li>
<ul><li>
<a href = "#small">I represent a small lab working on a biological area of research</a>
</li>
<li>
<a href = "#est">I have a set of ESTs and I would like to attach annotations</a>
</li>
<li>
<a href = "#genome">I have a genome sequence</a>
</li>
<li>
<a href = "#micro">I have a microarray data set</a>
</li>
<li>
<a href = "#peptide">I have a peptide sequence</a>
</li></ul>
<li>
<h4><a href = "#elect">Electronic annotation</a></h4>
</li>
<ul>
<li>
<a href = "#interpro">InterPro Mapping</a>
</li>
<li>
<a href = "#keyword">Keyword Mapping</a>
</li>
<li>
<a href = "#hamap">HAMAP</a>
</li>
<li>
<a href = "#ec">Enzyme Commission</a>
</li>
<li>
<a href = "#other">Other mappings</a>
</li>
<li>
<a href = "#blast">BLAST</a>
</li>
<li>
<a href = "#none">No similar sequences manually annotated?</a>
</li></ul>
<li>
<h4><a href = "#paper">Literature Annotation</a></h4>
</li>
<li>
<h4><a href = "#sequence">Sequence-based annotation</a></h4>
</li>
<ul>
<li>
<a href ="#general">General principles for sequence IDs</a>
</li>
<li>
<a href = "#flow">Annotation workflow</a>
</li>
</ul>
</ul>

<h3 id="tell">Tell us about your requirements</h3>

<ol>
<li><h5 id="small">I represent a small lab working on a biological area of research</h5>

<p>In this case, perhaps you have a list of your favorite genes and you wish to annotate them. You have a range of choices depending on what you are trying to achieve.</p>
<p>Please see the range of options below and choose the one that suits you best.</p>

<li><h5 id="est">I have a set of ESTs and I would like to attach annotations</h5>

<p>If you would ultimately like to send the annotations to the consortium for distribution, it is crucial that your EST clusters maintain the same identifiers over each round of re-clustering. One way to do this is to identify each cluster by one EST chosen from that cluster. There may be other good ways that we have not heard about. </p>

<p>Many EST clusters have stable identifiers with version updates (e.g. the UniGene database). These stable identifiers can be used for making GO associations.</p>

<p>Once you have your clusters and stable identifiers, follow the <a href = "#elect">IEA directions</a> for making electronic annotation.</p>

<p>You could also run BlastX, or run gene prediction programs and then BlastP. Running InterPro on the sequences will find the longest open reading frame.</p>

<li><h5 id="genome">I have a genome sequence</h5>

<p>You will already have assembled the genome sequence and made gene calls. Once you have the CDS sequences or predicted protein sequences, you can follow the instructions on <a href = "#elect">IEA annotation</a> and/or <a href = "#paper">Literature annotation</a>. Please see below.</p>

<li><h5 id="micro">I have a microarray data set</h5>

<p>The action you can take depends somewhat on your sequences.</p>
<ul>
<li>
Are they cDNAs or oligos?
</li>
<li>
Do they have identifiers? Which kind?
</li>
<li>
How do they relate to the genes? If you know which sequence relates to which characterised gene then it will be easy to transfer annotations over.
</li>
<li>
Do the genes have GO annotations? If they do not have full GO annotation from literature then you may like to apply for funding to annotate the genes yourself, or write to your Model Organism Database to ask them to do so.
</li>
<li>
Can you get more up-to-date annotations than those provided with your tool? It may be that you are seeing only the annotations that come from your proprietary microarray software provider. It is a good idea to ask how often they update their annotations and ontology structure, as these change from day to day, and there may be many more annotations available than you are seeing.
</li>
</ul>

<p>It is most likely that you will want to use mainly <a href = "#elect">electronic annotations</a>, supplemented with some <a href = "#paper">literature annotation</a> for those sequences that are not yet fully annotated.</p>

<li><h5 id="peptide">I have a peptide sequence</h5>
<ul>
<li>
Do you know what gene it is?
</li>
<li>
Can you map it to a UniProtKB or MOD identifier?
</li>
<li>
Does this identifier have GO annotation?
</li>
</ul>

<p>If it doesn't, you can request that it be annotated (it helps if you provide literature associated with this gene product). If you cannot map it to a UniProtKB or MOD identifier, then you can make your own GO annotation by any of the electronic or <a href = "#sequence">ISS methods</a> illustrated below.</p>
</ol>
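The EST advice above (name each cluster after one chosen EST and keep that name across re-clustering) can be sketched in Python. This is a minimal illustration under hypothetical assumptions: clusters are disjoint sets of EST accessions, an existing representative is reused whenever it is still present, and otherwise the lexicographically smallest accession is chosen as a new representative.

```python
def assign_cluster_ids(new_clusters, previous_ids):
    """Name each new cluster after one member EST, reusing last round's IDs.

    new_clusters: list of disjoint sets of EST accessions.
    previous_ids: representative ESTs used as cluster IDs in the previous round.
    """
    assigned = {}
    for members in new_clusters:
        carried = sorted(m for m in members if m in previous_ids)
        # Keep an existing ID when possible; otherwise pick a new representative
        # (hypothetical rule: the smallest accession in the cluster).
        representative = carried[0] if carried else min(members)
        assigned[representative] = sorted(members)
    return assigned


ids = assign_cluster_ids(
    [{"EST001", "EST002", "EST099"}, {"EST010"}, {"EST050", "EST051"}],
    previous_ids={"EST001", "EST010"},
)
print(sorted(ids))  # ['EST001', 'EST010', 'EST050']
```

When two old clusters merge, only one ID survives; the retired ID (previous_ids minus the new keys) should be recorded so that annotations made to it can be tracked.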

<h3 id="elect">Electronic annotation</h3>

<p>Electronic annotation is fast and produces large amounts of annotation. Electronic annotations are rarely wrong, but tend to be less detailed. For example, electronic annotation is likely to tell you which of your genes are transcription factors but unlikely to tell you in great detail what process the gene controls. You may like to use this method if you have a new genome sequence to annotate, or a microarray with many thousands of sequences.</p>

<img src="/sites/default/files/public/diag-iea-overview.png" width="594" height="157" alt="diag-iea-overview.png" />

<p>This diagram illustrates some of the main ways of making electronic annotation. It should be read from the top down. The diagram shows sequences from UniProt having electronic GO annotation assigned by several computational methods. All of these methods involve use of mapping files. To learn more, see the <a href="/page/download-mappings">information on mappings of GO to other classification systems</a>.</p>
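All of the mapping-based methods follow the same pattern: parse a mapping file, then transfer the mapped GO IDs to every sequence carrying the external identifier. A minimal Python sketch, assuming the line layout `EXTERNAL_ID [label] > GO:term name ; GO:ID` used by GO's external2go mapping files and '!' comment lines. The InterPro identifier below is a placeholder; GO:0003677 is DNA binding.

```python
def parse_external2go(lines):
    """Build {external ID: set of GO IDs} from external2go-style lines."""
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # skip blank and comment lines
        left, _, right = line.partition(" > ")
        if not right:
            continue  # not a mapping line
        external_id = left.split()[0]
        go_id = right.rsplit(";", 1)[-1].strip()
        mapping.setdefault(external_id, set()).add(go_id)
    return mapping


def transfer(features, mapping):
    """features: {sequence ID: external IDs found on it, e.g. InterPro matches}."""
    return {
        seq: sorted({go for f in found for go in mapping.get(f, ())})
        for seq, found in features.items()
    }


mapping = parse_external2go([
    "!placeholder header",
    "InterPro:IPR000000 Example domain > GO:DNA binding ; GO:0003677",
])
print(transfer({"seq1": ["InterPro:IPR000000"], "seq2": []}, mapping))
# {'seq1': ['GO:0003677'], 'seq2': []}
```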

<ul>
<li id="interpro"><strong>InterPro Mapping</strong>
<p>With the InterPro mapping, it is possible to assign electronic GO annotation to your sequences based on InterPro domains and a number of other criteria. For example, if your sequence has a DNA binding domain, then it makes sense to electronically annotate it to the DNA binding function term. For more information on InterPro mapping, please see the information on InterProScan.</p>

<li id="keyword"><strong>UniProt Keyword Mapping</strong>
<p>This part of the diagram illustrates how sequences already categorized using the UniProt keyword mapping can have GO annotation automatically applied by transferring via the keyword mapping file.</p>

<li id="hamap"><strong>HAMAP</strong>
<p>HAMAP is a system that categorizes sequences based on family or subfamily characteristics and is applied to bacterial, archaeal and plastid-encoded proteins. GO annotation can be automatically applied to such sequences using the mapping file between HAMAP and GO.</p>

<li id="ec"><strong>Enzyme Commission</strong>
<p>The Enzyme Commission database categorizes enzymes by the reactions they catalyze. If your sequences are already categorized by EC number, then you can transfer GO annotations using the mapping file of EC to GO categories.</p>

<li id="other"><strong>Other mappings</strong>
<p>These are just a few examples of mapping files that can be used to transfer annotations to your sequence objects. <a href="/page/download-mappings">Many other mappings are available</a>, and if there is not a mapping file between GO and your current annotation system, we can assist you in making one.</p>

<li id="blast"><strong>BLAST</strong>
<p>You can also make electronic annotations by BLASTing your sequence against manually annotated sequences and transferring the GO annotations across to your sequence. The threshold of similarity in this process is up to you, and depends on your requirements.</p>

<li id="none"><strong>No similar sequences manually annotated?</strong>
<p>If your sequence is similar to other sequences that have been well characterized but not yet annotated from the literature, then one option is to carry out the <a href = "#paper">literature annotation</a> yourself and then transfer by <a href = "#elect">electronic methods</a>.</p>
</ul>
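The BLAST route can be sketched in Python as a filter over tabular BLAST+ output followed by a transfer of GO IDs from manually annotated subjects. It assumes the default `-outfmt 6` column order (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The identity and e-value thresholds are hypothetical defaults; as noted above, the threshold is up to you.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    percent_identity: float
    evalue: float


def parse_outfmt6(line):
    """Parse one line of BLAST+ tabular output (-outfmt 6, default columns)."""
    cols = line.rstrip("\n").split("\t")
    return Hit(cols[0], cols[1], float(cols[2]), float(cols[10]))


def transfer_by_similarity(hits, subject_annotations, min_identity=40.0, max_evalue=1e-10):
    """Copy GO IDs from sufficiently similar subjects to each query.

    Returns {query: {go_id: subject it was transferred from}}.
    The default thresholds are hypothetical.
    """
    transferred = {}
    for hit in hits:
        if hit.percent_identity < min_identity or hit.evalue > max_evalue:
            continue
        for go_id in subject_annotations.get(hit.subject, ()):
            # Keep the first supporting subject, e.g. for a with/from column.
            transferred.setdefault(hit.query, {}).setdefault(go_id, hit.subject)
    return transferred


hits = [
    parse_outfmt6("q1\tsubjA\t85.0\t100\t15\t0\t1\t100\t1\t100\t1e-30\t200"),
    parse_outfmt6("q1\tsubjB\t20.0\t100\t80\t0\t1\t100\t1\t100\t0.5\t30"),
]
annotated = {"subjA": ["GO:0003677"], "subjB": ["GO:0005634"]}
print(transfer_by_similarity(hits, annotated))  # {'q1': {'GO:0003677': 'subjA'}}
```

Recording the source subject for each transferred term keeps the annotation traceable to the manually annotated sequence it came from.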

<h3 id="paper">Literature Annotation</h3>

<p>Literature annotation involves capturing published information about the exact function of a gene product as GO annotations. To do this you must read the publications about the gene and record all the relevant information. This annotation is time-consuming but produces very high-quality, species-specific annotation, and brings the information about the gene product into a format in which it can be used in high-throughput experiments. This is an extremely worthwhile process in the long term. It may be best carried out by people who know the function of the gene product, and the associated biology, in great detail; for example, experimental scientists who are familiar with the published literature. If you are doing this, then you may like to write and suggest modifications to the ontology structure as well.</p>

<p>Below is a schematic diagram giving an introduction to the steps involved in literature-based GO annotation. If you are interested in carrying out literature-based annotation you can receive full training in the process by attending a GO annotation camp or by working with an individual GO Consortium annotation mentor.</p>

<img src="/sites/default/files/public/diag-literature-annot.png" width="720" height="540" alt="Literature Based Annotation" />
<p><a href="/sites/default/files/public/diag-literature-annot.png" title="Literature Based Annotation">View a larger version.</a></p>

<h3 id="sequence">Sequence-based annotation</h3>
<strong id="general">General principles for sequence IDs</strong>
<ul>
<li>
You must have stable identifiers for your objects.
</li>
<li>
You must provide information on what the object is, e.g. a protein, nucleotide, EST, <i>etc.</i> It doesn't matter if a nucleotide sequence is a gene, a genome, or an EST as long as it can be identified as such.
</li>
<li>
If a sequence identifier has become obsolete, there must be a mechanism in place for tracking down the replacement.
</li>
<li>
Your database must have an internal rule that object identifiers are never reused.
</li>
</ul>
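<p>The four principles above can be enforced by a very small registry. The sketch below is hypothetical (the class name, methods and <code>SEQ:</code> IDs are illustrative, not part of any GO tool): it records the type of every identifier ever issued, refuses to reissue one, and keeps an obsolete-to-replacement link so that old identifiers can always be resolved to the one now in use.</p>

```python
class IdentifierRegistry:
    """Hypothetical registry enforcing the sequence-ID principles above."""

    def __init__(self):
        self._types = {}        # every ID ever issued -> object type (protein, EST, ...)
        self._replaced_by = {}  # obsolete ID -> replacement ID

    def register(self, object_id, object_type):
        # Identifiers are never reused, even after they become obsolete.
        if object_id in self._types:
            raise ValueError(f"identifier {object_id!r} has already been issued")
        self._types[object_id] = object_type

    def object_type(self, object_id):
        """Say what the object is (protein, nucleotide, EST, ...)."""
        return self._types[object_id]

    def obsolete(self, old_id, new_id):
        """Mark old_id obsolete and record where its replacement can be found."""
        if old_id not in self._types or new_id not in self._types:
            raise KeyError("both identifiers must have been registered")
        if old_id == new_id:
            raise ValueError("an identifier cannot replace itself")
        self._replaced_by[old_id] = new_id

    def current(self, object_id):
        """Follow the replacement chain to the identifier now in use."""
        seen = set()
        while object_id in self._replaced_by:
            if object_id in seen:
                raise ValueError("cycle in replacement chain")
            seen.add(object_id)
            object_id = self._replaced_by[object_id]
        return object_id


if __name__ == "__main__":
    registry = IdentifierRegistry()
    for seq_id in ("SEQ:1", "SEQ:2", "SEQ:3"):
        registry.register(seq_id, "protein")
    registry.obsolete("SEQ:1", "SEQ:2")
    registry.obsolete("SEQ:2", "SEQ:3")
    print(registry.current("SEQ:1"))  # prints SEQ:3
```

<p>Keeping obsolete IDs in <code>_types</code> rather than deleting them is the design choice that makes the "never reused" rule checkable.</p>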

<strong id="flow">Annotation workflow</strong>

<p>The following diagram shows the standard operating procedure for sequence-based (<a href="/page/iss-inferred-sequence-or-structural-similarity/">ISS evidence code</a>) annotation used in the past at The Institute for Genomic Research (now <a href="http://www.jcvi.org/" target="blank">JCVI</a>).</p>

<img src="/sites/default/files/public/diag-tigr-annotation.png" width="623" height="987" alt="ISS Evidence Code at TIGR" />
<p><a href="/sites/default/files/public/diag-tigr-annotation.png" title="ISS Evidence Code at TIGR">View a larger version.</a></p>

[[Category: Annotation]] Pascale
Categories: GO Internal

Annotation conventions

GO wiki (new pages) - Mon, 09/17/2018 - 13:00

Pascale: Created page with " From http://geneontology.org/page/go-annotation-conventions#not TO BE REVIEWED <h4>Annotation Conventions</h4> This page contains guidelines which apply to all annotati..."

From http://geneontology.org/page/go-annotation-conventions#not
TO BE REVIEWED

<h4>Annotation Conventions</h4>
This page contains guidelines which apply to all annotation methods and are particularly useful for manual literature-based annotation. More information on annotation can be found in the introduction to <a href="/page/go-annotation-policies">GO Annotation Policies and Guidelines</a> and the <a href="/page/go-annotation-standard-operating-procedures">GO Annotation Standard Operating Procedures</a>.
See also the <a href="http://wiki.geneontology.org/index.php/Consortium_Meetings#annot">Annotation Camp minutes</a> for additional information, including examples, on annotation practices and recommendations.
<p>

<ul>
<li> <a href = "#general">General Recommendations </a> </li>
<li> <a href ="#dbobj">Database Objects </a></li>
<li> <a href ="#refs">References and Evidence </a></li>
<li><a href = "#qualifier"> Using the Qualifier column</a> </li>
<ul>
<li> <a href ="#not">NOT</a> </li>
<li> <a href ="#colocal">colocalizes_with </a></li>
<li> <a href = "#contri">contributes_to</a> </li>
<li> <a href = "#examples">Examples </a></li>
</ul>
<li> <a href ="#interactions">Annotating gene products that interact with other organisms</a> </li>
<ul>
<li> <a href = "#nomenclature">Nomenclature Conventions </a></li>
<li> <a href ="#newterms">Requesting new terms in the multi-organism process node</a> </li>
<li> <a href ="#procOther">Example: Performing a process with another organism </a></li>
<li> <a href ="#more">Example: Performing a process in more than one species </a></li>
<li> <a href ="#regulating">Example: Regulating a process in another organism </a></li>
</ul>
<li> <a href ="#downstream">Downstream Process guidelines</a> </li>
<ul>
<li> <a href = "#specificTerms">Requesting more specific terms for downstream processes </a></li>
<li> <a href ="#core">Annotating downstream processes for gene products involved in core or specific processes </a></li>
<li> <a href ="#ligand">Annotating downstream processes to gene products in a ligand-receptor signaling pathway </a></li>
<li> <a href ="#general">General ligand-receptor pathway</a></li>
<li> <a href ="#glucose">Regulation of glucose transport</a></li>
<li> <a href ="#note">General note on revision of annotation sets </a></li>
</ul>
<li><a href = "#binding">Binding guidelines </a></li>
<ul>
<li> <a href ="#substrates">Using terms that imply binding of substrates </a></li>
<li> <a href ="#general comment on protein binding">Protein binding annotations in the Gene Ontology</a></li>
<li> <a href ="#descriptive">Choosing more descriptive terms than 'protein binding'</a> </li>
<li> <a href ="#partners">Identifying binding partners using columns 8 and 16 </a></li>
<li> <a href ="#ontology">Ontology development for protein binding </a></li>
</ul>
<li> <a href ="#response">'Response to' guidelines </a></li>
<li><a href = "#regulationTerms">Use of Regulation Terms</a> </li>
<ul><li> <a href ="#background">Background </a></li>
<li> <a href ="#guide1">Guideline 1: Use existing biological knowledge to define the process. </a></li>
<li> <a href ="#guide2">Guideline 2: If you aren't sure, consider annotating to the parent process term.</a> </li>
<li> <a href ="#guide3">Guideline 3: Improve the ontology by defining, wherever possible, the beginning, middle, and end of a process. </a></li>
<li> <a href ="#guide4">Guideline 4: Revisit annotations when new knowledge becomes available. </a></li>
<li> <a href ="#guide5">Guideline 5: Annotations based on mutant phenotypes should take mechanism into account.</a> </li>
<li><a href ="#guide6">Guideline 6: Some gene products may be
annotated to both a process and
regulation of that process. </a></li></ul>
<li> <a href ="#txn">Use of Transcription related terms</a></li>
</ul>

<p>&nbsp;</p>

<h4><a name = "general">General recommendations</a></h4>
<ul>
<li> A gene product can be annotated to zero or more nodes of each ontology. </li>
<li> Annotation of a gene product to one ontology is independent of its annotation to other ontologies. </li>
<li> Annotate gene products in each species database to the most detailed level in the ontology that correctly describes the biology of the gene product. </li>
<li> Keep in mind that annotating to a term implies annotation to all parents via any path, so it is a good idea to check the parentage of a term before annotating (and request new terms or path corrections if necessary). </li>
<li> Uncertain knowledge of where a gene product operates should be denoted by annotating it to two nodes, one of which can be a parent of the other. For instance, a yeast gene product known to be in the nucleolus, but also experimentally observed in the nucleus generally, can be annotated to both nucleolus and nucleus in the cellular component ontology. Even though annotation to nucleolus alone implies that a gene product is also in the nucleus, annotate to both so as to indicate explicitly that it has been reported in the two locations. The two annotations may have the same or different supporting evidence. Reports of general and specific molecular function or biological process for a gene product can be handled the same way; for example, you may have direct experimental evidence (IDA) for DNA binding, but only mutant phenotype (IMP) evidence for the more specific function term transcription factor activity and the process transcription. You can also annotate to multiple nodes that conflict with each other if there are conflicting claims in the literature. </li>
<li> An individual gene product that is part of a complex can be annotated to terms that describe the action (function or process) of the complex. This practice is colloquially known as annotating 'to the potential of the complex', and is a way to capture information about what a complex does in the absence of database objects and identifiers representing complexes. For molecular function annotations, see also <a href ="#qualifier">Using the Qualifier column</a> below. </li>
<li> A gene product should be annotated with terms reflecting its normal activity and location. A function, process, or localization (component) observed only in a mutant or disease state is therefore not usually included. In some circumstances, however, what is "normal" is a matter of perspective, depending on the organism being annotated and on the point of view of the annotator. For example, many viruses use host proteins to carry out viral processes. The host protein is then doing something abnormal from the perspective of the host, but completely normal from the perspective of the virus. GO annotators handle these cases by including two taxon IDs in the Taxon column of the gene association file; see <a href ="#interactions">annotating gene products that interact with other organisms</a> for details. </li>
<li> The evidence code No Data (ND) should be used as an indicator of curation status to denote gene products for which no relevant information could be found. It distinguishes gene products with no data available from those that have not yet been annotated. For more details on the code and its usage, please consult the <a href = "/page/nd-no-biological-data-available">ND evidence code documentation</a>. </li>
</ul>
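<p>The recommendation to check a term's parentage follows from the rule that annotating to a term implies annotation to all of its parents via any path. A minimal sketch of that closure is shown below, assuming the ontology has been loaded into a plain <code>{term: [parent, ...]}</code> dictionary (how you load it is up to you). The graph fragment is deliberately simplified and hypothetical; real GO has more intermediate terms between nucleolus and nucleus.</p>

```python
def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links (is_a, part_of, ...)."""
    found = set()
    stack = list(parents.get(term, ()))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(parents.get(parent, ()))
    return found


def implied_terms(direct_terms, parents):
    """Direct annotations plus every term they imply via any path."""
    implied = set(direct_terms)
    for term in direct_terms:
        implied |= ancestors(term, parents)
    return implied


# Simplified, hypothetical fragment of the cellular component graph.
PARENTS = {
    "nucleolus": ["nucleus"],
    "nucleus": ["organelle"],
    "organelle": ["cellular_component"],
}

if __name__ == "__main__":
    print(sorted(implied_terms({"nucleolus"}, PARENTS)))
    # prints ['cellular_component', 'nucleolus', 'nucleus', 'organelle']
```

<p>Because the walk follows every parent, a term reached by two different paths is still counted once; this is why an unexpected parent anywhere in the graph will propagate to your gene product and is worth checking before annotating.</p>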
<h4><a name ="dbobj">Database Objects</a></h4>
Because a single gene may encode very different products with very different attributes, GO recommends associating GO terms with database objects representing gene products rather than genes. At present, however, many participating databases are unable to associate GO terms to gene products, and therefore use genes instead. If the database object is a gene, it is associated with all GO terms applicable to any of its products. See the <a href = "/page/go-annotation-file-gaf-format-20">annotation file format guide</a> for more information.
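<p>When the database object is a gene, its annotation set is the union of what applies to any of its products. The hypothetical sketch below (function name, isoform IDs and placeholder GO terms are illustrative) shows that roll-up and also makes visible what is lost: the gene-level view no longer records which product carried which term.</p>

```python
from collections import defaultdict


def gene_level_annotations(product_annotations, product_to_gene):
    """Associate each gene with every GO term applicable to any of its products.

    product_annotations: {product_id: iterable of GO IDs}
    product_to_gene:     {product_id: gene_id}
    """
    by_gene = defaultdict(set)
    for product_id, go_ids in product_annotations.items():
        by_gene[product_to_gene[product_id]].update(go_ids)
    return {gene_id: sorted(go_ids) for gene_id, go_ids in by_gene.items()}


if __name__ == "__main__":
    # Hypothetical isoforms of one gene with different attributes.
    products = {"geneA-isoform1": {"GO:term_x"}, "geneA-isoform2": {"GO:term_y"}}
    genes = {"geneA-isoform1": "geneA", "geneA-isoform2": "geneA"}
    print(gene_level_annotations(products, genes))
    # prints {'geneA': ['GO:term_x', 'GO:term_y']}
```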
<h4><a name = "refs">References and Evidence</a></h4>
Every annotation must be attributed to a source, which may be a literature reference, another database or a computational analysis.
The annotation must indicate what kind of evidence is found in the cited source to support the association between the gene product and the GO term. A simple controlled vocabulary of evidence codes is used to capture this; please see the <a href = "/page/guide-go-evidence-codes">GO evidence code documentation</a> for more information on the meaning and use of the evidence codes.
<h4><a name = "qualifier">Using the Qualifier column</a></h4>
The Qualifier column is used for flags that modify the interpretation of an annotation. Allowable values are <b> NOT</b>, contributes_to, and colocalizes_with.
<ul><li><h5><a name = "not">NOT</a></h5></li>
<blockquote><b> NOT </b> may be used with terms from any of the three ontologies.<br></blockquote>
<b> NOT </b> is used to make an explicit note that the gene product is not associated with the GO term. This is particularly important in cases where associating a GO term with a gene product should be avoided (but might otherwise be made, especially by an automated method). For example, if a protein has sequence similarity to an enzyme (whose activity is GO:nnnnnnn), but has been shown experimentally not to have the enzymatic activity, it can be annotated as <b> NOT </b> GO:nnnnnnn. (Note: in an email exchange from Sept. 2003 this phenomenon was referred to as "sequence dissimilarity.")
<b> NOT </b> can also be used when a cited reference explicitly states that the association does not hold (e.g. "our favorite protein is not found in the nucleus"). Prefixing a GO ID with the string <b> NOT </b> allows annotators to state that a particular gene product is <b> NOT </b> associated with a particular GO term. This usage of <b> NOT </b> was introduced to allow curators to document conflicting claims in the literature.
Note that <b> NOT </b> is used when a GO term might otherwise be expected to apply to a gene product, but an experiment, sequence analysis, etc. proves otherwise. (It is not generally used for negative or inconclusive experimental results.)
<li><h5><a name="colocal">colocalizes_with</a></h5></li>
<blockquote>colocalizes_with may be used only with cellular component terms.</blockquote>
Gene products that are transiently or peripherally associated with an organelle or complex may be annotated to the relevant cellular component term, using the colocalizes_with qualifier. This qualifier may also be used in cases where the resolution of an assay is not accurate enough to say that the gene product is a bona fide component member.
Example (from <i> Schizosaccharomyces pombe</i>):
Clp1p relocalizes from the nucleolus to the spindle and site of cell division; i.e. it is associated transiently with the spindle pole body and the contractile ring (evidence from GFP fusion). Clp1p is annotated to spindle pole body ; GO:0005816 and contractile ring ; GO:0005826, using the colocalizes_with qualifier in both cases.
<li><h5><a name="contri">contributes_to</a></h5></li>
<blockquote>contributes_to may be used only with molecular function terms.</blockquote>
As noted above, an individual gene product that is part of a complex can be annotated to terms that describe the function of the complex. Many such function annotations should use the qualifier contributes_to:
Annotating individual gene products according to attributes of a complex is especially useful for molecular function annotations in cases where a complex has an activity, but not all of the individual subunits do. (For example, there may be a known catalytic subunit and one or more additional subunits, or the activity may only be present when the complex is assembled.) Molecular function annotations of complex subunits that are not known to possess the activity of the complex must include the entry contributes_to in the Qualifier column. The contributes_to qualifier should not be used in biological process annotations. All gene products annotated using contributes_to must also be annotated to a cellular component term representing the complex that possesses the activity.
Annotations using contributes_to will often use the evidence code IC, but other codes may be used as well.
Note that contributes_to is not needed to annotate a catalytic subunit. Furthermore, contributes_to may be used for any non-catalytic subunit, whether the subunit is essential for the activity of the complex or not.
<li><h5><a name="examples">Examples</a></h5></li>
<ul>
<li> Subunits of nuclear RNA polymerases: none of the individual subunits has RNA polymerase activity, yet all of these subunits are annotated to DNA-dependent RNA polymerase activity (with the contributes_to qualifier), to capture the activity of the complex. </li>
<li> ATP citrate lyase (ACL) in <i> Arabidopsis</i>: it is a heterooctamer composed of two types of subunits, ACLA and ACLB, in an A(4)B(4) stoichiometry. Neither subunit expressed alone gives ACL activity, but co-expression results in ACL activity. Both subunits can be annotated to ATP citrate lyase activity. </li>
<li> eIF2: has three subunits (alpha, beta, gamma); one binds GTP; one binds RNA; the whole complex binds the ribosome (all three subunits are required for ribosome binding). So one subunit is annotated to GTP binding and one to RNA binding without qualifiers, and all three are annotated to ribosome binding, with the contributes_to qualifier. And all three are annotated to the component term for eIF2 complex. </li>
</ul></ul>
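<p>The constraints on the Qualifier column can be checked mechanically. The sketch below is an assumption-laden illustration, not a GO tool: it treats the qualifier as the pipe-separated value of GAF column 4 and the aspect as the GAF column 9 code (F, P or C), and flags unknown values, colocalizes_with outside cellular component, contributes_to outside molecular function, and contributes_to on an object with no positive cellular component annotation. It cannot tell whether that component annotation is actually the complex, so that part still needs a curator. Subunit names and terms are hypothetical.</p>

```python
from dataclasses import dataclass

ALLOWED_QUALIFIERS = {"NOT", "contributes_to", "colocalizes_with"}
# Qualifiers restricted to a single ontology (GAF aspect codes: F, P, C).
REQUIRED_ASPECT = {"contributes_to": "F", "colocalizes_with": "C"}


@dataclass(frozen=True)
class Annotation:
    object_id: str
    term: str            # GO term name or ID
    aspect: str          # "F" function, "P" process, "C" component
    qualifier: str = ""  # pipe-separated, e.g. "NOT|colocalizes_with"


def qualifier_problems(annotations):
    """Return human-readable problems with Qualifier-column usage."""
    annotations = list(annotations)
    # Objects with at least one positive (non-NOT) cellular component annotation.
    with_component = {
        a.object_id
        for a in annotations
        if a.aspect == "C" and "NOT" not in a.qualifier.split("|")
    }
    problems = []
    for a in annotations:
        for q in filter(None, a.qualifier.split("|")):
            if q not in ALLOWED_QUALIFIERS:
                problems.append(f"{a.object_id}/{a.term}: unknown qualifier {q!r}")
            elif q in REQUIRED_ASPECT and a.aspect != REQUIRED_ASPECT[q]:
                problems.append(f"{a.object_id}/{a.term}: {q} requires aspect {REQUIRED_ASPECT[q]}")
            if q == "contributes_to" and a.object_id not in with_component:
                problems.append(f"{a.object_id}/{a.term}: contributes_to without a component annotation")
    return problems


if __name__ == "__main__":
    complex_annotations = [
        Annotation("subunit_1", "GTP binding", "F"),
        Annotation("subunit_1", "ribosome binding", "F", "contributes_to"),
        Annotation("subunit_1", "eIF2 complex", "C"),
        Annotation("subunit_2", "ribosome binding", "F", "contributes_to"),  # complex missing
    ]
    for problem in qualifier_problems(complex_annotations):
        print(problem)
    # prints subunit_2/ribosome binding: contributes_to without a component annotation
```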
<h4><a name = "interactions">Annotating gene products that interact with other organisms</a></h4>
The majority of gene products act within the organism that encoded them. However, sometimes gene products encoded by one organism can act on or in other organisms. For example, in obligate parasitic species (including viruses), almost all their gene products will be interacting with their host organism. Interactions may also be between organisms of the same species: for example, the proteins used by bacteria to adhere to one another to form a biofilm.
For annotating gene products involved in these multi-organism interactions, there are special terms in the biological process ontology, under multi-organism process ; GO:0051704, and in the cellular component ontology, under other organism ; GO:0044215. More specific information can be found in the <a href = "/page/other-organisms-and-viruses">biological process documentation on multi-organism processes </a> and in the cellular component guidelines on host cell.
The species in the interaction are recorded in an annotation by using terms from this node and entering two taxon IDs in the Taxon column. The first taxon ID should be that of the species encoding the gene product, and the second should be the taxon of the other species in the interaction. Where the interaction is between organisms of the same species, both taxon IDs should be the same. The taxon column of the annotation file is described in more detail in the <a href = "/page/go-annotation-file-gaf-format-20">annotation file format guide</a>.
An additional taxon ID should not be added in cases where the annotation is based on sequence or structural similarity.
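<p>The dual-taxon convention above (encoding species first, interacting species second, at most two entries) can be parsed and checked as follows. This is a minimal sketch assuming the <code>taxon:NNN|taxon:NNN</code> syntax shown in the examples below; the function names are hypothetical. It only treats ISS as a similarity-based code, which is an assumption: extend <code>SIMILARITY_CODES</code> if your group applies the same rule to related codes such as ISO or ISA.</p>

```python
SIMILARITY_CODES = {"ISS"}  # assumption: extend to match your group's practice


def parse_taxon_column(value):
    """Parse a GAF taxon column into (encoding_taxon, interacting_taxon_or_None)."""
    entries = value.split("|")
    if len(entries) not in (1, 2):
        raise ValueError(f"expected one or two taxon IDs, got {value!r}")
    taxa = []
    for entry in entries:
        prefix, sep, number = entry.partition(":")
        if prefix != "taxon" or not sep or not number.isdigit():
            raise ValueError(f"malformed taxon entry {entry!r}")
        taxa.append(int(number))
    # First entry: species encoding the gene product; second: the other species.
    return taxa[0], (taxa[1] if len(taxa) == 2 else None)


def check_taxon_column(value, evidence_code):
    """Reject a second taxon on annotations based on sequence or structural similarity."""
    encoding, other = parse_taxon_column(value)
    if other is not None and evidence_code in SIMILARITY_CODES:
        raise ValueError(f"{evidence_code} annotation should not carry a second taxon")
    return encoding, other


if __name__ == "__main__":
    # Taxon column from the Nod factor export protein example below.
    print(parse_taxon_column("taxon:382|taxon:3880"))  # prints (382, 3880)
```

<p>A same-species interaction is simply a column whose two entries are equal, so no special case is needed for it.</p>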
<ul><li><a name = "nomenclature">Nomenclature Conventions</a></li>
The terms 'symbiont' and 'host' may carry connotations of the nature of the interaction between two organisms, but in the Gene Ontology, they are used solely to differentiate between organisms on the basis of their size. The word <em> symbiont </em> is used to refer to the smaller organism in a symbiotic interaction; the larger organism is called the <em> host</em>. If the two organisms are the same size, the term will contain <em> other organism</em>. Note that parasites and pathogens are also referred to as 'symbionts', as symbiosis encompasses parasitism, commensalism and mutualism.
<li><a name = "newterms">Requesting new terms in the multi-organism process node</a></li>
Like the rest of GO, the multi-organism process node is not complete, and you will probably have to request some new terms when annotating your gene products. These should be submitted via the <a href = "http://sourceforge.net/p/geneontology/ontology-requests/">GO curator requests tracker</a> in the usual way. Here are a few points to bear in mind when requesting new terms, and annotating using this node:
<ul>
<li> A term name should make the direction of the interaction clear. An example of this is given below; induction of nodule morphogenesis in host would be used to annotate the symbiont gene product, while induction of nodule morphogenesis by symbiont is used to annotate the host genes. Both processes would be children of a common term nodulation. </li>
<li> If your gene product affects a 'normal' host process, you should always request a new term in the MOP node, rather than just annotating directly to the term in the 'normal' ontology. So for example, if your bacterial gene product regulates the ethylene-mediated signaling pathway in plants, rather than using dual taxon to annotate to regulation of ethylene mediated signaling pathway ; GO:0010104, you should instead request a new term regulation of ethylene mediated signaling pathway in host. </li>
<li> Where an organism subverts a 'normal' biological process, e.g. the transcription of viral DNA by host transcription machinery, host proteins should <strong> not </strong> be annotated to a 'symbiont' term like transcription of symbiont DNA, because such a process would be considered pathological, i.e. not 'normal' for the host. </li>
<li><a name = "procOther">Example: Performing a process with another organism</a></li>
Nod factor export proteins transfer nod factors out of the purple bacterium <i> Sinorhizobium meliloti </i> into the surrounding soil. Here they are detected by LysM nod factor receptor kinases in <i> Medicago truncatula </i> roots and initiate the process of nodulation.<br>
Annotation of Nod factor export ATP-binding protein I from <i> S. meliloti </i>
suggest a new term induction of nodule morphogenesis in host
<blockquote>nodulation ; GO:0009877
[p] induction of nodule morphogenesis in host ; GO:00new01</blockquote>
<i> Sinorhizobium meliloti </i> taxonomy ID: 382
<i> Medicago truncatula </i> taxonomy ID: 3880
<blockquote>protein name: Nod factor export ATP-binding protein I
GO term: induction of nodule morphogenesis in host ; GO:00new01
taxon column: taxon:382|taxon:3880</blockquote>
Annotation of LysM receptor kinase LYK3 precursor from <i> M. truncatula </i>
suggest a new term induction of nodule morphogenesis by symbiont
<blockquote>nodulation ; GO:0009877
[p] induction of nodule morphogenesis by symbiont ; GO:00new02</blockquote>
<i> Medicago truncatula </i> taxonomy ID: 3880
<i> Sinorhizobium meliloti </i> taxonomy ID: 382
<blockquote>protein name: LysM receptor kinase LYK3 precursor
GO term: induction of nodule morphogenesis by symbiont ; GO:00new02
taxon column: taxon:3880|taxon:382</blockquote>

<li><a name = "more">Example: Performing a process in more than one species</a></li>
The protein cardiotoxin from the southern Indonesian spitting cobra <i> Naja sputatrix </i> kills mammalian cells by cytolysis when it enters the host cell cytoplasm.
Annotation of cardiotoxin precursor, from <i> N. sputatrix </i>
use the GO terms cytolysis of cells of another organism ; GO:0051715 and host cell cytoplasm ; GO:0030430
<i> Naja sputatrix </i> taxonomy ID: 33626
<i> Mammalia </i> taxonomy ID: 40674
<blockquote>protein name: cardiotoxin precursor
GO term: cytolysis of cells of another organism ; GO:0051715
taxon column: taxon:33626|taxon:40674
protein name: cardiotoxin precursor
GO term: host cell cytoplasm ; GO:0030430
taxon column: taxon:33626|taxon:40674</blockquote>
<li><a name = "regulating">Example: Regulating a process in another organism</a></li>
Mosquito saliva contains D7 proteins, which bind biogenic amines in order to suppress hemostasis in humans.
Annotation of D7 protein long form, from <i> A. gambiae </i>
suggest a new term negative regulation of hemostasis in host
<blockquote>evasion of host defense response ; GO:0030682
[i] negative regulation of hemostasis in host ; GO:00new03</blockquote>
<i> Anopheles gambiae </i> taxonomy ID: 7165
<i> Homo sapiens </i> taxonomy ID: 9606
<blockquote>protein name: D7 protein long form
GO term: negative regulation of hemostasis in host ; GO:00new03
taxon column: taxon:7165|taxon:9606</blockquote>
</ul></ul>
<h4><a name = "downstream">Downstream Process guidelines</a></h4>
Where there is limited knowledge regarding the processes that a gene product is directly involved in, curators have often annotated to terms that describe processes downstream of the gene product's direct activity. Where more knowledge regarding a gene product's functional activity exists, curators need to make a judgement as to how to represent its direct activities and whether to continue to include downstream processes in the annotation set. Curators are encouraged to request more specific terms to describe how the gene product is involved in a downstream process, and also to re-evaluate the annotation set as more functional information becomes available. More detailed curator guidance is provided below.
<ul><li><a name = "specificTerms">Requesting more specific terms for downstream processes</a></li>
Where a specific, descriptive GO term does not exist (for instance to describe the involvement of a process in another process), curators are encouraged to request these terms to provide more specificity to their annotation.
For example, the "intent" of the growth factor BMP2 is to change the "state" of the cell, which is instrumental in cardiac cell differentiation. Requesting the new GO term BMP signaling involved in cardiac cell differentiation would therefore make it possible to describe how the gene product is involved in the downstream process of cardiac cell differentiation more precisely than annotating to the separate terms BMP signaling and cardiac cell differentiation.
<li><a name = "core">Annotating downstream processes for gene products involved in core or specific processes</a></li>
Curators should annotate to the experimental evidence in the paper. However, curator judgement should be used, taking into account what the curator knows about:
<ol>
<li>The background of the gene product: is it widely known to have a central role that causes it to affect multiple processes, or does it have few specific targets?</li>
<li>The quality of the experimental assays performed in the paper: are they fully explained, and is the evidence supplied convincing? (See separate guidelines for annotation of high-throughput experiments.)</li>
</ol>
Example 1. Gene product involved in core process.
Yeast RNA polymerase II subunit RPB2
RNA polymerase II subunit RPB2 has a core function of RNA polymerase activity, which has downstream effects on a large number of processes. However, curators should only annotate to the gene product's transcription activity, rather than the multiple downstream processes altered as a consequence of its activity.
Yeast spliceosome
In <i>S. cerevisiae</i>, mutations in several genes encoding components of the spliceosome result in translation defects. However, later work supplied evidence for the genes' involvement in mRNA splicing, <b>not</b> translation. Downstream effects on translation are to be expected, as many ribosomal protein transcripts are spliced in yeast. The curation decision was to remove annotations to the term translation for spliceosome component genes once data were available to describe the direct activity the genes contributed to.
Example 2. Gene product involved in core and specific process(es).
<i>S. pombe</i> gene Sre1
The <i>S. pombe</i> gene Sre1 is a transcriptional regulator of genes involved in heme and phospholipid biosynthesis. From reading <a href = "http://www.ncbi.nlm.nih.gov/pubmed/16537923">PMID:16537923</a>, the curator decided this information should be captured in the annotation. Therefore annotations were made to:
<ul>
<li>RNA polymerase II core promoter proximal region sequence-specific DNA binding</li>
<li>regulation of transcription, DNA-dependent or regulation of transcription from RNA polymerase II promoter</li>
<li>positive regulation of heme biosynthesis</li>
<li>positive regulation of phospholipid biosynthesis</li>
</ul>
In addition, in accordance with these guidelines for annotating downstream processes, we would recommend that new terms are requested for:
<ul>
<li>regulation of transcription involved in heme biosynthesis</li>
<li>regulation of transcription involved in phospholipid biosynthesis</li>
</ul>
<li><a name = "ligand">Annotating downstream processes to gene products in a ligand-receptor signaling pathway</a></li>
Curators should annotate ligand-receptor signaling pathways as shown in the following diagrams.
For a signaling pathway, the ligand is considered part of the pathway. Therefore a factor which limits or increases the availability of a ligand to a receptor should be annotated as regulating the ligand/receptor pathway.
N.b. Ongoing work to clarify the start and end of a signaling pathway in the definitions of GO terms will allow us to refine these guidelines.
<li><a name = "general">General ligand-receptor pathway</a></li><br>
<img src="/sites/default/files/public/diag-annot-ligand-recep-pwy.gif" width="496" height="741" alt="diag-annot-ligand-recep-pwy.gif" />
<br>
<dl>
<dt>Stimulus</dt>
<dd>regulation of signaling pathway</dd>
<dt>Ligand</dt>
<dd>signaling pathway</dd>
<dd>regulation of <var>other cellular</var> processes</dd>
<dt>Receptor</dt>
<dd>signaling pathway</dd>
<dd>regulation of <var>other cellular</var> processes</dd>
<dt>Signaling molecules</dt>
<dd>signaling pathway</dd>
<dd>regulation of <var>other</var> process(es)</dd>
<dd>regulation of gene-specific transcription</dd>
<dd>regulation of translation</dd>
<dd>(regulation of) transcription in response to <var>stimulus ligand</var></dd>
<dd>(regulation of) transcription involved in <var>other</var> process(es)</dd>
<dd>(regulation of) <var>other cellular</var> process(es)</dd>
<dt>Transcription factors*</dt>
<dd>signaling pathway</dd>
<dd>regulation of transcription involved in <var>other</var> process(es)</dd>
<dt>Target</dt>
<dd>cellular response to stimulus</dd>
<dd><var>other</var> process(es)</dd>
<dd>regulation of <var>other</var> processes</dd>
</dl>
<p>We would not consider annotating the core transcription machinery to the downstream (other) processes that the target is involved in unless the transcription factor is gene-specific, in which case we would annotate to regulation of transcription involved in <var>other</var> process(es).</p>
<li><a name = "glucose">Regulation of glucose transport</a></li>
<img src="/sites/default/files/public/diag-annot-gluc-transport.gif" width="331" height="586" alt="diag-annot-gluc-transport.gif" />
<dl>
<dt>Insulin (ligand)</dt>
<dd>insulin receptor signaling pathway</dd>
<dd>regulation of glucose transport/homeostasis</dd>
<dt>Insulin receptor (receptor)</dt>
<dd>insulin receptor signaling pathway</dd>
<dd>regulation of glucose transport/homeostasis</dd>
<dt>IRS1, PI3K, PDK1, PKC (signaling molecules)</dt>
<dd>insulin receptor signaling pathway</dd>
<dd>regulation of glucose transport/homeostasis</dd>
<dd>protein localization at cell surface (NTR: involved in response to insulin)</dd>
<dt>GLUT4 (target)</dt>
<dd>cellular response to insulin</dd>
<dd>glucose transport/homeostasis</dd>
</dl>
<li><a name = "note">General note on current status of revision of annotation sets</a></li>
If a gene product has limited experimental literature, such as a newly characterised protein, it is understandable that curators need to annotate to more general 'downstream' process terms that may, in fact, represent a phenotype.
However, as more functional information is published about a gene product, curators may decide to revise these annotations to downstream processes. Currently, different curation groups take different actions, based on considerations of user requirements and curation capacity:
<ol>
<li>Annotations to indirect/downstream processes may be removed, or updated to 'regulation' terms. This 'deleted' information is usually stored in the annotating group's phenotype database.
</li>
<li>Annotations to indirect/downstream processes are <strong>not</strong> removed because:
<ul>
<li>the downstream annotations are supported by good evidence, or the group wants to keep them as a history of annotation or to give a complete overview of knowledge about the gene product</li>
<li>the curation group does not have the resources to revise annotation sets, or does not have an alternative place to store the data</li>
</ul>
</li>
</ol>
Curation groups need to be aware that keeping annotations to downstream processes makes them a source of such data for other groups, who may have a different annotation philosophy.</ul></ul>
<h4><a name ="binding">Binding guidelines</a></h4>
<ul>
<li><a name = "substrates">Using terms that imply binding of substrates</a></li>
As many terms in the Molecular Function ontology implicitly or explicitly imply the binding of a chemical or protein, it is unnecessary to co-annotate a gene product to a term from the binding node of GO to describe the binding of substrates or products that are already adequately captured in the definition of the Molecular Function term. For instance, a protein with enzymatic activity MUST bind all of the substrates and products of the reaction it catalyzes. Similarly, a protein with transporter activity MUST bind the molecules it transports. Curators should capture the specifics as far as feasible and avoid redundant annotations. Annotate to a binding term when an experiment shows binding but does not demonstrate catalysis or transport. Curators should use their judgment to decide whether the interaction is physiologically relevant and capture information relevant to the in vivo situation.
<li><a name = "general comment on protein binding">Protein binding annotations in the Gene Ontology</a></li>
The Molecular Function (MF) ontology can be used to capture macromolecular interactions, such as protein-protein, protein-nucleic acid and protein-lipid interactions. While GO annotations are not considered to be a repository of all protein-protein interactions, many gene products are annotated to 'protein binding' (GO:0005515) or one of its child terms. In making these annotations, contributing groups may follow slightly different practices with respect to the types of experimental evidence used to support these inferences; e.g. some groups may use co-immunoprecipitation as supporting evidence for a protein binding annotation between two gene products, while others do not. However, all groups generally adhere to the principle that annotated protein binding interactions inform what is believed to be the normal biological role of a gene product, i.e. the protein-protein interactions support an author's hypothesis about how the gene product executes its molecular function in the context of a normal biological process. Protein-protein interactions for which there is not yet sufficient biological context are discouraged as sources of GO MF annotations.
<li><a name ="descriptive">Choosing more descriptive terms than 'protein binding'</a></li>
Child terms that describe a particular class of protein binding (e.g. GO:0030971:receptor tyrosine kinase binding) should be used in preference to the parent term GO:0005515 protein binding. The IPI evidence code should be used where possible for annotation of all protein-protein interactions, and the precise identity of the interacting protein should be captured in the 'with' column (8). At present a variety of identifiers can be used in the 'with' column (8) or the annotation extension column (16); see the <a href = "/page/go-annotation-file-gaf-format-20">GO Annotation File Format 2.0 Guide</a>.
<li><a name ="partners">Identifying binding partners using columns 8 and 16</a></li>
When a gene product is being annotated to a binding activity term, the 'with' column (8) and/or the annotation extension column (16) can be used to capture additional information about the identity of the binding partner of the gene product being annotated. To understand when to use column 8, column 16, or both, it is important to remember that entries in column 8 support the evidence used to infer the function, while entries in column 16 modify the GO term used in the GO_ID column (5). Curators also need to remember that the 'with' column (8) can be used with only a subset of evidence codes: IPI, IC, IEA, IGI, IMP or ISS; column 8 cannot be used with an IDA evidence code (see the <a href = "/page/guide-go-evidence-codes">evidence code documentation</a>).<br>
<strong> Examples of using the 'with' column (8) </strong><br>
The annotation of <em> Protein A to a GO binding term with evidence code IPI and Protein B in the 'with' column (8) </em> makes the statement that Protein A has the binding activity defined by the GO term and this function was inferred from interaction with Protein B; binding to Protein B isn't necessarily the in vivo function of Protein A.<br>
<ol>
<li> Column 8 can be used to make annotations based on experiments where the evidence for the function of Protein A binding Protein B in species X is based on binding of protein B from species Y. For example, the C. elegans Unc-115 protein was shown to bind to actin filaments made with actin purified from rabbit skeletal muscle. This would be annotated as GO:0051015:actin filament binding using an IPI evidence code and putting an accession for rabbit skeletal muscle actin, UniProtKB:P68135, in the 'with' column (8). This annotation makes the statement that C. elegans Unc-115 has the molecular function of actin filament binding inferred from experiments using rabbit actin. </li>
<li> Column 8 can be used to indicate that the evidence for binding a small molecule is based on an experiment using an analog. The annotation <b> Protein A GO:0005524:ATP binding IPI column 8 ATP-gamma-S </b> captures the information that ATP binding activity was inferred from binding of a non-hydrolyzable ATP analog. </li>
</ol><br>
<strong> Examples of using the annotation extension column (16) </strong>
<p>The annotation of <b> Protein A to a GO function term with Protein B and a has_participant relationship in the annotation extension column (16) </b> makes the statement that an in vivo target of Protein A is Protein B. This is equivalent to the post-compositional creation of a new child term.
<ul><li>The zebrafish Lnx2b protein (UniProtKB:A4VCF7) was shown to ubiquitinate zebrafish Dharma (UniProtKB:O93236) in <a href = "http://www.ncbi.nlm.nih.gov/pubmed/19668196">PMID:19668196</a>. Therefore Lnx2b can be annotated to GO:0004842:ubiquitin-protein ligase activity, adding has_input UniProtKB:O93236 in the annotation extension column (16). This annotation makes the statement that Dharma is a substrate of the ubiquitin-protein ligase activity of Lnx2b.
<li>The human ABCG1 protein has been annotated to GO:0034041 sterol-transporting ATPase activity with an IDA evidence code. The experiments in <a href = "http://www.ncbi.nlm.nih.gov/pubmed/17408620?dopt=Abstract">PMID:17408620</a> demonstrate that the target is 7-hydroxycholesterol; this information can be added to the annotation by including the ChEBI ID for 7-hydroxycholesterol, CHEBI:42989, in the annotation extension column (16), which post-composes the GO term '7-hydroxycholesterol-transporting ATPase activity'.</ul>
<li><a name = "ontology">Ontology development for protein binding</a></li>
Users who are specifically interested in gene products that bind a certain type of substrate or product should rely on future ontology development to improve their searches. The GOC is developing 'has_part' relationships that will link terms to the substrate binding they imply, and the existing GO will follow this new format; e.g. transcription factor activity will have a 'has_part' relationship to DNA binding rather than an 'is_a' relationship. Curators should request new 'has_part' relationships (and terms) if these do not exist.
</ul>
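The rules above for columns 8 and 16 can be illustrated with a small Python sketch. It models only the GAF columns discussed in this section; the `Annotation` class and `check_with_column` function are hypothetical, not part of any GO tool. The list of evidence codes allowed in column 8 is copied from the text above, and the Unc-115, Lnx2b and ABCG1 cases are reused. The IDA evidence code for Lnx2b is an assumption, as is writing the extension in `relation(ID)` form.

```python
from dataclasses import dataclass

# Evidence codes that, per the guideline above, may populate the 'with' column (8).
WITH_ALLOWED = {"IPI", "IC", "IEA", "IGI", "IMP", "ISS"}


@dataclass
class Annotation:
    """Hypothetical, simplified view of one GAF row (only the columns discussed)."""
    symbol: str          # column 3: DB object symbol
    go_id: str           # column 5: GO ID
    evidence: str        # column 7: evidence code
    with_from: str = ""  # column 8: supports the evidence
    extension: str = ""  # column 16: modifies the GO term


def check_with_column(ann: Annotation) -> list[str]:
    """Return the problems found with the 'with' column (empty list if none)."""
    problems = []
    if ann.with_from and ann.evidence not in WITH_ALLOWED:
        problems.append(f"{ann.evidence} annotations may not use the 'with' column")
    if ann.evidence == "IPI" and not ann.with_from:
        problems.append("IPI annotations should identify the binding partner in column 8")
    return problems


# Unc-115 binding rabbit actin: the interactor supports the evidence (column 8).
unc115 = Annotation("unc-115", "GO:0051015", "IPI", with_from="UniProtKB:P68135")
# Lnx2b ubiquitinating Dharma: the substrate modifies the term (column 16).
lnx2b = Annotation("lnx2b", "GO:0004842", "IDA",  # evidence code assumed
                   extension="has_input(UniProtKB:O93236)")
# A mistaken row: a ChEBI ID placed in column 8 of an IDA annotation.
misplaced = Annotation("ABCG1", "GO:0034041", "IDA", with_from="CHEBI:42989")

for ann in (unc115, lnx2b, misplaced):
    print(ann.symbol, check_with_column(ann) or "ok")
```

The check is deliberately narrow. It only tests which evidence codes may carry a 'with' value and whether an IPI annotation names its partner. Deciding whether a partner belongs in column 8 or column 16 still needs curator judgment.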
<h4><a name = "response">'Response to' guidelines</a></h4>
The definition of the top-level 'response to' terms has been updated to indicate where the response begins and ends:
Any process that results in a change in state or activity of a cell or organism as the result of a stimulus. The process begins with detection of the stimulus and ends with a change in state or activity of the cell or organism.
This change was made and released in ontology version 1.1960<br>
<b> Examples: </b>
<ol><li>GO:0050896 response to stimulus
Any process that results in a change in state or activity of a cell or an organism (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of a stimulus. The process begins with detection of the stimulus and ends with a change in state or activity of the cell or organism.
<li>GO:0051716 cellular response to stimulus
Any process that results in a change in state or activity of a cell (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of a stimulus. The process begins with detection of the stimulus by a cell and ends with a change in state or activity of the cell.</ol>
Advisory quality control check: high-level 'response to' terms should not be used directly for annotation unless additional information is supplied in column 16.
Use IEP, rather than IDA, when the experiment observes expression levels. Example: the annotation of <i>A. thaliana</i> <a href = "http://www.arabidopsis.org/servlets/TairObject?accession=locus:2182783">BIP1</a> from PMID:8888624 should use IEP rather than IDA.
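The advisory check on high-level 'response to' terms could be sketched as follows. This is a hypothetical illustration, not the GO Consortium's actual QC code. It only knows the two terms quoted above and treats any non-empty column 16 value as "additional information". The gene symbols and the extension string in the example are made-up placeholders.

```python
# The two top-level terms quoted above; a real QC check may cover more terms.
HIGH_LEVEL_RESPONSE_TERMS = {
    "GO:0050896",  # response to stimulus
    "GO:0051716",  # cellular response to stimulus
}


def flag_high_level_response(rows):
    """Yield (symbol, go_id) for rows that use a high-level 'response to'
    term with an empty annotation extension (column 16)."""
    for symbol, go_id, extension in rows:
        if go_id in HIGH_LEVEL_RESPONSE_TERMS and not extension.strip():
            yield symbol, go_id


# Hypothetical rows: (symbol, GO ID, column 16 value).
rows = [
    ("geneA", "GO:0050896", ""),
    ("geneB", "GO:0051716", "some_relation(SOME:ID)"),
]
for symbol, go_id in flag_high_level_response(rows):
    print(f"{symbol}: {go_id} used without column 16 information")
```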
<h4><a name ="regulationTerms">Use of Regulation Terms</a></h4>
<ul><li> <a name="#background">Background </a></li>
The GO Consortium recognized early in the development of the Biological Process ontology that some gene products participate directly in a process, while others regulate a process, positively and/or negatively. But how do curators know which of these terms to annotate to, and is it possible, for a given process, to annotate the same gene product to both a parent term and one of its associated regulation terms?
To begin to address these questions, here are some guidelines on when to annotate, or not, to regulation terms:
<li> <a name="#guide1">Guideline 1: Use existing biological knowledge to define the process. </a></li>
In order to determine whether a gene product participates in a process or regulates that process (or both) curators need to consider the nature of the process. Processes can be considered as ordered assemblies of molecular functions and every process has a beginning, middle, and end.
Use existing biological knowledge and the paper being curated as guides. Is there a defined pathway, i.e. distinct molecular functions, and have the gene products that perform those functions been identified? Does the gene product being annotated perform one of those functions or a function outside of the process that might start, stop, or change the rate at which the process proceeds?
In reality, the beginning, middle, and end of some processes will be easier to define than others. For example, signaling pathways, such as MAPK signaling, will be easier to define than broader, organismal-level processes such as embryonic development. Curators should use their judgement, based on the published literature, to guide their annotation.<br>
<strong>Example: Atg1</strong>
Saccharomyces cerevisiae Atg1 encodes a protein kinase that is involved in autophagy: "The process by which cells digest parts of their own cytoplasm; allows for both recycling of macromolecular constituents under conditions of cellular stress and remodeling the intracellular structure for cell differentiation."
Atg1 activity is critical for the induction of autophagy, specifically for formation of autophagic vacuoles. Should Atg1 be annotated to autophagic vacuole formation or regulation of autophagic vacuole formation? Authors have used language that could lead curators to make annotations to either term.
In this case, annotators need to consider the sum of what is known about the autophagic pathway and Atg1's role in that pathway.
Using that knowledge, SGD has annotated Atg1 to the parent process term, autophagic vacuole formation, because once Atg1 is active, the 'go' or 'no go' decision for autophagy has already been made. More upstream genes appear to actually be regulating the autophagic pathway.
http://wiki.geneontology.org/index.php/2010_GO_camp_Use_of_Regulation_issues#Example_2
<li> <a name="#guide2">Guideline 2: If you aren't sure, consider annotating to the parent process term. </a></li>
If the gene product performs one of the functions, annotate directly to the process. If the gene product regulates the process, annotate it to regulation of that process.
If you aren't sure what term to use, annotate to the parent process term. As more information about the process becomes available, you may be able to refine your annotations (see Guideline #4 below).
<li> <a name="#guide3">Guideline 3: Improve the ontology by defining, wherever possible, the beginning, middle, and end of a process. </a></li>
Wherever possible, include the beginning, middle, and end of a process in the corresponding term definition. This will help annotators choose the appropriate term for their annotations.
<li> <a name ="#guide4">Guideline 4: Revisit annotations when new knowledge becomes available. </a></li>
GO annotations should reflect the present state of biological knowledge. Therefore, as the understanding of a biological process improves, it may be necessary to revisit and refine existing annotations.
<li> <a name="#guide5">Guideline 5: Annotations based on mutant phenotypes should take mechanism into account.</a></li>
Mutant phenotypes are often used to make annotations to regulation terms because they fit the criteria of the term definition, i.e. authors report a change in the frequency, rate, or extent of a process.
However, when using IMP to make regulation annotations, it is important to consider several factors, including: 1) the assay type, 2) the nature of the alleles (null vs. reduction of function), and 3) the molecular identity of the gene product.
Again, if it isn't clear that a gene product is involved in regulation, it is better to annotate to the parent process term.
<strong>Example: muscle contraction and <i> C. elegans </i> mutants</strong>
In <i>C. elegans</i>, a number of genes can mutate to paralysis or slowed locomotion due to defects in muscle contraction. This includes genes that encode everything from myosin heavy chain to calcium channels to transcription factors. Depending upon the nature of the allele, sometimes the mutant phenotypes for the same gene can lead to both process and regulation terms. In this case, consideration of the process, the nature of the allele (complete or partial loss of function), and the molecular identity of the gene product can guide curators in making the appropriate annotation.
http://wiki.geneontology.org/images/4/47/Regulation_example.pdf
<li><a name ="#guide6">Guideline 6: Some gene products may be annotated to both a process and regulation of that process.</a></li>
Positive and negative feedback loops are an essential part of many signaling pathways.
If one member of a pathway regulates the activity of a <em> different </em> member of the pathway, it could be annotated to both the process and regulation of that process.
When annotating gene products involved in a signaling pathway, however, curators should not annotate gene products that directly activate the next gene product in the pathway to regulation of that pathway.
For example, MAPKK would not be annotated to positive regulation of MAPKKK cascade just because it phosphorylates and activates MAPK.
However, gene products that, for example, feed back onto earlier steps in the pathway may be annotated to both the parent process term and a regulation term.<br>
<strong>Example: ERK1/2</strong>
ERK1/2 activation requires activity of FRS2alpha which, in turn, is negatively regulated by activated ERK1/2.
Could ERK1/2 be annotated to both MAPKKK cascade and negative regulation of MAPKKK cascade?
<a href = "http://www.molbiolcell.org/content/21/4/664.full">Phosphoprotein Enriched in Astrocytes 15 kDa (PEA-15) Reprograms Growth Factor Signaling by Inhibiting Threonine Phosphorylation of Fibroblast Receptor Substrate 2{alpha}</a>
Cases where the presence/absence of one of the members of a pathway is limiting should not be annotated to regulation, e.g. if the amount of a receptor on the surface of a cell regulates the process, the receptor should <em> not </em> be annotated to the regulation term.</ul>
<h4><a name ="txn">Use of Transcription related terms</a></h4>

<p>The transcription branch of the ontology was overhauled in 2011 to remove any overlap between <strong>Function</strong> and <strong>Process</strong> terms and to accurately represent <strong>Function</strong> terms so they actually describe molecular activities (<em>how</em> something occurs). <a href = "http://gocwiki.geneontology.org/index.php/Proposals_to_overhaul_transcription_in_GO_-_2010" target="blank">You may read details of the overhaul here</a>. </p>

<p>These changes will consequently affect annotations. For example, if experiments indicate that a gene product is involved in regulating transcription but give no indication of <em>how</em> it acts, it would be appropriate to annotate that gene product only to <strong>Process</strong> terms. The <a href="ftp://ftp.geneontology.org/pub/go/www/txnAnnotationGuide.pdf" target="blank">Transcription Annotation Guide</a> is available to help curators annotate gene products using this new ontology structure.</p>


[[Category: Annotation]] Pascale
Categories: GO Internal

Authoritative Database Groups

GO wiki (new pages) - Mon, 09/17/2018 - 12:54

Pascale:


Moved from the Website - contents to be discussed

Authoritative Database Groups
Where two or more databases are submitting data on the same species the GO Consortium encourages the model whereby one database group collects all annotation data for that species, removes the redundant (duplicate) annotations, and then submits the total dataset to the central repository. This ensures that no redundant annotations will appear in the master dataset. The table below documents those species for which a single database group is responsible for collating and submitting annotations.
The format of the IDs used by these database groups can be found in the list of GO database abbreviations. For converting between different ID types, please see the tools for ID mapping on the GO wiki.
More information on authoritative database groups and avoiding redundancy can be found in the GO annotation policies and guidelines.
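The collect, de-duplicate and submit model described above can be sketched in Python. What counts as a redundant annotation is an assumption here. Two rows are treated as duplicates when they agree on the fields in the hypothetical `KEY_FIELDS` tuple, regardless of which group submitted them. All object IDs and references in the example are placeholders.

```python
from typing import Iterable

# Assumed definition of "the same annotation"; the submitting group is ignored.
KEY_FIELDS = ("object_id", "qualifier", "go_id", "reference",
              "evidence", "with_from", "extension")


def merge_sources(*sources: Iterable[dict]) -> list[dict]:
    """Collect rows from several groups annotating one species and keep
    only the first copy of each distinct annotation."""
    seen = set()
    merged = []
    for source in sources:
        for row in source:
            key = tuple(row.get(field, "") for field in KEY_FIELDS)
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged


group_a = [
    {"object_id": "X:1", "go_id": "GO:0005515", "reference": "REF:1",
     "evidence": "IPI", "with_from": "X:2", "assigned_by": "GroupA"},
]
group_b = [
    # The same statement as group A's row, submitted by a second group.
    {"object_id": "X:1", "go_id": "GO:0005515", "reference": "REF:1",
     "evidence": "IPI", "with_from": "X:2", "assigned_by": "GroupB"},
    {"object_id": "X:3", "go_id": "GO:0005524", "reference": "REF:2",
     "evidence": "IDA", "assigned_by": "GroupB"},
]
dataset = merge_sources(group_a, group_b)
print(len(dataset), "annotations submitted")
```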
<table summary="List of model organisms and the database group responsible for providing the annotations for that species">
<caption>
GO Consortium groups responsible for all annotations for a species
</caption>
<thead>
<tr>
<th>
Project name
</th>
<th>
Species
</th>
</tr>
</thead>
<tbody>
<tr>
<td>
Candida Genome Database
</td>
<td>
<ul>
<li>
<i>Candida albicans</i>, taxon:5476
</li>
</ul>
</td>
</tr>
<tr>
<td>
dictyBase
</td>
<td>
<ul>
<li>
<i>Dictyostelium</i>, taxon:5782
</li>
<li>
<i>Dictyostelium discoideum</i>, taxon:44689
</li>
<li>
<i>Dictyostelium discoideum AX2</i>, taxon:366501
</li>
<li>
<i>Dictyostelium discoideum AX4</i>, taxon:352472
</li>
</ul>
</td>
</tr>
<tr>
<td>
FlyBase
</td>
<td>
<ul>
<li>
<i>Drosophila melanogaster</i>, taxon:7227 (fruit fly)
</li>
</ul>
</td>
</tr>
<tr>
<td>
Leishmania major GeneDB
</td>
<td>
<ul>
<li>
<i>Leishmania major</i>, taxon:5664
</li>
</ul>
</td>
</tr>
<tr>
<td>
Plasmodium falciparum GeneDB
</td>
<td>
<ul>
<li>
<i>Plasmodium falciparum</i>, taxon:5833 (malaria parasite P. falciparum)
</li>
</ul>
</td>
</tr>
<tr>
<td>
PomBase
</td>
<td>
<ul>
<li>
<i>Schizosaccharomyces pombe</i>, taxon:4896 (fission yeast)
</li>
</ul>
</td>
</tr>
<tr>
<td>
Trypanosoma brucei GeneDB
</td>
<td>
<ul>
<li>
<i>Trypanosoma brucei TREU927</i>, taxon:185431
</li>
</ul>
</td>
</tr>
<tr>
<td>
Glossina morsitans GeneDB
</td>
<td>
<ul>
<li>
<i>Glossina morsitans morsitans</i>, taxon:37546
</li>
</ul>
</td>
</tr>
<tr>
<td>
goa_chicken, UniProtKB-GOA
</td>
<td>
<ul>
<li>
<i>Gallus gallus</i>, taxon:9031 (chicken)
</li>
<li>
<i>Gallus gallus bankiva</i>, taxon:208525
</li>
<li>
<i>Gallus gallus gallus</i>, taxon:208526
</li>
<li>
<i>Gallus gallus murghi</i>, taxon:400035
</li>
<li>
<i>Gallus gallus spadiceus</i>, taxon:208524
</li>
</ul>
</td>
</tr>
<tr>
<td>
goa_cow, UniProtKB-GOA
</td>
<td>
<ul>
<li>
<i>Bos taurus</i>, taxon:9913 (cattle)
</li>
<li>
<i>Bos taurus X Bison bison</i>, taxon:297284 (beefalo)
</li>
<li>
<i>Bos taurus x Bos indicus</i>, taxon:30523
</li>
</ul>
</td>
</tr>
<tr>
<td>
goa_human, UniProtKB-GOA
</td>
<td>
<ul>
<li>
<i>Homo sapiens</i>, taxon:9606 (human)
</li>
</ul>
</td>
</tr>
<tr>
<td>
gramene_oryza, Gramene
</td>
<td>
<ul>
<li>
<i>Oryza alta</i>, taxon:52545
</li>
<li>
<i>Oryza australiensis</i>, taxon:4532
</li>
<li>
<i>Oryza barthii</i>, taxon:65489
</li>
<li>
<i>Oryza brachyantha</i>, taxon:4533
</li>
<li>
<i>Oryza coarctata</i>, taxon:77588
</li>
<li>
<i>Oryza eichingeri</i>, taxon:29689
</li>
<li>
<i>Oryza glaberrima</i>, taxon:4538 (African rice)
</li>
<li>
<i>Oryza glumipatula</i>, taxon:40148
</li>
<li>
<i>Oryza grandiglumis</i>, taxon:29690
</li>
<li>
<i>Oryza granulata</i>, taxon:110450
</li>
<li>
<i>Oryza latifolia</i>, taxon:4534
</li>
<li>
<i>Oryza longiglumis</i>, taxon:83309
</li>
<li>
<i>Oryza longistaminata</i>, taxon:4528
</li>
<li>
<i>Oryza malampuzhaensis</i>, taxon:127571
</li>
<li>
<i>Oryza meridionalis</i>, taxon:40149
</li>
<li>
<i>Oryza meyeriana</i>, taxon:83307
</li>
<li>
<i>Oryza minuta</i>, taxon:63629
</li>
<li>
<i>Oryza nivara</i>, taxon:4536
</li>
<li>
<i>Oryza officinalis</i>, taxon:4535
</li>
<li>
<i>Oryza punctata</i>, taxon:4537
</li>
<li>
<i>Oryza rhizomatis</i>, taxon:65491
</li>
<li>
<i>Oryza ridleyi</i>, taxon:83308
</li>
<li>
<i>Oryza rufipogon</i>, taxon:4529
</li>
<li>
<i>Oryza sativa</i>, taxon:4530 (rice)
</li>
<li>
<i>Oryza sativa Indica Group</i>, taxon:39946
</li>
<li>
<i>Oryza sativa Japonica Group</i>, taxon:39947
</li>
<li>
<i>Oryza schlechteri</i>, taxon:110451
</li>
<li>
<i>Oryza sp. IRGC 105360</i>, taxon:364100
</li>
<li>
<i>Oryza sp. IRGC 81916</i>, taxon:364099
</li>
<li>
<i>Panicum</i>, taxon:4539
</li>
</ul>
</td>
</tr>
<tr>
<td>
Mouse Genome Informatics
</td>
<td>
<ul>
<li>
<i>Mus musculus</i>, taxon:10090 (house mouse)
</li>
</ul>
</td>
</tr>
<tr>
<td>
PAMGO_Atumefaciens
</td>
<td>
<ul>
<li>
<i>Agrobacterium tumefaciens str. C58</i>, taxon:176299
</li>
</ul>
</td>
</tr>
<tr>
<td>
Rat Genome Database
</td>
<td>
<ul>
<li>
<i>Rattus norvegicus</i>, taxon:10116 (Norway rat)
</li>
</ul>
</td>
</tr>
<tr>
<td>
Saccharomyces Genome Database
</td>
<td>
<ul>
<li>
<i>Saccharomyces cerevisiae</i>, taxon:4932 (baker's yeast)
</li>
<li>
<i>Saccharomyces cerevisiae RM11-1a</i>, taxon:285006
</li>
<li>
<i>Saccharomyces cerevisiae YJM789</i>, taxon:307796
</li>
<li>
<i>Saccharomyces cerevisiae var. diastaticus</i>, taxon:41870
</li>
</ul>
</td>
</tr>
<tr>
<td>
The Arabidopsis Information Resource
</td>
<td>
<ul>
<li>
<i>Arabidopsis thaliana</i>, taxon:3702 (thale cress)
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Aphagocytophilum
</td>
<td>
<ul>
<li>
<i>Anaplasma phagocytophilum HZ</i>, taxon:212042
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Banthracis
</td>
<td>
<ul>
<li>
<i>Bacillus anthracis str. Ames</i>, taxon:198094
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Cburnetii
</td>
<td>
<ul>
<li>
<i>Coxiella burnetii RSA 493</i>, taxon:227377
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Chydrogenoformans
</td>
<td>
<ul>
<li>
<i>Carboxydothermus hydrogenoformans Z-2901</i>, taxon:246194
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Cjejuni
</td>
<td>
<ul>
<li>
<i>Campylobacter jejuni RM1221</i>, taxon:195099
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Cperfringens
</td>
<td>
<ul>
<li>
<i>Clostridium perfringens ATCC 13124</i>, taxon:195103
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Cpsychrerythraea
</td>
<td>
<ul>
<li>
<i>Colwellia psychrerythraea 34H</i>, taxon:167879
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Dethenogenes
</td>
<td>
<ul>
<li>
<i>Dehalococcoides ethenogenes 195</i>, taxon:243164
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Echaffeensis
</td>
<td>
<ul>
<li>
<i>Ehrlichia chaffeensis str. Arkansas</i>, taxon:205920
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Gsulfurreducens
</td>
<td>
<ul>
<li>
<i>Geobacter sulfurreducens PCA</i>, taxon:243231
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Hneptunium
</td>
<td>
<ul>
<li>
<i>Hyphomonas neptunium ATCC 15444</i>, taxon:228405
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Lmonocytogenes
</td>
<td>
<ul>
<li>
<i>Listeria monocytogenes str. 4b F2365</i>, taxon:265669
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Mcapsulatus
</td>
<td>
<ul>
<li>
<i>Methylococcus capsulatus str. Bath</i>, taxon:243233
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Nsennetsu
</td>
<td>
<ul>
<li>
<i>Neorickettsia sennetsu str. Miyayama</i>, taxon:222891
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Pfluorescens
</td>
<td>
<ul>
<li>
<i>Pseudomonas fluorescens Pf-5</i>, taxon:220664
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Psyringae
</td>
<td>
<ul>
<li>
<i>Pseudomonas syringae pv. tomato str. DC3000</i>, taxon:223283
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Psyringae_phaseolicola
</td>
<td>
<ul>
<li>
<i>Pseudomonas syringae pv. phaseolicola 1448A</i>, taxon:264730
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Soneidensis
</td>
<td>
<ul>
<li>
<i>Shewanella oneidensis MR-1</i>, taxon:211586
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Spomeroyi
</td>
<td>
<ul>
<li>
<i>Silicibacter pomeroyi DSS-3</i>, taxon:246200
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Tbrucei_chr2
</td>
<td>
<ul>
<li>
<i>Trypanosoma brucei</i>, taxon:5691
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Vcholerae
</td>
<td>
<ul>
<li>
<i>Vibrio cholerae O1 biovar El Tor</i>, taxon:686
</li>
</ul>
</td>
</tr>
<tr>
<td>
WormBase database of nematode biology
</td>
<td>
<ul>
<li>
<i>Caenorhabditis elegans</i>, taxon:6239
</li>
</ul>
</td>
</tr>
<tr>
<td>
Zebrafish Information Network
</td>
<td>
<ul>
<li>
<i>Danio rerio</i>, taxon:7955 (zebrafish)
</li>
</ul>
</td>
</tr>
</tbody>
</table>

[[Category:Managers]] Pascale
Categories: GO Internal

Guidelines for new Biological Processes

GO wiki (new pages) - Fri, 09/14/2018 - 07:40

Pascale: Created page with " http://www.geneontology.org/page/biological-process-ontology-guidelines Category:OntologyCategory:GO Editors"


http://www.geneontology.org/page/biological-process-ontology-guidelines

[[Category:Ontology]][[Category:GO Editors]] Pascale
Categories: GO Internal

Guidelines for new Cellular Components

GO wiki (new pages) - Fri, 09/14/2018 - 07:28

Pascale: Created page with " * Requesting a New Complex ID from IntAct Category:OntologyCategory:GO Editors"


* [[Requesting a New Complex ID from IntAct]]

[[Category:Ontology]][[Category:GO Editors]] Pascale
Categories: GO Internal

Guidelines for new Molecular Functions

GO wiki (new pages) - Fri, 09/14/2018 - 07:27

Pascale:



* [[Curator Guide: Enzymes and Reactions]] (being reviewed)
* [[Notes on specific terms]] (being reviewed)
* Specificity of activities

https://github.com/geneontology/go-ontology/issues/12257 @thomaspd commented that "An activity should not be named after a gene product, as a gene product could potentially have multiple molecular functions and not just the one it's named after. Also, protein phosphatase inhibitor means "directly [i.e. via direct physical interaction] inhibits some protein phosphatase activity", not "directly inhibits some protein named protein phosphatase 2A"-- properly speaking you don't inhibit a protein, you inhibit the activity of a protein."

This was discussed briefly during the GO editors call: http://wiki.geneontology.org/index.php/Ontology_meeting_2016-01-28#TermGenie_review_queue

Although attendees generally agreed with the above, @tberardini pointed to cases in which we may want to keep the specificity, such as http://amigo.geneontology.org/amigo/term/GO:0018024#display-lineage-tab

* [[Editorial-type_sections_copied_over_from_the_GO_website_ontology_documentation:_MF]]




[[Category:Ontology]][[Category:GO Editors]] Pascale
Categories: GO Internal

GO import files

GO wiki (new pages) - Fri, 09/14/2018 - 06:54

Pascale:

=Protocol for managing import files=
[[Adding_Terms_and_Regenerating_the_Import_Files]]

=What is an import file?=
Some terms from external ontologies are used in equivalence axioms and subclass assertions in certain GO classes. External classes are imported as needed by generating import files that contain the necessary class hierarchy from the external ontology. The external class hierarchy is used for reasoning in GO.
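As a simplified illustration of what an import file contains, the sketch below keeps the external classes GO needs plus all of their ancestors, so that the imported hierarchy is complete enough for reasoning. It is not the procedure used to build the real import files (see the protocol page linked above). The class IDs and the toy hierarchy are hypothetical.

```python
from collections import deque


def import_closure(seeds, parents):
    """Return the seed classes plus every ancestor reachable through
    `parents`, a mapping from a class ID to its direct superclass IDs."""
    keep = set()
    queue = deque(seeds)
    while queue:
        cls = queue.popleft()
        if cls in keep:
            continue  # already visited via another path in the hierarchy
        keep.add(cls)
        queue.extend(parents.get(cls, ()))
    return keep


# Toy external hierarchy with multiple inheritance (hypothetical IDs).
parents = {
    "EXT:glucose": {"EXT:hexose", "EXT:aldose"},
    "EXT:hexose": {"EXT:monosaccharide"},
    "EXT:aldose": {"EXT:monosaccharide"},
    "EXT:fructose": {"EXT:hexose"},
}
# Only glucose is referenced by GO axioms here, so fructose is not imported.
print(sorted(import_closure({"EXT:glucose"}, parents)))
```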

===Direct imports===
* chebi_import.owl
* caro_import.owl
* cl_import.owl
* go-brige.owl
* fao_import.owl
* go-gci.owl
* go-taxon-groupings.owl
* ncbitaxon_import.owl
* oba_import.owl
* pato_import.owl
* po_import.owl
* pr_import.owl
* ro_import.owl
* ro_pending.owl
* so_import.owl
* uberon_import.owl
* x-disjoint.owl


===Indirect imports===
* taxslim.owl


[[Category:GO Editors]][[Category:Ontology]][[Category:Editor_Guide_2018]] Pascale
Categories: GO Internal