Feed aggregator

2018 Montreal GOC users workshop

GO wiki (new pages) - Wed, 09/19/2018 - 12:02

Pascale:

=Gene Ontology workshop 2018 =

The Gene Ontology (GO) is one of the most widely used bioinformatics resources in the world.
Because of the staggering complexity of biological systems and the ever-increasing size of datasets to analyze, biomedical research is becoming increasingly dependent on knowledge stored in computable form.
The GO project provides the most comprehensive resource currently available for computable knowledge regarding the functions of genes and gene products.

The GO Consortium develops an up-to-date, comprehensive, computational model of biological systems, from the molecular level to larger pathways, cellular and organism-level systems.
GO defines an ontology composed of classes (or concepts) that can be used to describe gene function, and of relationships between these concepts.
The highly structured nature of GO makes it ideally amenable to computational analysis.

==Who should attend==
The meeting aims to bring together students and researchers who:
* use GO in any aspect of their work
* want to expand their knowledge of GO
* want to acquire hands-on experience with the GO resource
* want to exchange ideas on problems and solutions, and to foster collaborations
The workshop will provide an opportunity to discuss projects with other users as well as with developers of the GO.

==Schedule==
The morning session will be a training session, with presentations on the GO resources and on specific tools by GO Consortium members. The afternoon session will be dedicated to oral and poster presentations from participants.

==Abstract submission==
We are inviting interested participants to submit abstracts on any project related to GO or using GO directly or indirectly, for example in data analysis applications, data display, text mining, etc.

==When==
October 16th, 2018, 9:00 to 18:00

==Where==
Pavillion JF Kennedy (PK), University of Quebec in Montreal (UQAM), Montreal, Canada

==Registration link==
http://eventbrite.com/e/gene-ontology-workshop-2018-tickets-49865293435

==Abstract submission==
To submit an abstract: https://easychair.org/cfp/GOW2018

==Important dates==
Sept 28: Abstract submission deadline
Oct 4: Acceptance notification
Oct 16: GO workshop at UQAM

==Organizing committee==
===Chairs===
* Pascale Gaudet, GO Central/SIB Swiss Institute of Bioinformatics
* Laurent-Philippe Albou, University of Southern California

===Scientific committee===
* Paul Thomas, University of Southern California
* Chris Mungall, Berkeley University
* Judith Blake, Jackson Laboratories
* Paul Sternberg, CalTech
* Michael Cherry, Stanford University
* Susanna Lewis, Berkeley University

===Local chairs===
* Marie-Jean Meurs, UQAM
* Hayda Almeida, UQAM

[[Category:Meetings]] Pascale
Categories: GO Internal

Manager Call 2018-09-20

GO wiki (new pages) - Tue, 09/18/2018 - 06:20

David:

= Meeting URL =

*https://stanford.zoom.us/j/754529609

= Agenda =

== Montreal GOC Meeting ==
*Review [http://wiki.geneontology.org/index.php/2018_Montreal_GOC_Meeting_Agenda agenda]

== New Project management strategy ==
* ACTION: Each team (product owner / project lead) needs to prioritize the tickets and decide which will be done for the next Milestone (October GO meeting).

== Feedback form update ==
https://github.com/geneontology/go-site/issues/750
What's the timeline for deployment? TBD, but pretty much instantaneous when done. (Seth)

== New GAF Submissions ==
SuziA: Update on documentation for new groups to submit data: https://github.com/geneontology/go-annotation/issues/2067

== Ontology Editors' Meeting ==
Geneva, week of December 10th

The meeting will focus on the following (depending on who can attend):
* creating good logical definitions in both a general context and in the context of the GO-Reactome-Rhea alignment
** Filling in missing GO-Rhea xrefs
** moving from reactions to biochemical pathways
** implementation of a method to keep all three resources in sync
** GO-CAM templates for pathways??
* potential implementation of Design Patterns and the revival of some type of TermGenie ability
* GH ticket work and mini-project planning
** Dealing with and migrating binding terms
** Dealing with cellular processes (https://github.com/geneontology/go-ontology/issues/12849)
* Attendees
** David H (Can make it)
** Kimberly (Can make it)
** Karen
** Harold (Can make it)
** Pascale (Can make it)
** Barbara (Maybe make it)
** Alan B (Rhea, can make any day)
** Anne (Rhea, prefers Monday or Tuesday)
** Chris
** Peter D (Can make it)
** Ben?
** Jim?
Pascale: needed to follow up with Paul regarding funds to hold these

== Job descriptions for each role ==
Pascale and Kimberly started to create job descriptions for all manager roles:

https://drive.google.com/drive/folders/1F7e2D7T4hleIq8VaH7YW60D9wxQnRFRV

Every manager should add what they believe are their tasks. We will then discuss them here.

* Please have a look.

== Noctua as a teaching tool?==

== Fate of pre-composed terms ==
* explicit binding terms
** X-binding terms will be decomposed as binding and has_input X.
*** Annotation
**** Moving forward
***** Groups that have the ability to use AEs in their own tools will make structured annotations.
***** Groups that do not have the ability to use AEs in their own tools will use Noctua.
**** Existing Annotations
***** On a predetermined date, all existing annotations to pre-composed X-binding terms will be converted to binding and an AE.
***** Annotations that have an IPI evidence code and a value in the 'with' field will be converted to IDA, and the 'with' value will be moved over to the AE.
***** Annotations that do not have an IPI evidence code and a value in the 'with' field will be converted based on the term. For generic gene products we will use a PRO identifier. For gene families we will use a Panther ID???
* compound terms between two existing GO classes
** Compound terms will be decomposed into their elemental terms with appropriate relationships.
*** Annotation
**** Moving Forward
***** Groups that have the ability to use AEs in their own tools will make structured annotations or co annotate.
***** Groups that do not have the ability to use AEs in their own tools will use Noctua.
**** Existing Annotations
***** Existing annotations will be converted to multiple simple conventional annotations (we need to derive rules for this) OR
***** Existing annotations will be converted to simple GO-CAM models from which annotations will be derived. OR
***** Existing annotations will be converted to conventional annotation with AEs, capturing the context of the original term.
* Considerations
** With respect to groups that have been using these terms and annotations, can we provide them with tools that will allow them to continue to conduct their analyses?
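
As a rough illustration of the proposed conversion of existing annotations to explicit binding terms, the Python sketch below rewrites a single annotation record. The record layout, the <code>PRECOMPOSED_BINDING_TERMS</code> set and its placeholder ID are hypothetical and only stand in for whatever representation is eventually chosen. Only the IPI rule is implemented; the rule for other annotations is still open, so it is left as a stub.

```python
from dataclasses import dataclass, replace

BINDING = "GO:0005488"  # the generic GO term 'binding'

# Hypothetical stand-in for the list of pre-composed "X binding" terms.
# "GO:9999999" is a placeholder, not a real GO identifier.
PRECOMPOSED_BINDING_TERMS = {"GO:9999999"}


@dataclass(frozen=True)
class Annotation:
    gene_product: str
    go_id: str
    evidence: str
    with_from: str = ""   # the 'with' field
    extension: str = ""   # annotation extension (AE)


def convert_binding_annotation(annotation):
    """Convert an annotation to a pre-composed binding term into binding + AE."""
    if annotation.go_id not in PRECOMPOSED_BINDING_TERMS:
        return annotation  # not affected by the proposal
    if annotation.evidence == "IPI" and annotation.with_from:
        # IPI with a 'with' value: switch to IDA and move the value into the AE.
        new_part = f"has_input({annotation.with_from})"
        extension = ",".join(p for p in (annotation.extension, new_part) if p)
        return replace(annotation, go_id=BINDING, evidence="IDA",
                       with_from="", extension=extension)
    # Term-based conversion (PRO or Panther identifiers) is not yet decided.
    raise NotImplementedError("term-based conversion rule still to be defined")
```

For example, an IPI annotation to the placeholder term with <code>UniProtKB:P00002</code> in the 'with' field would come out as an IDA annotation to binding with the extension <code>has_input(UniProtKB:P00002)</code>.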

== NAR paper ==

https://docs.google.com/document/d/13rV4qIHcTJt7lTv_N5_6AbG3OlSU0jODjMU7-jP-e8c/edit#

Highlights:
* Substantial improvements to parts of the ontology, particularly the molecular function branch
* Addition of qualifiers for biological process annotations, allowing users for the first time to distinguish when a gene product functions as an integral part of a process, versus when it has a more indirect causal effect on a process
* New evidence codes that allow users to filter out annotations from high-throughput experiments
* A publicly accessible repository of richer, connected GO annotations (which we call GO-CAM, for GO-Causal Activity Models) that bridge GO annotations and pathway representations
* New production pipeline and QC effort, and examples of improvements

= Minutes =




[[Category:GO Consortium]] [[Category:GO Managers Meetings ]] Vanaukenk
Categories: GO Internal

GO-CAM Working Group Call 2018-09-18

GO wiki (new pages) - Mon, 09/17/2018 - 14:31

Vanaukenk: /* Evidence Codes in Noctua */

= Meeting URL =
https://stanford.zoom.us/j/976175422

=Agenda=

== Evidence Codes in Noctua ==
*Currently, Noctua allows for use of the full Evidence and Conclusion Ontology
*GO has typically only used a subset of ECO codes, e.g. IDA
**ECO:0000314 direct assay evidence used in manual assertion
**has_related_synonym 'IDA'
**database cross reference GOECO:IDA (inferred from direct assay)
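
One possible way to restrict Noctua to the GO subset is a lookup between GO evidence codes and ECO classes. The Python sketch below is only an illustration: the table holds just the IDA entry given above, the function name is hypothetical, and the remaining codes would have to be filled in from the ECO cross-references.

```python
# GO evidence code -> ECO class ID. Only the IDA entry comes from the agenda;
# the other GO codes would need to be added from the ECO cross-references.
GO_CODE_TO_ECO = {
    "IDA": "ECO:0000314",  # direct assay evidence used in manual assertion
}

# Reverse lookup: ECO class ID -> GO evidence code.
ECO_TO_GO_CODE = {eco: code for code, eco in GO_CODE_TO_ECO.items()}


def go_code_for(eco_id):
    """Return the GO evidence code for an ECO ID, or None if it is outside the subset."""
    return ECO_TO_GO_CODE.get(eco_id)
```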

== Relations between MF and Input(s) ==
*has_input vs has_direct_input
*Proposal: replace has_direct_input with has_input; obsolete has_direct_input
*Need to review has_input annotations to remove any extensions that are inconsistent with GO-CAM usage, i.e. an indirect or unknown proximity for an input
*Seth retrieved, as of 2018-07-31, [https://drive.google.com/drive/folders/1TlwrEM2KjAzxIYiCGg0_oicMOYfhiGou all MF annotations] that use has_input in annotation extensions.
**Initial review:
***used to capture a regulatory effect, e.g. protein kinase activator activity, when it was not known whether the effect was direct or indirect (e.g. expression of protein or complex X increases the activity of Y)
***used to capture a regulatory subunit whose presence is necessary for the activity to occur (e.g. cyclin-dependent protein kinase)
***used to capture an enzymatic activity when it was not known if the effect on a substrate was direct or indirect (e.g. caspase-dependent but not known if it was the caspase mutated)
***used to capture an enzymatic substrate where there wasn't also a direct binding assay in the paper (e.g. testing possible chemical substrates for glucuronosyltransferase activity)
***used to capture metal ion-dependence of protein binding (e.g. Ca2+-dependent protein binding)
***used (correctly) to capture the physiologically relevant input in a binding reaction (i.e. cross-species experiment where with/from captures experimental binding partner and AE the relevant binding partner)
*A Relations Ontology working group (broader than just GO) is also considering [https://github.com/oborel/obo-relations/issues/244 how to model participants in an MF] and [https://github.com/oborel/obo-relations/issues/171 documentation of has_input and child relations]
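
If the proposal to replace has_direct_input with has_input is adopted, the mechanical part of the change could look like the Python sketch below. It assumes annotation extensions are written as <code>relation(ID)</code> expressions joined by commas and pipes; the function name and the identifiers in the usage example are illustrative only. It does not address the manual review of existing has_input annotations.

```python
import re

# Match the relation name only where it starts an extension expression,
# so that longer names merely ending in the same text are left alone.
_DIRECT_INPUT = re.compile(r"\bhas_direct_input\(")


def replace_direct_input(extension):
    """Rewrite has_direct_input(...) as has_input(...) in an annotation extension."""
    return _DIRECT_INPUT.sub("has_input(", extension)
```

For instance, <code>has_direct_input(UniProtKB:P12345),occurs_in(CL:0000084)</code> would become <code>has_input(UniProtKB:P12345),occurs_in(CL:0000084)</code>, while extensions that already use has_input are returned unchanged.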


[[Category: Annotation Working Group]] Vanaukenk
Categories: GO Internal

GO Annotation Standard Operating Procedures

GO wiki (new pages) - Mon, 09/17/2018 - 13:05

Pascale:

From GO Annotation Standard Operating Procedures
TO BE REVIEWED

<p>This page documents some of the standard operating procedures used by members of the GO Consortium during the process of annotation. Please note that these do not represent the best, or only ways to carry out annotation, but are simply a guide to how some groups currently annotate. More information on annotation can be found in the <a href="/page/go-annotation-policies">GO annotation guide</a> and in the <a href="/page/go-annotation-conventions">GO annotation conventions</a>; if you have any questions on the guidelines given below, please contact the <a href="/form/contact-go">GO helpdesk</a>.</p>
<ul>
<li>
<h4><a href = "#tell">Tell us about your requirements</a></h4>
</li>
<ul><li>
<a href = "#small">I represent a small lab working on a biological area of research</a>
</li>
<li>
<a href = "#est">I have a set of ESTs and I would like to attach annotations</a>
</li>
<li>
<a href = "#genome">I have a genome sequence</a>
</li>
<li>
<a href = "#micro">I have a microarray data set</a>
</li>
<li>
<a href = "#peptide">I have a peptide sequence</a>
</li></ul>
<li>
<h4><a href = "#elect">Electronic annotation</a></h4>
</li>
<ul>
<li>
<a href = "#interpro">InterPro Mapping</a>
</li>
<li>
<a href = "#keyword">Keyword Mapping</a>
</li>
<li>
<a href = "#hamap">HAMAP</a>
</li>
<li>
<a href = "#ec">Enzyme Commission</a>
</li>
<li>
<a href = "#other">Other mappings</a>
</li>
<li>
<a href = "#blast">BLAST</a>
</li>
<li>
<a href = "#none">No similar sequences manually annotated?</a>
</li></ul>
<li>
<h4><a href = "#paper">Literature Annotation</a></h4>
</li>
<li>
<h4><a href = "#sequence">Sequence-based annotation</a></h4>
</li>
<ul>
<li>
<a href ="#general">General principles for sequence IDs</a>
</li>
<li>
<a href = "#flow">Annotation workflow</a>
</li>
</ul>
</ul>

<h3 id="tell">Tell us about your requirements</h3>

<ol>
<li><h5 id="small">I represent a small lab working on a biological area of research</h5>

<p>In this case, perhaps you have a list of your favorite genes and you wish to annotate them. You have a range of choices depending on what you are trying to achieve.</p>
<p>Please see the range of options below and choose the one that suits you best.</p>

<li><h5 id="est">I have a set of ESTs and I would like to attach annotations</h5>

<p>If you would ultimately like to send the annotations to the consortium for distribution, it is crucial that your EST clusters should maintain the same identifiers over each round of re-clustering. One way to do this is to identify clusters based on one EST that is chosen for each cluster. There may be other good ways that we have not heard about. </p>

<p>Many EST clusters have stable identifiers with version updates (e.g. the UniGene database). These stable identifiers can be used for making GO associations.</p>

<p>Once you have your clusters and stable identifiers, follow the <a href = "#elect">IEA directions</a> for making electronic annotations.</p>

<p>You could also run BlastX, or run gene prediction programs and then BlastP. Running InterPro on the sequences will find the longest open reading frame.</p>

<li><h5 id="genome">I have a genome sequence</h5>

<p>You will already have assembled the genome sequence and made gene calls. Once you have the CDS sequences or predicted protein sequences then you can follow the instructions on <a href = "#elect">IEA annotation</a> and/or <a href = "#paper">Literature annotation</a>. Please see below.</p>

<li><h5 id="micro">I have a microarray data set</h5>

<p>The action you can take depends somewhat on your sequences.</p>
<ul>
<li>
Are they cDNAs or oligos?
</li>
<li>
Do they have identifiers? Which kind?
</li>
<li>
How do they relate to the genes? If you know which sequence relates to which characterised gene then it will be easy to transfer annotations over.
</li>
<li>
Do the genes have GO annotations? If they do not have full GO annotation from literature then you may like to apply for funding to annotate the genes yourself, or write to your Model Organism Database to ask them to do so.
</li>
<li>
Can you get more up to date annotations than those provided with your tool? It may be that you are seeing only the annotations that come from your proprietary microarray software provider. It is a good idea to ask how often they update their annotations and ontology structure as these change from day to day, and there may be many more annotations available than you are seeing.
</li>
</ul>

<p>It is most likely that you will want to use mainly <a href = "#elect">electronic annotations</a>, supplemented with some <a href = "#paper">literature annotation</a> for those sequences that are not yet fully annotated.</p>

<li><h5 id="peptide">I have a peptide sequence</h5>
<ul>
<li>
Do you know what gene it is?
</li>
<li>
Can you map it to a UniProtKB or MOD identifier?
</li>
<li>
Does this identifier have GO annotation?
</li>
</ul>

<p>If it doesn't, you can request that it be annotated (it helps if you provide literature associated with this gene product). If you cannot map it to a UniProtKB or MOD identifier, then you can make your own GO annotation by any of the electronic or <a href = "#sequence">ISS methods</a> illustrated below.</p>
</ol>
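
<p>The advice on ESTs above, to identify each cluster by one chosen EST so that identifiers survive re-clustering, can be sketched in Python as follows. This is only one possible scheme under stated assumptions: the function name, the <code>CL000001</code>-style identifiers, and the choice of the lowest accession as the anchor EST are all hypothetical, and every cluster is assumed to be a non-empty set of EST accessions.</p>

```python
def assign_cluster_ids(clusters, anchors):
    """Assign stable IDs to EST clusters after a round of clustering.

    clusters: list of non-empty sets of EST accessions (the new clustering).
    anchors:  dict mapping an anchor EST accession to its existing cluster ID.
    Returns (ids, anchors): the cluster IDs in the same order as `clusters`,
    and the updated anchor mapping to keep for the next round.
    """
    anchors = dict(anchors)
    # New IDs continue after the highest number ever issued, so IDs are not reused.
    next_number = 1 + max((int(cid[2:]) for cid in anchors.values()), default=0)
    ids = []
    for cluster in clusters:
        present = sorted(est for est in cluster if est in anchors)
        if present:
            # The cluster holding an anchor EST keeps that anchor's ID.
            # If clusters merged, the first anchor (sorted order) wins.
            ids.append(anchors[present[0]])
        else:
            new_id = f"CL{next_number:06d}"
            next_number += 1
            anchors[min(cluster)] = new_id  # lowest accession becomes the anchor
            ids.append(new_id)
    return ids, anchors
```

<p>Calling the function again with the returned anchor mapping after each re-clustering keeps the ID of every cluster that still contains its anchor EST, and issues fresh IDs only for clusters without one.</p>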

<h3 id="elect">Electronic annotation</h3>

<p>Electronic annotation is fast and produces large amounts of annotation in a short time. Electronic annotations are rarely wrong, but tend to be less detailed. For example, electronic annotation is likely to tell you which of your genes are transcription factors but unlikely to tell you in great detail what process the gene controls. You may like to use this method if you have a new genome sequence to annotate, or a microarray with many thousands of sequences.</p>

<img src="/sites/default/files/public/diag-iea-overview.png" width="594" height="157" alt="diag-iea-overview.png" />

<p>This diagram illustrates some of the main ways of making electronic annotation. It should be read from the top down. The diagram shows sequences from UniProt having electronic GO annotation assigned by several computational methods. All of these methods involve use of mapping files. To learn more visit the guide to <a href="/page/download-mappings">information on mappings of GO to other classification systems</a>.</p>

<ul>
<li id="interpro"><strong>InterPro Mapping</strong>
<p>In the case of the InterPro mapping it is possible to assign electronic GO annotation to your sequences based on InterPro domains and a number of other criteria. For example, if your sequence has a DNA binding domain then it makes sense to electronically annotate it to the DNA binding function term. For more information on InterPro mapping please see the information on InterProScan.</p>

<li id="keyword"><strong>UniProt Keyword Mapping</strong>
<p>This part of the diagram illustrates how sequences already categorized using the UniProt keyword mapping can have GO annotation automatically applied by transferring via the keyword mapping file.</p>

<li id="hamap"><strong>HAMAP</strong>
<p>HAMAP is a system that categorizes sequences based on family or subfamily characteristics and is applied to bacterial, archaeal and plastid-encoded proteins. GO annotation can be automatically applied to such sequences using the mapping file between HAMAP and GO.</p>

<li id="ec"><strong>Enzyme Commission</strong>
<p>The Enzyme Commission database categorizes enzymes by the reactions they catalyze. If your sequences are already categorized by EC then you can transfer GO annotations using the mapping file of EC to GO categories.</p>

<li id="other"><strong>Other mappings</strong>
<p>These are just a few examples of mapping files that can be used to transfer annotations to your sequence objects. <a href="/page/download-mappings">Many other mappings are available</a>, and if there is not a mapping file between GO and your current annotation system, we can assist you in making one.</p>

<li id="blast"><strong>BLAST</strong>
<p>You can also make electronic annotations by BLASTing your sequence against manually annotated sequences and transferring the GO annotations across to your sequence. The threshold of similarity in this process is up to you, and depends on your requirements.</p>

<li id="none"><strong>No similar sequences manually annotated?</strong>
<p>If your sequence is similar to other sequences that have been well characterized but not yet annotated from the literature, then one option is to carry out the <a href = "#paper">literature annotation</a> yourself and then transfer by <a href = "#elect">electronic methods</a>.</p>
</ul>
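
<p>The mapping-based methods above all follow the same pattern: look up each external classification of a sequence in a mapping file and transfer the mapped GO terms. The Python sketch below illustrates that pattern. It assumes mapping lines of the form <code>external-id name &gt; GO:term name ; GO:id</code> with <code>!</code> marking comment lines; the function names and the sample data in the usage note are illustrative and should be checked against the actual mapping files.</p>

```python
def parse_mapping(lines):
    """Parse mapping-file lines into {external ID: set of GO IDs}."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and comments
        left, separator, right = line.partition(" > ")
        if not separator:
            continue  # no GO term mapped on this line
        external_id = left.split()[0]
        go_id = right.rsplit(" ; ", 1)[-1].strip()
        mapping.setdefault(external_id, set()).add(go_id)
    return mapping


def transfer_annotations(hits, mapping):
    """Return {sequence ID: sorted GO IDs} given {sequence ID: external IDs}."""
    result = {}
    for sequence_id, external_ids in hits.items():
        go_ids = set()
        for external_id in external_ids:
            go_ids |= mapping.get(external_id, set())
        result[sequence_id] = sorted(go_ids)
    return result
```

<p>With a mapping line such as <code>InterPro:IPR000003 Retinoid X receptor/HNF4 &gt; GO:DNA binding ; GO:0003677</code>, a sequence matching <code>InterPro:IPR000003</code> would receive <code>GO:0003677</code>. Annotations made this way would be recorded with the IEA evidence code.</p>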

<h3 id="paper">Literature Annotation</h3>

<p>Literature annotation involves capturing published information about the exact function of a gene product as GO annotations. To do this you must read the publications about the gene and write down all the information. This annotation is time-consuming but produces very high quality, species-specific annotation, and brings the information about the gene product into a format in which it can be used in high-throughput experiments. This is an extremely worthwhile process in the long term. It may be best carried out by people who know the function of the gene product, and the associated biology, in great detail; for example experimental scientists who are familiar with the published literature. If you are doing this, then you may like to write and suggest modifications to the ontology structure as well.</p>

<p>Below is a schematic diagram giving an introduction to the steps involved in literature-based GO annotation. If you are interested in carrying out literature-based annotation you can receive full training in the process by attending a GO annotation camp or by working with an individual GO Consortium annotation mentor.</p>

<img src="/sites/default/files/public/diag-literature-annot.png" width="720" height="540" alt="Literature Based Annotation" />
<p><a href="/sites/default/files/public/diag-literature-annot.png" title="Literature Based Annotation">View a larger version.</a></p>

<h3 id="sequence">Sequence-based annotation</h3>
<strong id="general">General principles for sequence IDs</strong>
<ul>
<li>
You must have stable identifiers for your objects.
</li>
<li>
You must provide information on what the object is, e.g. a protein, nucleotide, EST, <i>etc.</i> It doesn't matter if a nucleotide sequence is a gene, a genome, or an EST as long as it can be identified as such.
</li>
<li>
If a sequence identifier has become obsolete, there must be a mechanism in place for tracking down the replacement.
</li>
<li>
Your database must have an internal rule that object identifiers are never reused.
</li>
</ul>
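
<p>The principles above can be enforced in software. The Python sketch below is a minimal in-memory illustration, not a description of any existing database: the class name, method names, and identifiers are hypothetical. It records the object type for every identifier, refuses to reuse an identifier, and tracks replacements for obsolete identifiers.</p>

```python
class IdentifierRegistry:
    """Toy registry: typed identifiers, never reused, obsoletes are traceable."""

    def __init__(self):
        self._types = {}        # identifier -> object type; entries are never deleted
        self._replaced_by = {}  # obsolete identifier -> replacement identifier

    def register(self, identifier, object_type):
        if identifier in self._types:
            raise ValueError(f"{identifier} has already been used")
        self._types[identifier] = object_type

    def obsolete(self, identifier, replacement):
        if identifier not in self._types or replacement not in self._types:
            raise KeyError("both identifiers must be registered")
        self._replaced_by[identifier] = replacement

    def resolve(self, identifier):
        """Follow replacements and return the current identifier."""
        if identifier not in self._types:
            raise KeyError(identifier)
        seen = set()
        while identifier in self._replaced_by:
            if identifier in seen:
                raise ValueError("replacement cycle detected")
            seen.add(identifier)
            identifier = self._replaced_by[identifier]
        return identifier

    def object_type(self, identifier):
        return self._types[identifier]
```

<p>Because entries are never removed from the registry, an obsolete identifier remains registered and any attempt to register it again fails.</p>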

<strong id="flow">Annotation workflow</strong>

<p>The following diagram shows the standard operating procedure for sequence-based (<a href="/page/iss-inferred-sequence-or-structural-similarity/">ISS evidence code</a>) annotation used in the past at The Institute for Genomic Research (now <a href="http://www.jcvi.org/" target="blank">JCVI</a>).</p>

<img src="/sites/default/files/public/diag-tigr-annotation.png" width="623" height="987" alt="ISS Evidence Code at TIGR" />
<p><a href="/sites/default/files/public/diag-tigr-annotation.png" title="ISS Evidence Code at TIGR">View a larger version.</a></p>

[[Category: Annotation]] Pascale
Categories: GO Internal

Annotation conventions

GO wiki (new pages) - Mon, 09/17/2018 - 13:00

Pascale: Created page with " From http://geneontology.org/page/go-annotation-conventions#not TO BE REVIEWED <h4>Annotation Conventions</h4> This page contains guidelines which apply to all annotati..."

From http://geneontology.org/page/go-annotation-conventions#not
TO BE REVIEWED

<h4>Annotation Conventions</h4>
This page contains guidelines which apply to all annotation methods and are particularly useful for manual literature-based annotation. More information on annotation can be found in the introduction to <a href="/page/go-annotation-policies">GO Annotation Policies and Guidelines</a> and the <a href="/page/go-annotation-standard-operating-procedures">GO Annotation Standard Operating Procedures</a>.
See also the <a href="http://wiki.geneontology.org/index.php/Consortium_Meetings#annot">Annotation Camp minutes</a> for additional information, including examples, on annotation practices and recommendations.
<p>

<ul>
<li> <a href = "#general">General Recommendations </a> </li>
<li> <a href ="#dbobj">Database Objects </a></li>
<li> <a href ="#refs">References and Evidence </a></li>
<li><a href = "#qual"> Using the Qualifier column</a> </li>
<ul>
<li> <a href ="#not">NOT</a> </li>
<li> <a href ="#colocal">colocalizes_with </a></li>
<li> <a href = "#contri">contributes_to</a> </li>
<li> <a href = "#examples">Examples </a></li>
</ul>
<li> <a href ="#interactions">Annotating gene products that interact with other organisms</a> </li>
<ul>
<li> <a href = "#nomenclature">Nomenclature Conventions </a></li>
<li> <a href ="#newterms">Requesting new terms in the multi-organism process node</a> </li>
<li> <a href ="#procOther">Example: Performing a process with another organism </a></li>
<li> <a href ="#more">Example: Performing a process in more than one species </a></li>
<li> <a href ="#regulating">Example: Regulating a process in another organism </a></li>
</ul>
<li> <a href ="#downstream">Downstream Process guidelines</a> </li>
<ul>
<li> <a href = "#specificTerms">Requesting more specific terms for downstream processes </a></li>
<li> <a href ="#core">Annotating downstream processes for gene products involved in core or specific processes </a></li>
<li> <a href ="#ligand">Annotating downstream processes to gene products in a ligand-receptor signaling pathway </a></li>
<li> <a href ="#general">General ligand-receptor pathway</a></li>
<li> <a href ="#glucose">Regulation of glucose transport</a></li>
<li> <a href ="#note">General note on revision of annotation sets </a></li>
</ul>
<li><a href = "#binding">Binding guidelines </a></li>
<ul>
<li> <a href ="#substrates">Using terms that imply binding of substrates </a></li>
<li> <a href ="#general comment on protein binding">Protein binding annotations in the Gene Ontology</a></li>
<li> <a href ="#descriptive">Choosing more descriptive terms than 'protein binding'</a> </li>
<li> <a href ="#partners">Identifying binding partners using columns 8 and 16 </a></li>
<li> <a href ="#ontology">Ontology development for protein binding </a></li>
</ul>
<li> <a href ="#response">'Response to' guidelines </a></li>
<li><a href = "#regulationTerms">Use of Regulation Terms</a> </li>
<ul><li> <a href ="#background">Background </a></li>
<li> <a href ="#guide1">Guideline 1: Use existing biological knowledge to define the process. </a></li>
<li> <a href ="#guide2">Guideline 2: If you aren't sure, consider annotating to the parent process term.</a> </li>
<li> <a href ="#guide3">Guideline 3: Improve the ontology by defining, wherever possible, the beginning, middle, and end of a process. </a></li>
<li> <a href ="#guide4">Guideline 4: Revisit annotations when new knowledge becomes available. </a></li>
<li> <a href ="#guide5">Guideline 5: Annotations based on mutant phenotypes should take mechanism into account.</a> </li>
<li><a href ="#guide6">Guideline 6: Some gene products may be annotated to both a process and regulation of that process. </a></li></ul>
<li> <a href ="#txn">Use of Transcription related terms</a></li>
</ul>

<p>&nbsp;</p>

<h4><a name = "general">General recommendations</a></h4>
<ul>
<li> A gene product can be annotated to zero or more nodes of each ontology. </li>
<li> Annotation of a gene product to one ontology is independent of its annotation to other ontologies. </li>
<li> Annotate gene products in each species database to the most detailed level in the ontology that correctly describes the biology of the gene product. </li>
<li> Keep in mind that annotating to a term implies annotation to all parents via any path, so it is a good idea to check the parentage of a term before annotating (and request new terms or path corrections if necessary). </li>
<li> Uncertain knowledge of where a gene product operates should be denoted by annotating it to two nodes, one of which can be a parent of the other. For instance, a yeast gene product known to be in the nucleolus, but also experimentally observed in the nucleus generally, can be annotated to both nucleolus and nucleus in the cell component ontology. Even though annotation to nucleolus alone implies that a gene product is also in the nucleus, annotate to both so as to explicitly indicate that it has been reported in the two locations. The two annotations may have the same or different supporting evidence. Similar reports of general and specific molecular function or biological process for a gene product could be handled the same way; for example, you may have direct experimental evidence (IDA) for DNA binding, but only a mutant phenotype (IMP) for the more specific function term transcription factor activity and the process transcription. You also can annotate to multiple nodes that conflict with each other if there are conflicting claims in the literature. </li>
<li> An individual gene product that is part of a complex can be annotated to terms that describe the action (function or process) of the complex. This practice is colloquially known as annotating 'to the potential of the complex', and is a way to capture information about what a complex does in the absence of database objects and identifiers representing complexes. For molecular function annotations, also see <a name ="qual">Using the Qualifier column</a> below. </li>
<li> A gene product should be annotated with terms reflecting its normal activity and location. A function, process, or localization (component) observed only in a mutant or disease state is therefore not usually included. In some circumstances, however, what is "normal" is a matter of perspective, depending on the organism being annotated and on the point of view of the annotator. For example, many viruses use host proteins to carry out viral processes. The host protein is then doing something abnormal from the perspective of the host, but completely normal from the perspective of the virus. GO annotators handle these cases by including two taxon IDs in the Taxon column of the gene association file; see <a href ="#interactions">annotating gene products that interact with other organisms</a> for how to handle these cases. </li>
<li> The evidence code No Data (ND) should be used as an indicator of curation status to denote gene products for which no relevant information could be found. It distinguishes gene products with no data available from those that have not yet been annotated. For more details on the code and its usage, please consult the <a href = "/page/nd-no-biological-data-available">ND evidence code documentation</a>. </li>
</ul>
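
The recommendation above to check the parentage of a term before annotating can be illustrated with a small Python sketch that collects every ancestor of a term via any path. The hierarchy below is a deliberately simplified toy example keyed by term name, not the real GO graph, and the function name is hypothetical.

```python
# Toy hierarchy: term -> set of direct parents. Simplified for illustration;
# the real ontology has more terms and several relationship types.
PARENTS = {
    "nucleolus": {"nucleus", "non-membrane-bounded organelle"},
    "nucleus": {"organelle"},
    "non-membrane-bounded organelle": {"organelle"},
    "organelle": {"cellular_component"},
    "cellular_component": set(),
}


def implied_terms(term, parents=PARENTS):
    """Return all ancestors of `term`, following every path to the root."""
    implied = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in implied:
                implied.add(parent)
                stack.append(parent)
    return implied
```

In this toy graph an annotation to nucleolus implies annotation to nucleus, non-membrane-bounded organelle, organelle and cellular_component, which is the set an annotator would review before committing the annotation.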
<h4><a name ="dbobj">Database Objects</a></h4>
Because a single gene may encode very different products with very different attributes, GO recommends associating GO terms with database objects representing gene products rather than genes. At present, however, many participating databases are unable to associate GO terms to gene products, and therefore use genes instead. If the database object is a gene, it is associated with all GO terms applicable to any of its products. See the <a href = "/page/go-annotation-file-gaf-format-20">annotation file format guide</a> for more information.
<h4><a name = "refs">References and Evidence</a></h4>
Every annotation must be attributed to a source, which may be a literature reference, another database or a computational analysis.
The annotation must indicate what kind of evidence is found in the cited source to support the association between the gene product and the GO term. A simple controlled vocabulary of evidence codes is used to capture this; please see the <a href = "/page/guide-go-evidence-codes">GO evidence code documentation</a> for more information on the meaning and use of the evidence codes.
<h4><a name = "qualifier">Using the Qualifier column</a></h4>
The Qualifier column is used for flags that modify the interpretation of an annotation. Allowable values are <b> NOT</b>, contributes_to, and colocalizes_with.
<ul><li><h5><a name = "not">NOT</a></h5></li>
<blockquote><b> NOT </b> may be used with terms from any of the three ontologies.<br></blockquote>
<b> NOT </b> is used to make an explicit note that the gene product is not associated with the GO term. This is particularly important in cases where associating a GO term with a gene product should be avoided (but might otherwise be made, especially by an automated method). For example, if a protein has sequence similarity to an enzyme (whose activity is GO:nnnnnnn), but has been shown experimentally not to have the enzymatic activity, it can be annotated as <b> NOT </b> GO:nnnnnnn. (Note: in an email exchange from Sept. 2003 this phenomenon was referred to as "sequence dissimilarity.")
<b> NOT </b> can also be used when a cited reference explicitly says so (e.g. "our favorite protein is not found in the nucleus"). Prefixing a GO ID with the string <b> NOT </b> allows annotators to state that a particular gene product is <b> NOT </b> associated with a particular GO term. This usage of <b> NOT </b> was introduced to allow curators to document conflicting claims in the literature.
Note that <b> NOT </b> is used when a GO term might otherwise be expected to apply to a gene product, but an experiment, sequence analysis, etc. proves otherwise. (It is not generally used for negative or inconclusive experimental results.)
<li><h5><a name="colocal">colocalizes_with</a></h5></li>
<blockquote>colocalizes_with may be used only with cellular component terms.</blockquote>
Gene products that are transiently or peripherally associated with an organelle or complex may be annotated to the relevant cellular component term, using the colocalizes_with qualifier. This qualifier may also be used in cases where the resolution of an assay is not accurate enough to say that the gene product is a bona fide component member.
Example (from <i> Schizosaccharomyces pombe</i>):
Clp1p relocalizes from the nucleolus to the spindle and site of cell division; i.e. it is associated transiently with the spindle pole body and the contractile ring (evidence from GFP fusion). Clp1p is annotated to spindle pole body ; GO:0005816 and contractile ring ; GO:0005826, using the colocalizes_with qualifier in both cases.
<li><h5><a name="contri">contributes_to</a></h5></li>
<blockquote>contributes_to may be used only with molecular function terms.</blockquote>
As noted above, an individual gene product that is part of a complex can be annotated to terms that describe the function of the complex. Many such function annotations should use the qualifier contributes_to:
Annotating individual gene products according to attributes of a complex is especially useful for molecular function annotations in cases where a complex has an activity, but not all of the individual subunits do. (For example, there may be a known catalytic subunit and one or more additional subunits, or the activity may only be present when the complex is assembled.) Molecular function annotations of complex subunits that are not known to possess the activity of the complex must include the entry contributes_to in the Qualifier column. The contributes_to qualifier should not be used in biological process annotations. All gene products annotated using contributes_to must also be annotated to a cellular component term representing the complex that possesses the activity.
Annotations using contributes_to will often use the evidence code IC, but other codes may be used as well.
Note that contributes_to is not needed to annotate a catalytic subunit. Furthermore, contributes_to may be used for any non-catalytic subunit, whether the subunit is essential for the activity of the complex or not.
<li><h5><a name="examples">Examples</a></h5></li>
<ul>
<li> Subunits of nuclear RNA polymerases: none of the individual subunits have RNA polymerase activity, yet all of these subunits are annotated to DNA-dependent RNA polymerase activity (with the contributes_to note), to capture the activity of the complex. </li>
<li> ATP citrate lyase (ACL) in <i> Arabidopsis</i>: it is a heterooctamer, composed of two types of subunits, ACLA and ACLB, in an A(4)B(4) stoichiometry. Neither of the subunits expressed alone gives ACL activity, but co-expression results in ACL activity. Both subunits can be annotated to ATP citrate lyase activity. </li>
<li> eIF2: has three subunits (alpha, beta, gamma); one binds GTP; one binds RNA; the whole complex binds the ribosome (all three subunits are required for ribosome binding). So one subunit is annotated to GTP binding and one to RNA binding without qualifiers, and all three are annotated to ribosome binding, with the contributes_to qualifier. And all three are annotated to the component term for eIF2 complex. </li>
</ul></ul>
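The qualifier rules above can be expressed as a small consistency check. The following is a minimal sketch, not part of any GO tool: the function names and the single-letter aspect codes (F, P, C) are illustrative, and it assumes that multiple qualifiers in one field are separated by a pipe character.

```python
# Aspect codes assumed here: F = molecular function,
# P = biological process, C = cellular component.
ALLOWED_ASPECTS = {
    "NOT": {"F", "P", "C"},        # any of the three ontologies
    "colocalizes_with": {"C"},     # cellular component terms only
    "contributes_to": {"F"},       # molecular function terms only
}


def qualifier_errors(qualifier_field, aspect):
    """Return a list of problems found in one annotation's Qualifier column."""
    errors = []
    # Assumption: several qualifiers may be joined with "|".
    for qualifier in filter(None, qualifier_field.split("|")):
        allowed = ALLOWED_ASPECTS.get(qualifier)
        if allowed is None:
            errors.append(f"unknown qualifier: {qualifier}")
        elif aspect not in allowed:
            errors.append(f"{qualifier} not allowed with aspect {aspect}")
    return errors


def missing_complex_annotation(annotations):
    """List gene products that use contributes_to but have no component annotation.

    annotations is an iterable of (gene_product, qualifier_field, aspect) tuples.
    This is only a rough check: it cannot tell whether the component term
    actually represents the complex that possesses the activity.
    """
    needs_component = set()
    has_component = set()
    for gene_product, qualifier_field, aspect in annotations:
        qualifiers = qualifier_field.split("|")
        if "contributes_to" in qualifiers:
            needs_component.add(gene_product)
        if aspect == "C" and "NOT" not in qualifiers:
            has_component.add(gene_product)
    return sorted(needs_component - has_component)
```

A curator script could run qualifier_errors on every row of an annotation set and then call missing_complex_annotation once over the whole set to find subunits that still need a cellular component annotation.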
<h4><a name = "interactions">Annotating gene products that interact with other organisms</a></h4>
The majority of gene products act within the organism that encoded them. However, sometimes gene products encoded by one organism can act on or in other organisms. For example, in obligate parasitic species (including viruses), almost all their gene products will be interacting with their host organism. Interactions may also be between organisms of the same species: for example, the proteins used by bacteria to adhere to one another to form a biofilm.
For annotating gene products involved in these multi-organism interactions, there are special terms in the biological process ontology, under multi-organism process ; GO:0051704, and in the cellular component ontology, under other organism ; GO:0044215. More specific information can be found in the <a href = "/page/other-organisms-and-viruses">biological process documentation on multi-organism processes </a> and in the cellular component guidelines on host cell.
The species in the interaction are recorded in an annotation by using terms from this node and entering two taxon IDs in the Taxon column. The first taxon ID should be that of the species encoding the gene product, and the second should be the taxon of the other species in the interaction. Where the interaction is between organisms of the same species, both taxon IDs should be the same. The taxon column of the annotation file is described in more detail in the <a href = "/page/go-annotation-file-gaf-format-20">annotation file format guide</a>.
An additional taxon ID should not be added in cases where the annotation is based on sequence or structural similarity.
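As an illustration of the dual-taxon convention, the sketch below builds and parses a Taxon column value. The function names are hypothetical; the only format assumed is the one described above, in which each ID is written as taxon:NNN and two IDs are joined with a pipe, with the species encoding the gene product first.

```python
def format_taxon_column(encoding_taxon, other_taxon=None):
    """Build a Taxon column value from one or two NCBI taxonomy IDs."""
    ids = [encoding_taxon] if other_taxon is None else [encoding_taxon, other_taxon]
    return "|".join(f"taxon:{taxon_id}" for taxon_id in ids)


def parse_taxon_column(value):
    """Split a Taxon column value into (encoding_taxon, other_taxon or None)."""
    parts = value.split("|")
    if not 1 <= len(parts) <= 2:
        raise ValueError(f"expected one or two taxon IDs, got: {value!r}")
    ids = []
    for part in parts:
        prefix, _, number = part.partition(":")
        if prefix != "taxon" or not number.isdigit():
            raise ValueError(f"malformed taxon ID: {part!r}")
        ids.append(int(number))
    return ids[0], (ids[1] if len(ids) == 2 else None)
```

For an interaction within a single species, the same ID is passed twice, for example format_taxon_column(382, 382).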
<ul><li><a name = "nomenclature">Nomenclature Conventions</a></li>
The terms 'symbiont' and 'host' may carry connotations of the nature of the interaction between two organisms, but in the Gene Ontology, they are used solely to differentiate between organisms on the basis of their size. The word <em> symbiont </em> is used to refer to the smaller organism in a symbiotic interaction; the larger organism is called the <em> host</em>. If the two organisms are the same size, the term will contain <em> other organism</em>. Note that parasites and pathogens are also referred to as 'symbionts', as symbiosis encompasses parasitism, commensalism and mutualism.
<li><a name = "newterms">Requesting new terms in the multi-organism process node</a></li>
Like the rest of GO, the multi-organism process node is not complete, and you will probably have to request some new terms when annotating your gene products. These should be submitted via the <a href = "http://sourceforge.net/p/geneontology/ontology-requests/">GO curator requests tracker</a> in the usual way. Here are a few points to bear in mind when requesting new terms, and annotating using this node:
<ul>
<li> A term name should make the direction of the interaction clear. An example of this is given below; induction of nodule morphogenesis in host would be used to annotate the symbiont gene product, while induction of nodule morphogenesis by symbiont is used to annotate the host genes. Both processes would be children of a common term nodulation. </li>
<li> If your gene product affects a 'normal' host process, you should always request a new term in the MOP node, rather than just annotating directly to the term in the 'normal' ontology. So for example, if your bacterial gene product regulates the ethylene-mediated signaling pathway in plants, rather than using dual taxon to annotate to regulation of ethylene mediated signaling pathway ; GO:0010104, you should instead request a new term regulation of ethylene mediated signaling pathway in host. </li>
<li> Where an organism subverts a 'normal' biological process, e.g. the transcription of viral DNA by host transcription machinery, host proteins should <strong> not </strong> be annotated to a 'symbiont' term like transcription of symbiont DNA. This is because this would be considered a pathological process, i.e. not 'normal' for the host. </li>
<li><a name = "procOther">Example: Performing a process with another organism</a></li>
Nod factor export proteins transfer nod factors out of the purple bacterium <i> Sinorhizobium meliloti </i> into the surrounding soil. Here they are detected by LysM nod factor receptor kinases in <i> Medicago truncatula </i> roots and initiate the process of nodulation.<br>
Annotation of Nod factor export ATP-binding protein I from <i> S. meliloti </i>
suggest a new term induction of nodule morphogenesis in host
<blockquote>nodulation ; GO:0009877
[p] induction of nodule morphogenesis in host ; GO:00new01</blockquote>
<i> Sinorhizobium meliloti </i> taxonomy ID: 382
<i> Medicago truncatula </i> taxonomy ID: 3880
<blockquote>protein name: Nod factor export ATP-binding protein I
GO term: induction of nodule morphogenesis in host ; GO:00new01
taxon column: taxon:382|taxon:3880</blockquote>
Annotation of LysM receptor kinase LYK3 precursor from <i> M. truncatula </i>
suggest a new term induction of nodule morphogenesis by symbiont
<blockquote>nodulation ; GO:0009877
[p] induction of nodule morphogenesis by symbiont ; GO:00new02</blockquote>
<i> Medicago truncatula </i> taxonomy ID: 3880
<i> Sinorhizobium meliloti </i> taxonomy ID: 382
<blockquote>protein name: LysM receptor kinase LYK3 precursor
GO term: induction of nodule morphogenesis by symbiont ; GO:00new02
taxon column: taxon:3880|taxon:382</blockquote>

<li><a name = "more">Example: Performing a process in more than one species</a></li>
The protein cardiotoxin from the southern Indonesian spitting cobra <i> Naja sputatrix </i> kills mammalian cells by cytolysis when it enters the host cell cytoplasm.
Annotation of cardiotoxin precursor, from <i> N. sputatrix </i>
use the GO terms cytolysis of cells of another organism ; GO:0051715 and host cell cytoplasm ; GO:0030430
<i> Naja sputatrix </i> taxonomy ID: 33626
<i> Mammalia </i> taxonomy ID: 40674
<blockquote>protein name: cardiotoxin precursor
GO term: cytolysis of cells of another organism ; GO:0051715
taxon column: taxon:33626|taxon:40674
protein name: cardiotoxin precursor
GO term: host cell cytoplasm ; GO:0030430
taxon column: taxon:33626|taxon:40674</blockquote>
<li><a name = "regulating">Example: Regulating a process in another organism</a></li>
Mosquito saliva contains D7 proteins, which bind biogenic amines in order to suppress hemostasis in humans.
Annotation of D7 protein long form, from <i> A. gambiae </i>
suggest a new term negative regulation of hemostasis in host
<blockquote>evasion of host defense response ; GO:0030682
[i] negative regulation of hemostasis in host ; GO:00new03</blockquote>
<i> Anopheles gambiae </i> taxonomy ID: 7165
<i> Homo sapiens </i> taxonomy ID: 9606
<blockquote>protein name: D7 protein long form
GO term: negative regulation of hemostasis in host ; GO:00new03
taxon column: taxon:7165|taxon:9606</blockquote>
</ul></ul>
<h4><a name = "downstream">Downstream Process guidelines</a></h4>
Where there is limited knowledge regarding the processes that a gene product is directly involved in, curators may often have annotated to terms that describe the processes that are downstream of the direct activity of the gene product. Where more knowledge regarding a gene product's functional activity exists, curators need to make a judgement as to how to represent its direct activities and whether to continue to include downstream processes in the annotation set. Curators are encouraged to request more specific terms to describe how the gene product is involved in a downstream process and also evaluate the annotation set as more functional information becomes available. More detailed curator guidance is provided below.
<ul><li><a name = "specificTerms">Requesting more specific terms for downstream processes</a></li>
Where a specific, descriptive GO term does not exist (for instance to describe the involvement of a process in another process), curators are encouraged to request these terms to provide more specificity to their annotation.
For example, the "intent" of growth factor BMP2 to change the "state" of the cell is instrumental in cardiac cell differentiation. Requesting the new GO term BMP signaling involved in cardiac cell differentiation would therefore make it possible to qualify how the gene product is involved in the downstream process of cardiac cell differentiation, rather than annotating to the separate terms BMP signaling and cardiac cell differentiation.
<li><a name = "core">Annotating downstream processes for gene products involved in core or specific processes</a></li>
Curators should annotate to the experimental evidence in the paper. However, curator judgement should be used, taking into account what the curator knows about:
<ol>
<li>The background of the gene product; is it widely known to have a central role causing it to affect multiple processes, or does it have few specific targets?</li>
<li>The quality of the experimental assays performed in the paper; are they fully explained and the evidence supplied convincing? (See separate guidelines for annotation of high-throughput experiments.)</li>
</ol>
Example 1. Gene product involved in core process.
Yeast RNA polymerase II subunit RPB2
RNA polymerase II subunit RPB2 has a core function of RNA polymerase activity, which has downstream effects on a large number of processes. However, curators should only annotate to the gene product's transcription activity, rather than the multiple downstream processes altered as a consequence of its activity.
Yeast spliceosome
In <i>S. cerevisiae</i>, the mutation of several genes that are components of the spliceosome results in translation defects. However, later work supplied evidence for the genes' involvement in mRNA splicing, <b>not</b> translation. Downstream effects on translation are to be expected as many ribosomal transcripts are spliced in yeast. The curation decision was to remove annotations to the term translation for spliceosome component genes once data was available to describe the direct activity the genes contributed towards.
Example 2. Gene product involved in core and specific process(es).
<i>S. pombe</i> gene Sre1
The <i>S. pombe</i> gene Sre1 is a transcriptional regulator of genes that are involved in heme and phospholipid biosynthesis. From reading <a href = "http://www.ncbi.nlm.nih.gov/pubmed/16537923">PMID:16537923</a> the curator decided this information should be captured in the annotation. Therefore annotations were made to:
<ul>
<li>RNA polymerase II core promoter proximal region sequence-specific DNA binding</li>
<li>regulation of transcription, DNA-dependent or regulation of transcription from RNA polymerase II promoter</li>
<li>positive regulation of heme biosynthesis</li>
<li>positive regulation of phospholipid biosynthesis</li>
</ul>
In addition, in accordance with these guidelines for annotating downstream processes, we would recommend that new terms are requested for:
<ul>
<li>regulation of transcription involved in heme biosynthesis</li>
<li>regulation of transcription involved in phospholipid biosynthesis</li>
</ul>
<li><a name = "ligand">Annotating downstream processes to gene products in a ligand-receptor signaling pathway</a></li>
Curators should annotate ligand-receptor signaling pathways as shown in the following diagrams.
For a signaling pathway, the ligand is considered part of the pathway. Therefore a factor which limits or increases the availability of a ligand to a receptor should be annotated as regulating the ligand/receptor pathway.
N.b. Ongoing work to clarify the start/end of a signaling pathway in the definition of GO terms will allow us to refine these guidelines.
<li><a name = "general">General ligand-receptor pathway</a></li><br>
<img src="/sites/default/files/public/diag-annot-ligand-recep-pwy.gif" width="496" height="741" alt="diag-annot-ligand-recep-pwy.gif" />
<br>
<dl>
<dt>Stimulus</dt>
<dd>regulation of signaling pathway</dd>
<dt>Ligand</dt>
<dd>signaling pathway</dd>
<dd>regulation of <var>other cellular</var> processes</dd>
<dt>Receptor</dt>
<dd>signaling pathway</dd>
<dd>regulation of <var>other cellular</var> processes</dd>
<dt>Signaling molecules</dt>
<dd>signaling pathway</dd>
<dd>regulation of <var>other</var> process(es)</dd>
<dd>regulation of gene-specific transcription</dd>
<dd>regulation of translation</dd>
<dd>(regulation of) transcription in response to <var>stimulus ligand</var></dd>
<dd>(regulation of) transcription involved in <var>other</var> process(es)</dd>
<dd>(regulation of ) <var>other cellular</var> process(es)</dd>
<dt>Transcription factors*</dt>
<dd>signaling pathway</dd>
<dd>regulation of transcription involved in <var>other</var> process(es)</dd>
<dt>Target</dt>
<dd>cellular response to stimulus</dd>
<dd><var>other</var> process(es)</dd>
<dd>regulation of <var>other</var> processes</dd>
</dl>
<p>We would not consider annotating the core transcription machinery to the downstream (other) processes that the target is involved in unless the transcription factor is gene-specific, in which case we would annotate to regulation of transcription involved in <var>other</var> process(es)</p>
<li><a name = "glucose">Regulation of glucose transport</a></li>
<img src="/sites/default/files/public/diag-annot-gluc-transport.gif" width="331" height="586" alt="diag-annot-gluc-transport.gif" />
<dl>
<dt>Insulin (ligand)</dt>
<dd>insulin receptor signaling pathway</dd>
<dd>regulation of glucose transport/homeostasis</dd>
<dt>Insulin receptor (receptor)</dt>
<dd>Insulin receptor signaling pathway</dd>
<dd>Regulation of glucose transport/homeostasis</dd>
<dt>IRS1, PI3K, PDK1, PKC (signaling molecules)</dt>
<dd>Insulin receptor signaling pathway</dd>
<dd>Regulation of glucose transport/homeostasis</dd>
<dd>Protein localization at cell surface (NTR: involved in response to insulin)</dd>
<dt>GLUT4 (target)</dt>
<dd>Cellular response to insulin</dd>
<dd>Glucose transport/homeostasis</dd>
</dl>
<li><a name = "note">General note on current status of revision of annotation sets</a></li>
If a gene product has limited experimental literature, such as a newly characterised protein, it is understandable that curators need to annotate to more general 'downstream' process terms that may, in fact, represent a phenotype.
However, as more functional information is published about a gene product, curators may decide to revise these annotations to downstream processes. Currently, different curation groups take different actions, based on considerations of user requirements and curation capacity:
<ol>
<li>Annotations to indirect/downstream processes may be removed, or updated to 'regulation' terms. This 'deleted' information is usually stored in the annotating group's phenotype database.
</li>
<li>Annotations to indirect/downstream processes are <strong>not</strong> removed because
<ul>
<li>downstream annotations are supported by good evidence, or the group wants to keep them as a history of annotation or to give a complete overview of knowledge about the gene product. </li>
<li>the curation group does not have the resources to revise annotation sets or does not have an alternative place to store the data</li>
</ul>
</li>
</ol>
Curation groups need to be aware that keeping annotations to downstream processes will make them a source of such data for other groups who may have a different annotation philosophy.</ul></ul>
<h4><a name ="binding">Binding guidelines</a></h4>
<ul>
<li><a name = "sustrates">Using terms that imply binding of substrates</a></li>
As many terms in the Molecular Function ontology implicitly or explicitly imply the binding of a chemical or protein, it is unnecessary to co-annotate a gene product to a term from the binding node of GO to describe the binding of substrates or products that are already adequately captured in the definition of the Molecular Function term. For instance, a protein with enzymatic activity MUST bind all of the substrates and products of the reaction it catalyzes. Similarly, a protein with transporter activity MUST bind the molecules it transports. The curator should try to capture the specifics as much as feasible and avoid redundant annotations. Annotate to a binding term whenever an experiment shows binding, but not catalysis/transport. Curators should use their judgment to decide whether the interaction is physiologically relevant and capture information relevant to the in vivo situation.
<li><a name = "general comment on protein binding">Protein binding annotations in the Gene Ontology</a></li>
The Molecular Function (MF) ontology can be used to capture macromolecular interactions, such as protein-protein, protein-nucleic acid, protein-lipid interactions, etc. While GO annotations are not considered to be a repository of all protein-protein interactions, many gene products are annotated to 'protein binding' (GO:0005515) or one of its child terms. In making these annotations, contributing groups may follow slightly different practices with respect to the types of experimental evidence used
to support these inferences, e.g. some groups may use co-immunoprecipitation as supporting evidence for a protein binding annotation between two gene products, others not. However, all groups generally adhere to the principle that, when annotated, protein binding interactions inform what is believed to be the normal biological role of a gene product, i.e. the protein-protein interactions support an author's hypothesis about how the gene product is thought to execute its molecular function in the context of a normal biological process. Protein-protein interactions for which there is not yet sufficient biological context are discouraged as sources of GO MF annotations.
<li><a name ="descriptive">Choosing more descriptive terms than 'protein binding'</a></li>
Child terms that describe a particular class of protein binding (e.g. GO:0030971:receptor tyrosine kinase binding) should be used in preference to the parent term GO:0005515 protein binding. The IPI evidence code should be used where possible for annotation of all protein-protein interactions and the precise identity of the interacting protein should be captured in the 'with' column (8). At present a variety of identifiers can be used in the 'with' column (8) or the annotation extension column (16), see <a href = "/page/go-annotation-file-gaf-format-20">GO Annotation File Format 2.0 Guide</a>.
<li><a name ="partners">Identifying binding partners using columns 8 and 16</a></li>
When a gene product is being annotated to a binding activity term, the 'with' column (8) and/or the annotation extension column (16) can be used to capture additional information about the identity of the binding partner of the gene product being annotated. To understand when to use column 8, column 16, or both, it is important to remember that entries in column 8 support the evidence used to infer the function, while entries in column 16 modify the GO term used in the GO_ID column (5). The curator also needs to remember that the 'with' column (8) can be used with only a subset of evidence codes: IPI, IC, IEA, IGI, IMP or ISS; column 8 cannot be used with an IDA evidence code, see <a href = "/page/guide-go-evidence-codes">evidence code documentation</a>.<br>
<strong> Examples of using the 'with' column (8) </strong><br>
The annotation of <em> Protein A to a GO binding term with evidence code IPI and Protein B in the 'with' column (8) </em> makes the statement that Protein A has the binding activity defined by the GO term and this function was inferred from interaction with Protein B; binding to Protein B isn't necessarily the in vivo function of Protein A.<br>
<ol>
<li> Column 8 can be used to make annotations based on experiments where the evidence for the function of Protein A binding Protein B in species X is based on binding of protein B from species Y. For example, the C. elegans Unc-115 protein was shown to bind to actin filaments made with actin purified from rabbit skeletal muscle. This would be annotated as GO:0051015:actin filament binding using an IPI evidence code and putting an accession for rabbit skeletal muscle actin, UniProtKB:P68135, in the 'with' column (8). This annotation makes the statement that C. elegans Unc-115 has the molecular function of actin filament binding inferred from experiments using rabbit actin. </li>
<li> Column 8 can be used to indicate that the evidence for binding a small molecule is based on an experiment using an analog. The annotation <b> Protein A GO:0005524:ATP binding IPI column 8 ATP-gamma-S </b> captures the information that ATP binding activity was inferred from binding of a non-hydrolyzable ATP analog. </li>
</ol><br>
<strong> Examples of using the annotation extension column (16) </strong>
<p>The annotation of <b> Protein A to a GO function term with Protein B and a has_participant relationship in the annotation extension column (16) </b> makes the statement that an in vivo target of Protein A is Protein B. This is equivalent to the post-compositional creation of a new child term.
<ul><li>The zebrafish Lnx2b protein (UniProtKB:A4VCF7) was shown to ubiquitinate zebrafish Dharma (UniProtKB:O93236) in <a href = "http://www.ncbi.nlm.nih.gov/pubmed/19668196">PMID:19668196</a>. Therefore Lnx2b can be annotated to GO:0004842:ubiquitin-protein ligase activity adding has_input UniProtKB:O93236 in annotation extension column (16). This annotation makes the statement that Dharma is a substrate of the ubiquitin-protein ligase activity of Lnx2b.</li>
<li>The human ABCG1 protein has been annotated to GO:0034041 sterol-transporting ATPase activity with an IDA evidence code. The experiments in <a href = "http://www.ncbi.nlm.nih.gov/pubmed/17408620?dopt=Abstract">PMID:17408620</a> demonstrate that the target is 7-hydroxycholesterol; this information can be added to the annotation by including the ChEBI ID for 7-hydroxycholesterol, CHEBI:42989, in the annotation extension column (16): post-composing the GO term 7-hydroxycholesterol-transporting ATPase activity.</li></ul>
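The division of labour between column 8 and column 16 can be sketched in code. This is an illustrative sketch only: the function names and the dictionary layout are hypothetical, the set of evidence codes is the one listed above, and the relation(ID) syntax for column 16 is an assumption that should be checked against the annotation file format guide.

```python
# Evidence codes that may carry an entry in the 'with' column (8),
# as listed in the guidelines above. IDA is deliberately absent.
WITH_COLUMN_CODES = {"IPI", "IC", "IEA", "IGI", "IMP", "ISS"}


def format_extension(relation, target_id):
    """Format one annotation extension entry for column 16.

    Assumption: column 16 entries are written as relation(ID).
    """
    return f"{relation}({target_id})"


def binding_annotation(go_id, evidence_code, with_id=None, extension=None):
    """Assemble the binding-related fields of one annotation as a dict."""
    if with_id and evidence_code not in WITH_COLUMN_CODES:
        raise ValueError(
            f"evidence code {evidence_code} cannot be used with the 'with' column"
        )
    return {
        "go_id": go_id,                 # column 5
        "evidence": evidence_code,      # evidence code column
        "with": with_id or "",          # column 8: supports the evidence
        "extension": extension or "",   # column 16: refines the GO term
    }
```

With these helpers, the Unc-115 example would be built as binding_annotation("GO:0051015", "IPI", with_id="UniProtKB:P68135"), and the Lnx2b substrate would be recorded by passing extension=format_extension("has_input", "UniProtKB:O93236").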
<li><a name = "ontology">Ontology development for protein binding</a></li>
Future ontology development efforts should be relied upon to improve the searching capability of any user who is specifically interested in gene products carrying out a certain type of substrate/product binding. Ongoing relevant ontology development of 'has_part' relationships will provide links to implied substrate binding (the GOC are developing 'has_part' relationships to imply substrate binding). The existing GO will follow this new format, e.g. Transcription factor activity will have a 'has_part' relationship to DNA binding rather than an 'is_a' relationship. Curators should request new 'has_part' relationships (and terms) if these do not exist.
</ul>
<h4><a name = "response">'Response to' guidelines</a></h4>
The definition of the top-level 'response to' terms has been updated to indicate where the response begins and ends:
Any process that results in a change in state or activity of a cell or organism as the result of a stimulus. The process begins with detection of the stimulus and ends with a change in state or activity of the cell or organism.
This change was made and released in ontology version 1.1960.<br>
<b> Examples: </b>
<ol><li>response to stimulus ; GO:0050896
Any process that results in a change in state or activity of a cell or an organism (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of a stimulus. The process begins with detection of the stimulus and ends with a change in state or activity of the cell or organism.</li>
<li>cellular response to stimulus ; GO:0051716
Any process that results in a change in state or activity of a cell (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of a stimulus. The process begins with detection of the stimulus by a cell and ends with a change in state or activity of the cell.</li></ol>
Advisory quality control check: High level 'response to' terms should not directly be used for annotation, unless additional information is supplied in column 16.
Be careful to use IEP when the experiment is observing expression level. Example: PMID:8888624 and annotation for <i>A. thaliana</i> <a href = "http://www.arabidopsis.org/servlets/TairObject?accession=locus:2182783">BIP1</a>. IEP should be used rather than IDA.
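The advisory quality control check for 'response to' terms could be automated along the following lines. This is a sketch under stated assumptions: the function name is hypothetical, and the set of high-level terms contains only the two IDs given as examples above, so a real check would need a fuller list.

```python
# Only the two top-level terms named in the examples above; illustrative,
# not a complete list of high-level 'response to' terms.
HIGH_LEVEL_RESPONSE_TERMS = {"GO:0050896", "GO:0051716"}


def response_term_warning(go_id, extension_column):
    """Return an advisory message, or None if the annotation passes the check."""
    if go_id in HIGH_LEVEL_RESPONSE_TERMS and not extension_column.strip():
        return (
            f"{go_id} is a high-level 'response to' term: "
            "add detail in column 16 or choose a more specific term"
        )
    return None
```

Because the check is advisory, it returns a message rather than raising an error, which leaves the final decision with the curator.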
<h4><a name ="regulationTerms">Use of Regulation Terms</a></h4>
<ul><li> <a name="#background">Background </a></li>
The GO Consortium recognized quite early on in the development of the Biological Process ontology that there were gene products that participated directly in a process and gene products that regulated a process, positively and/or negatively. But how do curators know to which of these terms they should be annotating, and is it possible, for a given process, to annotate the same gene product to both a parent term and one of its associated regulation terms?
To begin to address these questions here are some guidelines for annotating, or not, to regulation terms:
<li> <a name="#guide1">Guideline 1: Use existing biological knowledge to define the process. </a></li>
In order to determine whether a gene product participates in a process or regulates that process (or both) curators need to consider the nature of the process. Processes can be considered as ordered assemblies of molecular functions and every process has a beginning, middle, and end.
Use existing biological knowledge and the paper being curated as guides. Is there a defined pathway, i.e. distinct molecular functions, and have the gene products that perform those functions been identified? Does the gene product being annotated perform one of those functions or a function outside of the process that might start, stop, or change the rate at which the process proceeds?
In reality, the beginning, middle, and end of some processes will be easier to define than others. For example, signaling pathways, such as MAPK signaling, will be easier to define than broader, organismal-level processes such as embryonic development. Curators should use their judgement, based on the published literature, to guide their annotation.<br>
<strong>Example: Atg1</strong>
Saccharomyces cerevisiae Atg1 encodes a protein kinase that is involved in autophagy: "The process by which cells digest parts of their own cytoplasm; allows for both recycling of macromolecular constituents under conditions of cellular stress and remodeling the intracellular structure for cell differentiation."
Atg1 activity is critical for the induction of autophagy, specifically for formation of autophagic vacuoles. Should Atg1 be annotated to autophagic vacuole formation or regulation of autophagic vacuole formation? Authors have used language that could lead curators to make annotations to either term.
In this case, annotators need to consider the sum of what is known about the autophagic pathway and Atg1's role in that pathway.
Using that knowledge, SGD has annotated Atg1 to the parent process term, autophagic vacuole formation, because once Atg1 is active, the 'go' or 'no go' decision for autophagy has already been made. More upstream genes appear to actually be regulating the autophagic pathway.
http://wiki.geneontology.org/index.php/2010_GO_camp_Use_of_Regulation_issues#Example_2
<li> <a name="#guide2">Guideline 2: If you aren't sure, consider annotating to the parent process term. </a></li>
If the gene product performs one of the functions, annotate directly to the process. If the gene product regulates the process, it should be annotated to regulation of that process.
If you aren't sure what term to use, annotate to the parent process term. As more information about the process becomes available, you may be able to refine your annotations (see Guideline #4 below).
<li> <a name="#guide3">Guideline 3: Improve the ontology by defining, wherever possible, the beginning, middle, and end of a process. </a></li>
Wherever possible, include the beginning, middle, and end of a process in the corresponding term definition. This will help annotators choose the appropriate term for their annotations.
<li> <a name ="#guide4">Guideline 4: Revisit annotations when new knowledge becomes available. </a></li>
GO annotations should reflect the present state of biological knowledge. Therefore, as the understanding of a biological process improves, it may be necessary to revisit and refine existing annotations.
<li> <a name="#guide5">Guideline 5: Annotations based on mutant phenotypes should take mechanism into account.</a></li>
Mutant phenotypes are often used to make annotations to regulation terms because they fit the criteria of the term definition, i.e. authors report a change in the frequency, rate, or extent of a process.
However, to make correct regulation annotations from IMP evidence, it is important to consider several factors, including: 1) the assay type, 2) the nature of the alleles (null vs. reduction of function), and 3) the molecular identity of the gene product.
Again, if it isn't clear that a gene product is involved in regulation, it is better to annotate to the parent process term.
<strong>Example: muscle contraction and <i> C. elegans </i> mutants</strong>
In <i>C. elegans</i>, mutations in a number of genes cause paralysis or slowed locomotion due to defects in muscle contraction. These genes encode everything from myosin heavy chain to calcium channels to transcription factors. Depending upon the nature of the allele, mutant phenotypes for the same gene can sometimes support annotations to both process and regulation terms. In this case, consideration of the process, the nature of the allele (complete or partial loss of function), and the molecular identity of the gene product can guide curators in making the appropriate annotation.
http://wiki.geneontology.org/images/4/47/Regulation_example.pdf
<li><a name ="#guide6">Guideline 6: Some gene products may be annotated to both a process and regulation of that process.</a></li>
Positive and negative feedback loops are an essential part of many signaling pathways.
If one member of a pathway regulates the activity of a <em> different </em> member of the pathway, it could be annotated to both the process and regulation of that process.
When annotating gene products involved in a signaling pathway, however, curators should not annotate gene products that directly activate the next gene product in the pathway to regulation of that pathway.
For example, MAPKK would not be annotated to positive regulation of MAPKKK cascade just because it phosphorylates and activates MAPK.
However, gene products that (for example) feed back onto earlier steps in the pathway may be annotated to both the parent process term and a regulation term.<br>
<strong>Example: ERK1/2</strong>
ERK1/2 activation requires activity of FRS2alpha which, in turn, is negatively regulated by activated ERK1/2.
Could ERK1/2 be annotated to both MAPKKK cascade and negative regulation of MAPKKK cascade?
<a href = "http://www.molbiolcell.org/content/21/4/664.full">Phosphoprotein Enriched in Astrocytes 15 kDa (PEA-15) Reprograms Growth Factor Signaling by Inhibiting Threonine Phosphorylation of Fibroblast Receptor Substrate 2{alpha}</a>
Cases where the presence/absence of one of the members of a pathway is limiting should not be annotated to regulation, e.g. if the amount of a receptor on the surface of a cell regulates the process, the receptor should <em> not </em> be annotated to the regulation term.</ul>
<h4><a name ="txn">Use of Transcription related terms</a></h4>

<p>The transcription branch of the ontology was overhauled in 2011 to remove any overlap between <strong>Function</strong> and <strong>Process</strong> terms and to accurately represent <strong>Function</strong> terms so they actually describe molecular activities (<em>how</em> something occurs). <a href = "http://gocwiki.geneontology.org/index.php/Proposals_to_overhaul_transcription_in_GO_-_2010" target="blank">You may read details of the overhaul here</a>. </p>

<p>These changes will consequently affect annotations. For example, if the experiments indicate that a gene product is involved in regulating transcription but give no indication of <em>how</em> it acts, it would be appropriate to annotate that gene product only to <strong>Process</strong> terms. The <a href="ftp://ftp.geneontology.org/pub/go/www/txnAnnotationGuide.pdf" target="blank">Transcription Annotation Guide</a> is available to facilitate the process of annotating gene products using this new ontology structure.</p>


[[Category: Annotation]] Pascale
Categories: GO Internal

Authoritative Database Groups

GO wiki (new pages) - Mon, 09/17/2018 - 12:54

Pascale:


Moved from the Website - contents to be discussed

Authoritative Database Groups
Where two or more databases submit data on the same species, the GO Consortium encourages a model in which one database group collects all annotation data for that species, removes the redundant (duplicate) annotations, and then submits the total dataset to the central repository. This ensures that no redundant annotations appear in the master dataset. The table below documents those species for which a single database group is responsible for collating and submitting annotations.
The format of the IDs used by these database groups can be found in the list of GO database abbreviations. For converting between different ID types, please see the tools for ID mapping on the GO wiki.
More information on authoritative database groups and avoiding redundancy can be found in the GO annotation policies and guidelines.
<table summary="List of model organisms and the database group responsible for providing the annotations for that species">
<caption>
GO Consortium groups responsible for all annotations for a species
</caption>
<thead>
<tr>
<th>
Project name
</th>
<th>
Species
</th>
</tr>
</thead>
<tbody>
<tr>
<td>
Candida Genome Database
</td>
<td>
<ul>
<li>
<i>Candida albicans</i>, taxon:5476
</li>
</ul>
</td>
</tr>
<tr>
<td>
dictyBase
</td>
<td>
<ul>
<li>
<i>Dictyostelium</i>, taxon:5782
</li>
<li>
<i>Dictyostelium discoideum</i>, taxon:44689
</li>
<li>
<i>Dictyostelium discoideum AX2</i>, taxon:366501
</li>
<li>
<i>Dictyostelium discoideum AX4</i>, taxon:352472
</li>
</ul>
</td>
</tr>
<tr>
<td>
FlyBase
</td>
<td>
<ul>
<li>
<i>Drosophila melanogaster</i>, taxon:7227 (fruit fly)
</li>
</ul>
</td>
</tr>
<tr>
<td>
Leishmania major GeneDB
</td>
<td>
<ul>
<li>
<i>Leishmania major</i>, taxon:5664
</li>
</ul>
</td>
</tr>
<tr>
<td>
Plasmodium falciparum GeneDB
</td>
<td>
<ul>
<li>
<i>Plasmodium falciparum</i>, taxon:5833 (malaria parasite P. falciparum)
</li>
</ul>
</td>
</tr>
<tr>
<td>
PomBase
</td>
<td>
<ul>
<li>
<i>Schizosaccharomyces pombe</i>, taxon:4896 (fission yeast)
</li>
</ul>
</td>
</tr>
<tr>
<td>
Trypanosoma brucei GeneDB
</td>
<td>
<ul>
<li>
<i>Trypanosoma brucei TREU927</i>, taxon:185431
</li>
</ul>
</td>
</tr>
<tr>
<td>
Glossina morsitans GeneDB
</td>
<td>
<ul>
<li>
<i>Glossina morsitans morsitans</i>, taxon:37546
</li>
</ul>
</td>
</tr>
<tr>
<td>
goa_chicken, UniProtKB-GOA
</td>
<td>
<ul>
<li>
<i>Gallus gallus</i>, taxon:9031 (chicken)
</li>
<li>
<i>Gallus gallus bankiva</i>, taxon:208525
</li>
<li>
<i>Gallus gallus gallus</i>, taxon:208526
</li>
<li>
<i>Gallus gallus murghi</i>, taxon:400035
</li>
<li>
<i>Gallus gallus spadiceus</i>, taxon:208524
</li>
</ul>
</td>
</tr>
<tr>
<td>
goa_cow, UniProtKB-GOA
</td>
<td>
<ul>
<li>
<i>Bos taurus</i>, taxon:9913 (cattle)
</li>
<li>
<i>Bos taurus X Bison bison</i>, taxon:297284 (beefalo)
</li>
<li>
<i>Bos taurus x Bos indicus</i>, taxon:30523
</li>
</ul>
</td>
</tr>
<tr>
<td>
goa_human, UniProtKB-GOA
</td>
<td>
<ul>
<li>
<i>Homo sapiens</i>, taxon:9606 (human)
</li>
</ul>
</td>
</tr>
<tr>
<td>
gramene_oryza, Gramene
</td>
<td>
<ul>
<li>
<i>Oryza alta</i>, taxon:52545
</li>
<li>
<i>Oryza australiensis</i>, taxon:4532
</li>
<li>
<i>Oryza barthii</i>, taxon:65489
</li>
<li>
<i>Oryza brachyantha</i>, taxon:4533
</li>
<li>
<i>Oryza coarctata</i>, taxon:77588
</li>
<li>
<i>Oryza eichingeri</i>, taxon:29689
</li>
<li>
<i>Oryza glaberrima</i>, taxon:4538 (African rice)
</li>
<li>
<i>Oryza glumipatula</i>, taxon:40148
</li>
<li>
<i>Oryza grandiglumis</i>, taxon:29690
</li>
<li>
<i>Oryza granulata</i>, taxon:110450
</li>
<li>
<i>Oryza latifolia</i>, taxon:4534
</li>
<li>
<i>Oryza longiglumis</i>, taxon:83309
</li>
<li>
<i>Oryza longistaminata</i>, taxon:4528
</li>
<li>
<i>Oryza malampuzhaensis</i>, taxon:127571
</li>
<li>
<i>Oryza meridionalis</i>, taxon:40149
</li>
<li>
<i>Oryza meyeriana</i>, taxon:83307
</li>
<li>
<i>Oryza minuta</i>, taxon:63629
</li>
<li>
<i>Oryza nivara</i>, taxon:4536
</li>
<li>
<i>Oryza officinalis</i>, taxon:4535
</li>
<li>
<i>Oryza punctata</i>, taxon:4537
</li>
<li>
<i>Oryza rhizomatis</i>, taxon:65491
</li>
<li>
<i>Oryza ridleyi</i>, taxon:83308
</li>
<li>
<i>Oryza rufipogon</i>, taxon:4529
</li>
<li>
<i>Oryza sativa</i>, taxon:4530 (rice)
</li>
<li>
<i>Oryza sativa Indica Group</i>, taxon:39946
</li>
<li>
<i>Oryza sativa Japonica Group</i>, taxon:39947
</li>
<li>
<i>Oryza schlechteri</i>, taxon:110451
</li>
<li>
<i>Oryza sp. IRGC 105360</i>, taxon:364100
</li>
<li>
<i>Oryza sp. IRGC 81916</i>, taxon:364099
</li>
<li>
<i>Panicum</i>, taxon:4539
</li>
</ul>
</td>
</tr>
<tr>
<td>
Mouse Genome Informatics
</td>
<td>
<ul>
<li>
<i>Mus musculus</i>, taxon:10090 (house mouse)
</li>
</ul>
</td>
</tr>
<tr>
<td>
PAMGO_Atumefaciens
</td>
<td>
<ul>
<li>
<i>Agrobacterium tumefaciens str. C58</i>, taxon:176299
</li>
</ul>
</td>
</tr>
<tr>
<td>
Rat Genome Database
</td>
<td>
<ul>
<li>
<i>Rattus norvegicus</i>, taxon:10116 (Norway rat)
</li>
</ul>
</td>
</tr>
<tr>
<td>
Saccharomyces Genome Database
</td>
<td>
<ul>
<li>
<i>Saccharomyces cerevisiae</i>, taxon:4932 (baker's yeast)
</li>
<li>
<i>Saccharomyces cerevisiae RM11-1a</i>, taxon:285006
</li>
<li>
<i>Saccharomyces cerevisiae YJM789</i>, taxon:307796
</li>
<li>
<i>Saccharomyces cerevisiae var. diastaticus</i>, taxon:41870
</li>
</ul>
</td>
</tr>
<tr>
<td>
The Arabidopsis Information Resource
</td>
<td>
<ul>
<li>
<i>Arabidopsis thaliana</i>, taxon:3702 (thale cress)
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Aphagocytophilum
</td>
<td>
<ul>
<li>
<i>Anaplasma phagocytophilum HZ</i>, taxon:212042
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Banthracis
</td>
<td>
<ul>
<li>
<i>Bacillus anthracis str. Ames</i>, taxon:198094
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Cburnetii
</td>
<td>
<ul>
<li>
<i>Coxiella burnetii RSA 493</i>, taxon:227377
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Chydrogenoformans
</td>
<td>
<ul>
<li>
<i>Carboxydothermus hydrogenoformans Z-2901</i>, taxon:246194
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Cjejuni
</td>
<td>
<ul>
<li>
<i>Campylobacter jejuni RM1221</i>, taxon:195099
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Cperfringens
</td>
<td>
<ul>
<li>
<i>Clostridium perfringens ATCC 13124</i>, taxon:195103
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Cpsychrerythraea
</td>
<td>
<ul>
<li>
<i>Colwellia psychrerythraea 34H</i>, taxon:167879
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Dethenogenes
</td>
<td>
<ul>
<li>
<i>Dehalococcoides ethenogenes 195</i>, taxon:243164
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Echaffeensis
</td>
<td>
<ul>
<li>
<i>Ehrlichia chaffeensis str. Arkansas</i>, taxon:205920
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Gsulfurreducens
</td>
<td>
<ul>
<li>
<i>Geobacter sulfurreducens PCA</i>, taxon:243231
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Hneptunium
</td>
<td>
<ul>
<li>
<i>Hyphomonas neptunium ATCC 15444</i>, taxon:228405
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Lmonocytogenes
</td>
<td>
<ul>
<li>
<i>Listeria monocytogenes str. 4b F2365</i>, taxon:265669
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Mcapsulatus
</td>
<td>
<ul>
<li>
<i>Methylococcus capsulatus str. Bath</i>, taxon:243233
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Nsennetsu
</td>
<td>
<ul>
<li>
<i>Neorickettsia sennetsu str. Miyayama</i>, taxon:222891
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Pfluorescens
</td>
<td>
<ul>
<li>
<i>Pseudomonas fluorescens Pf-5</i>, taxon:220664
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Psyringae
</td>
<td>
<ul>
<li>
<i>Pseudomonas syringae pv. tomato str. DC3000</i>, taxon:223283
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Psyringae_phaseolicola
</td>
<td>
<ul>
<li>
<i>Pseudomonas syringae pv. phaseolicola 1448A</i>, taxon:264730
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Soneidensis
</td>
<td>
<ul>
<li>
<i>Shewanella oneidensis MR-1</i>, taxon:211586
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Spomeroyi
</td>
<td>
<ul>
<li>
<i>Silicibacter pomeroyi DSS-3</i>, taxon:246200
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Tbrucei_chr2
</td>
<td>
<ul>
<li>
<i>Trypanosoma brucei</i>, taxon:5691
</li>
</ul>
</td>
</tr>
<tr>
<td>
tigr_Vcholerae
</td>
<td>
<ul>
<li>
<i>Vibrio cholerae O1 biovar El Tor</i>, taxon:686
</li>
</ul>
</td>
</tr>
<tr>
<td>
WormBase database of nematode biology
</td>
<td>
<ul>
<li>
<i>Caenorhabditis elegans</i>, taxon:6239
</li>
</ul>
</td>
</tr>
<tr>
<td>
Zebrafish Information Network
</td>
<td>
<ul>
<li>
<i>Danio rerio</i>, taxon:7955 (zebrafish)
</li>
</ul>
</td>
</tr>
</tbody>
</table>
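
The de-duplication step described above (one group collects all annotations for a species and removes redundant copies before submission) could look roughly like the following Python sketch. The simplified four-field record, the `merge_annotations` name, and the `DEMO:`/`REF:` identifiers are assumptions made for illustration; this is not the Consortium's actual submission pipeline.

```python
from typing import Iterable, List, Tuple

# Hypothetical simplified annotation record:
# (gene product ID, GO term ID, reference, evidence code)
Annotation = Tuple[str, str, str, str]


def merge_annotations(sources: Iterable[Iterable[Annotation]]) -> List[Annotation]:
    """Combine annotation sets, keeping only the first copy of each duplicate."""
    seen = set()
    merged = []
    for source in sources:
        for row in source:
            if row not in seen:
                seen.add(row)
                merged.append(row)
    return merged


# Two made-up submissions for the same species that overlap in one row.
db_a = [("DEMO:gene1", "GO:0005524", "REF:1", "IDA")]
db_b = [
    ("DEMO:gene1", "GO:0005524", "REF:1", "IDA"),  # same as the row in db_a
    ("DEMO:gene1", "GO:0005515", "REF:2", "IPI"),
]
merged = merge_annotations([db_a, db_b])
```

Here a duplicate means an exact match on all four fields; a real pipeline would have to decide which GAF columns define redundancy.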

[[Category:Managers]] Pascale
Categories: GO Internal

Guidelines for new Biological Processes

GO wiki (new pages) - Fri, 09/14/2018 - 07:40

Pascale:


http://www.geneontology.org/page/biological-process-ontology-guidelines

[[Category:Ontology]][[Category:GO Editors]] Pascale
Categories: GO Internal

Guidelines for new Cellular Components

GO wiki (new pages) - Fri, 09/14/2018 - 07:28

Pascale:


* [[Requesting a New Complex ID from IntAct]]

[[Category:Ontology]][[Category:GO Editors]] Pascale
Categories: GO Internal

Guidelines for new Molecular Functions

GO wiki (new pages) - Fri, 09/14/2018 - 07:27

Pascale:



* [[Curator Guide: Enzymes and Reactions]] (being reviewed)
* [[Notes on specific terms]] (being reviewed)
* Specificity of activities

https://github.com/geneontology/go-ontology/issues/12257 @thomaspd commented that "An activity should not be named after a gene product, as a gene product could potentially have multiple molecular functions and not just the one it's named after. Also, protein phosphatase inhibitor means "directly [i.e. via direct physical interaction] inhibits some protein phosphatase activity", not "directly inhibits some protein named protein phosphatase 2A"-- properly speaking you don't inhibit a protein, you inhibit the activity of a protein."

This was discussed briefly during the GO editors call: http://wiki.geneontology.org/index.php/Ontology_meeting_2016-01-28#TermGenie_review_queue

While attendees were in general in agreement with the above, @tberardini pointed to cases in which we may want to keep the specificity, such as http://amigo.geneontology.org/amigo/term/GO:0018024#display-lineage-tab

* [[Editorial-type_sections_copied_over_from_the_GO_website_ontology_documentation:_MF]]




[[Category:Ontology]][[Category:GO Editors]] Pascale
Categories: GO Internal

GO import files

GO wiki (new pages) - Fri, 09/14/2018 - 06:54

Pascale:

=Protocol for managing import files=
[[Adding_Terms_and_Regenerating_the_Import_Files]]

=What is an import file?=
Some terms from external ontologies are used in equivalence axioms and subclass assertions in certain GO classes. External classes are imported as needed by generating import files that contain the necessary class hierarchy from the external ontology. The external class hierarchy is used for reasoning in GO.

===Direct imports===
* chebi_import.owl
* caro_import.owl
* cl_import.owl
* go-bridge.owl
* fao_import.owl
* go-gci.owl
* go-taxon-groupings.owl
* ncbitaxon_import.owl
* oba_import.owl
* pato_import.owl
* po_import.owl
* pr_import.owl
* ro_import.owl
* ro_pending.owl
* so_import.owl
* uberon_import.owl
* x-disjoint.owl


===Indirect imports===
* taxslim.owl
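
One way to check which files an ontology imports directly is to read the `owl:imports` statements in its RDF/XML header. The Python sketch below does this for a small made-up header; the `example.org` IRIs and the `direct_imports` helper are placeholders for illustration, not the real GO file layout.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Made-up ontology header; the IRIs are placeholders, not real GO locations.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Ontology rdf:about="http://example.org/demo.owl">
    <owl:imports rdf:resource="http://example.org/imports/chebi_import.owl"/>
    <owl:imports rdf:resource="http://example.org/imports/ro_import.owl"/>
  </owl:Ontology>
</rdf:RDF>
"""


def direct_imports(rdf_xml: str) -> list:
    """Return the file names referenced by owl:imports statements."""
    root = ET.fromstring(rdf_xml)
    names = []
    for node in root.iter(f"{{{OWL}}}imports"):
        iri = node.get(f"{{{RDF}}}resource")
        if iri:
            # Keep only the last path segment, e.g. "ro_import.owl".
            names.append(iri.rsplit("/", 1)[-1])
    return names


imports = direct_imports(SAMPLE)
```

This only finds direct imports. An indirect import would show up only by repeating the same check on each imported file.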


[[Category:GO Editors]][[Category:Ontology]][[Category:Editor_Guide_2018]] Pascale
Categories: GO Internal

Ontology meeting 2018-09-17

GO wiki (new pages) - Thu, 09/13/2018 - 07:08

David: /* Ontology Developers' workshop */

= Conference Line =

*Zoom: https://stanford.zoom.us/j/828418143

= Agenda =


== Editors Discussion ==


==Meetings==

===Montreal GOC Meeting===
*Add name to attendees list if you're going

===Ontology Developers' workshop===
The meeting will be in Geneva, December (8)9-13
* When to book flights and hotel?
====Topics====
* Filling in missing GO-Rhea-Reactome xrefs
** moving from reactions to biochemical pathways
** implementation of a method to keep in synch with all three resources
** GO-CAM templates for pathways??
* GH ticket work and mini-project planning
** Dealing with and migrating binding terms
** Dealing with cellular processes (https://github.com/geneontology/go-ontology/issues/12849)
** Taxon constraints: how do we handle them going forward?
* Attendees
** David H (Can make it)
** Kimberly (Can make it)
** Karen
** Harold (Can make it)
** Pascale (Can make it)
** Barbara (Maybe make it)
** Alan B (Rhea, can make any day)
** Anne (Rhea, prefers Monday or Tuesday)
** Chris
** Peter D (Can make it)
** Ben?
** Jim?
** Paul T (Can make it)

==GH project link==
https://github.com/geneontology/go-ontology/projects/1

= Minutes =
*On call:



[[Category: Ontology]]
[[Category: Meetings]] David
Categories: GO Internal

Annotation Conf. Call 2018-09-11

GO wiki (new pages) - Fri, 09/07/2018 - 12:02

Vanaukenk: /* Pipeline */

= Meeting URL =
*https://stanford.zoom.us/j/976175422

= Agenda =
== Montreal Meeting, Wednesday, October 17th - Friday, October 19th ==
*[http://wiki.geneontology.org/index.php/2018_Montreal_GOC_Meeting_Logistics Logistics]
*[http://wiki.geneontology.org/index.php/2018_Montreal_GOC_Meeting_Agenda Agenda]

== GO Conference Calls ==
=== Tuesdays ===
*Calls will still be on Tuesdays at 8am PDT
*Proposed meeting schedule:
**1st Tuesday: Alliance Biological Function
**2nd Tuesday: GO Consortium
**3rd Tuesday: Alliance Biological Function/GO-CAM Working Group
**4th Tuesday: GO-CAM Working Group
**5th Tuesday: ad hoc, as needed
*One Zoom URL for all - https://stanford.zoom.us/j/976175422
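
The proposed rotation can be worked out from a date by counting which Tuesday of the month it is. The following Python sketch shows the arithmetic; the `call_for` helper is a hypothetical name, and the mapping simply restates the proposed schedule listed above.

```python
from datetime import date

# Proposed rotation, keyed by which Tuesday of the month it is.
SCHEDULE = {
    1: "Alliance Biological Function",
    2: "GO Consortium",
    3: "Alliance Biological Function/GO-CAM Working Group",
    4: "GO-CAM Working Group",
    5: "ad hoc, as needed",
}


def call_for(day: date) -> str:
    """Return the call scheduled for a given Tuesday."""
    if day.weekday() != 1:  # Monday is 0, so Tuesday is 1
        raise ValueError("calls are held on Tuesdays")
    # Days 1-7 hold the first Tuesday, days 8-14 the second, and so on.
    return SCHEDULE[(day.day - 1) // 7 + 1]


second_tuesday = call_for(date(2018, 9, 11))
fifth_tuesday = call_for(date(2018, 10, 30))
```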

== Pipeline ==
*Directionality of file transfer should now be GO -> Annotation group ftp (or something similar)
*Groups should no longer be submitting files to SVN
*Source URL should be indicated in the [https://github.com/geneontology/go-site/tree/master/metadata/datasets datasets.yaml files]

== Annotation Discussion ==
=== ND Annotations ===
*[https://github.com/geneontology/go-annotation/issues/2045 Automatic deletion of ND-evidenced annotations]
*ND annotations to a root node term are intended to signify that the curator looked at all available evidence and the MF, BP, or CC of a gene/gene product is not known
*GOA proposed to remove ND annotations automatically from '''their''' curation database because this was not happening manually
**Note that curators can only update or remove annotations from their respective groups in Protein2GO
**Only annotations with one of a select set of evidence codes would trigger automatic removal of ND annotations:
***ECO:0000269 (EXP) and its descendants
***ECO:0006056 (HTP) and its descendants
***ECO:0000250 (ISS) and its descendants
***ECO:0000317 (IGC) and its descendants
*Generally agreed that when there is experimental or sequence-based evidence for a more granular annotation, the ND annotation should be removed
**However, 'protein binding' MF annotations are thought of differently by different groups and we need to reach a consensus about how these MF annotations should be considered wrt automatically removing existing MF ND annotations
*Proposal: as part of QC pipeline, alert groups to genes/gene products that have an ND annotation and also an annotation to a more granular term (any evidence code?)
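
The QC alert proposed above could be sketched as follows in Python. The sketch assumes tab-separated GAF rows (DB in column 1, object ID in column 2, evidence code in column 7, aspect in column 9) and flags any gene product that has an ND annotation plus any other annotation in the same aspect. The function name and sample rows are made up for illustration. Which evidence codes and terms (for example 'protein binding') should count is the open question noted above, so the sketch does not filter on them.

```python
def find_stale_nd(gaf_lines):
    """Return sorted (gene product, aspect) pairs that have both an ND
    annotation and at least one non-ND annotation in the same aspect."""
    nd = set()
    other = set()
    for line in gaf_lines:
        if line.startswith("!") or not line.strip():
            continue  # skip GAF header/comment lines and blank lines
        cols = line.rstrip("\n").split("\t")
        key = (f"{cols[0]}:{cols[1]}", cols[8])
        if cols[6] == "ND":
            nd.add(key)
        else:
            other.add(key)
    return sorted(nd & other)


def row(obj, term, evidence, aspect):
    """Build a minimal nine-column row with placeholder values."""
    return "\t".join(["DEMO", obj, obj, "", term, "REF:0", evidence, "", aspect])


rows = [
    "!gaf-version: 2.1",
    row("g1", "GO:0003674", "ND", "F"),   # ND to the MF root
    row("g1", "GO:0005524", "IDA", "F"),  # more granular MF annotation
    row("g2", "GO:0008150", "ND", "P"),   # ND only, nothing to flag
]
stale = find_stale_nd(rows)
```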

=== PAINT Annotations ===
*Refresher on how PAINT annotations are created
*[https://github.com/geneontology/go-annotation/issues/2042 Doubled up IBA+EXP annotations (from Karen Christie)]
*[https://github.com/geneontology/go-annotation/issues/2049 PAINT:circular transfer of annotations]

=== Annotation Review ===
*84 open annotation review tickets
*Reminder to check go-annotation tracker tickets where your group has been assigned and finish up where you can
*Questions? Please add questions/comments to the individual tickets
*We can discuss questions on future calls, if needed

= Minutes =
*On call: David, Edith, George, Harold, Helen, Karen, Kimberly, Laurent-Philippe, Li, Liz, Marie-Claire, Michele, Midori, Pascale, Paul T., Sabrina, Shur-Jen, Seth, Stacia, Suzi A, Suzi L, Tanya, Tom, Val

== Annotation Discussion ==
=== ND Annotations ===
*We discussed the proposal to automatically remove ND annotations from the GOA database when experimental annotations from another source are available.
*Generally, removing ND annotations once a suitable, more granular term can be assigned to a gene/gene product is standard practice.
*However, not all ND annotations in Protein2GO can be edited by all groups and having an automated mechanism for dealing with ND annotations that are no longer applicable is desirable.
**One sticking point, however, has been what constitutes an appropriate Molecular Function annotation for removing NDs to the root MF node, specifically wrt child terms in the 'binding' (GO:0005488) hierarchy of MF.
**Some groups do not consider annotations to 'protein binding' (GO:0005515) sufficient to otherwise remove an ND annotation to root MF; other groups do.
***Annotations to child terms of 'protein binding' (GO:0005515) are viewed differently, however, by some groups who feel that the more granular protein binding terms provide useful information.
**Some groups also view annotations to terms like 'ATP binding' (GO:0005524) as suitable for supplanting an ND annotation to root MF.
*We discussed various options around this issue, e.g. handling the potentially contradictory information at the level of display, allowing binding annotations to supplant ND annotations, moving binding out from under MF and having it as a separate ontology.
*For now, the automatic removal of ND annotations in Protein2GO is on hold.
*ACTION ITEM: Kimberly will come up with more concrete proposals for the different options and we will discuss them and pick the best option.

=== PAINT Annotations ===
*We also discussed whether it is okay to include IBA annotations derived from PAINT curation when an experimental annotation that was used to create the IBA annotation exists for that gene product.
*Are these annotations circular or do they provide an additional level of QC that should also be captured?
*Note that during PAINT curation, propagation of an annotation to a given node does not require that there is experimental evidence for more than one leaf node, i.e. annotations may be propagated based on one experimentally supported annotation and a review of the overall conservation of family members.
*If we consider propagation from one experimental annotation plus family review a good form of QC, then do other forms of QC warrant inclusion in our annotation set and if so, what would the evidence code be?
*Should we just not allow propagation of annotations using IBA when there already exists an experimental annotation?
*We didn't reach a resolution on this yet, but further discussion is on [https://github.com/geneontology/go-annotation/issues/2042 Doubled up IBA+EXP annotations (from Karen Christie)]
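
To get a sense of how many annotations are affected, the doubled-up cases could be listed with something like the Python sketch below. It assumes annotations are available as (gene product, GO term, evidence code) tuples and treats EXP, IDA, IPI, IMP, IGI and IEP as the experimental codes. The function name and sample data are hypothetical, and the sketch does not check whether the experimental annotation was the one used to create the IBA.

```python
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def doubled_up(annotations):
    """Return sorted (gene product, term) pairs annotated with both IBA
    and at least one experimental evidence code."""
    codes_by_pair = {}
    for gene, term, evidence in annotations:
        codes_by_pair.setdefault((gene, term), set()).add(evidence)
    return sorted(
        pair
        for pair, codes in codes_by_pair.items()
        if "IBA" in codes and codes & EXPERIMENTAL
    )


# Made-up annotations for illustration.
sample = [
    ("DEMO:g1", "GO:0005524", "IDA"),
    ("DEMO:g1", "GO:0005524", "IBA"),  # same pair as the IDA row
    ("DEMO:g2", "GO:0005524", "IBA"),  # IBA only
]
doubled = doubled_up(sample)
```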


[[Category:Annotation Working Group]] Vanaukenk
Categories: GO Internal

Restoring an Obsolete Ontology Term

GO wiki (new pages) - Fri, 09/07/2018 - 08:14

Pascale:

See [[Ontology_Editors_Daily_Workflow]] for creating branches and basic Protégé instructions.

# Navigate to the obsolete term
# In the annotation window:
#* Modify the term label to remove 'obsolete'
#* Modify the term definition to remove 'OBSOLETE.'
#* Update the comment to note that this term was reinstated from obsolete.
#* Remove any replaced_by and consider tags by clicking on the (x) at the right-hand side
#* Add Subclasses as appropriate
#* Remove the owl:deprecated: true tag
# Run the reasoner
# Save.

See [[Ontology_Editors_Daily_Workflow]] for commit, push and merge instructions.
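
As a sanity check after following the steps above, the restored term could be inspected for leftover obsolescence markers. The Python sketch below assumes the term is available as an OBO-format stanza, where `owl:deprecated` appears as `is_obsolete: true`. The helper name and the `GO:9999999` identifiers are made up, and this is not part of the documented Protégé workflow.

```python
def restoration_problems(stanza: str) -> list:
    """List obsolescence markers still present in an OBO term stanza."""
    problems = []
    for line in stanza.splitlines():
        tag, _, value = line.partition(":")
        tag, value = tag.strip(), value.strip()
        if tag == "name" and value.lower().startswith("obsolete"):
            problems.append("label still starts with 'obsolete'")
        elif tag == "def" and value.startswith('"OBSOLETE.'):
            problems.append("definition still starts with 'OBSOLETE.'")
        elif tag == "is_obsolete" and value == "true":
            problems.append("is_obsolete tag still present")
        elif tag in ("replaced_by", "consider"):
            problems.append(f"{tag} tag still present")
    return problems


# Made-up stanzas; GO:9999999 and GO:9999998 are not real term IDs.
before = """[Term]
id: GO:9999999
name: obsolete demo term
def: "OBSOLETE. A made-up definition." [GOC:demo]
is_obsolete: true
consider: GO:9999998"""

after = """[Term]
id: GO:9999999
name: demo term
def: "A made-up definition." [GOC:demo]
comment: Note that this term was reinstated from obsolete.
is_a: GO:9999998"""

problems_before = restoration_problems(before)
problems_after = restoration_problems(after)
```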

[[Category:Ontology]][[Category:GO Editors]][[Category:Editor_Guide_2018]] Pascale
Categories: GO Internal

Ontology meeting 2018-09-10

GO wiki (new pages) - Fri, 09/07/2018 - 06:34

David:

= Conference Line =

*Zoom: https://stanford.zoom.us/j/828418143

= Agenda =

== Editors Discussion ==

===Project Update===
* RO meeting to take place in Colorado in October

===Ontology Developers' workshop possibility===
The meeting will focus on the following (depending on who can attend):
* creating good logical definitions in both a general context and in the context of the GO-Reactome-Rhea alignment
** Filling in missing GO-Rhea xrefs
** moving from reactions to biochemical pathways
** implementation of a method to keep in synch with all three resources
** GO-CAM templates for pathways??
* potential implementation of Design Patterns and the revival of some type of TermGenie ability
* GH ticket work and mini-project planning
** Dealing with and migrating binding terms
** Dealing with cellular processes (https://github.com/geneontology/go-ontology/issues/12849)
* Attendees
** David H (Can make it)
** Kimberly (Can make it)
** Karen
** Harold (Can make it)
** Pascale (Can make it)
** Barbara (Maybe make it)
** Alan B (Rhea, can make any day)
** Anne (Rhea, prefers Monday or Tuesday)
** Chris
** Peter D?
** Ben?
** Jim?

=== Tickets ===

OBI ticket: https://github.com/obi-ontology/obi/issues/963

=== Failures due to use in other ontologies ===
On Friday, after much discussion, Rachael and I finally tried to obsolete 'lamina reticularis'. When we created the pull request, the Travis check failed because the term is used in Uberon. Do we really want this behavior? It will make us dependent on other ontologies to proceed with our work.

https://travis-ci.org/geneontology/go-ontology/builds/425727676?utm_source=github_status&utm_medium=notification
https://github.com/geneontology/go-ontology/pull/16345


==GH project link==
https://github.com/geneontology/go-ontology/projects/1

= Minutes =
*On call:


[[Category: Ontology]]
[[Category: Meetings]] David
Categories: GO Internal

Manager Call 2018-09-06

GO wiki (new pages) - Tue, 09/04/2018 - 07:24

Pascale: /* New Project management strategy */



=Follow up from last week=

=New Project management strategy=
* We will use 'Project level projects'

* Product owner
* Tech lead
* Lead architect




== Feedback form update (UPDATED)==
https://github.com/geneontology/go-site/issues/750

Did we figure out the payment? No, still on my plate (Seth)

What's the time line for deployment? TBD, but pretty much instantaneous when done. (Seth)

== New GAF Submissions ==
SuziA: Update on documentation

== Ontology Editors' Meeting ==
Geneva, week of December 10th

The meeting will focus on:
* creating good logical definitions in both a general context and in the context of the GO-Reactome-Rhea alignment
** Filling in missing GO-Rhea xrefs
** moving from reactions to biochemical pathways
** implementation of a method to keep in synch with all three resources
** GO-CAM templates for pathways??
* potential implementation of Design Patterns and the revival of some type of TermGenie ability
* GH ticket work and mini-project planning
* Attendees
** David H (Can make it)
** Kimberly (Can make it)
** Karen
** Harold (Can make it)
** Pascale
** Barbara (Maybe make it)
** Alan B (Rhea, can make any day)
** Anne (Rhea, prefers Monday or Tuesday)
** Chris
** Peter D?
** Ben?
** Jim?
Pascale: needed to follow up with Paul regarding funds to hold these

=Job descriptions for each role=
Pascale and Kimberly started to create job descriptions for all manager roles:

https://drive.google.com/drive/folders/1F7e2D7T4hleIq8VaH7YW60D9wxQnRFRV

Every manager should add what they believe their tasks are. We will then discuss them here.

* Please have a look.

=Fate of pre-composed terms=
* explicit binding terms
* compound terms between two existing GO classes


[[Category:GO Consortium]] [[Category:GO Managers Meetings ]] David
Categories: GO Internal

Internal email lists

GO wiki (new pages) - Tue, 08/28/2018 - 10:50

Pascale:

GOC communicates internally via mailing lists. These are the links to request addition to the different mailing lists:

* https://mailman.stanford.edu/mailman/listinfo/go-consortium
* https://mailman.stanford.edu/mailman/listinfo/go-council
* https://mailman.stanford.edu/mailman/listinfo/go-curator-tracker
* https://mailman.stanford.edu/mailman/listinfo/go-directors
* https://mailman.stanford.edu/mailman/listinfo/go-discuss
* https://mailman.stanford.edu/mailman/listinfo/go-friends
* https://mailman.stanford.edu/mailman/listinfo/go-helpdesk
* https://mailman.stanford.edu/mailman/listinfo/go-managers
* https://mailman.stanford.edu/mailman/listinfo/go-obodiff
* https://mailman.stanford.edu/mailman/listinfo/go-ontology
* https://mailman.stanford.edu/mailman/listinfo/go-quality
* https://mailman.stanford.edu/mailman/listinfo/go-refgenome
* https://mailman.stanford.edu/mailman/listinfo/go-software


[[Category:GO Consortium]] Pascale
Categories: GO Internal

Ontology meeting 2018-08-27

GO wiki (new pages) - Sun, 08/26/2018 - 17:33

Vanaukenk:

= Conference Line =

*Zoom: https://stanford.zoom.us/j/828418143

= Agenda =

== Editors Discussion ==

===Project Update===
* RO meeting to take place in Colorado in October
* Discuss GO-relevant tickets on these calls?

===Ontology Developers' workshop possibility===
* One meeting in December; proposal to have the meeting in Geneva the week of December 10th.
* Maybe we can take a day or two to also meet with Anne and/or Alan to discuss the Rhea alignment project.
*Attendees:
**Yes: Harold, Kimberly
**Maybe: Barbara
**No:

=== Tickets ===

==GH project link==
https://github.com/geneontology/go-ontology/projects/1

= Minutes =
*On call:


[[Category: Ontology]]
[[Category: Meetings]] Vanaukenk
Categories: GO Internal

PAINT database update pipeline

GO wiki (new pages) - Fri, 08/24/2018 - 09:45

Debert: /* Import ontology */

==Import EXP files==

* File location:
http://geneontology.org/gene-associations/*.gaf.gz

==Import ontology==

* File location:
http://geneontology.org/ontology/go.obo

==Integration with existing PAINT annotations==

* Managing differences:
** Missing EXP evidence
** New EXP evidence (see, for example, https://github.com/pantherdb/fullgo_paint_update/issues/10)
** Handling obsolete and merged terms
*** Obsolete terms:
**** has a 'replaced_by' tag -> replace with the new term
**** does not have a 'replaced_by' tag -> output message xxx
** etc ?
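
The obsolete-term rule above could be sketched in Python roughly as follows, assuming the ontology is read from OBO-format text such as go.obo. The function names, the made-up `GO:9000001`-style identifiers, and the wording of the messages are all placeholders, since the actual message text is not yet decided. Merged terms are not covered here.

```python
def parse_obsolete(obo_text):
    """Map each obsolete term ID to its replaced_by target (or None)."""
    obsolete = {}
    for stanza in obo_text.split("\n\n"):
        tags = {}
        for line in stanza.splitlines():
            tag, sep, value = line.partition(": ")
            if sep:
                tags.setdefault(tag, value.strip())  # keep first value per tag
        if tags.get("is_obsolete") == "true":
            obsolete[tags["id"]] = tags.get("replaced_by")
    return obsolete


def remap(term_id, obsolete):
    """Return (term to use or None, message or None) for an annotated term."""
    if term_id not in obsolete:
        return term_id, None
    target = obsolete[term_id]
    if target:
        return target, f"{term_id} replaced by {target}"
    return None, f"{term_id} is obsolete with no replacement"


# Made-up ontology fragment; these are not real GO identifiers.
SAMPLE_OBO = """[Term]
id: GO:9000001
name: obsolete demo term A
is_obsolete: true
replaced_by: GO:9000003

[Term]
id: GO:9000002
name: obsolete demo term B
is_obsolete: true

[Term]
id: GO:9000003
name: demo term C"""

obsolete_terms = parse_obsolete(SAMPLE_OBO)
replaced = remap("GO:9000001", obsolete_terms)
dropped = remap("GO:9000002", obsolete_terms)
unchanged = remap("GO:9000003", obsolete_terms)
```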


* List of messages for automatic changes: 'Comment' ('Update comment' in PAINT interface)
** What types of messages ? obsoletes, missing EXP... what else ?
* 'View omitted annotation information' -> generated on the fly ???
** What types of messages ? tree changes information (missing nodes) - what else ?

==Managing new PTHR versions==


[[Category:Reference Genome]] [[Category:Working Groups]] Pascale
Categories: GO Internal

GO-CAM Working Group Call 2018-08-28

GO wiki (new pages) - Fri, 08/24/2018 - 07:07

Vanaukenk: Created page with "= Meeting URL = https://stanford.zoom.us/j/976175422 =Agenda= == Relations between MF and Input(s) == *has_input vs has_direct_input *Proposal: replace has_direct_input with..."

= Meeting URL =
https://stanford.zoom.us/j/976175422

=Agenda=

== Relations between MF and Input(s) ==
*has_input vs has_direct_input
*Proposal: replace has_direct_input with has_input; obsolete has_direct_input
*Need to review has_input annotations to remove any extensions that are inconsistent with GO-CAM usage, i.e. an indirect or unknown proximity for an input
*Seth retrieved, as of 2018-07-31, [https://drive.google.com/drive/folders/1TlwrEM2KjAzxIYiCGg0_oicMOYfhiGou all MF annotations] that use has_input in annotation extensions.
**Initial review:
***used to capture a regulatory effect, e.g. protein kinase activator activity, when it was not known whether the effect was direct or indirect (e.g. expression of protein or complex X increases the activity of Y)
***used to capture a regulatory subunit whose presence is necessary for the activity to occur (e.g. cyclin-dependent protein kinase)
***used to capture an enzymatic activity when it was not known if the effect on a substrate was direct or indirect (e.g. caspase-dependent but not known if it was the caspase mutated)
***used to capture an enzymatic substrate where there wasn't also a direct binding assay in the paper (e.g. testing possible chemical substrates for glucuronosyltransferase activity)
***used to capture metal ion-dependence of protein binding (e.g. Ca2+-dependent protein binding)
***used (correctly) to capture the physiologically relevant input in a binding reaction (i.e. cross-species experiment where with/from captures experimental binding partner and AE the relevant binding partner)
*Relations Ontology working group (broader than just GO) that is also considering [https://github.com/oborel/obo-relations/issues/244 how to model participants in an MF] and [https://github.com/oborel/obo-relations/issues/171 documentation of has_input and child relations]

== Modeling Transcription in GO-CAM ==
*Sabrina - [http://noctua.berkeleybop.org/editor/graph/gomodel:5a5fc23a00000137 PMID:28687631 'Clock1a affects mesoderm development and primitive hematopoiesis by regulating Nodal-Smad3 signaling in the zebrafish embryo.']

=== Relations between Transcription Factor MFs and Regulation of Transcription BPs ===
*Transcription factor activity is 'part_of' regulation of transcription
*This is consistent with the relations in the ontology and produces the correct annotations in the GPAD output file
*A consequence of this is that any regulation terms needed for annotation will have to be instantiated in the ontology
*This principle will be applied more broadly, i.e. if an entity plays a regulatory role in a process, its MF is 'part_of' some regulation of BP

== Direct vs Unknown Mechanism of Regulation ==
=== Capturing Unknown Mechanism of Regulation ===
*If it is not known if the TF directly regulates the expression of a gene, then the input for the TF activity is left blank.
**In this case, however, it is okay to use evidence from another experiment that might have shown different context (i.e. a different gene was regulated) as supporting evidence for the TF activity.
*The curator can model the unknown mechanism of regulation by saying that the TF is part_of regulation of transcription that is causally_upstream_of_or_within the positive or negative regulation of transcription that ultimately controls the expression of the gene. The gene is then added as 'has input' to the most distal transcriptional regulatory process.

== Relations between BP and input(s) ==
*Duplicating has_input for MF and BP results in multiple entries in the AE field of the BP annotation in the GPAD

== Relations between BP and MF of transcriptional target ==
*[https://www.ebi.ac.uk/ols/ontologies/ro/properties?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FRO_0002304 causally upstream of, positive effect]
*[https://www.ebi.ac.uk/ols/ontologies/ro/properties?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FRO_0002305 causally upstream of, negative effect]

== Missing Property Chains ==
*We still need relation chains that let us infer that a gene involved in process 1, which is itself upstream of process 2, is also upstream of process 2. For example:
**In model (activity-centric):
***part of o causally upstream of -> causally upstream of
**In annotation file (gene-centric):
*** involved in o acts upstream of -> acts upstream of
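To make the intended inference concrete, here is a minimal sketch of how such chains behave over a set of triples. It is not how the reasoner implements property chains; the relation names are written with underscores as an assumption, and `apply_chains` is a hypothetical helper.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Chain rules transcribed from the list above: (first, second) -> inferred relation.
CHAINS: Dict[Tuple[str, str], str] = {
    ("part_of", "causally_upstream_of"): "causally_upstream_of",
    ("involved_in", "acts_upstream_of"): "acts_upstream_of",
}


def apply_chains(triples: Set[Triple]) -> Set[Triple]:
    """Return the input triples plus everything inferable by repeatedly applying CHAINS."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for (a, r1, b) in list(inferred):
            for (b2, r2, c) in list(inferred):
                rel = CHAINS.get((r1, r2))
                # Compose a -r1-> b with b -r2-> c when a chain rule exists.
                if b == b2 and rel and (a, rel, c) not in inferred:
                    inferred.add((a, rel, c))
                    changed = True
    return inferred
```

For example, from `("MF1", "part_of", "BP1")` and `("BP1", "causally_upstream_of", "BP2")` the sketch derives `("MF1", "causally_upstream_of", "BP2")`, which is the link currently missing from the output.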

== Root Node vs Existing Molecular Functions ==
*Curators should always try to construct models using the known MF of a gene product, even if that MF was not specifically demonstrated in the paper they are annotating.
*Associated evidence for that MF will always point back to the paper in which the MF was interrogated.
*Creating models in this way will allow us to build on existing knowledge to create the most comprehensive and up-to-date model for a given BP.
*Proposal: if a gene product has more than one MF, curators should use either: 1) experimental data that supports the selection of one function vs another, 2) the common parent of the two functions, or 3) the biological context of the annotated process to select the most appropriate function(s) for that gene product.
**Examples: beta-catenin and PDIA6

=Minutes=
*On call: Kimberly, Tanya, Chris G, Chris M, Dave F, Dmitry, Dustin, Edith, Giulia, Harold, Helen, Jim, Jennifer, Karen, Kevin M, Laurent-Philippe, Li, Liz, Marie-Claire, Nathan, Pascale, Petra, Rob, Sabrina, Seth, Shur-Jen, Stacia, Suzi L, Suzi A, Jae

== has_input vs has_direct_input ==
*has_input has been used in MF annotation extensions in different ways, and we need to be consistent both in how this relation is used and in how it relates to its child relation has_direct_input


[[Category: Annotation Working Group]] Vanaukenk
Categories: GO Internal

Extensions2GO-CAM

GO wiki (new pages) - Thu, 08/23/2018 - 11:42

Paul Thomas: /* Simple conversions */

[[Category:GO-CAM]]
=Simple conversions=
These are relations that are essentially identical in extensions and GO-CAM
*If the aspect is F (column A)
**No more than one occurs_in(CC)
**No more than one occurs_in(CL)
**No more than one occurs_in(UBERON or EMAPA)
**No more than one has_input/has_direct_input(geneID or ChEBI)
**No more than one happens_during(BP)
**No more than one part_of(BP)
**No more than one has_regulation_target(geneID)
**No more than one activated_by(ChEBI)
**No more than one inhibited_by(ChEBI)
*If the aspect is C
**No more than one occurs_in(CC)
**No more than one occurs_in(CL)
**No more than one occurs_in(UBERON or EMAPA)
*If the aspect is P
**No more than one occurs_in(CC)
**No more than one occurs_in(CL)
**No more than one occurs_in(UBERON or EMAPA)
**No more than one has_input/has_direct_input(geneID or ChEBI)
**No more than one part_of(BP)
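The cardinality rules above can be checked mechanically. The sketch below is one possible reading: it assumes extensions are written as comma-separated `relation(filler)` pairs, treats has_input and has_direct_input as sharing a single slot, and relies on a caller-supplied `filler_type` function (hypothetical) that classifies a filler ID as "CC", "BP", "CL", "ANATOMY", "CHEBI" or "GENE".

```python
import re
from collections import Counter
from typing import Callable, Dict, FrozenSet, List

Slots = Dict[str, List[FrozenSet[str]]]

LOCATION = [frozenset({"CC"}), frozenset({"CL"}), frozenset({"ANATOMY"})]
INPUT = [frozenset({"GENE", "CHEBI"})]

# One entry per "No more than one ..." rule, keyed by aspect and relation.
SLOTS: Dict[str, Slots] = {
    "F": {"occurs_in": LOCATION, "has_input": INPUT,
          "happens_during": [frozenset({"BP"})],
          "part_of": [frozenset({"BP"})],
          "has_regulation_target": [frozenset({"GENE"})],
          "activated_by": [frozenset({"CHEBI"})],
          "inhibited_by": [frozenset({"CHEBI"})]},
    "C": {"occurs_in": LOCATION},
    "P": {"occurs_in": LOCATION, "has_input": INPUT,
          "part_of": [frozenset({"BP"})]},
}

EXT_RE = re.compile(r"\s*([A-Za-z_]+)\(([^)]+)\)\s*")


def is_simple_conversion(aspect: str, extension: str,
                         filler_type: Callable[[str], str]) -> bool:
    """True if every relation(filler) fits a slot and no slot is used twice."""
    if not extension.strip():
        return True  # nothing to convert
    used: Counter = Counter()
    for part in extension.split(","):
        match = EXT_RE.fullmatch(part)
        if not match:
            return False
        relation, filler = match.groups()
        if relation == "has_direct_input":
            relation = "has_input"  # assumption: both relations share one slot
        kind = filler_type(filler)
        slots = SLOTS.get(aspect, {}).get(relation, [])
        slot = next((s for s in slots if kind in s), None)
        if slot is None:
            return False  # relation/filler combination not in the list
        used[(relation, slot)] += 1
        if used[(relation, slot)] > 1:
            return False  # violates "no more than one"
    return True
```

Distinguishing CC from BP fillers needs an ontology lookup, which is why classification is left to the caller rather than guessed from the ID prefix.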
=has_regulation_target=
*GP-A [regulation of molecular function Z] has_regulation_target GP-B
**can be expressed as [GP-A]<-enabled_by-[GO:0003674]-regulates->[molecular function Z]-enabled_by->[GP-B]
**The variations on this are all children of the term “regulation of molecular function” (GO:0065009): “positive regulation of MF Z”, which would be expressed with the positively_regulates relation instead, and “negative regulation of MF Z”, which would use negatively_regulates.
**Note that MF Z should appear in the logical definition of the term “regulation of MF Z”, so you can get the GO ID from that.
*GP-A [regulation of transcription] has_regulation_target GP-B
**can be expressed as [GP-A]<-enabled_by-[GO:0003674]-regulates->[transcription]-has_input->[GP-B]
**These are all children of “regulation of transcription, DNA-templated” (GO:0006355). Similarly to above, there are positive/negative variations, and you can get the specific GO ID for [transcription] from the logical definition. Paul Thomas
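The two has_regulation_target patterns can be written out as triples. This is a sketch only: the function name and the flat triple representation are hypothetical, real GO-CAM models use individuals rather than bare class IDs, and the target term ID is assumed to have been looked up already from the logical definition of the regulation term.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

ROOT_MF = "GO:0003674"  # molecular_function root, as used in the patterns above
REGULATION_RELATIONS = {"regulates", "positively_regulates", "negatively_regulates"}


def regulation_target_to_triples(gp_a: str, gp_b: str, target_term: str,
                                 relation: str = "regulates",
                                 target_is_transcription: bool = False) -> List[Triple]:
    """Expand 'GP-A [regulation of X] has_regulation_target GP-B' into triples."""
    if relation not in REGULATION_RELATIONS:
        raise ValueError(f"unexpected relation: {relation}")
    # A regulated MF is enabled_by GP-B; a regulated transcription has GP-B as input.
    link = "has_input" if target_is_transcription else "enabled_by"
    return [
        (ROOT_MF, "enabled_by", gp_a),
        (ROOT_MF, relation, target_term),
        (target_term, link, gp_b),
    ]
```

For instance, `regulation_target_to_triples("GP-A", "GP-B", "GO:0004672")` gives the molecular-function pattern, while passing `target_is_transcription=True` switches the last edge to has_input.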
Categories: GO Internal