Feed aggregator

GO-CAM and Noctua 2017 Agenda and Minutes

GO wiki (new pages) - Tue, 01/16/2018 - 07:31

Vanaukenk:

[[GO-CAM May 10th, 2017]]

[[GO-CAM May 24th, 2017]]

[[GO-CAM June 14th, 2017]]

[[GO-CAM June 28th, 2017]]

[[GO-CAM July 12th, 2017]]

[[GO-CAM ectopic meeting July 19th 2017]]

[[GO-CAM July 26th, 2017]]

[[GO-CAM August 9th, 2017]]

[[GO-CAM August 23, 2017]]

[[GO-CAM Sept 13, 2017]]

[[GO-CAM Sept 27, 2017]]

[[GO-CAM October 25th, 2017]]

[[GO-CAM November 8th, 2017]]

[[GO-CAM November 22, 2017]]

[[GO-CAM December 13, 2017]]

Back to: [[Annotation]]


[[Category: Annotation]] Vanaukenk
Categories: GO Internal

GO-CAM and Noctua 2016 Agenda and Minutes

GO wiki (new pages) - Tue, 01/16/2018 - 07:24

Vanaukenk:

[[LEGO January 4, 2016]]

[[LEGO January 18, 2016]]

[[LEGO February 1, 2016]]

[[LEGO February 15, 2016]]

[[LEGO March 7, 2016]]

[[LEGO March 21, 2016]]

[[LEGO March 28, 2016]]

[[LEGO April 4, 2016]]

[[LEGO April 25, 2016]]

[[LEGO May 2, 2016]]

[[LEGO May 9, 2016]]

[[LEGO May 16, 2016]]

[[LEGO May 23, 2016]]

[[LEGO June 6, 2016]]

[[LEGO June 13, 2016]]

[[LEGO June 20, 2016]]

[[LEGO June 27, 2016]]

[[LEGO July 11, 2016]]

[[LEGO July 18, 2016]]

[[LEGO July 25, 2016]]

[[LEGO August 8, 2016]]

[[LEGO August 15, 2016]]

[[LEGO August 22, 2016]]

[[LEGO August 29, 2016]]

[[LEGO September 12, 2016]]

[[LEGO September 19, 2016]]

[[LEGO September 26, 2016]]

[[LEGO GAF/GPAD September 28, 2016]]

[[LEGO October 5, 2016]]

[[LEGO GAF/GPAD October 5, 2016]]

[[LEGO October 10, 2016]]

[[LEGO October 17, 2016]]

[[LEGO October 24, 2016]]

[[LEGO October 31, 2016]]


Back to: [[Annotation]]

[[Category: Annotation]] Vanaukenk
Categories: GO Internal

Manager Call 2018-01-18

GO wiki (new pages) - Tue, 01/16/2018 - 02:57

Pascale:

[[Category:GO Managers Meetings]]

= Call in info=
https://stanford.zoom.us/j/754529609

= Agenda =

==Noctua V1.0 progress report==


==Docathon==
*Review draft agenda for docathon
**http://wiki.geneontology.org/index.php/2018_Berkeley_GO_Docathon
**Create use cases and user stories to focus documentation efforts
**Who are the users?
***Curators
***MODs, other curation projects (e.g. Reactome)
***Bioinformaticians
***Systems biologists
***Software developers
***Bench scientists
**Post schedule for coming day so people can videoconference



==NYU meeting==
*Do we have any info on logistics?
*Start working on agenda?


==EC2GO mappings==
*We are missing about 1,500 reactions (25%) of EC (it's possible that we are also missing the corresponding terms)
*What is the goal: (a) keep up to date? (b) remove incorrect (moved) mappings?
*Back in 2013 there was a Plan of action with respect to EC and Rhea, is this still our plan? What's the priority? http://wiki.geneontology.org/index.php/Enzymes_and_EC_mappings#Plan_of_Action


= Minutes =
*On call: Pascale
Categories: GO Internal

Inferred from High Throughput Expression Pattern (HEP)

GO wiki (new pages) - Wed, 01/10/2018 - 14:06

Vanaukenk:

'''HEP: Inferred from High Throughput Expression Pattern'''

No data at:

http://www.evidenceontology.org/term/ECO:0007007/


[http://wiki.geneontology.org/index.php/Guide_to_GO_Evidence_Codes Back to: Guide to GO Evidence Codes]

[[Category: Annotation]]
[[Category: Evidence Codes]] Vanaukenk
Categories: GO Internal

Inferred from High Throughput Genetic Interaction (HGI)

GO wiki (new pages) - Wed, 01/10/2018 - 14:04

Vanaukenk:

'''HGI: Inferred from High Throughput Genetic Interaction'''

No data at:

http://www.evidenceontology.org/term/ECO:0007003/


[http://wiki.geneontology.org/index.php/Guide_to_GO_Evidence_Codes Back to: Guide to GO Evidence Codes]

[[Category: Annotation]]
[[Category: Evidence Codes]] Vanaukenk
Categories: GO Internal

Inferred from High Throughput Mutant Phenotype (HMP)

GO wiki (new pages) - Wed, 01/10/2018 - 14:00

Vanaukenk:

'''HMP: Inferred from High Throughput Mutant Phenotype'''

No data at:

http://www.evidenceontology.org/term/ECO:0007001/

[http://wiki.geneontology.org/index.php/Guide_to_GO_Evidence_Codes Back to: Guide to GO Evidence Codes]

[[Category: Annotation]]
[[Category: Evidence Codes]] Vanaukenk
Categories: GO Internal

Inferred from High Throughput Direct Assay (HDA)

GO wiki (new pages) - Wed, 01/10/2018 - 13:59

Vanaukenk:

'''HDA: Inferred from High Throughput Direct Assay'''

No data at:

http://www.evidenceontology.org/term/ECO:0007005/



[http://wiki.geneontology.org/index.php/Guide_to_GO_Evidence_Codes Back to: Guide to GO Evidence Codes]

[[Category: Annotation]]
[[Category: Evidence Codes]] Vanaukenk
Categories: GO Internal

Inferred from High Throughput Experiment (HTP)

GO wiki (new pages) - Wed, 01/10/2018 - 13:57

Vanaukenk:

'''HTP: Inferred from High Throughput Experiment'''

[http://www.evidenceontology.org/term/ECO:0006056/ ECO:0006056 high throughput evidence used in manual assertion]
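The ECO identifiers quoted on the high throughput evidence code pages in this feed can be collected into a small lookup table. The Python sketch below is illustrative only: the names `ECO_IDS` and `eco_term_url` are hypothetical, and the URL pattern is simply copied from the links on these pages.

```python
# ECO identifiers as quoted on the individual evidence code pages in this feed.
ECO_IDS = {
    "HTP": "ECO:0006056",
    "HDA": "ECO:0007005",
    "HMP": "ECO:0007001",
    "HGI": "ECO:0007003",
    "HEP": "ECO:0007007",
}


def eco_term_url(evidence_code: str) -> str:
    """Build the evidenceontology.org term URL for a high throughput evidence code."""
    return "http://www.evidenceontology.org/term/{}/".format(ECO_IDS[evidence_code])


print(eco_term_url("HTP"))  # http://www.evidenceontology.org/term/ECO:0006056/
```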


[http://wiki.geneontology.org/index.php/Guide_to_GO_Evidence_Codes Back to: Guide to GO Evidence Codes]

[[Category: Annotation]]
[[Category: Evidence Codes]] Vanaukenk
Categories: GO Internal

Inferred from Electronic Annotation (IEA)

GO wiki (new pages) - Wed, 01/10/2018 - 12:40

Vanaukenk:

Automatically-assigned Evidence Codes

The Automatically-assigned Evidence Code is:

IEA: Inferred from Electronic Annotation

Note: Annotations using the IEA code should be reviewed after one year; any annotations older than this will be deleted.

*Annotations based on "matches" in sequence similarity comparisons if they have not been reviewed by a curator
*Annotations transferred from database records, if not reviewed by a curator
*Annotations made on the basis of keyword mapping files, if not reviewed by a curator

If annotations based on sequence similarity methods have been reviewed by a curator, use ISS instead and change the reference from the one that describes the computational analysis to one that says that the curator reviewed the sequence similarity and approved it.

Used for annotations that depend directly on computation or automated transfer of annotations from a database, particularly when the analysis is performed internally and not published. A key feature that distinguishes this evidence code from others is that it is not made by a curator; use IEA when no curator has checked the specific annotation to verify its accuracy. The actual method used (BLAST search, Swiss-Prot keyword mapping, etc.) doesn't matter.
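The one-year review rule for IEA annotations noted above could be checked along the following lines. This is a minimal Python sketch under stated assumptions: the annotation date is taken to be a YYYYMMDD string, "one year" is approximated as 365 days, and the function names are hypothetical.

```python
from datetime import date, datetime, timedelta

# Assumption: "one year" is approximated as 365 days.
REVIEW_WINDOW = timedelta(days=365)


def parse_annotation_date(value: str) -> date:
    """Parse a YYYYMMDD date string (assumed annotation date format)."""
    return datetime.strptime(value, "%Y%m%d").date()


def is_stale_iea(evidence_code: str, annotation_date: str, today: date) -> bool:
    """Return True for an IEA annotation dated more than one year before `today`."""
    if evidence_code != "IEA":
        return False
    return today - parse_annotation_date(annotation_date) > REVIEW_WINDOW


# Usage with made-up dates:
print(is_stale_iea("IEA", "20160101", date(2018, 1, 10)))  # True
print(is_stale_iea("IEA", "20171201", date(2018, 1, 10)))  # False
```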

When the method used to make annotations using the IEA code is performed internally by the annotating group and is not published, a short description of the method should be written and added to the GO Consortium's collection of GO references, where it will be given a GO_REF ID which can be used to cite the reference in gene association files.

Examples where the IEA evidence code should be used:

*Annotations based on "matches" in sequence similarity comparisons if they have not been reviewed by a curator. If annotations based on sequence similarity methods have been reviewed by a curator, use ISS instead.
*Annotations transferred from database records, if not reviewed by a curator. If such annotations are reviewed by a curator and the database record has no linked publication, consider the NAS code.
*Annotations made on the basis of keyword mapping files, if not reviewed by a curator.

Examples where the IEA evidence code should not be used:

*Annotations based on "matches" in sequence similarity comparisons that have been reviewed by a curator should be made with the ISS code.
*Annotations transferred from database records, where the annotation is reviewed by a curator, should not receive the IEA code. If the source is not traceable and the annotation is worth making, NAS should be used.

Usage of the With/From Column for IEA

At the January 2007 GOC meeting, it was agreed that all annotations made with this evidence code after May 1, 2007 must have an entry in the with/from column indicating which individual sequences, sequence objects, methods, keyword mapping files, etc. are the basis of the annotation. When multiple entries are placed in the with/from field, they are separated by pipes.

{| class="wikitable"
! ... !! 2. DB Object ID !! 3. DB Object Symbol !! 4. Qualifier !! 5. GO ID !! 6. DB:Reference !! 7. Evidence Code !! 8. With/From !! ...
|-
| ... || UniProt:A0A7W6 || A0A7W6_9PARI || || GO:0006118 || <nowiki>GOA:interpro|GO_REF:0000002</nowiki> || IEA || InterPro:IPR005797 || ...
|-
| ... || UniProt:A0A7W4 || A0A7W6_9PARI || || GO:0006118 || <nowiki>GOA:spkw|GO_REF:0000004</nowiki> || IEA || SP_KW:KW-0496 || ...
|-
| ... || UniProt:A0K8M1 || A0K8M1_BURCH || || GO:0004830 || <nowiki>GOA:spec|GO_REF:0000003</nowiki> || IEA || EC:6.1.1.2 || ...
|-
| ... || UniProt:A0KAB8 || Y2695_BURCH || || GO:0008237 || <nowiki>GOA:hamap|GO_REF:0000020</nowiki> || IEA || HAMAP:MF_00009 || ...
|-
| ... || UniProt:O77797 || AKAP3_BOVIN || || GO:0009434 || <nowiki>GOA:compara|GO_REF:0000019</nowiki> || IEA || Ensembl:ENSMUSP00000093091 || ...
|}
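The pipe-separated fields in the IEA rows above, and the requirement that IEA annotations carry a with/from entry, can be sketched in Python as follows. The function names are hypothetical, and the check deliberately ignores the May 1, 2007 cutoff mentioned above.

```python
from typing import List


def split_entries(field: str) -> List[str]:
    """Split a pipe-separated field into its individual entries."""
    return [entry for entry in field.split("|") if entry]


def iea_with_from_ok(evidence_code: str, with_from: str) -> bool:
    """IEA annotations need at least one with/from entry; other codes are not checked here."""
    if evidence_code != "IEA":
        return True
    return len(split_entries(with_from)) > 0


print(split_entries("GOA:interpro|GO_REF:0000002"))  # ['GOA:interpro', 'GO_REF:0000002']
print(iea_with_from_ok("IEA", "InterPro:IPR005797"))  # True
print(iea_with_from_ok("IEA", ""))                    # False
```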

[http://wiki.geneontology.org/index.php/Guide_to_GO_Evidence_Codes Back to: Guide to GO Evidence Codes]

[[Category: Annotation]]
[[Category: Evidence Codes]] Vanaukenk
Categories: GO Internal

No biological Data available (ND) evidence code

GO wiki (new pages) - Wed, 01/10/2018 - 12:39

Vanaukenk:

ND: No Biological Data Available
Updated November 9, 2007

Used for annotations when information about the molecular function, biological process, or cellular component of the gene or gene product being annotated is not available.

Use of the ND evidence code indicates that the annotator at the contributing database found no information that allowed making an annotation to any term indicating specific knowledge from the ontology in question (molecular function, biological process, or cellular component) as of the date indicated. This code should be used only for annotations to the root terms, molecular function ; GO:0003674, biological process ; GO:0008150, or cellular component ; GO:0005575, which, when used in annotations, indicate that no knowledge is available about a gene product in that aspect of GO.
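The restriction of ND to the three root terms can be expressed as a simple check. The Python sketch below uses the root term IDs quoted above; the names `ROOT_TERMS` and `nd_usage_ok` are hypothetical.

```python
# Root terms as listed in the text above.
ROOT_TERMS = {
    "GO:0003674": "molecular function",
    "GO:0008150": "biological process",
    "GO:0005575": "cellular component",
}


def nd_usage_ok(evidence_code: str, go_id: str) -> bool:
    """Return False when ND is used with a term other than a root term."""
    if evidence_code != "ND":
        return True  # this sketch only checks ND annotations
    return go_id in ROOT_TERMS


print(nd_usage_ok("ND", "GO:0008150"))  # True
print(nd_usage_ok("ND", "GO:0005634"))  # False
```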

Annotations made with the ND evidence code should be accompanied by a reference that explains that curators looked but found no information. Note that some groups check only published literature while other groups also make sequence comparisons to see if an annotation can be made on the basis of a sequence comparison. The GO Reference collection includes a reference that can be used with ND when both literature and sequence have been checked; to use it, put "GO_REF:0000015" in the reference column of a gene association file.

Note that use of the ND evidence code with an annotation to one of the root nodes to indicate lack of knowledge in that aspect makes a statement about the lack of knowledge only with respect to that particular aspect of the ontology. Use of the ND evidence code to indicate lack of knowledge in one particular aspect does not make any statement about the availability of knowledge or evidence in the other GO aspects.

Even if an author states in a paper that there is no data available or nothing is known about the gene product in a particular GO aspect, annotation to the corresponding root node should be made with ND evidence code citing either the annotating group's internal reference or the GOC's reference on use of the ND evidence code, not a specific paper.

Note: The ND evidence code, unlike other evidence codes, should be considered a code that indicates curation status/progress rather than the method used to derive an annotation.

When a gene product is annotated to a GO term using the NOT qualifier, this is a statement that it is not appropriate to associate that specific GO term with that particular gene product. However, such a negative annotation does not make any positive statements about the role of that gene product. Thus, there should always be a positive annotation, in addition to the NOT annotation. If nothing is known about the role of the gene product in a given aspect (molecular function, biological process, or cellular component) of GO, then the positive annotation should be made to the root node for that aspect using the ND evidence code.
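One possible reading of this rule is that every gene product with a NOT annotation in a given aspect should also have at least one positive annotation in that aspect. The Python sketch below reports the pairs that violate it; the `Annotation` record, the one-letter aspect labels, and the gene names are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Annotation(NamedTuple):
    gene: str       # gene product identifier
    aspect: str     # "F", "P", or "C" (hypothetical one-letter aspect labels)
    go_id: str
    qualifier: str  # "NOT" or "" in this sketch


def not_without_positive(annotations: Iterable[Annotation]) -> List[Tuple[str, str]]:
    """Return (gene, aspect) pairs that have a NOT annotation but no positive one."""
    records = list(annotations)
    positive = {(a.gene, a.aspect) for a in records if a.qualifier != "NOT"}
    negative = {(a.gene, a.aspect) for a in records if a.qualifier == "NOT"}
    return sorted(negative - positive)


example = [
    Annotation("geneX", "F", "GO:0004252", "NOT"),
    Annotation("geneX", "F", "GO:0003674", ""),   # positive root annotation (e.g. with ND)
    Annotation("geneY", "F", "GO:0004252", "NOT"),
]
print(not_without_positive(example))  # [('geneY', 'F')]
```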

[http://wiki.geneontology.org/index.php/Guide_to_GO_Evidence_Codes Back to: Guide to GO Evidence Codes]

[[Category: Annotation]]
[[Category: Evidence Codes]] Vanaukenk
Categories: GO Internal

Inferred by Curator (IC)

GO wiki (new pages) - Wed, 01/10/2018 - 12:39

Vanaukenk:

IC: Inferred by Curator
Updated September 22, 2011

The IC evidence code is to be used for those cases where an annotation is not supported by any direct evidence, but can be reasonably inferred by a curator from other GO annotations, for which evidence is available.

An example would be when there is evidence (be it direct assay, sequence similarity or even from electronic annotation) that a particular gene product has the function RNA polymerase II transcription factor activity ; GO:0003702. There is no direct evidence showing that this gene product is located in the nucleus, but this would be a perfectly reasonable inference for a curator to make since the curator is annotating a eukaryotic gene product that is associated with a specific nuclear RNA polymerase. This inference will be linked to the annotation to the term RNA polymerase II transcription factor activity ; GO:0003702 in two ways: both annotations will share the same reference; and the annotation inferred by a curator will include one or more with/from statements pointing to the GO term(s) used by the curator for the inference.

In many cases a GO term can be inferred from just one other annotation, as described above. Occasionally, a curator has to infer the GO term from multiple sources of evidence, that is, from several GO annotations. The 'with/from' field in these annotations will therefore supply more than one GO identifier, obtained from the set of supporting GO annotations assigned to the same gene/gene product identifier which cite publicly-available references. In addition, such IC annotations will use the reference GO_REF:0000036.

Usage of the With/From Column for IC

Note that the with/from field must always be filled in with a GO ID when using this evidence code.

For example, Noel et al., 1998 (PMID:9651335) provides evidence that the protein encoded by the S. cerevisiae UGA3 gene has the function "specific RNA polymerase II transcription factor activity" ; GO:0003704. From this, the curator deduces it is located in the nucleus and thus makes an annotation to the cellular component term "nucleus" ; GO:0005634 with the GO ID for the function term in the with/from for the component annotation.

The second example shown below illustrates the use of IC with GO_REF:0000036. In this case, a curator has inferred an annotation for the CUP9 gene to the GO term "RNA polymerase II transcription factor activity, sequence-specific transcription regulatory region DNA binding" (GO:0001133) based on evidence from PMID:9427760 that CUP9 is involved in "RNA polymerase II core promoter proximal region sequence-specific DNA binding" (GO:0000978), as well as evidence from PMID:18708352 that CUP9 is involved in "negative regulation of transcription from RNA polymerase II promoter" (GO:0000122). The with/from column supplies the GO IDs derived from these two publications separated by commas (meaning AND) because both of these GO terms are required to support the inferred annotation to GO:0001133. If either of the GO terms could support the inference on its own, they should be separated with a pipe (meaning OR).

{| class="wikitable"
! ... !! 2. DB Object ID !! 3. DB Object Symbol !! 4. Qualifier !! 5. GO ID !! 6. DB:Reference !! 7. Evidence Code !! 8. With/From !! ...
|-
| ... || SGDID:S000002329 || UGA3 || || GO:0003704 || PMID:9651335 || IPI || || ...
|-
| ... || SGDID:S000002329 || UGA3 || || GO:0005634 || PMID:9651335 || IC || GO:0003704 || ...
|}

{| class="wikitable"
! ... !! 2. DB Object ID !! 3. DB Object Symbol !! 4. Qualifier !! 5. GO ID !! 6. DB:Reference !! 7. Evidence Code !! 8. With/From !! ...
|-
| ... || SGDID:S000006098 || CUP9 || || GO:0000122 || PMID:18708352 || IMP || || ...
|-
| ... || SGDID:S000006098 || CUP9 || || GO:0000978 || PMID:9427760 || IDA || || ...
|-
| ... || SGDID:S000006098 || CUP9 || || GO:0001133 || GO_REF:0000036 || IC || GO:0000122,GO:0000978 || ...
|}

Where:

*GO:0003704 specific RNA polymerase II transcription factor activity
*GO:0005634 nucleus
*GO:0000122 negative regulation of transcription from RNA polymerase II promoter
*GO:0000978 RNA polymerase II core promoter proximal region sequence-specific DNA binding
*GO:0001133 RNA polymerase II transcription factor activity, sequence-specific transcription regulatory region DNA binding
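The comma (AND) and pipe (OR) conventions for the IC with/from column, together with the rule that the field must contain GO IDs, can be sketched in Python as follows. The function names are hypothetical, and the seven-digit pattern reflects the GO IDs shown in the examples above.

```python
import re
from typing import List

# GO IDs in the examples above have the form "GO:" followed by seven digits.
GO_ID = re.compile(r"^GO:\d{7}$")


def parse_with_from(field: str) -> List[List[str]]:
    """Return the OR-alternatives; each alternative lists identifiers that are ANDed together."""
    return [group.split(",") for group in field.split("|") if group]


def ic_with_from_ok(field: str) -> bool:
    """An IC with/from field must be non-empty and contain only GO IDs."""
    groups = parse_with_from(field)
    return bool(groups) and all(GO_ID.match(item) for group in groups for item in group)


print(parse_with_from("GO:0000122,GO:0000978"))  # [['GO:0000122', 'GO:0000978']]
print(parse_with_from("GO:0000122|GO:0000978"))  # [['GO:0000122'], ['GO:0000978']]
print(ic_with_from_ok("GO:0003704"))             # True
print(ic_with_from_ok(""))                       # False
```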

[http://wiki.geneontology.org/index.php/Guide_to_GO_Evidence_Codes Back to: Guide to GO Evidence Codes]

[[Category: Annotation]]
[[Category: Evidence Codes]] Vanaukenk
Categories: GO Internal

Non-traceable Author Statement (NAS)

GO wiki (new pages) - Wed, 01/10/2018 - 12:37

Vanaukenk:

NAS: Non-traceable Author Statement
Updated November 9, 2007

*Database entries that don't cite a paper (e.g. UniProt Knowledgebase records, YPD protein reports)
*Statements in papers (abstract, introduction, or discussion) that a curator cannot trace to another publication

The NAS evidence code should be used in all cases where the author makes a statement that a curator wants to capture but for which there are neither results presented nor a specific reference cited in the source used to make the annotation. The source of the information may be peer reviewed papers, textbooks, or database records. For some annotations using the NAS code, there will not be an entry in the with/from field.

The NAS code is also used for making annotations from database entries when a curator reviews the annotations that result. Typically such annotations will refer to an unpublished reference describing what was done, either a reference with a GO_REF id or an internal reference from the specific annotating database.

Cases where the NAS code should be used:

In Ladd et al., 2001 (PMID:11158314), the authors state that:

"All of the CELF proteins contain multiple potential protein kinase C and casein kinase II phosphorylation sites. All are predicted to have predominantly nuclear localization, and CELF3, CELF4, and CELF5 each possess a consensus nuclear localization signal sequence near the C terminus."

As this paper provided no reference to support the authors' assertion that CELF3 is localized to the nucleus (nor any sequence analysis related to this statement), and in the absence of better published data at the time of curation, CELF3 has been annotated to the GO term nucleus with the NAS evidence code.
{| class="wikitable"
! ... !! 2. DB Object ID !! 3. DB Object Symbol !! 4. Qualifier !! 5. GO ID !! 6. DB:Reference !! 7. Evidence Code !! 8. With/From !! ...
|-
| ... || UniProt:Q5SZQ8 || CELF3_HUMAN || || GO:0005634 || PMID:11158314 || NAS || || ...
|}
Cases where the NAS code should not be used:

*When an author makes a statement that is attributed to a source cited in the reference list, use the TAS evidence code.
*When an annotator makes an annotation based on a combination of another GO annotation and common knowledge, use the IC evidence code. For example, a curator may annotate a gene product to the cellular component term nucleus because it is already annotated to the molecular function term general RNA polymerase II transcription factor activity and it is common knowledge that transcription factors interacting with RNA polymerase II act in the nucleus. In that case, the IC evidence code should be used with the GO ID of the term from which the annotation was derived in the with/from field, citing the same reference as was used for the annotation to that term.

[http://wiki.geneontology.org/index.php/Guide_to_GO_Evidence_Codes Back to: Guide to GO Evidence Codes]

[[Category: Annotation]]
[[Category: Evidence Codes]] Vanaukenk
Categories: GO Internal

Traceable Author Statement (TAS)

GO wiki (new pages) - Wed, 01/10/2018 - 12:36

Vanaukenk:

TAS: Traceable Author Statement
Updated November 9, 2007

*Any statement in an article where the original evidence (experimental results, sequence comparison, etc.) is not directly shown, but is referenced in the article and therefore can be traced to another source.

The TAS evidence code covers author statements that are attributed to a cited source. Typically this type of information comes from review articles. Material from the introductions and discussion sections of non-review papers may also be suitable if another reference is cited as the source of experimental work or analysis.

When annotating with this code the curator should use caution and be aware that authors often cite papers dealing with experiments that were performed in organisms different from the one being discussed in the paper at hand. Thus a problem with the TAS code is that it may turn out from following up the references in the paper that no experiments were performed on the gene in the organism actually being characterized in the primary paper. For this reason we recommend (when time and resources allow) that curators track down the cited paper and annotate directly from the experimental paper using the appropriate experimental evidence code. When this is not possible and it is necessary to annotate from reviews, the TAS code is the appropriate code to use for statements that are associated with a cited reference.

Once an annotation has been made to a given term using an experimental evidence code, we recommend removing any annotations made to the same term using the TAS evidence code.

Note that prior to July 2006, the TAS evidence code was allowed for annotations based on information found in a textbook or dictionary, as textbook material has often become common knowledge (e.g. "everybody" knows that enolase is a glycolytic enzyme). However, at the 2006 GO Annotation Camp, it was concluded that this sort of information is not traceable to its source and is thus not suitable for the TAS evidence code. When annotating on the basis of common knowledge possessed by the curator, consider the IC code. When annotating an author statement that is not associated with a cited reference, use the NAS code.
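The choice between TAS, NAS, and IC described above can be summarized as a small decision function. This Python sketch is a simplification for illustration only; the function and parameter names are hypothetical, and it does not replace curator judgment.

```python
def choose_statement_code(has_cited_source: bool, curator_common_knowledge: bool = False) -> str:
    """Suggest an evidence code for an annotation that is not based on results shown in the paper."""
    if curator_common_knowledge:
        # Annotation rests on common knowledge possessed by the curator: consider IC.
        return "IC"
    # Author statement: traceable to a cited reference (TAS) or not (NAS).
    return "TAS" if has_cited_source else "NAS"


print(choose_statement_code(has_cited_source=True))   # TAS
print(choose_statement_code(has_cited_source=False))  # NAS
print(choose_statement_code(False, curator_common_knowledge=True))  # IC
```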

Examples where the TAS evidence code should be used:

*Annotating the twelve S. cerevisiae genes (RPO21, RPB2, RPB3, RPB4, RPB5, RPO26, RPB7, RPB8, RPB9, RPB10, RPC10, and RPB11) that are part of the core complex of RNA polymerase II to the GO term DNA-directed RNA polymerase II, core complex ; GO:0005665 based on a table in Meyer and Young, 1998 (PMID:9774381) listing each of these genes as encoding a subunit of the enzyme and giving one or more references for each subunit.
*Annotating the human myo9b gene to the GO term Rho GTPase activator activity ; GO:0005100 based on this statement in the introduction of a research article, Post et al., 2002 (PMID:11801597): "Biochemical characterization of both bacterially expressed Myr5 and Myr7 tail domains and tissue-purified human Myo9b demonstrate that these myosins IX are active GAPs for Rho but not Rac or CDC 42 (3,4,7)."

Examples where the TAS evidence code should not be used:

*In Ladd et al., 2001 (PMID:11158314), the authors state: "All of the CELF proteins contain multiple potential protein kinase C and casein kinase II phosphorylation sites. All are predicted to have predominantly nuclear localization, and CELF3, CELF4, and CELF5 each possess a consensus nuclear localization signal sequence near the C terminus." As this paper provided no reference to support the authors' assertion that CELF3 is localized to the nucleus (nor any sequence analysis related to this statement), and in the absence of better published data at the time of curation, CELF3 has been annotated to the GO term nucleus with the NAS evidence code and not the TAS evidence code.
{| class="wikitable"
! ... !! 2. DB Object ID !! 3. DB Object Symbol !! 4. Qualifier !! 5. GO ID !! 6. DB:Reference !! 7. Evidence Code !! 8. With/From !! ...
|-
| ... || || gene B || || GO:0005634 || PMID:11158314 || IGC || <nowiki>operon_geneA_ID|operon_geneC_ID</nowiki> (from operon in annotated organism) || ...
|-
| ... || UniProt:Q5SZQ8 || CELF3_HUMAN || || GO:0005634 || PMID:15347579 || NAS || || ...
|}
*When an annotator makes an annotation based on a combination of another GO annotation and common knowledge, use the IC evidence code. For example, a curator may annotate a gene product to the cellular component term nucleus because it is already annotated to the molecular function term general RNA polymerase II transcription factor activity and it is common knowledge that transcription factors interacting with RNA polymerase II act in the nucleus. In that case, the IC evidence code should be used with the GO ID of the term from which the annotation was derived in the with/from field, citing the same reference as was used for the annotation to that term.

[http://wiki.geneontology.org/index.php/Guide_to_GO_Evidence_Codes Back to: Guide to GO Evidence Codes]

[[Category: Annotation]]
[[Category: Evidence Codes]] Vanaukenk
Categories: GO Internal

Inferred from Reviewed Computational Analysis (RCA)

GO wiki (new pages) - Wed, 01/10/2018 - 12:36

Vanaukenk:

RCA: inferred from Reviewed Computational Analysis
Updated November 9, 2007

Note: Annotations using the RCA code should be reviewed after one year; any annotations older than this will be deleted.

*Predictions based on computational analyses of large-scale experimental data sets
*Predictions based on computational analyses that integrate datasets of several types, including experimental data (e.g. expression data, protein-protein interaction data, genetic interaction data, etc.), sequence data (e.g. promoter sequence, sequence-based structural predictions, etc.), or mathematical models

The RCA code should be used for annotations made from predictions based on computational analyses of large-scale experimental data sets, or on computational analyses that integrate multiple types of data into the analysis. Acceptable experimental data types include protein-protein interaction data (e.g. two-hybrid results, mass spectroscopic identification of proteins identified by affinity tag purifications, etc.), synthetic genetic interactions, and microarray expression results. Sequence-based data based on the sequence of the gene product, including structural predictions based on sequence, may be included provided that the analysis included non-sequence-based data as well. Sequence information related to promoter sequence features may also be included as a data type within these analyses. Predictions based on mathematical modelling which attempts to duplicate existing experimental results are also appropriate for use of this evidence code.

Analyses based purely on comparisons of the gene product sequence should use the ISS evidence code (or the IEA code if not reviewed by a curator). These include:

*sequence similarity with experimentally characterized gene products, as determined by pairwise or multiple alignment
*prediction methods for non-coding RNA genes
*recognized functional domains, as determined by tools such as InterPro, Pfam, SMART, etc., including the use of files such as interpro2go, pfam2go, smart2go to convert the domain hits to GO terms
*predicted protein features, e.g., transmembrane regions, signal sequence, etc.
*structural similarity with experimentally characterized gene products, as determined by crystallography, nuclear magnetic resonance, or computational prediction
*analyses combining multiple types of data based on the gene product sequence

Similarly for experimental data, if the annotation was made purely on the basis of an experimental result, e.g. a protein-protein interaction with a characterized protein, a genetic interaction with a characterized gene, or having a similar microarray expression pattern as a characterized gene, then the appropriate experimental evidence code, IPI, IGI, or IEP, respectively, should be used instead.
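The distinctions drawn above between RCA, ISS, IEA, and the experimental codes can be sketched as a decision function. This Python sketch is a simplification under stated assumptions: the data type labels, the function name, and its parameters are hypothetical, and only the cases described in the text are covered.

```python
from typing import Iterable

# Codes for annotations made purely from one experimental result, as listed above.
DIRECT_RESULT_CODES = {
    "protein_interaction": "IPI",
    "genetic_interaction": "IGI",
    "expression_pattern": "IEP",
}


def choose_code(data_types: Iterable[str], computational_prediction: bool,
                curator_reviewed: bool = True) -> str:
    """Suggest an evidence code from the kinds of data behind an annotation."""
    kinds = set(data_types)
    if not kinds:
        raise ValueError("at least one data type is required")
    if not computational_prediction:
        # Annotation made purely on the basis of a single experimental result.
        if len(kinds) == 1:
            (kind,) = kinds
            if kind in DIRECT_RESULT_CODES:
                return DIRECT_RESULT_CODES[kind]
        raise ValueError("this sketch only covers single IPI/IGI/IEP-style results")
    if not curator_reviewed:
        return "IEA"  # computational analysis not reviewed by a curator
    if kinds == {"sequence"}:
        return "ISS"  # purely sequence-based analysis
    return "RCA"      # large-scale or integrated analysis including non-sequence data


print(choose_code({"protein_interaction"}, computational_prediction=True))              # RCA
print(choose_code({"sequence"}, computational_prediction=True))                         # ISS
print(choose_code({"genetic_interaction"}, computational_prediction=False))             # IGI
print(choose_code({"sequence", "expression_pattern"}, True, curator_reviewed=False))    # IEA
```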

Examples where the RCA evidence code should be used:

*Samanta and Liang, 2003 (PMID:14566057) analyzed all interactions for S. cerevisiae present in the Database of Interacting Proteins (DIP) and made predictions about the roles of genes that were uncharacterized at the time. GO Annotations resulting from this publication include the process term 'rRNA processing' for both UTP30 and NOP6, neither of which was experimentally characterized at the time. A role for NOP6 in the biogenesis of the small ribosomal subunit has subsequently been indicated via a genetic interaction with the experimentally characterized gene EMG1.
*Troyanskaya et al., 2003 (PMID:12826619) ...

Examples where the RCA evidence code should not be used:

*Annotations based on more than one type of gene product sequence based evidence, including such things as BLAST, profile HMMs, TMHMM, SignalP, PROSITE, InterPro, mapping files such as interpro2go etc. should use the ISS code.
*Annotations based on integrated computational analyses, if they have not been reviewed by a curator, should receive the IEA code.

[http://wiki.geneontology.org/index.php/Guide_to_GO_Evidence_Codes Back to: Guide to GO Evidence Codes]

[[Category: Annotation]]
[[Category: Evidence Codes]] Vanaukenk
Categories: GO Internal

Inferred from Rapid Divergence (IRD)

GO wiki (new pages) - Wed, 01/10/2018 - 12:35

Vanaukenk:

IRD: Inferred from Rapid Divergence
Updated May 3, 2011

A type of phylogenetic evidence characterized by rapid divergence from ancestral sequence. Annotating with this evidence code implies a NOT annotation.

[http://wiki.geneontology.org/index.php/Guide_to_GO_Evidence_Codes Back to: Guide to GO Evidence Codes]

[[Category: Annotation]]
[[Category: Evidence Codes]] Vanaukenk
Categories: GO Internal

Inferred from Key Residues (IKR)

GO wiki (new pages) - Wed, 01/10/2018 - 12:35

Vanaukenk:

IKR: Inferred from Key Residues
Updated May 2, 2012

A type of manually-curated evidence derived from sequence analysis, characterized by the lack of key sequence residues. All annotations that apply this evidence code should use the 'NOT' qualifier. This evidence code is used to annotate a gene product when, although homologous to a particular protein family, it has lost essential residues and is very unlikely to be able to carry out an associated function, participate in the expected associated process, or be found in a certain location. This annotation statement can be supported by a published literature reference (e.g. a PubMed identifier) that describes the sequence analysis efforts, or by a GO Reference that describes the process a curator undertook to become sufficiently convinced of the sequence mutation. Where an IKR annotation statement is made using a GO Reference, the 'with/from' column of the annotation format must include an identifier that indicates to the user which residues are lacking (e.g. an alignment, domain or annotation rule identifier). In contrast, when an IKR annotation statement is supported by a published literature reference, a value in the 'with/from' field is highly recommended although not required. This evidence code is also referred to as IMR (Inferred from Missing Residues).

Examples where the IKR evidence code should be used:

Curator-Determined IKR Annotation Example: Rat HPT (P06866) is homologous to serine proteases and contains a match to the peptidase S1 domain. However, further sequence analysis by a curator looking at the Peptidase S1B active site established that it has lost all essential catalytic residues, making it unable to carry out serine protease activity.
Curator-Determined IKR Annotation Example, Using PAINT: Curators determined that Drosophila neuroligin protein does not have carboxylesterase activity, based on phylogeny-based evidence. The Panther identifier in the 'with/from' field links out to an evidence record citing annotation data from orthologous gene products, supporting the annotation statement.
Paper-Curated IKR Annotation Example: Ross,J., Jiang,H., Kanost,M.R. and Wang,Y. (2003) Serine proteases and their homologs in the Drosophila melanogaster genome: an initial analysis of sequence conservation and phylogenetic relationships. Gene 30;304:117-31 (PMID:12568721). The authors describe the determination of serine protease activity of proteins from the D. melanogaster S1 serine protease gene family, by determining the presence of conserved His, Asp, Ser catalytic triad residues in retrieved sequences. If all three residues were present in the conserved TAAHC, DIAL, and GDSGGP motifs, the sequence was considered to have serine protease activity. Any sequence lacking one of the key residues was identified as a serine protease homolog lacking proteolytic activity.
...  2. DB Object ID  3. DB Object Symbol  4. Qualifier  5. GO ID                                         6. DB:Reference  7. Evidence Code  8. With/From             ...
...  P06866           RatHPT               NOT           GO:0004252 (serine-type endopeptidase activity)  GO_REF:0000047   IKR               InterPro:IPR000126       ...
...  P06866           neuroligin           NOT           GO:0004091 (carboxylesterase activity)           GO_REF:0000033   IKR               PANTHER:PTHR11559_AN146  ...
...  FB:FBgn0033192   gene S1              NOT           GO:0004252 (serine-type endopeptidase activity)  PMID:12568721    IKR                                        ...

Examples where the IKR evidence code should not be used:

When experimental evidence is available from a publication to support a NOT-qualified annotation, the curator should make the IDA, IMP or EXP NOT-qualified annotation based on the experimental evidence. If a paper supplies data showing that the active site is missing and additionally reports an experimental assay showing lack of activity, it would be correct to create two annotation statements from this paper: one NOT IKR and one NOT IDA.
CAUTION: Where curators make judgements of function using the IKR evidence code, they should be able to draw on some level of expertise regarding the protein family, as there will always be exceptions to the rule. For instance, Q9H4A3 (WNK1_HUMAN) is a good example where nature has confounded prediction: Cys-250 is present instead of the conserved Lys that is expected to be an active site residue. However, Lys-233 appears to fulfill the required catalytic function.
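
The IKR rules described above (NOT qualifier always; with/from required for GO_REF-supported statements and recommended for literature-supported ones) can be sketched as a small consistency check. This is only an illustration: the `Annotation` record and the `check_ikr` function are hypothetical names invented here, not part of any GO tool, and the record holds only a few of the annotation columns.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Annotation:
    """Hypothetical, simplified annotation record (not an official GO data model)."""
    db_object_id: str
    qualifier: str        # e.g. "NOT"; several qualifiers are assumed to be pipe-separated
    go_id: str
    reference: str        # e.g. "GO_REF:0000047" or "PMID:12568721"
    evidence_code: str
    with_from: str = ""   # empty string means the field is blank


def check_ikr(ann: Annotation) -> List[str]:
    """Return a list of problems found in an IKR annotation (empty if none)."""
    problems: List[str] = []
    if ann.evidence_code != "IKR":
        return problems  # the rules below apply to IKR only
    if "NOT" not in ann.qualifier.split("|"):
        problems.append("error: IKR annotations must use the NOT qualifier")
    if not ann.with_from:
        if ann.reference.startswith("GO_REF:"):
            problems.append("error: with/from is required when the reference is a GO_REF")
        else:
            problems.append("warning: with/from is recommended for literature-supported IKR")
    return problems


# Two of the example rows from the table above.
rat_hpt = Annotation("P06866", "NOT", "GO:0004252", "GO_REF:0000047", "IKR", "InterPro:IPR000126")
paper_row = Annotation("FB:FBgn0033192", "NOT", "GO:0004252", "PMID:12568721", "IKR")

print(check_ikr(rat_hpt))    # []
print(check_ikr(paper_row))  # one warning: with/from recommended but not required
```

The paper-curated row passes with a warning only, matching the statement that a with/from value is recommended but not required when a published reference supports the annotation.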

[http://wiki.geneontology.org/index.php/Guide_to_GO_Evidence_Codes Back to: Guide to GO Evidence Codes]

[[Category: Annotation]]
[[Category: Evidence Codes]] Vanaukenk
Categories: GO Internal

Inferred from Biological aspect of Descendant (IBD)

GO wiki (new pages) - Wed, 01/10/2018 - 12:34

Vanaukenk: Created page with "IBD: Inferred from Biological aspect of Descendent Updated May 3, 2011 A type of phylogenetic evidence whereby an aspect of an ancestral gene is inferred through the characte..."

IBD: Inferred from Biological aspect of Descendant
Updated May 3, 2011

A type of phylogenetic evidence whereby an aspect of an ancestral gene is inferred through the characterization of an aspect of a descendant gene.

[http://wiki.geneontology.org/index.php/Guide_to_GO_Evidence_Codes Back to: Guide to GO Evidence Codes]

[[Category: Annotation]]
[[Category: Evidence Codes]] Vanaukenk
Categories: GO Internal

Inferred from Biological aspect of Ancestor (IBA)

GO wiki (new pages) - Wed, 01/10/2018 - 12:31

Vanaukenk: Created page with "IBA: Inferred from Biological aspect of Ancestor Updated May 3, 2011 A type of phylogenetic evidence whereby an aspect of a descendent is inferred through the characterizatio..."

IBA: Inferred from Biological aspect of Ancestor
Updated May 3, 2011

A type of phylogenetic evidence whereby an aspect of a descendant is inferred through the characterization of an aspect of an ancestral gene.

[http://wiki.geneontology.org/index.php/Guide_to_GO_Evidence_Codes Back to: Guide to GO Evidence Codes]

[[Category: Annotation]]
[[Category: Evidence Codes]] Vanaukenk
Categories: GO Internal

Inferred from Genomic Context (IGC)

GO wiki (new pages) - Wed, 01/10/2018 - 12:31

Vanaukenk: Created page with "IGC: Inferred from Genomic Context Updated November 9, 2007 operon structure syntenic regions pathway analysis genome scale analysis of processes This evidence code can be us..."

IGC: Inferred from Genomic Context
Updated November 9, 2007

* operon structure
* syntenic regions
* pathway analysis
* genome scale analysis of processes
This evidence code can be used whenever information about the genomic context of a gene product forms part of the evidence for a particular annotation. Genomic context includes, but is not limited to, such things as identity of the genes neighboring the gene product in question (i.e. synteny), operon structure, and phylogenetic or other whole genome analysis.

IGC may be used in situations where part of the evidence for the function of a protein is that it is present in a putative operon for which the other members of the operon have strong sequence or literature based evidence for function. The presence of the gene in an operon specific for a particular function, pathway, complex, etc. is itself a form of evidence. When using this code with operon structure, curators are encouraged to put the identifiers of the genes in the operon in the with/from field.

The IGC evidence code can also be used to annotate gene products encoded by genes within a region of conserved synteny. For instance, sequence similarity alone may be too low to make an inference, but orthology can often be predicted based on the position of a gene within a region of synteny, and this can be used to strengthen the assertion. In these cases, the with/from field should be used to store the identity of the positional ortholog.

In the area of process annotations, in order for us to assert that a gene product is involved in a particular process in the cell, that process itself must be happening in that cell. The only way to know if a process is happening is to determine if all of the elements required for that process are present. This is often accomplished by looking to see if there are genes in the genome which can complete every step in the process in question. The same holds true for subunits of protein complexes. This often entails examining many different gene products and many different evidence types found all around the genome of an organism to reach a particular conclusion.

When the method used to make annotations using the IGC code is performed internally by the annotating group and is not published, a short description of the method should be written and added to the GO Consortium's collection of GO references, where it will be given a GO_REF ID which can be used to cite the reference in gene association files.

Usage of the With/From Column for IGC

We recommend making an entry in the with/from column when using this evidence code. In cases where operon structure or synteny is the compelling evidence, include identifier(s) for the neighboring genes in the with/from column. In cases where metabolic reconstruction is the compelling evidence, and there is an identifier for the pathway or system, that identifier should be entered in the with/from column. When multiple entries are placed in the with/from field, they are separated by pipes.

Note that there has been some discrepancy between groups as to the use of the with/from column; please see the Note on Usage of the With/from Column for more details.

...  2. DB Object ID     3. DB Object Symbol  4. Qualifier  5. GO ID    6. DB:Reference  7. Evidence Code  8. With/From  ...
...  TIGR_CMR:gene_B_ID  gene B                             GO:0009231  GO_REF:0000025   IGC               operon_geneA_ID|operon_geneC_ID (from operon in annotated organism)  ...
...  TIGR_CMR:gene_A_ID  gene A                             GO:0009102  PMID:15347579    IGC               TIGR_GenProp:GenProp0036
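
As a minimal sketch of the pipe-separated with/from convention described above, the following shows one way to build and read back such a field. The function names `join_with_from` and `split_with_from` are hypothetical helpers written for this illustration, and the identifiers are the placeholder values from the example table.

```python
from typing import List


def join_with_from(identifiers: List[str]) -> str:
    """Join several with/from identifiers into one field value, separated by pipes."""
    cleaned = [i.strip() for i in identifiers if i.strip()]
    return "|".join(cleaned)


def split_with_from(field: str) -> List[str]:
    """Split a with/from field value back into its individual identifiers."""
    return [part for part in field.split("|") if part]


# Neighboring genes from the same putative operon (placeholder identifiers).
operon_neighbors = ["operon_geneA_ID", "operon_geneC_ID"]
field = join_with_from(operon_neighbors)

print(field)                   # operon_geneA_ID|operon_geneC_ID
print(split_with_from(field))  # ['operon_geneA_ID', 'operon_geneC_ID']
```

An empty with/from field round-trips to an empty list, which keeps blank and populated fields easy to handle in the same way.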

[http://wiki.geneontology.org/index.php/Guide_to_GO_Evidence_Codes Back to: Guide to GO Evidence Codes]

[[Category: Annotation]]
[[Category: Evidence Codes]] Vanaukenk
Categories: GO Internal

Inferred from Sequence Model (ISM)

GO wiki (new pages) - Wed, 01/10/2018 - 12:30

Vanaukenk: Created page with "ISM: Inferred from Sequence Model Prediction methods for non-coding RNA genes such as tRNASCAN-SE, Snoscan, and Rfam Predicted presence of recognized functional domains or mem..."

ISM: Inferred from Sequence Model
* Prediction methods for non-coding RNA genes such as tRNAscan-SE, Snoscan, and Rfam
* Predicted presence of recognized functional domains or membership in protein families, as determined by tools such as profile Hidden Markov Models (HMMs), including Pfam and TIGRFAM
* Predicted protein features using tools such as TMHMM (transmembrane regions), SignalP (signal peptides on secreted proteins), and TargetP (subcellular localization)
* Any other kind of domain modeling tool or collections of them such as SMART, PROSITE, PANTHER, InterPro, etc.

An entry in the with field is required when the model used is an object with an accession number (as found with Pfam, TIGRFAM, InterPro, PROSITE, Rfam, etc.). The with field may be left blank for tools such as tRNAscan and Snoscan where there is not an object with an accession to point to.

The ISM code is a sub-category of the ISS code. The ISM code should be used any time that evidence from some kind of statistical model of a sequence or group of sequences is used to make a prediction about the function of a protein or RNA. Generally, when searching sequences with these modeling tools, the results include statistical scores (such as E-values and cutoff scores) that help curators decide when a result is significant enough to warrant making an annotation. If an annotator manually checks these scores, determines that the result makes sense in the context of other information known about the sequence, and decides that the evidence warrants a particular annotation, then the evidence code is ISM. However, if a tool that looks only at the scores makes annotations automatically and there is no manual review, the evidence code should be IEA.
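
The choice between ISM and IEA, and the rule for the with field, can be written down as two small functions. This is a sketch only: `evidence_code_for_model_hit` and `ism_with_field` are hypothetical names, and representing "the model has no accession" as `None` is an assumption made for the example.

```python
from typing import Optional


def evidence_code_for_model_hit(manually_reviewed: bool) -> str:
    """ISM when a curator reviewed the scores and context, IEA when fully automatic."""
    return "ISM" if manually_reviewed else "IEA"


def ism_with_field(model_accession: Optional[str]) -> str:
    """With field for an ISM annotation.

    The accession is used when the model is an object with an accession number
    (e.g. a Pfam or InterPro entry); the field stays blank for tools such as
    tRNAscan or Snoscan that have no such object to point to.
    """
    return model_accession if model_accession else ""


print(evidence_code_for_model_hit(True))   # ISM
print(evidence_code_for_model_hit(False))  # IEA
print(ism_with_field("Pfam:PF05426"))      # Pfam:PF05426
print(repr(ism_with_field(None)))          # ''
```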

It is important to note that some models are more functionally specific than others. In particular this is seen in the profile HMMs and somewhat in PROSITE motifs. Some HMMs are built so that all of the proteins used in building the model and all of the proteins that score well to the model have the exact same function. These models can therefore be used to predict precise functions in matching proteins. Other models are built to reflect the shared sequence found among members of superfamilies or subfamilies. These can be used to predict varying levels of functional specificity and may often only provide very general annotations such as identification of a protein as an oxidoreductase. Finally, many models predict the presence of particular domains in a protein, which may or may not provide information on the function of the protein. For example, the CUB domain is found in a functionally diverse set of proteins and does not allow function annotations to be made based on its presence alone. Therefore it is very important during the manual annotation process to assess what it is safe to conclude from a match to any given model.

Some of the sequence-based modeling techniques result in models specific to individual sequence families. The profile HMMs, PROSITE motifs, and InterPro are in this group. In such cases, the with field should be populated with the accession number of the model specific for the functional domain or protein in question. Other sequence-based modeling techniques such as tRNAscan and Snoscan are methods that result in the prediction of a set of sequences within a particular class (e.g. tRNAs, snoRNAs), and there are no specific models that one can link to each ncRNA. In these cases the with field may be left blank.

If the search for, and evaluation of, the sequence-based model data was described in a published paper, a reference to the paper should be placed in the reference column. However, if the search for and evaluation of the data was performed by the same group that is doing the GO annotation, then a reference that describes the methodology used should be placed in the reference column. If there is no publication for this methodology, a reference can be used from the GO Consortium's collection of GO references; if there is nothing appropriate in this set, the annotating group should write a description of the methods of data collection and evaluation used and submit it to the GO Consortium. This description will be added to the reference collection and will receive a GO_REF accession number for use in annotations.
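
The order of preference for the reference column described above can be sketched as a short decision function. The name `choose_reference` and its parameters are hypothetical, and the sketch simplifies the text by treating any in-house methodology reference as a GO_REF identifier.

```python
from typing import Optional


def choose_reference(paper_pmid: Optional[str], method_go_ref: Optional[str]) -> str:
    """Pick the value for the reference column of a sequence-model annotation.

    paper_pmid:    PubMed ID of a paper describing the search and evaluation, if any.
    method_go_ref: GO_REF identifier describing the annotating group's own method, if any.
    """
    if paper_pmid:
        return f"PMID:{paper_pmid}"
    if method_go_ref:
        return method_go_ref
    # Nothing suitable exists yet: a method description has to be submitted
    # to the GO Consortium first so that a GO_REF accession can be assigned.
    raise ValueError("no reference available; submit a method description to obtain a GO_REF ID")


print(choose_reference("10024243", None))        # PMID:10024243
print(choose_reference(None, "GO_REF:0000011"))  # GO_REF:0000011
```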

Examples of when to use ISM

A curator performs an HMM search for a query protein. The result is that the query protein scores above the trusted cutoff to the HMM PF05426 alginate lyase. This HMM describes a family of alginate lyases. After review of all documentation associated with the HMM to determine functional specificity, or lack thereof, of the HMM and review of the scores that the query protein received, if the curator is confident that the query protein is indeed an alginate lyase, the appropriate annotations should be made using ISM as the evidence code, and putting Pfam:PF05426 in the with column. Since this search and evaluation was performed by the curator, a GO standard reference should be used to describe the search and evaluation methods (e.g. GO_REF:0000011).
A paper describes using PROSITE searches with the protein of interest and concludes the protein has a particular binding activity based on a match to a particular PROSITE motif. The curator would make the appropriate GO annotations, using ISM as the evidence code, putting the accession number of the PROSITE motif that provided the evidence in the with column, and the PMID number of the paper that described the work in the reference column.
A curator runs the program tRNAscan (Lowe, T.M. and Eddy, S.R. NAR, 1997) on a newly sequenced bacterial genome to find the tRNAs. tRNAscan produces a list of the tRNA genes contained within that genome. A curator checks the results of the analysis to make sure that the predictions make sense and are consistent with what is known about the organism. Each of these genes is given appropriate annotations for a tRNA. The evidence code is ISM, and a reference describing the process the curator used (either a published paper or a GO standard reference) should be placed in the reference column. The with column may be left blank.
PMID:10024243 describes the use of a probabilistic model to predict snoRNA genes in yeast. Each of these genes may be given appropriate annotations for a snoRNA. The evidence code is ISM, and the reference is the paper describing the work. The with column may be left blank.

[http://wiki.geneontology.org/index.php/Guide_to_GO_Evidence_Codes Back to: Guide to GO Evidence Codes]

[[Category: Annotation]]
[[Category: Evidence Codes]] Vanaukenk
Categories: GO Internal