Updated November 9, 2007
Note: Annotations using the RCA code should be reviewed after one year; any annotation older than this will be deleted.
- Predictions based on computational analyses of large-scale experimental data sets
- Predictions based on computational analyses that integrate datasets of several types, including experimental data (e.g. expression data, protein-protein interaction data, genetic interaction data, etc.), sequence data (e.g. promoter sequence, sequence-based structural predictions, etc.), or mathematical models
The RCA code should be used for annotations made from predictions based on computational analyses of large-scale experimental data sets, or on computational analyses that integrate multiple types of data. Acceptable experimental data types include protein-protein interaction data (e.g. two-hybrid results, mass spectrometric identification of proteins isolated by affinity tag purification, etc.), synthetic genetic interactions, and microarray expression results. Data based on the sequence of the gene product, including structural predictions based on sequence, may be included provided that the analysis also includes non-sequence-based data. Sequence information related to promoter sequence features may also be included as a data type within these analyses. Predictions based on mathematical modelling that attempts to reproduce existing experimental results are also appropriate for this evidence code.
Analyses based purely on comparisons of the gene product sequence should use the ISS evidence code (or the IEA code if the annotation has not been reviewed by a curator). These include:
- sequence similarity with experimentally characterized gene products, as determined by pairwise or multiple alignment
- prediction methods for non-coding RNA genes
- recognized functional domains, as determined by tools such as InterPro, Pfam, SMART, etc., including the use of files such as interpro2go, pfam2go, smart2go to convert the domain hits to GO terms
- predicted protein features, e.g. transmembrane regions, signal sequence, etc.
- structural similarity with experimentally characterized gene products, as determined by crystallography, nuclear magnetic resonance, or computational prediction
- analyses combining multiple types of data based on the gene product sequence
Similarly, for experimental data, if the annotation was made purely on the basis of an experimental result, e.g. a protein-protein interaction with a characterized protein, a genetic interaction with a characterized gene, or a microarray expression pattern similar to that of a characterized gene, then the appropriate experimental evidence code (IPI, IGI, or IEP, respectively) should be used instead.
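The rules above can be read as a small decision procedure. The following Python sketch is one possible encoding of them, not an official tool: the `DataType` enum, the `suggest_evidence_code` function, and its parameters are hypothetical names introduced here for illustration, and edge cases that this guideline does not address simply return `None`.

```python
from enum import Enum
from typing import Optional, Set


class DataType(Enum):
    """Hypothetical labels for the data types named in this guideline."""
    PROTEIN_INTERACTION = "protein-protein interaction"
    GENETIC_INTERACTION = "genetic interaction"
    EXPRESSION = "expression"
    GENE_PRODUCT_SEQUENCE = "gene product sequence"
    PROMOTER_SEQUENCE = "promoter sequence"
    MATHEMATICAL_MODEL = "mathematical model"


# Codes for annotations made purely from one experimental result.
DIRECT_EXPERIMENT_CODES = {
    DataType.PROTEIN_INTERACTION: "IPI",
    DataType.GENETIC_INTERACTION: "IGI",
    DataType.EXPRESSION: "IEP",
}


def suggest_evidence_code(
    data_types: Set[DataType],
    computational_analysis: bool,
    curator_reviewed: bool,
) -> Optional[str]:
    """Suggest an evidence code; None means the guideline gives no answer.

    computational_analysis: True when the annotation comes from a prediction
    made by a large-scale or integrated computational analysis (assumption:
    the caller decides this; the sketch does not try to detect it).
    """
    if not data_types:
        raise ValueError("at least one data type is required")

    # Purely gene-product-sequence-based analyses never use RCA.
    if data_types == {DataType.GENE_PRODUCT_SEQUENCE}:
        return "ISS" if curator_reviewed else "IEA"

    # An annotation made directly from a single experimental result
    # uses the matching experimental code instead of RCA.
    if not computational_analysis:
        if len(data_types) == 1:
            (only,) = data_types
            return DIRECT_EXPERIMENT_CODES.get(only)
        return None

    # Large-scale or integrated computational analysis: RCA once reviewed
    # by a curator, IEA otherwise.
    return "RCA" if curator_reviewed else "IEA"
```

For instance, a curator-reviewed prediction from a computational analysis of large-scale interaction data, `suggest_evidence_code({DataType.PROTEIN_INTERACTION}, True, True)`, yields `"RCA"`, while the same call with only `DataType.GENE_PRODUCT_SEQUENCE` yields `"ISS"`.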
Examples where the RCA evidence code should be used:
- Samanta and Liang, 2003 (PMID:14566057) analyzed all interactions for S. cerevisiae present in the Database of Interacting Proteins (DIP) and made predictions about the roles of genes that were uncharacterized at the time. GO annotations resulting from this publication include the process term 'rRNA processing' for both UTP30 and NOP6, neither of which was experimentally characterized at the time. A role for NOP6 in the biogenesis of the small ribosomal subunit has subsequently been indicated via a genetic interaction with the experimentally characterized gene EMG1.
- Troyanskaya et al., 2003 (PMID:12826619) ...
Examples where the RCA evidence code should not be used:
- Annotations based on more than one type of gene product sequence-based evidence, including such things as BLAST, profile HMMs, TMHMM, SignalP, PROSITE, InterPro, and mapping files such as interpro2go, should use the ISS code.
- Annotations based on integrated computational analyses, if they have not been reviewed by a curator, should receive the IEA code.