ISS: Inferred from Sequence or Structural Similarity

Updated April 1, 2008

The ISS evidence code or one of its sub-categories should be used whenever a sequence-based analysis forms the basis for an annotation and both the evidence and the annotation have been reviewed manually. If the annotation has not been reviewed manually, the correct evidence code is IEA, even if all of the evidence supporting the annotation is sequence based. ISS should be used if a combination of sequence-based tools or methods is used. If only one particular type of sequence-based evidence is used, then one of the more specific sub-categories of ISS may be more appropriate for the annotation. There are three specific sub-categories of ISS, mentioned briefly here and described in more detail below:

  • ISA: If the primary piece of evidence is a pairwise or multiple alignment, then ISA (Inferred from Sequence Alignment) would likely be the appropriate evidence code to use.
  • ISO: If the primary piece of evidence is the assertion of orthology between the gene product and a gene product in another organism, then ISO (Inferred from Sequence Orthology) would likely be the appropriate evidence code to use.
  • ISM: If the primary piece of evidence is any kind of sequence modeling method (e.g. Hidden Markov Models), then ISM (Inferred from Sequence Model) would likely be the appropriate evidence code to use.
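
The choice between IEA, ISS, and the three sub-categories can be sketched as a small decision function. This is only an illustrative simplification of the guidance above, not an official GO tool: the function name, the evidence-type labels ("alignment", "orthology", "model"), and the rule that a single evidence type maps to a sub-category are assumptions made for the example. In practice the curator judges which piece of evidence is primary.

```python
# Hypothetical labels for kinds of sequence-based evidence (assumed for this sketch).
SUBCATEGORY_BY_TYPE = {
    "alignment": "ISA",  # pairwise or multiple alignment
    "orthology": "ISO",  # assertion of orthology
    "model": "ISM",      # sequence model such as an HMM
}


def choose_evidence_code(manually_reviewed, evidence_types):
    """Return a suggested evidence code for a sequence-based annotation."""
    kinds = set(evidence_types)
    if not kinds:
        raise ValueError("at least one type of sequence-based evidence is required")
    # Annotations that were not reviewed manually are IEA, whatever the evidence.
    if not manually_reviewed:
        return "IEA"
    # A single, specific type of evidence suggests the matching sub-category.
    if len(kinds) == 1:
        (kind,) = kinds
        return SUBCATEGORY_BY_TYPE.get(kind, "ISS")
    # A combination of tools or methods falls back to the general ISS code.
    return "ISS"


print(choose_evidence_code(True, {"alignment"}))
print(choose_evidence_code(True, {"alignment", "model"}))
print(choose_evidence_code(False, {"orthology"}))
```

Any evidence type outside the three assumed labels (for example, structural similarity) is mapped to the general ISS code in this sketch.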

ISS can also be used for structural similarity with experimentally characterized gene products, as determined by crystallography, nuclear magnetic resonance, or computational prediction. In practice, ISS annotations are rarely, if ever, made purely from structural information. When included, structural information is generally at the level of secondary structure modeling or prediction derived from sequence information. Secondary structure information is particularly useful as one component of RNA gene predictions and in some domain models.

Population of the with field is important when using the ISS code or one of its sub-categories. The entry in the with field is the accession of the object or model to which your query has similarity. An entry in the with field is mandatory when using the ISS code or one of its sub-categories if the annotation is based on an alignment with other proteins (e.g. UniProt) or a sequence model contained in a database (e.g. Pfam, InterPro). If the annotation is based on a method such as tRNASCAN, which cannot be referred to with an accession number, the with field may be left empty. Entries in the with field should be in the format database:accession, where database is one of the abbreviations listed in the GO database abbreviations collection and accession is the accession number of the object the sequence similarity is with. Multiple entries in the with field should be separated by pipes.
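
The database:accession format and the pipe separator can be illustrated with a short parsing and formatting sketch. The helper names are hypothetical, the set of database abbreviations is a tiny illustrative subset rather than the full GO collection, and the accessions are placeholders chosen only to show the shape of an entry.

```python
# Illustrative subset only; the authoritative list is the GO database
# abbreviations collection.
KNOWN_DATABASES = {"UniProtKB", "Pfam", "InterPro"}


def parse_with_field(value):
    """Split a with-field string into (database, accession) pairs."""
    if not value:
        # An empty with field is allowed when no accession exists for the method.
        return []
    entries = []
    for item in value.split("|"):
        # Split on the first colon only, so accessions may themselves contain colons.
        database, sep, accession = item.partition(":")
        if not sep or not database or not accession:
            raise ValueError(f"expected database:accession, got {item!r}")
        if database not in KNOWN_DATABASES:
            raise ValueError(f"unknown database abbreviation: {database!r}")
        entries.append((database, accession))
    return entries


def format_with_field(entries):
    """Join (database, accession) pairs into a pipe-separated with-field string."""
    return "|".join(f"{database}:{accession}" for database, accession in entries)


# Placeholder accessions, used here only to demonstrate the format.
pairs = parse_with_field("UniProtKB:P12345|Pfam:PF00005")
print(pairs)
print(format_with_field(pairs))
```

A real validator would load the full abbreviation list instead of the hard-coded subset assumed here.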

If the searches and evaluation of the sequence-based data are described in a published paper, the ID of the paper (either one assigned by PubMed or one assigned by another database such as a Model Organism Database) should be placed in the reference column. However, if the group that is doing the GO annotation performed the searches and evaluation of the sequence-based data, and there is no published reference, a reference can be used from the GO Consortium's collection of GO references. If there is nothing appropriate in this set, the annotating group should write a description of the methods of data collection and evaluation used and submit it to the GO Consortium. This description will be added to the reference collection and will receive a GO_REF accession number for use in annotations. In all cases, the ID of the reference describing the methodology of the sequence analysis should be placed in the reference column.
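
The order of preference for the reference column can be summarized in a few lines of code. This is a minimal sketch under stated assumptions: the function name and its parameters are hypothetical, and the GO_REF identifier shown is a placeholder, not a real entry in the GO reference collection.

```python
def choose_reference(published_id=None, go_ref_id=None):
    """Pick the ID to place in the reference column of an ISS annotation."""
    # A published paper describing the searches and evaluation takes precedence.
    if published_id:
        return published_id
    # Otherwise use a suitable entry from the GO Consortium's reference collection.
    if go_ref_id:
        return go_ref_id
    # With neither available, the methods must first be described and submitted
    # to the GO Consortium so that a GO_REF accession can be assigned.
    raise LookupError(
        "no reference available: submit a description of the methods "
        "to the GO Consortium to obtain a GO_REF accession"
    )


print(choose_reference(published_id="PMID:8674114"))
print(choose_reference(go_ref_id="GO_REF:0000000"))  # placeholder accession
```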

Examples of when to use ISS:

  • An ISS annotation is often based on more than just one type of sequence-based evidence. Often, a host of searches are performed for any given query protein. These searches might include BLAST, profile HMMs, TMHMM, SignalP, PROSITE, InterPro, etc. Evaluation of output from these search tools (bear in mind that not every search will yield results for every protein) leads an annotator to a particular ISS annotation for a particular protein. For example, a BLAST search might reveal that a query protein matches an experimentally characterized protein from another species at 50% identity over the full lengths of both proteins. After reading the literature about the match protein, the curator sees that the match protein is known to contain a domain located in the plasma membrane and another domain that extends into the cytoplasm. It is also known from the literature that the experimentally characterized match protein requires the binding of ATP to function. TMHMM analysis of the query protein predicts several membrane-spanning regions in one half of the protein (consistent with location in a membrane). In addition, PROSITE and Pfam results reveal the presence of an ATP-binding domain in the other half of the protein, which TMHMM predicts to be cytoplasmic. These four search results taken together point to a probable identification of the query protein as having the function of the match protein.
  • PMID:8674114 describes a comparative analysis of several newly identified and previously characterized snoRNAs. The authors list a number of sequence features (both conserved sequence elements and a region of complementarity to rRNA) and spacings that are characteristic of box C/D snoRNAs. As the authors don't develop a predictive method, the analysis they describe isn't considered to be a model, so ISM is not appropriate. Because membership in the box C/D snoRNA family is predictive of being a methylation guide, one could make annotations for a number of snoRNAs based on this paper. Note that the yeast U24 gene (snR24) is also experimentally characterized in this paper. Thus, for snR24 from S. cerevisiae, it is possible to make annotations using both the ISS and the IMP evidence codes, or one might choose not to make the ISS-based annotation for snR24 since experimental evidence is available.