Tools for Analysis of Data Sets, e.g. gene expression / microarray data
The following tools make use of the GO ontologies or the gene associations provided by Consortium members. Being listed on this page does not represent an endorsement by the GO Consortium, nor has the Consortium tested the tool or found that it uses the Consortium information accurately. This page is provided to promote an exchange of information between users and software developers.
Unless stated otherwise, tools are free for academic use.
Avadis is a data analysis and visualization tool for gene expression data. Avadis has a built-in Gene Ontology browser to view ontology hierarchies and can display the ontology paths shared by multiple genes. Genes can be clustered based on ontology terms to identify functional signatures in gene expression clusters.
Note that Avadis is proprietary software.
BiNGO is an open-source Java tool to determine which GO categories are statistically over-represented in a set of genes. BiNGO is implemented as a plugin for Cytoscape, which is a software platform for data integration and visualization of molecular interaction networks. BiNGO maps the predominant functional themes of a given gene set on the GO hierarchy, and outputs this mapping as a Cytoscape graph.
CLASSIFI (Cluster Assignment for Biological Inference) is a data-mining tool that can be used to identify significant co-clustering of genes with similar functional properties (e.g. cellular response to DNA damage). Briefly, CLASSIFI uses the Gene Ontology gene annotation scheme to define the functional properties of all genes/probes in a microarray data set, and then applies a cumulative hypergeometric distribution analysis to determine whether any statistically significant Gene Ontology co-clustering has occurred.
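The cumulative hypergeometric test mentioned above can be sketched in a few lines. This is a minimal illustration, not CLASSIFI's code; the function name and parameters are hypothetical. It returns the probability of seeing at least `k` term-annotated genes in a cluster of `n` genes drawn from `N` genes on the array, `K` of which carry the GO term.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes/probes in the whole data set (the population)
    K: genes in the population annotated to the GO term
    n: genes in the cluster under test
    k: genes in the cluster annotated to the GO term
    """
    # Sum the point probabilities from k up to the largest possible overlap;
    # Python integers keep the binomial coefficients exact.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


# Hypothetical example: 4 genes of a 20-gene cluster carry a term
# that annotates 30 of the 1,000 genes on the array.
print(hypergeom_upper_tail(4, 1000, 30, 20))
```

A small value means the term is shared by more cluster members than random sampling from the array would usually produce.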
CLENCH (CLuster ENriCHment) allows A. thaliana researchers to perform automated retrieval of GO annotations from TAIR and calculate enrichment of GO terms in a gene group with respect to a reference set. Before calculating enrichment, CLENCH allows mapping of the returned annotations to arbitrary coarse levels using GO slim term lists (which can be edited by the user) and a local installation of GO.
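Mapping annotations to a coarser GO slim amounts to walking each annotated term up the GO graph and keeping the terms (the term itself or its ancestors) that appear in the slim list. A minimal sketch, assuming the ontology is supplied as a hypothetical child-to-parents dictionary and ignoring relationship types; it is not CLENCH's implementation.

```python
def ancestors(term: str, parents: dict[str, list[str]]) -> set[str]:
    """Return every term reachable from `term` by following parent links."""
    seen: set[str] = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def map_to_slim(annotations: dict[str, set[str]],
                parents: dict[str, list[str]],
                slim: set[str]) -> dict[str, set[str]]:
    """Replace each gene's GO terms with the slim terms that subsume them."""
    mapped = {}
    for gene, terms in annotations.items():
        covering: set[str] = set()
        for term in terms:
            # A slim term covers an annotation if it is the term or an ancestor.
            covering |= ({term} | ancestors(term, parents)) & slim
        mapped[gene] = covering
    return mapped


# Toy ontology with placeholder term names (not real GO IDs).
parents = {"leaf": ["middle"], "middle": ["root"], "other": ["root"]}
print(map_to_slim({"gene1": {"leaf"}, "gene2": {"other"}}, parents, {"middle", "root"}))
```

Editing the `slim` set changes the granularity of the mapping without touching the annotations themselves.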
ClueGO is a Cytoscape plug-in that visualizes the non-redundant biological terms for large clusters of genes in a functionally grouped network. It can be used in combination with GOlorize. Identifiers can be uploaded from a text file or interactively from a Cytoscape network, and users can easily extend the set of supported identifier types. ClueGO performs single-cluster analysis and comparison of clusters. Terms from the ontology sources used are selected by different filter criteria, and related terms that share similar associated genes can be combined to reduce redundancy. The ClueGO network is created with kappa statistics and reflects the relationships between the terms based on the similarity of their associated genes. Node colour on the network can be switched between functional groups and cluster distribution. ClueGO charts highlight both the specific and the common aspects of the biological roles. The significance of the terms and groups is calculated automatically. ClueGO can easily be updated with the newest files from Gene Ontology and KEGG.
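The kappa statistic used to connect terms measures how far two terms' gene memberships agree beyond what chance would produce. The sketch below computes Cohen's kappa for two terms over a shared gene universe; it illustrates the general statistic and is not taken from ClueGO, whose exact inputs and thresholds are not described here. The function name is hypothetical.

```python
def term_kappa(genes_a: set[str], genes_b: set[str], universe: set[str]) -> float:
    """Cohen's kappa for agreement between two terms' gene memberships.

    `universe` must be non-empty; genes outside it are ignored.
    """
    n = len(universe)
    both = len(genes_a & genes_b & universe)
    only_a = len((genes_a - genes_b) & universe)
    only_b = len((genes_b - genes_a) & universe)
    neither = n - both - only_a - only_b
    observed = (both + neither) / n
    # Agreement expected if memberships were independent, from the margins.
    in_a, in_b = both + only_a, both + only_b
    expected = (in_a * in_b + (n - in_a) * (n - in_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate: both terms cover all, or none, of the universe
    return (observed - expected) / (1.0 - expected)


universe = {f"g{i}" for i in range(1, 11)}
print(term_kappa({"g1", "g2", "g3"}, {"g2", "g3", "g4"}, universe))
```

A kappa near 1 means the two terms annotate nearly the same genes, which is the kind of overlap that justifies drawing an edge or merging terms into one functional group.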
Database for Annotation, Visualization and Integrated Discovery (DAVID) is a web-based tool that provides integrated solutions for the annotation and analysis of genome-scale datasets derived from high-throughput technologies such as microarray and proteomic platforms. Analysis results and graphical displays remain dynamically linked to primary data and external data repositories, thereby furnishing in-depth as well as broad-based data coverage. The functionality provided by DAVID accelerates the analysis of genome-scale datasets by facilitating the transition from data collection to biological meaning.
EASE is useful for summarizing the predominant biological "theme" of a given gene list. Given a list of genes resulting from a microarray or other genome-scale experiment, EASE can rapidly calculate over-representation statistics for every possible Gene Ontology term with respect to all genes represented in the data set.
EasyGO is designed for GO term enrichment analysis for agricultural species. It covers gene identifiers and microarray probe IDs for 15 species, including crops and farm animals.
EGAN is a software tool that allows a bench biologist to visualize and interpret the results of high-throughput exploratory assays in an interactive hypergraph of genes, relationships (protein-protein interactions, literature co-occurrence, etc.) and meta-data (annotation, signaling pathways, etc.). EGAN provides comprehensive, automated calculation of meta-data coincidence (over-representation, enrichment) for user- and assay-defined gene lists, and provides direct links to web resources and literature (NCBI Entrez Gene, PubMed, KEGG, Gene Ontology, iHOP, Google, etc.).
eGOn V2.0 (explore Gene Ontology) is a web-based tool for mapping microarray data onto the Gene Ontology structure. Several input files may be analyzed simultaneously to compare the distribution of the annotated genes for two or more experiments.
Essential features of eGOn V2.0 are:
- Visualization: gene annotations are visualized in the GO DAG or as a table view. The granularity of the GO DAG can be edited freely by the user.
- Filtering: GO annotations can be filtered on evidence codes.
- User-defined GO annotations: annotations previously added to the NMC Annotation database can be included.
- Statistical analysis: Several gene lists are analyzed simultaneously to compare the distribution of the annotated genes over the GO hierarchy. Statistical tests are implemented to allow the user to compute GO annotation dissimilarities within or between gene lists.
- Connection to Annotation database: Links to the NMC Annotation database, gene and protein information are offered directly from the GO DAG or in exported data.
- Export: GO DAG information, statistical results and gene and protein information can be exported in Excel, text or XML format.
ermineJ is a tool for the analysis of gene sets (user defined or those defined by GO terms) in expression data. The software is designed to be used by biologists with little or no informatics background. A command-line interface is available for users who wish to script the use of ermineJ. Several different methods for scoring gene sets are implemented, with a focus on methods that don't rely on simple "over-representation" measures.
FIVA helps researchers in the prokaryotic community quickly identify relevant biological processes following transcriptome analysis. Our software can assist in the functional profiling of large sets of genes and generates a comprehensive overview of affected biological processes.
FuncAssociate is a web-based tool that accepts as input a list of genes, and returns a list of GO attributes that are over- (or under-) represented among the genes in the input list. Only those over- (or under-) representations that are statistically significant, after correcting for multiple hypotheses testing, are reported. Currently 10 organisms are supported. In addition to the input list of genes, users may specify a) whether this list should be regarded as ordered or unordered; b) the universe of genes to be considered by FuncAssociate; c) whether to report over-, or under-represented attributes, or both; and d) the p-value cutoff.
A new version of FuncAssociate (still at the beta stage!) is now available. This version supports a wider range of naming schemes for input genes, and uses more frequently updated GO associations. However, some features of the original version, such as sorting by LOD or the option to see the gene-attribute table, are not yet implemented.
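FuncAssociate reports only attributes that remain significant after correcting for multiple hypothesis testing. Its own correction procedure is not described on this page; as a generic stand-in, the sketch below shows the widely used Benjamini-Hochberg false discovery rate adjustment, applied to one raw p-value per GO attribute. The function name is hypothetical.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest raw p-value down, enforcing monotonicity so that
    # a smaller raw p-value never receives a larger adjusted value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for three GO attributes.
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

An attribute would then be reported only if its adjusted value falls below the user's p-value cutoff.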
Iowa State University
FuncExpression is a web-based resource for functional interpretation of large-scale genomics data. FuncExpression can be used for the functional comparison of plant, animal, and fungal gene name lists generated from genomics and proteomics experiments. Multiple gene lists can be classified, compared and visualized. FuncExpression supports two-way integration of plant gene functional information and gene expression data, which allows further cross-validation with plant microarray data from related experiments at BarleyBase.
Institut National de la Santé et de la Recherche Médicale (INSERM), Centre de Recherche des Cordeliers, Paris, France
[Publication abstracts 1, 2, 3]
FunCluster is a genomic data analysis tool designed to perform a functional analysis of gene expression data obtained from cDNA microarray experiments. Besides automated functional annotation of gene expression data, FunCluster's functional analysis can detect co-regulated biological processes (i.e. processes represented by annotating genomic themes) through a specifically designed co-clustering procedure involving biological annotations and gene expression data. FunCluster's functional analysis relies on Gene Ontology and KEGG annotations and is currently available for three organisms: Homo sapiens, Mus musculus and Saccharomyces cerevisiae.
FunCluster is provided as a standalone R package, which can be run on any operating system for which an R environment implementation is available (Windows, Mac OS, various flavors of Linux and Unix). Download it from the FunCluster website, or from the worldwide mirrors of CRAN. FunCluster is provided freely under the GNU General Public License 2.0.
FunNet is designed as an integrative tool for analyzing gene co-expression networks built from microarray expression data. The analytical model implemented in this tool involves two abstraction layers: transcriptional (i.e. gene expression profiles) and functional (i.e. biological themes indicating the roles of the analyzed transcripts). A functional analysis technique, which relies on Gene Ontology and KEGG annotations, is applied to extract a list of relevant biological themes from microarray gene expression data. Afterwards, multiple-instance representations are built to relate relevant biological themes to their annotated transcripts. An original non-linear dynamical model is used to quantify the contextual proximity of relevant genomic themes based on their patterns of propagation in the gene co-expression network (i.e. capturing the similarity of the expression profiles of the transcriptional instances of annotating themes). Finally, an unsupervised multiple-instance spectral clustering procedure is used to explore the modular architecture of the co-expression network by grouping together biological themes that show a significant relationship in the co-expression network. Functional and transcriptional representations of the co-expression network are provided, together with detailed information on the contextual centrality of related transcripts and genomic themes.
FunNet is provided both as a web-based tool and as a standalone R package. The standalone R implementation can be run on any operating system for which an R environment implementation is available (Windows, Mac OS, various flavors of Linux and Unix) and can be downloaded from the FunNet website, or from the worldwide mirrors of CRAN. Both implementations of the FunNet tool are provided freely under the GNU General Public License 2.0.
G-SESAME contains a set of tools:
- Tools for measuring the semantic similarity of GO terms.
- Tools for measuring the functional similarity of genes.
- Tools for clustering genes based on their GO term annotation information.
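One well-known way to measure the semantic similarity of two GO terms is the graph-based S-value measure of Wang et al., in which each ancestor contributes according to the strongest weighted path from the term. The sketch below assumes edge weights of 0.8 for `is_a` and 0.6 for `part_of`, values commonly used with this measure; treat the weights, the function names, and the child-to-parents input format as assumptions rather than as G-SESAME's exact configuration.

```python
# Assumed semantic contribution factor for each relationship type.
WEIGHTS = {"is_a": 0.8, "part_of": 0.6}


def s_values(term: str, parents: dict[str, list[tuple[str, str]]]) -> dict[str, float]:
    """Semantic value of `term` and of each of its ancestors.

    S(term) = 1; for an ancestor t, S(t) is the largest product of edge
    weights along any path from `term` up to t.
    """
    s = {term: 1.0}
    stack = [term]
    while stack:
        child = stack.pop()
        for parent, relation in parents.get(child, []):
            value = s[child] * WEIGHTS[relation]
            # Revisit a parent only when a stronger path to it is found.
            if value > s.get(parent, 0.0):
                s[parent] = value
                stack.append(parent)
    return s


def term_similarity(a: str, b: str, parents: dict[str, list[tuple[str, str]]]) -> float:
    """Shared ancestor contributions divided by the two terms' total S-values."""
    sa, sb = s_values(a, parents), s_values(b, parents)
    shared = sa.keys() & sb.keys()
    return sum(sa[t] + sb[t] for t in shared) / (sum(sa.values()) + sum(sb.values()))


# Toy DAG: two sibling terms under one parent, which sits under a root.
parents = {"A": [("C", "is_a")], "B": [("C", "is_a")], "C": [("R", "is_a")]}
print(term_similarity("A", "B", parents))
```

Gene-level (functional) similarity is typically built on top of a term measure like this by aggregating over the two genes' annotation sets; that step is not shown here.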
GARBAN is a tool for analysis and rapid functional annotation of data arising from cDNA microarrays and proteomics techniques. GARBAN has been implemented with bioinformatic tools to rapidly compare, classify, and graphically represent multiple sets of data (genes/ESTs, or proteins), with the specific aim of facilitating the identification of molecular markers in pathological and pharmacological studies. GARBAN has links to the major genomic and proteomic databases (Ensembl, GenBank, UniProt Knowledgebase, InterPro, etc.), and follows the criteria of the Gene Ontology Consortium (GO) for ontological classifications. Source may be shared: e-mail firstname.lastname@example.org.
GENECODIS is a web-based tool for the functional analysis of gene lists. It integrates different sources of information to search for annotations that frequently co-occur in a set of genes and rank them by their statistical significance. It allows the analysis of annotations from different databases such as GO, KEGG or SwissProt.
GeneMerge returns functional genomic information for a given set of genes and provides statistical rank scores for over-representation of particular functions or categories in the dataset. All GO species are represented in addition to other species and functional genomic data.
GFINDer (Genome Function INtegrated Discoverer) is a multi-database system that annotates large-scale lists of user-classified sequence identifiers with genome-scale biological information and with functional profiles that biologically characterize the different gene classes in the list. GFINDer automatically retrieves updated annotations of several functional categories from different sources, identifies the categories enriched in each class of a user-classified gene list, and calculates statistical significance values for each category. Moreover, GFINDer can functionally classify genes according to mined functional categories and statistically analyse the resulting classifications, aiding in the interpretation of microarray experiment results.
NYU Bioinformatics Group
GOALIE (Generalized Ontological Algorithmic Logical Invariants Extractor) is a tool for the construction of time-course dependent enrichments. Requires an ODBC connection to an instance of the GO database.
GOdist is a Matlab program that analyzes Affymetrix microarray expression data using a continuous statistical approach based on the Kolmogorov-Smirnov (KS) test. It also implements a discrete approach using Fisher's exact test with a two-tailed hypergeometric distribution. GOdist can detect both kinds of changes within specific GO terms represented on the array relative to different populations: the global array population, the direct parents of the analyzed GO term, and its root term (i.e. biological process, molecular function or cellular component).
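For the discrete approach, a two-tailed Fisher exact test on a 2x2 table of counts can be computed directly from hypergeometric probabilities. This sketch is not GOdist's Matlab code; the function name and the table layout (rows: genes in / not in the GO term; columns: changed / unchanged genes) are assumptions for illustration. The two-tailed p-value sums the probabilities of all tables with the same margins that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_tailed(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed Fisher exact test for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def table_prob(x: int) -> float:
        # Probability that the top-left cell equals x with all margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)

    p_observed = table_prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # A small relative tolerance keeps exact ties from being lost to rounding.
    total = sum(p for p in map(table_prob, range(lo, hi + 1))
                if p <= p_observed * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical counts: 12 of 40 changed genes and 30 of 460 unchanged genes
# are annotated to the term.
print(fisher_exact_two_tailed(12, 30, 28, 430))
```

Because the test is two-tailed, it flags terms that are depleted among changed genes as well as terms that are enriched.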
Gene Ontology Enrichment Analysis Software Toolkit (GOEAST) is a web-based software toolkit providing easy-to-use, visualizable, comprehensive and unbiased Gene Ontology (GO) analysis for high-throughput experimental results, especially for results from microarray hybridization experiments. The main function of GOEAST is to identify significantly enriched GO terms among given lists of genes using accurate statistical methods.
Gene Ontology Explorer (GOEx) combines data from protein fold changes with GO over-representation statistics to help draw conclusions in proteomic experiments. It is tightly integrated within the PatternLab for Proteomics project and, thus, lies within a complete computational environment that provides parsers and pattern recognition tools designed for spectral counting. GOEx offers three independent methods to query data: an interactive directed acyclic graph, a specialist mode where key words can be searched, and an automatic search. A detailed description of these methods is provided in the publication.
University of California, Riverside
To test a sample population of genes for overrepresentation of GO terms, the R/BioC function GOHyperGAll computes for all GO nodes a hypergeometric distribution test and returns the corresponding p-values. A subsequent filter function performs a GO Slim analysis using default or custom GO Slim categories. Basic knowledge about R and BioConductor is required for using this tool.
Genomics and Bioinformatics Group of LMP, NCI, NIH and Medical Informatics and Bioimaging group of BME, Georgia Tech/Emory University
High-Throughput GoMiner is an 'industrial-strength' integrative Gene Ontology tool for interpretation of multiple-microarray experiments. GoMiner is a Java-based program package that organizes lists of 'interesting' genes (e.g., up- and down-regulated genes from a microarray experiment) for biological interpretation in the context of the Gene Ontology. GoMiner provides quantitative and statistical output files and two useful visualizations: (i) a tree-like structure analogous to that in the AmiGO browser and (ii) a compact, dynamically interactive DAG. Genes displayed in GoMiner are linked to major public bioinformatics resources. A companion tool, MatchMiner, can be used as a preprocessor to obtain gene names for input to GoMiner or other GO tools. For users running under a Unix-based operating system, there is an automated script for easy installation of the local database.
GOrilla is a web-based application that identifies enriched GO terms in ranked lists of genes, without requiring the user to provide explicit target and background sets. These are determined in a data driven manner. GOrilla employs a flexible threshold statistical approach to discover GO terms that are significantly enriched at the top of a ranked gene list. The tool supports several input formats: gene symbol, gene and protein RefSeq, Uniprot, Unigene and Ensembl. Supported organisms include: human, mouse, rat, yeast, D. melanogaster, C. elegans and A. thaliana. The input to GOrilla is either a ranked gene list or target and background sets. The graphical output shows the results in the context of the GO DAG.
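The flexible-threshold idea can be illustrated with the minimum hypergeometric (mHG) score: slide a cutoff down the ranked list, compute a hypergeometric tail probability for the term at every cutoff, and keep the smallest. The sketch below computes only that score and the cutoff that achieves it; it does not include the exact correction that converts the score into a p-value, and the function names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n)."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / comb(N, n)


def mhg_score(ranked_labels: list[int]) -> tuple[float, int]:
    """Minimum hypergeometric score over all cutoffs of a ranked list.

    ranked_labels[i] is 1 if the i-th ranked gene is annotated to the GO
    term and 0 otherwise. Returns (score, cutoff that achieves it).
    """
    N, B = len(ranked_labels), sum(ranked_labels)
    best_score, best_cutoff = 1.0, N
    hits = 0
    for cutoff, label in enumerate(ranked_labels, start=1):
        hits += label
        score = hypergeom_upper_tail(hits, N, B, cutoff)
        if score < best_score:
            best_score, best_cutoff = score, cutoff
    return best_score, best_cutoff


print(mhg_score([1, 1, 0, 1, 0, 0, 0, 0, 0, 0]))
```

Because the cutoff is chosen per term from the data, no fixed target/background split has to be supplied; the price is a multiple-cutoff bias that the omitted correction step accounts for.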
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
GOstat is an easy-to-use web tool to determine statistically significant over- or under-represented GO categories within lists of genes. Data files are updated monthly.
GoSurfer uses Gene Ontology information in the analysis of gene sets obtained from genome-wide computations, microarray analysis or any other highly parallel method. It includes rigorous statistical testing, interactive graphics and automated updating of the annotation available for common gene identifiers (UniGene, LocusLink) or Affymetrix probe sets.
The GO Term Finder searches for significant shared GO terms, or parents of the GO terms, used to annotate gene products in a given list. A web-based GO Term Finder at Saccharomyces Genome Database searches annotations of budding yeast gene products. A generic GO Term Finder has been created by Stanford Microarray Database and can be downloaded from CPAN. This code has been used to implement a web-based generic GO Term Finder by the Princeton genomics group; this implementation provides analysis, via a web tool, of genes from any species (including human) for which there are GO annotations publicly available through the GO web site.
GOTM is a web-based tool for the analysis and visualization of sets of interesting genes based on Gene Ontology hierarchies. This tool provides user-friendly data navigation and visualization. It generates expandable trees for browsing the GO hierarchy, fixed trees as HTML output for archiving, and bar charts at different annotation levels for publication. GOTM provides statistical analysis to indicate GO categories with relatively enriched gene numbers and to suggest biological areas that warrant further study. Enriched GO categories can be visualized in sub-trees or DAGs. Subsets of genes can be retrieved by GO term or keyword searching. Detailed information for each gene can be retrieved directly from a local database, GeneKeyDB.
GOToolBox is a series of web-based programs allowing the identification of statistically over- or under-represented terms in a gene dataset relative to a reference gene set, the clustering of functionally related genes within a set, and the retrieval of genes sharing annotations with a query gene. GO annotations can also be constrained to a slim hierarchy or a given level of the ontology, and terms can be filtered on evidence codes. The tool is updated monthly with GO and gene association files.
GraphWeb allows the detection of modules from biological, heterogeneous and multi-species networks, and the interpretation of detected modules using Gene Ontology, cis-regulatory motifs and biological pathways.
We developed the Genomic Regions Enrichment of Annotations Tool (GREAT) to analyze the functional significance of cis-regulatory regions identified by localized measurements of DNA binding events across an entire genome. Whereas previous methods took into account only binding proximal to genes, GREAT is able to properly incorporate distal binding sites and control for false positives using a binomial test over the input genomic regions. GREAT incorporates annotations from 20 ontologies and is available as a web application. The utility of GREAT extends to data generated for transcription-associated factors, open chromatin, localized epigenomic markers and similar functional data sets, and comparative genomics sets.
Tool submission date: May 2010
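The binomial test over genomic regions asks whether more input regions fall inside a term's regulatory domains than expected from the fraction of the genome those domains cover. A minimal sketch of the upper-tail binomial probability, computed in log space so that thousands of input regions do not overflow; how the regulatory domains and the covered fraction `p` are derived is outside this sketch, and the function name is hypothetical.

```python
from math import exp, lgamma, log


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p); assumes 0 < p < 1.

    n: number of input genomic regions
    k: regions that fall inside the term's regulatory domains
    p: fraction of the genome covered by those domains
    """
    if k <= 0:
        return 1.0
    log_p, log_q = log(p), log(1.0 - p)
    total = 0.0
    for i in range(k, n + 1):
        # log of C(n, i) * p^i * (1 - p)^(n - i)
        log_term = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += exp(log_term)
    return min(1.0, total)


# Hypothetical: 30 of 1,000 regions hit domains covering 1% of the genome.
print(binomial_upper_tail(30, 1000, 0.01))
```

Counting regions rather than genes is what lets distal binding sites contribute without one gene with a large regulatory domain dominating the result.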
L2L is a simple but powerful tool for discovering the hidden biological significance in microarray data. Through an easy-to-use web interface, L2L will mine a list of up- or down-regulated genes for Gene Ontology terms that are significantly enriched. L2L can also compare the list of genes to a database of hundreds of published microarray experiments, in order to identify common patterns of gene regulation. A downloadable command-line version can run customized and batch analyses.
MAPPFinder is an accessory program for GenMAPP. This program allows users to query any existing GenMAPP Expression Dataset Criterion against GO gene associations and GenMAPP MAPPs (microarray pathway profiles). The resulting analysis provides the user with results that can be viewed directly upon the Gene Ontology hierarchy and within GenMAPP, by selecting terms or MAPPs of interest.
Meta Gene Profiler (MetaGP) is a web application for discovering differentially expressed gene sets (meta genes) from the gene set library registered in our database. Once the user submits gene expression profiles categorized into subtypes of conditioned experiments, or a list of genes with valid p-values, MetaGP assigns an integrated p-value to each gene set by combining the statistical evidence for its genes obtained from gene-level analysis of significance. The current version supports nine Affymetrix GeneChip arrays for three organisms (human, mouse and rat). The significance of GO terms is graphically mapped onto the directed acyclic graph (DAG). Navigation of the GO hierarchy lets users summarize the significance of interesting sub-graphs in the web browser.
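The page does not say which rule MetaGP uses to combine gene-level evidence into one p-value per gene set. As one standard example, Fisher's method sums -2 ln p over the k genes and compares the result with a chi-square distribution with 2k degrees of freedom, whose upper tail has a closed form for even degrees of freedom. This sketch assumes independent gene-level p-values, an assumption that co-regulated genes may violate; the function name is hypothetical.

```python
from math import exp, log


def fisher_combined_pvalue(pvalues: list[float]) -> float:
    """Combine independent p-values (each in (0, 1]) with Fisher's method."""
    k = len(pvalues)
    statistic = -2.0 * sum(log(p) for p in pvalues)
    # Chi-square upper tail with 2k degrees of freedom:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term, series = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        series += term
    return min(1.0, exp(-half) * series)


# Hypothetical gene-level p-values for one gene set.
print(fisher_combined_pvalue([0.04, 0.20, 0.01, 0.50]))
```

With a single gene the combined value equals that gene's p-value, which is a quick sanity check on the formula.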
The MultiExperiment Viewer (MeV) is a versatile microarray data analysis tool, incorporating sophisticated algorithms for clustering, visualization, classification, statistical analysis and biological theme discovery. Gene expression or CGH microarray data can be analyzed with MeV's many clustering, statistical analysis and graphical display tools. MeV generates informative and interrelated displays of expression and annotation data from single or multiple experiments.
Onto-Compare is a web based tool that permits comparison of commercial microarrays based on GO. Onto-Compare allows the user to assess the functional bias associated with each array and helps determine the best microarray for a given biological phenomenon described using GO terms.
Onto-Design allows the user to design custom microarrays by selecting a set of UniGene cluster IDs that represent a given subset of biological processes described using GO terms.
Onto-Express searches the public databases and returns tables that correlate expression profiles with the cytogenetic gene locations, the biochemical and molecular functions, the biological processes, cellular components and cellular roles of the translated proteins.
Onto-Miner allows searching of various public bioinformatics databases via clone ID, UniGene gene symbol, LocusLink ID, accession number etc. and can carry out batch mode queries using entire lists of genes. The site can be used as a resource by third party developers who would like to provide detailed gene information for arbitrary lists of genes.
Onto-Translate is a web-based tool that allows the user to quickly translate lists of accession IDs, UniGene cluster IDs and Affymetrix probe IDs from one type to another. Onto-Translate helps identify the same information across various databases and reduce redundancy in arbitrary lists of genes.
OntoGate provides access to GenomeMatrix (GM) entries from Ontology terms and external datasets which have been associated with ontology terms, to find genes from different species in the GM, which have been mapped to the ontology terms. OntoGate includes a BLAST search of amino acid sequences corresponding to annotated genes.
The Ontologizer is a Java webstart application for GO term enrichment analysis that provides browsing and graph visualization capabilities. The Ontologizer allows users to analyze data with the standard Fisher exact test and also the parent-child method and topology methods.
The tool can be started directly from the web using Java webstart. For graph visualizations, users need to install the GraphViz library. The tool is freely available to all, and source code is available at SourceForge.
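In the parent-child approach, a term is tested against the genes annotated to its parents rather than against the whole population, so that enrichment inherited from a parent is not reported again for the child. The sketch below follows the "union" variant (genes annotated to at least one parent); it is a simplified illustration rather than the Ontologizer's implementation, the function names are hypothetical, and it assumes annotations have already been propagated to ancestor terms.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n)."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / comb(N, n)


def parent_child_union_pvalue(term: str,
                              parents: dict[str, list[str]],
                              annotated: dict[str, set[str]],
                              population: set[str],
                              study: set[str]) -> float:
    """Enrichment p-value of `term` relative to the genes of its parents.

    `annotated` maps each GO term to its genes, already propagated up the
    DAG, so every gene of `term` is also a gene of each of its parents.
    `term` must have at least one parent.
    """
    parent_genes = set().union(*(annotated[p] for p in parents[term])) & population
    term_genes = annotated[term] & population
    study_in_parents = study & parent_genes
    hits = len(study & term_genes)
    return hypergeom_upper_tail(hits, len(parent_genes), len(term_genes), len(study_in_parents))


# Toy data with placeholder gene and term names.
population = {f"g{i}" for i in range(1, 7)}
annotated = {"P": {"g1", "g2", "g3", "g4"}, "T": {"g1", "g2"}}
print(parent_child_union_pvalue("T", {"T": ["P"]}, annotated, population, {"g1", "g2", "g5"}))
```

Here gene `g5` is in the study set but not under the parent, so it does not count toward the child's test at all.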
Probe Explorer is an open access web-based bioinformatics application designed to show the association between microarray oligonucleotide probes and transcripts in the genomic context, but flexible enough to serve as a simplified genome and transcriptome browser. Coordinates and sequences of the genomic entities (loci, exons, transcripts), including vector graphics outputs, are provided for fifteen metazoan organisms and two yeasts. Alignment tools are used to build the associations between Affymetrix microarray probe sequences and the transcriptomes (for human, mouse, rat and yeasts). Search by keywords is available, and user searches and alignments on the genomes can also be done using any DNA or protein sequence query.
ProfCom is a web-based tool for the functional interpretation of a list of genes identified experimentally as related. What makes ProfCom unique is its ability to profile enrichment not only of available Gene Ontology (GO) terms but also of complex functions. A complex function is constructed as a Boolean combination of available GO terms. The complex functions inferred by ProfCom are more specific than single terms and describe the functional role of genes more accurately.
SeqExpress is a comprehensive analysis and visualisation package for gene expression experiments. GO is used to assign functional enrichment scores to clusters, using a combination of specially developed techniques and general statistical methods. These results can be explored using the built-in ontology browsing tool or through the generated web pages. SeqExpress also supports numerous data transformation, projection, visualisation, file export/import, searching, integration (with R), and clustering options.
SerbGO is a web-based tool intended to help researchers determine which GO-based microarray tools for gene expression analysis are best suited to their projects. SerbGO is a bidirectional application. Users can select desired features on the Query Form to find the tools that match their interests, or compare tools to check which features each one implements.
SOURCE compiles information from several publicly accessible databases, including UniGene, dbEST, UniProt Knowledgebase, GeneMap99, RHdb, GeneCards and LocusLink. GO terms associated with LocusLink entries appear in SOURCE.
The Short Time-series Expression Miner (STEM) is a Java program for clustering, comparing, and visualizing short time series gene expression data (8 time points or less). STEM allows researchers to identify significant temporal expression profiles and the genes associated with these profiles and to compare the behavior of these genes across multiple conditions. STEM is fully integrated with the Gene Ontology (GO) database and supports GO category gene enrichment analyses for sets of genes having the same temporal expression pattern. STEM also supports the ability to easily determine and visualize the behavior of genes belonging to a given GO category, identifying which temporal expression profiles were enriched for these genes.
T-Profiler uses the t-test to score changes in the average activity of pre-defined groups of genes. The gene groups are defined based on Gene Ontology categorization, ChIP-chip experiments, upstream matches to a consensus transcription factor binding motif, and location on the same chromosome, respectively. A jack-knife procedure is used to make calculations more robust against outliers. T-profiler makes it possible to interpret microarray data in a way that is both intuitive and statistically rigorous, without the need to combine experiments or choose parameters.
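The t-test scoring can be sketched as a pooled-variance two-sample t-value comparing a group's mean expression log-ratio with that of all remaining genes. The jack-knife outlier step described above is deliberately left out, and the input format (a dictionary of per-gene log-ratios) and function name are assumptions, not T-Profiler's interface.

```python
from math import sqrt
from statistics import mean, variance


def group_t_value(log_ratios: dict[str, float], group: set[str]) -> float:
    """Pooled-variance t-value: genes in `group` versus all other genes.

    Both the group and the remaining genes need at least two members.
    """
    inside = [v for g, v in log_ratios.items() if g in group]
    outside = [v for g, v in log_ratios.items() if g not in group]
    n1, n2 = len(inside), len(outside)
    # statistics.variance is the sample variance (denominator n - 1).
    pooled_var = ((n1 - 1) * variance(inside) + (n2 - 1) * variance(outside)) / (n1 + n2 - 2)
    return (mean(inside) - mean(outside)) / (sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2))


# Hypothetical log-ratios; genes "a" and "b" form one GO-defined group.
ratios = {"a": 1.0, "b": 2.0, "c": -1.0, "d": 0.0, "e": 1.0}
print(group_t_value(ratios, {"a", "b"}))
```

Comparing each group against the rest of the same array is what allows a single experiment to be scored without combining replicates.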
THEA (Tools for High-throughput Experiments Analysis) is an integrated information processing system dedicated to the analysis of post-genomic data. It automatically annotates data produced by classification systems with selected biological information (including the Gene Ontology). Users can either manually search and browse through these annotations or automatically generate meaningful generalizations according to statistical criteria (data mining).