Bengt Sennblad's research at SBC

Comparative genomics tools at SBC

My work at the Stockholm Bioinformatics Center (SBC) has, since I started here as an Assistant Professor in 2000, been aimed at developing new integrated probabilistic models and algorithms for the study of the gene duplication process. This work has largely been done in collaboration with Jens Lagergren and Lars Arvestad, both at the Royal Institute of Technology (KTH), and is implemented in a software package that we have called PrIME.
In the seminal work by Ohno (1970), the gene duplication process was identified as a key player in the recruitment of new gene functions to the genome, a process that may affect the genome at various levels. Gene duplications may occur as duplications of a single gene (or part of a gene), as segmental duplications, or even as duplications of whole chromosomes or genomes (polyploidy), producing a group of more or less similar genes within and between organisms called a gene family. Additionally, auxiliary processes, such as horizontal gene transfer and hybridization, need to be taken into account. Understanding the gene duplication process and reconstructing its path through a gene family’s history as a reconciled tree therefore provide us with important information about the evolution of genes and the recruitment of new functions to the genome. In particular, it allows us to make functional predictions for genes with previously unknown function in one organism based on identified function in another organism – so-called orthology analysis.

GEM

We have developed the first probabilistic model for gene duplication and loss, the Gene Evolution Model (GEM; Arvestad et al., 2003, 2009). The model was developed from scratch and is formally a generalization of the birth-death process that takes place inside a binary tree (the biological rationale comes from previous empirical studies; Nei et al., 1997). To allow efficient analysis, a number of new dynamic programming (DP) algorithms were described, allowing the computation of reconstruction probabilities of individual duplication scenarios, the maximum likelihood scenario, and the sum over all scenarios. This provides an unrivaled means to study the duplication process in gene evolution as a reconciled tree, or to perform probabilistic orthology analysis that, in contrast to previous methods, supplies statistical confidence values for orthology statements (Sennblad and Lagergren, 2009).
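As a point of reference for the birth-death process that GEM generalizes, the following minimal Python sketch computes the standard closed-form probability that a single gene lineage has n copies after time t under a linear birth-death process with a constant duplication (birth) rate and loss (death) rate. This is the textbook process on a single lineage, not the GEM DP algorithms themselves; the function and parameter names are illustrative assumptions and are not taken from PrIME.

```python
import math


def birth_death_prob(n, birth, death, t):
    """P(n copies at time t | 1 copy at time 0), linear birth-death process.

    birth: per-lineage duplication rate (assumed constant)
    death: per-lineage loss rate (assumed constant)
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if math.isclose(birth, death):
        # Critical case: equal duplication and loss rates.
        alpha = beta = birth * t / (1.0 + birth * t)
    else:
        e = math.exp((birth - death) * t)
        denom = birth * e - death
        alpha = death * (e - 1.0) / denom  # probability of extinction by time t
        beta = birth * (e - 1.0) / denom   # ratio of the geometric tail
    if n == 0:
        return alpha
    # Conditional on survival, the copy number is geometrically distributed.
    return (1.0 - alpha) * (1.0 - beta) * beta ** (n - 1)
```

For example, `birth_death_prob(0, 0.3, 0.2, 1.5)` gives the probability that the lineage has been lost entirely after 1.5 time units, and summing `birth_death_prob(n, ...)` over n should give a total close to 1.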

Substitution rates

In a separate collaboration with Tom Britton, Stockholm University, I have developed and evaluated models for substitution rate variation over time (Linder et al., in press), where substitutions are point mutations in the gene sequence. These models allow the estimation of the substitution rates and divergence times of genes or organisms. Besides their direct interest for systematics, divergence times constitute an important tool in comparative genomics and enable correlation either with other major events in organism evolution or with known geological events. An efficient DP algorithm for identifying the maximum a posteriori (MAP) time and rate assignment for given model parameters under these substitution rate models was developed in Åkerborg et al. (2007).

GSR

Using the results above, we constructed the first integrated model for gene duplication-loss and sequence evolution, the Gene Sequence and Rates model (GSR; Arvestad et al., 2004; Åkerborg et al., 2009). In contrast to GEM, which models genes evolving as atomic units, GSR models genes as sequences that evolve by duplications, losses and substitutions. To obtain an efficient algorithm, we use a discretization of the species tree, with additional vertices representing discrete time points in the tree. The gene tree vertices (e.g., gene duplications), rather than being placed on species tree edges and vertices as in GEM, are now placed on vertices of the discretized species tree. By applying a substitution rate model and a standard model for sequence evolution, we devised a DP algorithm that sums the probabilities of possible gene evolution scenarios and thereby allows computing the probability of a gene tree given the input gene sequence data as well as the species tree. In earlier methods, constraints implied by the species tree were completely ignored when reconstructing the gene tree; additionally, gene evolution had to be inferred from a subsequent reconciliation between the gene tree and the species tree, and all information about support from the sequence data was then lost. GSR, implemented in an MCMC framework for parameter estimation, allowed us to study duplication scenarios and gene tree evolution directly from sequence data without loss of information. We demonstrated this using genome-wide yeast data analyzed on the KTH Parallel Computing Center’s Ferlin cluster.
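The discretization step can be pictured with a small Python sketch. It assumes, purely for illustration, that each species tree edge is given by the times of its parent and child vertices and is split into a fixed number of equal intervals; the actual discretization scheme used in GSR is not specified here, and all names below are hypothetical.

```python
def discretize_edge(t_parent, t_child, n_intervals):
    """Return the time points that split one edge into n_intervals equal parts.

    The list includes both endpoints, so it has n_intervals + 1 entries.
    """
    if n_intervals < 1:
        raise ValueError("n_intervals must be at least 1")
    step = (t_child - t_parent) / n_intervals
    return [t_parent + i * step for i in range(n_intervals + 1)]


def discretize_tree(edges, n_intervals):
    """Discretize every edge of a species tree.

    edges: dict mapping (parent, child) names to (t_parent, t_child) times.
    Returns a dict mapping each edge to its list of discrete time points,
    which are the candidate positions for gene tree vertices.
    """
    return {
        edge: discretize_edge(t0, t1, n_intervals)
        for edge, (t0, t1) in edges.items()
    }
```

With times measured as age before present, `discretize_edge(1.0, 0.0, 4)` yields five time points from the parent at 1.0 down to the child at 0.0. A finer grid gives a closer approximation of continuous time at the cost of more DP states.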

DTLSR

We have extended GSR into the Duplication, Transfer, Loss, Sequence with Rates model (DTLSR) by including lateral gene transfer events in addition to gene duplications and losses (Ali Tofigh, 2009, Using trees to capture reticulate evolution – lateral gene transfers and cancer progression, PhD thesis). This allows genes to change lineage in the species tree during their evolution. To compute probabilities of scenarios, we now need to use numeric approximations of the resulting differential equations, combined with the DP framework from GSR. We are currently working on a manuscript on the implementation of the model and its application to genome-wide analysis of several bacterial datasets and a eukaryote case study dataset.
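The DTLSR differential equations are not given here, so the following Python sketch only illustrates the kind of numeric approximation involved. As a stand-in, it integrates the extinction-probability equation of the plain duplication-loss (birth-death) process, dP/dt = death - (birth + death) P + birth P^2 with P(0) = 0, using a classical fourth-order Runge-Kutta scheme, and compares the result with the known closed form. The choice of integrator and all names are assumptions made for this example.

```python
import math


def extinction_rhs(p, birth, death):
    """Right-hand side of dP/dt for the extinction probability P(t)."""
    return death - (birth + death) * p + birth * p * p


def extinction_prob_rk4(birth, death, t, steps=1000):
    """Integrate dP/dt from P(0) = 0 up to time t with classical RK4."""
    h = t / steps
    p = 0.0
    for _ in range(steps):
        k1 = extinction_rhs(p, birth, death)
        k2 = extinction_rhs(p + 0.5 * h * k1, birth, death)
        k3 = extinction_rhs(p + 0.5 * h * k2, birth, death)
        k4 = extinction_rhs(p + h * k3, birth, death)
        p += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return p


def extinction_prob_closed(birth, death, t):
    """Closed-form extinction probability, used here only as a check."""
    if math.isclose(birth, death):
        return birth * t / (1.0 + birth * t)
    e = math.exp((birth - death) * t)
    return death * (e - 1.0) / (birth * e - death)
```

In this simple case the numeric and closed-form values should agree closely, for instance when comparing `extinction_prob_rk4(0.3, 0.2, 2.0)` with `extinction_prob_closed(0.3, 0.2, 2.0)`. Numeric integration becomes necessary when, as in DTLSR, no closed form is available.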

Hybridization

Lastly, I have extended our GEM framework to species networks that allow hybridization events between species lineages (Sennblad, 2007, ESEB XI). Besides the extension of GEM (GSR can be extended in a similar manner), the major part of this work consists of a probabilistic model for the evolution of hybridizing species and a new DP algorithm for the computation of its reconstruction probability. This model is currently being implemented.


  1. M. Nei, X. Gu, and T. Sitnikova. Evolution by the birth-and-death process in multigene families of the vertebrate immune system. Proc. Natl. Acad. Sci. U. S. A., 94(15):7799–7806, 1997.
  2. S. Ohno. Evolution by gene duplication. Springer, Berlin, 1970.