
Evolutionary Bioinformatics

Bengt Sennblad

Addresses: SBC and CGB
Phone: +46 (0)8 - 5537 8572 (SBC)
Mobile: +46 (0)70 - 674 7480
email: bengt.sennblad@sbc.su.se










Publications
Group members
Current proposals for undergraduate projects
Course material

Research

Genome evolution and comparative genomics

Studies of genome evolution have become increasingly important in comparative genomics, as comparisons across multiple genomes become necessary. This includes orthology analysis, which establishes the fundamental correspondence between genes in different genomes and is an important tool in gene function prediction, as well as studies of selection and genome organization. For such studies, a need has emerged for evaluation from a probabilistic perspective and, more importantly, for attaching a reliability or confidence measure to the results.
The long-term aim of this project is to develop a unified probabilistic framework and software system for genome-wide evolutionary Comparative Genomics analyses, with the possibility to incorporate data and models relating to various aspects of genome biology, e.g., gene or genome duplications, lateral gene transfers, sequence evolution, adaptation and genome organization. Hierarchical Bayesian analysis provides such a probabilistic framework for Functional Genomics studies. It also provides a powerful reliability measure, the 'posterior probability', which allows us to determine the probability of individual hypotheses or to identify confidence sets comprising the most probable hypotheses.
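The idea of a confidence set of the most probable hypotheses can be illustrated with a short Python sketch. It assumes that hypotheses (here, hypothetical gene-tree topologies written as strings) have already been sampled from the posterior, e.g. by MCMC, and estimates each posterior probability as a sample frequency. The function name `credible_set` and the 95% default level are illustrative choices, not part of any published software.

```python
from collections import Counter


def credible_set(samples, level=0.95):
    """Most probable hypotheses whose estimated posterior mass reaches `level`.

    `samples` are hypotheses (any hashable value) drawn from the posterior,
    e.g. by MCMC; posterior probabilities are estimated as sample frequencies.
    Returns (hypothesis, probability) pairs, most probable first.
    """
    if not samples:
        raise ValueError("need at least one posterior sample")
    total = len(samples)
    chosen, covered = [], 0
    # Add hypotheses from most to least frequent until the mass reaches `level`.
    for hypothesis, count in Counter(samples).most_common():
        chosen.append((hypothesis, count / total))
        covered += count
        if covered / total >= level:
            break
    return chosen


# Hypothetical MCMC output: 100 sampled gene-tree topologies.
samples = ["((A,B),C)"] * 70 + ["((A,C),B)"] * 25 + ["((B,C),A)"] * 5
print(credible_set(samples, level=0.9))
# [('((A,B),C)', 0.7), ('((A,C),B)', 0.25)]
```

Estimating probabilities by sample frequency is the simplest choice; it only works well when the chain has mixed and the number of distinct hypotheses is small relative to the number of samples.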

Current projects

Probabilistic models for gene duplication and loss

In this collaborative work with Lars Arvestad, Ann-Charlotte Berglund and Jens Lagergren, we have developed the gene evolution model (GEM), the first probabilistic model for the evolution of genes through duplication and loss. The model has been implemented using MCMC (Arvestad, Berglund, Lagergren and Sennblad, 2003; Arvestad, Lagergren and Sennblad, accepted) and presented at the ISMB 2003 conference. This model was then extended into the gene sequence evolution model by including substitution models under a molecular clock assumption (Arvestad, Berglund, Lagergren and Sennblad, 2004, presented at the RECOMB 2004 conference). Recently, we have removed the molecular clock assumption by integrating a model for substitution rate variation over the gene tree (see further below). The resulting model, the Gene Sequence evolution model with iid Rates (GSR) (Sennblad, Arvestad and Lagergren, accepted pending revision), provides an integrated model of evolution and allows us to base our analysis directly on sequence data. We have also developed a new algorithm that greatly reduces analysis time and allows for genome-wide analysis; this is demonstrated on a yeast data set with 4500 gene families for 14 taxa (Åkerborg, Sennblad, Arvestad and Lagergren, accepted pending revision).
Current work includes an extension of the GEM model to hybrid species networks. Together with a new model for species evolution that includes hybridization events, and an algorithm that allows efficient likelihood computation for species networks, this provides a biologically realistic means for reconstructing species networks from gene data. This work has been presented at the ESEB 2007 and 'Systematikdagarna' 2008 conferences. We also aim to include other horizontal gene transfer processes; this work is carried out by Ali Tofigh.
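The text above does not spell out the mathematics of gene duplication and loss. As a hedged illustration only, the Python sketch below assumes that the number of gene copies along a single species-tree edge follows a linear birth-death process, with a per-copy duplication rate `dup_rate` and loss rate `loss_rate` (hypothetical parameter names). It simulates the process and compares the simulated frequency of complete loss with the textbook closed-form extinction probability. It is not the GEM implementation: it ignores the species tree as a whole, the gene-tree topology and the MCMC machinery.

```python
import math
import random


def simulate_edge(t, dup_rate, loss_rate, rng):
    """Simulate a linear birth-death process along one edge of length t.

    Assumption (illustrative): starting from a single gene copy, each copy
    duplicates at rate `dup_rate` and is lost at rate `loss_rate`.
    Returns the number of copies present at time t.
    """
    n, clock = 1, 0.0
    per_copy = dup_rate + loss_rate
    while n > 0 and per_copy > 0:
        # Waiting time to the next event among n copies is exponential.
        clock += rng.expovariate(n * per_copy)
        if clock > t:
            break
        if rng.random() < dup_rate / per_copy:
            n += 1  # duplication
        else:
            n -= 1  # loss
    return n


def extinction_probability(t, dup_rate, loss_rate):
    """Closed-form probability that no copy survives to time t."""
    if math.isclose(dup_rate, loss_rate):
        return dup_rate * t / (1 + dup_rate * t)
    e = math.exp(-(dup_rate - loss_rate) * t)
    return loss_rate * (1 - e) / (dup_rate - loss_rate * e)


if __name__ == "__main__":
    rng = random.Random(42)
    runs = 20000
    lost = sum(simulate_edge(1.0, 0.5, 0.8, rng) == 0 for _ in range(runs))
    # The two numbers should agree up to Monte Carlo error.
    print(lost / runs, extinction_probability(1.0, 0.5, 0.8))
```

A probabilistic model of this kind assigns a probability to every possible history of duplications and losses, which is what makes likelihood-based and Bayesian inference (e.g. by MCMC) possible.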

Probabilistic orthology analysis

The gene evolution model provides a means to perform orthology analysis in a Bayesian framework (Sennblad and Lagergren, accepted pending revision). The main advantage of this approach compared to traditional orthology analyses (i.e., those based on a parsimony approach) is that it makes it possible to calculate actual confidence intervals or probabilities for individual orthology statements, rather than conditioning on a particular reconciliation or parameter setting. We also provide a sensitivity/specificity-based method for orthology prediction.
Inclusion of substitution models, i.e., using the gene sequence evolution model, will allow us to perform orthology/paralogy prediction directly from sequence data.
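As a minimal sketch of how a probability for an individual orthology statement can be read off posterior samples, the Python code below counts the fraction of sampled reconciliations in which two genes are orthologs, i.e. in which their lowest common ancestor in the gene tree is a speciation. It assumes the MCMC has already produced the samples; the representation (a parent map plus a speciation/duplication label per internal node), the function names and the example genes are hypothetical.

```python
def path_to_root(node, parent):
    """List of nodes from `node` up to and including the root."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def lowest_common_ancestor(a, b, parent):
    """First ancestor of b (inclusive) that is also an ancestor of a."""
    ancestors_of_a = set(path_to_root(a, parent))
    for node in path_to_root(b, parent):
        if node in ancestors_of_a:
            return node
    raise ValueError(f"{a!r} and {b!r} are not in the same tree")


def orthology_probability(a, b, samples):
    """Fraction of sampled reconciliations in which genes a and b are orthologs.

    Each sample is a pair (parent, event): `parent` maps every gene-tree node
    to its parent (None for the root), and `event` labels each internal node
    "speciation" or "duplication". Two genes are orthologs when their lowest
    common ancestor is a speciation.
    """
    hits = sum(
        event[lowest_common_ancestor(a, b, parent)] == "speciation"
        for parent, event in samples
    )
    return hits / len(samples)


# Two hypothetical posterior samples for human gene h1 and mouse genes m1, m2.
samples = [
    # ((h1, m1) speciation, m2) duplication
    ({"h1": "x", "m1": "x", "m2": "r", "x": "r", "r": None},
     {"x": "speciation", "r": "duplication"}),
    # ((m1, m2) duplication, h1) speciation
    ({"m1": "x", "m2": "x", "h1": "r", "x": "r", "r": None},
     {"x": "duplication", "r": "speciation"}),
]
print(orthology_probability("h1", "m2", samples))  # 0.5
```

Because the estimate averages over all sampled reconciliations, uncertainty about the gene tree and the placement of duplications is reflected directly in the reported probability.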

Modeling time and rates for substitution models

Time is the common denominator in evolutionary processes. However, in models of these processes, time and evolutionary rate are often confounded. For example, in substitution models the 'edge length' is measured as the (expected) number of substitutions per site occurring over the branch in question, i.e., the product of substitution rate and time, so rate and time cannot be separated without further assumptions. A common time scale is key to integrating different evolutionary models; the natural choice would be a chronological time scale. With this in mind, we have initiated a project aimed at modeling rate variation, which will also enable us to estimate chronological times. An undergraduate project, performed by Martin Linder (currently at Uppsala University), evaluates different rate models for the substitution process, including a newly developed iid-based model (Linder 2003). An extended analysis based on this work is under revision (Linder, Britton, Sennblad, in revision).
We have also developed new algorithms to improve the speed of the analysis (Åkerborg, Sennblad, Lagergren, 2008).
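The confounding of rate and time can be made concrete in a few lines of Python. The sketch below assumes a simple iid rate model in which each edge's rate is drawn independently from a gamma distribution with a chosen mean; the gamma choice, the parameter values and the edge labels are illustrative assumptions and not necessarily the model of Linder (2003). It then shows that rescaling all rates and times in opposite directions leaves every edge length unchanged.

```python
import random


def edge_lengths(times, rates):
    """Edge length in substitutions per site = rate (per unit time) x time."""
    return {edge: rates[edge] * t for edge, t in times.items()}


def sample_iid_rates(edges, shape, mean, rng):
    """Draw one rate per edge, independently, from a gamma distribution.

    Assumption (illustrative): gamma-distributed iid rates.
    random.gammavariate(alpha, beta) has mean alpha * beta, so beta is
    chosen as mean / shape to give the requested mean rate.
    """
    return {edge: rng.gammavariate(shape, mean / shape) for edge in edges}


# Hypothetical chronological edge times (e.g. in million years).
times = {"e1": 2.0, "e2": 0.5}
rates = sample_iid_rates(times, shape=2.0, mean=0.1, rng=random.Random(1))
lengths = edge_lengths(times, rates)

# Confounding: doubling every rate and halving every time gives the same
# edge lengths, so sequence data alone cannot separate rate from time.
doubled = {e: 2 * r for e, r in rates.items()}
halved = {e: t / 2 for e, t in times.items()}
print(lengths == edge_lengths(halved, doubled))  # True
```

A rate model with a prior on rates (here iid across edges) is one way to break this symmetry, since it makes some combinations of rates and times more probable than others.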

Detection of adaptive evolution

This project aims to provide tools for detecting selection on genes using the ratio of non-synonymous to synonymous substitution rates (Ka/Ks). Work here includes the implementation of a dynamic algorithm for an empirical expected value of Ka/Ks over edges in a tree, using ancestral sequences reconstructed by marginal likelihood reconstruction. This project is performed by Ann-Charlotte Berglund-Sonnhammer, Ninoa Malki and Joerg Lehmann.
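As a simplified illustration of a per-edge Ka/Ks value, the Python sketch below compares one ancestral and one descendant coding sequence using Nei-Gojobori-style counting of synonymous and nonsynonymous sites and differences under the standard genetic code. Unlike the approach described above, it uses a single ancestral sequence rather than an expected value over marginal reconstructions. It also applies no multiple-hit correction, treats mutations to stop codons as nonsynonymous, and assumes uppercase, gap-free sequences without stop codons. All function names are hypothetical.

```python
from itertools import permutations

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def synonymous_sites(codon):
    """Nei-Gojobori synonymous site count (0 to 3) for one sense codon."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                # Assumption: mutations to stop codons count as nonsynonymous.
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3
    return s


def codon_differences(c1, c2):
    """(synonymous, nonsynonymous) differences averaged over mutational paths."""
    diff = [p for p in range(3) if c1[p] != c2[p]]
    if not diff:
        return 0.0, 0.0
    syn_total = non_total = n_paths = 0
    for order in permutations(diff):
        current, syn, non = c1, 0, 0
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                break  # path passes through a stop codon: discard it
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        else:  # runs only if the path was not discarded
            syn_total += syn
            non_total += non
            n_paths += 1
    if n_paths == 0:
        raise ValueError(f"no stop-free path from {c1} to {c2}")
    return syn_total / n_paths, non_total / n_paths


def ka_ks(ancestor, descendant):
    """Uncorrected pN/pS between aligned, gap-free, uppercase coding sequences."""
    if len(ancestor) != len(descendant) or len(ancestor) % 3:
        raise ValueError("sequences must have equal length, a multiple of 3")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(ancestor), 3):
        a, d = ancestor[i:i + 3], descendant[i:i + 3]
        if CODE[a] == "*" or CODE[d] == "*":
            raise ValueError("stop codons are not supported")
        s = (synonymous_sites(a) + synonymous_sites(d)) / 2
        S += s
        N += 3 - s
        sd, nd = codon_differences(a, d)
        Sd += sd
        Nd += nd
    if S == 0 or Sd == 0:
        raise ValueError("no synonymous sites or differences; ratio undefined")
    return (Nd / N) / (Sd / S)


print(round(ka_ks("AAAGGT", "AAGGCT"), 4))  # 0.2857
```

Values well above 1 suggest positive selection and values well below 1 suggest purifying selection, but on short sequences such as this toy example the estimate is very noisy.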

PrIME - a software platform for integrated models of evolution

Implementations of models and methods developed in the project are collected in PrIME, Probabilistic Integrated Models of Evolution, a software platform for comparative evolutionary studies. For the current status, see the PrIME website.

Publications