Evolutionary Bioinformatics
Bengt Sennblad
Addresses: SBC and CGB
Studies of genome evolution have become increasingly important
in comparative genomics, as comparisons over multiple genomes become
necessary. This includes orthology analysis which provides the
fundamental correspondence between genes in different genomes and
is an important tool in gene function prediction, as well as studies of
selection and genome organization. For such studies, there has emerged
a need for evaluation from a probabilistic perspective and, more
importantly, a need to attach a reliability measure or confidence to results.
The long-term aim of this project is to develop a unified probabilistic
framework and software system for genome-wide evolutionary comparative genomics
analyses, with the possibility of incorporating data and models relating to
various aspects of genome biology, e.g., gene or genome duplications,
lateral gene transfers, sequence evolution, adaptation and genome
organization. Hierarchical Bayesian analysis provides such a
probabilistic framework for functional genomics studies. It also provides
a powerful reliability measure, the 'posterior probability', allowing us to
determine the probability of individual hypotheses or to identify confidence
sets comprising the most probable hypotheses.
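As a rough illustration of how posterior probabilities and confidence sets can be read off MCMC output, the following minimal sketch estimates each hypothesis's posterior probability as its sample frequency and collects the most probable hypotheses until a chosen level is reached. The function name `credible_set`, the 95% level and the toy samples are assumptions made for illustration; they are not taken from the project's software.

```python
from collections import Counter


def credible_set(samples, level=0.95):
    """Return (hypothesis, estimated posterior probability) pairs.

    Hypotheses are added in order of decreasing sample frequency until
    their cumulative estimated probability reaches `level`.
    """
    counts = Counter(samples)
    total = len(samples)
    chosen, cumulative = [], 0.0
    # most_common() yields hypotheses sorted by decreasing frequency.
    for hypothesis, n in counts.most_common():
        if cumulative >= level:
            break
        chosen.append((hypothesis, n / total))
        cumulative += n / total
    return chosen


# Hypothetical MCMC output: each sample is a gene tree topology label.
samples = ["((a,b),c)"] * 70 + ["((a,c),b)"] * 26 + ["((b,c),a)"] * 4
result = credible_set(samples)
for hypothesis, probability in result:
    print(hypothesis, probability)
```

In this toy data the two most frequent topologies together account for 96% of the samples, so the third topology falls outside the 95% set.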
In this collaborative work with Lars Arvestad, Ann-Charlotte Berglund and Jens Lagergren, we have developed the gene evolution model (GEM), the first probabilistic model for the evolution of genes through duplications and losses. The model has been implemented using MCMC (Arvestad, Berglund, Lagergren and Sennblad, 2003; Arvestad, Lagergren and Sennblad, accepted; presented at the ISMB 2003 conference). This model was then extended into the gene sequence evolution model by including substitution models under a molecular clock assumption (Arvestad, Berglund, Lagergren and Sennblad, 2004; presented at the RECOMB 2004 conference). Recently, we have removed the molecular clock assumption by integrating a model for substitution rate variation over the gene tree (see further below). The resulting model, the Gene Sequence evolution model with iid Rates (GSR) (Sennblad, Arvestad and Lagergren, accepted pending revision), provides an integrated model of evolution and allows us to base our analysis directly on sequence data. We have also developed a new algorithm that greatly reduces analysis time and allows for genome-wide analysis; this is demonstrated on a yeast data set with 4500 gene families for 14 taxa (Åkerborg, Sennblad, Arvestad and Lagergren, accepted pending revision). Current work includes an extension of the GEM to gene evolution in hybrid networks. Together with a new model for species evolution that includes hybridization events and an algorithm that allows efficient likelihood computation for species networks, this provides a biologically realistic means of reconstructing species networks from gene data. This work has been presented at the ESEB 2007 and 'Systematikdagarna' 2008 conferences. We also aim to include other horizontal gene transfer processes; this work is carried out by Ali Tofigh.
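To make the idea of gene evolution through duplication and loss more concrete, here is a minimal simulation sketch. It assumes that duplications and losses follow a linear birth-death process along a single species-tree edge, and it only tracks the number of gene copies; it does not reproduce the reconciliation of gene trees with species trees or the MCMC machinery described above. The function name, parameter names and rate values are hypothetical.

```python
import random


def simulate_gene_copies(duplication_rate, loss_rate, time, rng, start=1):
    """Simulate the gene copy number after `time` on one species-tree edge.

    Each copy independently duplicates at `duplication_rate` and is lost
    at `loss_rate` (a linear birth-death process, assumed here).
    """
    copies, elapsed = start, 0.0
    per_copy_rate = duplication_rate + loss_rate
    if per_copy_rate == 0:
        return copies  # no events can occur
    while copies > 0:
        # Waiting time to the next event among all current copies.
        elapsed += rng.expovariate(copies * per_copy_rate)
        if elapsed > time:
            break
        # Choose the event type in proportion to its rate.
        if rng.random() < duplication_rate / per_copy_rate:
            copies += 1
        else:
            copies -= 1
    return copies


rng = random.Random(1)
counts = [simulate_gene_copies(1.0, 1.0, 1.0, rng) for _ in range(20000)]
mean_copies = sum(counts) / len(counts)
extinct_fraction = counts.count(0) / len(counts)
print(mean_copies, extinct_fraction)
```

With equal duplication and loss rates the expected copy number stays at the starting value, even though many individual lineages go extinct; a simulation like this is a simple way to sanity-check such properties before fitting a model to data.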
Time is the common denominator in evolutionary processes. However, in
models for these processes, time and evolutionary rate are often
confounded. Thus, in, e.g., substitution models, the 'edge length' is
measured in the number of substitutions occurring over the branch in
question. A common time-scale is key for the integration of different
evolutionary models; the natural choice would be some chronological
time scale. With this in mind, we have initiated a project aimed at
modeling of rate variation, which will enable us to also estimate
chronological times. An undergraduate project, performed by Martin Linder (currently at Uppsala University), evaluates
different rate models for the substitution process, including a newly
developed iid-based model (Linder 2003). An extended analysis based on
this work is under revision (Linder, Britton, Sennblad, in revision).
We have also developed new algorithms to improve the speed of the
analysis (Åkerborg, Sennblad, Lagergren, 2008).
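The confounding of rate and time can be shown with a small sketch: an edge length in expected substitutions is the product of a rate and a chronological time, so different (rate, time) pairs give identical lengths unless the rates are modeled separately. The sketch below draws one independent rate per edge; the choice of a gamma distribution, the function name and all parameter values are assumptions for illustration and are not taken from the cited work.

```python
import random


def edge_lengths(edge_times, mean_rate, shape, rng):
    """Draw one iid rate per edge and return {edge: (rate, length)}.

    length = rate * time, i.e. expected substitutions per site on the edge.
    Rates are gamma distributed (an assumption) with mean `mean_rate`.
    """
    scale = mean_rate / shape  # gamma mean = shape * scale
    result = {}
    for edge, time in edge_times.items():
        rate = rng.gammavariate(shape, scale)
        result[edge] = (rate, rate * time)
    return result


# Confounding: a fast rate over a short time and a slow rate over a long
# time give the same edge length.
fast_short = 0.5 * 2.0
slow_long = 0.25 * 4.0
print(fast_short == slow_long)

# Hypothetical edge times in arbitrary chronological units.
times = {"edge_1": 10.0, "edge_2": 25.0, "edge_3": 40.0}
lengths = edge_lengths(times, mean_rate=0.01, shape=2.0, rng=random.Random(7))
for edge, (rate, length) in lengths.items():
    print(edge, rate, length)
```

Given only the lengths, the times cannot be recovered; placing a prior on the per-edge rates, as in this sketch, is one way to make times estimable.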
This subproject aims to provide tools for detecting selection on genes using the ratio of non-synonymous to synonymous substitution rates. Work here includes implementation of a dynamic algorithm for an empirical expected value of Ka/Ks over edges in a tree, using ancestral sequences reconstructed with marginal likelihood reconstruction. This project is performed by Ann-Charlotte Berglund-Sonnhammer, Ninoa Malki and Joerg Lehmann.
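The distinction between synonymous and non-synonymous changes along an edge can be sketched as follows. The example compares a hypothetical reconstructed ancestral sequence with its descendant, codon by codon, using the standard genetic code. It only counts differing codons; a proper Ka/Ks estimate additionally normalizes by the numbers of synonymous and non-synonymous sites and corrects for multiple substitutions, and this sketch is not the project's dynamic algorithm. The function name and sequences are made up for illustration.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_changes(ancestor, descendant):
    """Count codons that differ synonymously and non-synonymously.

    Both sequences must be aligned, gap-free coding sequences of equal
    length (a multiple of three). Returns (synonymous, nonsynonymous).
    """
    if len(ancestor) != len(descendant) or len(ancestor) % 3 != 0:
        raise ValueError("sequences must be aligned codon sequences")
    synonymous = nonsynonymous = 0
    for i in range(0, len(ancestor), 3):
        a, d = ancestor[i:i + 3], descendant[i:i + 3]
        if a == d:
            continue
        if CODON_TABLE[a] == CODON_TABLE[d]:
            synonymous += 1  # same amino acid, different codon
        else:
            nonsynonymous += 1  # amino acid changed
    return synonymous, nonsynonymous


# Hypothetical ancestor and descendant on one edge:
# ATG=ATG (unchanged), CTT->CTC (Leu->Leu), AAA->AGA (Lys->Arg).
changes = count_changes("ATGCTTAAA", "ATGCTCAGA")
print(changes)
```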
Implementations of models and methods developed in the project are collected in PrIME, Probabilistic Integrated Models of Evolution, a software platform for comparative evolutionary studies. For the current status, see the PrIME website.