Bengt Sennblad's research at SBC
Comparative genomics tools at SBC
My work at the Stockholm Bioinformatics Center (SBC) has, since I
started here as an Assistant Professor in 2000, been aimed at
developing new integrated probabilistic models and algorithms for the
study of the duplication process. This work has been done largely in
collaboration with Jens Lagergren and Lars Arvestad, both at the Royal
Institute of Technology (KTH), and is implemented in a software package
that we have called PrIME.
In the seminal work by Ohno (1970), the gene
duplication process was identified as a key player in the recruitment
of new gene functions to the genome that may affect the genome at
various levels. Gene duplications may occur as single gene (or part of
a gene) duplications, as segmental duplications or even as duplications
of whole chromosomes or genomes (polyploidy), producing a group of
more or less similar genes within and between organisms called a gene
family. Additionally, auxiliary processes, such as horizontal gene
transfer and hybridization, need to be taken into account.
Understanding the gene duplication process and reconstructing its path
through a gene family’s history as a reconciled tree therefore
provides us with important information about the evolution of genes
and the recruitment of new functions to the genome. In particular, it
allows us to make functional predictions for genes with previously
unknown function in one organism based on identified function in
another organism, an approach known as orthology analysis.
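As a minimal illustration of how orthology can be read off a reconciled tree, the sketch below uses the common convention that two genes are orthologs when their lowest common ancestor in the gene tree is a speciation rather than a duplication. The tree encoding (`parent_of`, `event`) and all function names are hypothetical and chosen for this example only; they are not part of PrIME.

```python
def ancestors(node, parent_of):
    """Return the path from a node up to the root, starting with the node itself."""
    path = [node]
    while node in parent_of:
        node = parent_of[node]
        path.append(node)
    return path


def lca(a, b, parent_of):
    """Return the lowest common ancestor of two nodes in the gene tree."""
    seen = set(ancestors(a, parent_of))
    for node in ancestors(b, parent_of):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


def are_orthologs(a, b, parent_of, event):
    """Two genes count as orthologs if their lowest common ancestor is a speciation."""
    return event[lca(a, b, parent_of)] == "speciation"


# Hypothetical toy gene family: one duplication ("d") followed by a
# speciation in each of the two resulting copies ("s1" and "s2").
parent_of = {
    "humanA": "s1", "mouseA": "s1",
    "humanB": "s2", "mouseB": "s2",
    "s1": "d", "s2": "d",
}
event = {"d": "duplication", "s1": "speciation", "s2": "speciation"}

print(are_orthologs("humanA", "mouseA", parent_of, event))  # True
print(are_orthologs("humanA", "mouseB", parent_of, event))  # False
```

In this toy family, humanA and mouseB are separated by the duplication and are therefore paralogs, so a function known for mouseB would be a weaker basis for predicting the function of humanA than one known for mouseA.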
GEM
We have developed the first probabilistic model for gene duplication
and loss, the Gene Evolution Model (GEM; Arvestad et al., 2003, 2009).
The model was developed from scratch and is formally a generalization
of the birth-death process that takes place inside a binary tree (the
biological rationale comes from previous empirical studies; Nei et al.,
1997). To allow efficient analysis, a number of new dynamic
programming (DP) algorithms were described, allowing the computation of
reconstruction probabilities of individual duplication scenarios, the
maximum likelihood scenario, and the sum over all scenarios. This
provides an unrivaled means to study the duplication process in gene
evolution as a reconciled tree or to perform probabilistic orthology
analysis that, in contrast to previous methods, supplies statistical
confidence values for orthology statements (Sennblad and Lagergren,
2009).
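To make the underlying birth-death process concrete, the following sketch computes the standard closed-form probabilities for the number of gene copies after time t along a single edge, starting from one copy, with a constant duplication rate and loss rate. This is only the textbook linear birth-death process, not the GEM or its DP algorithms; the function and parameter names are assumptions made for this example.

```python
import math


def birth_death_probs(birth, death, t, max_n):
    """Return [P(0), P(1), ..., P(max_n)] copies after time t, starting from one copy."""
    if math.isclose(birth, death):
        # Critical case (equal rates): both quantities share the same limit.
        alpha = beta = birth * t / (1.0 + birth * t)
    else:
        e = math.exp(-(birth - death) * t)
        denom = birth - death * e
        alpha = death * (1.0 - e) / denom  # probability that no copy survives
        beta = birth * (1.0 - e) / denom   # ratio of the geometric tail
    probs = [alpha]
    for n in range(1, max_n + 1):
        probs.append((1.0 - alpha) * (1.0 - beta) * beta ** (n - 1))
    return probs


probs = birth_death_probs(birth=0.3, death=0.2, t=1.5, max_n=200)
mean_copies = sum(n * p for n, p in enumerate(probs))
print(probs[0], mean_copies)
```

Two quick sanity checks on such a sketch are that the probabilities sum to one and that the expected number of copies equals exp((birth - death) * t).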
Substitution rates
In a separate collaboration with Tom Britton, Stockholm University, I
have developed and evaluated models for substitution rate variation
over time (Linder et al., in press; substitutions are point mutations
in the gene sequence). These models
allow the estimation of the substitution rates and divergence times of
genes or organisms. Besides the direct interest for systematics,
divergence times constitute an important tool in comparative genomics
and enable correlation either with other major events in organism
evolution or with known geological events. An efficient DP algorithm
for the identification of the maximum a posteriori (MAP) time and rate
assignment for given model parameters under these substitution rate
models was developed by Åkerborg et al. (2007).
GSR
Using the results above, we constructed the first integrated model for
gene duplication, loss, and sequence evolution, the Gene Sequence and
Rates model (GSR; Arvestad et al., 2004; Åkerborg et al., 2009). In
contrast to GEM, which models genes evolving as atomic units, GSR
models genes as sequences that evolve by duplications, losses and
substitutions. To obtain an efficient algorithm, we use a
discretization of the species tree, with additional vertices
representing discrete time points in the tree. The gene tree vertices
(e.g., gene duplications), rather than being placed on species tree
edges and vertices as in GEM, are now placed on vertices of the
discretized species tree. By applying a substitution rate model and a
standard model for sequence evolution, we devised a DP algorithm for
summing probabilities of possible gene evolution scenarios that allows
computing the probability of a gene tree given the input gene sequence
data as well as the species tree. In earlier methods, constraints
implied by the species tree have been completely ignored when
reconstructing the gene tree; additionally, gene evolution had to be
inferred from a subsequent reconciliation between the gene tree and the
species tree and all information about support from the sequence data
was then lost. GSR, implemented in an MCMC framework for parameter
estimation, allowed us to study duplication scenarios and gene tree
evolution directly from sequence data without loss of information. We
demonstrated this using genome-wide yeast data analyzed on the KTH
Parallel Computing Center’s Ferlin cluster.
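The discretization step described above can be sketched as follows: each species tree edge receives a number of extra, evenly spaced time points on which gene tree vertices may be placed. The even spacing, the fixed number of points per edge, and the dictionary-based tree encoding are assumptions made for this illustration; the source does not specify how the discretization points are chosen.

```python
def discretize_edge(t_parent, t_child, n_points):
    """Return n_points evenly spaced times strictly between child and parent.

    Times are measured backwards from the leaves, so t_parent > t_child.
    """
    if n_points < 0:
        raise ValueError("n_points must be non-negative")
    if t_parent <= t_child:
        raise ValueError("the parent must be older than the child")
    step = (t_parent - t_child) / (n_points + 1)
    return [t_child + step * (i + 1) for i in range(n_points)]


def discretize_tree(node_time, parent_of, n_points):
    """Map each non-root node to the extra time points on the edge above it."""
    return {
        child: discretize_edge(node_time[parent], node_time[child], n_points)
        for child, parent in parent_of.items()
    }


# Hypothetical three-species tree with node times measured from the leaves.
node_time = {"root": 2.0, "AB": 1.0, "A": 0.0, "B": 0.0, "C": 0.0}
parent_of = {"AB": "root", "C": "root", "A": "AB", "B": "AB"}
print(discretize_tree(node_time, parent_of, n_points=3))
```

A finer discretization gives more candidate positions for duplications at the cost of a larger DP table, so the number of points per edge trades accuracy against running time.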
DTLSR
By including lateral gene transfer events in addition to gene
duplications and losses, we have extended GSR into the Duplication,
Transfer, Loss, Sequence with Rates model (DTLSR; Ali Tofigh, 2009,
Using trees to capture reticulate evolution – lateral gene transfers
and cancer progression, PhD thesis). This allows a gene to change
lineage in the species tree during its evolution. To compute
probabilities of scenarios, we now need to use numerical approximations
of the resulting differential equations, combined with the DP framework
from GSR. We are
currently working on a manuscript on the implementation and application
of the model to genome-wide analysis of several bacterial datasets and
a eukaryote case study data set.
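As a hedged illustration of what numerically approximating such differential equations can look like, the sketch below integrates the extinction-probability equation of the plain duplication-loss (birth-death) process with a fourth-order Runge-Kutta scheme and compares it with the known closed form. The DTLSR equations themselves contain additional transfer terms that are not given here, so this is a simplified stand-in; the function names and the choice of integrator are assumptions.

```python
import math


def extinction_closed_form(birth, death, t):
    """Closed-form extinction probability of one lineage (valid for birth != death)."""
    e = math.exp(-(birth - death) * t)
    return death * (1.0 - e) / (birth - death * e)


def extinction_rk4(birth, death, t, steps=1000):
    """Integrate dE/dt = death - (birth + death) * E + birth * E**2 with E(0) = 0."""
    def rhs(e):
        return death - (birth + death) * e + birth * e * e

    h = t / steps
    e = 0.0
    for _ in range(steps):
        # Classical fourth-order Runge-Kutta step for an autonomous ODE.
        k1 = rhs(e)
        k2 = rhs(e + 0.5 * h * k1)
        k3 = rhs(e + 0.5 * h * k2)
        k4 = rhs(e + h * k3)
        e += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return e


numeric = extinction_rk4(birth=0.3, death=0.2, t=1.5)
exact = extinction_closed_form(birth=0.3, death=0.2, t=1.5)
print(numeric, exact, abs(numeric - exact))
```

Checking the integrator against a case with a known solution, as done here, is a useful safeguard before applying it to equations with no closed form.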
Hybridization
Lastly, I have extended our GEM framework to species networks that
allow hybridization events between species lineages (Sennblad, 2007,
ESEB XI). Besides the extension of GEM (GSR can be extended in a
similar manner), the major part of this work consists of a
probabilistic model for the evolution of hybridizing species, and a new
DP algorithm for the computation of its reconstruction probability.
This model is currently being implemented.
- M. Nei, X. Gu, and T. Sitnikova. Evolution by the birth-and-death process in multigene families of the vertebrate immune system. Proc. Natl. Acad. Sci. U. S. A., 94(15):7799–7806, 1997.
- S. Ohno. Evolution by gene duplication. Springer, Berlin, 1970.