Wed Jan 26 | 15.15 | Isaac Elias | SBC | ||||
A 1.375-Approximation Algorithm for Sorting by Transpositions | |||||||
Sorting permutations by transpositions is an important problem in genome rearrangements. A transposition is a rearrangement operation in which a segment is cut out of the permutation and pasted in a different location. The complexity of this problem is still open, and improving on the best known 1.5-approximation algorithm has been an open problem for ten years. We provide a 1.375-approximation algorithm for sorting by transpositions. The algorithm is based on new results regarding the diameter of three subsets of the symmetric group: we determine the exact transposition diameter of 2-permutations and simple permutations, and find an upper bound for the diameter of 3-permutations.
Joint work with Tzvika Hartman, Dept. of Molecular Genetics, Weizmann Institute of Science | |||||||
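The transposition operation defined in the abstract can be illustrated with a minimal Python sketch. This is not the 1.375-approximation algorithm from the talk; it only shows the operation itself and the classical breakpoint-based lower bound. The function names and the index convention (i, j, k) are assumptions made for this illustration.

```python
def transpose(perm, i, j, k):
    """Cut the segment perm[i:j] and paste it just before position k.

    Equivalently, swap the adjacent blocks perm[i:j] and perm[j:k].
    Requires 0 <= i < j < k <= len(perm).
    """
    if not 0 <= i < j < k <= len(perm):
        raise ValueError("need 0 <= i < j < k <= len(perm)")
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]


def breakpoints(perm):
    """Count adjacent pairs (a, b) with b != a + 1 in 0, perm, n+1."""
    extended = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(extended, extended[1:]) if b != a + 1)


def lower_bound(perm):
    """A transposition removes at most three breakpoints,
    so ceil(breakpoints / 3) transpositions are always needed."""
    return -(-breakpoints(perm) // 3)


if __name__ == "__main__":
    p = [3, 1, 2]
    print(breakpoints(p), lower_bound(p))
    print(transpose(p, 0, 1, 3))
```

Here a single transposition moves the element 3 behind the block 1 2, which matches the lower bound for this small example.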
Wed Feb 16 | 15.15 | Åsa Björklund | SBC | ||||
Multi-domain proteins in the three kingdoms of life | |||||||
Comparative studies of the proteomes from different organisms have provided valuable information about protein domain distribution in the kingdoms of life. Earlier studies have been limited by the fact that only about 50% of the proteomes could be matched to a domain. We have extended these studies by including less well-defined domain definitions, Pfam-B and clustered domains, in addition to Pfam-A and SCOP. It was found that a significant fraction of these domain families are homologous to Pfam-A or SCOP domains. Further, we have shown that all regions that do not match a Pfam-A or SCOP domain contain a significantly higher fraction of disordered structure. These unstructured regions may be contained within orphan domains or function as linkers between structured domains. Using several different definitions, we have re-estimated the number of multi-domain proteins in different organisms and found that the methods all predict that about 65% of eukaryotic proteins are multi-domain proteins, compared with around 40% of prokaryotic proteins. In conclusion, all eukaryotes have similar fractions of multi-domain proteins and disorder, whereas a high fraction of repeats is found only in multicellular eukaryotes. This implies a role for repeats in cell-cell contacts while the other two features are important for intracellular functions. | |||||||
Wed Feb 23 | 15.15 | Mathias Uhlén, KTH | |||||
The Swedish Human Proteome Resource | |||||||
Status and recent progress in HPR | |||||||
Wed Mar 2 | 15.15 | Håkan Viklund, SBC | |||||
Wed Mar 9 | 15.15 | Erik Sonnhammer, CGB | |||||
New algorithms for sequence distance estimation and for HMM searching | |||||||
Wed Mar 16 | 15.15 | Diana Ekman, SBC | |||||
Evolution of multi-domain proteins | |||||||
Most eukaryotic proteins are multi-domain proteins, i.e. consist of more than one protein domain. We have studied the events that have built these multi-domain proteins. Proteins have been compared based on their domain architectures and a "domain distance" calculated for each pair of proteins. The insertions, repetitions and exchanges of domains that distinguish a protein from its nearest neighbors have been counted and it was found that insertions are somewhat more common than repetitions and that exchanges are not very frequent. Further, insertions and repetitions are approximately equally common at the N- and C-terminals while exchanges have been found more often at the C-terminal. In addition we show that domain distance correlates well with sequence similarity and semantic similarity, based on GO-annotations, and we use this measure to build evolutionary trees. Domain evolution is then exemplified with two modular protein families, non-receptor tyrosine kinases and the Rho GEFs, and a pattern of repetition. | |||||||
Wed Mar 23 | 15.15 | Erik Granseth, SBC | |||||
Halftime seminar: Characterization of the membrane-water interface region of membrane proteins and determination of the membrane proteome of E. coli | |||||||
The number of membrane proteins with known structure grows exponentially. Today, there are approximately 100 membrane proteins deposited in PDB, which was the number of soluble proteins available in 1974. In my half-time seminar, I will first present a study of the membrane-water interface region. Most statistical studies have focused on the membrane region, but this study moves a few Ångström away from the membrane and discusses the structural constraints imposed on the transmembrane helices. The second part of the presentation will be about global topology analysis of the E. coli membrane proteome. By experimentally determining whether the C-terminus of a membrane protein is located in the cytoplasm or periplasm, high quality topology models for 601 membrane proteins have been produced, which is important for future functional studies. | |||||||
Fri Mar 30 | 10.00 | Johannes Frey-Skött | |||||
Analysis of alternative splicing | |||||||
Halftime seminar | |||||||
Wed Apr 6 | 15.15 | Tomas Ohlson | SBC | ||||
Halftime seminar | |||||||
Wed Apr 20 | 15.15 | Marie Öhman | MolBio/SU | ||||
An approach to find novel sites of mRNA editing | |||||||
The ADAR enzymes (1 and 2) catalyze the conversion of adenosine (A) into inosine (I) in RNA by a hydrolytic deamination. This A to I editing acts on RNA that is double stranded, without a defined consensus recognition sequence. Site selective editing has mainly been found in the pre-mRNA of genes involved in neurotransmission in the mammalian brain. Apart from generation of multiple protein isoforms by codon changes, RNA editing plays important roles in regulating other RNA processing events like splicing. We have developed a method that can be used on various tissues as well as species to detect novel single sites of A to I editing. The method is based on an immunoprecipitation assay followed by microarray analysis. RNA substrates subjected to site selective editing are retrieved from RNA-protein complexes using ADAR2 antibodies. Combined with computational analysis, we expect to find novel sites of editing that have been overlooked by other experimental methods and computational analyses. Using this approach it is possible, in a unique way, to discover single sites of selective A to I editing. | |||||||
Monday Apr 25 | 15.00 | Karin Melen | |||||
Half time seminar: Increasing the accuracy of membrane protein topology prediction: Application on whole proteomes of E. coli and S. cerevisiae. | |||||||
In an ideal world the structures of all proteins in every organism would be solved and the functions of all proteins would be identified. If we knew the structures of membrane proteins, drugs would be more easily developed and mankind would hopefully be happier. We are not there yet, but along the way efforts are being made not only to solve protein structures but also to gain insights about structure and function by other means. Here we are trying to increase the knowledge about structural features of membrane proteins by improving TMHMM, a well-known method for membrane topology prediction, and applying the refined method to whole proteome studies. I will present TMHMMfix, which enables incorporation of experimental information into the predictions, something that has turned out to improve the accuracy significantly. I will also show how we have used TMHMMfix to map the topology of nearly all membrane proteins in E. coli and S. cerevisiae. | |||||||
Wed Apr 27 | 15.15 | Tomas Bergström | LCB, Uppsala | ||||
Genome Divergence between Humans and Chimpanzees: A story about substitutions, indels and alternative splicing | |||||||
The high quality sequence of the chimpanzee chromosome 22 has been compared to the orthologous human chromosome 21. The relative contribution of substitutions and insertions/deletions (indels) was analyzed for the 33 Mbp alignment. Particular focus was placed on indels in coding regions and their effect on alternatively spliced transcripts. | |||||||
Wed May 11 | 15.15 in FA31 | Sergei Maslov | Brookhaven National Laboratory | ||||
Detecting topological patterns in protein networks. | |||||||
Bio-molecular networks lack the top-down design.
Instead, selective forces of biological evolution
shape them from raw material provided by random events
such as gene duplications and single gene mutations.
As a result individual connections in these networks
are characterized by a large degree of randomness.
One may wonder which connectivity patterns are indeed random, and
which arose due to the network's growth, evolution,
and/or its fundamental design principles and limitations.
Here we introduce a general method [1,2] allowing one to construct a
random version of a given network while preserving the desired set of
its low-level topological features, such as, e.g., the number of
neighbors of individual nodes, the average level of modularity, numbers
of small network motifs, etc. Such a null-model network can then be used
to detect and quantify non-random topological patterns. In particular,
we measure correlations between numbers of neighbors of interacting
nodes in protein binding and regulatory networks in yeast [1]. It was
found that in both these networks, links between highly connected
proteins are systematically suppressed.
We proceed by presenting a set of empirical findings about how gene
duplications shape protein interaction and genetic regulatory networks
in several organisms [3]. It is shown that molecular networks in yeast
combine the plasticity of regulatory connections with a relative
stability of protein functions manifested in the set of their binding
partners. We believe this to be a general feature affecting the
evolvability of bio-molecular networks.
| |||||||
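The idea of a null-model network that keeps the number of neighbors of every node can be sketched in Python as follows. This is a minimal illustration using repeated edge swaps on an undirected simple graph, a commonly used way to randomize a network while preserving degrees; it is not claimed to be the exact procedure of references [1,2], and it does not preserve the other features mentioned in the abstract (modularity, motif counts). The function names and parameters are assumptions made for this example.

```python
import random
from collections import Counter


def degree_counts(edges):
    """Return a Counter mapping each node to its number of neighbors."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts


def rewire_preserving_degrees(edges, n_swaps, seed=0):
    """Randomize an undirected simple graph by edge swaps.

    Each accepted swap replaces edges (a, b) and (c, d) with
    (a, d) and (c, b), so every node keeps its degree. Swaps that
    would create a self-loop or a duplicate edge are skipped.
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edge_list}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if len({a, b, c, d}) < 4:
            continue  # shared node: swap would create a self-loop or no change
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # swap would create a duplicate edge
        present.discard(frozenset((a, b)))
        present.discard(frozenset((c, d)))
        present.add(frozenset((a, d)))
        present.add(frozenset((c, b)))
        edge_list[i] = (a, d)
        edge_list[j] = (c, b)
    return edge_list


if __name__ == "__main__":
    original = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1), (1, 4)]
    randomized = rewire_preserving_degrees(original, n_swaps=100, seed=1)
    print(degree_counts(randomized) == degree_counts(original))
```

A statistic measured on the real network, such as how often highly connected nodes are linked to each other, can then be compared with its distribution over many such randomized versions.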
Wed May 11 | 16.00 at DBB | Bjorn Wallner | SBC | ||||
Pre-dissertation seminar | |||||||
Please notice time, date, and place! | Spring | Siv Andersson, Uppsala | |||||
Computational inference of scenarios for alpha-proteobacterial genome evolution | |||||||
This seminar is postponed! | |||||||
Wed May 18 | 15.15 | Lars Arvestad | SBC | ||||
Querying Pubmed and managing citations from the commandline. | |||||||
I will describe and demo my system for interacting with PubMed from the commandline. This started off as a simple way of extracting BibTeX citations from PubMed, but is now a simple yet powerful tool for submitting searches, archiving selections of articles, and accessing information. | |||||||
Wed May 25 | 15.15 | Thomas Bürglin | Dept. Bioscience and CGB, KI | ||||
Solving C. elegans developmental biology: experimental approaches and development of computational tools | |||||||
Sat-Sun May 28-29 | SBC Workshop | ||||||
Comparative Genomics and Protein Structure | |||||||
Tue May 31 | 15.15 | Henrik Kaessmann, Center for Integrative Genomics, Lausanne | |||||
Wed Aug 31 | 13.00 | Kenta Nakai | |||||
Searching for sequence determinants of the translation efficiency of proteins in a cell-free system | |||||||
Wed Sept 28 | 15.15 | Olof Emanuelsson | SBC | ||||
Genomic tiling microarrays and the ENCODE project | |||||||
Genomic tiling microarrays have recently become a popular platform for interrogating the activity of large genomic regions in an unbiased fashion. I will introduce some key concepts of tiling microarrays, along with some recent applications within the human genome world. The focus will mainly be on how to use different tiling microarray strategies for transcription mapping. I will also give an introduction to the world-wide ENCODE consortium. | |||||||
Wed Oct 5 | 15.15 | Lukasz Huminiecki | KI | ||||
Evolution of Expression Pattern Diversity: a combined computational and experimental approach | |||||||
To examine the process by which duplicated genes diverge in
expression, we studied how transcriptional profiles of orthologous
gene sets in human and mouse were affected by the presence of
additional recent species-specific paralogs. Gene expression profiles
were compared across 16 homologous tissues in human and mouse using
microarray data from the Gene Expression Atlas 1 (integrated with
LocusLink and Ensembl) for 1575 sets of orthologs, including 250 with
species-specific paralogs. Orthologs that have undergone recent
duplication were less likely to have strongly correlated expression
profiles than those that remained in a one-to-one relationship. Our
results suggest that gene expression profiles are surprisingly labile,
especially in lineages where a duplication event has occurred, and
that transcription in a particular tissue may be repeatedly gained or
lost during the evolution of even small gene families [Huminiecki and
Wolfe, 2004].
Other researchers have also noted that orthologs may be poorly correlated in their expression profiles [Jordan IK et al., 2005; Khaitovich et al., 2004]. However, it is still difficult to resolve how much of the variability reflects real biology, and how much could be attributed to differences in sample ontology, the extraction procedure, RNA isolation, microarray setup, cross-hybridization, or bioinformatics. For example, we have previously shown that internal consistency of many publicly available expression datasets is rather low, and that expression profiles for the same gene derived from different experimental platforms (such as SAGE, ESTs or microarrays) do not in general correlate well [Huminiecki et al. 2003; Huminiecki and Bicknell, 2000].
In collaboration with, and with generous support from, Pfizer UK (Sandwich, Kent), we are currently generating a qPCR/TaqMan-based dataset of expression profiles for a number of G-protein coupled receptors (GPCRs) of pharmaceutical interest. We are focusing on peripherally expressed type-A GPCRs in human, mouse, rat, guinea pig and dog. Quantitative PCR is more specific, sensitive and precise than microarray platforms. Thus, we hope that this novel approach will help to establish the true extent of conservation of gene expression patterns in placental mammals. In addition, we will strive to gain a deeper understanding of the relevance and suitability of animal species used for functional efficacy and toxicological studies at Pfizer.
| |||||||
Mon Oct 10 | 15.30 | Tomas Ohlson | |||||
Improving protein sequence alignments using evolutionary information and machine learning techniques | |||||||
Pre-dissertation seminar
Location: Magnelisalen, Stockholm University
The quality of the alignment might be the most important factor in protein structure prediction using homology modeling. For closely related sequences one can use PSI-BLAST to produce satisfactory alignments, but when the sequences are more distantly related PSI-BLAST cannot produce good alignments. Instead, profile-profile alignment methods can be used. I will present a benchmarking study on profile-profile alignment methods and also how the alignments can be further improved using machine learning techniques. | |||||||
Wed Oct 12 | 15.15 | Andrey Alexeyenko | In room FA32, Albanova | ||||
FunCoup: A multi-facetted predictor of functional links between genes of eukaryotic organisms | |||||||
The method called FunCoup aims to discover novel functional links
between proteins (genes). FunCoup uses Bayesian networks optimized
with multivariate techniques to integrate genomics and proteomics
information of various types and from different sources. The crucial
novelty is involving data on orthologous genes in well-studied model
organisms. The procedure of finding and treating orthologs employs
the InParanoid method and is thus optimized for eukaryotic genomes,
which often gives strong additional evidence for functional
links. FunCoup uses a range of data sources, from loose associations
like co-expression, to physically interacting proteins and phyletic
profiles. While finding links, FunCoup incorporates available data for
such model organisms as mouse, rat, D. melanogaster, C. elegans, and
yeast. Phyletic profiles are also built with a number of less studied
genomes. A key feature of the Bayesian approach is that each data
source and organism is weighted by its reliability and relevance.
The quality of the predictions has been cross-validated and assessed on test sets. During the testing (assisted with ANOVA techniques) we found that no previously known particular innovation in the field can be trusted without multiple sampling from various genomes and functional classes. The output of FunCoup is the likelihood that two proteins are functionally coupled (compared to the expected background probability for random protein pairs). Found links can be used for focussing work on individual proteins of interest or creating sophisticated gene networks. | |||||||
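The notion of a likelihood of functional coupling relative to a background probability can be illustrated with a small Python sketch. It assumes a naive Bayes combination in which each data source contributes an independent log-likelihood ratio. This is a simplification chosen for illustration and is not FunCoup's actual model; the source names and probabilities below are hypothetical.

```python
import math


def log_likelihood_ratio(evidence, p_coupled, p_background):
    """Sum per-source log-likelihood ratios under an independence assumption.

    evidence:     dict mapping source name -> True/False (evidence observed?)
    p_coupled:    dict mapping source name -> P(observed | functionally coupled)
    p_background: dict mapping source name -> P(observed | random protein pair)
    A positive result favors functional coupling over the background.
    """
    total = 0.0
    for source, observed in evidence.items():
        p1 = p_coupled[source] if observed else 1.0 - p_coupled[source]
        p0 = p_background[source] if observed else 1.0 - p_background[source]
        total += math.log(p1 / p0)
    return total


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    evidence = {"coexpression": True, "physical_interaction": True}
    p_coupled = {"coexpression": 0.6, "physical_interaction": 0.3}
    p_background = {"coexpression": 0.2, "physical_interaction": 0.03}
    print(log_likelihood_ratio(evidence, p_coupled, p_background))
```

In this form, a source whose evidence is much more frequent among coupled pairs than among random pairs contributes a larger term, which is one simple way to weight sources by reliability.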
Wed Oct 26 | 15.15 | Alexander Schliep | Max Planck Inst. for Mol. Genetics, Berlin | ||||
In FA32, Albanova | Analyzing ArrayCGH Data using HMMs with Non-homogeneous Markov Chains | ||||||
Comparative genomic hybridization studies using DNA microarrays (ArrayCGH) can elucidate copy number changes due to diseases or genetic reasons. If BAC clones are used as probes, the positional proximity of probes on the chromosome begins to show a pronounced effect. A natural model is to consider the differential hybridization of probes along the chromosome as a sequence of observations in which the correlation between subsequent positions depends on their distance and overlap. Prior work on analyzing gene expression in the presence of chromosomal aberrations focused on more classical statistical approaches, neglecting proximity effects. We present the first approach to model proximity effects explicitly using Hidden Markov Models with an underlying time-inhomogeneous Markov chain. Thus we develop a more realistic model of the events influencing the change of gene expression levels over regions. We will introduce the basic approach, the necessary extensions to the HMM framework and an argument against the use of segmentation approaches. | |||||||
Wed Nov 2 | 15.15 | Abhiman Saraswathi | |||||
Analysis and prediction of functional shifts in protein families | |||||||
Gene duplications are an important phenomenon, whereby genes in an organism could acquire subfunctionalisation or neofunctionalisation. Presently, groups of proteins are clustered into families based on sequence similarities and have one or more general biochemical functions in common. It is also known that different subgroups within these families have evolved slightly different functions, such as different substrate specificities, activities and mechanisms. It is important to detect such functional differences between members of a protein family for a more accurate annotation of function. Novel measures developed by us for the prediction of functional shifts between protein subfamilies will be presented. These new measures were able to discriminate between subfamily pairs with the same enzyme function and subfamily pairs with different enzyme functions. We show that the discrimination is preserved irrespective of the methods used and also improves for larger subfamilies. Moreover, we combine the proposed measures to increase the overall prediction power. FunShift, a database of function shift analysis on protein subfamilies, will also be presented. The database can be accessed at http://funshift.cgb.ki.se | |||||||
Wed Nov 9 | 13.00 | Samuel Andersson | |||||
The Motif Yggdrasil sampler: A tree-based Gibbs sampler for detection of transcription factor binding sites. | |||||||
Please note the time change to accommodate Samuel's
teaching.
In phylogenetic footprinting, putative regulatory elements are found in upstream regions of orthologous genes by searching for common motifs. Motifs in different upstream sequences are subject to mutations along the edges of the corresponding phylogenetic tree, and taking advantage of the tree in the motif search is an appealing idea. We describe the Motif Yggdrasil sampler, the first Gibbs sampler based on a general tree that uses unaligned sequences. Previous tree-based Gibbs samplers have unrealistically assumed a star-shaped tree or partially aligned upstream regions. We give a probabilistic model describing upstream sequences with regulatory elements and build a Gibbs sampler with respect to this model. We apply the collapsing technique to eliminate the need to sample nuisance parameters, and give a derivation of the predictive update formula. The use of the tree achieves a substantial increase in nucleotide level correlation coefficient both for synthetic and biological data. | |||||||
Wed Nov 16 | 15.15 | TBA | |||||
Wed Nov 23 | 15.00 | Håkan Viklund | |||||
FA31 | Transmembrane proteins, from sequence to structure | ||||||
In the popular formulation of the protein folding problem, the goal is to find an algorithm that can predict the three dimensional structure of a protein given its amino acid sequence. And for the last 30 years, this has remained one of the most basic unsolved problems in bioinformatics. This talk will deal with some aspects regarding this problem in the context of transmembrane proteins. Specifically, the fields of topology prediction and topology modeling and their possible role in contributing to solving the complete folding problem will be discussed. | |||||||
Wed Nov 30 | 15.15 | Claes Malmnäs | |||||
FA32 | Models of TF-DNA interactions in S. cerevisiae | ||||||
In order to be able to carry out their function properly,
transcription factors (TFs) must have a high equilibrium binding
probability to their targets. Moreover, the TFs need to find these
targets in a reasonable time. The search involves a combination of 1D
and 3D diffusion. During 1D diffusion, the TF is bound to the DNA -
either specifically or non-specifically - but is able to slide along
it. The need for short search time and high binding probability of
targets imposes constraints on the TF-DNA interactions. We test two
competing models of TF-DNA interactions by using a combination of
experimental data on the number of TFs of different kinds present in
an S. cerevisiae cell and theoretical estimates of binding energies of
TF-DNA interactions.
The work presented has been carried out in collaboration with Erik Aurell and Aymeric Fouquier d'Herouel, KTH and Massimo Vergassola, Pasteur Institute. | |||||||
Mon-Tue | Dec 5-6 | Special Event | |||||
17th Bi-annual Stockholm-Copenhagen Bioinformatics Meeting | |||||||
Place: Geovetenskapliga byggnaden, Frescati. | |||||||
Wed Dec 7 | 15.15 | Mikael Oliveberg | |||||
FA31 | Protein Folding, Misfolding and Neurodegenerative Disease: How proteins maintain their right shapes and what happens when they don't. (Tentative title) | ||||||
Proteins control our lives down to the smallest detail. Even so, the question of how a protein is formed is one of life's great mysteries. In a split second, the floppy protein chain forms itself into a ball with a unique shape and function. Occasionally, however, proteins get trapped in a wrong shape and run amok with devastating consequences for the cells. The understanding of these protein folding and misfolding processes is critical for finding rational treatment of many debilitating conditions like Alzheimer's disease, ALS and the prion diseases. Basically, the underlying principle is simple: the tension between fat and water. Just as fat is attracted to fat, and water to water, the proteins are controlled in the cells and join together correctly to form the right shape all by themselves. If one part loosens, it is automatically pulled back. This remarkable ability to self-assemble is at least partly orchestrated by amino acids that sit like guards making sure that no wrong knots are made. If you remove them, the proteins distort, stick together by exposure of their greasy interior and kill the cells. Suddenly, the uniting force has been turned against us. But the most fascinating thing about proteins is not that they can go wrong, but that they function at all. What stops chaos from taking over? | |||||||
Wed Dec 14 | 16.00 | Sara Light | |||||
Magnelisalen SU | Pre-dissertation seminar | ||||||