SBC seminars 2010

Wed Feb 3 16:00 Thomas Helleday, Stockholm University
Seminar room RB35 (Roslagstullsbacken 35, the SBC house)
RNAi screens to determine homologous recombination networks and identify anti-cancer drugs

Previously, we showed that homologous recombination (HR) defective, BRCA2-mutated breast cancers are highly sensitive to PARP inhibitors, which inhibit DNA single-strand break (SSB) repair. This shows that targeting DNA repair using synthetic lethality is a viable strategy to selectively kill cancers. In preliminary results, we found that prostate cancer cells are defective in SSB repair and rely on HR for survival. Here, we identify proteins involved in HR using RNAi screens and identify inhibitors that may eventually proceed into clinical trials for prostate cancer.

Wed Feb 10 16:00 Karin Julenius, SBC
Seminar room RB35 (Roslagstullsbacken 35, the SBC house)
Intrinsic protein disorder and post-translational modifications

In recent years, it has been shown that intrinsic disorder is required for the proper function and binding of proteins involved in many key biological processes. A link to post-translational modifications has also been suggested. We have gathered datasets of known positive and negative sites for six different types of post-translational modifications: N-glycosylation, mucin-type O-glycosylation, proteoglycans, C-mannosylation, phosphorylation and N-terminal acetylation. Using state-of-the-art predictors of protein disorder, secondary structure and surface accessibility, we have compared the predicted structures of positive and negative sites for each type of modification. We have also compared sequence conservation of the modified residue as given by the PSI-BLAST profile.

For several of the datasets, there is a positive correlation between modification and i) intrinsic protein disorder, ii) high surface accessibility, iii) loop/coil secondary structure, and iv) sequence conservation. For N-glycosylation, however, no clear correlation can be found with disorder, surface accessibility or sequence conservation, and loop/coil secondary structure has a small but significant negative correlation with N-glycosylation sites. We argue that these differences arise from the stage at which the protein is recognized by the modifying enzyme, since N-glycosylation takes place co-translationally, before the protein is fully folded.

Mon Feb 15 10:00 Andrew Fraser, Donnelly Centre, University of Toronto
Seminar room RB35 (Roslagstullsbacken 35, the SBC house)
Mapping and predicting genetic interactions in C. elegans development

Most heritable traits, including susceptibility to disease, are affected by the interactions between multiple genes; however, we still understand little about how genes interact to generate phenotypes, and it is certain that we have identified only a small fraction of the phenotypically relevant genetic interactions in any organism. C. elegans is a simple, tractable animal model; if we can understand how genes interact in this animal, this will have key lessons for far more complex animals like humans. I will report two approaches that we are taking to understand genetic interactions in the worm; the first is the systematic experimental mapping of genetic interactions in a relatively unbiased manner; the second is the computational prediction of genetic interactions via integration of diverse large datasets.

In the first approach, we have used high throughput RNAi screening to identify genetic interactions between genes that regulate development of the metazoan C. elegans. We identify ~350 genetic interactions for genes that function in the EGF, Notch, Wnt and other signalling pathways. This is the first global genetic interaction map constructed for any animal and will act as a platform for future mechanistic studies. The genetic interaction network contains several highly connected hub genes; loss of these genes enhances the phenotypic consequences of mutations in components of the majority of examined signalling pathways. The hub genes all encode components of chromatin-modifying complexes, and we find that their activity as genetic buffers appears conserved in other animals. We propose that these chromatin-modifying complexes may function as general buffers of genetic variation and that alterations in their activity may play a significant role in human genetic disease.

In the second, computational approach, we use a modified Bayesian approach to integrate a range of large-scale datasets into a single genetic interaction network. Each dataset identifies functional linkages between individual genes; they include physical interaction maps, co-expression data, and informatic linkages such as gene fusion events. While each dataset is incomplete and noisy, this statistical-based integration yields a network that predicts over 100,000 linkages between genes and covers over 60% of the worm proteome. We show that we can use this network to make accurate predictions of gene interactions including those that are highly tissue-specific. Integrating such noisy, complex datasets to generate accurate predictions of the effects of gene perturbation, as we have here, holds great promise for human biology where predicting the effects of drugs or inherited mutations will be a key problem in the future.
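
As a rough illustration of what Bayesian integration of several noisy datasets can look like, the sketch below combines per-dataset likelihood ratios for each gene pair into a single posterior probability. It is a plain naive-Bayes scheme that assumes the datasets are independent, not the modified approach used in the talk; the function name, the prior odds and the gene names are all hypothetical.

```python
import math
from collections import defaultdict


def integrate_evidence(datasets, prior_odds=0.01):
    """Combine likelihood ratios from several datasets into posterior probabilities.

    datasets: list of dicts, each mapping a gene pair (a frozenset of two gene
    names) to a likelihood ratio P(evidence | linked) / P(evidence | not linked).
    prior_odds: assumed prior odds that a random gene pair is linked (made up).
    Assumes the datasets are conditionally independent (naive Bayes).
    """
    # Every pair starts at the prior log-odds the first time it is seen.
    log_odds = defaultdict(lambda: math.log(prior_odds))
    for dataset in datasets:
        for pair, likelihood_ratio in dataset.items():
            log_odds[pair] += math.log(likelihood_ratio)
    # Convert the summed log-odds back to probabilities.
    return {pair: 1.0 / (1.0 + math.exp(-value)) for pair, value in log_odds.items()}


# Hypothetical toy input: two evidence types supporting the same gene pair.
pair = frozenset({"gene_a", "gene_b"})
coexpression = {pair: 2.0}
physical_interaction = {pair: 3.0}
posterior = integrate_evidence([coexpression, physical_interaction], prior_odds=1.0)
```

Under these assumptions, a pair supported weakly by several datasets can end up with a higher score than a pair supported by only one, which is the point of integrating incomplete data sources.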

Wed Feb 17 16:00 Rossen Apostolov, SBC
Seminar room RB35 (Roslagstullsbacken 35, the SBC house)
Accelerated Molecular Dynamics on GPUs

Molecular Dynamics (MD) simulations of large-scale biomolecular systems demand immense computing resources. As an alternative to the typical CPU clusters, the latest generations of Graphics Processing Units (GPUs) have emerged as a viable platform for accelerating computationally intensive tasks. Making optimal use of GPUs for molecular simulations, however, requires reworking the classical simulation algorithms to fit the stream-computing paradigm of the GPU architecture.

In this talk I will introduce the streaming architecture of the GPUs; show examples of how existing MD algorithms have been adapted to optimally utilize it; and present the OpenMM library where the algorithms have been implemented. OpenMM is integrated in the next major Gromacs release and provides the core of the popular Folding@Home project.

Wed Mar 10 16:00 Marcin Skwark, SBC
Seminar room RB35 (Roslagstullsbacken 35, the SBC house)
Consensus-based protein structure prediction by means of distance restraints

Predicting 3D protein structures is an important task of contemporary computational biology, due to the high cost and low throughput of experimental methods. It has been empirically shown that prediction approaches that draw on diverse methods and select the most suitable models achieve greater accuracy than any other approach. One such method was developed locally at SBC/CBR.

The problem with consensus methods is that they can only perform as well as the best of their constituent methods. Therefore, in their intrinsic form, they are not able to devise a model better than any of the ones obtained from the input predictors. Given the vast amount of information contained within the input models, it should be possible to create a method that does not merely select the best model, but instead identifies the prevalent structural features and produces a superior model based on them.

I will talk about one attempt to harness the structural information contained within the individual protein structure models, and its preliminary results. The approach is based on inter-atom distance statistics, and for some protein sequences it is able to produce models of higher quality than any other method. This, combined with the method's speed (the constituent methods' run time plus less than 15 minutes), makes it a viable method for real-life structure predictions.

Wed Mar 17 16:00 Gunnar von Heijne, CBR
Seminar room RB35 (Roslagstullsbacken 35, the SBC house)
Insertion of membrane proteins into the ER - an update

Despite their impressive structural and functional diversity, most membrane proteins share a common mechanism of membrane insertion. The Sec61 translocon in the ER (and its bacterial homologue, the SecYEG translocon) not only channels secreted proteins across the membrane, but also mediates the co-translational insertion of transmembrane proteins into the lipid bilayer. Even though an X-ray structure of an archaeal Sec61 complex is known, the actual process whereby incipient transmembrane alpha-helices in a nascent polypeptide are recognized by the translocon and shunted into the lipid bilayer remains a mystery.

We have attacked this problem using a 'substrate engineering' approach, in which we challenge the Sec61 translocon with systematically designed potential transmembrane helices built on a very simple Ala-Leu framework. Analysis of a large panel of such designed segments has made it possible to derive a matrix that describes the position-specific contribution of each of the 20 amino acids to the overall free energy of insertion of a transmembrane helix. This experimentally derived matrix is surprisingly similar to the matrix of statistical free energy contributions that one can calculate from the position-specific frequencies of the different amino acids in the known high-resolution membrane protein structures. To a first approximation, the 'molecular code' for transmembrane-helix recognition by the Sec61 translocon appears to be surprisingly simple, suggesting that the translocon allows the nascent polypeptide chain to sample the surrounding bilayer as it passes through the translocation channel.
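
To make the idea of a position-specific contribution matrix concrete, the sketch below sums per-residue contributions over a candidate segment to obtain an overall insertion free energy. It assumes a purely additive model indexed by distance from the segment centre; the function name and all numerical values are made up for illustration and are not the experimentally derived matrix.

```python
def insertion_free_energy(segment, contributions):
    """Sum position-specific contributions over a candidate transmembrane segment.

    segment: amino acid sequence in one-letter code.
    contributions: dict mapping each amino acid to a list of free energy
    contributions, indexed by the whole-number distance from the segment centre.
    Assumes contributions are additive and symmetric around the centre.
    """
    centre = (len(segment) - 1) / 2
    total = 0.0
    for index, residue in enumerate(segment):
        distance = int(abs(index - centre))  # truncate half-steps for even lengths
        total += contributions[residue][distance]
    return total


# Hypothetical toy values (arbitrary units): Leu favourable, Arg costly at the centre.
toy_matrix = {
    "L": [-0.6, -0.5, -0.4],
    "A": [0.1, 0.1, 0.1],
    "R": [2.5, 2.0, 1.0],
}
hydrophobic = insertion_free_energy("LAL", toy_matrix)
charged = insertion_free_energy("LRL", toy_matrix)
```

In this toy setting a lower total means more favourable insertion, so replacing the central alanine with arginine raises the predicted cost.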

Wed Apr 7 16:00 Erik Lindahl, SBC
Seminar room RB35 (Roslagstullsbacken 35, the SBC house)
Understanding membrane protein insertion and ion channel function through molecular simulations

While most transmembrane segments in proteins are clearly hydrophobic, there are, surprisingly, a number of exceptions where marginally stable or even hydrophilic segments appear in the hydrophobic region. Many of these are critically important, for instance the S4 segments of voltage-gated ion channels: it is the charged residues inside these proteins that cause the channel to open and close in response to voltages, which we need for every nerve impulse and heartbeat. There has been significant debate between experimental results that claim insertion of these segments is quite cheap, and theoretical calculations claiming it is prohibitively expensive.

We use a fairly wide combination of methods to study these systems, ranging from bioinformatics through modeling and molecular simulations all the way to in vitro experiments. I will discuss these methods and talk about recent work where we have shown that the hydrophobicity values derived from experimental insertion are remarkably effective at predicting insertion, how this can be used to understand (and predict) helix-helix interactions in membranes, and how we can now likely explain the molecular step of the insertion. I will also discuss how this relates to some of our very recent results on structural changes in ion channel gating, where the charged S4 residues play a crucial role.

Wed Apr 21 16:00 Rolf Ohlsson, MTC, KI
Seminar room RB35 (Roslagstullsbacken 35, the SBC house)
The next level of gene regulation

Metastable, reversible gene expression patterns contribute to developmental plasticity, homeostasis and adaptation. Regulatory networks have been successfully captured to describe important principles underlying these features (1, 2). However, while the focus has been on the implementors, i.e. transcription networks, protein interactomes etc., the regulatory machinery at the chromatin/chromosome level is poorly understood. Work during the last few years has indicated that direct physical interactions between chromosomes act as novel regulators of the expressivity of the genome (3). These interactions are not only governed by epigenetic states (4), but also modulate the epigenome by transvecting and stabilizing epigenetic states, such as DNA methylation (5) and replication timing (6, 7). Here I propose that such networks organize dynamic physical structures consisting of nodes, connectors and outliers to both coordinate and diversify the transcriptome.

  1. A. L. Barabasi, R. Albert, Science 286, 509 (1999).
  2. S. Maslov, K. Sneppen, Science 296, 910 (2002).
  3. A. Gondor, R. Ohlsson, Nature 461, 212 (2009).
  4. Z. Zhao et al., Nat Genet 38, 1341 (2006).
  5. S. Kurukuti et al., Proc Natl Acad Sci U S A 103, 10684 (2006).
  6. A. Gondor, R. Ohlsson, Nat Rev Genet 10, 269 (2009).
  7. K. S. Sandhu et al., Genes Dev 23, 2598 (2009).

Wed Jun 2 16:00 Kristoffer Forslund, SBC
Seminar room RB35 (Roslagstullsbacken 35, the SBC house)
Domain architecture conservation in orthologs

As orthologous proteins are expected to retain function more often than other homologs, they are often used for functional transfer between species. However, ortholog identification methods do not take into account changes in domain architecture, which are likely to modulate a protein's function.

To assess the level of domain architectural changes among orthologs we carried out a large-scale study of such events between human and 40 other species spanning the entire evolutionary range. Using orthology status predicted by InParanoid, we applied a measure of domain architecture similarity designed for this purpose to both orthologs and non-orthologs and contrasted the resulting scores with the fraction of identical aligned residues between the sequences. We also statistically characterized the type and extent of domain swapping events for orthologous as well as for non-orthologous proteins.

The analysis shows that orthologs as well as the closest non-orthologous homologs generally have very similar domain architectures, even at large evolutionary separation, whereas sequence identity decreases much more rapidly. We demonstrate that orthologs exhibit greater conservation of domain architecture than close non-orthologous homologs, even at equivalent separation with regard to sequence identity. We interpret this as an indication of a selective pressure on orthologs, but not paralogs, to specifically retain the domain architecture required for the proteins to perform their conserved function.

Wed Jun 9 12:15 Gabriel Östlund, SBC
Seminar room RB35 (Roslagstullsbacken 35, the SBC house)
Predicting the interactome from proteomics and genomics data

Two of the major tasks of molecular biology are finding the biochemical function of all proteins and finding which proteins interact with each other. Due to the large number of protein pairs that could potentially interact, experimentally determining all interactions is hardly feasible. There are several ongoing efforts to reconstruct the human interactome through computational integration of proteomics and genomics data. The reconstructed networks contain predictions of how likely protein pairs are to interact; however, due to the sparseness of input data, the networks are far from complete.

Our work has aimed at better understanding protein and gene interactions, both by generating data for reconstructing interaction networks and by analyzing the reconstructed networks to find novel and biologically relevant pathway connections. One way to deal with sparseness of data is to use orthology transfer. We have continued work with InParanoid, deemed one of the top-ranked orthology predictors, by adding species, improving accuracy and streamlining the pipeline in order to better handle the rapidly increasing number of sequenced species. The prediction accuracy of the protein interactions is contingent on proper handling of input data. We are currently working on how mRNA expression corresponds to relative protein amounts; this could be of utmost importance, e.g. when using mRNA co-expression as an indicator of protein functional coupling.

Analyzing protein interaction networks in the context of disease can help elucidate disease mechanisms as well as potential markers or drug targets. We have constructed a new generic network-based approach, MaxLink, for predicting novel candidate members of known biomolecular processes and pathways. A typical application is the identification of new disease genes based on a set of known disease genes. Finally, looking at how patterns of expression, coexpression and higher-order correlation of expression change between normal and disease states, guided by a protein interaction network, could potentially give insights into disease mechanisms.
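
One simple way to picture network-based candidate prediction is to rank every gene outside a known set by how many direct links it has to members of that set. The sketch below does only that; it is an illustrative assumption about how such a ranking could work, not a description of MaxLink's actual scoring, and the function name and gene names are hypothetical.

```python
def rank_candidates(network, known_genes):
    """Rank candidate genes by their number of direct links to a known gene set.

    network: dict mapping each gene to the set of genes it is linked to.
    known_genes: iterable of genes already assigned to the process or disease.
    Returns (gene, link_count) tuples, most connected first, ties broken by name.
    """
    known = set(known_genes)
    scores = {}
    for gene, neighbours in network.items():
        if gene in known:
            continue  # known members are not candidates
        links = len(known & set(neighbours))
        if links > 0:
            scores[gene] = links
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy network: g1 touches both known genes, g2 one, g3 none.
toy_network = {
    "g1": {"d1", "d2"},
    "g2": {"d1"},
    "g3": {"x"},
    "d1": {"g1", "g2"},
}
ranking = rank_candidates(toy_network, ["d1", "d2"])
```

A real method would also need to account for highly connected genes that link to almost everything, which this sketch ignores.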

Wed Jun 16 16:00 Wiktor Jurkowski, SBC
Seminar room RB35 (Roslagstullsbacken 35, the SBC house)
Development of ion channels activity ranking techniques on example of voltage gated potassium channel

Ion channels and transporters are crucial for the generation, propagation and inter-synaptic conduction of nerve impulses and are therefore important drug targets related to many malfunctions of the central and peripheral nervous system. Because of their specific structural properties, the standard structure-based drug design approach trained on soluble globular enzymes cannot be directly applied. One exemplary problem that has to be addressed is the reliability of scoring functions in correctly ranking active and inactive binders, which is necessary to, e.g., elucidate new lead structures.

Multiple virtual screening techniques have been tested for their ability to correctly rank ligand libraries of known activities. We found that a combination of multiple approaches can yield better enrichments and hence be more efficient in ligand classification.

It is clear that there is no single mechanism of voltage-gated potassium channel blocking. The receptor can be targeted in the central pore or in the voltage-sensing domain. I will present various ligand-receptor interaction hypotheses and their possible use in the development of target-specific hits.

Fri Sep 10 14:00 Dave Messina, SBC
Seminar room RB35 (Roslagstullsbacken 35, the SBC house)
Biological data exchange and the discovery of new protein families in metagenomic samples

I will be talking about a couple of ways we've approached the issue of effectively gathering and organizing biological information. We developed software for aggregating and viewing protein sequence annotations via DAS, the Distributed Annotation System. Also, we created two new standards, one for exchanging orthology predictions and one for protein sequences, which has been adopted by the Reference Proteomes Project. The rest of my talk will be about our efforts to predict novel protein families in metagenomics data, in particular addressing the problem of detecting evolutionary signatures in short reads.

Thu Dec 2 14:00 Christos Ouzounis, Centre for Research & Technology Hellas
Seminar room RB35 (Roslagstullsbacken 35, the SBC house)
Case studies in translational bioinformatics

Bioinformatics has developed rapidly and expanded into all areas of biological research. This expansion has created a global arena, transforming a cottage research industry into a multibillion dollar business. We are now faced with the fact that there are thousands of researchers practicing computational biology, driven and supported by the wider genomics industries. The majority of this community is active in the developed world, in larger countries with significant biotechnology sectors. These sectors and the associated technology providers drive the field into what has been called ‘translational bioinformatics’, the application of bioinformatics resources to the solution of ‘real-world’ problems in health and the environment.

In recent work, we have aimed at the genome-wide application of existing methods to relevant problems of biomedical and environmental research, with two goals in mind: first, to support collaborative projects that address issues of human genetics and genomics, and second, to develop novel approaches that tackle some of these problems using computational tools. We have been involved in the following projects with experimental collaborators, on which I will report: the comparative analysis of the TGF-beta pathway and its emergence in the animal kingdom [1] (evolution and development), the expression analysis of trans-gene forms of iron regulatory proteins IRP1/2 and mutants [2] (molecular genetics with transgenic mice), the simulation of human skin inflammation and immune cell interactions [3] (systems dermatology), the tissue-specific gene network inference of stress response in a sessile animal model species [4] (molecular ecology and health), the genome-wide analysis of gene expression in physiological cardiac hypertrophy [5] (transcriptomics of cardiovascular disease), the ChIP-seq analysis of activation-induced cytidine deaminase (AID)-targeted genes [6] (immunological epigenomics), the discovery of alternative splicing isoforms in AD samples [7] (molecular neuroscience with next-gen sequencing), the expression analysis of hepatocellular carcinoma [8] (cancer transcriptomics), copy number variation and loss of heterozygosity with SNP arrays [9] (oral dysplasia mutations), the analysis of glucosinolate pathways in plants [10] (agricultural biotechnology), and others.

Apart from delving into unknown territories and exploring fascinating problems from a genome-wide perspective, the lessons learnt are as follows: (i) there is a huge scope for expansion of computational biology approaches in the area of translational biomedicine, with novel data types and vast datasets that need bioinformatics support; (ii) the complexity of the problems requires a very tight collaboration between computational and experimental research in order to tackle both the technical issues and the domain knowledge; (iii) the development of technology platforms, such as next-gen sequencing, provides bioinformatics with new, significant challenges for data management; and (iv) there is an escalating need for sophisticated computational methods that build upon previous experience and register user requirements. All this sounds like the agenda of bioinformatics ten years ago, yet with a far more compelling case for a truly new approach for research in the life sciences, where the boundaries of computation and experiment are no longer distinguishable.


[1] Huminiecki L et al. (2009) Emergence, development and diversification of the TGF-beta signalling pathway within the animal kingdom. BMC Evol. Biol. 9, 28.

[2] Maffettone C et al. (2010) Tumorigenic properties of iron regulatory protein 2 (IRP2) mediated by its specific 73-amino acids insert. PLoS ONE 5(4), e10163.

[3] Valeyev NV et al. (2010) A systems biology model for immune cell interactions unravels the mechanism of inflammation in human skin. PLoS Comput. Biol., in press.

[4] Pantzartzi C et al. (2010) Promoter complexity and tissue-specific expression of stress response components in Mytilus galloprovincialis, a sessile marine invertebrate species. PLoS Comput. Biol. 6(7), e1000847.

[5] Drozdov I et al. (2010) Genome-wide expression patterns in physiological cardiac hypertrophy. BMC Genomics 11, 557.

[6] Tischler V et al. (2010) A Genome-Wide View of Activation-Induced Cytidine Deaminase (AID)-targeted Genes using ChIP-seq Analysis. In preparation.

[7] Blencowe BJ et al. (2010) Manuscript in preparation.

[8] Drozdov I et al. (2010) Gene network modeling relates common progression mechanisms in hepatocellular carcinoma with viral and alcoholic etiologies. In preparation.

[9] Stokes A et al. (2010) Copy number and loss of heterozygosity detected by SNP array of formalin-fixed tissues using whole-genome amplification. Submitted manuscript.

[10] Psomopoulos FE et al. (2010) Manuscript in preparation.

Kristoffer Forslund
Last modified: Jan 17 2011