SBC seminars 2006

Thu Jan 12 15.15 Bill DeGrado U. of Pennsylvania
Wed Jan 18 15.15 Siv Andersson Uppsala U
FA32 Computational inference of scenarios for alpha-proteobacterial genome evolution
Wed Jan 25 15.15 Carsten Daub CGB, KI
FA31 Predicting protein-protein interactions from gene co-expression conserved across species
Wed Feb 8 15.15 Thomas L. Casavant The UI Center for Bioinfo. and Comp. Biology, U of Iowa
FA32 XenoCluster: A Grid Computing Approach to Finding Ancient Evolutionary Genetic Anomalies
This talk describes and evaluates a coarse-grained parallel computational approach to identifying rare evolutionary events often referred to as "horizontal gene transfers". Unlike classical genetic evolution, in which variations in genes accumulate gradually within and among species, horizontal transfer events result in a set of potentially important genes which "jump" directly from the genetic material of one species to another. Such genes, known as xenologs, appear as anomalies when phylogenetic trees are compared for normal and xenologous genes from the same sets of species. However, this has not been previously possible due to a lack of data and computational capacity. With the availability of large numbers of network-connected compute clusters, as well as genomic sequence from more than 2,000 species containing as many as 35,000 genes each, and trillions of sequence nucleotides in all, the possibility exists to examine clusters of genes using phylogenetic tree similarity as a distance metric. The full version of this problem requires years of CPU time, yet only makes modest IPC and memory demands; thus, it is an ideal candidate for a grid computing approach. This paper describes the prototype of such a solution and preliminary benchmarking results that show a reduction in total execution time from approximately two years to less than one day. I will also report on several trade-off issues in partitioning the problem across WAN nodes, and LAN/WAN networks of tightly coupled computing clusters.

This talk will conclude with a brief overview of some of Prof. Casavant's other research areas, including some applications of Bioinformatics to Medical Genetics in disease gene mutation finding.

Wed Feb 15 15.15 David Ardell LCB, UU
FA32 tRNAs: their expression, function, and evolution.
The research of my group is devoted to the computational biology of the basic information-bearing polymerizations in the cell: replication, transcription, and particularly translation. We apply bioinformatic techniques to generate hypotheses about the mechanism and evolution of the cellular actors. A long-term goal is to elucidate co-adaptations, particularly for rate or accuracy efficiency, between the processing machinery and the signals they process. In this talk I will focus on the background and results of recent work from three studies of tRNAs in bacteria. One study (Ardell and Kirsebom 2005, PLoS Comp Biol) analyzes statistically whether tRNA gene expression is a cause or consequence of its genomic location. The results have consequences for theories of growth adaptation in bacteria. Another study (Ardell and Andersson 2006, Nucl. Acids Res.) introduces a profile-based method for tRNA functional classification called TFAM. Results with TFAM will be presented that show evolutionary diversification of the tRNA "identity code" in bacteria. Finally, I will present our recent extensions to sequence logos called function logos and inverse logos (Freyhult et al 2006, Nucl. Acids Res.), and describe their application to the visualization of this tRNA identity code in a restricted set of bacteria.
Wed Feb 22 15.15 Wyeth Wasserman U. Brit. Columbia
Bioinformatics of gene regulation
Fri Feb 24 10.00 Tomas Ohlson SBC
FR4 Thesis defense: The use of evolutionary information in protein alignments and homology identification
For the vast majority of proteins only the sequence is known, with no experimental information about the three-dimensional structure. Therefore, the easiest way to obtain some understanding of the structure and function of these proteins is by relating them to well studied proteins. This can be done by searching for homologous proteins. It is easy to identify a homologous sequence if the sequence identity is above 30%. However, if the sequence identity drops below 30%, more sophisticated methods have to be used. These methods often use evolutionary information about the sequences, which makes it possible to identify homologous sequences with a low sequence identity. In order to build a three-dimensional model of a sequence based on a known protein structure, the two sequences have to be aligned. Here the aligned residues serve as a first approximation of the structure. This thesis focuses on the development of fold recognition and alignment methods based on evolutionary information. The use of evolutionary information for both query and target proteins was shown to improve both recognition and alignments. In a benchmark of profile-profile methods it was shown that the probabilistic methods were best, although the difference between several of the methods was quite small once optimal gap penalties were used. An artificial neural network based alignment method, ProfNet, was shown to be at least as good as the best profile-profile method, and by adding information from a self-organising map and predicted secondary structure we were able to further improve ProfNet.
Fri Feb 24 13.00 David Wishart
FA31 Integration of chemoinformatics and bioinformatics.
Among Professor Wishart's interests, we find:
  • The Human Metabolome Project
  • The Human Metabolome Database
  • Protein-Based Drug Targeting and Medical Diagnostics: the development of new methods for the targeted delivery of drugs or diagnostic agents using peptide or protein-based vehicles;
  • NMR Technology Development: the development of novel applications in NMR spectroscopy to facilitate structure-aided drug design;
  • Pharmaceutical Bioinformatics: the development of innovative bioinformatics and modeling software for improved protein/peptide analysis.
Fri Feb 24 14.15 Robert M MacCallum Imperial College, UK, and SBC emeritus
FA31 Automatic discovery of cross-family sequence features associated with protein function
Methods for predicting protein function directly from amino acid sequences are useful tools in the study of uncharacterised protein families and in comparative genomics. Until now, this problem has been approached using machine learning techniques that attempt to predict membership, or otherwise, to predefined functional categories or subcellular locations. A potential drawback of this approach is that the human-designated functional classes may not accurately reflect the underlying biology, and consequently important sequence-to-function relationships may be missed.
We show that a self-supervised data mining approach is able to find relationships between sequence features and functional annotations. No preconceived ideas about functional categories are required, and the training data is simply a set of protein sequences and their UniProt/Swiss-Prot annotations. The main technical aspect of the approach is the co-evolution of amino acid-based regular expressions and keyword-based logical expressions with genetic programming. Our experiments on a strictly non-redundant set of eukaryotic proteins reveal that the strongest and most easily detected sequence-to-function relationships are concerned with targeting to various cellular compartments, which is an area already well studied both experimentally and computationally. Of more interest are a number of broad functional roles which can also be correlated with sequence features. These include inhibition, biosynthesis, transcription and defence against bacteria. Despite substantial overlaps between these functions and their corresponding cellular compartments, we find clear differences in the sequence motifs used to predict some of these functions. For example, the presence of polyglutamine repeats appears to be linked more strongly to the "transcription" function than to the general "nuclear" function/location.
We have developed a novel and useful approach for knowledge discovery in annotated sequence data. The technique is able to identify functionally important sequence features and does not require expert knowledge. By viewing protein function from a sequence perspective, the approach is also suitable for discovering unexpected links between biological processes, such as the recently discovered role of ubiquitination in transcription.
Wed Mar 1 15.15 Andreas Bernsel SBC
Remote homology detection of integral membrane proteins using hydrophobicity and topology predictions
Functional annotation lags behind the rapidly increasing amount of sequence data resulting from the numerous ongoing genome sequencing projects. In general, the easiest way to obtain functional information is to transfer this knowledge from well characterized homologous proteins. Accordingly, development of existing algorithms for homology detection is of great importance for better functional characterization of new protein sequences. Integral membrane proteins constitute about 25% of most proteomes and are present in all different kinds of membranes in all cells. As a result of the hydrophobic environment within the membrane, transmembrane proteins are subject to particular structural constraints and, as a consequence, have different amino acid composition and residue exchangeabilities compared to globular proteins. Since algorithms for homology detection are often developed with globular proteins in mind, they may not be optimal for detecting distant homology relationships between membrane proteins. Here, we have developed a new method for remote homology detection of transmembrane proteins that takes advantage of the sequence constraints placed by the hydrophobic interior of the membrane. We expect that, for distant membrane protein homologs, even if the sequences have diverged too far for any similarity to be recognized, the hydrophobicity pattern and the topology could still be more conserved. By using this information in parallel with sequence, we show that both sensitivity and specificity can be substantially improved for remote homology detection in a set of 318 G-protein coupled receptors from six subfamilies. Applying the method to the Pfam domain database, we are able to discriminate more accurately between clans, and suggest new putative homology relationships for a few relatively uncharacterized protein domains.
Mon April 3 13.30 Sean Hooper EMBL Heidelberg
FA32 Networks: graph theory in biology
- Protein prediction networks using the STRING database
- Domain co-occurrence networks using the SMART database
(- Networks of transcription during Drosophila embryogenesis)
Thu April 6 15.15 Olof Karlberg CBS Copenhagen
FA32 Guilt by association - using networks to catch disease genes
Mon April 10 10.00 Shoshana Wodak The Hospital for Sick Kids, Toronto, Canada
SBC Protein-protein interactions: the challenge of predicting specificity.
Mon April 10 11.00 Chen Keaser Ben Gurion University, Israel
SBC Cooperative energy terms for molecular modeling: application to homology modeling and functional annotation.
  Energy functions play an essential role in molecular modeling. Within current molecular modeling packages, pairwise energy terms enjoy clear dominance. They are easy to program and, most importantly, computationally efficient. In this talk I will present the motivation behind the use of cooperative energy terms and how their limitations can be circumvented. The usability of these terms will be demonstrated by new results in homology modeling and structure-based identification of functional residues.
Wed April 19 15.15 Anders Lansner CSC, KTH
Computational Neuroscience, goals, methodology and challenges
I will describe the research field we at CBN are mostly involved in - computational neuroscience. I will describe goals, modeling strategies and methodology, and current challenges and give some examples of results from recent work.
Wed May 10 15.15 Juni Palmgren MEB, KI
FA32 Twin models for complex traits
I will present a frailty model to estimate the relative importance of genetic and environmental factors on age at onset of dementia in a twin design. Modern survival methodology is used to define a model that accounts simultaneously for longitudinal aspects, e.g., left truncation and right censoring in data, and the multivariate nature of twin data. Inference is based on a hierarchical Bayesian formulation and Gibbs sampling.
Wed May 17 15.15 Gustavo Camps-Valls Universitat de València.
FA32 Kernel methods in Bioinformatics
Kernel methods are an emerging area of machine learning that has lately provided state-of-the-art results. The talk is split into three parts. First, a theoretical review of kernels in general and the most common algorithms for classification and function approximation in particular. Second, a survey on how to include prior knowledge in the kernel machine in general, and through several examples in bioinformatics. Finally, I'll provide you with some comprehensive demos, along with useful source code and web pointers.


1. Introduction to kernel methods.
1.1. Basic concepts: Learning and similarities.
1.2. Intuitive concept of kernels.
1.3. Kernel trick, Mercer's kernels, and the Representer Theorem. 
1.4. Linear methods again through positive definite kernels. 
1.5. Properties of kernels.

2. Support Vector Classifier (SVM).
2.1. The optimal separating hyperplane.
2.2. Maximum margin classifiers
2.3. Optimization.
2.4. Non-separable problems.
2.5. Hands on training

3. Support Vector Regression (SVR).
3.1. Formulation, the tube, and the kernel trick.
3.2. Tuning the SVR with profiled insensitivity.
3.3. Non-linear regression.
3.4. Hands on training.

4. Designing the kernel.
4.1. Incorporating prior knowledge in the kernel.
4.2. Kernel engineering.
4.3. The profile-dependent SVM.
4.4. Kernelizing a linear method.

5. Examples on Bioinformatics.
5.1. Kernels for protein sequences.
5.2. Kernels in AO and siRNA efficacy prediction.

6. Further information.
6.1. Demos.
6.2. Source Code.
6.3. Useful links.

7. Conclusions and trends.
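
As an illustration of item 5.1 of the outline, the following is a minimal sketch of one commonly used kernel for protein sequences, the k-mer "spectrum" kernel. The outline does not say which kernels the talk covers, so the choice of kernel, the function names and the toy sequences are all assumptions made for illustration only.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k=2):
    """Inner product of the k-mer count vectors of two sequences.

    Illustrative only: this is the k-mer spectrum kernel, not necessarily
    a kernel presented in the talk.
    """
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # Counter returns 0 for k-mers missing from y, so only shared k-mers contribute.
    return sum(count * cy[kmer] for kmer, count in cx.items())


def gram_matrix(seqs, k=2):
    """Kernel (Gram) matrix that a kernel machine such as an SVM would consume."""
    return [[spectrum_kernel(a, b, k) for b in seqs] for a in seqs]


if __name__ == "__main__":
    toy = ["MKVLA", "MKVLG", "GGGGG"]  # hypothetical toy sequences
    for row in gram_matrix(toy):
        print(row)
```

Because the kernel is an explicit inner product of count vectors, the resulting Gram matrix is symmetric and positive semi-definite, which is the property required by the methods in items 2 and 3 of the outline.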
Fri May 19 9.00 Sara Light SBC
FR4 Thesis defense: Investigations into the evolution of biological networks
Wed May 31 15.00 Tom Britton Stockholms Universitet
Cramérrummet (room 306, house 6, Kräftriket) Statistics, DNA and the family trees of plant species
One of the greatest advances in the natural sciences in recent decades is the technique for reading the DNA code of organisms. Within biology this technique is used, among other things, to learn more about how different species have developed in the course of evolution and to determine their evolutionary relationships. This very active field of research involves biologists as well as mathematicians, statisticians and computer scientists. In the lecture I will give a brief biological background, outline how evolution and mutation are modelled mathematically, and finally describe how, from observed data, i.e. sequenced species, conclusions are drawn about the likely relationships between them.
Mon June 12 10.00 Volker Hollich CGB/MBB
CMB auditorium, Berzelius väg 21, KI Campus Solna Thesis defense: Orthology and Protein Domain Architecture Evolution
A major factor behind protein evolution is the ability of proteins to evolve new domain architectures that encode new functions. Protein domains are widely considered to constitute the "atoms" of protein chains, acting as building blocks of proteins as well as evolutionary units. A small number of domains are found in many different domain combinations, while the majority of domains co-occur with very few types of other domains.

Domain architectures are not necessarily created once only during evolution. Cases of convergent evolution show how a favourable domain architecture has evolved multiple times independently. A basic concept for understanding evolution at the gene level is orthology.

Two genes are orthologous if they have evolved from the same gene in the last common ancestor of the species and have thus been created by a speciation event. Paralogous genes result from a duplication event that produced two gene copies within the same species. The concept of orthology can be transferred from genes to protein domains and utilised to explain recombination of protein domains and the evolution of domain architectures.

The focus of this work is to augment the understanding of domain architecture evolution and its functional implications. We have examined, evaluated and improved existing methods as well as developed new approaches. The concept of orthology plays a major role in this work. Orthology is often inferred from phylogenetic trees that are based on pairwise distance estimations of protein sequences. The Scoredist protein sequence distance estimator has been developed as one part of this thesis. It combines robustness with low computational complexity and can be calibrated towards various evolutionary models.

Accurate phylogenetic trees are crucial for many applications, hence the appropriate tree reconstruction algorithm should be chosen with care. The strengths and weaknesses of many current tree reconstruction algorithms were assessed, and findings underscore the value of the Scoredist estimator. The Pfam protein families database comprises a large number of protein families and domains. As part of this thesis it has been enhanced by search and query tools, such as PfamAlyzer or the browser-based domain query, that can be applied to whole domain architectures instead of individual domains only.

We have developed a Maximum Parsimony algorithm for the prediction of ancestral domain architectures. In contrast to previous approaches, it employs gene trees rather than species trees. The algorithm was a starting point for an extensive study of the domain architectures present in Pfam for 50 completely sequenced species. Sampling widely across the kingdoms of life, the study sought to find and analyse cases where a domain architecture had been created multiple times. The algorithm proved robust to potential biases from horizontal gene transfer. Convergent evolution of domain architectures was found more frequently than by previous approaches. No strong biases driving convergent evolution were found. It therefore seems to be a random process in much the same way as evolution through duplication and recombination, yet less frequent.

Tue June 13 11.00 Erich Bornberg-Bauer Uni Muenster, Germany
SBC seminar room The modular evolution of proteins
The prediction of function for a given protein sequence by comparative and evolutionary analysis is a well established principle. Due to the low conservation of many sequence regions and the modular rearrangements of proteins it is often useful to break down the problem and view proteins as strings of functional units, such as domains (when structurally defined) or motifs (their signature at the sequence level), which are represented as regular expressions or profiles in many public databases.

These domains (and motifs) are more conserved during evolution and constitute the basic elements of modular evolution within and between organisms. Most proteins consist of two or more domains, giving rise to a variety of combinations of domain arrangements. Several events can be more easily traced by viewing proteins as strings of domains rather than as strings of amino acids. For example, circular permutations represent non-linear arrangements of domains (e.g., the strings of domains ABCD and CDAB are circularly permuted with respect to each other). Such sequences cannot be detected using the standard sequence alignment procedures, which depend on a recursive scheme, but the strings of domains can be easily analysed because of the smaller search space. We have analysed the modular evolution of proteins and found that the major events creating novel arrangements are fusion and loss of domains at the ends and, to a lesser extent, fission. Recombination seems to play only a minor role.
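
The ABCD/CDAB example above can be checked mechanically once proteins are written as strings of domains. The following is a minimal sketch of such a check, not the speaker's method; the function name and the example domain names are hypothetical.

```python
def is_circular_permutation(a, b):
    """Return True if domain arrangement b is a rotation of arrangement a.

    a and b may be strings (one character per domain) or lists of domain names.
    Identical arrangements count as the trivial rotation.
    """
    a, b = list(a), list(b)
    if len(a) != len(b):
        return False
    if not a:
        return True
    doubled = a + a  # every rotation of a appears as a window of length len(a)
    n = len(a)
    return any(doubled[i:i + n] == b for i in range(n))


if __name__ == "__main__":
    print(is_circular_permutation("ABCD", "CDAB"))  # the example from the abstract
    print(is_circular_permutation("ABCD", "ACBD"))  # a shuffle, not a rotation
    print(is_circular_permutation(["SH3", "SH2", "Kinase"],
                                  ["Kinase", "SH3", "SH2"]))
```

Working on a handful of domain symbols instead of hundreds of residues is what keeps the search space small: for an arrangement of n domains there are only n rotations to test.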

Mon Aug 28 15:00 Martin Rosvall Umeå University and NORDITA
Seminar room, House 13 (NORDITA), AlbaNova Networks and our limited information horizon
(Abstract N/A)
Wed Sep 6 13:00 Pär Bjelkmar SBC/CBR
Seminar room, House 13 (NORDITA), AlbaNova Molecular dynamics simulations; comparisons of octameric and tetrameric channel models of the peptaibol antiamoebin
Two different models of the channel bundle of the peptaibol antiamoebin were studied using molecular dynamics simulations in an explicitly modeled environment consisting of a POPC bilayer surrounded by water. Both the octameric and tetrameric bundles were found to be stable over tens of nanoseconds, with characteristic interhelical hydrogen bonding patterns stabilizing the channels. The dimensions of the channels have also been assessed and compared to each other and to the modern potassium channels.
Wed Sep 13 13:00 Karin Julenius SBC/KI
Seminar room, House 13 (NORDITA), AlbaNova Prediction of glycosylation sites in proteins
I will give an overview of the field and the methods. Colleagues from the Center for Biological Sequence Analysis, Technical University of Denmark, and I have written a book chapter with this title for "Bioinformatics for Glycobiology and Glycomics", soon to be published by Wiley (I'm the main author). I will also talk about my latest project, which is to learn more about proteoglycan sites in C. elegans and develop a predictor for this type of modification. A preliminary predictor has already been developed and we plan to improve the predictor by directed experiments. I will compare with what is known about the site requirements in mammals. This is a collaboration with Dr. Fred Hagen, University of Rochester School of Medicine and Dentistry.
Wed Sep 20 13:00 Axel Brandenburg NORDITA
Seminar room, House 13 (NORDITA), AlbaNova Mechanisms for achieving homochirality
Many biomolecules are not mirror-symmetric. Amino acids in living matter are almost always homochiral and occur only in the L-form. On the other hand, amino acids synthesized in the laboratory are a mixture of L- and D-forms. Dead matter gradually loses its preferred handedness. Thus, a preferred handedness of biomolecules is intimately related to the existence and emergence of life. In my talk I will discuss and analyse two quite different chemical reaction mechanisms where, for certain parameters, a symmetric mixture becomes unstable and a homochiral state of either chirality emerges. One involves nucleotides and the other one peptides. The former may have occurred at the beginning of an RNA world while the latter may have occurred in a peptide world. The latter is chemically more realistic, but it lacks information-carrying capabilities, which makes a peptide world more problematic.
Wed Sep 27 13:00 Örjan Åkerborg SBC
Seminar room, House 13 (NORDITA), AlbaNova Phylogenetic Inference using Maximum Likelihood and Birth-Death
Since the introduction of Bayesian MCMC methods into phylogenetic inference a decade ago, this has been a field of active research in which not only the phylogeny itself, but also side issues such as molecular clock and other rate hypotheses, ancestral state inference, and the position of the root have been addressed. In particular, a number of alternatives to the widely debated molecular clock hypothesis have been suggested. Among these are models with molecular clocks still operating locally and also a range of methods in which the substitution rates vary over lineages in an autocorrelated fashion, i.e. where the rate distribution for a particular branch is dependent on the rate value of the parent branch.

Our group has previously developed MCMC algorithms for integrated modeling of genome evolution where the rates are instead drawn independently and identically from an underlying distribution of choice. In the presented work we have made hill-climbing maximum likelihood (ML) variants of the above algorithms, and I will show that the ML approach enables accurate simultaneous inference of rates and times on large trees.

Wed Oct 4 13:00 Johannes Frey Skött SBC/CBR
Seminar room, House 13 (NORDITA), AlbaNova Large-scale studies of the impact of alternative splicing on the proteome
(Abstract N/A)
Wed Oct 11 13:00 Samuel Andersson SBC/KTH
Seminar room, House 13 (NORDITA), AlbaNova Finding transcription factor binding sites using EM and the Motif Yggdrasil model
Efficient methods for finding regulatory elements are important tools for understanding gene regulation. Phylogenetic footprinting is a class of methods that uses orthologous genes in the search for transcription factor binding sites (TFBS). The inclusion of phylogenetic information has been shown to increase the specificity and sensitivity of such searches. We describe a novel phylogenetic footprinting algorithm using Expectation Maximization. The underlying probabilistic model is the Motif Yggdrasil model, which takes a phylogenetic tree of arbitrary topology into account and does not require alignable sequences, evolutionary rates or branch lengths.
Wed Oct 18 13:00 Henrik Nielsen CBS, Technical University of Denmark
Seminar room, AlbaNova (Roslagstullsbacken 35) An overabundance of phase 0 introns immediately after the start codon in eukaryotic genes
Background: A knowledge of the positions of introns in eukaryotic genes is important for understanding the evolution of introns. Despite this, there has been relatively little focus on the distribution of intron positions in genes.

Results: In proteins with signal peptides, there is an overabundance of phase 1 introns around the region of the signal peptide cleavage site. This has been described before. But in proteins without signal peptides, a novel phenomenon is observed: There is a sharp peak of phase 0 intron positions immediately following the start codon, i.e. between codons 1 and 2. This effect is seen in a wide range of eukaryotes: Vertebrates, arthropods, fungi, and flowering plants. Proteins carrying this start codon intron are found to comprise a special class of relatively short, lysine-rich and conserved proteins with an overrepresentation of ribosomal proteins. In addition, there is a peak of phase 0 introns at position 5 in Drosophila genes with signal peptides, predominantly representing cuticle proteins.

Conclusion: There is an overabundance of phase 0 introns immediately after the start codon in eukaryotic genes, which has been described before only for human ribosomal proteins. We give a detailed description of these start codon introns and the proteins that contain them.

Wed Oct 25 13:00 Bengt Sennblad SBC
Seminar room, AlbaNova (Roslagstullsbacken 35) Modeling substitution rates
I will present results from a comparative study of different probabilistic models for substitution rate variation over a phylogenetic tree. The rate models, combined with models of divergence times, provide a means to formulate meaningful priors on edge lengths in phylogenetic trees. Applications include estimation of divergence times and of substitution rates, and simultaneous modeling of different evolutionary processes.
Wed Nov 1 13:15 Andrey Alexeyenko SBC
Seminar room, AlbaNova (Roslagstullsbacken 35) FunCoup: networks of functional coupling in eukaryotes

FunCoup is a statistical framework of data integration for finding functional coupling (FC) between proteins. It is capable of transferring information from model organisms via orthologs found by the InParanoid program. Data from different sources and of various natures (contacts of whole proteins and individual domains, mRNA co-expression, protein co-occurrence in tissues and cellular compartments, similar phylogenetic profiles etc.) are collected and probabilistically evaluated in a Bayesian network (BN), trained on sets of known FC cases vs. sets of randomly picked protein pairs as background reference. To address known drawbacks of Bayesian estimators, FunCoup was optimized in several aspects and, compared to previous framework configurations of this sort, the net gain in performance is tens of percentage points in either sensitivity or specificity. The number of simultaneously used model organisms (5-8) and individual datasets (50-60) has been estimated as maximal for practical purposes. This means that no further significant gain is expected given the current state of high-throughput data and the standard Bayesian framework.

We are, however, developing a post-processor where evidence that supports individual links is cross-validated by agreement/interaction between data types, model organisms, and network context that two proteins share. Combinations that efficiently augment the FC discovery rate are accepted as terms in a regression model. The model is then used to correct the scores found by the FunCoup BN. The predicted functional links thus become even more confident.

FunCoup is a self-consistent framework that easily incorporates nearly any kind of data (continuous values of any distribution shape, binary data, character labels etc.) from any data source without human curation. It has thus been possible to generate networks for several organisms in respect of different types of functional coupling. A network for Ciona intestinalis, which had neither training sets nor sources of its own data, was created as well.

Since last year's SBC seminar talk, the networks for human, mouse, rat, worm, fly, Ciona, Arabidopsis, and yeast have been made available on the FunCoup website, which has also acquired a spectrum of visual (due to the Medusa network applet - Hooper and Bork, 2005) and download functionalities. Each link in FunCoup contains information about the underlying evidence and a confidence value.

Wed Nov 8 13:15 Kristoffer Illergård SBC/CBR
Seminar room, AlbaNova (Roslagstullsbacken 35) Protein structure evolution
I will talk about my results from my first one and a half years as a PhD-student at SBC. My first project was to investigate the difference between sequence alignments and structural alignments when comparing proteins. During the work we found results that we wanted to explain by an evolutionary model, so far without any interesting results. Another project is a large scale study of insertions and deletions within proteins with known structure. Our results confirm the expectation that most indels are short and occur in loops and surface accessible regions. My future work will be to try to get a larger dataset for recently occurred long indels. A third recently started project which I will mention is a study of how alternative splicing affects the protein structure.
Wed Nov 15 13:15 John Hertz NORDITA
Seminar room, AlbaNova (Roslagstullsbacken 35) Neural Firing Correlations in the Neocortex
In trying to understand the dynamics of the densely-connected neural networks that comprise the neocortex, a suitable first step is to study the firing statistics of neurons in a simple model of an isolated cortical column: a random network of a few thousand neurons, with any two of them connected with a probability of about 10%. These neurons fire irregularly, at low rates, and these properties have been thought to be understood, for about a decade now, in terms of a self-consistent balance of excitation and inhibition. However, the existing theory requires correlations between neurons to be very small, in apparent disagreement with the experimentally observed degree of correlation. I will present some results of simulations of such models and some new theoretical ideas for a self-consistent treatment of correlations.
Wed Nov 2213:15 Åsa BjörklundSBC/CBR
Seminar room, AlbaNova (Roslagstullsbacken 35)Bias in protein-protein interaction assays

In the last ten years, knowledge about the function of many proteins has been gathered through high-throughput studies of protein-protein interactions (PPI). As a result, proteins of virtually unknown functions found in complexes with a number of proteins of known, and related, function may be tentatively associated with the functionality of its interacting partners. In the beginning of this year two independent studies of genome-wide screens for the complexes of the S. cerevisiae proteome using tandem affinity purifications (TAP) were published. Although the two TAP methods are very similar, the resulting complexes are clearly different. Hence, inferring function from them may render varying results depending on the dataset used.

We have sought to investigate the differences between the various datasets for S. cerevisiae PPI and assess their reliability compared to literature-curated datasets. Further, we aim to understand whether distinct methods detect different types of interactions and whether such biases can be attributed to specific protein properties. It has been shown that proteins that interact with many partners (hubs) are often highly conserved and essential. In addition, proteins that interact are often co-expressed. Other properties of hub proteins have been suggested, such as a high content of disordered structure and a high content of repeating domains. We find that these properties seem to be especially important for interactions that are detected with yeast two-hybrid screens. Further, we show that abundant proteins are common in networks derived from TAP studies.

Wed Nov 29 --- CANCELLED- no seminar
Wed Dec 615:15 Johan RockbergKTH (Uhlén group)
Seminar room RB35 (Roslagstullsbacken 35, the SBC house)High Resolution Epitope Mapping Using Bacterial Display

Knowledge and identification of protein-protein interactions and exact antibody epitopes are of great importance for many antibody applications, in particular for antibodies aimed at clinical use.
Here we describe a novel high-throughput method for identification of antibody binding epitopes. Currently we produce and screen target libraries of approximately 10^4-10^5 peptides per antibody target, with an epitope mapping resolution down to 7 aa.
The technique involves fragmentation of target DNA and subsequent blunt-end ligation into a gram-positive expression vector. The resulting peptide library is electroporated into Staphylococcus carnosus bacteria, and the peptides are transported to the cell surface for epitope presentation as membrane-anchored fusion proteins. Cells are incubated with the corresponding antibody and analysed using Fluorescence Activated Cell Sorting (FACS). Both positive cells, presenting epitopes recognised by the antibodies, and negative cells, presenting non-epitopes, are sorted separately in a 96-well format. These epitope and non-epitope cells are grown and sequenced using Pyrosequencing. Sequence data are interpreted and epitope binding sites are mapped back to the target protein sequence using BLASTP and BioPerl.
Our current results include mapping of monoclonal, polyclonal and monospecific antibodies targeting HER2. Our ongoing study includes large-scale monitoring of the antibody immune response in rabbits, focusing on identification and prediction of antigenic motifs, using HPR antibodies.

[Johan won the Best Poster Award at the 7th Swedish Bioinformatics Workshop (SBW2006)]
Wed Dec 1315:15 Costas PapaloukasCBR and University of Ioannina
Seminar room RB35 (Roslagstullsbacken 35, the SBC house)Machine Learning from Biological Data
Machine learning can be realised using a number of methods, but the selection of the most appropriate one depends strongly on the data under study. Here five machine learning approaches will be presented that were developed for four different biological problems. The first two systems deal with the common problem of protein classification, where a newly discovered protein is assigned to a predetermined class (or fold) of known proteins. For this, a reduced state-space (RSS) HMM and a sequential pattern mining (SPM) methodology were tried. The main purpose of the RSS HMM was to design a simple model with performance comparable to that of similar but far more complex models. The SPM methodology, on the other hand, uses data mining theories to discover discriminant patterns (or motifs) from protein families. Subsequently, protein classification can be achieved using these patterns and a straightforward scoring function. The other three approaches are based on artificial neural networks (ANNs). The first ANN performs function approximation and tries to predict the Z-coordinate of each residue of a transmembrane protein, that is, its distance from the center of the membrane. Such a system can be used in numerous applications, which will also be discussed. The second ANN, specifically a probabilistic neural network, takes microarray data as input and classifies the samples accordingly. The network was tested on the problem of leukaemia classification. Finally, the third ANN is able to identify the DNA bases using data generated from a carbon nanotube system that interacts with single nucleosides. The ANN achieved perfect discrimination among the four bases, marking a first step towards an overall system that, by combining machine learning and nanotechnology, will eventually lead to ultrafast DNA sequencing.
Wed Dec 2015:15 Erik FransénKTH (Lansner group)
Seminar room RB35 (Roslagstullsbacken 35, the SBC house)Ionic mechanisms in working memory
In learning and memory it has recently become clear that, in addition to synaptic changes, there are cellular changes in excitability due to changes in ion channels. Short-term/working memory systems, supporting storage over seconds to minutes, are crucial components in memory as well as in other cognitive processes. We are studying how specific ion channels affect the properties of a neuron, and how these properties in turn may participate in working memory function. The project focuses on cationic (TRP) ion channels, which are known from in vitro studies to produce long-lasting cellular depolarizing plateau potentials. Further, these currents are activated by group I metabotropic glutamate receptors as well as muscarinic type 1 receptors, and blocking of these receptors has been shown to produce behavioral deficits in long-term and working memory experiments. The talk will review how we combine electrophysiological, pharmacological and biophysical modeling techniques.

Olof Emanuelsson
Last modified: Dec 21 2006