SBC seminars 2005

2004

Date	Time	Name	Affiliation	Title
Wed 25 Feb	15.00	Håkan Viklund	SBC
Wed 3 Mar	15.00	Lars Arvestad	SBC
Wed 10 Mar	15.00	Erik Aurell	KTH Physics
Wed 17 Mar	15.00	Olivia Eriksson	SBC
Wed 24 Mar	15.00	Gunnar von Heijne	SBC
Wed 31 Mar	15.00	Bob MacCallum	SBC	Predicting the Nuclear Proteome
Wed 7 Apr	15.00	Björn Ursing	Center for Genomics and Bioinformatics, KI	EXProt/FetchProt - Finding proteins with an experimentally verified function
Wed 21 Apr	15.00	Sara Light	SBC
Wed 28 Apr	15.00	Markus Brameier	SBC
Wed 12 May	10.00	Alexander Schliep	Max Planck Institute for Molecular Genetics, Berlin	Gene expression over time: identification of groups
Wed 19 May	15.00	Nick Braun (NOTE! change of speaker!)	SBC
Wed 26 May	15.00	Björn Wallner	SBC
Wed 9 Jun	15.00	Maria Werner (NOTE! Change of speaker)	SBC	Numerical solution to the master equation using the linear noise approximation
Wed 16 Jun	15.00	Samuel Andersson	SBC
Wed 23 Jun	15.00	Erik Granseth	SBC
Wed Sept 8	15.15	Erik Lindahl	SBC	Protein Folding: Probing Multi-Microsecond Timescales with Distributed Computing
Wed Sept 15	15.15	Karin Julenius	SBC and KI	Prediction, conservation analysis and structural characterization of mammalian mucin-type O-glycosylation sites.
Fri Sept 17	14.00	Olivia Eriksson	SBC	Halftime seminar
Wed 22 Sept	15.15	Anthony Poole	Mol. Biol. & Funct. Genomics, SU	The Nature of the Last Universal Common Ancestor
Wed 29 Sept	15.15	Peter Kundrotas	CSB, Novum KI	Relations between some properties of unfolded proteins and their sequences.
Wed Oct 6	15.15	Jens Lagergren	SBC	Probabilistic analysis of gene families from multiple species
	Comparative genomics in general and orthology analysis in particular are becoming increasingly important parts of gene function prediction. Previously, orthology analysis and reconciliation have been performed only with respect to the parsimony model. This discards many plausible solutions and sometimes precludes finding the correct one. In many areas of bioinformatics, probabilistic models have proven to be both more realistic and more powerful than parsimony models. We introduce a probabilistic gene evolution model based on a birth-death process in which a gene tree evolves "inside" a species tree. Based on this model, we develop a tool with the capacity to perform practical orthology analysis, based on Fitch's original definition, and more generally to reconcile pairs of gene and species trees w.r.t. duplications and losses. We develop a Bayesian analysis based on MCMC which facilitates approximation of the posterior distribution for reconciliations. This also gives a way to estimate the probability that a pair of genes are orthologs. The algorithm performs very well on synthetic as well as biological data. Using standard correspondences, our results carry over to allele trees as well as biogeography. When lateral transfers are also considered, reconciliation is much harder. We give a combinatorial model and parsimony algorithms for gene duplications and lateral gene transfers. These algorithms detect lateral gene transfers with very low error rates.
Wed Oct 13	15.15	Ann-Charlotte Berglund	SBC	Myostatin sequence evolution in ruminants
	Myostatin (GDF-8) is a negative regulator of skeletal muscle development. This gene has previously been implicated in the double muscling phenotype in mice and cattle. Analysis of nonsynonymous to synonymous nucleotide substitution rate ratios (Ka/Ks) indicates that positive selection may have been operating on this gene during the time of divergence of Bovinae and Antilopinae, starting from approximately 23 million years ago, a period that appears to account for most of the sequence difference between myostatin in these two groups. Sites evolving under positive selection pressures were found both in the propeptide region and the C-terminal region of the gene.

Date	Time	Name	Affiliation
Wed Oct 20	15.15	Arne Elofsson	SBC
	Gene function predictions: a personal overview and future perspective
	The prediction of the function of a gene is one of the fundamental goals of bioinformatics today. Besides expensive experimental approaches, there are three fundamentally different methods to obtain functional information: (i) function is often inferred by the detection of homology to a functionally classified protein, (ii) some types of function can be assessed by the prediction of local features of proteins, (iii) recently a number of network-centered methods to predict function have been developed. Here, I will present some of our methods used for all three types of gene functional classification. This research will focus on three categories of method development (Homology Detection, Structure Predictions and Network analysis) and analysis of three biological problems (Membrane proteins, Comparative Genomics and Proteome Evolution).
Wed Oct 27	13.00	Erik Aurell	Theo. Bio. Phys.
	Simple and not so simple heuristic search on 3-SAT
	Random 3-SAT is the problem of determining whether a set of M propositions in N Boolean variables, all of which are of the type "X OR Y OR Z", can simultaneously be satisfied. While the problem is hard in the worst case, it is easy for most instances unless the ratio M/N is close to 4.27. These statements hold as N and M tend to infinity, with their ratio fixed, and given proper definitions of hard and easy. Here, "close to 4.27" has meant to within about 10%. 3-SAT in the hard region is a paradigmatic combinatorial optimization problem. I will present a study of a well-known heuristic search called walksat (Selman, Kautz & Cohen) that in fact shows behaviour linear in N in median computation time up to M/N = 4.14. (Numerical) concentration-of-measure results will be presented in further support of this statement. These results on walksat are apparently new. I will then further compare walksat to a somewhat complex algorithm called "survey propagation", which I will deduce as a variant of Belief Propagation in a fairly odd system of beliefs. The original derivation was framed in a 1-step replica symmetry breaking scenario in an equivalent diluted spin glass (Mezard, Parisi & Zecchina), and will not be presented here. Survey propagation is polynomial in N in median computation time up to above 4.20. This is joint work with Scott Kirkpatrick, to be presented at NIPS 2004.
Wed Nov 10	15.15	Ingemar Ernberg	MTC, KI
	Tumour Biology today and KICancer
	Due to the advancement of molecular cell biology, we basically understand what makes cancer today. It is a disease of cells and genes. And it is complex. High-throughput technologies provide large amounts of data, which require computational analysis. But the means to advance knowledge in cell biology with computer simulations are still very limited. An example of a tractable problem, a viral genetic switch with a distant resemblance to the lambda switch but which induces cancer development, will be presented. Finally, a brief summary of KICancer, the new network for all cancer researchers within KI, will be presented.
Wed Nov 17	15.15	Johannes Frey-Skött	SBC
	In silico analysis of the effect on the proteome of alternative splicing
	Alternative splicing is the phenomenon that explains the huge difference between the number of proteins in a particular proteome and the number of genes in the corresponding genome. The process occurs post-transcriptionally and alters the pre-mRNA by excising the exons and combining them into mature mRNA. When the exons are spliced together they may be arranged in a different order or some may be excluded, thus giving rise to different forms of mRNA. We study what effects alternative splicing might have on the proteome, both in which patterns the exons are combined and what features the sequences that are added or exchanged have.
Wed Nov 24	15.15	Erik Sandelin	SBC
	Extracting multiple alignments from pairwise alignments: A combinatorial optimization problem
	Multiple Structural Alignments (MSTAs) provide position-specific information on the sequence and structural variability allowed by protein 'folds'. This information can be exploited to better understand the evolution of proteins and the physical chemistry of polypeptide folding. Most MSTA methods rely on a pre-computed library of pairwise alignments. This library will in general contain conflicting residue equivalences, not all of which can be realized in the final MSTA. Hence, to build a consistent MSTA, these methods have to select a conflict-free subset of equivalences. Using a dataset with 327 families from SCOP 1.63, we compare the ability of two different methods to select an optimal conflict-free subset of equivalences. One is an implementation of Reinert et al.'s integer linear programming (ILP) formulation of the maximum weight trace problem (Reinert et al., 1997). This ILP formulation is a rigorous approach but its complexity is difficult to predict. The other method is T-Coffee (Notredame et al., 2000), which uses a heuristic enhancement of the equivalence weights that allows it to keep the speed and simplicity of the progressive alignment approach while still incorporating information from all alignments in each step of building the MSTA. We find that although the ILP formulation consistently selects a more optimal set of conflict-free equivalences, the differences are small and the quality of the resulting MSTAs is essentially the same for both methods.
Wed Dec 1	15.15	David Fredman	CGB, KI
	Copy Number Polymorphism in the Human Genome
	Copy number variation in phenotypically normal human genomes is a recently described major new form of polymorphism. It is becoming evident that perhaps as much as 5-10% of the human genome is made up of long segments (kb to Mb in length) that vary in copy number between individuals. These regions contain genes, common repeats, SNPs, and other common features. Duplicated genes may alter gene expression, and are also subject to induced structural rearrangements with potential functional consequences. In a recently published study, we extended this revelation to show that reported SNPs in such domains were often not SNPs at all, but false interpretations of multi-site variability (MSV), reflecting the underlying copy-number differences. Most worryingly of all, MSVs masquerade as SNPs when genotyped with most common genotyping methods in use today. These issues must now be properly addressed in disease association studies and haplotype map construction, in order to avoid missing true signals or drawing invalid conclusions. References: Fredman, D. et al. Complex SNP-related sequence variation in segmental genome duplications. Nature Genetics (2004). Iafrate, A.J. et al. Detection of large-scale variation in the human genome. Nature Genetics (2004). Sebat, J. et al. Large-scale copy number polymorphism in the human genome. Science (2004).
Wed Dec 8	15.15	Gunnar O Klein	Dept of Medicine, Karolinska Institutet; Chairman of the global eHealth Standardization Co-ordination Group
	Medical (clinical) informatics and Bioinformatics — Possible synergy

Wed Dec 15	15.15	Thomas Helleday	GMT, SU
	Recombination Repair and a new treatment for BRCA2 tumours?
	A hallmark of cancer treatment is the killing of growing cells. The most successful anti-cancer drugs cause DNA damage, which is converted into toxic lesions at replication forks. Although this method of treating cancer is highly successful, we still know very little about the lesions formed at the damaged replication forks or how these lesions are repaired. Here, novel data will be presented on the role of recombination in the repair of different replication lesions and the signalling pathways activating this repair pathway. Furthermore, data suggesting a role for poly(ADP-ribose) polymerase (PARP) in replication repair will be presented. These data suggest that inhibition of PARP alone efficiently and specifically leads to killing of BRCA2 defective tumours.