NOVEMBER 21-22, 2019, BASEL
The first and only conference dedicated to Python in the pharma data science world.
“Hiding within those mounds of data is knowledge that could change the life of a patient, or change the world.” — Atul Butte
PyPharma will be a meeting, exchange and learning point for industry and academic scientific Python users in pharma. At PyPharma, we will learn about the most interesting current challenges in the field and the new and exciting Python tools and packages that can be used to tackle them.
PyPharma will be aimed at pharmaceutical Python users of all levels working in data science, and it will welcome workshops, talks and posters covering all aspects of the pharmaceutical lifecycle and ecosystem, as well as all data modalities, where Python tools are applied.
PyPharma is fully run by volunteers in the pharmaceutical industry and academia and hosted by Roche and the University of Basel. Attendance will be based on invites, and it will be a single track, 2-day conference. Attendance will be purposely kept small to maximize interaction between the audience and the speakers.
Time | Biozentrum Seminarraum 104 | Kollegienhaus Mehrzweckraum | Missionsstrasse 64a Computerraum.00.010 | Missionsstrasse 64a Seminarraum.00.004 |
---|---|---|---|---|
09:00-10:30 | Deep Learning for Natural Language Processing in Pharma and Biomedical Applications | Building pharmaceutical web applications with Dash | Genomic Data Analysis with Pandas | |
10:30-11:00 | Coffee break | |||
11:00-12:00 | Deep Learning for Natural Language Processing in Pharma and Biomedical Applications | Building pharmaceutical web applications with Dash | Genomic Data Analysis with Pandas | |
12:00-13:00 | Lunch | |||
13:00-14:30 | Interpretability in Machine Learning for Computational Biology | Snakemake for reproducible analyses | Hands-on Bayesian: what, why, how? | Cheminformatics workshop from QSAR to DNN |
14:30-15:00 | Coffee break | |||
15:00-17:00 | Interpretability in Machine Learning for Computational Biology | Snakemake for reproducible analyses | Hands-on Bayesian: what, why, how? | Cheminformatics workshop from QSAR to DNN |
Time | Speaker | Affiliation | Title | Description |
---|---|---|---|---|
08:00-08:45 | Breakfast | |||
08:45-09:00 | Conference Opening | |||
09:00-09:50 | Keynote 1: Greg Landrum | T5 Informatics | Discovering the PyData Ecosystem and RDKit | If you're working with pharma data in Python, odds are pretty good that you're familiar with tools like pandas, numpy, scipy, and matplotlib. But there are a bunch of other interesting and useful tools out there in the collection of software informally known as the "PyData stack". Those are the things I'm going to talk about here.
I'll start with an intro and overview of the RDKit, since I'm going to be using chemical data in my examples and it's a project that's near and dear to my heart. After that, I'll introduce a few additional members of the PyData ecosystem and demonstrate how they can be used together to solve real problems. I'll also highlight how we can integrate the RDKit with these tools to allow working with chemical data as a first-class citizen. I will close with some thoughts about success and sustainability of open-source projects in our field. |
09:50-10:20 | Martin Preusse | University of Freiburg | Integrating Molecular and Clinical Data with Python Knowledge Graphs & Neo4j | Data is everywhere but generating useful knowledge is difficult. To create value from data you need solutions for all steps of the process: data acquisition, cleaning, structuring, annotation, integration, modeling, validation and much more. Even when all technological challenges are addressed, you have to account for the most important aspect of data applications: the human beings who decide what to do and how to use the technology. There are two key aspects of data applications that are overlooked by the current AI hype. First, integrating heterogeneous data is still an issue. Second, getting an overview of data is essential to make informed decisions. Knowledge graphs have the potential to address these aspects and to serve as the central hub of your data application. They link heterogeneous data sources and provide access to structured data for decision making, data analysis and end-user applications. In my talk I will give examples of how biomedical knowledge graphs implemented in the graph database Neo4j help to link molecular and patient-level data. On top of that, I will discuss how knowledge graphs can support the use of real-world evidence for early-stage drug development. |
10:20-10:40 | Poster Session / Coffee Break | |||
10:40-11:10 | Flaviu Cipcigan | IBM Research | Python for molecular dynamics workflows and conformational sampling of cyclic peptides | Molecular dynamics holds great promise for the pharmaceutical industry. Both targets and ligands are dynamic, exploring multiple conformations. Thus, the typical methods that assume a single binding pose or a single shape for a pocket will have limited ability to predict realistic interactions. As the pharmaceutical industry moves to larger drugs such as cyclic peptides or therapeutic proteins, molecular flexibility is becoming more and more important. For example, cell permeability of cyclic peptides can be modulated by their conformational changes. In order for molecular dynamics to be fully adopted by the industry, better tools are needed to manage computational workflows and explore the resulting data. Here, I will discuss a Python middleware that simplifies the running and analysis of molecular dynamics simulations and allows the user to interactively analyse the resulting multidimensional data. I will also present the use of this tool in the context of a research challenge of predicting the permeability of cyclic peptides. |
11:10-12:00 | Keynote 2: Kai Blin | Technical University of Denmark | Mining microbial genomes for interesting metabolites | Secondary metabolites produced by microorganisms are the main source of bioactive compounds that are in use as antimicrobial and anticancer drugs, fungicides, herbicides and pesticides. As an increasing number of microorganisms is becoming resistant against the antimicrobial compounds currently in use, there is a dire need for new compounds and compound classes that show new modes of action to avoid the current resistance mechanisms. In the last decade, the increasing availability of microbial genomes has established genome mining as a very important method for the identification of biosynthetic gene clusters (BGCs) responsible for producing such novel compounds. This talk will present antiSMASH, a Python-based tool to mine microbial genomes for interesting BGCs in order to find new bioactive lead compounds. I will show how Python helped antiSMASH to become the most widely used tool in the field, with over half a million processed jobs and over 2000 weekly users on the public web service alone. I will also present the up- and downsides of running a complex analysis pipeline as a web service in Python. |
12:00-13:45 | Lunch / Poster Session | |||
13:45-14:35 | Keynote 3: Maria Rodriguez Martinez | IBM, Zurich, Research Laboratory | Artificial Intelligence approaches for personalized medicine. | In recent years, AI has become a very active field in computer science and models with astounding performances in a broad area of applications such as computer vision, speech recognition and natural language processing have been developed. In computational biology, the recent availability of large amounts of data generated by large international consortia together with technical developments facilitating the implementation and training of more performant models have made possible the broad application of deep learning and machine learning approaches to a vast set of problems.
In this talk, I will present current activities at the Computational Systems Biology group in IBM Research, Zurich, that illustrate the application of AI approaches to unravel disease mechanisms and develop personalized patient models. For instance, I will show how models for text ingestion can be used to automatically extract knowledge from biomedical publications and obtain comprehensive maps of molecular interactions. I will also show how multi-modal neural networks can be trained to ingest disparate data types, such as compound molecular structures, transcriptomic data and prior molecular knowledge, to predict drug sensitivity in cancer cell lines. Finally, I will illustrate how deep learning models can be adapted to characterize tumor heterogeneity in single-cell data. |
14:35-15:05 | Franciska Oschman | ETH Zurich | Applications of machine learning in research | Within the last 20 years, machine learning (ML) has experienced a boost in its impact on our daily lives. With the help of supervised and unsupervised methods, tasks like computer vision and the recognition of speech or text have been revolutionized. Due to this high impact of ML, ongoing research focuses on the constant improvement of these methods. However, ML is not exclusively the subject of research, but can also be used as a tool for the investigation of research questions. For example, ML is used to uncover hidden patterns in experimental data that are not detectable with either the human eye or standard statistical methods, or to train machines so that they can take over repetitive tasks like object recognition. In this talk I will present current applications of ML in research from different domain sciences. I will focus on Python-based techniques for data preparation and analysis applying both standard ML methods and state-of-the-art implementations of deep neural networks. |
15:05-15:25 | Poster Session / Coffee Break | |||
15:25-15:55 | David Marcus | GlaxoSmithKline | Explore, exploit, and extrapolate: How AI-driven SAR navigation facilitates lead optimisation in drug discovery | Small molecule drug discovery involves a complex multi-parameter optimisation process with cycles of design, make and test to establish a desired compound profile. Within this context, machine learning methods, experimental design and de-novo structure generation have all found a place to facilitate and accelerate lead optimisation. However, they have tended to be used in a reactive manner, to address problems posed by program teams rather than as a continual and proactive process. In this presentation we will describe how data-driven cheminformatics-based AI methods have the potential to automate parts of the lead optimisation process which historically has been a very time-consuming task. This strategy adds the ability to explore and exploit multiple paths within chemical space and suggest structural modifications that will gain better understanding of the SAR problem at hand. In addition, by continually improving computational models we can extrapolate to new regions of chemical space and suggest novel compounds. The implications of automation for the human-machine interface will be explored and illustrated with examples from BRADSHAW, GSK's experimental automated design environment. |
15:55-16:45 | Keynote 4: Michał Januszewski | Google Research | Reconstruction of neural wiring diagrams from large-scale volume EM data using Python and machine learning | Dense, synaptic-level mapping of neural circuits requires tracing neurons within volume electron microscopic datasets. With single volumes now reaching the scale of petabytes, manual approaches are no longer viable and automation becomes a necessity. Within the Connectomics team at Google (https://ai.google/research/teams/perception/connectomics/) we have developed infrastructure and algorithms for storage, analysis, and visualization of such large volumetric datasets. Much of this software stack is open source and implemented in Python. In the talk, I will describe how we use this reconstruction pipeline to convert series of EM images into brain connectivity diagrams, and how modern machine learning techniques based on deep neural networks allow us to solve classification and segmentation problems in such datasets. |
16:45-17:00 | Conference Closing | |||
Day 1 will consist of workshops and will take place at the University of Basel. The room locations are:
Day 2 will consist of keynotes and talks and will take place at the Roche Viaduktstrasse amphitheater (close to the Basel SBB station).
María Rodríguez Martínez did her undergraduate studies in Physical Sciences at Universidad Complutense de Madrid. She then did her PhD in Theoretical Cosmology, at the Institut d’Astrophysique de Paris. Her PhD research focused on developing cosmological models of the early evolution of the universe with additional spatial dimensions. After completing her PhD, she moved to the Hebrew University in Jerusalem, where she focused on setting astrophysical bounds to theories that break Lorentz symmetry using the high-energy emissions from Gamma-Ray Bursts, extremely powerful explosions of gamma rays coming from outside our galaxy. In 2007, she transitioned into the field of Systems Biology as a postdoc at the Weizmann Institute of Science in Rehovot (Israel). Her research was devoted to the development of quantitative descriptions of biological networks and the complex interactions within. In 2009, she moved to Columbia University where she developed quantitative models to understand cancer gene dysregulation. María joined IBM as a Research Staff Member in November 2013. Her research at IBM focuses on integrating different high-throughput molecular datasets in order to build comprehensive molecular models of disease that can help clinicians to provide better diagnoses and suggest personalized therapies.
After obtaining a degree in bioinformatics from the computer science faculty at the University of Tübingen in 2009, Kai switched to the Institute for Microbiology and Infection Medicine at the same university to obtain his PhD. During his PhD project, he co-developed the antiSMASH genome mining tool, initially released in 2010. Building on the software engineering knowledge he gained working on Open Source software projects since his undergrad days, Kai has focused on developing translational bioinformatics tools and databases. During his first post-doc at the Max Planck Institute for Biology of Ageing in Cologne, Germany, he worked on the doRiNA database of RNA interactions in post-transcriptional regulation. Returning to natural products research, Kai joined the Novo Nordisk Foundation Center for Biosustainability (NNFCFB) at the Technical University of Denmark, in Lyngby, Denmark. In addition to returning to antiSMASH development, he has driven the development of the antiSMASH database, the CRISPR/Cas sgRNA design tool CRISPy-web, and a number of smaller software projects. Kai is currently working as a Researcher at the NNFCFB, where he heads the bioinformatics group of the New Bioactive Compounds section run by Sang Yup Lee and Tilmann Weber.
Michał Januszewski is a Staff Software Engineer at Google Research in Zürich, where he currently works on automated methods for high-throughput synaptic-resolution brain mapping. Prior to Google, Michał did research in the field of Computational Fluid Dynamics. He holds a PhD in Physics from University of Silesia in Katowice, Poland.
After getting a Ph.D. in Theoretical Chemistry in the group of Roald Hoffmann at Cornell University and doing a post-doctoral fellowship in Aachen, Germany, Greg moved to California, where he worked at a couple of startups and started applying machine learning to life-sciences problems. In 2006 he moved to Basel to work at the Novartis Institutes for BioMedical Research. Greg spent about five years in the computer-aided drug design group and then shifted to research IT to head the group responsible for chemistry software and data systems. Eventually he was also responsible for an internal initiative to integrate internal and external biological and chemical data and make it available to researchers. He now splits his time between KNIME and T5 Informatics, a consulting company providing support and services related to the open-source cheminformatics toolkit RDKit. Greg has been using Python together with C++ to solve scientific problems for 20 years now.
Title | Workshop Organizer | Affiliation |
---|---|---|
Deep Learning for Natural Language Processing in Pharma and Biomedical Applications | Diego Saldana | Roche |
Genomic Data Analysis with Pandas | Maryam Zaheri and Carsten Magnus | University of Zurich (UZH) |
Bayesian Inference with Python | Elizaveta Semenova | AstraZeneca |
Tutorial on interpretability in machine learning for Computational Biology | An-Phi Nguyen | IBM |
Cheminformatics workshop from QSAR to DNN using RDKit | David Marcus | GlaxoSmithKline |
Snakemake for Reproducible Analyses | Romain Feron and Amina Echchiki | University of Lausanne (UNIL) |
Building pharmaceutical web applications with Dash | Rafal Chojnacki and Agata Figas | Roche |
Diego Saldana is a Data Scientist at Roche Personalized Healthcare (PHC). He has developed models to perform various tasks and analyze diverse data sources. Currently his main applications of interest are in oncology and clinico-genomics.
Otto Fajardo works at Roche in the Biometrics department handling Real World and Clinical Data. He uses Python in conjunction with relational databases to perform his job. He authors packages for data handling, most of them internal to Roche but some open source (pyreadstat, pyreadr). He has a background in Neuroscience (PhD) and Dentistry and has been using Python every day for the last 10 years.
Carsten Magnus is a theoretical and computational biologist at the Institute of Medical Virology, University of Zurich. In his research he develops models and methods (mainly implemented in R and Python) to tackle important questions on HIV and Influenza evolution. Check out his webpage for further information.
Maryam Zaheri is a scientific assistant and computational biologist at the Institute of Medical Virology, University of Zurich. She develops models and methods mainly in the fields of metagenomics, detection of drug-resistance mutations, and HIV-related research.
Geoffrey is a computational biologist/bioinformatician with the SIB Swiss Institute of Bioinformatics and the University of Basel. He generally works with biologists engaged in scientific computing at the high-performance computing center (sciCORE), and with the broader research community in data science applications and training.
Elizaveta Semenova is a Post-doctoral Researcher at AstraZeneca working in Bayesian Machine Learning. She has a PhD in Epidemiology/Biostatistics with experience in spatio-temporal modelling, data analysis, theoretical and applied mathematics. Elizaveta is interested in Bayesian statistics, machine learning and probabilistic modelling. She is also a technological innovation and technical education enthusiast.
Matteo is a Research Staff Member in Cognitive Health Care and Life Sciences at IBM Research Zürich. He's currently working on the development of multimodal deep learning models for drug discovery using chemical features and omic data. He also researches in multimodal learning techniques for the analysis of pediatric cancers in a H2020 EU project, iPC, with the aim of creating treatment models for patients. He received his degree in Mathematical Engineering from Politecnico di Milano in 2013. After getting his MSc he worked in a startup, Moxoff spa, as a software engineer and analyst for scientific computing. In 2019 he obtained his doctoral degree at the end of a joint PhD program between IBM Research and the Institute of Molecular Systems Biology, ETH Zürich, with a thesis on multimodal learning approaches for precision medicine.
An-phi is a doctoral student in the Cognitive Health Care and Life Sciences group (IBM Research Zurich) and in the Seminar for Statistics of ETH Zurich. His current research mainly focuses on interpretability for machine learning models with applications to computational biology. From time to time he also wonders about other topics in Machine Learning, Statistics, NLP and Computer Vision. He received his BSc degree in Mathematical Engineering from Politecnico di Milano and his MSc degree in Computational Science and Engineering from ETH Zurich. Before joining IBM Research, he worked in two startups as algorithm/software engineer (Insightness AG, TrueAI Ltd.).
I am a cheminformatician working at GlaxoSmithKline on various machine learning projects in the data and computational sciences department at the GSK R&D Stevenage site, United Kingdom. My background is in Pharmacy and Medicinal Chemistry, with a PhD in Computational Chemistry at the Hebrew University of Jerusalem, followed by two postdocs, at the University of Cambridge and at EMBL-EBI. My main research interests are small-molecule library selection and project design for hit and lead generation, for which I develop tools for compound generation and Active Learning. I am also co-leading the small-molecule predictive modelling centre of excellence within GSK, overseeing model progression and development for small-molecule predictions, from QSAR to DNN models, including commercial platforms as well as in-house tools driven by proprietary data.
Simon Dirmeier is a doctoral student in the Computational Biology Group at D-BSSE, ETH Zurich. His scientific interests revolve mainly around graphical and probabilistic modelling of genetic perturbation screens. Furthermore, he works on scalable methods for analysing large-scale imaging data.
Damian is a Senior Researcher at the Machine Learning and Computational Biology Lab at ETH Zurich. He develops and implements machine learning models to extract knowledge from patients’ clinical and omics data. His research interests lie in the analysis of complex networks, with a current focus on using electrophysiological recordings to model the connectivity of neuronal cultures.
This site is under construction.
Amina Echchiki is a PhD student in Evolutionary Genomics in Marc Robinson-Rechavi's group at the University of Lausanne. She enjoys understanding and benchmarking methods for large-scale genomic data analysis, learning new approaches and implementing solutions for biological questions. Her current research focuses on the analysis of a basal chordate species. As a life science trainer in SIB and ELIXIR, she organizes, leads and teaches institutional courses and workshops. Twitter: @aechchiki.
Romain Feron is a PhD student in Evolutionary Bioinformatics in Robert Waterhouse's group at the University of Lausanne. He investigates the relationship between sequence conservation and function in arthropods using multispecies whole genome alignments. In this context, he's been working with Python and implementing pipelines with Snakemake for analysis of genomic data. Twitter: @RomainFeron
Roche/Genentech is offering to cover travel and accommodation expenses for two conference attendees to support diversity in the open source Python healthcare community. Eligible candidates must be currently enrolled as students or work for a non-profit organization.
Update: Two candidates have been selected and applicants have now been informed of the panel's decision.
Get in touch if you have questions or would like to learn more about PyPharma.
We have an active organizing community on Slack with members from many companies and institutions in the pharmaceutical industry and academia. Contact us if you're interested in joining our community.
Alternatively, you can sign up to our mailing list using the form below.