Machine Learning for Science: Learning Representations and Governing Equations from Scientific Data

Location
KAUST

Abstract

In essence, science is about discovering regularities in Nature. It turns out that such regularities (laws) are written in the language of mathematics. For example, fundamental equations such as Maxwell's and Newton's, together with mean-field and renormalization techniques in statistical mechanics, enable deep understanding, often in combination with computational modeling and simulation. In many cases, laws can be formulated and refined from fundamental "first principles," such as the variational principle of least action in mechanics. Yet in phenomenological areas, including biology, we have data but neither "first principles" nor fundamental equations. Machine learning, and deep learning in particular, is remarkably successful in classification and prediction tasks. However, when trained on data, such systems do not, as a rule, provide compact mathematical laws or fundamental first principles. Here we ask how we can learn representations that support the identification of interpretable, compact mathematical laws from complex data sets when we do not have access to first principles. I will give an overview of this problem and present some vignettes from our ongoing work on it.

Brief Biography

He joined KAUST as a Professor in 08/2016 and is affiliated with the Bioscience, Computer Science, Statistics, and Bioengineering programs. He has received best-teaching honors for teaching physiology and neuroscience to medical students at the Medical School of Karolinska Institutet, and machine learning, computational mathematics, and computational neuroscience to engineering students at the Royal Institute of Technology. At KAUST, he teaches "Computational Bioscience and Machine Learning (B322)" and "Machine Learning for Genomics and Health (B324)," with upcoming classes in "Geometric Machine Learning and Biological Networks (B326)" and "Fundamental Skills in Bioinformatics (B200)." He was nominated for the 2020 KAUST Distinguished Teaching Award.

He has received awards and distinctions for his leadership and serves on the editorial boards of several international journals. He has published more than 300 papers (cited more than 15,000 times; h-index > 50). He has founded two BioIT companies, consulted for biotech startups, and collaborated with several major pharmaceutical companies.

He holds three distinct undergraduate degrees: Medicine (Medical School), Pure Mathematics (minor in Theoretical Physics), and Theoretical Philosophy (minor in Psychology). He has also completed two years of postgraduate education in Pure and Applied Computational Mathematics, including machine learning and neural networks. He defended an experimental MD/Ph.D. thesis in Neuroscience/Medicine at Karolinska Institutet in 09/1997. He was appointed Assistant Professor of Computer Science at the Royal Institute of Technology (1998-2002) but was on leave of absence as a Visiting Scientist (as a five-year Wennergren Fellow) and an Alfred P. Sloan Fellow (Boston, USA, 1998-2001). In 02/2002, he was recruited to the first Chaired Full Professorship in Computational Biology in Sweden and as Head of the Division for Computational Biology within the Department of Physics. In 2009, he was appointed Director of the Unit for Computational Medicine. He received a Lifetime Named Strategic Professorship in Computational Medicine, a joint appointment between Karolinska Institutet and Karolinska University Hospital at the Center for Molecular Medicine. He was an ERC co-investigator (2012-2017) on causal discovery. In the 2012 external research evaluation of 500 investigators at Karolinska Institutet and Karolinska University Hospital, he was ranked outstanding (the highest distinction, top 5%). In 06/2014, he was named a faculty member at the Science for Life Laboratory, a National Center for Molecular Biosciences in Stockholm, Sweden.

Two core questions define his research. The first is how to "construct" general reasoning (intelligent) systems that augment the scientific discovery of the equations of Nature. This translates into the algorithmic construction of an "artificial scientist" that builds, in a data-driven way, generative, equation-based, explainable models of its "world" of observations and data. The second is how to "understand" living systems as one emergent aspect of matter. Here, his group targets cells as the natural building blocks of living systems. The group develops algorithms and theory, and generates high-resolution data (single-cell genomics), to decode and learn data-driven models of cellular networks, with the aim of a general field theory for non-equilibrium, dissipative, nonlinear cellular systems. This multidisciplinary expertise has enabled close collaboration with clinical scientists on translational questions over the last 15 years (more than 100 publications in this area).
