Machine Learning for Science: Learning Representations and Governing Equations from Scientific Data

The biological problem with implications for stem cell therapy and regeneration: In theory, human induced pluripotent stem cells (hiPSCs) can differentiate into hematopoietic stem cells (HSCs), the most widely used cell type in cell therapy. However, de novo generation of HSCs from hiPSCs has not been possible so far. This is due to the lack of a holistic understanding of the molecular differences between authentic HSCs and hiPSC-derived hematopoietic progenitor cells (hiPSC-HPCs). Here, we take a systems approach that enables the identification of gene networks differentially regulated between hiPSC-HPCs and endogenous HSCs through single-cell transcriptomic analysis. We will generate single-cell RNA-seq data from CD34+CD43+ hiPSC-HPCs and from human cord blood HSCs. Publicly available single-cell RNA-seq data from early human embryos will be used together with our own data to construct a developmental trajectory of human hematopoiesis and to determine where hiPSC-HPCs fall along this trajectory.

The computational challenge: In essence, we would like to control and steer a temporal process, molecular reprogramming, for which we do not have the governing equations. We aim to develop a fundamental method for this problem that is broadly applicable in scientific domains where there is limited or no access to the governing equations. We will (1) use representation learning to embed the time-stamped data in a latent space. To this end we will use a variational autoencoder (VAE). (2) In the latent space we will interpolate the trajectories and try to predict them forward, i.e. perform curve continuation towards the desired endpoint (the target of the control problem). (3) Finally, we will decode the full temporal trajectory from latent space back to gene-expression space. This construction will constrain the formulation of phenomenological equations that capture the hidden (latent) dynamics of the process. Furthermore, we will estimate the RNA velocity and its projection into latent space in order to facilitate the prediction of the temporal curve in latent space.
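
Step (2) can be illustrated with a minimal sketch. It assumes the cells have already been embedded, so each time point is represented by a latent vector; the VAE encoder and decoder of steps (1) and (3) are not implemented here. Piecewise-linear interpolation and linear continuation are stand-ins chosen for brevity, not the method the project will necessarily use, and the names interpolate_latent and distance_to_target as well as the toy numbers are hypothetical.

```python
from bisect import bisect_right
from math import dist
from typing import List, Sequence

Vector = List[float]


def interpolate_latent(times: Sequence[float],
                       latents: Sequence[Vector],
                       t: float) -> Vector:
    """Evaluate a piecewise-linear latent trajectory at time t.

    Inside the observed time range this interpolates between the two
    neighbouring time points. Outside the range it continues the nearest
    segment linearly, a crude form of curve continuation.
    Assumes `times` is strictly increasing.
    """
    if len(times) != len(latents) or len(times) < 2:
        raise ValueError("need at least two time points, one latent vector each")
    # Index of the segment [times[i], times[i + 1]] used for evaluation,
    # clamped so that out-of-range times reuse the first or last segment.
    i = bisect_right(times, t) - 1
    i = max(0, min(i, len(times) - 2))
    t0, t1 = times[i], times[i + 1]
    w = (t - t0) / (t1 - t0)  # w > 1 or w < 0 means extrapolation
    return [a + w * (b - a) for a, b in zip(latents[i], latents[i + 1])]


def distance_to_target(z: Vector, target: Vector) -> float:
    """Euclidean distance between a latent point and the desired endpoint."""
    return dist(z, target)


# Toy example: three time-stamped latent points in a 2-D latent space.
times = [0.0, 1.0, 2.0]
latents = [[0.0, 0.0], [1.0, 0.5], [2.0, 1.5]]
target = [3.0, 2.5]  # hypothetical latent position of the desired cell state

z_mid = interpolate_latent(times, latents, 0.5)   # interpolation
z_next = interpolate_latent(times, latents, 3.0)  # continuation beyond t = 2
print(z_mid, z_next, distance_to_target(z_next, target))
```

In a fuller pipeline, the slope of the last segment could be replaced or corrected by the RNA velocity projected into latent space, and the continued points would be passed through the decoder to obtain predicted gene-expression profiles.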

Investigator: