Intelligent Design and Discovery of Molecules
Novel molecules and materials are traditionally developed through experimental screening or a priori theoretical calculations that relate molecular structure to desired properties. However, these approaches are expensive, and shortening the timescale of discovery requires a rapid screening methodology. This proposal will develop an artificial intelligence (AI) framework to rapidly explore chemical space and discover molecules with the properties required for a specific application. To this end, the study has two main objectives. The first is to develop quantitative structure-property relationship (QSPR) models that predict how molecular structure affects molecular properties. The second is to use variational autoencoders and adversarial training to design new molecules with desired properties.
Artificial intelligence inspired design of non-Hermitian systems
Aim: to develop generative deep learning models to offer on-demand optical functionalities of non-Hermitian structures
Develop GAN models to solve complex inverse scattering problems in non-Hermitian systems;
Develop CycleGAN models to retrieve the structure geometry that produces a desired far-field pattern;
Develop variational autoencoder (VAE) architectures for the inverse design of non-Hermitian spatial filters.
Coupling Encoder-Decoder Representational Learning Architecture with Generative Adversarial Modeling for Mitigating Drug Resistance Against Malaria
Malaria is a global health burden, and drug resistance is a major hurdle preventing early, effective treatment. Malaria parasites show considerable heterogeneity in their gene expression programs when exposed to artemisinin, the most effective antimalarial drug. The cellular mechanisms of artemisinin tolerance are not fully understood. We aim to unravel the mechanisms of drug resistance through machine learning (ML) and artificial intelligence (AI)-driven modeling of transcriptional landscapes at the single-cell level. In a next phase, this project can naturally be extended to data-driven ML modeling of Plasmodium responses to other antimalarial drugs, and of other protists, bacteria, or fungi exposed to anti-infective compounds.
IMCP: An Efficient In-Memory Convolutional Neural Network (CNN) Processor
Even though CNNs achieve great success in performing complex intelligence tasks, this success comes at an overwhelming computational cost. Modern CNN architectures can stack hundreds of convolution layers and perform billions of operations for a single input. To meet the unique needs of AI-based systems, various architectures have been proposed as possible contenders, such as GPGPUs or application-specific processors like Google TPU and Intel Movidius. These architectures focus on massive parallelism to address the computational needs; however, the issue of memory access remains largely unaddressed. As CNN complexity increases, the architecture becomes more memory-dominated, since the total number of weights, activations, and operations increases proportionally, leading to more data accesses and an associated increase in energy consumption. To address these challenges, we propose an In-memory CNN Processor (IMCP) for CNN-based AI applications. The IMCP combines processing and storage in the same location by following the principles of associative computing, where vector operations are performed as bitwise operations inside the memory. The architecture inherently lends itself to variable bit-width computing and exploits bit-level sparsity, giving a level of control over resources that traditional processors do not offer.
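The idea of performing vector operations as bitwise operations can be illustrated with a small software emulation. The sketch below is not the IMCP design; it only assumes that operands are stored as bit-planes (one plane per bit position, covering all rows) and shows a bit-serial addition in which each step uses only AND, OR, and XOR on whole planes. All function names are hypothetical.

```python
def to_bit_planes(values, width):
    """Store unsigned ints column-wise: plane i holds bit i of every row."""
    planes = []
    for i in range(width):
        plane = 0
        for row, value in enumerate(values):
            plane |= ((value >> i) & 1) << row
        planes.append(plane)
    return planes


def from_bit_planes(planes, n_rows):
    """Reassemble one integer per row from the bit-planes."""
    return [
        sum(((plane >> row) & 1) << i for i, plane in enumerate(planes))
        for row in range(n_rows)
    ]


def bit_serial_add(a_planes, b_planes):
    """Add two vectors using only bitwise operations, one bit position per step.

    Every step acts on all rows at once, so the step count depends on the
    bit-width (number of planes), not on the vector length.
    """
    carry = 0
    result = []
    for a, b in zip(a_planes, b_planes):
        result.append(a ^ b ^ carry)          # sum bit for every row
        carry = (a & b) | (carry & (a ^ b))   # carry bit for every row
    result.append(carry)                      # final carry-out plane
    return result


a = [3, 5, 7, 0]
b = [1, 2, 7, 15]
width = 4  # assumed operand width; fewer planes would mean fewer steps
total = from_bit_planes(
    bit_serial_add(to_bit_planes(a, width), to_bit_planes(b, width)), len(a)
)
print(total)  # [4, 7, 14, 15]
```

In this emulation the bit-width is just the number of planes processed, which is one way to picture why such an organization lends itself to variable bit-width computing.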
Training Machines to Recognize Reliable Protein-Protein Docking Poses
In this project, we focus on protein-protein docking, a fundamental problem in computational biosciences with numerous applications, such as understanding Alzheimer’s disease or designing enzymes for the biodegradation of pollutants in the environment. Reliably predicting the 3D structure of protein-protein complexes by molecular docking is still an open challenge. One of the critical steps is scoring, i.e., the ability to discriminate between correct and incorrect solutions within a wide pool of generated 3D poses. Motivated by this, we plan to use machine learning methods to classify protein-protein docking poses as correct or incorrect. We claim that classification accuracy can be improved significantly by improving the characteristics of the training data. We will improve the balance of the training set using SMOTE and GANs, and its variance and size using the Snorkel technique. Our methods will be applicable to life sciences and bioengineering.
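To make the rebalancing step concrete, here is a minimal sketch of the SMOTE idea: a synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbors. The feature vectors and the parameters `n_new`, `k`, and `seed` are assumptions for illustration; the project's actual pose descriptors and tooling are not specified in the text.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic samples from a list of minority-class vectors."""
    rng = random.Random(seed)

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the chosen one.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sq_dist(base, minority[j]))
        neighbor = minority[rng.choice(others[:k])]
        # Interpolate at a random position between the sample and its neighbor.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbor)])
    return synthetic


# Hypothetical 2-feature descriptors of "correct" poses (the rare class).
correct_poses = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
extra = smote(correct_poses, n_new=10)
print(len(extra), extra[0])
```

Because every synthetic point lies between two existing minority samples, the method adds no information outside the region the minority class already occupies, which is one reason the text pairs it with other techniques.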
Fish Behavior Recognition and Data-driven Control for Smart Aquaculture
Fisheries play a vital role in the global food supply and are becoming key components of national economies. Optimal management of fish production is therefore crucial to the economic efficiency of aquaculture. Fish growth rate is known to depend on external environmental conditions, such as food and water quality, but also on the fish’s internal physiological and health state. This project aims to study the interaction between these internal and external factors and to predict the effect of fish behavior on production quantity and quality. Information on fish behavior feeds the control and monitoring system, which controls the feeding process, regulates the water temperature, monitors the oxygen concentration, and so on. Overfeeding not only wastes feed but also degrades water quality and increases routine maintenance and cleaning costs. In an ideal system, feed delivery would be optimized in response to specific fish behaviors indicative of hunger or energy levels.
Inverse CFD design using a Deep Network Surrogate (CFD-Net)
Problem we are facing: Because of the massive computational burden of a typical inverse CFD design process, searching over a wide variety of input geometries to optimize a payoff function (e.g., drag) is infeasible.
How we aim to tackle it: To reduce the search space for new optimized shapes, we leverage the universal approximation property of deep neural networks to estimate a differentiable surrogate of the CFD forward simulation: CFD-Net.
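The surrogate idea can be sketched on a toy problem. In the example below, `toy_drag` is a made-up one-parameter function standing in for an expensive CFD run, the surrogate is a tiny tanh network trained with hand-written backpropagation, and the design step follows the surrogate's gradient with respect to the shape parameter. None of this reflects the actual CFD-Net architecture; names, sizes, and learning rates are illustrative assumptions.

```python
import math
import random


def toy_drag(x):
    """Hypothetical stand-in for an expensive CFD run (not a real solver)."""
    return (x - 0.3) ** 2 + 0.1


def init_surrogate(hidden=8, seed=0):
    rng = random.Random(seed)
    return {
        "w1": [rng.uniform(-1, 1) for _ in range(hidden)],
        "b1": [rng.uniform(-1, 1) for _ in range(hidden)],
        "w2": [rng.uniform(-1, 1) for _ in range(hidden)],
        "b2": 0.0,
    }


def hidden_layer(p, x):
    return [math.tanh(w * x + b) for w, b in zip(p["w1"], p["b1"])]


def predict(p, x):
    return sum(v * h for v, h in zip(p["w2"], hidden_layer(p, x))) + p["b2"]


def train(p, samples, lr=0.02, epochs=500):
    """Plain SGD on squared error with hand-written backpropagation."""
    for _ in range(epochs):
        for x, y in samples:
            h = hidden_layer(p, x)
            err = sum(v * hj for v, hj in zip(p["w2"], h)) + p["b2"] - y
            for j, hj in enumerate(h):
                back = err * p["w2"][j] * (1 - hj * hj)  # gradient at hidden unit j
                p["w2"][j] -= lr * err * hj
                p["w1"][j] -= lr * back * x
                p["b1"][j] -= lr * back
            p["b2"] -= lr * err


def grad_input(p, x):
    """Derivative of the surrogate output with respect to the design variable."""
    return sum(
        v * (1 - math.tanh(w * x + b) ** 2) * w
        for w, b, v in zip(p["w1"], p["b1"], p["w2"])
    )


def design(p, x0, steps=100, lr=0.1):
    """Gradient descent on the surrogate, halving the step until it improves."""
    x = x0
    for _ in range(steps):
        g, step = grad_input(p, x), lr
        while step > 1e-8:
            candidate = min(1.0, max(0.0, x - step * g))  # stay in sampled range
            if predict(p, candidate) < predict(p, x):
                x = candidate
                break
            step /= 2
    return x


surrogate = init_surrogate()
train(surrogate, [(i / 20, toy_drag(i / 20)) for i in range(21)])
x_start = 0.9
x_best = design(surrogate, x_start)
print(x_best, predict(surrogate, x_best), toy_drag(x_best))
```

The point of the sketch is that each design step costs one cheap surrogate evaluation and one analytic gradient, rather than a new forward simulation; how well the result transfers depends on how accurately the surrogate fits the true solver.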
McLaren and Boeing have already expressed their interest in working with us on this line of research.
Glycosylation is a post-translational modification widely implicated in structural and functional attributes of the cell. Changes in glycosylation patterns are associated with invasiveness, acquisition of virulence features promoting metastasis, and epithelial-mesenchymal transition in a wide range of solid tumors. Hence, investigating glycan diversity across the functional and developmental hierarchy of cancer classifications points to the great potential of glycans as diagnostic and prognostic biomarkers.
In this proposal, we follow the hypothesis that glycosyltransferases (GTs) play a crucial role in cancer formation and metastasis, and by investigating the regulation of GTs at various levels, we anticipate that they can be used as cancer biomarkers able to decipher cancer hierarchies. We thus propose to develop the first multi-task learning framework for simultaneous GT-based prediction of cancer/normal, cancer type, subtype, and survival.
Algorithmic, Systems and Privacy Aspects of Split Learning
Federated learning (FL) is a new machine learning setting introduced in 2016 in a sequence of papers resulting from a collaboration between a Google team led by Brendan McMahan and Peter Richtarik’s group.
Key idea: Many clients (e.g., mobile phones, IoT devices or organizations) collaboratively train a machine learning model under the orchestration of a central trusted server, while keeping the training data stored on the client devices in a decentralized fashion in order to protect privacy.
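A minimal sketch of this key idea, assuming the simplest possible setting: each client fits a one-parameter linear model on its own data, and the server only ever sees model weights, which it averages in proportion to each client's sample count (federated averaging). The data, learning rate, and round counts are made up for illustration.

```python
def local_update(w, data, lr=0.01, steps=5):
    """A few gradient steps on the client's private data (model: y = w * x)."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def fed_avg(client_data, rounds=30):
    """Server loop: broadcast w, collect local models, average by data size."""
    w = 0.0
    total = sum(len(d) for d in client_data)
    for _ in range(rounds):
        local_models = [(local_update(w, d), len(d)) for d in client_data]
        w = sum(w_i * n for w_i, n in local_models) / total
    return w


# Three hypothetical clients; every point satisfies y = 2 * x.
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
    [(0.5, 1.0), (1.5, 3.0), (2.5, 5.0)],
]
w_global = fed_avg(clients)
print(w_global)
```

Note that the raw `(x, y)` pairs never leave `local_update`; only the scalar weight is communicated, which is the property the orchestration scheme relies on.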
Split learning (SL) is a new federated learning tool developed specifically for training deep neural networks.
Key idea: Cut the network model to be trained at a certain layer, splitting it into two parts: one stored on the clients and one stored on a server.
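The cut can be sketched with a two-weight toy network. In the example below, the client owns the part before the cut and the server owns the part after it; the only values exchanged are the activation at the cut layer and the gradient with respect to it. This is a simplified variant that assumes labels are shared with the server, and all class names, weights, and data are hypothetical.

```python
import math


class Client:
    """Holds the layers before the cut and the raw inputs."""

    def __init__(self, a=0.5):
        self.a = a

    def forward(self, x):
        self.x = x
        self.h = math.tanh(self.a * x)
        return self.h  # only this activation is sent to the server

    def backward(self, grad_h, lr):
        # Chain rule through tanh, using the gradient returned by the server.
        self.a -= lr * grad_h * (1 - self.h ** 2) * self.x


class Server:
    """Holds the layers after the cut; never sees the raw input x."""

    def __init__(self, b=0.5):
        self.b = b

    def train_step(self, h, y, lr):
        err = self.b * h - y      # derivative of 0.5 * err**2 w.r.t. the prediction
        grad_h = err * self.b     # gradient with respect to the received activation
        self.b -= lr * err * h    # update the server-side weight
        return grad_h             # sent back to the client


def mean_loss(client, server, data):
    return sum(
        0.5 * (server.b * math.tanh(client.a * x) - y) ** 2 for x, y in data
    ) / len(data)


# Toy data generated from hypothetical "true" weights a = 1.5, b = 2.0.
data = [(x / 4, 2.0 * math.tanh(1.5 * x / 4)) for x in range(-4, 5)]
client, server = Client(), Server()
loss_before = mean_loss(client, server, data)
for _ in range(200):
    for x, y in data:
        h = client.forward(x)
        grad_h = server.train_step(h, y, lr=0.05)
        client.backward(grad_h, lr=0.05)
loss_after = mean_loss(client, server, data)
print(loss_before, loss_after)
```

The two messages per sample (`h` going up, `grad_h` coming back) are what communication compression would target, and they are also the surface an attacker could observe.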
Enhance split learning with communication compression
Develop attacks on split learning systems and measures to mitigate such attacks
Develop a practical split learning system for edge devices
Neural Visiolingual Editor
Learning to Edit 3D Structural Variations with Language Guidance Using Neural Networks
Goal: Starting from a blank slate or an initial 3D model loaded from disk, a user can iteratively refine the model by issuing natural-language commands. We consider and exploit synergies between two geometric 3D application domains: furniture and whole-cell modeling.
Reinforcement Learning and Path Planning for Urban Air Mobility
We focus on the development of safe guidance strategies for aerial vehicles capable of routine and emergency maneuvers, using recent progress in reinforcement learning, graph neural networks, and transfer learning. Our study will complement the original automaton concept with “emergency maneuvers” adapted to the various kinds of failures that the flying machine can encounter. We will formulate trajectory planning as a reinforcement learning problem. In this context, we will explore how value functions learned in a context-free environment may be properly adapted and used in operational, obstacle-ridden environments with risk-prone and risk-free areas. One targeted application is urban air mobility.
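As a small illustration of planning with a value function over risk-prone and risk-free areas, the sketch below uses value iteration on a hand-made grid. Value iteration is a dynamic-programming stand-in chosen here so the example is deterministic; it is not the learning method proposed above. The grid layout, step cost, and risk penalty are assumptions.

```python
GRID = [
    "S...",
    ".#R.",
    ".#..",
    "...G",
]  # S start, G goal, # obstacle, R risk-prone cell
STEP_COST, RISK_COST = 1, 5


def cells_of(kind):
    return {(r, c) for r, row in enumerate(GRID) for c, ch in enumerate(row) if ch == kind}


OBSTACLES, RISKY = cells_of("#"), cells_of("R")
START, GOAL = next(iter(cells_of("S"))), next(iter(cells_of("G")))
FREE = {(r, c) for r in range(len(GRID)) for c in range(len(GRID[0]))} - OBSTACLES


def neighbors(cell):
    r, c = cell
    moves = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [m for m in moves if m in FREE]


def reward(cell):
    """Reward for entering a cell: a step cost plus a penalty in risky areas."""
    return -(STEP_COST + (RISK_COST if cell in RISKY else 0))


def value_iteration():
    """Repeat Bellman updates until the value table stops changing."""
    values = {cell: 0 for cell in FREE}
    while True:
        updated = {
            cell: 0 if cell == GOAL else max(reward(n) + values[n] for n in neighbors(cell))
            for cell in FREE
        }
        if updated == values:
            return values
        values = updated


def greedy_path(values):
    """Follow the best one-step lookahead from the start until the goal."""
    path = [START]
    while path[-1] != GOAL:
        path.append(max(neighbors(path[-1]), key=lambda n: reward(n) + values[n]))
    return path


values = value_iteration()
path = greedy_path(values)
print(path)
```

Changing `RISK_COST` changes how far the planned route is willing to detour around the risky cell, which is one simple way to picture adapting a value function to an environment with both kinds of areas.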
“Next-Gen” Mapping and Monitoring of Coral Reefs with SLAM and SfM
Traditional approaches to characterizing and quantifying benthic communities and their changes rely on 1D or 2D surveys conducted by divers within a prescribed area. Only a limited number of survey sites and a relatively small portion of a reef surface can be characterized in a typical day of work, and the identification of key invertebrates relies on experts and is notoriously time-consuming. The Red Sea Development Company (TRSDC) is one of Saudi Arabia’s premier “gigaprojects”. It aims to develop a world-class, hyper-luxury resort complex in the Al-Wajh lagoon, a unique site featuring rich and complex framework geological and biological habitats. As the TRSDC is committed to protecting the natural environment, surveys of the site’s coral reefs are planned at an unprecedented scale, and new challenges arise in terms of data acquisition and analysis. The overarching project objective is to develop a framework for large-scale, semi-autonomous implementation of Simultaneous Localization and Mapping (SLAM) and Structure from Motion (SfM) of coral reefs in order to: 1) establish a large-scale, high-resolution 3D quantitative baseline assessment of the coral community; 2) develop a mechanism for high-resolution detection of change in various environmental indicators; 3) pioneer a large-scale demonstration of automation of the various components (surveys, coral identification, and 3D reconstruction). These goals will be attained through a combination of field surveys, supervised classification of videos, SfM to create 3D reconstructions of reefs and individual corals, and SLAM with autonomous underwater vehicles (AUVs). The key innovation will be to test the capacity of the AUV to capture an SfM dataset. In addition to reconstructing the 3D mesh, machine learning will be used to gradually automate the identification of corals within a reconstruction.
Repeated surveys of a site (aided by SLAM and the installation of permanent guidance “beacons”) will provide unprecedented scientific insights, such as colony-specific growth rates or the ability to detect subtle early warning signs in coral health.
Machine Learning for Science: Learning Representations and Governing Equations from Scientific Data
The biological problem, with implications for stem cell therapy and regeneration: In theory, human induced pluripotent stem cells (hiPSCs) can differentiate into hematopoietic stem cells (HSCs), the most widely used cell type in cell therapy. However, de novo generation of HSCs from hiPSCs has not been possible. This is due to the lack of a holistic understanding of the molecular differences between authentic HSCs and hiPSC-derived hematopoietic progenitors (hiPSC-HPCs). Here, we take a systems approach that enables the identification of gene networks differentially regulated between hiPSC-HPCs and endogenous HSCs through single-cell transcriptomic analysis. We generate single-cell RNA-seq data from CD34+CD43+ hiPSC-HPCs and human cord blood hematopoietic stem cells. Publicly available single-cell RNA-seq data from early human embryos will be used together with our own data to construct a developmental trajectory of human hematopoiesis and to understand where hiPSC-HPCs fall along this trajectory.
The computational challenge: In essence, we would like to control and steer a temporal process (molecular reprogramming) for which we do not have the governing equations. We aim to develop a fundamental method to address this problem, one with broad applicability in scientific domains where there is limited or no access to the governing equations. (1) We will use representation learning, specifically a variational autoencoder (VAE), to embed the time-stamped data in a latent space. (2) Using the latent space, we will interpolate the trajectories and attempt to predict them, i.e., perform curve continuation towards the desired endpoint (the target of the control problem). (3) Finally, we will decode the full temporal trajectory from latent space back to gene-expression space. This construction will constrain a formulation of phenomenological equations capturing the hidden (latent) dynamics of the process. Furthermore, we will estimate the RNA velocity and its latent projection in order to facilitate the temporal curve prediction in latent space.
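The three steps can be pictured with a deliberately simplified sketch. Here a fixed linear encoder and decoder stand in for a trained VAE, and curve continuation is plain linear extrapolation of the last latent step; both are assumptions made to keep the example short, and the numbers are made up rather than taken from any expression data.

```python
# Hypothetical linear stand-in for a trained VAE: 3 "genes", 2 latent dimensions.
DECODER = [(1.0, 0.0), (0.0, 0.6), (0.0, 0.8)]  # columns are orthonormal


def encode(x):
    """Step (1): map an expression vector to latent space (here: D^T x)."""
    return [sum(row[k] * xi for row, xi in zip(DECODER, x)) for k in range(2)]


def decode(z):
    """Step (3): map a latent point back to expression space (here: D z)."""
    return [sum(w * zk for w, zk in zip(row, z)) for row in DECODER]


def continue_curve(latent_points, n_steps):
    """Step (2): extend the latent trajectory by linear extrapolation."""
    points = [list(p) for p in latent_points]
    for _ in range(n_steps):
        last, prev = points[-1], points[-2]
        points.append([2 * a - b for a, b in zip(last, prev)])
    return points


# Time-stamped toy samples at t = 0, 1, 2 (made-up numbers, not real data).
observed = [decode([t, 2.0 * t]) for t in range(3)]
latent = [encode(x) for x in observed]
extended = continue_curve(latent, n_steps=1)
predicted = decode(extended[-1])
print(predicted)
```

In the actual method, the extrapolation rule would be replaced by whatever latent dynamics are learned or constrained, for example with the help of the projected RNA velocity mentioned above.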