Healthcare data often contain an enormous amount of hidden information that can only be extracted once a sufficiently large amount of data has been aggregated. However, since healthcare data are usually generated in a distributed manner and stored at different sites such as hospitals, pharmacies, and biomedical labs, they are extremely difficult to aggregate and share because of privacy concerns. There is therefore an urgent need for effective techniques that gather biomedical data while protecting their privacy. Besides privacy, there is growing concern that ML algorithms for healthcare may introduce unfairness and bias. To provide sounder solutions to these problems, we propose to develop a set of differentially private and fair learning algorithms for healthcare, which can be used as algorithmic tools both to enhance data-sharing ability and to mitigate bias in healthcare systems. Specifically, we focus on the challenges arising from healthcare data, from complex healthcare tasks, and from distributed environments.
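As a point of reference for what "differentially private learning" can look like in practice, the sketch below shows one widely used building block: clipping per-example gradients and adding Gaussian noise before averaging. It is a minimal illustration only, not the algorithms proposed in this project. The function name `private_mean_gradient` and the parameters `clip_norm` and `noise_multiplier` are assumptions introduced here, and the sketch does not compute the resulting privacy budget.

```python
import numpy as np

def private_mean_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Average per-example gradients after clipping and Gaussian perturbation.

    per_example_grads: array of shape (n_examples, n_params).
    clip_norm, noise_multiplier: hypothetical tuning parameters (assumptions).
    """
    # Bound each example's influence by rescaling its gradient to norm <= clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale

    # Add Gaussian noise calibrated to the clipping bound, then average.
    noise = rng.normal(0.0, noise_multiplier * clip_norm,
                       size=per_example_grads.shape[1])
    return (clipped.sum(axis=0) + noise) / per_example_grads.shape[0]

# Illustrative call on synthetic gradients (not real healthcare data).
rng = np.random.default_rng(0)
grads = rng.normal(size=(8, 3)) * 5.0
noisy = private_mean_gradient(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

In a distributed setting, each site could apply such a step locally so that only perturbed updates leave the site; how the noise scale maps to a formal privacy guarantee depends on an accounting method that this sketch leaves out.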
Variants of first-order SGD methods have been the most popular optimization methods in practical machine learning applications, particularly for deep learning tasks. However, their well-acknowledged shortcomings, including relatively slow convergence, sensitivity to hyper-parameter tuning, stagnation due to ill-conditioning, and difficulty in escaping saddle points in non-convex problems, make them unattractive at scale. Second-order methods, which take the curvature of the loss landscape into account, promise to remedy these shortcomings. The dimension-independent convergence rate of second-order methods and their robustness with respect to the condition number of the Hessian make them appealing for large-scale training tasks and related problems. The main computational obstacles to the successful development of second-order, Newton-type methods for models with large data and parameter dimensions are the operations on the Hessian. While subsampling the Hessian can reduce the cost of handling large data dimensions, storing the formally dense matrix and solving the Newton system are prohibitive for large parameter dimensions. The key to successful Newton methods is to approximate the Hessian in a way that makes its computation, storage, and inversion manageable. In this work, we explore a hierarchical matrix approximation of the Hessian that offers an accuracy-tunable memory footprint of linear complexity. The hierarchical representation allows the construction, update, and matrix-vector multiplication operations on the Hessian to be performed in linear or log-linear complexity, and provides support for efficient Newton-Krylov optimization methods. We plan to demonstrate the effectiveness of the hierarchical matrix representation on a number of representative NN training problems, and to consider the use of the Hessian in assessing the sensitivity of the converged solution.
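To make the role of Hessian matrix-vector products in a Newton-Krylov method concrete, the following is a minimal sketch on a small regularized logistic regression problem. It is not the hierarchical matrix approach described above: the Hessian is applied matrix-free through an exact analytic product, which is the operation a hierarchical approximation would be expected to replace. All function names, the synthetic data, the regularization weight `lam`, and the backtracking line search are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, X, y, lam):
    z = X @ w
    # Numerically stable logistic loss plus an L2 penalty.
    return np.mean(np.logaddexp(0.0, z) - y * z) + 0.5 * lam * (w @ w)

def gradient(w, X, y, lam):
    return X.T @ (sigmoid(X @ w) - y) / len(y) + lam * w

def hessian_vector_product(w, v, X, lam):
    # Applies the Hessian to v without ever forming the dense matrix.
    p = sigmoid(X @ w)
    return X.T @ ((p * (1.0 - p)) * (X @ v)) / X.shape[0] + lam * v

def conjugate_gradient(matvec, b, tol=1e-10, max_iter=100):
    # Krylov solver for matvec(x) = b with a symmetric positive definite operator.
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:
            break
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def newton_krylov(X, y, lam=1e-2, steps=50):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        g = gradient(w, X, y, lam)
        if np.linalg.norm(g) < 1e-8:
            break
        # Solve the Newton system H * step = g using only Hessian-vector products.
        step = conjugate_gradient(
            lambda v: hessian_vector_product(w, v, X, lam), g)
        # Backtracking (Armijo) line search to safeguard the Newton step.
        t, f0 = 1.0, loss(w, X, y, lam)
        while loss(w - t * step, X, y, lam) > f0 - 1e-4 * t * (g @ step) and t > 1e-8:
            t *= 0.5
        w = w - t * step
    return w

# Synthetic problem (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = (sigmoid(X @ true_w) > rng.uniform(size=200)).astype(float)
w_hat = newton_krylov(X, y)
```

The only place the Hessian appears is inside `hessian_vector_product`, so any approximation that supports fast matrix-vector multiplication could in principle be substituted there without changing the outer Newton loop.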
Reverse osmosis (RO) is commonly used to obtain pure water for drinking and industrial purposes. RO plant performance is continuously monitored by tracking pressure drop, permeate flow, and salt passage. These indicators directly reflect growing (bio)fouling, scaling, or membrane deterioration occurring inside the module channels. Numerous input parameters affect the RO (bio)fouling propensity, including membrane properties, feed water quality, and process parameters. Continuous monitoring of feed water quality and its functional parameters is essential for RO plant operations. This work aims to establish deep neural network models to uncover the dependency of RO process performance on various physical, biological, and chemical process parameters. Using different machine learning tools, we will investigate the effects of feed water parameters (conductivity, oxidation-reduction potential, total dissolved solids, turbidity, and chemical oxygen demand) and process parameters (feed pressure, flow rate, and temperature) on RO performance, measured by salt passage, permeate flow rate, crossflow pressure drop, and the pressure difference across membranes. In addition, optical coherence tomography scans of growing (bio)films will be studied with convolutional neural networks, and the direct quantification of the biofilm type and its characteristic effect on filtration input and output parameters will be investigated. The trained networks can then be used in a smart governance framework for efficient real-time performance evaluation and decision-making. As the Kingdom of Saudi Arabia aims to double its desalination capacity to approximately 12 million m3/day within 6 years, relying exclusively on RO, this artificial intelligence framework will have a substantial effect on the cost and environmental impact of desalination.
Over the last decade, hyperspectral imaging has attracted considerable interest in civil, environmental, aerial, military, and biological applications that require the estimation of physical parameters from complex surfaces and the identification, via remote sensing, of complex materials with fine spectral signatures. Despite significant advances, hyperspectral imaging still requires high setup costs, suffers from slow data processing, and needs substantial computational resources to post-process the large volumes of data generated. This project addresses these issues by implementing a new concept of hyperspectral imaging based on integrated flat-optics. It delivers a new class of low-cost, simple-to-set-up hyperspectral cameras in optoelectronic hardware (the HOCULUS system), which does not require spectral analyzers or complex mechanical filters. The HOCULUS system can integrate hyperspectral functionalities for pattern recognition, semantic image segmentation, and label-free classification in inexpensive hardware that can retrieve projector's barcodes at camera speed and high resolution, opening up the possibility of real-time acquisition and processing of hyperspectral videos. These results could significantly accelerate different research lines in computer vision, especially in bio-imaging, where they may enable a novel understanding of complex dynamical processes in multicellular organisms and the fast identification of diseases at the point of care.
Drug repurposing, i.e., the use of previously approved drugs for novel clinical applications, may dramatically speed up the development of novel therapeutic agents, an urgent need especially during health emergencies. The recent availability of large collections of drug-induced genome-wide expression profiles has made the use of modern AI approaches feasible and promising. In this project, we will use a deep adversarial deconfounding model to produce a gene-expression-based embedding of small molecules that is coherent with their known therapeutic applications. Building on our previous experience in gene-expression-based computational drug repurposing, we will thus obtain a next-generation tool for finding novel treatment candidates.