Interpretation, Verification and Privacy Techniques for Improving the Trustworthiness of Neural Networks

Building 1, Level 3, Room 3119


Neural Networks are commonly used in a wide variety of situations to solve difficult problems through Machine Learning. Such networks are trained on collected data and learn to solve problems ranging from the classification of biological bodies to self-driving cars, automated management of distributed computer systems, image classification, business decisions, and more.

The vast range of application domains, however, is limited by how much trust practitioners are willing to lend to Neural Network models. Such trust is challenged by the limited ability to answer important questions, including "Will this model perform correctly?", "What information is learned by the model?", or "Can I entrust this model with private data?". In the absence of reliable answers to these questions, Neural Networks risk being considered untrustworthy, and might in turn not be deployed in domains where they would otherwise have been able to provide improvements over existing practices.

One major concern with Neural Networks is that they are generally considered "black-boxes", and it is considered very difficult to inspect the trained parameters or to understand the learned function. This leads to significant challenges in verifying Neural Networks for correctness and privacy, interpreting the cause of their decisions, and improving the model according to expert knowledge.

In this thesis, we develop several new ways in which the black-box nature of Neural Networks can be overcome, and propose methods aimed at increasing the trustworthiness of Neural Network models. In particular, we employ two approaches to tackle this black-box nature. First, by focusing specifically on Piecewise Linear Neural Networks (a popular flavor that has been used to tackle many difficult and practical problems), we study different techniques to efficiently extract the weights of trained networks and use them to understand and verify the behavior of the models. Second, we show how, by strengthening the training algorithms, we can provide guarantees that are theoretically proven to hold even for a black-box model.

The first part of this thesis demonstrates errors that can exist in trained Neural Networks and that are easily overlooked when only aggregate results, such as testing accuracy and average performance, are examined. We use this analysis to identify the importance of domain knowledge, relations between inputs and outputs, desirable properties, and the pitfalls to avoid with trained models.

In the second part of this thesis, we leverage the nature of Piecewise Linear Neural Networks to extract and verify the decisions of the model. Due to the size and complexity of Neural Networks, existing approaches to verification either require approximations that introduce errors in the verification, or have very high run times that limit them to verifying explicitly defined points, lacking the ability to explore the input space for unexpected behavior. We address this by adapting the technique of Mixed Integer Linear Programming to efficiently explore the possible states of the Neural Network. We define multiple properties and show how our approach can verify them in practice, yielding guarantees of correctness or counter-examples.
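As a minimal illustration of how such an encoding works, the sketch below uses the standard "big-M" trick to express a single ReLU unit as mixed-integer linear constraints, then asks a solver for the maximum output over a box of inputs. The weights, the bound being checked, and the use of SciPy's `milp` solver are illustrative assumptions, not the thesis's exact formulation.

```python
# Big-M MILP encoding of y = relu(w.x + b) over x in [0,1]^2.
# Variables: [x1, x2, z, a], where z is the ReLU output and a is binary
# (a = 1 selects the active branch z = w.x + b, a = 0 forces z = 0).
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

w, b, M = np.array([1.0, 1.0]), -0.5, 10.0

A = np.array([
    [-w[0], -w[1], 1.0, 0.0],  # z - w.x >= b           (z >= w.x + b)
    [-w[0], -w[1], 1.0,   M],  # z - w.x + M*a <= b + M (z <= w.x + b + M(1-a))
    [ 0.0,   0.0,  1.0,  -M],  # z - M*a <= 0           (z <= M*a)
])
con = LinearConstraint(A, lb=[b, -np.inf, -np.inf], ub=[np.inf, b + M, 0.0])

res = milp(c=[0, 0, -1.0, 0],            # maximize z by minimizing -z
           constraints=con,
           integrality=[0, 0, 0, 1],     # a is an integer (binary) variable
           bounds=Bounds([0, 0, 0, 0], [1, 1, M, 1]))
max_out = -res.fun
print(max_out <= 2.0)  # the property "output never exceeds 2" is verified
```

Because the encoding is exact, the solver either certifies the bound over the whole input box or returns a concrete input violating it (a counter-example); here the true maximum is 1.5, so the property holds.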

In addition to verification, we also want to be able to interpret the model behavior. In the third part of this thesis, we extend the Linear Programming technique to break down the behavior of a Piecewise Linear Neural Network into its linear components. Through this approach, we return Continuous Exact Explanations of the model, which match continuous subregions of the input space to linear functions of the model inputs without introducing any approximation. These explanations help data scientists understand the learned behavior of the model, identify regions where the Neural Network follows predetermined behavior, extract the features most important to a particular decision, and detect adversarial models that have been trained to fool existing explanation tools.
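The core idea can be sketched in a few lines: once the ReLU activation pattern at a given input is fixed, the layers compose into a single linear function that matches the network exactly throughout the subregion sharing that pattern. The toy 2-2-1 network and its weights below are invented for illustration and are not taken from the thesis.

```python
# Extract the exact local linear function of a tiny ReLU network.

def forward(x, W1, b1, W2, b2):
    """Forward pass that also records the ReLU activation pattern (mask)."""
    pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    mask = [1.0 if p > 0 else 0.0 for p in pre]
    hidden = [p * m for p, m in zip(pre, mask)]
    out = sum(w * h for w, h in zip(W2, hidden)) + b2
    return out, mask

def local_linear_map(mask, W1, b1, W2, b2):
    """Compose the layers under a fixed mask: within the input subregion
    sharing this activation pattern, the network equals w_eff . x + b_eff."""
    w_eff = [sum(W2[j] * mask[j] * W1[j][i] for j in range(len(W2)))
             for i in range(len(W1[0]))]
    b_eff = sum(W2[j] * mask[j] * b1[j] for j in range(len(W2))) + b2
    return w_eff, b_eff

# Illustrative 2-2-1 network.
W1 = [[1.0, -2.0], [0.5, 1.0]]; b1 = [0.0, -0.25]
W2 = [1.0, 2.0]; b2 = 0.5
x = [0.3, 0.2]

y, mask = forward(x, W1, b1, W2, b2)
w_eff, b_eff = local_linear_map(mask, W1, b1, W2, b2)
y_lin = sum(w * xi for w, xi in zip(w_eff, x)) + b_eff
assert abs(y - y_lin) < 1e-12  # exact agreement, no approximation
```

The coefficients `w_eff` directly read off the importance of each input feature to this decision, which is the kind of information an exact explanation exposes.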

Finally, we complement the bottom-up approach presented so far, which interprets low-level data to provide high-level information, with a top-down approach that targets another property of trustworthiness: model privacy. By strengthening the training process with additional requirements, we can guarantee that the desired privacy properties will be reflected in the trained model.
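One standard way to realize such a training-time guarantee is differentially private gradient descent, which bounds each example's influence by clipping its gradient and then adds calibrated noise. The sketch below is a minimal illustration of that idea on an invented 1-D regression task; it is one possible instantiation, not necessarily the exact mechanism developed in the thesis.

```python
# Differentially-private-style training step: clip per-example gradients,
# then add Gaussian noise to the averaged update.
import numpy as np

rng = np.random.default_rng(0)
xs = rng.normal(size=200)
ys = 3.0 * xs + rng.normal(scale=0.1, size=200)  # synthetic "private" data

w, clip, sigma, lr = 0.0, 1.0, 0.5, 0.1
for _ in range(200):
    grads = 2 * (w * xs - ys) * xs              # per-example gradients
    grads = np.clip(grads, -clip, clip)         # bound each example's influence
    noisy = grads.mean() + rng.normal(scale=sigma * clip / len(xs))
    w -= lr * noisy                             # noisy, privacy-preserving step
```

Because the guarantee comes from the clipping and noise themselves, it holds for the resulting model regardless of its black-box nature; here the learned `w` still lands close to the true slope of 3.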

The takeaway of this thesis is a collection of techniques that provide strong, theoretically provable guarantees about Neural Networks, despite their black-box nature. These techniques improve the trustworthiness of models, making them notably more likely to be deployed in the real world.

Brief Biography

Arnaud Dethise is a PhD candidate in Computer Science under the guidance of Professor Marco Canini. He completed his Bachelor's degree in Engineering and Master's degree in Computer Science in 2015 and 2017, respectively, at UCLouvain, Louvain-la-Neuve, Belgium. Arnaud has a strong interest in machine learning applied to distributed systems and is currently pursuing his doctoral research in this field. His specific research interests include Neural Networks, trustworthiness, verifiability and explainability, and model privacy, with a focus on exactness and provable guarantees. Arnaud's goal is to increase users' trust in ML tools by extending their evaluation beyond performance.
