By Ibrahim M Alabdulmohsin, PhD candidate of professor Xiangliang Zhang (KAUST)
Active learning is a subfield of machine learning that has been successfully used in many applications. One of the main branches of active learning is query synthesis, where the learning agent constructs artificial queries from scratch in order to reveal sensitive information about the underlying decision boundary. It has found applications in areas, such as adversarial reverse engineering, automated science, and computational chemistry. Nevertheless, the existing literature on membership query synthesis has, generally, focused on finite concept classes or toy problems, with a limited extension to real-world applications.
In this thesis, I develop two spectral algorithms for learning halfspaces via query synthesis. The general theme of these methods is to construct an ellipsoidal approximation of the version space and to synthesize queries, afterward, via spectral decomposition. The two methods differ, however, in how they construct their ellipsoidal approximation. The first algorithm is a maximum-determinant method, which aggregates all of the previous queries in a greedy manner, and achieves an optimal sample complexity in practice. The second algorithm is a Markovian algorithm, which relies on Khachiyan's classical update formulas for solving linear programs. It is significantly superior to the previous method in its space and time complexity, at the expense of an increase in the sample complexity. Besides, I also describe how these algorithms can be extended to other settings as well, such as pool-based active learning.
Having demonstrated that halfspaces can be learned quite efficiently via query synthesis, the second part of this thesis proposes strategies for mitigating the risk of reverse engineering in adversarial environments. One approach that can be used to render query synthesis algorithms ineffective is to implement a randomized response.
In this thesis, I propose a semidefinite program (SDP) for learning a distribution of classifiers, subject to the constraint that any individual classifier picked at random from this distributions provides reliable predictions with a high probability. This algorithm is, then, justified both theoretically and empirically. A second approach is to use a non-parametric classification method, such as similarity-based classification.
In this thesis, I argue that learning via the empirical kernel maps, also commonly referred to as 1-norm Support Vector Machine (SVM) or Linear Programming (LP) SVM, is the best method for handling indefinite similarities. In particular, the 1-norm SVM method is conceptually simple, which makes it easy to implement and maintain. It is competitive, if not superior to, all other methods in terms of its predictive accuracy. Moreover, it produces solutions that are often sparser than more recent methods by several orders of magnitude. These advantages of 1-norm SVM over other methods are established both theoretically and empirically.
Biography: Ibrahim Alabdulmohsin is a final year Ph.D. student at the Machine Intelligence and Knowledge Engineering group (MINE), co-advised by Prof. Xiangliang Zhang and Prof. Xin Gao. He obtained his B.S. in computer engineering with highest distinction from the University of Nebraska, Lincoln and obtained his M.S. in electrical engineering from Stanford University. He has received the superior scholarship award from the University of Nebraska (2005), the Saudi Aramco - Information Technology recognition award for developing innovative solutions (2007), the Saudi Aramco - Operations Services recognition award (senior VP) for distinguished achievements (2012), the KAUST Provost award (2012), and the Outstanding Reviewer Award at NIPS (2016). His research interest is in data science, such as machine learning, data mining, information theory, and statistics.
For more info contact:
Xiangliang Zhang: email: firstname.lastname@example.org
Date: Sunday 9th
Time:10:00 AM - 12:00 PM
Location: Room 5220 in Building 3
Refreshments: Light lunch will be provided at 11:30am