Analysis and Design of Linear Classifiers for High-Dimensional, Small Sample Size Data Using Asymptotic Random Matrix Theory

Abstract

Due to a variety of potential barriers to sample acquisition, many of the datasets encountered in important classification applications, ranging from tumor identification to facial recognition, are characterized by small samples of high-dimensional data. In such situations, linear classifiers are popular as they carry a lower risk of overfitting than non-linear classifiers while being faster, more interpretable, and easier for the inexperienced practitioner to understand and implement.

In this dissertation, several gaps in the literature on the analysis and design of linear classifiers for high-dimensional data are addressed using tools from asymptotic random matrix theory (RMT). These tools facilitate the derivation of limits of relevant quantities or distributions, such as the probability of misclassification of a particular classifier or the asymptotic distribution of its discriminant, in the RMT regime where both the sample size and the dimensionality of the data grow together. The insights extracted from these limits allow for a deeper understanding of a classifier's behavior and lay the groundwork for modifications that improve its performance. Asymptotic RMT is also used in this dissertation to derive estimators of quantities of interest which are consistent in the RMT regime. These estimators, in turn, facilitate the tuning of classifier hyperparameters without resort to empirical methods such as cross-validation, which can be computationally taxing when high-dimensional data is involved.
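To make the RMT regime concrete: when the data dimension p and sample size n are comparable, the eigenvalues of the sample covariance matrix no longer concentrate around the true eigenvalues. The following minimal numpy sketch (an illustration of the regime, not code from the dissertation) draws identity-covariance Gaussian data with p/n = 0.5 and shows the sample eigenvalues spreading over the Marchenko-Pastur support [(1 - sqrt(gamma))^2, (1 + sqrt(gamma))^2]:

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 400, 200                 # sample size and dimension grow together
gamma = p / n                   # RMT aspect ratio p/n, held fixed in the limit

# Gaussian data whose true covariance is the identity (all eigenvalues = 1)
X = rng.standard_normal((n, p))
S = X.T @ X / n                 # sample covariance matrix

eigs = np.linalg.eigvalsh(S)
mp_low = (1 - np.sqrt(gamma)) ** 2
mp_high = (1 + np.sqrt(gamma)) ** 2

print(f"sample eigenvalue range:  [{eigs.min():.3f}, {eigs.max():.3f}]")
print(f"Marchenko-Pastur support: [{mp_low:.3f}, {mp_high:.3f}]")
```

In the classical large-sample limit (p fixed, n growing) all sample eigenvalues would converge to 1; holding gamma > 0 fixed instead is what lets RMT limits capture the small-sample, high-dimensional behavior targeted here.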

This work begins with an asymptotic study of the discriminant-averaging and vote-averaging randomly-projected LDA (RP-LDA) ensemble classifiers, two high-dimensional variants of the classical linear discriminant analysis (LDA) classifier based on random projections. The asymptotically optimal ensemble of randomly-projected LDA discriminants for Gaussian data is found to be a form of discriminant averaging, and it is shown that selecting projections for inclusion in the ensemble based on some metric of expected performance offers no performance advantage. Furthermore, a closer look at the infinite-ensemble version of the discriminant-averaging RP-LDA classifier, in whose discriminant the Marzetta estimator of the precision matrix arises, reveals that the Marzetta estimator behaves as the inverse of a linear regularization of the sample covariance matrix. This implies that the discriminant-averaging RP-LDA ensemble classifier asymptotically behaves as a special case of regularized LDA (R-LDA) with coarser parameter tuning, since its regularization parameter varies with the integer projection dimension. From there, the class of rotationally-invariant estimators, to which the Marzetta estimator belongs, is studied in the context of classification, and a modified LDA classifier is proposed based on a rotationally-invariant estimator of the covariance matrix whose non-linear shrinkage minimizes the probability of misclassification. Finally, a technique for tuning the weight vector of a generic binary linear classifier is developed. This technique is shown not only to yield performance gains for LDA in a small-sample scenario, but also to compensate for the performance loss due to non-optimal native hyperparameters of classifiers such as the SVM with a linear kernel, which would otherwise be computationally costly to tune. The dissertation concludes with an application of the weight-vector tuning technique to the transfer learning of a deep neural network, showcasing the ubiquity of linear classifiers and the potential for widespread applicability of the proposed technique.
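As a rough sketch of the discriminant-averaging construction (under simplifying assumptions not taken from the dissertation: Gaussian projection matrices, plug-in class means and pooled covariance, and equal priors), each ensemble member evaluates an LDA discriminant in a randomly projected space, and the ensemble averages the discriminant values before thresholding:

```python
import numpy as np

def rp_lda_ensemble_score(X0, X1, x, d=20, n_proj=50, seed=0):
    """Average of LDA discriminants over random d-dimensional projections.

    X0, X1 : (n0, p) and (n1, p) training samples of the two classes.
    x      : (p,) test point; positive scores favor class 0.
    """
    rng = np.random.default_rng(seed)
    p = X0.shape[1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)

    # Pooled sample covariance; singular when p > n0 + n1 - 2, which is
    # exactly the small-sample regime that motivates projecting first.
    S = ((len(X0) - 1) * np.cov(X0, rowvar=False)
         + (len(X1) - 1) * np.cov(X1, rowvar=False)) / (len(X0) + len(X1) - 2)

    diff, mid = mu0 - mu1, (mu0 + mu1) / 2
    scores = []
    for _ in range(n_proj):
        R = rng.standard_normal((p, d))      # random projection from p to d dims
        Sd = R.T @ S @ R                     # projected covariance; generically
                                             # invertible once d <= rank of S
        w = np.linalg.solve(Sd, R.T @ diff)  # LDA weight vector in projected space
        scores.append((x - mid) @ R @ w)     # LDA discriminant, equal priors
    return np.mean(scores)                   # discriminant averaging
```

For instance, with n0 = n1 = 30 samples in p = 100 dimensions the pooled covariance is singular and plain LDA cannot be applied directly, while the projected d x d covariances remain invertible. A vote-averaging variant would instead average the signs of the individual discriminants; the result summarized above is that, for Gaussian data, averaging the discriminants themselves is the asymptotically optimal form of the ensemble.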

Brief Biography

Lama Niyazi is a Ph.D. candidate in the Electrical and Computer Engineering program at KAUST. She obtained her Master's degree in Electrical Engineering from KAUST and her Bachelor's degree in Electrical and Computer Engineering from Effat University. Her research focuses on the analysis and design of classifiers for high-dimensional data using random matrix theory.
