Software | MINE | Machine Intelligence & kNowledge Engineering

The software below is for scientific purposes ONLY. Comments are very welcome!

K-AP

Matlab programs for generating a user-specified number K of clusters with message passing (Affinity Propagation). The algorithm was published in the paper:

Xiangliang Zhang, Wei Wang, Kjetil Nørvåg, Michèle Sebag, "K-AP: Generating Specified K Clusters by Efficient Affinity Propagation", ICDM 2010, Sydney, Australia, December 14-17, 2010. File Size 30.4KB.

The programs are distributed under the GNU Lesser General Public License (LGPL).

Software Link

StrAP

Matlab code for StrAP: stream clustering with AP (Affinity Propagation), extended with an online adaptation mechanism (1412KB). The algorithm has been published in ECML-PKDD 2008 and SIGKDD 2009.

Xiangliang Zhang, Cyril Furtlehner, Michèle Sebag, "Data streaming with Affinity propagation". Proceedings of European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2008), Antwerp, Belgium, pp. 628-643, Lecture Notes in Computer Science 5212, Springer 2008, September 15-19, 2008

Xiangliang Zhang, Cyril Furtlehner, Julien Perez, Cécile Germain, Michèle Sebag, "Toward Autonomic Grids: Analyzing the Job Flow with Affinity Streaming". Proceedings of 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD 2009), pp. 987-996, Paris, France, June 28 - July 1, 2009

The programs were developed at INRIA and are therefore the property of INRIA. The programs are distributed under the GNU Lesser General Public License (LGPL).

Software Link

A PCA-Based Change Detection Framework for Multidimensional Data Streams

Detecting changes in multidimensional data streams is an important and challenging task. In unsupervised change detection, changes are usually detected by comparing the distribution in a current (test) window with that in a reference window. It is thus essential to design divergence metrics and density estimators for comparing the data distributions, but most existing ones target univariate data. Detecting changes in multidimensional data streams makes density estimation and comparison difficult. In this paper, we propose a framework for detecting changes in multidimensional data streams based on Principal Component Analysis (PCA), which is used for projecting data into a lower-dimensional space, thus facilitating density estimation and change-score calculations. The proposed framework also has advantages over existing approaches: it reduces computational costs with an efficient density estimator, improves the change-score calculation by introducing effective divergence metrics, and minimizes the effort required from users to set the threshold parameter by using the Page-Hinkley test.
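As a rough illustration of the last step, the sketch below applies a textbook Page-Hinkley test to a stream of change scores. It is not the distributed Matlab code: the class name, the `delta` and `threshold` defaults, and the example scores are all assumptions made for illustration, and the PCA projection and divergence computation that would produce the scores are omitted.

```python
class PageHinkley:
    """Textbook Page-Hinkley test for an upward shift in the mean of a stream."""

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # tolerated drift per step (assumed value)
        self.threshold = threshold  # alarm threshold (assumed value)
        self.n = 0                  # number of scores seen so far
        self.mean = 0.0             # running mean of the scores
        self.cum = 0.0              # cumulative deviation m_t
        self.cum_min = 0.0          # running minimum M_t of m_t

    def update(self, score):
        """Feed one change score; return True if a change is signalled."""
        self.n += 1
        self.mean += (score - self.mean) / self.n
        self.cum += score - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return (self.cum - self.cum_min) > self.threshold


def first_alarm(scores, delta=0.005, threshold=5.0):
    """Return the index of the first alarm, or None if no change is signalled."""
    detector = PageHinkley(delta, threshold)
    for t, score in enumerate(scores):
        if detector.update(score):
            return t
    return None


# Hypothetical change scores: low for 100 steps, then persistently high.
scores = [0.1] * 100 + [1.0] * 100
alarm = first_alarm(scores)
```

The test accumulates how far each score exceeds the running mean and raises an alarm once that excess, measured from its lowest point so far, passes the threshold. This means the user has to choose one threshold on the accumulated excess rather than a cut-off on each individual score.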

More details can be found in the paper:

Abdulhakim A Qahtan, Basma Harbi, Suojin Wang, Xiangliang Zhang, "A PCA-Based Change Detection Framework for Multidimensional Data Streams". In the proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining - KDD 2015.

More results are demonstrated at: https://sites.google.com/site/pcachangedetection/home

Download the code

Software Link

Attack and Protect Classifiers

Adversarial Reverse Engineering and Classifier Robustness

Attack a Classifier:

In security-sensitive applications, e.g., spam filters and intrusion detection systems, the deployed classification algorithms can be attacked by adversaries through generating exploratory attacks such as evasion and reverse engineering. For example, an attacker can probe the classifier with queries in order to reveal some confidential information about the training dataset that was used by the system or model the classifier's decision boundary. How to construct artificial queries from scratch? Query synthesis is a branch of active learning for generating queries in order to reveal sensitive information about the true decision boundary.

The objective of this study is to learn a deterministic noise-free halfspace quite efficiently via query synthesis.
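To make the idea of query synthesis concrete, here is a minimal sketch that locates one point on a halfspace's decision boundary by bisection, using only label queries on synthesized points. It is a simple illustration and not the algorithm from the paper below; the function names, the oracle, and the example weights are hypothetical.

```python
def make_halfspace_oracle(w, b):
    """Return a label oracle for the halfspace w.x >= b (hypothetical target)."""
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) >= b


def boundary_point(oracle, pos, neg, tol=1e-6):
    """Bisect the segment between a positive and a negative point.

    Each midpoint is a synthesized query. Returns the final midpoint and the
    number of queries issued.
    """
    pos, neg = list(pos), list(neg)
    queries = 0
    while max(abs(p - n) for p, n in zip(pos, neg)) > tol:
        mid = [(p + n) / 2.0 for p, n in zip(pos, neg)]
        queries += 1
        if oracle(mid):
            pos = mid   # midpoint is on the positive side
        else:
            neg = mid   # midpoint is on the negative side
    return [(p + n) / 2.0 for p, n in zip(pos, neg)], queries


w, b = [1.0, 2.0], 1.0
oracle = make_halfspace_oracle(w, b)
point, num_queries = boundary_point(oracle, pos=[3.0, 3.0], neg=[-3.0, -3.0])
```

Because each query halves the remaining segment, the number of queries grows only logarithmically with the required precision. Recovering the full halfspace takes more than one boundary point, and this sketch does not attempt that.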

The algorithm was published in the paper:

Ibrahim M Alabdulmohsin, Xin Gao, Xiangliang Zhang, "Efficient Active Learning of Halfspaces via Query Synthesis". In the proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence - AAAI 2015.

Download the Matlab code of the algorithm

Protect a Classifier:

In such adversarial environments, adversaries can generate exploratory attacks against the defender such as evasion and reverse engineering. We investigate the use of randomization as a suitable strategy for mitigating their risk. In particular, we derive a semidefinite programming (SDP) formulation for learning a distribution of classifiers subject to the constraint that any single classifier picked at random from such a distribution provides reliable predictions with a high probability. We analyze the tradeoff between the variance of the distribution and its predictive accuracy and establish that one can almost always incorporate randomization with large variance without incurring a loss in accuracy.
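The sketch below illustrates only the prediction-time randomization, not the SDP that learns the distribution. It assumes the distribution over linear classifiers is an axis-aligned Gaussian with a given mean and per-coordinate standard deviation; that form, the function names, and the toy data are assumptions for illustration.

```python
import random


def sample_classifier(mean_w, std_w, rng):
    """Draw one weight vector from an axis-aligned Gaussian (assumed form)."""
    return [rng.gauss(m, s) for m, s in zip(mean_w, std_w)]


def predict(w, x):
    """Linear decision rule: +1 if w.x >= 0, otherwise -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


def agreement_rate(mean_w, std_w, points, trials=200, seed=0):
    """Fraction of (sampled classifier, point) pairs matching the mean classifier."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        w = sample_classifier(mean_w, std_w, rng)
        hits += sum(predict(w, x) == predict(mean_w, x) for x in points)
    return hits / (trials * len(points))


# Toy data that varies only along the first feature.
points = [(1.0, 0.0), (-2.0, 0.0), (0.5, 0.0)]
mean_w = [1.0, 0.0]

harmless = agreement_rate(mean_w, [0.0, 10.0], points)  # variance where data is flat
harmful = agreement_rate(mean_w, [10.0, 0.0], points)   # variance where data varies
```

In this toy setting, large variance along a direction in which the data does not vary leaves every prediction unchanged, while the same variance along the informative direction does not. This is meant only to give intuition for why variance and accuracy need not conflict.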

More details can be found in the paper:

Ibrahim M Alabdulmohsin, Xin Gao, Xiangliang Zhang, "Adding Robustness to Support Vector Machines Against Adversarial Reverse Engineering". Proceedings of the 23rd ACM International Conference on Information and Knowledge Management - CIKM 2014.

Download the Matlab code of the algorithm

Is Attribute-Based Zero-Shot Learning an Ill-Posed Strategy?

One transfer learning approach that has gained wide popularity lately is attribute-based zero-shot learning. Its goal is to learn novel classes that were never seen during the training stage. The classical route towards realizing this goal is to incorporate prior knowledge, in the form of a semantic embedding of classes, and to learn to predict classes indirectly via their semantic attributes. Despite the amount of research devoted to this subject lately, no known algorithm has yet reported a predictive accuracy that could exceed the accuracy of supervised learning with very few training examples. For instance, the direct attribute prediction (DAP) algorithm, which forms a standard baseline for the task, is known to be as accurate as supervised learning when as few as two examples from each hidden class are used for training on some popular benchmark datasets! In this paper, we argue that this lack of significant results in the literature is not a coincidence; attribute-based zero-shot learning is fundamentally an ill-posed strategy. The key insight is the observation that the mechanical task of predicting an attribute is, in fact, quite different from the epistemological task of learning the “correct meaning” of the attribute itself. This renders attribute-based zero-shot learning fundamentally ill-posed. In more precise mathematical terms, attribute-based zero-shot learning is equivalent to the mirage goal of learning with respect to one distribution of instances, with the hope of being able to predict with respect to any arbitrary distribution.

The provided code displays the decision rule when applying binary relevance with linear SVM to the seven segment display in the zero-shot setting. Please enjoy the code. Note that the code uses the LIBLINEAR package at: https://www.csie.ntu.edu.tw/~cjlin/liblinear/. Thus, the LIBLINEAR package should be installed first.
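For readers unfamiliar with the setting, the following sketch shows the indirect, attribute-based prediction step on the seven-segment display: each digit is described by which of the segments a to g are lit, and a hidden digit is predicted as the candidate whose segment signature is closest to the predicted attributes. It does not reproduce the provided code or call LIBLINEAR; the attribute predictions are supplied by hand, and the function names are hypothetical.

```python
# Standard seven-segment signatures, in segment order (a, b, c, d, e, f, g).
SEGMENTS = {
    0: (1, 1, 1, 1, 1, 1, 0),
    1: (0, 1, 1, 0, 0, 0, 0),
    2: (1, 1, 0, 1, 1, 0, 1),
    3: (1, 1, 1, 1, 0, 0, 1),
    4: (0, 1, 1, 0, 0, 1, 1),
    5: (1, 0, 1, 1, 0, 1, 1),
    6: (1, 0, 1, 1, 1, 1, 1),
    7: (1, 1, 1, 0, 0, 0, 0),
    8: (1, 1, 1, 1, 1, 1, 1),
    9: (1, 1, 1, 1, 0, 1, 1),
}


def hamming(u, v):
    """Number of positions at which two attribute vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def predict_class(predicted_attributes, candidate_classes):
    """Pick the candidate class whose signature is nearest in Hamming distance."""
    return min(candidate_classes,
               key=lambda c: hamming(SEGMENTS[c], predicted_attributes))


# Suppose per-segment classifiers (not shown) output these attributes,
# and digits 7, 8 and 9 were never seen during training.
hidden_digit = predict_class((1, 1, 1, 0, 0, 0, 0), [7, 8, 9])
```

In the zero-shot setting, one binary classifier per segment would be trained on the seen digits only (binary relevance), and its outputs would replace the hand-written attribute vector above.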

Download the zip file of the code
