The software below is for scientific purposes only. Comments are very welcome!
Matlab programs for generating a user-specified number K of clusters with message passing (Affinity Propagation). The algorithm was published in the paper:
Xiangliang Zhang, Wei Wang, Kjetil Nørvåg, Michèle Sebag, "K-AP: Generating Specified K Clusters by Efficient Affinity Propagation", ICDM 2010, Sydney, Australia, December 14-17, 2010. File Size 30.4KB.
The programs are distributed under the GNU Lesser General Public License (LGPL).
Matlab code for StrAP: stream clustering with AP (Affinity Propagation), extended with an online adaptation mechanism (1412KB). The algorithm was published at ECML-PKDD 2008 and SIGKDD 2009.
Xiangliang Zhang, Cyril Furtlehner, Michèle Sebag, "Data streaming with Affinity propagation". Proceedings of European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2008), Antwerp, Belgium, pp. 628-643, Lecture Notes in Computer Science 5212, Springer 2008, September 15-19, 2008
Xiangliang Zhang, Cyril Furtlehner, Julien Perez, Cécile Germain, Michèle Sebag, "Toward Autonomic Grids: Analyzing the Job Flow with Affinity Streaming". Proceedings of 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD 2009), pp. 987-996, Paris, France, June 28–July 1, 2009
The programs were developed at INRIA and are thus the property of INRIA. The programs are distributed under the GNU Lesser General Public License (LGPL).
Detecting changes in multidimensional data streams is an important and challenging task. In unsupervised change detection, changes are usually detected by comparing the distribution in a current (test) window with that in a reference window. It is thus essential to design divergence metrics and density estimators for comparing the data distributions, yet most existing ones are designed for univariate data. Multidimensional data streams make both density estimation and distribution comparison more difficult. In this paper, we propose a framework for detecting changes in multidimensional data streams based on Principal Component Analysis (PCA), which is used for projecting data into a lower-dimensional space, thus facilitating density estimation and change-score calculations. The proposed framework also has advantages over existing approaches: it reduces computational costs with an efficient density estimator, improves the change-score calculation by introducing effective divergence metrics, and minimizes the effort required from users in setting the threshold parameter by using the Page-Hinkley test.
More details can be found in the paper:
Abdulhakim A Qahtan, Basma Harbi, Suojin Wang, Xiangliang Zhang, "A PCA-Based Change Detection Framework for Multidimensional Data Streams". In the proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining - KDD 2015.
More results are demonstrated at: https://sites.google.com/site/pcachangedetection/home
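The Page-Hinkley test mentioned in the abstract above can be illustrated with a minimal Python sketch. This is the textbook one-sided form of the test applied to a synthetic sequence of change-scores, not the released code: the class name `PageHinkley`, the helper `first_alarm`, and the parameter values `delta` and `threshold` are assumptions chosen for illustration, and the PCA projection and density estimation steps of the framework are not reproduced here.

```python
class PageHinkley:
    """One-sided Page-Hinkley test for an upward shift in the mean of a sequence."""

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # tolerated drift magnitude (assumed value)
        self.threshold = threshold  # alarm threshold, often written lambda (assumed value)
        self.n = 0                  # number of observations seen so far
        self.mean = 0.0             # running mean of the observations
        self.cum = 0.0              # cumulative deviation m_t
        self.min_cum = 0.0          # running minimum M_t of the cumulative deviation

    def update(self, x):
        """Consume one observation; return True when a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return (self.cum - self.min_cum) > self.threshold


def first_alarm(scores, delta=0.005, threshold=5.0):
    """Return the index of the first alarm, or None if no change is signalled."""
    detector = PageHinkley(delta, threshold)
    for t, score in enumerate(scores):
        if detector.update(score):
            return t
    return None


# Synthetic change-scores: low for 50 steps, then a sustained jump.
scores = [0.1] * 50 + [1.0] * 50
alarm_at = first_alarm(scores, delta=0.005, threshold=2.0)
print("first alarm at index:", alarm_at)
```

The test accumulates deviations of each score from the running mean and raises an alarm once the accumulated value rises more than `threshold` above its historical minimum, so a single noisy score is less likely to trigger it than a sustained shift.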
Attack and Protect Classifiers
Adversarial Reverse Engineering and Classifier Robustness
Attack a Classifier:
In security-sensitive applications, e.g., spam filters and intrusion detection systems, the deployed classification algorithms can be attacked by adversaries through exploratory attacks such as evasion and reverse engineering. For example, an attacker can probe the classifier with queries in order to reveal confidential information about the training dataset used by the system, or to model the classifier's decision boundary. This raises the question of how to construct such artificial queries from scratch. Query synthesis is a branch of active learning in which the learner generates its own queries, here in order to reveal sensitive information about the true decision boundary.
The objective of this study is to learn a deterministic, noise-free halfspace efficiently via query synthesis.
The algorithm was published in the paper:
Ibrahim M Alabdulmohsin, Xin Gao, Xiangliang Zhang, "Efficient Active Learning of Halfspaces via Query Synthesis". In the proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence - AAAI 2015.
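To give a feel for what probing a classifier with synthesized queries means, here is a minimal Python sketch. It is not the algorithm from the paper: it only shows a generic bisection along the segment between one known positive point and one known negative point, which locates a point on a hidden halfspace boundary using label-only queries. The names `make_oracle` and `boundary_point`, the hidden weights, and the tolerance are all assumptions made for this illustration.

```python
def make_oracle(w, b):
    """Return a label-only oracle for the hidden halfspace sign(w.x + b)."""
    def oracle(x):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        return 1 if score >= 0 else -1
    return oracle


def boundary_point(oracle, x_pos, x_neg, tol=1e-6):
    """Bisect the segment from x_pos (label +1) to x_neg (label -1).

    Returns an approximate boundary point and the number of queries issued.
    """
    lo, hi = 0.0, 1.0  # fraction of the way from x_pos (0.0) to x_neg (1.0)
    queries = 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # Synthesize a query point that need not exist in any dataset.
        query = [p + mid * (n - p) for p, n in zip(x_pos, x_neg)]
        queries += 1
        if oracle(query) == 1:
            lo = mid  # still on the positive side, move toward x_neg
        else:
            hi = mid  # crossed the boundary, move back toward x_pos
    t = (lo + hi) / 2
    return [p + t * (n - p) for p, n in zip(x_pos, x_neg)], queries


# Hypothetical hidden classifier that the attacker cannot inspect directly.
hidden_w, hidden_b = [1.0, -2.0], 0.5
oracle = make_oracle(hidden_w, hidden_b)

point, n_queries = boundary_point(oracle, [2.0, 0.0], [0.0, 2.0])
print("boundary point:", point, "queries used:", n_queries)
```

The number of queries grows only logarithmically with the inverse of the tolerance, which is the basic reason synthesized queries can reveal a decision boundary cheaply.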
Protect a Classifier:
In such adversarial environments, adversaries can mount exploratory attacks against the defender, such as evasion and reverse engineering. We investigate the use of randomization as a strategy for mitigating this risk. In particular, we derive a semidefinite programming (SDP) formulation for learning a distribution of classifiers subject to the constraint that any single classifier picked at random from this distribution provides reliable predictions with high probability. We analyze the tradeoff between the variance of the distribution and its predictive accuracy, and establish that one can almost always incorporate randomization with large variance without incurring a loss in accuracy.
More details can be found in the paper:
Ibrahim M Alabdulmohsin, Xin Gao, Xiangliang Zhang, "Adding Robustness to Support Vector Machines Against Adversarial Reverse Engineering". Proceedings of the 23rd ACM International Conference on Information and Knowledge Management - CIKM 2014.
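The idea of answering each query with a classifier drawn at random from a distribution can be sketched in a few lines of Python. This sketch does not solve the SDP from the paper: the mean weight vector and the isotropic Gaussian with standard deviation `sigma` are simply assumed, and the function names `sample_classifier`, `predict`, and `agreement_rate` are hypothetical. It only shows how often a randomly drawn linear classifier agrees with the mean classifier on a given input.

```python
import random


def sample_classifier(mean_w, sigma, rng):
    """Draw one weight vector from an isotropic Gaussian around mean_w (assumed form)."""
    return [rng.gauss(m, sigma) for m in mean_w]


def predict(w, x):
    """Linear decision rule sign(w.x), without a bias term for simplicity."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


def agreement_rate(mean_w, sigma, x, n_draws=2000, seed=0):
    """Fraction of random draws whose prediction on x matches the mean classifier."""
    rng = random.Random(seed)
    reference = predict(mean_w, x)
    hits = sum(
        predict(sample_classifier(mean_w, sigma, rng), x) == reference
        for _ in range(n_draws)
    )
    return hits / n_draws


mean_w = [1.0, -1.0]  # assumed mean classifier, not learned here
far = agreement_rate(mean_w, 0.5, [3.0, -3.0])  # input far from the mean boundary
near = agreement_rate(mean_w, 0.5, [1.0, 0.9])  # input close to the mean boundary
print("agreement far from boundary:", far)
print("agreement near boundary:", near)
```

In this toy setting, inputs far from the boundary are labelled consistently across draws while inputs near it are not. That variation is what makes the boundary harder to pin down with repeated queries.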
Is Attribute-Based Zero-Shot Learning an Ill-Posed Strategy?
The provided code displays the decision rule obtained when applying binary relevance with linear SVMs to the seven-segment display in the zero-shot setting. Please enjoy the code. Note that the code uses the LIBLINEAR package, which should be installed first; it is available at: https://www.csie.ntu.edu.tw/~cjlin/liblinear/
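As a rough illustration of the attribute-based setting, the Python sketch below treats the seven segments (a to g) as binary attributes and maps a predicted attribute vector to the closest digit signature by Hamming distance. It is not the provided code and trains no SVM: the attribute predictions are hard-coded, and the names `SIGNATURES`, `hamming`, and `decode` are assumptions made for this example. In a binary relevance setup, each of the seven attributes would be predicted by its own independent classifier.

```python
# Standard seven-segment encoding, segments ordered a, b, c, d, e, f, g.
SIGNATURES = {
    0: (1, 1, 1, 1, 1, 1, 0),
    1: (0, 1, 1, 0, 0, 0, 0),
    2: (1, 1, 0, 1, 1, 0, 1),
    3: (1, 1, 1, 1, 0, 0, 1),
    4: (0, 1, 1, 0, 0, 1, 1),
    5: (1, 0, 1, 1, 0, 1, 1),
    6: (1, 0, 1, 1, 1, 1, 1),
    7: (1, 1, 1, 0, 0, 0, 0),
    8: (1, 1, 1, 1, 1, 1, 1),
    9: (1, 1, 1, 1, 0, 1, 1),
}


def hamming(u, v):
    """Number of positions at which two attribute vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def decode(predicted_attributes, candidate_classes):
    """Pick the candidate class whose signature is closest to the prediction.

    Ties are broken by the order of candidate_classes.
    """
    return min(
        candidate_classes,
        key=lambda c: hamming(predicted_attributes, SIGNATURES[c]),
    )


# Suppose digits 4 and 9 were never seen in training (the zero-shot classes),
# and the seven per-attribute classifiers output this vector, with segment g
# wrongly predicted as off.
unseen = [4, 9]
predicted = (1, 1, 1, 1, 0, 1, 0)
print("decoded class:", decode(predicted, unseen))
```

With a larger candidate set the same prediction can sit equally close to several signatures, so the decoded class may then depend on how ties are broken.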