A Scalable Framework for Data-Driven Subspace Representation and Clustering

Published in Pattern Recognition Letters, 2019

Eunwoo Kim, Minsik Lee, and Songhwai Oh, “A Scalable Framework for Data-Driven Subspace Representation and Clustering”, Pattern Recognition Letters, vol. 125, pp. 742-749, July 2019.

Abstract: This paper considers the problem of subspace clustering, which segments data samples into their underlying subspaces. While existing subspace clustering algorithms have been successfully applied to various problems, they are not applicable to large-scale or streaming data because of their high computational cost. As a remedy, we propose a unified scalable pipeline that reduces the complexity of all sub-tasks in subspace clustering. We first present a robust incremental summary representation, assuming that a subspace can be represented by sparse factors. Based on the summary representation, we propose a fully scalable learning pipeline that integrates the affinity learning task with post-processing and spectral clustering, such that the overall time complexity is linear in the number of samples. Moreover, the proposed framework is integrated with kernel methods for nonlinear subspace clustering. An extensive set of experimental studies demonstrates that the proposed framework gives an order-of-magnitude speed-up over existing subspace clustering baselines with competitive clustering performance.
