Fast Representatives Search as an Initialization for Scalable Sparse Subspace Clustering

Date
2015-08-27
Language
en
Abstract
Clustering high-dimensional data is difficult because such data are sparse, contain correlated features, and often lie in specific subspace structures. The self-expressiveness property, however, states that a data point can be represented most efficiently as a linear combination of other data points from its own subspace. Elhamifar and Vidal (2013) used this property successfully to cluster high-dimensional data with the sparse subspace clustering (SSC) algorithm. The computational complexity of SSC is nevertheless too high for datasets with a large number of data points. Scalable sparse subspace clustering (SSSC) speeds SSC up with an in-sample/out-of-sample approach. This research took two steps to improve the random initialization of the in-sample set of SSSC. First, the computational complexity of a representative selection algorithm, sparse modeling representative selection (SMRS), was reduced using a divide-and-conquer strategy; the resulting algorithm is called hierarchical sparse representatives (HSR). Second, the representatives from SMRS (or HSR) were used to initialize the in-sample set in an informed rather than random way. Theoretical and empirical results indicated that SMRS and HSR produced similar results: the representatives selected by the two algorithms overlapped, and the importance assigned to them was correlated. However, initializing the in-sample set of SSSC with representatives or with non-representatives did not significantly change its performance.
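The self-expressiveness property mentioned in the abstract can be illustrated with a minimal sketch. It is not the implementation used in the thesis: the function name `self_expressive_codes`, the penalty weight `lam`, and the choice of a plain proximal-gradient (ISTA) solver are assumptions made for illustration only. Each column of `X` is coded as a sparse combination of the other columns, with the diagonal of the coefficient matrix forced to zero so that no point represents itself.

```python
import numpy as np

def self_expressive_codes(X, lam=0.05, n_iter=500):
    """Hypothetical helper: sparse self-expressive coefficients.

    X has shape (d, n) with one data point per column. Returns C of shape
    (n, n) that approximately minimizes
        0.5 * ||X - X C||_F^2 + lam * ||C||_1   subject to diag(C) = 0,
    using proximal gradient descent (ISTA).
    """
    n = X.shape[1]
    G = X.T @ X                          # Gram matrix of the data points
    step = 1.0 / np.linalg.norm(G, 2)    # 1 / Lipschitz constant of the gradient
    C = np.zeros((n, n))
    for _ in range(n_iter):
        grad = G @ C - G                 # gradient of the quadratic term
        Z = C - step * grad
        # Soft-thresholding is the proximal operator of the l1 penalty.
        C = np.sign(Z) * np.maximum(np.abs(Z) - step * lam, 0.0)
        np.fill_diagonal(C, 0.0)         # a point may not represent itself
    return C

# Two orthogonal 2-D subspaces embedded in R^4, five points each (toy data).
rng = np.random.default_rng(0)
A = np.zeros((4, 5)); A[:2] = rng.standard_normal((2, 5))
B = np.zeros((4, 5)); B[2:] = rng.standard_normal((2, 5))
X = np.hstack([A, B])
X /= np.linalg.norm(X, axis=0)           # unit-norm columns

C = self_expressive_codes(X)
affinity = np.abs(C) + np.abs(C).T       # symmetric affinity for a later clustering step
```

In this toy setting the nonzero coefficients of each point fall only on points from the same subspace, so `affinity` is block-diagonal. A clustering step applied to such an affinity matrix, spectral clustering in the case of SSC, would then recover the two subspaces.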
Faculty
Faculteit der Sociale Wetenschappen