Fast Representatives Search as an Initialization for Scalable Sparse Subspace Clustering
Issue Date
2015-08-27
Language
en
Abstract
High-dimensional data clustering is a difficult task due to the sparsity, correlated
features and specific subspace structures of high-dimensional data. However, the
self-expressiveness property states that data points can be most efficiently represented
as linear combinations of other data points from their own subspace. This property was
successfully used by Elhamifar and Vidal (2013) to cluster high-dimensional data with the
sparse subspace clustering (SSC) algorithm. However, the computational complexity of
SSC is too high for it to be applied to datasets with a large number of data points.
Scalable sparse subspace clustering (SSSC) uses an in-sample out-of-sample
approach to speed SSC up. Two steps were taken in this research to improve
on the random initialization of the in-sample set of SSSC. First, the computational
complexity of an algorithm for representative selection called sparse modeling
representative selection (SMRS) was improved using a divide-and-conquer strategy.
This new algorithm was called hierarchical sparse representatives (HSR). Second,
the representatives from SMRS (or HSR) were used to initialize the in-sample set
in an informed way rather than at random.
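
The divide-and-conquer idea behind a hierarchical representative search can be sketched as follows. This is only an illustration under stated assumptions, not the HSR algorithm itself, whose details the abstract does not give. The function names, the `k` and `chunk_size` parameters, and the greedy farthest-point selector (a simple stand-in for SMRS, which solves a sparse optimization problem instead) are all hypothetical.

```python
import math
import random


def farthest_point_selector(points, k):
    """Stand-in selector (NOT SMRS): greedy farthest-point sampling."""
    if len(points) <= k:
        return list(points)
    chosen = [points[0]]
    # Distance from every point to its nearest chosen representative.
    dist = [math.dist(p, chosen[0]) for p in points]
    while len(chosen) < k:
        idx = max(range(len(points)), key=dist.__getitem__)
        chosen.append(points[idx])
        dist = [min(d, math.dist(p, points[idx])) for d, p in zip(dist, points)]
    return chosen


def hierarchical_representatives(points, k, chunk_size,
                                 selector=farthest_point_selector):
    """Assumed divide-and-conquer scheme: select representatives per chunk,
    pool the winners, and repeat until the pool fits in a single chunk."""
    if chunk_size <= k:
        raise ValueError("chunk_size must exceed k so each level shrinks the pool")
    pool = list(points)
    while len(pool) > chunk_size:
        chunks = [pool[i:i + chunk_size] for i in range(0, len(pool), chunk_size)]
        pool = [rep for chunk in chunks for rep in selector(chunk, k)]
    return selector(pool, k)


# Usage: 200 random 2-D points, 5 representatives, chunks of 40 points.
random.seed(0)
points = [(random.random(), random.random()) for _ in range(200)]
reps = hierarchical_representatives(points, k=5, chunk_size=40)
```

Because the selector only ever sees one chunk at a time, its cost depends on the chunk size rather than on the full dataset size, which is the motivation for a divide-and-conquer strategy here. The returned representatives could then seed the in-sample set in place of a random draw.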
Theoretical and empirical results indicated that SMRS and HSR produced similar
results. The representatives selected by both algorithms overlapped, and the importance
assigned to them was correlated. However, using representatives or non-representatives
as an initialization of the in-sample set of SSSC did not significantly change its
performance.
Faculty
Faculteit der Sociale Wetenschappen