Fast Representatives Search as an Initialization for Scalable Sparse Subspace Clustering

Issue Date

2015-08-27

Language

en

Abstract

High-dimensional data clustering is difficult because of the sparsity, correlated features, and specific subspace structures of high-dimensional data. The self-expressiveness property, however, states that each data point can be represented most efficiently as a combination of other data points from its own subspace. Elhamifar and Vidal (2013) successfully used this property to cluster high-dimensional data with the sparse subspace clustering (SSC) algorithm. However, the computational complexity of SSC is too high for it to be applied to datasets with a large number of data points. Scalable sparse subspace clustering (SSSC) speeds SSC up with an in-sample/out-of-sample approach: only an in-sample subset is clustered with SSC, and the remaining out-of-sample points are then assigned to the resulting clusters. This research took two steps to improve on the random initialization of the SSSC in-sample set. First, the computational complexity of sparse modeling representative selection (SMRS), an algorithm for selecting representatives, was reduced with a divide-and-conquer strategy; the resulting algorithm is called hierarchical sparse representatives (HSR). Second, the representatives found by SMRS (or HSR) were used to initialize the in-sample set instead of a random sample. Theoretical and empirical results indicated that SMRS and HSR produce similar results: the representatives selected by the two algorithms overlapped, and the importance assigned to them was correlated. However, initializing the SSSC in-sample set with representatives or with non-representatives did not significantly change its performance.
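
The representative-based initialization can be sketched roughly as follows. This is a minimal Python/NumPy sketch under stated assumptions, not the implementation used in this research. SMRS's row-sparse self-representation is approximated here by plain proximal gradient descent on min_C 0.5*||X - XC||_F^2 + lam * sum_i ||C_i||_2, with the SMRS affine constraint dropped and a top-k rule instead of a threshold. The HSR details (random blocks, then re-running selection on the pooled per-block candidates) are assumptions, and all function names (representative_scores, select_top, hierarchical_representatives) and parameters (lam, n_blocks, seed) are hypothetical.

```python
import numpy as np


def row_shrink(C, tau):
    """Group soft-thresholding of each row: prox of tau * sum_i ||C[i, :]||_2."""
    norms = np.linalg.norm(C, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return C * scale


def representative_scores(X, lam=0.5, n_iter=300):
    """SMRS-style importance scores via proximal gradient descent (ISTA).

    Approximately solves  min_C 0.5 * ||X - X C||_F^2 + lam * sum_i ||C[i, :]||_2
    with one data point per column of X. Assumption: unlike SMRS, the affine
    constraint 1^T C = 1^T is dropped. Rows of C that stay nonzero mark representatives.
    """
    G = X.T @ X                                      # n x n Gram matrix
    step = 1.0 / max(np.linalg.norm(G, 2), 1e-12)    # 1 / Lipschitz constant of the gradient
    C = np.zeros_like(G)
    for _ in range(n_iter):
        grad = G @ C - G                             # gradient of 0.5*||X - XC||_F^2 = X^T(XC - X)
        C = row_shrink(C - step * grad, step * lam)
    return np.linalg.norm(C, axis=1)                 # importance of each point = norm of its row


def select_top(X, k, lam):
    """Column indices of the k points of X with the largest importance scores."""
    scores = representative_scores(X, lam)
    return np.argsort(-scores)[:k]


def hierarchical_representatives(X, k, n_blocks=4, lam=0.5, seed=0):
    """Hypothetical HSR-style divide and conquer: select per block, then among the candidates."""
    n = X.shape[1]
    order = np.random.default_rng(seed).permutation(n)
    blocks = [idx for idx in np.array_split(order, n_blocks) if len(idx) > 0]
    # Each block only needs a small Gram matrix instead of the full n x n one.
    candidates = np.concatenate(
        [idx[select_top(X[:, idx], min(k, len(idx)), lam)] for idx in blocks]
    )
    final = select_top(X[:, candidates], min(k, len(candidates)), lam)
    return candidates[final]


# Usage: 200 points from two random 2-D subspaces of R^10, one point per column.
rng = np.random.default_rng(0)
bases = [rng.standard_normal((10, 2)) for _ in range(2)]
X = np.hstack([B @ rng.standard_normal((2, 100)) for B in bases])
in_sample = hierarchical_representatives(X, k=20)   # indices to use as the SSSC in-sample set
```

The saving in this sketch comes from the blocking: with b blocks, each selection step works on roughly (n/b) x (n/b) Gram matrices rather than one n x n matrix, at the cost of choosing candidates from local rather than global information.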

Faculty

Faculteit der Sociale Wetenschappen (Faculty of Social Sciences)