Practical Deep Learning for Person Re-Identification in Video Surveillance Systems

Issue Date
2017-09-22
Language
en
Abstract
In an existing particle filter tracking system at Thales Research & Technology, a deep neural network is required to perform person re-identification on image pairs, maximizing the true positive rate (TPR) and minimizing the false positive rate (FPR), in a situation where the gallery size is relatively small. The currently available literature on deep learning for person re-identification focuses on improving large benchmarks with large gallery sizes, improving the CMC (cumulative match characteristic) ranking on individual benchmarks with very deep neural networks. While CMC ranking is a good test of the discriminative ability of a network, little is known about the TPR and FPR performance of smaller deep neural networks when the gallery size is small. In this study we found that as the gallery size increases, the ranking performance of the network decreases while the TPR and FPR remain largely unchanged. We show that a relatively small neural network can achieve benchmark performance similar to that of very deep neural networks, provided the gallery is small enough (≤300 IDs). We achieve a performance of 50% on VIPeR, 56% on PRID450 and 21% on Market1501. Most of the literature uses simple distance metrics, such as the L2 (Euclidean) distance between the extracted image features, to perform metric learning. Neural metric learning, however, has received little attention, even though it offers a more natural approach to person re-identification by letting a neural network learn the boundary between the match and mismatch classes. In this study we compare the Euclidean distance to various neural metric learners and find that the Euclidean distance consistently outperforms the neural metric learners. This study also investigates various training schemes for training a neural network on a mixture distribution consisting of multiple small datasets; however, no consistent improvements could be found compared to training on a single distribution. We did find that large and noisy datasets tend to generalize well to new environments and that the dataset mean can be used to gauge the ability to generalize. Finally, we investigate the effectiveness of image data compared to video data by comparing the performance of the SCNN trained on images to that of the S3DCNN trained on videos. We found that the S3DCNN outperforms the SCNN in all cases, and that the performance gap between the two narrows as the dataset grows larger (≥1400 instances).
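As a rough illustration of the metric-learning comparison described in the abstract (not code from the thesis), the sketch below contrasts a plain Euclidean-distance match decision with a small neural metric learner that classifies a feature pair as match or mismatch, together with the TPR/FPR computation used to evaluate such decisions. The names MetricNet, feat_dim and the distance threshold are illustrative assumptions, not values taken from this work.

```python
# Minimal sketch (assumed names and values): Euclidean-distance matching vs. a
# small neural metric learner on pairs of extracted appearance features.
import numpy as np
import torch
import torch.nn as nn

feat_dim = 128  # assumed dimensionality of the image embedding


def euclidean_match(f1, f2, threshold=1.0):
    """Declare a match when the L2 distance between two embeddings is below a threshold."""
    return np.linalg.norm(f1 - f2, axis=-1) < threshold


class MetricNet(nn.Module):
    """A small neural metric learner: classifies a feature pair as match / mismatch."""

    def __init__(self, dim=feat_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),
        )

    def forward(self, f1, f2):
        # The absolute element-wise difference is one common pair representation;
        # the network then learns the boundary between the two classes.
        return self.net(torch.abs(f1 - f2)).squeeze(-1)


def tpr_fpr(pred, label):
    """True/false positive rates over a set of pair decisions."""
    pred, label = np.asarray(pred, bool), np.asarray(label, bool)
    tpr = (pred & label).sum() / max(label.sum(), 1)
    fpr = (pred & ~label).sum() / max((~label).sum(), 1)
    return tpr, fpr
```

In this sketch the Euclidean rule needs only a threshold on the embedding distance, while the neural metric learner must be trained on labelled match/mismatch pairs before its sigmoid output can be thresholded in the same way.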
Faculty
Faculteit der Sociale Wetenschappen