Detecting and locating assets at Dutch train stations

Issue Date

2022-07-19

Language

en

Abstract

This thesis investigates which methods are best suited to detecting objects in monovision camera footage and how to determine the distance between the detected objects and the camera. Previous research used the YOLOv5s model because it is accurate, fast (it can detect objects at 140 frames per second), and can be trained well on new data; this research therefore uses YOLOv5s as well. YOLOv5s uses regular, axis-aligned bounding boxes, unlike the YOLOv5m Oriented Bounding Box (OBB) model, which rotates the bounding boxes to match the rotation of the objects. The Detectron2 model is included because it is accurate and fast like YOLO, and it denotes detected objects with both bounding boxes and segmentation masks. Rotated bounding boxes and segmentation masks generally have a smaller background-to-object ratio and therefore provide better input for the distance calculation formula. This formula compares the pixel height of an object in the image with the real-world size of that object. If the direction the camera faces is also known, the exact location of the object in the real world can be determined. The results indicate that YOLOv5s is better at recognizing objects than YOLOv5m OBB and Detectron2, but its output is less suitable as input for the distance formula. Detectron2 offers the best combined solution for detecting objects and determining distance, with the lowest average difference between real and predicted distance. YOLOv5m OBB shows potential but should be trained on more powerful hardware to detect objects more reliably. Furthermore, the training and testing dataset is small for object detection models; all three models would benefit from more and better-distributed training data.
Obtaining large amounts of representative data is a challenge in this domain because of hardware and privacy constraints. Therefore, synthetic data of trash cans was generated in Unity to investigate whether it can substitute for real data when training object detection models. The Detectron2 model was trained on synthetic data and tested on both synthetic and real-world data. The model correctly recognizes trash cans in 98% of cases on synthetic data and in 65% of cases on real-world data. Synthetic data could be beneficial when real-world data is hard to obtain, and it can save time because the data can be generated with annotations in any format.
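
The abstract does not state the distance formula itself. As an illustration only, the sketch below assumes the standard pinhole-camera relation (distance = focal length in pixels × real height / pixel height) and then projects that distance along the camera heading to get a position. The function names, the focal length, the object height, and the pixel heights are hypothetical values chosen for the example, not figures from the thesis, and the object is assumed to sit on the optical axis.

```python
import math


def distance_from_pixel_height(real_height_m, pixel_height, focal_length_px):
    """Estimate camera-to-object distance with the pinhole model.

    Assumption: distance = focal_length_px * real_height_m / pixel_height.
    This is a generic relation, not necessarily the thesis's exact formula.
    """
    if pixel_height <= 0:
        raise ValueError("pixel_height must be positive")
    return focal_length_px * real_height_m / pixel_height


def locate(camera_xy, heading_deg, distance_m):
    """Project a distance along the camera heading.

    Coordinates are a local plane with x = east and y = north, in metres.
    heading_deg is measured clockwise from north (hypothetical convention).
    """
    heading = math.radians(heading_deg)
    x = camera_xy[0] + distance_m * math.sin(heading)
    y = camera_xy[1] + distance_m * math.cos(heading)
    return (x, y)


# Hypothetical numbers: a 1.0 m tall trash can, focal length of 1000 px.
tight_height_px = 200   # e.g. height taken from a segmentation mask
loose_height_px = 220   # e.g. height of a box that includes background

d_tight = distance_from_pixel_height(1.0, tight_height_px, 1000.0)
d_loose = distance_from_pixel_height(1.0, loose_height_px, 1000.0)

# Camera at the origin, facing due east (90 degrees clockwise from north).
position = locate((0.0, 0.0), 90.0, d_tight)

print(f"tight mask: {d_tight:.2f} m, loose box: {d_loose:.2f} m")
print(f"estimated position: ({position[0]:.2f}, {position[1]:.2f})")
```

Under these assumptions, a box that includes extra background overstates the pixel height and so understates the distance, which is consistent with the abstract's point that tighter boxes and masks give better input to the distance calculation.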

Faculty

Faculteit der Sociale Wetenschappen