Detecting and locating assets at Dutch train stations
Keywords
Authors
Issue Date
2022-07-19
Language
en
Document type
Abstract
This thesis investigates which methods are best suited to detecting objects in monocular
(single-camera) footage, and how to determine the distance between the detected objects
and the camera. Previous research used the YOLOv5s model because it is accurate, fast
(it can detect objects at 140 frames per second), and can be trained well on new data.
This research therefore also uses YOLOv5s. YOLOv5s uses regular bounding boxes, unlike
the YOLOv5m Oriented Bounding Box (OBB) model, which rotates the bounding boxes
to fit the orientation of the objects. The Detectron2 model is used because it is accurate
and fast like YOLO, and it additionally denotes detected objects with both bounding boxes
and masks. Rotated bounding boxes and segmentation masks generally have a smaller
background-to-object ratio and therefore provide better input for the distance calculation formula.
This formula compares the pixel height of an object in the footage with the real-world
size of the object. If the direction the camera faces is also known, the exact
location in the real world can be determined. The results indicate that the YOLOv5s
model is better at recognizing objects than YOLOv5m OBB and Detectron2, but its output is
less suitable as input to the distance formula. Detectron2 offers the best overall
solution for detecting objects and determining their distance, with the lowest average
difference between real and predicted distance. YOLOv5m OBB shows potential but
would need to be trained on more powerful hardware to detect objects more reliably.
Furthermore, the training and testing dataset is small for object
detection models. All three models would benefit from more, and better distributed,
training data. Obtaining large amounts of representative data is a challenge in this
domain due to hardware and privacy constraints. Therefore, synthetic data of trash cans was
generated in Unity to investigate whether it can substitute for real data when training
object detection models. The Detectron2 model was trained on synthetic data and tested on both
synthetic and real-world data. The model correctly recognizes trash cans 98% of the time
in synthetic data and 65% of the time in real-world data. Synthetic data could be beneficial
when real-world data is hard to obtain, and can save time because the data can be generated
with annotations in any format.
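
The distance calculation described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula used in the thesis, so the code below assumes a standard pinhole-camera relation (distance = focal length × real height / pixel height). The function names, the focal length expressed in pixels, and the local east/north coordinate frame are all hypothetical choices made for illustration. The sketch also assumes the object lies along the direction the camera faces.

```python
import math


def estimate_distance_m(real_height_m, pixel_height, focal_length_px):
    """Estimate camera-to-object distance with the pinhole-camera model.

    Assumption (not taken from the thesis): distance = f * H / h, where
    f is the focal length in pixels, H the real-world object height in
    metres, and h the object's height in the image in pixels.
    """
    if pixel_height <= 0:
        raise ValueError("pixel_height must be positive")
    return focal_length_px * real_height_m / pixel_height


def locate_object(camera_xy, heading_deg, distance_m):
    """Project the object position from the camera position and heading.

    Assumption: a local metric frame with x pointing east and y pointing
    north, and a heading measured clockwise from north. The object is
    taken to lie on the camera's viewing direction.
    """
    theta = math.radians(heading_deg)
    x, y = camera_xy
    return (x + distance_m * math.sin(theta),
            y + distance_m * math.cos(theta))


# Hypothetical example values: a 1.0 m tall trash can that appears
# 100 px tall, seen by a camera with a 1000 px focal length facing east.
distance = estimate_distance_m(1.0, 100, 1000)
position = locate_object((0.0, 0.0), 90.0, distance)
```

Because the estimate scales inversely with the measured pixel height, a box that includes extra background overestimates that height and underestimates the distance, which is consistent with the abstract's point that tighter rotated boxes and masks give better input to the formula.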
Faculty
Faculteit der Sociale Wetenschappen
