Classifying YouTube videos as kid-friendly based on their subtitles

Issue Date

2019-06-01

Language

en

Abstract

YouTube, one of the most popular websites in the world, is a beloved form of entertainment among children. However, not all of its content is suitable for children. In 2015, YouTube proposed a solution to this problem in the form of an app, YouTube Kids. The intent of the app is to provide a kid-friendly version of YouTube, filtering out all of the inappropriate content and therefore only displaying videos that are suitable for kids. It is apparent that this filtering process is flawed, because inappropriate videos often show up in the app, leading to criticism from parents. In this project, I aim to create a classifier that is able to predict whether a YouTube video is suitable for YouTube Kids, using the automatically generated subtitles of the videos. After gathering and preprocessing the subtitles of 500 videos, I create a classifier that is based on the word frequencies of significant words from the subtitles. The best results were obtained by creating a multinomial Naïve Bayes classifier. This classifier reached an accuracy of 0.744 and an FP rate of 0.128. Furthermore, I investigate if information about the lexical diversity, the type-token ratio, improves the prediction. From my research it appears that it does not lead to an improvement. However, depending on the classification algorithm that is used, lexical diversity might be of value. Better classification results can probably be obtained by combining this classifier with classifiers that are based on other information, such as video images, comments, and information about the user that uploaded the video.
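
The approach summarized in the abstract (word frequencies fed to a multinomial Naïve Bayes classifier, with the type-token ratio as a lexical diversity measure, evaluated by accuracy and FP rate) can be illustrated with a minimal sketch. This is not the thesis's implementation: the toy subtitles, the whitespace tokenizer, the add-one smoothing, the label names `kid` and `other`, and the choice of "kid-friendly" as the positive class for the FP rate are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()


def type_token_ratio(tokens):
    """Lexical diversity: number of distinct words divided by total words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


class MultinomialNB:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set().union(*self.word_counts.values())
        docs_per_class = Counter(labels)
        self.log_prior = {
            c: math.log(docs_per_class[c] / len(labels)) for c in self.classes
        }
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            denom = self.totals[c] + len(self.vocab)
            for word in tokenize(doc):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((self.word_counts[c][word] + 1) / denom)
            scores[c] = score
        return max(scores, key=scores.get)


def evaluate(model, docs, labels, positive="kid"):
    """Return (accuracy, FP rate); assumes 'kid' is the positive class."""
    tp = fp = tn = fn = 0
    for doc, label in zip(docs, labels):
        pred = model.predict(doc)
        if pred == positive and label == positive:
            tp += 1
        elif pred == positive:
            fp += 1  # unsuitable video predicted as kid-friendly
        elif label == positive:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / len(labels)
    fp_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fp_rate


# Invented toy subtitles, purely for illustration.
train_docs = [
    "let us count the red balloons and sing",
    "sing the alphabet song with the happy bear",
    "the zombie attack was a bloody fight",
    "he grabbed the gun and started to fight",
]
train_labels = ["kid", "kid", "other", "other"]
model = MultinomialNB().fit(train_docs, train_labels)

test_docs = ["sing a happy song", "a bloody zombie fight"]
test_labels = ["kid", "other"]
accuracy, fp_rate = evaluate(model, test_docs, test_labels)
```

Under this assumed definition, a false positive is an unsuitable video that the classifier lets through, which is the error type the criticism from parents concerns. The type-token ratio could be appended as an extra feature, although a count-based model such as multinomial Naïve Bayes does not accept a real-valued ratio directly, which may be one reason its usefulness depends on the classification algorithm.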

Faculty

Faculteit der Sociale Wetenschappen