Classifying YouTube videos as kid-friendly based on their subtitles
Date
2019-06-01
Language
en
Abstract
YouTube, one of the most popular websites in the world, is a beloved form of
entertainment among children. However, not all of its content is suitable for children.
In 2015, YouTube proposed a solution to this problem in the form of an app, YouTube
Kids. The intent of the app is to provide a kid-friendly version of YouTube, filtering
out all of the inappropriate content and therefore only displaying videos that are suitable
for kids. It is apparent that this filtering process is flawed, because inappropriate
videos often show up in the app, leading to criticism from parents. In this project, I
aim to create a classifier that is able to predict whether a YouTube video is suitable
for YouTube Kids, using the automatically generated subtitles of the videos. After
gathering and preprocessing the subtitles of 500 videos, I create a classifier that is
based on the word frequencies of significant words from the subtitles. The best results
were obtained by creating a multinomial Naïve Bayes classifier. This classifier reached
an accuracy of 0.744 and a false-positive (FP) rate of 0.128. Furthermore, I investigate
whether lexical diversity, measured as the type-token ratio, improves the prediction. My
results indicate that it does not. However, depending on
the classification algorithm that is used, lexical diversity might be of value. Better
classification results can probably be obtained by combining this classifier with classifiers
that are based on other information, such as video images, comments, and information
about the user that uploaded the video.
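
As a rough illustration of the two ingredients named in the abstract, the sketch below computes a type-token ratio and trains a small multinomial Naïve Bayes classifier on word counts. It is not the thesis code: the whitespace tokenizer, the Laplace smoothing value, the label names, and the four toy subtitle strings are all assumptions made up for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed preprocessing: lowercase and split on whitespace only.
    return text.lower().split()


def type_token_ratio(tokens):
    # Lexical diversity: number of distinct words divided by total words.
    return len(set(tokens)) / len(tokens) if tokens else 0.0


class MultinomialNB:
    """Minimal multinomial Naive Bayes over word counts (illustrative)."""

    def fit(self, docs, labels, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for w in tokens:
                if w in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[w] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, invented subtitles; not data from the thesis.
train = [
    ("let us count the colors of the rainbow together", "kids"),
    ("sing along with the happy little puppy", "kids"),
    ("the suspect was shot during the violent robbery", "not_kids"),
    ("graphic footage of the violent crash", "not_kids"),
]
docs = [tokenize(text) for text, _ in train]
labels = [label for _, label in train]
model = MultinomialNB().fit(docs, labels)
```

Calling `model.predict(tokenize("sing the rainbow colors"))` classifies a new subtitle string, and `type_token_ratio(tokens)` could be appended as an extra feature to test whether lexical diversity helps, which is the comparison the abstract describes.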
Faculty
Faculteit der Sociale Wetenschappen