Classifying YouTube videos as kid-friendly based on their subtitles

YouTube, one of the most popular websites in the world, is a beloved form of entertainment among children. However, not all of its content is suitable for children. In 2015, YouTube proposed a solution to this problem in the form of an app, YouTube Kids. The intent of the app is to provide a kid-friendly version of YouTube, filtering out all of the inappropriate content and therefore only displaying videos that are suitable for kids. It is apparent that this filtering process is flawed, because inappropriate videos often show up in the app, leading to criticism from parents. In this project, I aim to create a classifier that is able to predict whether a YouTube video is suitable for YouTube Kids, using the automatically generated subtitles of the videos. After gathering and preprocessing the subtitles of 500 videos, I create a classifier that is based on the word frequencies of significant words from the subtitles. The best results were obtained by creating a multinomial Naïve Bayes classifier. This classifier reached an accuracy of 0.744 and an FP rate of 0.128. Furthermore, I investigate if information about the lexical diversity, the type-token ratio, improves the prediction. From my research it appears that it does not lead to an improvement. However, depending on the classification algorithm that is used, lexical diversity might be of value. Better classification results can probably be obtained by combining this classifier with classifiers that are based on other information, such as video images, comments, and information about the user that uploaded the video.
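The ideas named in the abstract (word-frequency features, a multinomial Naïve Bayes classifier, the type-token ratio, accuracy and FP rate) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the toy subtitle snippets, the labels `kid` and `not_kid`, the add-one smoothing, and the choice of `kid` as the positive class are all assumptions made for illustration, and the selection of "significant" words is not modeled.

```python
import math
from collections import Counter


def type_token_ratio(tokens):
    """Lexical diversity: distinct words (types) divided by total words (tokens)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


class MultinomialNB:
    """Multinomial Naive Bayes over word counts, with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for d in docs for w in d}
        self.log_prior, self.word_counts, self.totals = {}, {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            self.word_counts[c] = Counter(w for d in class_docs for w in d)
            self.totals[c] = sum(self.word_counts[c].values())
        return self

    def predict(self, doc):
        v = len(self.vocab)

        def score(c):
            s = self.log_prior[c]
            for w in doc:
                if w in self.vocab:  # words never seen in training are ignored
                    s += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            return s

        return max(self.classes, key=score)


def accuracy_and_fp_rate(y_true, y_pred, positive="kid"):
    """Accuracy, and FP rate = FP / (FP + TN), with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return correct / len(pairs), (fp / (fp + tn) if fp + tn else 0.0)


# Hypothetical toy "subtitles"; the real study used 500 videos.
train_docs = [
    "let us sing the alphabet song together".split(),
    "count the red and blue balloons".split(),
    "the fight got violent and bloody".split(),
    "he swears and shoots the gun".split(),
]
train_labels = ["kid", "kid", "not_kid", "not_kid"]

model = MultinomialNB().fit(train_docs, train_labels)
test_docs = ["sing and count the balloons".split(), "a violent gun fight".split()]
test_labels = ["kid", "not_kid"]
predictions = [model.predict(d) for d in test_docs]
acc, fp_rate = accuracy_and_fp_rate(test_labels, predictions)
ttr = type_token_ratio("the cat and the dog".split())
```

Under the assumed convention, a false positive is an unsuitable video predicted as kid-friendly, which is the costly error for an app like YouTube Kids. The type-token ratio could be appended as an extra feature, although a count-based model such as multinomial Naïve Bayes does not handle a continuous ratio naturally, which is one possible reading of why its value may depend on the classification algorithm.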
Faculteit der Sociale Wetenschappen