Classifying YouTube videos as kid-friendly based on their subtitles

dc.contributor.advisor: Grootjen, F.A.
dc.contributor.author: Hendrix, G.J.M.
dc.date.issued: 2019-06-01
dc.description.abstract: YouTube, one of the most popular websites in the world, is a beloved form of entertainment among children. However, not all of its content is suitable for children. In 2015, YouTube proposed a solution to this problem in the form of an app, YouTube Kids. The intent of the app is to provide a kid-friendly version of YouTube, filtering out all of the inappropriate content and therefore only displaying videos that are suitable for kids. It is apparent that this filtering process is flawed, because inappropriate videos often show up in the app, leading to criticism from parents. In this project, I aim to create a classifier that is able to predict whether a YouTube video is suitable for YouTube Kids, using the automatically generated subtitles of the videos. After gathering and preprocessing the subtitles of 500 videos, I create a classifier that is based on the word frequencies of significant words from the subtitles. The best results were obtained by creating a multinomial Naïve Bayes classifier. This classifier reached an accuracy of 0.744 and an FP rate of 0.128. Furthermore, I investigate if information about the lexical diversity, the type-token ratio, improves the prediction. From my research it appears that it does not lead to an improvement. However, depending on the classification algorithm that is used, lexical diversity might be of value. Better classification results can probably be obtained by combining this classifier with classifiers that are based on other information, such as video images, comments, and information about the user that uploaded the video. (en_US)
dc.embargo.lift: 10000-01-01
dc.embargo.type: Permanent embargo (en_US)
dc.identifier.uri: https://theses.ubn.ru.nl/handle/123456789/12553
dc.language.iso: en (en_US)
dc.thesis.faculty: Faculteit der Sociale Wetenschappen (en_US)
dc.thesis.specialisation: Bachelor Artificial Intelligence (en_US)
dc.thesis.studyprogramme: Artificial Intelligence (en_US)
dc.thesis.type: Bachelor (en_US)
dc.title: Classifying YouTube videos as kid-friendly based on their subtitles (en_US)
Files
Original bundle
Name:
Hendrix, G.-s4591097.pdf
Size:
257.14 KB
Format:
Adobe Portable Document Format