Prediction of Age Classification in the Parental Advisory based on the Subtitles of Movies

The age classification of television series and movies, together with the indication of the categories sex, violence, fear, drugs & alcohol, discrimination and profanity, is important because it lets parents check whether content is suitable for their children. Movies containing violence, sex or the other categories mentioned above are not suitable for young children. Many different age classification systems exist around the world; for the research in this article I used the Dutch system. First, I studied whether there is a relation between the categories (violence, sex, etc.) and the Dutch age classification. A cost-sensitive classifier built on the updatable multinomial naive Bayes algorithm (NB classifier) classified 45.2% of the instances correctly. Because the classes were balanced and this percentage is well above the 20% expected by chance, I concluded that the categories influence the age classification. Second, I studied whether the parental advisory of movies can be predicted correctly by a variety of classifiers using only the word frequencies of the movies' subtitles. The NB classifier turned out to be the best algorithm for both the imbalanced and the balanced classes. On the balanced classes its false positive rate was 0.28, which is fairly low, and it classified 52.28% of the movies correctly (a proportion of 0.52). A low false positive rate is important because classifying a movie rated for 16 years as a movie for 6 years can harm six-year-old children, whereas misclassifying a movie for 6 years as a movie for 16 years harms neither younger nor older children. The percentage correctly classified is much higher than the chance rate of 20%. I therefore concluded that it is possible to predict the parental advisory of movies from only the word frequencies of the subtitles, with an accuracy of 52.28%.
These results may have been influenced by, for example, classes that were too imbalanced for the imbalance to be resolved. The choice of algorithms that I compared may also have influenced the results. A further comparison of algorithms, and the prediction of the individual categories such as sex and violence, can be addressed in future research.
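
To illustrate the idea of predicting an age class from subtitle word frequencies, the following is a minimal sketch of a multinomial naive Bayes classifier with Laplace smoothing, written in plain Python. It is not the setup used in this research: it omits the cost-sensitive wrapper and the class balancing, uses only two age classes, and the subtitle snippets, labels and function names (`train_multinomial_nb`, `predict`) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(documents, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model on tokenised documents.

    documents: list of token lists (hypothetical subtitle snippets)
    labels:    list of age-class labels, one per document
    alpha:     Laplace smoothing constant (an assumed default of 1.0)
    """
    word_counts = defaultdict(Counter)  # per-class word frequencies
    vocabulary = set()
    for tokens, label in zip(documents, labels):
        word_counts[label].update(tokens)
        vocabulary.update(tokens)

    class_doc_counts = Counter(labels)
    total_docs = len(labels)
    return {
        "alpha": alpha,
        "vocabulary": vocabulary,
        "word_counts": word_counts,
        # log P(class), estimated from the share of documents per class
        "log_prior": {
            label: math.log(n / total_docs)
            for label, n in class_doc_counts.items()
        },
        # total number of word tokens seen in each class
        "total_words": {
            label: sum(word_counts[label].values())
            for label in class_doc_counts
        },
    }


def predict(model, tokens):
    """Return the class with the highest posterior log-probability."""
    alpha = model["alpha"]
    vocab_size = len(model["vocabulary"])
    best_label, best_score = None, -math.inf
    for label, log_prior in model["log_prior"].items():
        denominator = model["total_words"][label] + alpha * vocab_size
        score = log_prior
        for word in tokens:
            if word not in model["vocabulary"]:
                continue  # ignore words never seen during training
            count = model["word_counts"][label][word]
            score += math.log((count + alpha) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: two snippets per age class.
train_docs = [
    "the puppy plays in the garden".split(),
    "friends sing a happy song".split(),
    "he grabs the gun and shoots".split(),
    "blood on the knife after the fight".split(),
]
train_labels = ["6", "6", "16", "16"]

model = train_multinomial_nb(train_docs, train_labels)
print(predict(model, "the puppy sings a song".split()))  # 6
print(predict(model, "he shoots the gun".split()))       # 16
```

In a setting like the one described above, where rating a movie too young is the more harmful error, one possible extension would be to weight the class scores with a cost matrix before picking the best label; that step is left out of this sketch.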
Faculteit der Sociale Wetenschappen