Prediction of Age Classification in the Parental Advisory based on the Subtitles of Movies
Issue Date
2018-06-24
Language
en
Abstract
Age classifications and content indicators for the categories sex, violence, fear, drugs & alcohol, discrimination,
and profanity in television series and movies are important because they let parents check whether a title is
suitable for their children. Young children should not watch movies containing violence, sex, or the other categories
mentioned above. Many different age classification systems exist around the world; for the research in this
article I used the Dutch system. I first studied whether there is a relation between the content categories
(violence, sex, etc.) and the Dutch age classification.
Using a cost-sensitive classifier built on updatable multinomial naive Bayes (the NB classifier), 45.2% of
instances were correctly classified. Because the classes were balanced and this percentage was higher than the
20% chance rate, I concluded that the categories influence the age classification. I then studied whether the
parental advisory of movies could be correctly predicted by a variety of classifiers based only on the word
frequencies of the movies' subtitles. The NB classifier was the best algorithm for both the imbalanced and the
balanced classes. With balanced classes the false positive rate was fairly low at 0.28, and 52.28% of movies
were correctly classified. A low false positive rate is important because classifying a movie rated for age 16
as a movie rated for age 6 can harm six-year-old children, whereas misclassifying a movie rated for age 6 as a
movie rated for age 16 harms neither younger nor older children. The percentage correctly classified is much
higher than the chance rate of 20%, so I concluded that it is possible to predict the parental advisory of
movies based only on the word frequencies of the subtitles, with an accuracy of 52.28%. These results may have
been influenced by, for example, classes that were too imbalanced for the imbalance to be resolved. The choice
of algorithms compared could also have influenced the results.
Comparing other algorithms, and additionally predicting the individual categories such as sex and violence,
can be done in future research.
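
The approach summarized in the abstract, multinomial naive Bayes over subtitle word frequencies, can be illustrated with a minimal sketch. This is not the author's implementation or tooling: the class `MultinomialNB`, the helpers `tokenize` and `under_rated_fraction`, and the four toy "subtitles" are all hypothetical and invented for illustration, and cost-sensitive weighting is left out. The second helper measures the error direction the abstract calls harmful, where a movie is predicted into a younger age class than its true one.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,!?\"'") for w in text.lower().split())
    return [w for w in words if w]


class MultinomialNB:
    """Multinomial naive Bayes over raw word counts, with add-one smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter(labels)        # label -> number of documents
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.doc_counts):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods per word.
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def under_rated_fraction(true_ages, predicted_ages):
    """Fraction of movies predicted into a younger class than their true class."""
    pairs = list(zip(true_ages, predicted_ages))
    return sum(pred < true for true, pred in pairs) / len(pairs)


# Invented toy data; the labels are minimum ages, not real ratings.
train_texts = [
    "the puppy plays in the garden",
    "friends share a happy picnic",
    "he fires the gun and blood spills",
    "they kill the guard with a knife",
]
train_labels = [6, 6, 16, 16]

model = MultinomialNB().fit(train_texts, train_labels)
predictions = [model.predict(t) for t in ["a happy puppy", "the knife and the gun"]]
```

With five balanced classes, guessing at random would be right 20% of the time, which is the baseline the reported 45.2% and 52.28% are compared against. A cost-sensitive variant would additionally penalize the under-rating errors counted by `under_rated_fraction` more heavily than over-rating errors.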
Faculty
Faculteit der Sociale Wetenschappen