Automating the Detection of Dispositional and Behavioural Phrasing
Issue Date
2022-01-24
Language
en
Abstract
Companies worldwide are taking measures to improve their DEIB strategies
and attract more diverse groups of employees. Creating an inviting atmosphere
starts with the job advertisement. One contributing factor is how the job's
activities on the one hand and its person requirements on the other are worded: behavioural
wording (concrete descriptions of the job including actions and details)
is generally understood to be less prone to subjective
judgement than dispositional wording (abstract descriptions of desired characteristics
of the applicant). Negative meta-stereotypes, which
are stereotypes that members of social out-groups expect may be held against them,
are expected to be used unintentionally and to be hidden in dispositional wording. This idea relates to Construal
Level Theory (CLT), which posits that the mental representation people
have of a subject becomes more abstract as their psychological distance
from it increases. In this thesis, the Linguistic Category Model (LCM) is applied to
detect dispositionally and behaviourally phrased predicates through various subcategories,
namely 'Act', 'Process', 'Attitude + action', 'Attitude', 'Innate quality',
and 'Learned quality'. An annotation guide was written and improved by running
five annotation pilots. The final guide was used to generate an annotated dataset
of predicates labeled according to the subcategories. Two approaches were investigated
for automating the detection of dispositional and behavioural predicates in
text. The first approach was three-step sequence tagging, where predicate boundaries
were predicted with a rule-based system and the predicates were classified
with a Decision Tree, Random Forest, Support Vector Machine, Naïve Bayes, Gradient
Boosting, an LSTM on Word2Vec embeddings, and fine-tuned BERT
and RoBERTa models. The predicates were first classified by relevance, and then the relevant
predicates were classified by their LCM category. BERT and RoBERTa gave
the highest accuracy. The second approach was one-step sequence tagging, where
labels were predicted in-text at token level with a fine-tuned BERT model. After
applying both approaches to two example texts, the three-step
sequence tagging model appeared better at predicting correct predicate boundaries, while
the one-step sequence tagging model appeared to give more accurate word-level
label predictions. The word-level class prediction accuracy was .69 and
.77, respectively. Automating the detection and labeling of predicates as defined by the LCM
opens the opportunity for sociologists and psychologists to conduct a wide range
of studies. In the domain of job advertisements, it would help to locate potentially
exclusionary phrasing and to quantify a job advertisement's level of abstraction automatically.
The presented approaches may inspire the development of tools that make
writers of job advertisements more aware of their language use.
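
The control flow of the three-step approach and the word-level accuracy measure can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the `O` label for tokens outside any relevant predicate, and the toy rules in the usage part are all assumptions made for illustration, and the toy labels do not follow the annotation guide. In practice each step would be backed by a rule-based system or a trained classifier.

```python
from typing import Callable, List, Sequence, Tuple

Span = Tuple[int, int]  # token indices, start inclusive, end exclusive


def three_step_tag(
    tokens: Sequence[str],
    find_predicates: Callable[[Sequence[str]], List[Span]],
    is_relevant: Callable[[Sequence[str]], bool],
    classify_lcm: Callable[[Sequence[str]], str],
) -> List[str]:
    """Return one label per token; "O" (an assumed convention) marks unlabeled tokens."""
    labels = ["O"] * len(tokens)
    for start, end in find_predicates(tokens):   # step 1: predicate boundaries
        predicate = tokens[start:end]
        if not is_relevant(predicate):           # step 2: relevance filter
            continue
        category = classify_lcm(predicate)       # step 3: LCM subcategory
        for i in range(start, end):
            labels[i] = category
    return labels


def word_level_accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of tokens whose predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Toy usage with hypothetical stand-ins for the three components.
tokens = ["You", "write", "reports", "and", "are", "flexible"]
toy_boundaries = lambda toks: [(1, 3), (4, 6)]   # fixed spans, illustration only
toy_relevance = lambda pred: True                # treat every predicate as relevant
toy_category = lambda pred: "Act" if pred[0] == "write" else "Innate quality"

predicted = three_step_tag(tokens, toy_boundaries, toy_relevance, toy_category)
gold = ["O", "Act", "Act", "O", "O", "Innate quality"]
accuracy = word_level_accuracy(gold, predicted)
```

Passing the three components in as functions keeps the sketch independent of any particular model, so a rule-based boundary detector and a fine-tuned classifier could be swapped in without changing the tagging loop.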
Faculty
Faculteit der Sociale Wetenschappen