Automating the Detection of Dispositional and Behavioural Phrasing

Companies worldwide are taking measures to improve their DEIB strategies and attract more diverse groups of employees. Creating an inviting atmosphere starts with the job advertisement. One contributing factor is how the job's activities on the one hand and its person requirements on the other are worded: behavioural wording (concrete descriptions of the job, including actions and details) is generally understood to be less prone to subjective judgement than dispositional wording (abstract descriptions of desired characteristics of the applicant). The unintentional use of negative meta-stereotypes, which are stereotypes that members of social out-groups expect to be held against them, is expected to be hidden in dispositional wording. This idea relates to Construal Level Theory (CLT), which posits that a person's mental representation of a subject becomes more abstract as psychological distance from it increases. In this thesis, the Linguistic Category Model (LCM) is applied to detect dispositionally and behaviourally phrased predicates through six subcategories, namely 'Act', 'Process', 'Attitude + action', 'Attitude', 'Innate quality', and 'Learned quality'. An annotation guide was written and improved by running five annotation pilots. The final guide was used to create an annotated dataset of predicates labelled according to the subcategories. Two approaches were investigated for automating the detection of dispositional and behavioural predicates in text. The first approach was three-step sequence tagging, where predicate boundaries were predicted with a rule-based system and the predicates were classified with a Decision Tree, Random Forest, Support Vector Machine, Naïve Bayes, Gradient Boosting, an LSTM on Word2Vec embeddings, and fine-tuned BERT and RoBERTa models. The predicates were first classified by relevance, and the relevant predicates were then classified by their LCM category.
BERT and RoBERTa gave the highest accuracy. The second approach was one-step sequence tagging, where labels were predicted in-text at token level with a fine-tuned BERT model. Applying both approaches to two example texts suggested that the three-step sequence tagging model is better at predicting correct predicate boundaries, while the one-step sequence tagging model gives more accurate word-level label predictions. The word-level class prediction accuracy was .69 for the three-step model and .77 for the one-step model. Automating the detection and labelling of predicates as defined by the LCM opens the opportunity for sociologists and psychologists to conduct a wide range of studies. In the domain of job advertisements, it would help to locate potentially excluding phrasing and to quantify a job advertisement's level of abstraction automatically. The presented approaches may inspire the development of tools that make writers of job advertisements more aware of their language use.
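As a rough illustration of how the three-step approach could be compared with a token-level tagger, the sketch below projects predicate-level decisions onto tokens and computes a word-level accuracy. It is not the thesis implementation: the function names, the `"O"` label for tokens outside any relevant predicate, the choice to score all tokens, and the toy stand-ins for the rule-based boundary detector and the two classifiers are all assumptions made for this example.

```python
from typing import Callable, List, Sequence, Tuple

Span = Tuple[int, int]  # half-open token range [start, end)
OUTSIDE = "O"  # assumed label for tokens outside any relevant predicate


def three_step_tag(
    tokens: Sequence[str],
    find_spans: Callable[[Sequence[str]], List[Span]],
    is_relevant: Callable[[Sequence[str]], bool],
    classify: Callable[[Sequence[str]], str],
) -> List[str]:
    """Boundaries -> relevance -> LCM category, projected onto tokens."""
    labels = [OUTSIDE] * len(tokens)
    for start, end in find_spans(tokens):      # step 1: predicate boundaries
        predicate = tokens[start:end]
        if not is_relevant(predicate):         # step 2: relevance filter
            continue
        category = classify(predicate)         # step 3: LCM category
        for i in range(start, end):
            labels[i] = category
    return labels


def word_level_accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Share of tokens whose predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Hypothetical stand-ins; a real system would plug in the rule-based
# boundary detector and the trained classifiers here.
def toy_find_spans(tokens: Sequence[str]) -> List[Span]:
    return [(1, 3), (4, 6)]


def toy_is_relevant(predicate: Sequence[str]) -> bool:
    return True


def toy_classify(predicate: Sequence[str]) -> str:
    return "Innate quality" if predicate[0] == "are" else "Act"


tokens = ["You", "analyse", "data", "and", "are", "flexible"]
predicted = three_step_tag(tokens, toy_find_spans, toy_is_relevant, toy_classify)
gold = ["O", "Act", "Act", "O", "Learned quality", "Learned quality"]
accuracy = word_level_accuracy(gold, predicted)
```

Because both approaches end up as one label per token, the same accuracy function could score the output of a one-step tagger directly, which is what makes a word-level comparison between the two possible.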
Faculteit der Sociale Wetenschappen