Abstract
Speaking test scores are increasingly being used to make high-stakes decisions (e.g. employment, immigration, university admissions) about learners in many countries. Ensuring that these scores reflect a learner’s skill fairly and accurately is critical. This mixed-methods study seeks to strengthen the socio-cognitive framework for test validation (Chalhoub-Deville & O’Sullivan, 2020; Taylor, 2011; Weir, 2005) and to deepen our understanding of the complexities involved in deriving scores from L2 speaking tests. Adopting an interactionalist perspective, the research treats interview-format speaking tests as events co-constructed between candidate, examiner and rater. It examines how certain elements of scoring validity (the rater characteristics of ‘Agreeableness’, ‘Extraversion’ and ‘Test Experience Level’) influence how raters perceive and rate spoken performances and modulate their severity. Native-speaker English teachers from universities across Japan (n = 86) rated 12 video-recorded speaking test performances and afterwards completed a personality instrument. A hierarchical multiple regression showed that ‘Test Experience Level’ and ‘Agreeableness’ contributed significantly to the regression model, F(6, 79) = 3.126, p = .019, together accounting for 19% of the variance in rater severity. Both predictors were negatively correlated with severity; higher levels predicted more lenient ratings. Trait ‘Extraversion’ explained a significant additional 4% of the variance, F(7, 78) = 3.426, p = .039, and was positively correlated with severity; higher levels predicted more severe ratings. Finally, all raters provided written commentary on their rating procedures, and three raters took part in Stimulated Recall Interviews. Thematic analyses of the two types of qualitative data suggested that lenient (experienced, agreeable, introvert) raters perceive different aspects of examiner performance than more severe (inexperienced, disagreeable, extravert) raters do, and that these perceptions sometimes shaped how they cognitively approached the task of rating. In some instances, the differing perceptions and cognitive approaches may have affected raters’ final proficiency scores. The research findings offer suggestions for updating our understanding of the co-constructed nature of spoken interaction as well as the scoring validity component of the socio-cognitive framework. The study also makes practical recommendations for future rater training procedures that incorporate these findings.
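For readers unfamiliar with the analysis reported above, the sketch below illustrates, in Python with statsmodels, how a two-step hierarchical multiple regression of this kind is typically run and how the R-squared change for ‘Extraversion’ is tested with an incremental F-test. This is not the author’s code: the data are synthetic, and the placeholder covariates cov1–cov4 are assumptions standing in for the Step 1 predictors the abstract does not name (only ‘Test Experience Level’ and ‘Agreeableness’ are identified among the six).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Synthetic stand-in data: 86 raters, matching the study's sample size.
# cov1..cov4 are hypothetical placeholders for the unnamed Step 1 predictors.
rng = np.random.default_rng(0)
n = 86
df = pd.DataFrame({
    "test_experience": rng.normal(size=n),
    "agreeableness": rng.normal(size=n),
    "extraversion": rng.normal(size=n),
    "cov1": rng.normal(size=n),
    "cov2": rng.normal(size=n),
    "cov3": rng.normal(size=n),
    "cov4": rng.normal(size=n),
})
# Simulated severity: falls with experience and agreeableness,
# rises with extraversion, mirroring the direction of the reported effects.
df["severity"] = (-0.3 * df["test_experience"]
                  - 0.3 * df["agreeableness"]
                  + 0.2 * df["extraversion"]
                  + rng.normal(scale=1.0, size=n))

# Step 1: six predictors, mirroring the reported F(6, 79).
step1 = smf.ols("severity ~ test_experience + agreeableness"
                " + cov1 + cov2 + cov3 + cov4", data=df).fit()

# Step 2: add trait Extraversion, mirroring the reported F(7, 78).
step2 = smf.ols("severity ~ test_experience + agreeableness"
                " + cov1 + cov2 + cov3 + cov4 + extraversion", data=df).fit()

print(f"Step 1 R2 = {step1.rsquared:.3f}")
print(f"Delta R2  = {step2.rsquared - step1.rsquared:.3f}")
# Incremental F-test on the nested models: a significant result means
# Extraversion explains variance in severity beyond the Step 1 block.
print(anova_lm(step1, step2))
```

The nested-model comparison via anova_lm is the standard way to test whether the added block (here, a single trait) improves fit; the reported ΔR² of 4% corresponds to the Step 2 gain in this design.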
Citation
Roger, A. (2022) 'Examining the Role of Rater Personality in L2 Speaking Tests'. PhD thesis. University of Bedfordshire.

Publisher
University of Bedfordshire

Type
Thesis or dissertation

Language
en

Description
A thesis submitted to the University of Bedfordshire in fulfilment of the requirements for the degree of Doctor of Philosophy.
License
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International.