• A comparison of holistic, analytic, and part marking models in speaking assessment

      Khabbazbashi, Nahal; Galaczi, Evelina D. (SAGE, 2020-01-24)
      This mixed methods study examined holistic, analytic, and part marking models (MMs) in terms of their measurement properties and their impact on candidate CEFR classifications in a semi-direct online speaking test. Speaking performances of 240 candidates were first marked holistically and by part (phase 1). On the basis of phase 1 findings, which suggested stronger measurement properties for the part MM, phase 2 focused on a comparison of the part and analytic MMs: speaking performances of 400 candidates were rated analytically and by part during that phase. Raters provided open comments on their marking experiences. Results suggested a significant impact of the MM: approximately 30% and 50% of candidates in phases 1 and 2, respectively, were awarded different (adjacent) CEFR levels depending on the MM used to assign scores, with a trend towards higher CEFR levels under the holistic MM and lower CEFR levels under the part MM. While strong correlations were found between all pairings of MMs, further analyses revealed important differences. The part MM displayed superior measurement qualities, particularly in allowing raters to make finer distinctions between speaking ability levels. These findings have implications for the scoring validity of speaking tests.
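      The abstract does not report which statistics underpin the "finer distinctions" claim, but in Rasch-based comparisons of rating designs the usual indices are person separation and strata; as an illustrative sketch only (the study's actual indices are an assumption here):

          G = \frac{\sqrt{SD_{obs}^2 - RMSE^2}}{RMSE}, \qquad H = \frac{4G + 1}{3}

      where G is the person separation ratio (the error-adjusted spread of ability estimates relative to their average measurement error) and H estimates the number of statistically distinct ability strata the scores can support. A marking model yielding higher G and H would, on this logic, support finer distinctions between speaking ability levels.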
    • Developing tools for learning oriented assessment of interactional competence: bridging theory and practice

      May, Lyn; Nakatsuhara, Fumiyo; Lam, Daniel M. K.; Galaczi, Evelina D. (SAGE Publications, 2019-10-01)
      In this paper we report on a project in which we developed tools to support the classroom assessment of learners’ interactional competence (IC) and provided learning-oriented feedback in the context of preparation for a high-stakes face-to-face speaking test. Six trained examiners provided stimulated verbal reports (n=72) on 12 paired interactions, focusing on interactional features of candidates’ performance. We thematically analyzed the verbal reports to inform a draft checklist and materials, which were then trialled by four experienced teachers. Informed by both data sources, the final product comprised (a) a detailed IC checklist with nine main categories and over 50 sub-categories, with an accompanying detailed description of each area and feedback to learners, which teachers can adapt to suit their teaching and testing contexts, and (b) a concise IC checklist with four categories and bite-sized feedback for real-time classroom assessment. IC, a key aspect of face-to-face communication, is under-researched and under-explored in second/foreign language teaching, learning, and assessment contexts. This in-depth treatment therefore stands to contribute to learning contexts by raising teachers’ and learners’ awareness of micro-level features of the construct, and to assessment contexts by developing a more comprehensive understanding of the construct.
    • Establishing test form and individual task comparability: a case study of a semi-direct speaking test

      Weir, Cyril J.; Wu, Jessica R.W.; University of Luton; Language Training and Testing Center, Taiwan (SAGE, 2006-04-01)
      Examination boards are often criticized for failing to provide evidence of comparability across test forms, and few such studies are publicly available. This study investigates the extent to which three forms of the General English Proficiency Test Intermediate Speaking Test (GEPTS-I) are parallel in terms of two types of validity evidence: parallel-forms reliability and content validity. The three trial test forms, each containing three task types (read-aloud, answering questions, and picture description), were administered to 120 intermediate-level EFL learners in Taiwan. The performance data from the different test forms were analysed using classical test theory procedures and Multi-Faceted Rasch Measurement (MFRM). Various checklists were also employed to compare the tasks in different forms qualitatively in terms of content. The results showed that all three test forms were statistically parallel overall, and that Forms 2 and 3 could also be considered parallel at the individual task level. Moreover, the checklists identified sources of variation accounting for the variable difficulty of tasks in Form 1. The results provide insights for further improvement in the parallel-forms reliability of the GEPTS-I at the task level and offer a set of methodological procedures for other exam boards to consider.
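      For readers unfamiliar with MFRM, such analyses rest on Linacre’s many-facet extension of the Rasch model. A common rating-scale formulation for a design with examinees, tasks, and raters (the exact specification fitted to the GEPTS-I data is an assumption here) is

          \log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k

      where P_{nijk} is the probability of examinee n receiving category k rather than k−1 on task i from rater j, B_n is examinee ability, D_i task difficulty, C_j rater severity, and F_k the step difficulty of category k. Placing tasks and raters on the same logit scale as ability is what allows the variable difficulty of Form 1’s tasks to be separated from rater effects.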
    • The relative significance of syntactic knowledge and vocabulary breadth in the prediction of reading comprehension test performance

      Shiotsu, Toshiko; Weir, Cyril J.; Kurume University, Japan; University of Bedfordshire (SAGE, 2007-01-01)
      In the componential approach to modelling reading ability, a number of contributory factors have been empirically validated. However, research on their relative contribution to explaining performance on second language reading tests is limited, and the contribution of knowledge of syntax has been largely ignored in comparison with the attention focused on vocabulary. This study examines the relative contribution of knowledge of syntax and knowledge of vocabulary to L2 reading in two pilot studies in different contexts (a heterogeneous population studying at the tertiary level in the UK and a homogeneous undergraduate group in Japan), followed by a larger main study, again involving a homogeneous Japanese undergraduate population. In contrast with previous findings in the literature, all three studies offer support for the relative superiority of syntactic knowledge over vocabulary knowledge in predicting performance on a text reading comprehension test. A case is made for the robustness of structural equation modelling compared to conventional regression in accounting for the differential reliabilities of scores on the measures employed.
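      The methodological point about differential reliabilities turns on classical attenuation: observed correlations are weakened by measurement error in proportion to each measure’s reliability. Spearman’s correction, not shown in the abstract but the standard statement of the problem, makes the logic explicit:

          r_{T_x T_y} = \frac{r_{xy}}{\sqrt{r_{xx}\, r_{yy}}}

      where r_{xy} is the observed correlation and r_{xx} and r_{yy} are the reliabilities of the two measures. If the syntax and vocabulary tests differ in reliability, ordinary regression weights are attenuated by different amounts for each predictor, whereas SEM estimates relationships among error-free latent variables and so builds the correction into the model.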
    • Repeated test-taking and longitudinal test score analysis: editorial

      Green, Anthony; Van Moere, Alistair; University of Bedfordshire; MetaMetrics Inc. (Sage, 2020-09-27)
    • Topic and background knowledge effects on performance in speaking assessment

      Khabbazbashi, Nahal (Sage, 2015-08-10)
      This study explores the extent to which topic and background knowledge of topic affect spoken performance in a high-stakes speaking test. It is argued that evidence of a substantial influence may introduce construct-irrelevant variance and undermine test fairness. Data were collected from 81 non-native speakers of English who performed on 10 topics across three task types. Background knowledge and general language proficiency were measured using self-report questionnaires and C-tests respectively. Score data were analysed using many-facet Rasch measurement and multiple regression. Findings showed that for two of the three task types, the topics used in the study generally exhibited difficulty measures that were statistically distinct; however, the differences in topic difficulties were too small to have a large practical effect on scores. Participants’ different levels of background knowledge were likewise shown to have a systematic effect on performance, but these statistically significant differences also failed to translate into practical significance. Findings hold implications for speaking performance assessment.
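      Although the abstract does not report the fitted model, the multiple regression described would take roughly the following form (variable names are illustrative, not the study’s own):

          Score_p = \beta_0 + \beta_1\,\text{Proficiency}_p + \beta_2\,\text{BackgroundKnowledge}_p + \varepsilon_p

      The distinction drawn between statistical and practical significance follows directly from this setup: with 81 participants and reliable measures, \hat{\beta}_2 can differ reliably from zero while the increment in explained variance (\Delta R^2) attributable to background knowledge remains too small to matter for score users.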
    • What counts as ‘responding’? Contingency on previous speaker contribution as a feature of interactional competence

      Lam, Daniel M. K. (Sage, 2018-05-10)
      The ability to interact with others has gained recognition as part of the L2 speaking construct in the assessment literature and in high- and low-stakes speaking assessments. This paper first presents a review of the literature on interactional competence (IC) in L2 learning and assessment. It then discusses a particular feature – producing responses contingent on the previous speaker’s contribution – that emerged as a de facto construct feature of IC oriented to by both candidates and examiners within the school-based group speaking assessment in the Hong Kong Diploma of Secondary Education (HKDSE) English Language Examination. Previous studies have similarly argued for the importance of ‘responding to’ or linking one’s own talk to previous speakers’ contributions as a way of demonstrating comprehension of co-participants’ talk. However, what counts as such a response has yet to be explored systematically. This paper presents a conversation analytic study of candidate discourse in the assessed group interactions, identifying three conversational actions through which student-candidates construct contingent responses to co-participants. The resulting thick description of the nature of contingent responses lays the groundwork for further empirical investigations into the relevance of this IC feature and its implications for proficiency.