• Integrated writing and its correlates: a meta-analysis

      Chan, Sathena Hiu Chong; Yamashita, J. (Elsevier, 2022-07-26)
      Integrated tasks are increasing in popularity, either replacing or complementing writing-only independent tasks in writing assessments. This shift has generated considerable research interest in the underlying construct and features of integrated writing (IW) performances. However, due to the complexity of the IW construct, there are conflicting findings about whether, and to what extent, various language skills and IW text features correlate with IW scores. To understand the construct of IW, we conducted a meta-analysis to synthesize correlation coefficients between scores of IW performances and (1) other language skills and (2) text quality features of IW. We also examined factors that may moderate the correlation of IW scores with these two groups of correlates. The results showed that (1) reading and writing skills correlated more strongly with IW scores than listening did; and (2) text length had the strongest correlation, followed by source integration, organization and syntactic complexity, with lexical complexity showing the smallest correlation. Several IW task features affected the magnitude of correlations. The results supported the view that IW is a construct independent from, albeit related to, other language skills, and that IW task features may affect the construct of IW.
    • Eye-tracking L2 students taking online multiple-choice reading tests: benefits and challenges

      Latimer, Nicola; Chan, Sathena Hiu Chong (Cranmore Publishing, 2022-04-10)
      Recently, there has been a marked increase in language testing research involving eye-tracking. It appears to offer a useful methodology for examining cognitive validity in language tests, i.e., the extent to which the mental processes that a language test elicits from test takers resemble those that they would employ in the target language use domains. This article reports on a recent study which examined reading processes of test takers at different proficiency levels on a reading proficiency test. Using a mixed-methods approach, the study collected cognitive validity evidence through eye-tracking and stimulated recall interviews. The study investigated whether there are differences in reading behaviour among test takers at CEFR B1, B2 and C1 levels on an online reading task. The main findings are reported and the implications of the findings are discussed to reflect on some fundamental questions regarding the use of eye-tracking in language testing research.
    • Towards the new construct of academic English in the digital age

      Khabbazbashi, Nahal; Chan, Sathena Hiu Chong; Clark, Tony; University of Bedfordshire; Cambridge University Press and Assessment (Oxford University Press, 2022-03-28)
      The increasing use of digital educational technologies in Higher Education (HE) means that the nature of communication may be shifting. Assessments of English for Academic Purposes (EAP) need to be reconceptualised accordingly, to reflect the new and complex ways in which language is used in HE. With a view to informing EAP assessments, our study set out to identify key trends related to Academic English using a scoping review of the literature. Findings revealed two major trends: (a) a shift towards multimodal communication which has in turn resulted in the emergence of new types of academic assignments, multimodal genres, and the need for students to acquire new skills to operate within this multimodal arena; and (b) the limitations of existing skills-based approaches to assessment and the need to move towards integrated skills assessment. We discuss the implications of these findings for EAP assessments.
    • Book review: Assessing speaking in context: expanding the construct and its applications

      Taylor, Lynda (SAGE, 2022-02-16)
      Review of Salaberry MR, Burch AR (2021) Assessing speaking in context: expanding the construct and its applications, Bristol: Multilingual Matters, ISBN 9781788923804
    • The design and validation of an online speaking test for young learners in Uruguay: challenges and innovations

      Khabbazbashi, Nahal; Nakatsuhara, Fumiyo; Inoue, Chihiro; Kaplan, Gabriela; Green, Anthony; University of Bedfordshire; Plan Ceibal (Cranmore Publishing on behalf of the International TESOL Union, 2022-02-10)
      This research presents the development of an online speaking test of English for students at the end of primary and beginning of secondary school education in state schools in Uruguay. Following the success of the Plan Ceibal one computer-tablet per child initiative, there was a drive to further utilize technology to improve the language ability of students, particularly in speaking, where the majority of students are at CEFR levels pre-A1 and A1. The national concern over a lack of spoken communicative skills amongst students led to a decision to develop a new speaking test, specifically tailored to local needs. This paper provides an overview of the speaking test development and validation project designed with the following objectives in mind: to establish, track, and report annually learners’ achievements against the Common European Framework of Reference for Languages (CEFR) targeting CEFR levels pre-A1 to A2, to inform teaching and learning, and to promote speaking practice in classrooms. Results of a three-phase mixed-methods study involving small-scale and large-scale trials with learners and examiners as well as a CEFR-linking exercise with expert panelists will be reported. Different sources of evidence will be brought together to build a validity argument for the test. The paper will also focus on some of the challenges involved in assessing young learners and discuss how design decisions, local knowledge and expertise, and technological innovations can be used to address such challenges with implications for other similar test development projects.
    • Validation of a large-scale task-based test: functional progression in dialogic speaking performance

      Inoue, Chihiro; Nakatsuhara, Fumiyo (Springer Nature, 2022-02-07)
      A list of language functions is usually included in task-based speaking test specifications as a useful tool to describe target output language of test-takers, to define TLU domains, and to specify task demands. Such lists are, however, often constructed intuitively and they also tend to focus solely on the types of function to be elicited and ignore the ways in which each function is realised across different levels of proficiency (Green, 2012). The study reported in this chapter is a part of a larger-scale test revision project for Trinity’s Integrated Skills in English (ISE) spoken examinations. Analysing audio-recordings of 32 performances on the ISE spoken examination both quantitatively and qualitatively, the aims of this study are (a) to empirically validate lists of language functions in the test specifications of the operational, large-scale, task-based examinations, (b) to explore the usefulness and potential of function analysis as a test task validation method, and (c) to contribute to a better understanding of varied test-taker language that is used to generate language functions.
    • Use of innovative technology in oral language assessment

      Nakatsuhara, Fumiyo; Berry, Vivien; University of Bedfordshire; British Council (Taylor & Francis, 2021-12-27)
    • Assessing speaking

      Nakatsuhara, Fumiyo; Khabbazbashi, Nahal; Inoue, Chihiro; University of Bedfordshire (Routledge, 2021-12-16)
      In this chapter on assessing speaking, the history of speaking assessment is briefly traced in terms of the various ways in which speaking constructs have been defined and diversified over the past century. This is followed by a discussion of elicitation tasks, test delivery modes, rating methods, and scales that offered opportunities and/or presented challenges in operationalising different constructs of speaking and providing feedback. Several methods utilised in researching speaking assessment are then considered. Informed by recent research and advances in technology, the chapter provides recommendations for practice in both high-stakes and low-stakes contexts.
    • On topic validity in speaking tests

      Khabbazbashi, Nahal; University of Bedfordshire (Cambridge University Press, 2021-11-22)
      Topics are often used as a key speech elicitation method in performance-based assessments of spoken language, and yet the validity and fairness issues surrounding topics are surprisingly under-researched. Are different topics ‘equivalent’ or ‘parallel’? Can some topics bias against or favour individuals or groups of individuals? Does background knowledge of topics have an impact on performance? Might the content of test taker speech affect their scores – and perhaps more importantly, should it? Grounded in the real-world assessment context of IELTS, this volume draws on original data as well as insights from empirical and theoretical research to address these questions against the backdrop of one of the world’s most high-stakes language tests. This volume provides: (1) an up-to-date review of theoretical and empirical literature related to topic and background knowledge effects on second language performance; (2) an accessible and systematic description of a mixed methods research study with explanations of design, analysis, and interpretation considerations at every stage; and (3) a comprehensive and coherent approach for building a validity argument in a given assessment context. The volume also contributes to critiques of recent models of communicative competence with an over-reliance on linguistic features at the expense of more complex aspects of communication, by arguing for an expansion of current definitions of the speaking construct emphasising the role of content of speech as an important – yet often neglected – feature.
    • Exploring the potential for assessing interactional and pragmatic competence in semi-direct speaking tests

      Nakatsuhara, Fumiyo; May, Lyn; Inoue, Chihiro; Willcox-Ficzere, Edit; Westbrook, Carolyn; Spiby, Richard; University of Bedfordshire; Queensland University of Technology; Oxford Brookes University; British Council (British Council, 2021-11-11)
      To explore the potential of a semi-direct speaking test to assess a wider range of communicative language ability, the researchers developed four semi-direct speaking tasks – two designed to elicit features of interactional competence (IC) and two designed to elicit features of pragmatic competence (PC). The four tasks, as well as one benchmarking task, were piloted with 48 test-takers in China and Austria whose proficiency ranged from CEFR B1 to C. A post-test feedback survey was administered to all test-takers, after which selected test-takers were interviewed. A total of 184 task performances were analysed to identify interactional moves utilised by test-takers across three proficiency groups (i.e., B1, B2 and C). Data indicated that test-takers at higher levels employed a wider variety of interactional moves. They made use of concurring concessions and counter views when seeking to persuade a (hypothetical) conversational partner to change opinions in the IC tasks, and they projected upcoming requests and made face-related statements in the PC tasks, seemingly to pre-empt a conversational partner’s negative response to the request. The test-takers perceived the tasks to be highly authentic and found the video input useful in understanding the target audience of simulated interactions.
    • Video-conferencing speaking tests: do they measure the same construct as face-to-face tests?

      Nakatsuhara, Fumiyo; Inoue, Chihiro; Berry, Vivien; Galaczi, Evelina D.; University of Bedfordshire; British Council; Cambridge Assessment English (Routledge, 2021-08-23)
      This paper investigates the comparability between the video-conferencing and face-to-face modes of the IELTS Speaking Test in terms of scores and language functions generated by test-takers. Data were collected from 10 trained IELTS examiners and 99 test-takers who took two speaking tests under face-to-face and video-conferencing conditions. Many-facet Rasch Model (MFRM) analysis of test scores indicated that the delivery mode did not make any meaningful difference to test-takers’ scores. An examination of language functions revealed that both modes equally elicited the same language functions except asking for clarification. More test-takers made clarification requests in the video-conferencing mode (63.3%) than in the face-to-face mode (26.7%). Drawing on the findings, as well as practical implications, we extend emerging thinking about video-conferencing speaking assessment and the associated features of this modality in its own right.
    • The effects of extended planning time on candidates’ performance, processes and strategy use in the lecture listening-into-speaking tasks of the TOEFL iBT Test

      Inoue, Chihiro; Lam, Daniel M. K.; Educational Testing Service (Wiley, 2021-06-21)
      This study investigated the effects of two different planning time conditions (i.e., operational [20 s] and extended length [90 s]) for the lecture listening-into-speaking tasks of the TOEFL iBT® test for candidates at different proficiency levels. Seventy international students based in universities and language schools in the United Kingdom (35 at a lower level; 35 at a higher level) participated in the study. The effects of different lengths of planning time were examined in terms of (a) the scores given by ETS-certified raters; (b) the quality of the speaking performances characterized by accurately reproduced idea units and the measures of complexity, accuracy, and fluency; and (c) self-reported use of cognitive and metacognitive processes and strategies during listening, planning, and speaking. The results found neither a statistically significant main effect of the length of planning time nor an interaction between planning time and proficiency on the scores or on the quality of the speaking performance. There were several cognitive and metacognitive processes and strategies where significantly more engagement was reported under the extended planning time, which suggests enhanced cognitive validity of the task. However, the increased engagement in planning did not lead to any measurable improvement in the score. Therefore, in the interest of practicality, the results of this study provide justifications for the operational length of planning time for the lecture listening-into-speaking tasks in the speaking section of the TOEFL iBT test.
    • Preparing for admissions tests in English

      Yu, Guoxing; Green, Anthony; University of Bristol; University of Bedfordshire (Taylor & Francis, 2021-05-06)
      Test preparation for admissions to education programmes has always been a contentious issue (Anastasi, 1981; Crocker, 2003; Messick, 1982; Powers, 2012). For Crocker (2006), ‘No activity in educational assessment raises more instructional, ethical, and validity issues than preparation for large-scale, high-stakes tests.’ (p. 115). Debate has often centred around the effectiveness of preparation and how it affects the validity of test score interpretations; equity and fairness of access to opportunity; and impacts on learning and teaching (Yu et al., 2017). A focus has often been preparation for tests originally designed for domestic students, for example, SATs (e.g., Alderman & Powers, 1980; Appelrouth et al., 2017; Montgomery & Lilly, 2012; Powers, 1993; Powers & Rock, 1999; Sesnowitz et al., 1982) and state-wide tests (e.g., Firestone et al., 2004; Jäger et al., 2012), but the increasing internationalisation of higher education has added a new dimension. To enrol in higher education programmes which use English as the medium of instruction, increasing numbers of international students whose first language is not English are now taking English language tests, or academic specialist tests administered in English, or both. The papers in this special issue concern how students prepare for these tests and the roles in this process of the tests themselves and of the organisations that provide them.
    • Towards new avenues for the IELTS Speaking Test: insights from examiners’ voices

      Inoue, Chihiro; Khabbazbashi, Nahal; Lam, Daniel M. K.; Nakatsuhara, Fumiyo (IELTS Partners, 2021-02-19)
      This study investigated the examiners’ views on all aspects of the IELTS Speaking Test, namely, the test tasks, topics, format, interlocutor frame, examiner guidelines, test administration, rating, training and standardisation, and test use. The overall trends of the examiners’ views of these aspects of the test were captured by a large-scale online questionnaire, to which a total of 1203 examiners responded. Based on the questionnaire responses, 36 examiners were carefully selected for subsequent interviews to explore the reasons behind their views in depth. The 36 examiners were representative of a number of differing geographical regions and a range of views and experiences in examining and giving examiner training. While the questionnaire responses exhibited generally positive views from examiners on the current IELTS Speaking Test, the interview responses uncovered various issues that the examiners experienced and suggested potentially beneficial modifications. Many of the issues (e.g. potentially unsuitable topics, rigidity of interlocutor frames) were attributable to the huge candidature of the IELTS Speaking Test, which has vastly expanded since the test’s last revision in 2001, perhaps beyond the initial expectations of the IELTS Partners. This study synthesized the voices from examiners and insights from relevant literature, and incorporated guidelines checks we submitted to the IELTS Partners. This report concludes with a number of suggestions for potential changes in the current IELTS Speaking Test, so as to enhance its validity and accessibility in today’s ever globalising world.
    • Opening the black box: exploring automated speaking evaluation

      Khabbazbashi, Nahal; Xu, Jing; Galaczi, Evelina D. (Springer, 2021-02-10)
      The rapid advances in speech processing and machine learning technologies have attracted language testers’ strong interest in developing automated speaking assessment in which candidate responses are scored by computer algorithms rather than trained human examiners. Despite its increasing popularity, automatic evaluation of spoken language is still shrouded in mystery and technical jargon, often resembling an opaque "black box" that transforms candidate speech to scores in a matter of minutes. Our chapter explicitly problematizes this lack of transparency around test score interpretation and use and asks the following questions: What do automatically derived scores actually mean? What are the speaking constructs underlying them? What are some common problems encountered in automated assessment of speaking? And how can test users evaluate the suitability of automated speaking assessment for their proposed test uses? In addressing these questions, the purpose of our chapter is to explore the benefits, problems, and caveats associated with automated speaking assessment touching on key theoretical discussions on construct representation and score interpretation as well as practical issues such as the infrastructure necessary for capturing high quality audio and the difficulties associated with acquiring training data. We hope to promote assessment literacy by providing the necessary guidance for users to critically engage with automated speaking assessment, pose the right questions to test developers, and ultimately make informed decisions regarding the fitness for purpose of automated assessment solutions for their specific learning and assessment contexts.
    • Don't turn a deaf ear: a case for assessing interactive listening

      Lam, Daniel M. K.; University of Bedfordshire (Oxford University Press, 2021-01-11)
      The reciprocal nature of spoken interaction means that participants constantly alternate between speaker and listener roles. However, listener or recipient actions – also known as interactive listening (IL) – are somewhat underrepresented in language tests. In conventional listening tests, they are not directly assessed. In speaking tests, they have often been overshadowed by an emphasis on production features or subsumed under broader constructs such as interactional competence. This paper is an effort to represent the rich IL phenomena that can be found in peer interactive speaking assessments, where the candidate-candidate format and discussion task offer opportunities to elicit and assess IL. Taking a close look at candidate discourse and non-verbal actions through a conversation analytic approach, the analysis focuses on three IL features: 1) listenership displays, 2) contingent responses, and 3) collaborative completions, and unpacks their relative strength in evidencing listener understanding. This paper concludes by making a case for revisiting the role of interactive listening, calling for more explicit inclusion of IL in L2 assessment as well as pedagogy.
    • Exploring language assessment and testing: language in action

      Green, Anthony (Routledge, 2020-12-30)
      Exploring Language Assessment and Testing offers a straightforward and accessible introduction that starts from real-world experiences and uses practical examples to introduce the reader to the academic field of language assessment and testing. Extensively updated, with additional features such as reader tasks (with extensive commentaries from the author), a glossary of key terms and an annotated further reading section, this second edition provides coverage of recent theoretical and technological developments and explores specific purposes for assessment. Including concrete models and examples to guide readers into the relevant literature, this book also offers practical guidance for educators and researchers on designing, developing and using assessments. Providing an inclusive and impartial survey of both classroom-based assessment by teachers and larger-scale testing, this is an indispensable introduction for postgraduate and advanced undergraduate students studying Language Education, Applied Linguistics and Language Assessment.
    • Comparing writing proficiency assessments used in professional medical registration: a methodology to inform policy and practice

      Chan, Sathena Hiu Chong; Taylor, Lynda; University of Bedfordshire (Elsevier, 2020-10-13)
      Internationally trained doctors wishing to register and practise in an English-speaking country typically have to demonstrate that they can communicate effectively in English, including writing proficiency. Various English language proficiency (ELP) tests are available worldwide and are used for such licensing purposes. This means that medical registration bodies face the question of which test(s) will meet their needs, ideally reflecting the demands of their professional environment. This article reports a mixed-methods study to survey the policy and practice of health-care registration organisations in the UK and worldwide. The study aimed to identify ELP tests that were, or could be, considered as suitable for medical registration purposes and to understand the differences between them. The paper discusses what the study revealed about the function and comparability of different writing tests used in professional registration as well as the complex criteria a professional body may prioritise when selecting a test. Although the original study was completed in 2015, the paper takes account of subsequent changes in policy and practice. It offers a practical methodology and worked example which may be of interest and value to other researchers, language testers and policymakers as they face challenges in selecting and making comparisons across tests.
    • Repeated test-taking and longitudinal test score analysis: editorial

      Green, Anthony; Van Moere, Alistair; University of Bedfordshire; MetaMetrics Inc. (Sage, 2020-09-27)
    • Comparing rating modes: analysing live, audio, and video ratings of IELTS Speaking Test performances

      Nakatsuhara, Fumiyo; Inoue, Chihiro; Taylor, Lynda (Taylor & Francis, 2020-08-26)
      This mixed methods study compared IELTS examiners’ scores when assessing spoken performances under live and two ‘non-live’ testing conditions using audio and video recordings. Six IELTS examiners assessed 36 test-takers’ performances under the live, audio, and video rating conditions. Scores in the three rating modes were calibrated using the many-facet Rasch model (MFRM). For all three modes, examiners provided written justifications for their ratings, and verbal reports were also collected to gain insights into examiner perceptions towards performance under the audio and video conditions. Results showed that, for all rating criteria, audio ratings were significantly lower than live and video ratings. Examiners noticed more negative performance features under the two non-live rating conditions, compared to the live condition. However, richer information about test-taker performance in the video mode appeared to cause raters to rely less on such negative evidence than audio raters when awarding scores. Verbal report data showed that having visual information in the video-rating mode helped examiners to understand what the test-takers were saying, to comprehend better what test-takers were communicating using non-verbal means, and to understand with greater confidence the source of test-takers’ hesitation, pauses, and awkwardness.