• Towards new avenues for the IELTS Speaking Test: insights from examiners’ voices

      Inoue, Chihiro; Khabbazbashi, Nahal; Lam, Daniel M. K.; Nakatsuhara, Fumiyo (IELTS Partners, 2021-02-19)
      This study investigated examiners’ views on all aspects of the IELTS Speaking Test, namely, the test tasks, topics, format, interlocutor frame, examiner guidelines, test administration, rating, training and standardisation, and test use. The overall trends in examiners’ views of these aspects of the test were captured by a large-scale online questionnaire, to which a total of 1,203 examiners responded. Based on the questionnaire responses, 36 examiners were carefully selected for subsequent interviews to explore the reasons behind their views in depth. The 36 examiners were representative of a range of geographical regions and of views and experience in examining and in delivering examiner training. While the questionnaire responses exhibited generally positive views from examiners on the current IELTS Speaking Test, the interview responses uncovered various issues that the examiners had experienced and suggested potentially beneficial modifications. Many of the issues (e.g. potentially unsuitable topics, rigidity of interlocutor frames) were attributable to the huge candidature of the IELTS Speaking Test, which has vastly expanded since the test’s last revision in 2001, perhaps beyond the initial expectations of the IELTS Partners. This study synthesised the voices of examiners with insights from the relevant literature, and incorporated the guideline checks we submitted to the IELTS Partners. The report concludes with a number of suggestions for potential changes to the current IELTS Speaking Test, so as to enhance its validity and accessibility in today’s ever-globalising world.
    • Opening the black box: exploring automated speaking evaluation

      Khabbazbashi, Nahal; Xu, Jing; Galaczi, Evelina D. (Springer, 2021-02-10)
      The rapid advances in speech processing and machine learning technologies have attracted language testers’ strong interest in developing automated speaking assessment, in which candidate responses are scored by computer algorithms rather than trained human examiners. Despite its increasing popularity, automatic evaluation of spoken language is still shrouded in mystery and technical jargon, often resembling an opaque "black box" that transforms candidate speech into scores in a matter of minutes. Our chapter explicitly problematizes this lack of transparency around test score interpretation and use and asks the following questions: What do automatically derived scores actually mean? What are the speaking constructs underlying them? What are some common problems encountered in automated assessment of speaking? And how can test users evaluate the suitability of automated speaking assessment for their proposed test uses? In addressing these questions, the purpose of our chapter is to explore the benefits, problems, and caveats associated with automated speaking assessment, touching on key theoretical discussions on construct representation and score interpretation as well as practical issues such as the infrastructure necessary for capturing high-quality audio and the difficulties associated with acquiring training data. We hope to promote assessment literacy by providing the necessary guidance for users to critically engage with automated speaking assessment, pose the right questions to test developers, and ultimately make informed decisions regarding the fitness for purpose of automated assessment solutions for their specific learning and assessment contexts.
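      As an illustration of the kind of scoring pipeline the chapter opens up for scrutiny, the sketch below extracts a few interpretable fluency features from a candidate response and maps them to a band score with a simple linear model. All feature names, weights, and the band scale are hypothetical; they stand in for the far richer, proprietary components of real automated systems.

# Minimal, illustrative sketch of an automated speaking scoring pipeline.
# All feature names and weights are hypothetical; operational systems use
# far richer acoustic, lexical and ASR-derived features and trained models.
from dataclasses import dataclass

@dataclass
class SpokenResponse:
    duration_sec: float      # total response time
    speech_time_sec: float   # time actually spent speaking
    num_words: int           # words recognised by an ASR system
    num_long_pauses: int     # silent pauses longer than 0.25 s

def extract_features(r: SpokenResponse) -> dict:
    """Turn a response into interpretable fluency features."""
    return {
        "speech_rate": r.num_words / r.duration_sec * 60,          # words per minute
        "articulation_rate": r.num_words / r.speech_time_sec * 60,
        "pause_rate": r.num_long_pauses / r.duration_sec * 60,     # pauses per minute
    }

def predict_score(features: dict) -> float:
    """Hypothetical linear scoring model (weights are invented)."""
    weights = {"speech_rate": 0.02, "articulation_rate": 0.01, "pause_rate": -0.15}
    raw = 2.0 + sum(weights[name] * value for name, value in features.items())
    return max(0.0, min(9.0, raw))  # clamp to an assumed 0-9 band scale

response = SpokenResponse(duration_sec=60, speech_time_sec=48,
                          num_words=110, num_long_pauses=6)
print(round(predict_score(extract_features(response)), 3))  # 4.675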
    • Don't turn a deaf ear: a case for assessing interactive listening

      Lam, Daniel M. K.; University of Bedfordshire (Oxford University Press, 2021-01-11)
      The reciprocal nature of spoken interaction means that participants constantly alternate between speaker and listener roles. However, listener or recipient actions – also known as interactive listening (IL) – are somewhat underrepresented in language tests. In conventional listening tests, they are not directly assessed. In speaking tests, they have often been overshadowed by an emphasis on production features or subsumed under broader constructs such as interactional competence. This paper is an effort to represent the rich IL phenomena that can be found in peer interactive speaking assessments, where the candidate-candidate format and discussion task offer opportunities to elicit and assess IL. Taking a close look at candidate discourse and non-verbal actions through a conversation analytic approach, the analysis focuses on three IL features: 1) listenership displays, 2) contingent responses, and 3) collaborative completions, and unpacks their relative strength in evidencing listener understanding. This paper concludes by making a case for revisiting the role of interactive listening, calling for more explicit inclusion of IL in L2 assessment as well as pedagogy.
    • Exploring language assessment and testing: language in action

      Green, Anthony (Routledge, 2020-12-30)
      Exploring Language Assessment and Testing offers a straightforward and accessible introduction that starts from real-world experiences and uses practical examples to introduce the reader to the academic field of language assessment and testing. Extensively updated, with additional features such as reader tasks (with extensive commentaries from the author), a glossary of key terms and an annotated further reading section, this second edition provides coverage of recent theoretical and technological developments and explores specific purposes for assessment. Including concrete models and examples to guide readers into the relevant literature, this book also offers practical guidance for educators and researchers on designing, developing and using assessments. Providing an inclusive and impartial survey of both classroom-based assessment by teachers and larger-scale testing, this is an indispensable introduction for postgraduate and advanced undergraduate students studying Language Education, Applied Linguistics and Language Assessment.
    • Comparing writing proficiency assessments used in professional medical registration: a methodology to inform policy and practice

      Chan, Sathena Hiu Chong; Taylor, Lynda; University of Bedfordshire (Elsevier, 2020-10-13)
      Internationally trained doctors wishing to register and practise in an English-speaking country typically have to demonstrate that they can communicate effectively in English, including writing proficiency. Various English language proficiency (ELP) tests are available worldwide and are used for such licensing purposes. This means that medical registration bodies face the question of which test(s) will meet their needs, ideally reflecting the demands of their professional environment. This article reports a mixed-methods study to survey the policy and practice of health-care registration organisations in the UK and worldwide. The study aimed to identify ELP tests that were, or could be, considered as suitable for medical registration purposes and to understand the differences between them. The paper discusses what the study revealed about the function and comparability of different writing tests used in professional registration as well as the complex criteria a professional body may prioritise when selecting a test. Although the original study was completed in 2015, the paper takes account of subsequent changes in policy and practice. It offers a practical methodology and worked example which may be of interest and value to other researchers, language testers and policymakers as they face challenges in selecting and making comparisons across tests.
    • Repeated test-taking and longitudinal test score analysis: editorial

      Green, Anthony; Van Moere, Alistair; University of Bedfordshire; MetaMetrics Inc. (Sage, 2020-09-27)
    • Comparing rating modes: analysing live, audio, and video ratings of IELTS Speaking Test performances

      Nakatsuhara, Fumiyo; Inoue, Chihiro; Taylor, Lynda (Taylor & Francis, 2020-08-26)
      This mixed methods study compared IELTS examiners’ scores when assessing spoken performances under live and two ‘non-live’ testing conditions using audio and video recordings. Six IELTS examiners assessed 36 test-takers’ performances under the live, audio, and video rating conditions. Scores in the three rating modes were calibrated using the many-facet Rasch model (MFRM). For all three modes, examiners provided written justifications for their ratings, and verbal reports were also collected to gain insights into examiner perceptions of performance under the audio and video conditions. Results showed that, for all rating criteria, audio ratings were significantly lower than live and video ratings. Examiners noticed more negative performance features under the two non-live rating conditions than under the live condition. However, the richer information about test-taker performance available in the video mode appeared to lead video raters to rely less on such negative evidence than audio raters did when awarding scores. Verbal report data showed that having visual information in the video-rating mode helped examiners to understand what the test-takers were saying, to comprehend better what test-takers were communicating using non-verbal means, and to understand with greater confidence the source of test-takers’ hesitation, pauses, and awkwardness.
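      For readers unfamiliar with many-facet Rasch calibration, the general form of the model is sketched below (following the standard facets formulation; the study’s exact facet specification is not reproduced here). It expresses the log-odds of candidate n receiving rating category k rather than k-1 from examiner j on criterion i as an additive function of candidate ability, criterion difficulty, examiner severity, and a category threshold.

% General form of the many-facet Rasch (rating scale) model; a sketch,
% not the exact specification fitted in the study.
\[
  \ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
\]
% B_n: ability of candidate n; D_i: difficulty of criterion i;
% C_j: severity of examiner j; F_k: threshold of category k relative to k-1.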
    • Applying the socio-cognitive framework: gathering validity evidence during the development of a speaking test

      Nakatsuhara, Fumiyo; Dunlea, Jamie; University of Bedfordshire; British Council (UCLES/Cambridge University Press, 2020-06-18)
      This chapter describes how Weir’s (2005; further elaborated in Taylor (Ed) 2011) socio-cognitive framework for validating speaking tests guided two a priori validation studies of the speaking component of the Test of English for Academic Purposes (TEAP) in Japan. In this chapter, we particularly reflect upon the academic achievements of Professor Cyril J Weir, in terms of:
      • the effectiveness and value of the socio-cognitive framework underpinning the development of the TEAP Speaking Test while gathering empirical evidence of the construct underlying a speaking test for the target context
      • his contribution to developing early career researchers and extending language testing expertise in the TEAP development team.
    • Placing construct definition at the heart of assessment: research, design and a priori validation

      Chan, Sathena Hiu Chong; Latimer, Nicola (Cambridge University Press, 2020-04-01)
      In this chapter, we will first highlight Professor Cyril Weir’s major research into the nature of academic reading. Using one of his test development projects as an example, we will describe how the construct of academic reading was operationalised in the local context of a British university by theoretical construct definition together with empirical analyses of students’ reading patterns on the test through eye-tracking. As we progress through the chapter we reflect on how Weir’s various research projects fed into the development of the test and a new method of analysing eye-tracking data in relation to different types of reading.
    • Aspects of fluency across assessed levels of speaking proficiency

      Tavakoli, Parveneh; Nakatsuhara, Fumiyo; Hunter, Ann-Marie (Wiley, 2020-01-25)
      Recent research in second language acquisition suggests that a number of speed, breakdown, repair and composite measures reliably assess fluency and predict proficiency. However, there is little research evidence to indicate which measures best characterize fluency at each assessed level of proficiency, and which can consistently distinguish one level from the next. This study investigated fluency in the performances of 32 speakers on four tasks of the British Council’s Aptis Speaking test, which had been awarded four different levels of proficiency (CEFR A2-C1). Using PRAAT, the performances were analysed for various aspects of utterance fluency across the different levels of proficiency. The results suggest that speed and composite measures consistently distinguish fluency from the lowest to upper-intermediate levels (A2-B2), and many breakdown measures differentiate between the lowest level (A2) and the rest of the proficiency groups, with a few differentiating between lower (A2, B1) and higher levels (B2, C1). The varied use of repair measures at different levels suggests that a more complex process is at play. The findings imply that a detailed micro-analysis of fluency offers a more reliable understanding of the construct and its relationship with assessment of proficiency.
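      As an illustration of the speed, breakdown and composite measures of utterance fluency analysed in studies of this kind, the sketch below computes several common ones from a syllable count and a list of silent-pause durations. The 0.25-second pause threshold, the variable names and the example values are assumptions for illustration only and are not taken from the study.

# Illustrative utterance-fluency measures (speed, breakdown and composite).
# The 0.25 s pause threshold and the example values are assumptions,
# not figures reported in the study.

def fluency_measures(num_syllables: int, total_time: float,
                     pause_durations: list, pause_threshold: float = 0.25) -> dict:
    silent = [p for p in pause_durations if p >= pause_threshold]
    pause_time = sum(silent)
    phonation_time = total_time - pause_time
    return {
        "speech_rate": num_syllables / total_time,             # speed: syllables per second
        "articulation_rate": num_syllables / phonation_time,   # speed, pauses excluded
        "mean_pause_duration": pause_time / len(silent) if silent else 0.0,  # breakdown
        "mean_length_of_run": num_syllables / (len(silent) + 1),             # composite
    }

print(fluency_measures(num_syllables=180, total_time=60.0,
                       pause_durations=[0.4, 0.6, 0.3, 0.2, 0.9]))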
    • A comparison of holistic, analytic, and part marking models in speaking assessment

      Khabbazbashi, Nahal; Galaczi, Evelina D. (SAGE, 2020-01-24)
      This mixed methods study examined holistic, analytic, and part marking models (MMs) in terms of their measurement properties and impact on candidate CEFR classifications in a semi-direct online speaking test. Speaking performances of 240 candidates were first marked holistically and by part (phase 1). On the basis of phase 1 findings – which suggested stronger measurement properties for the part MM – phase 2 focused on a comparison of part and analytic MMs. Speaking performances of 400 candidates were rated analytically and by part during that phase. Raters provided open comments on their marking experiences. Results suggested a significant impact of MM; approximately 30% and 50% of candidates in phases 1 and 2 respectively were awarded different (adjacent) CEFR levels depending on the choice of MM used to assign scores. There was a trend of higher CEFR levels with the holistic MM and lower CEFR levels with the part MM. While strong correlations were found between all pairings of MMs, further analyses revealed important differences. The part MM was shown to display superior measurement qualities particularly in allowing raters to make finer distinctions between different speaking ability levels. These findings have implications for the scoring validity of speaking tests.
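      To make the practical difference between the marking models concrete: a holistic model assigns one overall mark, while part and analytic models average several marks (one per test part, or one per rating criterion) before the result is mapped to a CEFR level. The sketch below shows how the same candidate can land on adjacent CEFR levels depending on the model; the 0-5 mark scale and the cut-off scores are invented for illustration and do not reflect the test’s actual conversion tables.

# Illustrative aggregation under holistic, part and analytic marking models.
# The 0-5 mark scale and CEFR cut-offs below are hypothetical.

CEFR_CUTOFFS = [(4.5, "C1"), (3.5, "B2"), (2.5, "B1"), (1.5, "A2")]  # assumed cut-offs

def to_cefr(score: float) -> str:
    for cutoff, level in CEFR_CUTOFFS:
        if score >= cutoff:
            return level
    return "A1"

def holistic(overall_mark: float) -> str:
    return to_cefr(overall_mark)                       # single overall judgement

def part_model(part_marks: list) -> str:
    return to_cefr(sum(part_marks) / len(part_marks))  # mean of per-part marks

def analytic(criterion_marks: dict) -> str:
    return to_cefr(sum(criterion_marks.values()) / len(criterion_marks))  # mean across criteria

# The same candidate can be classified at adjacent CEFR levels under different models.
print(holistic(3.5))                                    # B2
print(part_model([3.0, 3.5, 3.0, 4.0]))                 # mean 3.375 -> B1
print(analytic({"fluency": 4, "lexis": 3, "grammar": 3, "pronunciation": 4}))  # 3.5 -> B2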
    • The origins and adaptations of English as a school subject

      Goodwyn, Andrew (Cambridge University Press, 2019-12-31)
      This chapter will consider the particular manifestation of English as a ‘school subject’, principally in the country called England and using some small space for significant international comparisons, and it will mainly focus on the secondary school version. We will call this phenomenon School Subject English (SSE). The chapter will argue that historically SSE has gone through phases of development and adaptation, some aspects of these changes inspired by new theories and concepts and by societal change, some others, especially more recently, entirely reactive to external impositions (for an analysis of the current position of SSE, see Roberts, this volume). This chapter considers SSE to have been ontologically ‘expanded’ between 1870 and (about) 1990, increasing the ambition and scope of the ‘subject’ and the emancipatory ideology of its teachers. This ontological expansion was principally a result of adding ‘models’ of SSE, models that each emphasise different epistemologies of what counts as significant knowledge, and can only exist in a dynamic tension. In relation to this volume, SSE has always incorporated close attention to language but only very briefly (1988–1992) has something akin to Applied Linguistics had any real influence in the secondary classroom. However, with varying emphasis historically, there has been attention (the Adult Needs/Skills model, see later) to the conventions of language, especially ‘secretarial’ issues of spelling and punctuation, some understanding of grammar, and a focus on notions of Standard English, in writing and in speech; but these have never been the driving ideology of SSE. Of the two conceptual giants ‘Language’ and ‘Literature’, it is the latter that has mattered most over those 120 years.
    • Scaling and scheming: the highs and lows of scoring writing

      Green, Anthony; University of Bedfordshire (2019-12-04)
    • Research and practice in assessing academic English: the case of IELTS

      Taylor, Lynda; Saville, N. (Cambridge University Press, 2019-12-01)
      Test developers need to demonstrate they have premised their measurement tools on a sound theoretical framework which guides their coverage of appropriate language ability constructs in the tests they offer to the public. This is essential for supporting claims about the validity and usefulness of the scores generated by the test. This volume describes differing approaches to understanding academic reading ability that have emerged in recent decades and goes on to develop an empirically grounded framework for validating tests of academic reading ability. The framework is then applied to the IELTS Academic Reading module to investigate a number of different validity perspectives that reflect the socio-cognitive nature of any assessment event. The authors demonstrate how a systematic understanding and application of the framework and its components can help test developers to operationalise their tests so as to fulfil the validity requirements for an academic reading test. The book provides:
      • an up-to-date review of the relevant literature on assessing academic reading
      • a clear and detailed specification of the construct of academic reading
      • an evaluation of what constitutes an adequate representation of the construct of academic reading for assessment purposes
      • a consideration of the nature of academic reading in a digital age and its implications for assessment research and test development
      The volume is a rich source of information on all aspects of testing academic reading ability. Examination boards and other institutions who need to validate their own academic reading tests in a systematic and coherent manner, or who wish to develop new instruments for measuring academic reading, will find it of interest, as will researchers and graduate students in the field of language assessment, and those teachers preparing students for IELTS (and similar tests) or involved in English for Academic Purposes programmes.
    • Reflecting on the past, embracing the future

      Hamp-Lyons, Liz; University of Bedfordshire (Elsevier, 2019-10-14)
      In the Call for Papers for this anniversary volume of Assessing Writing, the Editors described the goal as “to trace the evolution of ideas, questions, and concerns that are key to our field, to explain their relevance in the present, and to look forward by exploring how these might be addressed in the future” and they asked me to contribute my thoughts. As the Editor of Assessing Writing between 2002 and 2017—a fifteen-year period—I realised from the outset that this was a very ambitious goal, one that no single paper could accomplish. Nevertheless, it seemed to me an opportunity to reflect on my own experiences as Editor, and through some of those experiences, offer a small insight into what this journal has done (and not done) to contribute to the debate about the “ideas, questions and concerns”; but also, to suggest some areas that would benefit from more questioning and thinking in the future. Despite the challenges of the task, I am very grateful to current Editors Martin East and David Slomp for the opportunity to reflect on these 25 years and to view them, in part, through the lens provided by the five articles appearing in this anniversary volume.
    • Developing tools for learning oriented assessment of interactional competence: bridging theory and practice

      May, Lyn; Nakatsuhara, Fumiyo; Lam, Daniel M. K.; Galaczi, Evelina D. (SAGE Publications, 2019-10-01)
      In this paper we report on a project in which we developed tools to support the classroom assessment of learners’ interactional competence (IC) and provided learning-oriented feedback in the context of preparation for a high-stakes face-to-face speaking test. Six trained examiners provided stimulated verbal reports (n=72) on 12 paired interactions, focusing on interactional features of candidates’ performance. We thematically analyzed the verbal reports to inform a draft checklist and materials, which were then trialled by four experienced teachers. Informed by both data sources, the final product comprised (a) a detailed IC checklist with nine main categories and over 50 sub-categories, with accompanying detailed descriptions of each area and feedback to learners, which teachers can adapt to suit their teaching and testing contexts, and (b) a concise IC checklist with four categories and bite-sized feedback for real-time classroom assessment. IC, a key aspect of face-to-face communication, is under-researched and under-explored in second/foreign language teaching, learning, and assessment contexts. This in-depth treatment of it, therefore, stands to contribute to learning contexts through raising teachers’ and learners’ awareness of micro-level features of the construct, and to assessment contexts through developing a more comprehensive understanding of the construct.
    • Towards a model of multi-dimensional performance of C1 level speakers assessed in the Aptis Speaking Test

      Nakatsuhara, Fumiyo; Tavakoli, Parveneh; Awwad, Anas; British Council; University of Bedfordshire; University of Reading; Isra University, Jordan (British Council, 2019-09-14)
      This is a peer-reviewed online research report in the British Council Validation Series (https://www.britishcouncil.org/exam/aptis/research/publications/validation). The current study draws on the findings of Tavakoli, Nakatsuhara and Hunter’s (2017) quantitative study, which failed to identify any statistically significant differences between various fluency features in speech produced by B2 and C1 level candidates in the Aptis Speaking test. This study set out to examine whether there were differences between other aspects of the speakers’ performance at these two levels, in terms of lexical and syntactic complexity, accuracy and use of metadiscourse markers, that distinguish the two levels. In order to understand the relationship between fluency and these other aspects of performance, the study employed a mixed-methods approach to analysing the data. The quantitative analysis included descriptive statistics, t-tests and correlational analyses of the various linguistic measures. For the qualitative analysis, we used a discourse analysis approach to examine the speakers’ pausing behaviour in the contexts in which the pauses occurred in their speech. The results indicated that the two proficiency levels were statistically different on measures of accuracy (weighted clause ratio) and lexical diversity (TTR and D), with the C1 level producing more accurate and lexically diverse output. The correlation analyses showed speed fluency was correlated positively with weighted clause ratio and negatively with length of clause. Speed fluency was also positively related to lexical diversity, but negatively linked with lexical errors. As for pauses, frequency of end-clause pauses was positively linked with length of AS-units. Mid-clause pauses also positively correlated with lexical diversity and use of discourse markers. Repair fluency correlated positively with length of clause, and negatively with weighted clause ratio. Repair measures were also negatively linked with number of errors per 100 words and metadiscourse marker type. The qualitative analyses suggested that the pauses mainly occurred a) to facilitate access and retrieval of lexical and structural units, b) to reformulate units already produced, and c) to improve communicative effectiveness. A number of speech excerpts are presented to illustrate these functions. It is hoped that the findings of this research offer a better understanding of the construct measured at B2 and C1 levels of the Aptis Speaking test, inform possible refinements of the Aptis Speaking rating scales, and enhance its rater training programme for the two highest levels of the test.
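      Of the linguistic measures reported, the type-token ratio (TTR) is the most straightforward to illustrate: it divides the number of unique word forms by the total number of word tokens in a transcript. The sketch below computes TTR for an invented fragment of candidate speech; the study’s other measures (D, weighted clause ratio, metadiscourse marker counts) require fuller modelling and are not reproduced here.

# Type-token ratio (TTR): unique word forms divided by total word tokens.
# A simple lexical-diversity measure; D (also reported in the study)
# additionally corrects for text length and is not implemented here.
import re

def type_token_ratio(transcript: str) -> float:
    tokens = re.findall(r"[a-z']+", transcript.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0

sample = "well I think the city I live in is quite big and the people are friendly"
print(round(type_token_ratio(sample), 3))  # 14 types / 16 tokens = 0.875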
    • Research and practice in assessing academic reading: the case of IELTS

      Weir, Cyril J.; Chan, Sathena Hiu Chong (Cambridge University Press, 2019-08-29)
      The focus for attention in this volume is the reading component of the IELTS Academic module, which is principally used for admissions purposes into tertiary-level institutions throughout the world (see Davies 2008 for a detailed history of the developments in EAP testing leading up to the current IELTS). According to the official website (www.cambridgeenglish.org/exams-and-tests/ielts/test-format/), there are three reading passages in the Academic Reading Module with a total of c.2,150–2,750 words. Individual tasks are not timed. Texts are taken from journals, magazines, books, and newspapers. All the topics are of general interest and the texts have been written for a non-specialist audience. The readings are intended to be about issues that are appropriate to candidates who will enter postgraduate or undergraduate courses. At least one text will contain detailed logical argument. One of the texts may contain non-verbal materials such as graphs, illustrations or diagrams. If there are technical terms, which candidates may not know in the text, then a glossary is provided. The texts and questions become more difficult through the paper. A number of specific critical questions are addressed in applying the socio-cognitive validation framework to the IELTS Academic Reading Module:
      * Are the cognitive processes required to complete the IELTS Reading test tasks appropriate and adequate in their coverage? (Focus on cognitive validity in Chapter 4.)
      * Are the contextual characteristics of the test tasks and their administration appropriate and fair to the candidates who are taking them? (Focus on context validity in Chapter 5.)
      * What effects do the test and test scores have on various stakeholders? (Focus on consequential validity in Chapter 6.)
      * What external evidence is there that the test is fair? (Focus on criterion-related validity in Chapter 7.)
    • Vocabulary explanations in beginning-level adult ESOL classroom interactions: a conversation analysis perspective

      Tai, Kevin W.H.; Khabbazbashi, Nahal; University College London; University of Bedfordshire (Linguistics and Education, Elsevier, 2019-07-19)
      Recent studies have examined the interactional organisation of vocabulary explanations (VEs) in second language (L2) classrooms. Nevertheless, more work is needed to better understand how VEs are provided in these classrooms, particularly in beginning-level English for Speakers of Other Languages (ESOL) classroom contexts where students have different first languages (L1s) and limited English proficiency and the shared linguistic resources between the teacher and learners are typically limited. Based on a corpus of beginning-level adult ESOL lessons, this conversation-analytic study offers insights into how VEs are interactionally managed in such classrooms. Our findings contribute to the current literature in shedding light on the nature of VEs in beginning-level ESOL classrooms.