• Testing four skills in Japan

      Green, Anthony; University of Bedfordshire (Japan Society of English Language Education, 2016-02-01)
      This paper considers arguments for the testing of spoken language skills in Japan and the contribution the use of such tests might make to language education. The Japanese government, recognising the importance of spontaneous social interaction in English to participation in regional and global communities, mandates the development of all ‘four skills’ (Reading, Writing, Listening and Speaking) in schools. However, university entrance tests continue to emphasize the written language. Because they control access to opportunities, entrance tests tend to dominate teaching and learning. They are widely believed to encourage traditional forms of teaching and to inhibit speaking and listening activities in the classroom. Comprehensive testing of spoken language skills should, in contrast, encourage (or at least not discourage) the teaching and learning of these skills. On the other hand, testing spoken language skills also represents a substantial challenge. New organisational structures are needed to support new testing formats and these will be unfamiliar to all involved, resulting in an increased risk of system failures. Introducing radical change to any educational system is likely to provoke a reaction from those who benefit most from the status quo. For this reason, critics will be ready to exploit any perceived shortcomings to reverse innovative policies. Experience suggests that radical changes in approaches to testing are unlikely to deliver benefits for the education system unless they are well supported by teacher training, new materials and public relations initiatives. The introduction of spoken language tests is no doubt essential to the success of Japan’s language policies, but is not without risk and needs to be carefully integrated with other aspects of the education system.
    • Testing speaking skills: why and how?

      Nakatsuhara, Fumiyo; Inoue, Chihiro; University of Bedfordshire (2013-09-16)
    • Three current, interconnected concerns for writing assessment

      Hamp-Lyons, Liz (Elsevier Ltd, 2014-09-26)
      Editorial
    • Topic and background knowledge effects on performance in speaking assessment

      Khabbazbashi, Nahal (Sage, 2015-08-10)
      This study explores the extent to which topic and background knowledge of topic affect spoken performance in a high-stakes speaking test. It is argued that evidence of a substantial influence may introduce construct-irrelevant variance and undermine test fairness. Data were collected from 81 non-native speakers of English who performed on 10 topics across three task types. Background knowledge and general language proficiency were measured using self-report questionnaires and C-tests respectively. Score data were analysed using many-facet Rasch measurement and multiple regression. Findings showed that for two of the three task types, the topics used in the study generally exhibited difficulty measures which were statistically distinct. However, the size of the differences in topic difficulties was too small to have a large practical effect on scores. Participants’ different levels of background knowledge were shown to have a systematic effect on performance. However, these statistically significant differences also failed to translate into practical significance. Findings hold implications for speaking performance assessment.
    • Towards a model of multi-dimensional performance of C1 level speakers assessed in the Aptis Speaking Test

      Nakatsuhara, Fumiyo; Tavakoli, Parvaneh; Awwad, Anas; British Council; University of Bedfordshire; University of Reading; Isra University, Jordan (British Council, 2019-09-14)
      This is a peer-reviewed online research report in the British Council Validation Series (https://www.britishcouncil.org/exam/aptis/research/publications/validation). The current study draws on the findings of Tavakoli, Nakatsuhara and Hunter’s (2017) quantitative study, which failed to identify any statistically significant differences between various fluency features in speech produced by B2 and C1 level candidates in the Aptis Speaking test. This study set out to examine whether other aspects of the speakers’ performance at these two levels, namely lexical and syntactic complexity, accuracy and use of metadiscourse markers, distinguished the two levels. In order to understand the relationship between fluency and these other aspects of performance, the study employed a mixed-methods approach to analysing the data. The quantitative analysis included descriptive statistics, t-tests and correlational analyses of the various linguistic measures. For the qualitative analysis, we used a discourse analysis approach to examine the speakers’ pausing behaviour in the contexts in which the pauses occurred in their speech. The results indicated that the two proficiency levels were statistically different on measures of accuracy (weighted clause ratio) and lexical diversity (TTR and D), with C1 level candidates producing more accurate and lexically diverse output. The correlation analyses showed that speed fluency correlated positively with weighted clause ratio and negatively with length of clause. Speed fluency was also positively related to lexical diversity, but negatively linked with lexical errors. As for pauses, the frequency of end-clause pauses was positively linked with the length of AS-units. Mid-clause pauses also correlated positively with lexical diversity and use of discourse markers. Repair fluency correlated positively with length of clause, and negatively with weighted clause ratio. Repair measures were also negatively linked with the number of errors per 100 words and metadiscourse marker type. The qualitative analyses suggested that pauses mainly occurred (a) to facilitate access and retrieval of lexical and structural units, (b) to reformulate units already produced, and (c) to improve communicative effectiveness. A number of speech excerpts are presented to illustrate these examples. It is hoped that the findings of this research offer a better understanding of the construct measured at B2 and C1 levels of the Aptis Speaking test, inform possible refinements of the Aptis Speaking rating scales, and enhance its rater training programme for the two highest levels of the test.
    • Towards a profile of the academic listener

      Field, John; University of Bedfordshire (2018-03-14)
    • Towards new avenues for the IELTS Speaking Test: insights from examiners’ voices

      Inoue, Chihiro; Khabbazbashi, Nahal; Lam, Daniel M. K.; Nakatsuhara, Fumiyo (IELTS Partners, 2021-02-19)
      This study investigated examiners’ views on all aspects of the IELTS Speaking Test, namely the test tasks, topics, format, interlocutor frame, examiner guidelines, test administration, rating, training and standardisation, and test use. The overall trends in examiners’ views of these aspects of the test were captured by a large-scale online questionnaire, to which a total of 1203 examiners responded. Based on the questionnaire responses, 36 examiners were carefully selected for subsequent interviews to explore the reasons behind their views in depth. These 36 examiners represented a number of different geographical regions and a range of views and experience in examining and in delivering examiner training. While the questionnaire responses showed generally positive views from examiners on the current IELTS Speaking Test, the interview responses uncovered various issues that the examiners had experienced and suggested potentially beneficial modifications. Many of the issues (e.g. potentially unsuitable topics, rigidity of interlocutor frames) were attributable to the huge candidature of the IELTS Speaking Test, which has expanded vastly since the test’s last revision in 2001, perhaps beyond the initial expectations of the IELTS Partners. This study synthesised the voices of examiners with insights from the relevant literature, together with the guideline checks we submitted to the IELTS Partners. The report concludes with a number of suggestions for potential changes to the current IELTS Speaking Test, so as to enhance its validity and accessibility in today’s ever-globalising world.
    • Use of innovative technology in oral language assessment

      Nakatsuhara, Fumiyo; Berry, Vivien; University of Bedfordshire; British Council (Taylor & Francis, 2021-12-27)
      Editorial
    • Using assessment to promote learning: clarifying constructs, theories, and practices

      Leung, Cyril; Davison, C.; East, M.; Evans, M.; Liu, Y.; Hamp-Lyons, Liz; Purpura, J.E. (Georgetown University Press, 2017-11-22)
    • Using eye-tracking research to inform language test validity and design

      Bax, Stephen; Chan, Sathena Hiu Chong (Elsevier, 2019-02-08)
      This paper reports on a recent study which used eye-tracking methodology to examine the cognitive validity of two level-specific English Proficiency Reading Tests (CEFR B2 and C1). Using a mixed-methods approach, the study investigated test takers’ reading patterns on six item types using eye-tracking, a self-report checklist and stimulated recall interviews. Twenty L2 participants completed 30 items on a computer, with the Tobii X2 Eye Tracker recording their eye movements on screen. Immediately after they had completed each item type, they reported their reading processes using a Reading Process Checklist. Eight students further participated in a stimulated recall interview while viewing video footage of their gaze patterns on the test. The findings indicate (1) the range of cognitive processes elicited by different reading item types at the two levels; and (2) the differences between stronger and weaker test takers’ reading patterns on each item type. The implications of the study for some fundamental questions regarding the use of eye-tracking in language research are discussed. The paper concludes with recommendations for future research in these areas.
    • Using keystroke logging to understand writers’ processes on a reading-into-writing test

      Chan, Sathena Hiu Chong (Springer Open, 2017-05-19)
      Background: Integrated reading-into-writing tasks are increasingly used in large-scale language proficiency tests. Such tasks are said to possess higher authenticity as they reflect real-life writing conditions better than independent, writing-only tasks. However, to effectively define the reading-into-writing construct, more empirical evidence regarding how writers compose from sources, both in real life and under test conditions, is urgently needed. Most previous process studies used think-aloud protocols or questionnaires to collect evidence. These methods rely on participants’ perceptions of their processes, as well as their ability to report them. Findings: This paper reports on a small-scale experimental study that explored writers’ processes on a reading-into-writing test by employing keystroke logging. Two L2 postgraduates completed an argumentative essay on a computer. Their text production processes were captured by a keystroke logging programme. The students were also interviewed to provide additional information. Keystroke logging, like most computing tools, provides a range of measures. The study examined the students’ reading-into-writing processes by analysing a selection of the keystroke logging measures in conjunction with the students’ final texts and interview protocols. Conclusions: The results suggest that the nature of the writers’ reading-into-writing processes might have a major influence on their final performance. Recommendations for future process studies are provided.
    • Validating performance on writing test tasks

      Weir, Cyril J.; University of Bedfordshire (2013-07-11)
    • Validating speaking test rating scales through microanalysis of fluency using PRAAT

      Tavakoli, Parvaneh; Nakatsuhara, Fumiyo; Hunter, Ann-Marie; University of Reading; University of Bedfordshire; St. Mary’s University (2017-07-06)
    • Validating two types of EAP reading-into-writing test tasks

      Chan, Sathena Hiu Chong; University of Bedfordshire (2013-07-11)
    • Video-conferencing speaking tests: do they measure the same construct as face-to-face tests?

      Nakatsuhara, Fumiyo; Inoue, Chihiro; Berry, Vivien; Galaczi, Evelina D.; University of Bedfordshire; British Council; Cambridge Assessment English (Routledge, 2021-08-23)
      This paper investigates the comparability between the video-conferencing and face-to-face modes of the IELTS Speaking Test in terms of scores and the language functions generated by test-takers. Data were collected from 10 trained IELTS examiners and 99 test-takers who took two speaking tests under face-to-face and video-conferencing conditions. Many-facet Rasch Model (MFRM) analysis of test scores indicated that the delivery mode did not make any meaningful difference to test-takers’ scores. An examination of language functions revealed that the two modes elicited the same language functions to an equal extent, with the exception of asking for clarification: more test-takers made clarification requests in the video-conferencing mode (63.3%) than in the face-to-face mode (26.7%). Drawing on these findings, we discuss practical implications and extend emerging thinking about video-conferencing speaking assessment and the features of this modality in its own right.
    • Vocabulary explanations in beginning-level adult ESOL classroom interactions: a conversation analysis perspective

      Tai, Kevin W.H.; Khabbazbashi, Nahal; University College London; University of Bedfordshire (Linguistics and Education, Elsevier, 2019-07-19)
      Recent studies have examined the interactional organisation of vocabulary explanations (VEs) in second language (L2) classrooms. Nevertheless, more work is needed to better understand how VEs are provided in these classrooms, particularly in beginning-level English for Speakers of Other Languages (ESOL) classroom contexts where students have different first languages (L1s) and limited English proficiency, and the shared linguistic resources between the teacher and learners are typically limited. Based on a corpus of beginning-level adult ESOL lessons, this conversation-analytic study offers insights into how VEs are interactionally managed in such classrooms. Our findings contribute to the current literature in shedding light on the nature of VEs in beginning-level ESOL classrooms.
    • Washback and writing assessment

      Green, Anthony; University of Bedfordshire (2012-03-15)
    • Washback in language assessment

      Green, Anthony (Wiley-Blackwell, 2012-11-05)
      “Washback” (alternatively “backwash”) is a term used in education to describe the influence, whether beneficial or damaging, of an assessment on the teaching and learning that precedes and prepares for that assessment. Over the past thirty years, washback, often conceived as one instance of “impact”, or the range of effects that assessment may have on society more generally, has become established as a popular topic for applied linguistics research. Studies have covered a variety of contexts, from national and international tests administered to millions of test takers to the classroom assessment practices of individual teachers. Researchers have employed a range of methods, including small-scale observational studies and much more extensive questionnaire surveys, often making use of mixed methods to access different perspectives on the issues. These have revealed washback to be a complex phenomenon, closely associated with and affected by established practices, beliefs and attitudes. Although test developers increasingly recognize the importance of washback and impact in evaluating assessment use, washback remains to be fully integrated into standard validation practice.