• An application of AUA to examining the potential washback of a new test of English for university entrance

      Nakamura, Keita; Green, Anthony; Eiken Foundation of Japan; University of Bedfordshire (2013-11-17)
    • Applying the socio-cognitive framework: gathering validity evidence during the development of a speaking test

      Nakatsuhara, Fumiyo; Dunlea, Jamie; University of Bedfordshire; British Council (UCLES/Cambridge University Press, 2020-06-18)
      This chapter describes how Weir’s (2005; further elaborated in Taylor (Ed) 2011) socio-cognitive framework for validating speaking tests guided two a priori validation studies of the speaking component of the Test of English for Academic Purposes (TEAP) in Japan. In this chapter, we particularly reflect upon the academic achievements of Professor Cyril J Weir, in terms of:
      • the effectiveness and value of the socio-cognitive framework underpinning the development of the TEAP Speaking Test while gathering empirical evidence of the construct underlying a speaking test for the target context
      • his contribution to developing early career researchers and extending language testing expertise in the TEAP development team.
    • Assessing English on the global stage : the British Council and English language testing, 1941-2016

      Weir, Cyril J.; O'Sullivan, Barry (Equinox, 2017-07-06)
      This book tells the story of the British Council’s seventy-five year involvement in the field of English language testing. The first section of the book explores the role of the British Council in spreading British influence around the world through the export of British English language examinations and British expertise in language testing. Founded in 1934, the organisation formally entered the world of English language testing with the signing of an agreement with the University of Cambridge Local Examination Syndicate (UCLES) in 1941. This agreement, which was to last until 1993, saw the British Council provide substantial English as a Foreign Language (EFL) expertise and technical and financial assistance to help UCLES develop their suite of English language tests. Perhaps the high points of this phase were the British Council inspired Cambridge Diploma of English Studies introduced in the 1940s and the central role played by the British Council in the conceptualisation and development of the highly innovative English Language Testing Service (ELTS) in the 1970s, the precursor to the present day International English Language Testing System (IELTS). British Council support for the development of indigenous national English language tests around the world over the last thirty years further enhanced the promotion of English and the creation of soft power for Britain. In the early 1990s the focus of the British Council changed from test development to delivery of British examinations through its global network. However, by the early years of the 21st century, the organisation was actively considering a return to test development, a strategy that was realised with the founding of the Assessment Research Group in early 2012. This was followed later that year by the introduction of the Aptis English language testing service; the first major test developed in-house for over thirty years. 
As well as setting the stage for the re-emergence of professional expertise in language testing within the organisation, these initiatives have resulted in a growing strategic influence for the organisation on assessment in English language education. This influence derives from a commitment to test localisation, the development and provision of flexible, accessible and affordable tests and an efficient delivery, marking and reporting system underpinned by an innovative socio-cognitive approach to language testing. This final period can be seen as a clear return by the British Council to using language testing as a tool for enhancing soft power for Britain: a return to the original raison d’être of the organisation.
    • Comparing writing proficiency assessments used in professional medical registration: a methodology to inform policy and practice

      Chan, Sathena Hiu Chong; Taylor, Lynda; University of Bedfordshire (Elsevier, 2020-10-13)
      Internationally trained doctors wishing to register and practise in an English-speaking country typically have to demonstrate that they can communicate effectively in English, including writing proficiency. Various English language proficiency (ELP) tests are available worldwide and are used for such licensing purposes. This means that medical registration bodies face the question of which test(s) will meet their needs, ideally reflecting the demands of their professional environment. This article reports a mixed-methods study to survey the policy and practice of health-care registration organisations in the UK and worldwide. The study aimed to identify ELP tests that were, or could be, considered as suitable for medical registration purposes and to understand the differences between them. The paper discusses what the study revealed about the function and comparability of different writing tests used in professional registration as well as the complex criteria a professional body may prioritise when selecting a test. Although the original study was completed in 2015, the paper takes account of subsequent changes in policy and practice. It offers a practical methodology and worked example which may be of interest and value to other researchers, language testers and policymakers as they face challenges in selecting and making comparisons across tests.
    • A comparison of holistic, analytic, and part marking models in speaking assessment

      Khabbazbashi, Nahal; Galaczi, Evelina D. (SAGE, 2020-01-24)
      This mixed methods study examined holistic, analytic, and part marking models (MMs) in terms of their measurement properties and impact on candidate CEFR classifications in a semi-direct online speaking test. Speaking performances of 240 candidates were first marked holistically and by part (phase 1). On the basis of phase 1 findings – which suggested stronger measurement properties for the part MM – phase 2 focused on a comparison of part and analytic MMs. Speaking performances of 400 candidates were rated analytically and by part during that phase. Raters provided open comments on their marking experiences. Results suggested a significant impact of MM; approximately 30% and 50% of candidates in phases 1 and 2 respectively were awarded different (adjacent) CEFR levels depending on the choice of MM used to assign scores. There was a trend of higher CEFR levels with the holistic MM and lower CEFR levels with the part MM. While strong correlations were found between all pairings of MMs, further analyses revealed important differences. The part MM was shown to display superior measurement qualities particularly in allowing raters to make finer distinctions between different speaking ability levels. These findings have implications for the scoring validity of speaking tests.
    • Contriving authentic interaction: task implementation and engagement in school-based speaking assessment in Hong Kong

      Lam, Daniel M. K.; Yu, Guoxing; Jin, Yan; University of Bedfordshire; University of Bristol; Shanghai Jiaotong University (Palgrave Macmillan, 2016-01-01)
      This chapter examines the validity of the Group Interaction task in a school-based speaking assessment in Hong Kong from the perspectives of task implementation and authenticity of engagement. The new format is intended to offer a more valid assessment than the external examination by eliciting ‘authentic oral language use’ (HKEAA, 2009, p.7) in ‘low-stress conditions’ (p.3), and emphasizes the importance of flexibility and sensitivity to students’ needs in its implementation. Such a policy has then been translated into diverse assessment practices, with considerable variation in the amount of preparation time given to students. The present study draws on three types of data, namely 1) students’ discourse in the assessed interactions, 2) stimulated recall with students and teachers, and 3) a mock assessment, where the group interaction task, the preparation time, and the post-interview were all video-recorded. Results show that while the test discourse exhibits some features that ostensibly suggest authentic interaction, a closer examination of students’ pre-task planning activities reveals the contrived and pre-scripted nature of the interaction. Implications for the assessment of students’ interactional competence and recommendations for task implementation are discussed.
    • The design and validation of an online speaking test for young learners in Uruguay: challenges and innovations

      Khabbazbashi, Nahal; Nakatsuhara, Fumiyo; Inoue, Chihiro; Kaplan, Gabriela; Green, Anthony; University of Bedfordshire; Plan Ceibal (Cranmore Publishing on behalf of the International TESOL Union, 2022-02-10)
      This research presents the development of an online speaking test of English for students at the end of primary and beginning of secondary school education in state schools in Uruguay. Following the success of the Plan Ceibal one-computer-tablet-per-child initiative, there was a drive to further utilize technology to improve the language ability of students, particularly in speaking, where the majority of students are at CEFR levels pre-A1 and A1. The national concern over a lack of spoken communicative skills amongst students led to a decision to develop a new speaking test, specifically tailored to local needs. This paper provides an overview of the speaking test development and validation project designed with the following objectives in mind: to establish, track, and report annually learners’ achievements against the Common European Framework of Reference for Languages (CEFR) targeting CEFR levels pre-A1 to A2, to inform teaching and learning, and to promote speaking practice in classrooms. Results of a three-phase mixed-methods study involving small-scale and large-scale trials with learners and examiners as well as a CEFR-linking exercise with expert panelists will be reported. Different sources of evidence will be brought together to build a validity argument for the test. The paper will also focus on some of the challenges involved in assessing young learners and discuss how design decisions, local knowledge and expertise, and technological innovations can be used to address such challenges with implications for other similar test development projects.
    • Developing tools for learning oriented assessment of interactional competence: bridging theory and practice

      May, Lyn; Nakatsuhara, Fumiyo; Lam, Daniel M. K.; Galaczi, Evelina D. (SAGE Publications, 2019-10-01)
      In this paper we report on a project in which we developed tools to support the classroom assessment of learners’ interactional competence (IC) and provided learning oriented feedback in the context of preparation for a high-stakes face-to-face speaking test.  Six trained examiners provided stimulated verbal reports (n=72) on 12 paired interactions, focusing on interactional features of candidates’ performance. We thematically analyzed the verbal reports to inform a draft checklist and materials, which were then trialled by four experienced teachers. Informed by both data sources, the final product comprised (a) a detailed IC checklist with nine main categories and over 50 sub-categories, accompanying detailed description of each area and feedback to learners, which teachers can adapt to suit their teaching and testing contexts, and (b) a concise IC checklist with four categories and bite-sized feedback for real-time classroom assessment. IC, a key aspect of face-to-face communication, is under-researched and under-explored in second/foreign language teaching, learning, and assessment contexts. This in-depth treatment of it, therefore, stands to contribute to learning contexts through raising teachers’ and learners’ awareness of micro-level features of the construct, and to assessment contexts through developing a more comprehensive understanding of the construct.
    • The effects of extended planning time on candidates’ performance, processes and strategy use in the lecture listening-into-speaking tasks of the TOEFL iBT Test

      Inoue, Chihiro; Lam, Daniel M. K.; Educational Testing Service (Wiley, 2021-06-21)
      This study investigated the effects of two different planning time conditions (i.e., operational [20 s] and extended length [90 s]) for the lecture listening-into-speaking tasks of the TOEFL iBT® test for candidates at different proficiency levels. Seventy international students based in universities and language schools in the United Kingdom (35 at a lower level; 35 at a higher level) participated in the study. The effects of different lengths of planning time were examined in terms of (a) the scores given by ETS-certified raters; (b) the quality of the speaking performances characterized by accurately reproduced idea units and the measures of complexity, accuracy, and fluency; and (c) self-reported use of cognitive and metacognitive processes and strategies during listening, planning, and speaking. The results found neither a statistically significant main effect of the length of planning time nor an interaction between planning time and proficiency on the scores or on the quality of the speaking performance. There were several cognitive and metacognitive processes and strategies where significantly more engagement was reported under the extended planning time, which suggests enhanced cognitive validity of the task. However, the increased engagement in planning did not lead to any measurable improvement in the score. Therefore, in the interest of practicality, the results of this study provide justifications for the operational length of planning time for the lecture listening-into-speaking tasks in the speaking section of the TOEFL iBT test.
    • The English Benchmarking Study in Maltese Schools: Technical Report 2015

      Khabbazbashi, Nahal; Khalifa, Hanan; Robinson, M.; Ellis, S.; Cambridge English Language Assessment (Cambridge English Language Assessment, 2016-04-15)
      This is a report for a project between Cambridge English Language Assessment and the Maltese Ministry for Education and Employment [Nahal Khabbazbashi was principal investigator for project].
    • English language teacher development in a Russian university: context, problems and implications

      Rasskazova, Tatiana; Guzikova, Maria; Green, Anthony; Ural Federal University; University of Bedfordshire (Elsevier, 2017-02-02)
      The evaluation of the efficiency of teacher professional development has long attracted the attention of professionals in education. This paper reports on the results of a two-year English language teacher professional development programme following a Needs Analysis study conducted by Cambridge ESOL in 2012. Longitudinal research shows that English language teaching in Russia has several problems which have persisted for decades. This article focuses on three of them: class interaction mode; the use of the native (Russian) language in class; and the error correction strategies employed by teachers. A new approach to evaluation was employed by asking students and teachers the same questions from different perspectives on areas identified during the needs analysis study. The results varied in significance: some positive changes were noticed in class interaction mode; little has changed in error correction strategies; and the use of Russian in the classroom appears reasonable and does not interfere with learning. Overall, the study may be useful for a general audience, especially in post-Soviet countries, as it provides evidence of change management and its impact on ELT. The findings presented in this paper seek to contribute to the formulation or adjustment of policies related to educational reforms, such as curriculum reform and teacher professional development, in non-English-speaking countries.
    • Exploring the value of bilingual language assistants with Japanese English as a foreign language learners

      Macaro, Ernesto; Nakatani, Yasuo; Hayashi, Yuko; Khabbazbashi, Nahal; University of Oxford; Hosei University (Routledge, 2012-04-27)
      We report on a small-scale exploratory study of Japanese students’ reactions to the use of a bilingual language assistant on an EFL study-abroad course in the UK, and we give an insight into the possible effect of using bilingual assistants on speaking production. First-year university students were divided into three groups, all taught by a monolingual (native) speaker of English. Two teachers had monolingual assistants to help them; the third group had a bilingual (Japanese–English) assistant. In the third group, students were encouraged to ask the assistant for help with English meanings and to provide English equivalents for Japanese phrases, especially during student-centred activities. Moreover, the students in the third group were encouraged to code-switch rather than speak hesitantly or clam up in English. In the first two groups, the students were actively discouraged from using Japanese among themselves in the classroom. The data from an open-ended questionnaire suggest that attitudes to having a bilingual assistant were generally positive. Moreover, the ‘bilingual’ group made the biggest gains over the three-week period in fluency and in overall speaking scores, although these gains were not statistically significant. Suggestions for further research are explored, particularly in relation to whether a bilingual assistant may provide support with the cross-cultural challenges faced by EFL learners.
    • Integrated writing and its correlates: a meta-analysis

      Chan, Sathena Hiu Chong; Yamashita, J. (Elsevier, 2022-07-26)
      Integrated tasks are increasing in popularity, either replacing or complementing writing-only independent tasks in writing assessments. This shift has generated much research interest in investigating the underlying construct and features of integrated writing (IW) performances. However, due to the complexity of the IW construct, there are conflicting findings about whether, and the extent to which, various language skills and IW text features correlate with IW scores. To understand the construct of IW, we conducted a meta-analysis to synthesize correlation coefficients between scores of IW performances and (1) other language skills and (2) text quality features of IW. We also examined factors that may moderate the correlation of IW scores with these two groups of correlates. The results showed that (1) reading and writing skills correlated more strongly with IW scores than listening did; and (2) text length had the strongest correlation, followed by source integration, organization and syntactic complexity, with lexical complexity showing the smallest correlation. Several IW task features affected the magnitude of correlations. The results supported the view that IW is an independent, albeit related, construct distinct from other language skills, and that IW task features may affect the construct of IW.
    • International assessment and local contexts: a case study of an English language initiative in higher education institutes in Egypt

      Khalifa, Hanan; Khabbazbashi, Nahal; Abdelsalam, Samar; Said, Mohsen Elmahdy; Cambridge English Language Assessment; Cairo University (Association for Language Testing and Assessment of Australia and New Zealand, 2015-11-07)
      Within the long-term objectives of English language reform in higher education (HE) institutes across Egypt and increasing employability in the global job market, the Center for Advancement of Postgraduate Studies and Research in Cairo University (CAPSCU), Cambridge English Language Assessment and the British Council (Egypt) have implemented a multi-phase upskilling program aimed at enhancing the workplace language skills of socially disadvantaged undergraduates, developing teachers’ pedagogical knowledge and application, providing both students and teachers with a competitive edge in the job markets through internationally recognised certification and the introduction of 21st century skills such as digital-age literacy and effective communication in HE, and, lastly, integrating international standards for teaching, learning and assessment within the local context. This paper reports on a mixed methods research study aimed at evaluating the effectiveness of this initiative and its impact at the micro and macro levels. The research focused on language progression, learner autonomy, motivation towards digital learning and assessment, improvements in pedagogical knowledge and teaching practices. Standardised assessment, attitudinal and perceptions surveys, and observational data were used. Findings suggested a positive impact of the upskilling program, illustrated how international collaborations can provide the necessary skills for today’s global job market, and highlighted areas for consideration for upscaling the initiative.
    • Opposing tensions of local and international standards for EAP writing programmes: who are we assessing for?

      Bruce, Emma Louise; Hamp-Lyons, Liz; City University of Hong Kong; University of Bedfordshire (Elsevier, 2015-04-24)
      In response to recent curriculum changes in secondary schools in Hong Kong, including the implementation of the 3+3+4 education structure, with one year less at high school and one year more at university, and the introduction of a new school leavers' exam, the Hong Kong Diploma of Secondary Education (HKDSE), universities in the territory have revisited their English language curriculums. At City University a new EAP curriculum and assessment framework was developed to fit the re-defined needs of the new cohort of students. In this paper we describe the development and benchmarking process of a scoring instrument for EAP writing assessment at City University. We discuss the opposing tensions of local (HKDSE) and international (CEFR and IELTS) standards, the problems of aligning EAP needs-based domain scales and standards with the CEFR and the issues associated with attempting to fulfil the institutional expectation that the EAP programme would raise students' scores by a whole CEFR scale step. Finally, we consider the political tensions created by the use of external, even international, reference points for specific levels of writing performance from all our students and suggest the benefits of a specific, locally-designed, fit-for-purpose tool over one aligned with universal standards.
    • Paper-based vs computer-based writing assessment: divergent, equivalent or complementary?

      Chan, Sathena Hiu Chong (Elsevier, 2018-05-16)
      Writing on a computer is now commonplace in most post-secondary educational contexts and workplaces, making research into computer-based writing assessment essential. This special issue of Assessing Writing includes a range of articles focusing on computer-based writing assessments. Some of these have been designed to parallel an existing paper-based assessment, others have been constructed as computer-based from the beginning. The selection of papers addresses various dimensions of the validity of computer-based writing assessment use in different contexts and across levels of L2 learner proficiency. First, three articles deal with the impact of these two delivery modes, paper-based or computer-based, on test takers’ processing and performance in large-scale high-stakes writing tests; next, two articles explore the use of online writing assessment in higher education; the final two articles evaluate the use of technologies to provide feedback to support learning.
    • Preparing for admissions tests in English

      Yu, Guoxing; Green, Anthony; University of Bristol; University of Bedfordshire (Taylor & Francis, 2021-05-06)
      Test preparation for admissions to education programmes has always been a contentious issue (Anastasi, 1981; Crocker, 2003; Messick, 1982; Powers, 2012). For Crocker (2006), ‘No activity in educational assessment raises more instructional, ethical, and validity issues than preparation for large-scale, high-stakes tests.’ (p. 115). Debate has often centred around the effectiveness of preparation and how it affects the validity of test score interpretations; equity and fairness of access to opportunity; and impacts on learning and teaching (Yu et al., 2017). A focus has often been preparation for tests originally designed for domestic students, for example, SATs (e.g., Alderman & Powers, 1980; Appelrouth et al., 2017; Montgomery & Lilly, 2012; Powers, 1993; Powers & Rock, 1999; Sesnowitz et al., 1982) and state-wide tests (e.g., Firestone et al., 2004; Jäger et al., 2012), but the increasing internationalisation of higher education has added a new dimension. To enrol in higher education programmes which use English as the medium of instruction, increasing numbers of international students whose first language is not English are now taking English language tests, or academic specialist tests administered in English, or both. The papers in this special issue concern how students prepare for these tests and the roles in this process of the tests themselves and of the organisations that provide them.
    • Researching L2 writers’ use of metadiscourse markers at intermediate and advanced levels

      Bax, Stephen; Nakatsuhara, Fumiyo; Waller, Daniel; University of Bedfordshire; University of Central Lancashire (Elsevier, 2019-02-20)
      Metadiscourse markers refer to aspects of text organisation or indicate a writer’s stance towards the text’s content or towards the reader (Hyland, 2004:109). The CEFR (Council of Europe, 2001) indicates that one of the key areas of development anticipated between levels B2 and C1 is an increasing variety of discourse markers and growing acknowledgement of the intended audience by learners. This study represents the first large-scale investigation of the metadiscourse of general second language learner writing, through the analysis of 281 metadiscourse markers in 13 categories, drawn from 900 exam scripts at CEFR B2-C2 levels. The study employed the online text analysis tool Text Inspector (Bax, 2012), in conjunction with human analysts. The findings revealed that higher level writers used fewer metadiscourse markers than lower level writers, but used a significantly wider range of 8 of the 13 classes of markers. The study also demonstrated the crucial importance of analysing not only the behaviour of whole classes of metadiscourse items but also the individual items themselves. The findings are of potential interest to those involved in the development of assessment scales at different levels of the CEFR, or to teachers interested in aiding the development of learners.
    • Restoring perspective on the IELTS test

      Green, Anthony (Oxford University Press, 2019-03-18)
      This article presents a response to William Pearson’s article, ‘Critical Perspectives on the IELTS Test’. It addresses his critique of the role of IELTS as a test for regulating international mobility and access to English medium education and evaluates his more specific prescriptions for the improvements to the quality of the test itself.
    • The role of the L1 in testing L2 English

      Nakatsuhara, Fumiyo; Taylor, Lynda; Jaiyote, Suwimol (Cambridge University Press, 2018-11-28)
      This chapter compares and contrasts two research studies that addressed the role of the L1 in the assessment of L2 spoken English. The first is a small-scale, mixed-methods study which explored the impact of test-takers’ L1 backgrounds in the paired speaking task of a standardised test of general English provided by an international examination board (Nakatsuhara and Jaiyote, 2015). The key question in the research was how we can ensure fairness to test-takers who perform paired tests in shared and non-shared L1 pairs. The second is a large-scale, a priori test validation study conducted as part of the development of a new EAP (English for academic purposes) test offered by a national examination board, targeting only single L1 users (Nakatsuhara, 2014). Of particular interest is the way in which its pronunciation rating scale was developed and validated in the single L1 context. In light of these examples of research into international and locally-developed tests, this chapter aims to demonstrate the importance of the construct of a test and its score usage when reconsidering a) whether specific English varieties are considered to be construct-relevant or construct-irrelevant and b) what Englishes (rather than ‘standard’ English) should be elicited and assessed.
      Nakatsuhara, F. (2014). A Research Report on the Development of the Test of English for Academic Purposes (TEAP) Speaking Test for Japanese University Entrants – Study 1 & Study 2, available online at: www.eiken.or.jp/teap/group/pdf/teap_speaking_report1.pdf
      Nakatsuhara, F. and Jaiyote, S. (2015). Exploring the impact of test-takers’ L1 backgrounds on paired speaking test performance: how do they perform in shared and non-shared L1 pairs? BAAL / Cambridge University Press Applied Linguistics Seminar, York St John University, UK (24-26/06/2015).