• An application of AUA to examining the potential washback of a new test of English for university entrance

      Nakamura, Keita; Green, Anthony; Eiken Foundation of Japan; University of Bedfordshire (2013-11-17)
    • Applying the socio-cognitive framework: gathering validity evidence during the development of a speaking test

      Nakatsuhara, Fumiyo; Dunlea, Jamie; University of Bedfordshire; British Council (UCLES/Cambridge University Press, 2020-06-18)
      This chapter describes how Weir’s (2005; further elaborated in Taylor (Ed) 2011) socio-cognitive framework for validating speaking tests guided two a priori validation studies of the speaking component of the Test of English for Academic Purposes (TEAP) in Japan. In this chapter, we particularly reflect upon the academic achievements of Professor Cyril J Weir, in terms of: • the effectiveness and value of the socio-cognitive framework underpinning the development of the TEAP Speaking Test while gathering empirical evidence of the construct underlying a speaking test for the target context • his contribution to developing early career researchers and extending language testing expertise in the TEAP development team.
    • Assessing English on the global stage : the British Council and English language testing, 1941-2016

      Weir, Cyril J.; O'Sullivan, Barry (Equinox, 2017-07-06)
      This book tells the story of the British Council’s seventy-five-year involvement in the field of English language testing. The first section of the book explores the role of the British Council in spreading British influence around the world through the export of British English language examinations and British expertise in language testing. Founded in 1934, the organisation formally entered the world of English language testing with the signing of an agreement with the University of Cambridge Local Examinations Syndicate (UCLES) in 1941. This agreement, which was to last until 1993, saw the British Council provide substantial English as a Foreign Language (EFL) expertise and technical and financial assistance to help UCLES develop their suite of English language tests. Perhaps the high points of this phase were the British Council-inspired Cambridge Diploma of English Studies introduced in the 1940s and the central role played by the British Council in the conceptualisation and development of the highly innovative English Language Testing Service (ELTS) in the 1970s, the precursor to the present day International English Language Testing System (IELTS). British Council support for the development of indigenous national English language tests around the world over the last thirty years further enhanced the promotion of English and the creation of soft power for Britain. In the early 1990s the focus of the British Council changed from test development to delivery of British examinations through its global network. However, by the early years of the 21st century, the organisation was actively considering a return to test development, a strategy that was realised with the founding of the Assessment Research Group in early 2012. This was followed later that year by the introduction of the Aptis English language testing service, the first major test developed in-house for over thirty years.
      As well as setting the stage for the re-emergence of professional expertise in language testing within the organisation, these initiatives have resulted in a growing strategic influence for the organisation on assessment in English language education. This influence derives from a commitment to test localisation, the development and provision of flexible, accessible and affordable tests and an efficient delivery, marking and reporting system underpinned by an innovative socio-cognitive approach to language testing. This final period can be seen as a clear return by the British Council to using language testing as a tool for enhancing soft power for Britain: a return to the original raison d’être of the organisation.
    • Comparing writing proficiency assessments used in professional medical registration: a methodology to inform policy and practice

      Chan, Sathena Hiu Chong; Taylor, Lynda; University of Bedfordshire (Elsevier, 2020-10-13)
      Internationally trained doctors wishing to register and practise in an English-speaking country typically have to demonstrate that they can communicate effectively in English, including writing proficiency. Various English language proficiency (ELP) tests are available worldwide and are used for such licensing purposes. This means that medical registration bodies face the question of which test(s) will meet their needs, ideally reflecting the demands of their professional environment. This article reports a mixed-methods study to survey the policy and practice of health-care registration organisations in the UK and worldwide. The study aimed to identify ELP tests that were, or could be, considered as suitable for medical registration purposes and to understand the differences between them. The paper discusses what the study revealed about the function and comparability of different writing tests used in professional registration as well as the complex criteria a professional body may prioritise when selecting a test. Although the original study was completed in 2015, the paper takes account of subsequent changes in policy and practice. It offers a practical methodology and worked example which may be of interest and value to other researchers, language testers and policymakers as they face challenges in selecting and making comparisons across tests.
    • A comparison of holistic, analytic, and part marking models in speaking assessment

      Khabbazbashi, Nahal; Galaczi, Evelina D. (SAGE, 2020-01-24)
      This mixed methods study examined holistic, analytic, and part marking models (MMs) in terms of their measurement properties and impact on candidate CEFR classifications in a semi-direct online speaking test. Speaking performances of 240 candidates were first marked holistically and by part (phase 1). On the basis of phase 1 findings – which suggested stronger measurement properties for the part MM – phase 2 focused on a comparison of part and analytic MMs. Speaking performances of 400 candidates were rated analytically and by part during that phase. Raters provided open comments on their marking experiences. Results suggested a significant impact of MM; approximately 30% and 50% of candidates in phases 1 and 2 respectively were awarded different (adjacent) CEFR levels depending on the choice of MM used to assign scores. There was a trend of higher CEFR levels with the holistic MM and lower CEFR levels with the part MM. While strong correlations were found between all pairings of MMs, further analyses revealed important differences. The part MM was shown to display superior measurement qualities particularly in allowing raters to make finer distinctions between different speaking ability levels. These findings have implications for the scoring validity of speaking tests.
    • Contriving authentic interaction: task implementation and engagement in school-based speaking assessment in Hong Kong

      Lam, Daniel M. K.; Yu, Guoxing; Jin, Yan; University of Bedfordshire; University of Bristol; Shanghai Jiaotong University (Palgrave Macmillan, 2016-01-01)
      This chapter examines the validity of the Group Interaction task in a school-based speaking assessment in Hong Kong from the perspectives of task implementation and authenticity of engagement. The new format is intended to offer a more valid assessment than the external examination by eliciting ‘authentic oral language use’ (HKEAA, 2009, p.7) in ‘low-stress conditions’ (p.3), and emphasizes the importance of flexibility and sensitivity to students’ needs in its implementation. Such a policy has then been translated into diverse assessment practices, with considerable variation in the amount of preparation time given to students. The present study draws on three types of data, namely 1) students’ discourse in the assessed interactions, 2) stimulated recall with students and teachers, and 3) a mock assessment, where the group interaction task, the preparation time, and the post-interview were all video-recorded. Results show that while the test discourse exhibits some features that ostensibly suggest authentic interaction, a closer examination of students’ pre-task planning activities reveals the contrived and pre-scripted nature of the interaction. Implications for the assessment of students’ interactional competence and recommendations for task implementation are discussed.
    • Developing tools for learning oriented assessment of interactional competence: bridging theory and practice

      May, Lyn; Nakatsuhara, Fumiyo; Lam, Daniel M. K.; Galaczi, Evelina D. (SAGE Publications, 2019-10-01)
      In this paper we report on a project in which we developed tools to support the classroom assessment of learners’ interactional competence (IC) and provided learning oriented feedback in the context of preparation for a high-stakes face-to-face speaking test.  Six trained examiners provided stimulated verbal reports (n=72) on 12 paired interactions, focusing on interactional features of candidates’ performance. We thematically analyzed the verbal reports to inform a draft checklist and materials, which were then trialled by four experienced teachers. Informed by both data sources, the final product comprised (a) a detailed IC checklist with nine main categories and over 50 sub-categories, accompanying detailed description of each area and feedback to learners, which teachers can adapt to suit their teaching and testing contexts, and (b) a concise IC checklist with four categories and bite-sized feedback for real-time classroom assessment. IC, a key aspect of face-to-face communication, is under-researched and under-explored in second/foreign language teaching, learning, and assessment contexts. This in-depth treatment of it, therefore, stands to contribute to learning contexts through raising teachers’ and learners’ awareness of micro-level features of the construct, and to assessment contexts through developing a more comprehensive understanding of the construct.
    • The English Benchmarking Study in Maltese Schools: Technical Report 2015

      Khabbazbashi, Nahal; Khalifa, Hanan; Robinson, M.; Ellis, S.; Cambridge English Language Assessment (Cambridge English Language Assessment, 2016-04-15)
      This is a report for a project between Cambridge English Language Assessment and the Maltese Ministry for Education and Employment [Nahal Khabbazbashi was principal investigator for project].
    • English language teacher development in a Russian university: context, problems and implications

      Rasskazova, Tatiana; Guzikova, Maria; Green, Anthony; Ural Federal University; University of Bedfordshire (Elsevier, 2017-02-02)
      The evaluation of teacher professional development efficiency has long attracted the attention of education professionals. This paper reports on the results of a two-year English language teacher professional development programme following a Needs Analysis study conducted by Cambridge ESOL in 2012. Longitudinal research shows that English language teaching in Russia has several problems that have persisted for decades. This article focuses on three of them: class interaction mode; the use of the native language (Russian) in class; and the error correction strategies employed by teachers. A new approach to evaluation was employed: students and teachers were asked the same questions, from their different perspectives, about areas identified during the needs analysis study. The results varied in significance. Some positive changes were observed in class interaction mode, but little has changed in error correction strategies; the use of Russian in the classroom seems to be quite reasonable and does not interfere with learning. Overall, the study may be useful for a general audience, especially in post-Soviet countries, as it provides evidence of change management and its impact on ELT. The findings presented in this paper seek to contribute to the formulation or adjustment of policies related to educational reforms, such as curriculum reform and teacher professional development in non-English-speaking countries.
    • Exploring the value of bilingual language assistants with Japanese English as a foreign language learners

      Macaro, Ernesto; Nakatani, Yasuo; Hayashi, Yuko; Khabbazbashi, Nahal; University of Oxford; Hosei University (Routledge, 2012-04-27)
      We report on a small-scale exploratory study of Japanese students’ reactions to the use of a bilingual language assistant on an EFL study-abroad course in the UK and we give an insight into the possible effect of using bilingual assistants on speaking production. First-year university students were divided into three groups, all taught by a monolingual (native) speaker of English. Two teachers had monolingual assistants to help them; the third group had a bilingual (Japanese–English) assistant. In the third group, students were encouraged to ask the assistant for help with English meanings and to provide English equivalents for Japanese phrases, especially during student-centred activities. Moreover, the students in the third group were encouraged to code-switch rather than speak hesitantly or clam up in English. In the first two groups, the students were actively discouraged from using Japanese among themselves in the classroom. The data from an open-ended questionnaire suggest that attitudes to having a bilingual assistant were generally positive. Moreover, the ‘bilingual’ group made the biggest gains over the three-week period in fluency and in overall speaking scores, although these gains were not statistically significant. Suggestions for further research are explored, particularly in relation to whether a bilingual assistant may provide support with the cross-cultural challenges faced by EFL learners.
    • International assessment and local contexts: a case study of an English language initiative in higher education institutes in Egypt

      Khalifa, Hanan; Khabbazbashi, Nahal; Abdelsalam, Samar; Said, Mohsen Elmahdy; Cambridge English Language Assessment; Cairo University (Association for Language Testing and Assessment of Australia and New Zealand, 2015-11-07)
      Within the long-term objectives of English language reform in higher education (HE) institutes across Egypt and increasing employability in the global job market, the Center for Advancement of Postgraduate Studies and Research in Cairo University (CAPSCU), Cambridge English Language Assessment and the British Council (Egypt) have implemented a multi-phase upskilling program aimed at enhancing the workplace language skills of socially disadvantaged undergraduates, developing teachers’ pedagogical knowledge and application, providing both students and teachers with a competitive edge in the job markets through internationally recognised certification and the introduction of 21st century skills such as digital-age literacy and effective communication in HE, and, lastly, integrating international standards for teaching, learning and assessment within the local context. This paper reports on a mixed methods research study aimed at evaluating the effectiveness of this initiative and its impact at the micro and macro levels. The research focused on language progression, learner autonomy, motivation towards digital learning and assessment, improvements in pedagogical knowledge and teaching practices. Standardised assessment, attitudinal and perceptions surveys, and observational data were used. Findings suggested a positive impact of the upskilling program, illustrated how international collaborations can provide the necessary skills for today’s global job market, and highlighted areas for consideration for upscaling the initiative.
    • Opposing tensions of local and international standards for EAP writing programmes: who are we assessing for?

      Bruce, Emma Louise; Hamp-Lyons, Liz; City University of Hong Kong; University of Bedfordshire (Elsevier, 2015-04-24)
      In response to recent curriculum changes in secondary schools in Hong Kong including the implementation of the 3-3-4 education structure, with one year less at high school and one year more at university, and the introduction of a new school leavers' exam, the Hong Kong Diploma of Secondary Education (HKDSE), universities in the territory have revisited their English language curriculums. At City University a new EAP curriculum and assessment framework was developed to fit the re-defined needs of the new cohort of students. In this paper we describe the development and benchmarking process of a scoring instrument for EAP writing assessment at City University. We discuss the opposing tensions of local (HKDSE) and international (CEFR and IELTS) standards, the problems of aligning EAP needs-based domain scales and standards with the CEFR and the issues associated with attempting to fulfil the institutional expectation that the EAP programme would raise students' scores by a whole CEFR scale step. Finally, we consider the political tensions created by the use of external, even international, reference points for specific levels of writing performance from all our students and suggest the benefits of a specific, locally designed, fit-for-purpose tool over one aligned with universal standards.
    • Paper-based vs computer-based writing assessment: divergent, equivalent or complementary?

      Chan, Sathena Hiu Chong (Elsevier, 2018-05-16)
      Writing on a computer is now commonplace in most post-secondary educational contexts and workplaces, making research into computer-based writing assessment essential. This special issue of Assessing Writing includes a range of articles focusing on computer-based writing assessments. Some of these have been designed to parallel an existing paper-based assessment; others have been constructed as computer-based from the beginning. The selection of papers addresses various dimensions of the validity of computer-based writing assessment use in different contexts and across levels of L2 learner proficiency. First, three articles deal with the impact of these two delivery modes, paper-based or computer-based, on test takers’ processing and performance in large-scale high-stakes writing tests; next, two articles explore the use of online writing assessment in higher education; the final two articles evaluate the use of technologies to provide feedback to support learning.
    • Preparing for admissions tests in English

      Yu, Guoxing; Green, Anthony; University of Bristol; University of Bedfordshire (Taylor & Francis, 2021-05-06)
      Test preparation for admissions to education programmes has always been a contentious issue (Anastasi, 1981; Crocker, 2003; Messick, 1982; Powers, 2012). For Crocker (2006), ‘No activity in educational assessment raises more instructional, ethical, and validity issues than preparation for large-scale, high-stakes tests.’ (p. 115). Debate has often centred around the effectiveness of preparation and how it affects the validity of test score interpretations; equity and fairness of access to opportunity; and impacts on learning and teaching (Yu et al., 2017). A focus has often been preparation for tests originally designed for domestic students, for example, SATs (e.g., Alderman & Powers, 1980; Appelrouth et al., 2017; Montgomery & Lilly, 2012; Powers, 1993; Powers & Rock, 1999; Sesnowitz et al., 1982) and state-wide tests (e.g., Firestone et al., 2004; Jäger et al., 2012), but the increasing internationalisation of higher education has added a new dimension. To enrol in higher education programmes which use English as the medium of instruction, increasing numbers of international students whose first language is not English are now taking English language tests, or academic specialist tests administered in English, or both. The papers in this special issue concern how students prepare for these tests and the roles in this process of the tests themselves and of the organisations that provide them.
    • Researching L2 writers’ use of metadiscourse markers at intermediate and advanced levels

      Bax, Stephen; Nakatsuhara, Fumiyo; Waller, Daniel; University of Bedfordshire; University of Central Lancashire (Elsevier, 2019-02-20)
      Metadiscourse markers refer to aspects of text organisation or indicate a writer’s stance towards the text’s content or towards the reader (Hyland, 2004:109). The CEFR (Council of Europe, 2001) indicates that one of the key areas of development anticipated between levels B2 and C1 is an increasing variety of discourse markers and growing acknowledgement of the intended audience by learners. This study represents the first large-scale project of the metadiscourse of general second language learner writing, through the analysis of 281 metadiscourse markers in 13 categories, from 900 exam scripts at CEFR B2-C2 levels. The study employed the online text analysis tool Text Inspector (Bax, 2012), in conjunction with human analysts. The findings revealed that higher level writers used fewer metadiscourse markers than lower level writers, but used a significantly wider range of 8 of the 13 classes of markers. The study also demonstrated the crucial importance of analysing not only the behaviour of whole classes of metadiscourse items but also the individual items themselves. The findings are of potential interest to those involved in the development of assessment scales at different levels of the CEFR, or to teachers interested in aiding the development of learners. 
    • Restoring perspective on the IELTS test

      Green, Anthony (Oxford University Press, 2019-03-18)
      This article presents a response to William Pearson’s article, ‘Critical Perspectives on the IELTS Test’. It addresses his critique of the role of IELTS as a test for regulating international mobility and access to English medium education and evaluates his more specific prescriptions for the improvements to the quality of the test itself.
    • The role of the L1 in testing L2 English

      Nakatsuhara, Fumiyo; Taylor, Lynda; Jaiyote, Suwimol (Cambridge University Press, 2018-11-28)
      This chapter compares and contrasts two research studies that addressed the role of L1 in the assessment of L2 spoken English. The first research is a small-scale, mixed-methods study which explored the impact of test-takers’ L1 backgrounds in the paired speaking task of a standardised test of general English provided by an international examination board (Nakatsuhara and Jaiyote, 2015). The key question in the research was how we can ensure fairness to test-takers who perform paired tests in shared and non-shared L1 pairs. The second research is a large-scale, a priori test validation study conducted as a part of the development of a new EAP (English for academic purposes) test offered by a national examination board, targeting only single L1 users (Nakatsuhara, 2014). Of particular interest is the way in which its pronunciation rating scale was developed and validated in the single L1 context. In light of these examples of research into international and locally-developed tests, this chapter aims to demonstrate the importance of the construct of a test and its score usage when reconsidering a) whether specific English varieties are considered to be construct-relevant or construct-irrelevant and b) what Englishes (rather than ‘standard’ English) should be elicited and assessed. Nakatsuhara, F. (2014). A Research Report on the Development of the Test of English for Academic Purposes (TEAP) Speaking Test for Japanese University Entrants – Study 1 & Study 2, available on line at: www.eiken.or.jp/teap/group/pdf/teap_speaking_report1.pdf Nakatsuhara, F. and Jaiyote, S. (2015). Exploring the impact of test-takers’ L1 backgrounds on paired speaking test performance: how do they perform in shared and non-shared L1 pairs? BAAL / Cambridge University Press Applied Linguistics Seminar, York St John University, UK (24-26/06/2015).
    • Scoring validity of the Aptis speaking test : investigating fluency across tasks and levels of proficiency

      Tavakoli, Parveneh; Nakatsuhara, Fumiyo; Hunter, Ann-Marie (British Council, 2017-11-16)
      Second language oral fluency has long been considered an important construct in communicative language ability (e.g. de Jong et al, 2012) and many speaking tests are designed to measure fluency aspect(s) of candidates’ language (e.g. IELTS, TOEFL iBT, PTE Academic). Current research in second language acquisition suggests that a number of measures of speed, breakdown and repair fluency can reliably assess fluency and predict proficiency. However, there is little research evidence to indicate which measures best characterise fluency at each level of proficiency, and which can consistently distinguish one proficiency level from the next. This study is an attempt to help answer these questions. This study investigated fluency constructs across four different levels of proficiency (A2–C1) and four different semi-direct speaking test tasks performed by 32 candidates taking the Aptis Speaking test. Using PRAAT (Boersma & Weenink, 2013), we analysed 120 task performances on different aspects of utterance fluency including speed, breakdown and repair measures across different tasks and levels of proficiency. The results suggest that speed measures consistently distinguish fluency across different levels of proficiency, and many of the breakdown measures differentiate between lower (A2, B1) and higher levels (B2, C1). The varied use of repair measures at different proficiency levels and tasks suggests that a more complex process is at play. The non-significant differences between most of the fluency measures in the four tasks suggest that fluency is not affected by task type in the Aptis Speaking test. The implications of the findings are discussed in relation to the Aptis Speaking test fluency rating scales and rater training materials.
    • Topic and background knowledge effects on performance in speaking assessment

      Khabbazbashi, Nahal (Sage, 2015-08-10)
      This study explores the extent to which topic and background knowledge of topic affect spoken performance in a high-stakes speaking test. It is argued that evidence of a substantial influence may introduce construct-irrelevant variance and undermine test fairness. Data were collected from 81 non-native speakers of English who performed on 10 topics across three task types. Background knowledge and general language proficiency were measured using self-report questionnaires and C-tests respectively. Score data were analysed using many-facet Rasch measurement and multiple regression. Findings showed that for two of the three task types, the topics used in the study generally exhibited difficulty measures which were statistically distinct. However, the size of the differences in topic difficulties was too small to have a large practical effect on scores. Participants’ different levels of background knowledge were shown to have a systematic effect on performance. However, these statistically significant differences also failed to translate into practical significance. Findings hold implications for speaking performance assessment.
    • Towards new avenues for the IELTS Speaking Test: insights from examiners’ voices

      Inoue, Chihiro; Khabbazbashi, Nahal; Lam, Daniel M. K.; Nakatsuhara, Fumiyo (IELTS Partners, 2021-02-19)
      This study investigated the examiners’ views on all aspects of the IELTS Speaking Test, namely, the test tasks, topics, format, interlocutor frame, examiner guidelines, test administration, rating, training and standardisation, and test use. The overall trends of the examiners’ views of these aspects of the test were captured by a large-scale online questionnaire, to which a total of 1203 examiners responded. Based on the questionnaire responses, 36 examiners were carefully selected for subsequent interviews to explore the reasons behind their views in depth. The 36 examiners were representative of a number of differing geographical regions and a range of views and experiences in examining and giving examiner training. While the questionnaire responses exhibited generally positive views from examiners on the current IELTS Speaking Test, the interview responses uncovered various issues that the examiners experienced and suggested potentially beneficial modifications. Many of the issues (e.g. potentially unsuitable topics, rigidity of interlocutor frames) were attributable to the huge candidature of the IELTS Speaking Test, which has vastly expanded since the test’s last revision in 2001, perhaps beyond the initial expectations of the IELTS Partners. This study synthesized the voices from examiners and insights from relevant literature, and incorporated guidelines checks we submitted to the IELTS Partners. This report concludes with a number of suggestions for potential changes in the current IELTS Speaking Test, so as to enhance its validity and accessibility in today’s ever globalising world.