• An application of AUA to examining the potential washback of a new test of English for university entrance

      Nakamura, Keita; Green, Anthony; Eiken Foundation of Japan; University of Bedfordshire (2013-11-17)
    • Applying the socio-cognitive framework: gathering validity evidence during the development of a speaking test

      Nakatsuhara, Fumiyo; Dunlea, Jamie; University of Bedfordshire; British Council (UCLES/Cambridge University Press, 2020-06-18)
      This chapter describes how Weir’s (2005; further elaborated in Taylor (Ed) 2011) socio-cognitive framework for validating speaking tests guided two a priori validation studies of the speaking component of the Test of English for Academic Purposes (TEAP) in Japan. In this chapter, we particularly reflect upon the academic achievements of Professor Cyril J Weir, in terms of:
      • the effectiveness and value of the socio-cognitive framework underpinning the development of the TEAP Speaking Test while gathering empirical evidence of the construct underlying a speaking test for the target context
      • his contribution to developing early career researchers and extending language testing expertise in the TEAP development team.
    • Assessing English on the global stage : the British Council and English language testing, 1941-2016

      Weir, Cyril J.; O'Sullivan, Barry (Equinox, 2017-08-05)
      This book tells the story of the British Council’s seventy-five-year involvement in the field of English language testing. The first section of the book explores the role of the British Council in spreading British influence around the world through the export of British English language examinations and British expertise in language testing. Founded in 1934, the organisation formally entered the world of English language testing with the signing of an agreement with the University of Cambridge Local Examinations Syndicate (UCLES) in 1941. This agreement, which was to last until 1993, saw the British Council provide substantial English as a Foreign Language (EFL) expertise and technical and financial assistance to help UCLES develop their suite of English language tests. Perhaps the high points of this phase were the British Council inspired Cambridge Diploma of English Studies introduced in the 1940s and the central role played by the British Council in the conceptualisation and development of the highly innovative English Language Testing Service (ELTS) in the 1970s, the precursor to the present-day International English Language Testing System (IELTS). British Council support for the development of indigenous national English language tests around the world over the last thirty years further enhanced the promotion of English and the creation of soft power for Britain. In the early 1990s the focus of the British Council changed from test development to delivery of British examinations through its global network. However, by the early years of the 21st century, the organisation was actively considering a return to test development, a strategy that was realised with the founding of the Assessment Research Group in early 2012. This was followed later that year by the introduction of the Aptis English language testing service, the first major test developed in-house for over thirty years.
As well as setting the stage for the re-emergence of professional expertise in language testing within the organisation, these initiatives have resulted in a growing strategic influence for the organisation on assessment in English language education. This influence derives from a commitment to test localisation, the development and provision of flexible, accessible and affordable tests and an efficient delivery, marking and reporting system underpinned by an innovative socio-cognitive approach to language testing. This final period can be seen as a clear return by the British Council to using language testing as a tool for enhancing soft power for Britain: a return to the original raison d’être of the organisation.
    • A comparison of holistic, analytic, and part marking models in speaking assessment

      Khabbazbashi, Nahal; Galaczi, Evelina D. (SAGE, 2020-01-24)
      This mixed methods study examined holistic, analytic, and part marking models (MMs) in terms of their measurement properties and impact on candidate CEFR classifications in a semi-direct online speaking test. Speaking performances of 240 candidates were first marked holistically and by part (phase 1). On the basis of phase 1 findings – which suggested stronger measurement properties for the part MM – phase 2 focused on a comparison of part and analytic MMs. Speaking performances of 400 candidates were rated analytically and by part during that phase. Raters provided open comments on their marking experiences. Results suggested a significant impact of MM; approximately 30% and 50% of candidates in phases 1 and 2 respectively were awarded different (adjacent) CEFR levels depending on the choice of MM used to assign scores. There was a trend of higher CEFR levels with the holistic MM and lower CEFR levels with the part MM. While strong correlations were found between all pairings of MMs, further analyses revealed important differences. The part MM was shown to display superior measurement qualities particularly in allowing raters to make finer distinctions between different speaking ability levels. These findings have implications for the scoring validity of speaking tests.
    • Contriving authentic interaction: task implementation and engagement in school-based speaking assessment in Hong Kong

      Lam, Daniel M. K.; Yu, Guoxing; Jin, Yan; University of Bedfordshire; University of Bristol; Shanghai Jiaotong University (Palgrave Macmillan, 2015-11-01)
      This chapter examines the validity of the Group Interaction task in a school-based speaking assessment in Hong Kong from the perspectives of task implementation and authenticity of engagement. The new format is intended to offer a more valid assessment than the external examination by eliciting ‘authentic oral language use’ (HKEAA, 2009, p.7) in ‘low-stress conditions’ (p.3), and emphasizes the importance of flexibility and sensitivity to students’ needs in its implementation. Such a policy has then been translated into diverse assessment practices, with considerable variation in the amount of preparation time given to students. The present study draws on three types of data, namely 1) students’ discourse in the assessed interactions, 2) stimulated recall with students and teachers, and 3) a mock assessment, where the group interaction task, the preparation time, and the post-interview were all video-recorded. Results show that while the test discourse exhibits some features that ostensibly suggest authentic interaction, a closer examination of students’ pre-task planning activities reveals the contrived and pre-scripted nature of the interaction. Implications for the assessment of students’ interactional competence and recommendations for task implementation are discussed.
    • Developing tools for learning oriented assessment of interactional competence: bridging theory and practice

      May, Lyn; Nakatsuhara, Fumiyo; Lam, Daniel M. K.; Galaczi, Evelina D. (SAGE Publications, 2019-10-01)
      In this paper we report on a project in which we developed tools to support the classroom assessment of learners’ interactional competence (IC) and provided learning oriented feedback in the context of preparation for a high-stakes face-to-face speaking test.  Six trained examiners provided stimulated verbal reports (n=72) on 12 paired interactions, focusing on interactional features of candidates’ performance. We thematically analyzed the verbal reports to inform a draft checklist and materials, which were then trialled by four experienced teachers. Informed by both data sources, the final product comprised (a) a detailed IC checklist with nine main categories and over 50 sub-categories, accompanying detailed description of each area and feedback to learners, which teachers can adapt to suit their teaching and testing contexts, and (b) a concise IC checklist with four categories and bite-sized feedback for real-time classroom assessment. IC, a key aspect of face-to-face communication, is under-researched and under-explored in second/foreign language teaching, learning, and assessment contexts. This in-depth treatment of it, therefore, stands to contribute to learning contexts through raising teachers’ and learners’ awareness of micro-level features of the construct, and to assessment contexts through developing a more comprehensive understanding of the construct.
    • The English Benchmarking Study in Maltese Schools: Technical Report 2015

      Khabbazbashi, Nahal; Khalifa, Hanan; Robinson, M.; Ellis, S.; Cambridge English Language Assessment (Cambridge English Language Assessment, 2016-04-15)
      This is a report for a project between Cambridge English Language Assessment and the Maltese Ministry for Education and Employment [Nahal Khabbazbashi was principal investigator for project].
    • English language teacher development in a Russian university: context, problems and implications

      Rasskazova, Tatiana; Guzikova, Maria; Green, Anthony; Ural Federal University; University of Bedfordshire (Elsevier, 2017-02-02)
      The evaluation of the effectiveness of teacher professional development has long attracted the attention of professionals in education. This paper reports on the results of a two-year English language teacher professional development programme that followed a Needs Analysis study conducted by Cambridge ESOL in 2012. Longitudinal research shows that English language teaching in Russia has faced several problems that have persisted for decades. This article focuses on three of them: the mode of class interaction; the use of the native language (Russian) in class; and the error correction strategies employed by teachers. A new approach to evaluation was employed by asking students and teachers the same questions from different perspectives on areas identified during the needs analysis study. The results varied in significance: some positive changes were noticed in the mode of class interaction; little has changed in error correction strategies; and the use of Russian in the classroom appears reasonable and does not interfere with learning. Overall, the study may be useful for a general audience, especially in post-Soviet countries, as it provides evidence of change management and its impact on ELT. The findings presented in this paper seek to contribute to the formulation or adjustment of policies related to educational reforms, such as curriculum reform and teacher professional development, in non-English-speaking countries.
    • Exploring the value of bilingual language assistants with Japanese English as a foreign language learners

      Macaro, Ernesto; Nakatani, Yasuo; Hayashi, Yuko; Khabbazbashi, Nahal; University of Oxford; Hosei University (Routledge, 2012-04-27)
      We report on a small-scale exploratory study of Japanese students’ reactions to the use of a bilingual language assistant on an EFL study-abroad course in the UK and we give an insight into the possible effect of using bilingual assistants on speaking production. First-year university students were divided into three groups all taught by a monolingual (native) speaker of English. Two teachers had monolingual assistants to help them; the third group had a bilingual (Japanese–English) assistant. In the third group, students were encouraged to ask the assistant for help with English meanings and to provide English equivalents for Japanese phrases, especially during student-centred activities. Moreover, the students in the third group were encouraged to code-switch rather than speak hesitantly or clam up in English. In the first two groups, the students were actively discouraged from using Japanese among themselves in the classroom. The data from an open-ended questionnaire suggest that attitudes to having a bilingual assistant were generally positive. Moreover, the ‘bilingual’ group made the biggest gains over the three-week period in fluency and in overall speaking scores, although these gains were not statistically significant. Suggestions for further research are explored, particularly in relation to whether a bilingual assistant may provide support with the cross-cultural challenges faced by EFL learners.
    • International assessment and local contexts: a case study of an English language initiative in higher education institutes in Egypt

      Khalifa, Hanan; Khabbazbashi, Nahal; Abdelsalam, Samar; Said, Mohsen Elmahdy; Cambridge English Language Assessment; Cairo University (Association for Language Testing and Assessment of Australia and New Zealand, 2015-11-07)
      As part of the long-term objectives of English language reform in higher education (HE) institutes across Egypt and of increasing employability in the global job market, the Center for Advancement of Postgraduate Studies and Research in Cairo University (CAPSCU), Cambridge English Language Assessment and the British Council (Egypt) have implemented a multi-phase upskilling program aimed at enhancing the workplace language skills of socially disadvantaged undergraduates, developing teachers’ pedagogical knowledge and application, providing both students and teachers with a competitive edge in the job markets through internationally recognised certification and the introduction of 21st century skills such as digital-age literacy and effective communication in HE, and, lastly, integrating international standards for teaching, learning and assessment within the local context. This paper reports on a mixed methods research study aimed at evaluating the effectiveness of this initiative and its impact at the micro and macro levels. The research focused on language progression, learner autonomy, motivation towards digital learning and assessment, and improvements in pedagogical knowledge and teaching practices. Standardised assessment, attitudinal and perceptions surveys, and observational data were used. Findings suggested a positive impact of the upskilling program, illustrated how international collaborations can provide the necessary skills for today’s global job market, and highlighted areas for consideration for upscaling the initiative.
    • Opposing tensions of local and international standards for EAP writing programmes: who are we assessing for?

      Bruce, Emma Louise; Hamp-Lyons, Liz; City University of Hong Kong; University of Bedfordshire (Elsevier, 2015-04-24)
      In response to recent curriculum changes in secondary schools in Hong Kong, including the implementation of the 3+3+4 education structure, with one year less at high school and one year more at university, and the introduction of a new school leavers' exam, the Hong Kong Diploma of Secondary Education (HKDSE), universities in the territory have revisited their English language curricula. At City University a new EAP curriculum and assessment framework was developed to fit the re-defined needs of the new cohort of students. In this paper we describe the development and benchmarking process of a scoring instrument for EAP writing assessment at City University. We discuss the opposing tensions of local (HKDSE) and international (CEFR and IELTS) standards, the problems of aligning EAP needs-based domain scales and standards with the CEFR and the issues associated with attempting to fulfil the institutional expectation that the EAP programme would raise students' scores by a whole CEFR scale step. Finally, we consider the political tensions created by the use of external, even international, reference points for specific levels of writing performance from all our students and suggest the benefits of a specific, locally-designed, fit-for-purpose tool over one aligned with universal standards.
    • Paper-based vs computer-based writing assessment: divergent, equivalent or complementary?

      Chan, Sathena Hiu Chong (Elsevier, 2018-05-16)
      Writing on a computer is now commonplace in most post-secondary educational contexts and workplaces, making research into computer-based writing assessment essential. This special issue of Assessing Writing includes a range of articles focusing on computer-based writing assessments. Some of these have been designed to parallel an existing paper-based assessment, others have been constructed as computer-based from the beginning. The selection of papers addresses various dimensions of the validity of computer-based writing assessment use in different contexts and across levels of L2 learner proficiency. First, three articles deal with the impact of these two delivery modes, paper-based or computer-based, on test takers’ processing and performance in large-scale high-stakes writing tests; next, two articles explore the use of online writing assessment in higher education; the final two articles evaluate the use of technologies to provide feedback to support learning.
    • Researching L2 writers’ use of metadiscourse markers at intermediate and advanced levels

      Bax, Stephen; Nakatsuhara, Fumiyo; Waller, Daniel; University of Bedfordshire; University of Central Lancashire (Elsevier, 2019-02-20)
      Metadiscourse markers refer to aspects of text organisation or indicate a writer’s stance towards the text’s content or towards the reader (Hyland, 2004:109). The CEFR (Council of Europe, 2001) indicates that one of the key areas of development anticipated between levels B2 and C1 is an increasing variety of discourse markers and growing acknowledgement of the intended audience by learners. This study represents the first large-scale investigation of metadiscourse in general second language learner writing, through the analysis of 281 metadiscourse markers in 13 categories, from 900 exam scripts at CEFR B2-C2 levels. The study employed the online text analysis tool Text Inspector (Bax, 2012), in conjunction with human analysts. The findings revealed that higher level writers used fewer metadiscourse markers than lower level writers, but used a significantly wider range of 8 of the 13 classes of markers. The study also demonstrated the crucial importance of analysing not only the behaviour of whole classes of metadiscourse items but also the individual items themselves. The findings are of potential interest to those involved in the development of assessment scales at different levels of the CEFR, or to teachers interested in aiding the development of learners.
    • Restoring perspective on the IELTS test

      Green, Anthony (Oxford University Press, 2019-03-18)
      This article presents a response to William Pearson’s article, ‘Critical Perspectives on the IELTS Test’. It addresses his critique of the role of IELTS as a test for regulating international mobility and access to English medium education and evaluates his more specific prescriptions for the improvements to the quality of the test itself.
    • The role of the L1 in testing L2 English

      Nakatsuhara, Fumiyo; Taylor, Lynda; Jaiyote, Suwimol (Cambridge University Press, 2018-11-28)
      This chapter compares and contrasts two research studies that addressed the role of the L1 in the assessment of L2 spoken English. The first is a small-scale, mixed-methods study which explored the impact of test-takers’ L1 backgrounds in the paired speaking task of a standardised test of general English provided by an international examination board (Nakatsuhara and Jaiyote, 2015). The key question in that research was how to ensure fairness to test-takers who perform paired tests in shared and non-shared L1 pairs. The second is a large-scale, a priori test validation study conducted as part of the development of a new EAP (English for academic purposes) test offered by a national examination board, targeting only single L1 users (Nakatsuhara, 2014). Of particular interest is the way in which its pronunciation rating scale was developed and validated in the single L1 context. In light of these examples of research into international and locally-developed tests, this chapter aims to demonstrate the importance of the construct of a test and its score usage when reconsidering a) whether specific English varieties are considered to be construct-relevant or construct-irrelevant and b) what Englishes (rather than ‘standard’ English) should be elicited and assessed. Nakatsuhara, F. (2014). A Research Report on the Development of the Test of English for Academic Purposes (TEAP) Speaking Test for Japanese University Entrants – Study 1 & Study 2, available online at: www.eiken.or.jp/teap/group/pdf/teap_speaking_report1.pdf Nakatsuhara, F. and Jaiyote, S. (2015). Exploring the impact of test-takers’ L1 backgrounds on paired speaking test performance: how do they perform in shared and non-shared L1 pairs? BAAL / Cambridge University Press Applied Linguistics Seminar, York St John University, UK (24-26/06/2015).
    • Scoring validity of the Aptis speaking test : investigating fluency across tasks and levels of proficiency

      Tavakoli, Parvaneh; Nakatsuhara, Fumiyo; Hunter, Ann-Marie (British Council, 2017-11-16)
      Second language oral fluency has long been considered an important construct in communicative language ability (e.g. de Jong et al, 2012) and many speaking tests are designed to measure fluency aspect(s) of candidates’ language (e.g. IELTS, TOEFL iBT, PTE Academic). Current research in second language acquisition suggests that a number of measures of speed, breakdown and repair fluency can reliably assess fluency and predict proficiency. However, there is little research evidence to indicate which measures best characterise fluency at each level of proficiency, and which can consistently distinguish one proficiency level from the next. This study is an attempt to help answer these questions. It investigated fluency constructs across four different levels of proficiency (A2–C1) and four different semi-direct speaking test tasks performed by 32 candidates taking the Aptis Speaking test. Using PRAAT (Boersma & Weenik, 2013), we analysed 120 task performances on different aspects of utterance fluency, including speed, breakdown and repair measures, across different tasks and levels of proficiency. The results suggest that speed measures consistently distinguish fluency across different levels of proficiency, and many of the breakdown measures differentiate between lower (A2, B1) and higher (B2, C1) levels. The varied use of repair measures at different proficiency levels and tasks suggests that a more complex process is at play. The non-significant differences between most of the fluency measures in the four tasks suggest that fluency is not affected by task type in the Aptis Speaking test. The implications of the findings are discussed in relation to the Aptis Speaking test fluency rating scales and rater training materials.
    • Topic and background knowledge effects on performance in speaking assessment

      Khabbazbashi, Nahal (Sage, 2015-08-10)
      This study explores the extent to which topic and topic-related background knowledge affect spoken performance in a high-stakes speaking test. It is argued that evidence of a substantial influence may introduce construct-irrelevant variance and undermine test fairness. Data were collected from 81 non-native speakers of English who performed on 10 topics across three task types. Background knowledge and general language proficiency were measured using self-report questionnaires and C-tests respectively. Score data were analysed using many-facet Rasch measurement and multiple regression. Findings showed that for two of the three task types, the topics used in the study generally exhibited difficulty measures which were statistically distinct. However, the size of the differences in topic difficulties was too small to have a large practical effect on scores. Participants’ different levels of background knowledge were shown to have a systematic effect on performance. However, these statistically significant differences also failed to translate into practical significance. Findings hold implications for speaking performance assessment.