• Aspects of fluency across assessed levels of speaking proficiency

      Tavakoli, Parvaneh; Nakatsuhara, Fumiyo; Hunter, Ann-Marie (Wiley, 2020-01-25)
      Recent research in second language acquisition suggests that a number of speed, breakdown, repair and composite measures reliably assess fluency and predict proficiency. However, there is little research evidence to indicate which measures best characterize fluency at each assessed level of proficiency, and which can consistently distinguish one level from the next. This study investigated fluency in 32 speakers’ performances on four tasks of the British Council’s Aptis Speaking test, which were awarded four different levels of proficiency (CEFR A2-C1). Using PRAAT, the performances were analysed for various aspects of utterance fluency across different levels of proficiency. The results suggest that speed and composite measures consistently distinguish fluency from the lowest to upper-intermediate levels (A2-B2), and many breakdown measures differentiate between the lowest level (A2) and the rest of the proficiency groups, with a few differentiating between lower (A2, B1) and higher levels (B2, C1). The varied use of repair measures at different levels suggests that a more complex process is at play. The findings imply that a detailed micro-analysis of fluency offers a more reliable understanding of the construct and its relationship with assessment of proficiency.
    • Defining integrated reading-into-writing constructs: evidence at the B2-C1 interface

      Chan, Sathena Hiu Chong (Cambridge University Press, 2018-06-01)
    • Developing tools for learning oriented assessment of interactional competence: bridging theory and practice

      May, Lyn; Nakatsuhara, Fumiyo; Lam, Daniel M. K.; Galaczi, Evelina D. (SAGE Publications, 2019-10-01)
      In this paper we report on a project in which we developed tools to support the classroom assessment of learners’ interactional competence (IC) and provided learning oriented feedback in the context of preparation for a high-stakes face-to-face speaking test. Six trained examiners provided stimulated verbal reports (n=72) on 12 paired interactions, focusing on interactional features of candidates’ performance. We thematically analyzed the verbal reports to inform a draft checklist and materials, which were then trialled by four experienced teachers. Informed by both data sources, the final product comprised (a) a detailed IC checklist with nine main categories and over 50 sub-categories, with an accompanying detailed description of each area and feedback to learners, which teachers can adapt to suit their teaching and testing contexts, and (b) a concise IC checklist with four categories and bite-sized feedback for real-time classroom assessment. IC, a key aspect of face-to-face communication, is under-researched and under-explored in second/foreign language teaching, learning, and assessment contexts. This in-depth treatment of it, therefore, stands to contribute to learning contexts through raising teachers’ and learners’ awareness of micro-level features of the construct, and to assessment contexts through developing a more comprehensive understanding of the construct.
    • The effects of single and double play upon listening test outcomes and cognitive processing

      Field, John; British Council (British Council, 2015-01-01)
      A report on a project investigating the effects of playing recorded material twice on test-takers’ scores and on their behaviour.
    • The English Benchmarking Study in Maltese Schools: Technical Report 2015

      Khabbazbashi, Nahal; Khalifa, Hanan; Robinson, M.; Ellis, S.; Cambridge English Language Assessment (Cambridge English Language Assessment, 2016-04-15)
      This is a report for a project between Cambridge English Language Assessment and the Maltese Ministry for Education and Employment [Nahal Khabbazbashi was principal investigator for the project].
    • English language teacher development in a Russian university: context, problems and implications

      Rasskazova, Tatiana; Guzikova, Maria; Green, Anthony; Ural Federal University; University of Bedfordshire (Elsevier, 2017-02-02)
      The evaluation of the efficiency of teacher professional development has long attracted the attention of professionals in education. This paper reports on the results of a two-year English language teacher professional development programme following a Needs Analysis study conducted by Cambridge ESOL in 2012. Longitudinal research shows that English language teaching in Russia has several problems that have persisted for decades. This article focuses on some of them: class interaction mode; the use of the native (Russian) language in class; and the error correction strategies employed by teachers. A new approach to evaluation was employed by asking students and teachers the same questions from different perspectives on areas identified during the needs analysis study. The results varied in significance: some positive changes were observed in class interaction mode; little has changed in error correction strategies; and the use of Russian in the classroom appears reasonable and does not interfere with learning. Overall, the study may be useful for a general audience, especially in post-Soviet countries, as it provides evidence of change management and its impact on ELT. The findings presented in this paper seek to contribute to the formulation or adjustment of policies related to educational reforms, such as curriculum reform and teacher professional development, in non-English-speaking countries.
    • Exploring the value of bilingual language assistants with Japanese English as a foreign language learners

      Macaro, Ernesto; Nakatani, Yasuo; Hayashi, Yuko; Khabbazbashi, Nahal; University of Oxford; Hosei University (Routledge, 2012-04-27)
      We report on a small-scale exploratory study of Japanese students’ reactions to the use of a bilingual language assistant on an EFL study-abroad course in the UK and we give an insight into the possible effect of using bilingual assistants on speaking production. First-year university students were divided into three groups, all taught by a monolingual (native) speaker of English. Two teachers had monolingual assistants to help them; the third group had a bilingual (Japanese–English) assistant. In the third group, students were encouraged to ask the assistant for help with English meanings and to provide English equivalents for Japanese phrases, especially during student-centred activities. Moreover, the students in the third group were encouraged to code-switch rather than speak hesitantly or clam up in English. In the first two groups, the students were actively discouraged from using Japanese among themselves in the classroom. The data from an open-ended questionnaire suggest that attitudes to having a bilingual assistant were generally positive. Moreover, the ‘bilingual’ group made the biggest gains over the three-week period in fluency and in overall speaking scores, although these gains were not statistically significant. Suggestions for further research are explored, particularly in relation to whether a bilingual assistant may provide support with the cross-cultural challenges faced by EFL learners.
    • IELTS washback in context: preparation for academic writing in higher education

      Green, Anthony (Cambridge University Press, 2007-12-01)
      The International English Language Testing System (IELTS) plays a key role in international student access to universities around the world. Although IELTS includes a direct test of writing, it has been suggested that test preparation may hinder international students from acquiring academic literacy skills required for university study. This study investigates the washback of the IELTS Writing test on English for Academic Purposes (EAP) provision.
    • Interactional competence with and without extended planning time in a group oral assessment

      Lam, Daniel M. K. (Routledge, Taylor & Francis Group, 2019-05-02)
      Linking one’s contribution to those of others is a salient feature demonstrating interactional competence in paired/group speaking assessments. Although such responses must be constructed spontaneously during real-time interaction, the amount and nature of pre-task preparation in paired/group speaking assessments may influence how such an ability (or lack thereof) manifests in learners’ interactional performance. Little previous research has examined the effect of planning time on interactional aspects of paired/group speaking task performance. Within the context of school-based assessment in Hong Kong, this paper analyzes the discourse of two group interactions performed by the same four student-candidates under two conditions: (a) with extended planning time (4–5 hours), and (b) without extended planning time (10 minutes), with the aim of exploring any differences in student-candidates’ performance of interactional competence in this assessment task. The analysis provides qualitative discourse evidence that extended planning time may impede the assessment task’s capacity to discriminate between stronger and weaker candidates’ ability to spontaneously produce responses contingent on previous speaker contribution. Implications for the implementation of preparation time for the group interaction task are discussed.
    • International assessment and local contexts: a case study of an English language initiative in higher education institutes in Egypt

      Khalifa, Hanan; Khabbazbashi, Nahal; Abdelsalam, Samar; Said, Mohsen Elmahdy; Cambridge English Language Assessment; Cairo University (Association for Language Testing and Assessment of Australia and New Zealand, 2015-11-07)
      Within the long-term objectives of English language reform in higher education (HE) institutes across Egypt and increasing employability in the global job market, the Center for Advancement of Postgraduate Studies and Research in Cairo University (CAPSCU), Cambridge English Language Assessment and the British Council (Egypt) have implemented a multi-phase upskilling program aimed at enhancing the workplace language skills of socially disadvantaged undergraduates, developing teachers’ pedagogical knowledge and application, providing both students and teachers with a competitive edge in the job markets through internationally recognised certification and the introduction of 21st century skills such as digital-age literacy and effective communication in HE, and, lastly, integrating international standards for teaching, learning and assessment within the local context. This paper reports on a mixed methods research study aimed at evaluating the effectiveness of this initiative and its impact at the micro and macro levels. The research focused on language progression, learner autonomy, motivation towards digital learning and assessment, improvements in pedagogical knowledge and teaching practices. Standardised assessment, attitudinal and perceptions surveys, and observational data were used. Findings suggested a positive impact of the upskilling program, illustrated how international collaborations can provide the necessary skills for today’s global job market, and highlighted areas for consideration for upscaling the initiative.
    • Investigating the cognitive constructs measured by the Aptis writing test in the Japanese context: a case study

      Moore, Yumiko; Chan, Sathena Hiu Chong; British Council (British Council, 2018-11-30)
      This study investigates the context and cognitive validity of the Aptis General Writing Part 4 Tasks. An online survey of almost 50 Japanese universities was conducted to investigate the nature of the predominant academic writing in the wider context. Twenty-five Year 1 academic writing tasks were then sampled from a single Japanese university. Regarding the context validity of the Aptis test, the online survey and expert judgement were used to examine the degree of correspondence between the task features of the Aptis task and those of the target academic writing tasks in real life. Regarding its cognitive validity, this study examined the cognitive processes elicited by the Aptis task as compared to the Year 1 writing tasks through a cognitive process questionnaire (n=35) and interviews with seven students and two lecturers. The overall resemblance between the test and the real-life tasks reported in this study supports the context and cognitive validity of the Aptis Writing test Part 4 in the Japanese context. The overall task setting (topic domain, cognitive demands and language function to be performed) of the Aptis test resembles that of the real-life tasks. Aptis Writing test Part 4 tasks, on the other hand, surpassed the sampled real-life tasks in terms of clarity of writing purpose, knowledge of criteria and intended readership. However, when considering the wider Japanese academic context, a wider range of academic genres, such as summary and report, and some more demanding language functions, such as synthesis, should also be represented in the Aptis Writing test. The results show that all target processes in each cognitive phase (conceptualisation, meaning and discourse construction, organising, low-level monitoring and revising, and high-level monitoring and revising) were reported by a reasonable percentage of the participants.
Considering the comparatively lower proficiency in English of Japanese students and their unfamiliarity with direct writing assessment, the results are encouraging. However, some sub-processes, such as linking important ideas and revising, appear to be under-represented in Aptis. In addition, the lack of time management and typing skills among some participants appears to hinder them from spending appropriate time planning, organising, and revising at low and high levels. Recommendations are provided to address these issues.
    • Language learning gains among users of English Liulishuo

      Green, Anthony; O'Sullivan, Barry; LAIX (LAIX, 2019-02-26)
      This study investigated improvements in English language ability (as measured by the British Council Aptis test) among 746 users of the English Liulishuo app, the flagship mobile app produced by LAIX Inc. (NYSE:LAIX), taking courses at three levels over a period of approximately two months.
    • Linking tests of English for academic purposes to the CEFR: the score user’s perspective

      Green, Anthony (Taylor and Francis, 2017-11-13)
      The Common European Framework of Reference for Languages (CEFR) is widely used in setting language proficiency requirements, including for international students seeking access to university courses taught in English. When different language examinations have been related to the CEFR, the process is claimed to help score users, such as university admissions staff, to compare and evaluate these examinations as tools for selecting qualified applicants. This study analyses the linking claims made for four internationally recognised tests of English widely used in university admissions. It uses the Council of Europe’s (2009) suggested stages of specification, standard setting, and empirical validation to frame an evaluation of the extent to which, in this context, the CEFR has fulfilled its potential to “facilitate comparisons between different systems of qualifications.” Findings show that testing agencies make little use of CEFR categories to explain test content; represent the relationships between their tests and the framework in different terms; and arrive at conflicting conclusions about the correspondences between test scores and CEFR levels. This raises questions about the capacity of the CEFR to communicate competing views of a test construct within a coherent overarching structure.
    • Measuring L2 speaking

      Nakatsuhara, Fumiyo; Inoue, Chihiro; Khabbazbashi, Nahal (Routledge, 2019-07-11)
      This chapter on measuring L2 speaking has three main focuses: (a) construct representation, (b) test methods and task design, and (c) scoring and feedback. We will briefly trace the different ways in which speaking constructs have been defined over the years and operationalized using different test methods and task features. We will then discuss the challenges and opportunities that speaking tests present for scoring and providing feedback to learners. We will link these discussions to the current understanding of SLA theories and empirical research, learning oriented assessment approaches and advances in educational technology.
    • Paper-based vs computer-based writing assessment: divergent, equivalent or complementary?

      Chan, Sathena Hiu Chong (Elsevier, 2018-05-16)
      Writing on a computer is now commonplace in most post-secondary educational contexts and workplaces, making research into computer-based writing assessment essential. This special issue of Assessing Writing includes a range of articles focusing on computer-based writing assessments. Some of these have been designed to parallel an existing paper-based assessment; others have been constructed as computer-based from the beginning. The selection of papers addresses various dimensions of the validity of computer-based writing assessment use in different contexts and across levels of L2 learner proficiency. First, three articles deal with the impact of the two delivery modes, paper-based or computer-based, on test takers’ processing and performance in large-scale high-stakes writing tests; next, two articles explore the use of online writing assessment in higher education; the final two articles evaluate the use of technologies to provide feedback to support learning.
    • Rating scale development: a multistage exploratory sequential design

      Galaczi, Evelina D.; Khabbazbashi, Nahal; Cambridge English Language Assessment (Cambridge University Press, 2016-03-01)
      The project chosen to showcase the application of the exploratory sequential design in second/foreign (L2) language assessment comes from the context of rating scale development and focuses on the development of a set of scales for a suite of high-stakes L2 speaking tests. The assessment of speaking requires assigning scores to a speech sample in a systematic fashion by focusing on explicitly defined criteria which describe different levels of performance (Ginther 2013). Rating scales are the instruments used in this evaluation process, and they can be either holistic (i.e. providing a global overall assessment) or analytic (i.e. providing independent evaluations for a number of assessment criteria, e.g. Grammar, Vocabulary, Organisation, etc.). The discussion in this chapter is framed within the context of rating scales in speaking assessment. However, it is worth noting that the principles espoused, stages employed and decisions taken during the development process have wider applicability to performance assessment in general.
    • Research and practice in assessing academic English: the case of IELTS

      Taylor, Lynda; Saville, N. (Cambridge University Press, 2019-12-01)
      Test developers need to demonstrate they have premised their measurement tools on a sound theoretical framework which guides their coverage of appropriate language ability constructs in the tests they offer to the public. This is essential for supporting claims about the validity and usefulness of the scores generated by the test. This volume describes differing approaches to understanding academic reading ability that have emerged in recent decades and goes on to develop an empirically grounded framework for validating tests of academic reading ability. The framework is then applied to the IELTS Academic reading module to investigate a number of different validity perspectives that reflect the socio-cognitive nature of any assessment event. The authors demonstrate how a systematic understanding and application of the framework and its components can help test developers to operationalise their tests so as to fulfill the validity requirements for an academic reading test. The book provides:
      - an up-to-date review of the relevant literature on assessing academic reading
      - a clear and detailed specification of the construct of academic reading
      - an evaluation of what constitutes an adequate representation of the construct of academic reading for assessment purposes
      - a consideration of the nature of academic reading in a digital age and its implications for assessment research and test development
      The volume is a rich source of information on all aspects of testing academic reading ability. Examination boards and other institutions that need to validate their own academic reading tests in a systematic and coherent manner, or that wish to develop new instruments for measuring academic reading, will find it of interest, as will researchers and graduate students in the field of language assessment, and teachers preparing students for IELTS (and similar tests) or involved in English for Academic Purposes programmes.
    • Researching participants taking IELTS Academic Writing Task 2 (AWT2) in paper mode and in computer mode in terms of score equivalence, cognitive validity and other factors

      Chan, Sathena Hiu Chong; Bax, Stephen; Weir, Cyril J. (British Council and IDP: IELTS Australia, 2017-08-01)
      Computer-based (CB) assessment is becoming more common in most university disciplines, and international language testing bodies now routinely use computers for many areas of English language assessment. Given that, in the near future, IELTS also will need to move towards offering CB options alongside traditional paper-based (PB) modes, the research reported here prepares for that possibility, building on research carried out some years ago which investigated the statistical comparability of the IELTS writing test between the two delivery modes, and offering a fresh look at the relevant issues. By means of questionnaire and interviews, the current study investigates the extent to which 153 test-takers’ cognitive processes, while completing IELTS Academic Writing in PB mode and in CB mode, compare with the real-world cognitive processes of students completing academic writing at university. A major contribution of our study is its use – for the first time in the academic literature – of data from research into cognitive processes within real-world academic settings as a comparison with cognitive processing during academic writing under test conditions. The most important conclusion from the study is that, according to the 5-facet MFRM analysis, there were no significant differences in the scores awarded by two independent raters for candidates’ performances on the tests taken under the two conditions, one paper-and-pencil and the other computer-based. Regarding analytic scores criteria, the differences in three areas (i.e. Task Achievement, Coherence and Cohesion, and Grammatical Range and Accuracy) were not significant, but the difference reported in Lexical Resources was significant, if slight. In summary, the difference in scores between the two modes is at an acceptable level.
With respect to the cognitive processes students employ in performing under the two conditions of the test, results of the Cognitive Process Questionnaire (CPQ) survey indicate a similar pattern between the cognitive processes involved in writing on a computer and writing with paper-and-pencil. There were no noticeable major differences in the general tendency of the mean of each questionnaire item reported on the two test modes. In summary, the cognitive processes were employed in a similar fashion under the two delivery conditions. Based on the interview data (n=30), it appears that the participants reported using most of the processes in a similar way between the two modes. Nevertheless, a few potential differences indicated by the interview data might be worth further investigation in future studies. The Computer Familiarity Questionnaire survey shows that these students are in general familiar with computer usage and that their overall reactions towards working with a computer are positive. Multiple regression analysis, used to find out if computer familiarity had any effect on students’ performances on the two modes, suggested that test-takers who do not have a suitable familiarity profile might perform slightly worse in computer mode than those who do. In summary, the research offered in this report provides a unique comparison with real-world academic writing, and presents a significant contribution to the research base which IELTS and comparable international testing bodies will need to consider, if they are to introduce CB test versions in future.
    • Restoring perspective on the IELTS test

      Green, Anthony (Oxford University Press, 2019-03-18)
      This article presents a response to William Pearson’s article, ‘Critical Perspectives on the IELTS Test’. It addresses his critique of the role of IELTS as a test for regulating international mobility and access to English medium education and evaluates his more specific prescriptions for the improvements to the quality of the test itself.
    • Some evidence of the development of L2 reading-into-writing skills at three levels

      Chan, Sathena Hiu Chong; University of Bedfordshire (Castledown, 2018-09-05)
      While an integrated format has been widely incorporated into high-stakes writing assessment, there is relatively little research on students’ cognitive processing involved in integrated reading-into-writing tasks. Research examining how the reading-into-writing construct differs from one level to the next is even scarcer. Using a writing process questionnaire, we examined and compared test takers’ cognitive processes on integrated reading-into-writing tasks at three levels. More specifically, the study aims to provide evidence of the predominant reading-into-writing processes appropriate at each level (i.e., the CEFR B1, B2, and C1 levels). The findings of the study reveal the core processes which are essential to the reading-into-writing construct at all three levels. There is also a clear progression in the reading-into-writing skills employed by the test takers across the three CEFR levels. A multiple regression analysis was used to examine the impact of the individual processes on predicting the writers’ level of reading-into-writing ability. The findings provide empirical evidence concerning the cognitive validity of reading-into-writing tests and have important implications for task design and scoring at each level.