• An application of AUA to examining the potential washback of a new test of English for university entrance

      Nakamura, Keita; Green, Anthony; Eiken Foundation of Japan; University of Bedfordshire (2013-11-17)
    • Assessing English on the global stage : the British Council and English language testing, 1941-2016

      Weir, Cyril J.; O'Sullivan, Barry (Equinox, 2017-08-05)
This book tells the story of the British Council’s seventy-five-year involvement in the field of English language testing. The first section of the book explores the role of the British Council in spreading British influence around the world through the export of British English language examinations and British expertise in language testing. Founded in 1934, the organisation formally entered the world of English language testing with the signing of an agreement with the University of Cambridge Local Examinations Syndicate (UCLES) in 1941. This agreement, which was to last until 1993, saw the British Council provide substantial English as a Foreign Language (EFL) expertise and technical and financial assistance to help UCLES develop their suite of English language tests. Perhaps the high points of this phase were the British Council-inspired Cambridge Diploma of English Studies, introduced in the 1940s, and the central role played by the British Council in the conceptualisation and development of the highly innovative English Language Testing Service (ELTS) in the 1970s, the precursor to the present-day International English Language Testing System (IELTS). British Council support for the development of indigenous national English language tests around the world over the last thirty years further enhanced the promotion of English and the creation of soft power for Britain. In the early 1990s the focus of the British Council changed from test development to the delivery of British examinations through its global network. However, by the early years of the 21st century, the organisation was actively considering a return to test development, a strategy that was realised with the founding of the Assessment Research Group in early 2012. This was followed later that year by the introduction of the Aptis English language testing service, the first major test developed in-house in over thirty years.
As well as setting the stage for the re-emergence of professional expertise in language testing within the organisation, these initiatives have resulted in a growing strategic influence for the organisation on assessment in English language education. This influence derives from a commitment to test localisation, the development and provision of flexible, accessible and affordable tests, and an efficient delivery, marking and reporting system underpinned by an innovative socio-cognitive approach to language testing. This final period can be seen as a clear return by the British Council to using language testing as a tool for enhancing soft power for Britain: a return to the original raison d'être of the organisation.
    • Comparing rating modes: analysing live, audio, and video ratings of IELTS Speaking Test performances

Nakatsuhara, Fumiyo; Inoue, Chihiro; Taylor, Lynda (Taylor & Francis, 2020-08-26)
      This mixed methods study compared IELTS examiners’ scores when assessing spoken performances under live and two ‘non-live’ testing conditions using audio and video recordings. Six IELTS examiners assessed 36 test-takers’ performances under the live, audio, and video rating conditions. Scores in the three rating modes were calibrated using the many-facet Rasch model (MFRM). For all three modes, examiners provided written justifications for their ratings, and verbal reports were also collected to gain insights into examiner perceptions towards performance under the audio and video conditions. Results showed that, for all rating criteria, audio ratings were significantly lower than live and video ratings. Examiners noticed more negative performance features under the two non-live rating conditions, compared to the live condition. However, richer information about test-taker performance in the video mode appeared to cause raters to rely less on such negative evidence than audio raters when awarding scores. Verbal report data showed that having visual information in the video-rating mode helped examiners to understand what the test-takers were saying, to comprehend better what test-takers were communicating using non-verbal means, and to understand with greater confidence the source of test-takers’ hesitation, pauses, and awkwardness.
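The study calibrated scores with the many-facet Rasch model, which jointly estimates test-taker ability, rater severity, and rating-mode difficulty. As a much simpler sketch of the underlying question (do the same performances attract systematically different scores under different rating modes?), mean paired differences between modes can be screened in plain Python. The scores below are invented for illustration and are not data from the study:

```python
from statistics import mean

# Hypothetical IELTS-style scores: the same six performances rated
# under three modes (illustrative numbers only, not study data).
scores = {
    "live":  [6.0, 5.5, 7.0, 6.5, 5.0, 6.0],
    "audio": [5.5, 5.0, 6.5, 6.0, 4.5, 5.5],
    "video": [6.0, 5.5, 7.0, 6.0, 5.0, 6.0],
}

def mean_mode_difference(a, b):
    """Mean paired difference between two rating modes (a minus b)."""
    return mean(x - y for x, y in zip(a, b))

print(f"audio - live: {mean_mode_difference(scores['audio'], scores['live']):+.2f}")
print(f"video - live: {mean_mode_difference(scores['video'], scores['live']):+.2f}")
```

A consistently negative audio-minus-live difference, as in these invented numbers, is the pattern the study reports; the MFRM adds what this sketch lacks, namely separation of mode effects from rater severity and test-taker ability.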
    • A comparison of holistic, analytic, and part marking models in speaking assessment

      Khabbazbashi, Nahal; Galaczi, Evelina D. (SAGE, 2020-01-24)
      This mixed methods study examined holistic, analytic, and part marking models (MMs) in terms of their measurement properties and impact on candidate CEFR classifications in a semi-direct online speaking test. Speaking performances of 240 candidates were first marked holistically and by part (phase 1). On the basis of phase 1 findings – which suggested stronger measurement properties for the part MM – phase 2 focused on a comparison of part and analytic MMs. Speaking performances of 400 candidates were rated analytically and by part during that phase. Raters provided open comments on their marking experiences. Results suggested a significant impact of MM; approximately 30% and 50% of candidates in phases 1 and 2 respectively were awarded different (adjacent) CEFR levels depending on the choice of MM used to assign scores. There was a trend of higher CEFR levels with the holistic MM and lower CEFR levels with the part MM. While strong correlations were found between all pairings of MMs, further analyses revealed important differences. The part MM was shown to display superior measurement qualities particularly in allowing raters to make finer distinctions between different speaking ability levels. These findings have implications for the scoring validity of speaking tests.
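The headline finding above, that the choice of marking model shifted roughly 30-50% of candidates to an adjacent CEFR level, can be illustrated with a minimal sketch. The cut scores, score scale, and candidate scores below are all hypothetical, not those of the study:

```python
def to_cefr(score, cutoffs=((80, "C1"), (60, "B2"), (40, "B1"), (0, "A2"))):
    """Map a 0-100 score to a CEFR band using hypothetical cut scores."""
    for cut, band in cutoffs:
        if score >= cut:
            return band

def classification_shift(scores_a, scores_b):
    """Proportion of candidates placed in a different CEFR band
    by two marking models applied to the same performances."""
    pairs = list(zip(scores_a, scores_b))
    changed = sum(to_cefr(a) != to_cefr(b) for a, b in pairs)
    return changed / len(pairs)

holistic = [62, 78, 41, 85, 59]   # illustrative scores, holistic model
part     = [58, 74, 39, 81, 55]   # same candidates, part model
print(f"{classification_shift(holistic, part):.0%} classified differently")
```

The sketch also shows why strong correlations between models can coexist with substantial classification differences: scores that sit near a cut score move bands even when the two score sets track each other closely.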
    • Continuity and innovation: a history of the Cambridge Proficiency in English examination 1913-2002

      Weir, Cyril J.; Milanovic, Michael (Cambridge University Press, 2003-01-01)
      This volume documents in some detail the most recent revision of Cambridge English: Proficiency, also known as Certificate of Proficiency in English (CPE), which took place from 1991 to 2002. CPE is the oldest of the Cambridge suite of English as a Foreign Language (EFL) examinations and was originally introduced in 1913. Since that time the test has been regularly revised and updated to bring it into line with current thinking in language teaching, applied linguistics and language testing theory and practice. The volume provides a full account of the revision process, the questions and problems faced by the revision teams, and the solutions they came up with. It is also an attempt to encourage in the public domain greater understanding of the complex thinking, processes and procedures which underpin the development and revision of all the Cambridge English tests, and as such it will be of interest and relevance to a wide variety of readers.
    • A research report on the development of the Test of English for Academic Purposes (TEAP) writing test for Japanese university entrants

      Weir, Cyril J.; University of Bedfordshire (Eiken Foundation of Japan, 2014-01-01)
Rigorous and iterative test design, accompanied by systematic trialing procedures, produced a pilot version of the test which demonstrated acceptable context and cognitive validity for use as an English for academic purposes (EAP) writing test for students wishing to enter Japanese universities. A study carried out on the scoring validity of the rating of the TEAP Writing Test indicated acceptable levels of inter‐ and intra‐marker reliability and demonstrated that receiving institutions could depend on the consistency of the results obtained on the test. A study carried out on the contextual complexity parameters (lexical, grammatical, and cohesive) of scripts allocated to different bands on the TEAP Writing Test rating scale indicated that there were significant differences between the scripts in adjacent band levels, with the band B1 scripts produced by students being more complex than the band A2 scripts across a broad set of indices.
    • The role of the L1 in testing L2 English

      Nakatsuhara, Fumiyo; Taylor, Lynda; Jaiyote, Suwimol (Cambridge University Press, 2018-11-28)
This chapter compares and contrasts two research studies that addressed the role of the L1 in the assessment of L2 spoken English. The first is a small-scale, mixed-methods study which explored the impact of test-takers’ L1 backgrounds in the paired speaking task of a standardised test of general English provided by an international examination board (Nakatsuhara and Jaiyote, 2015). The key question in that research was how we can ensure fairness to test-takers who perform paired tests in shared and non-shared L1 pairs. The second is a large-scale, a priori test validation study conducted as part of the development of a new EAP (English for academic purposes) test offered by a national examination board, targeting only single L1 users (Nakatsuhara, 2014). Of particular interest is the way in which its pronunciation rating scale was developed and validated in the single-L1 context. In light of these examples of research into international and locally developed tests, this chapter aims to demonstrate the importance of the construct of a test and its score usage when reconsidering a) whether specific English varieties are considered to be construct-relevant or construct-irrelevant and b) what Englishes (rather than ‘standard’ English) should be elicited and assessed.
Nakatsuhara, F. (2014). A Research Report on the Development of the Test of English for Academic Purposes (TEAP) Speaking Test for Japanese University Entrants – Study 1 & Study 2, available online at: www.eiken.or.jp/teap/group/pdf/teap_speaking_report1.pdf
Nakatsuhara, F. and Jaiyote, S. (2015). Exploring the impact of test-takers’ L1 backgrounds on paired speaking test performance: how do they perform in shared and non-shared L1 pairs? BAAL / Cambridge University Press Applied Linguistics Seminar, York St John University, UK (24-26/06/2015).
    • Towards a model of multi-dimensional performance of C1 level speakers assessed in the Aptis Speaking Test

Nakatsuhara, Fumiyo; Tavakoli, Parvaneh; Awwad, Anas; British Council; University of Bedfordshire; University of Reading; Isra University, Jordan (British Council, 2019-09-14)
This is a peer-reviewed online research report in the British Council Validation Series (https://www.britishcouncil.org/exam/aptis/research/publications/validation). Abstract: The current study draws on the findings of Tavakoli, Nakatsuhara and Hunter’s (2017) quantitative study, which failed to identify any statistically significant differences between various fluency features in speech produced by B2 and C1 level candidates in the Aptis Speaking test. This study set out to examine whether there were differences between other aspects of the speakers’ performance at these two levels, in terms of lexical and syntactic complexity, accuracy and use of metadiscourse markers, that distinguish the two levels. In order to understand the relationship between fluency and these other aspects of performance, the study employed a mixed-methods approach to analysing the data. The quantitative analysis included descriptive statistics, t-tests and correlational analyses of the various linguistic measures. For the qualitative analysis, we used a discourse analysis approach to examining the pausing behaviour of the speakers in the context in which the pauses occurred in their speech. The results indicated that the two proficiency levels were statistically different on measures of accuracy (weighted clause ratio) and lexical diversity (TTR and D), with the C1 level producing more accurate and lexically diverse output. The correlation analyses showed speed fluency was correlated positively with weighted clause ratio and negatively with length of clause. Speed fluency was also positively related to lexical diversity, but negatively linked with lexical errors. As for pauses, frequency of end-clause pauses was positively linked with length of AS-units. Mid-clause pauses also positively correlated with lexical diversity and use of discourse markers. Repair fluency correlated positively with length of clause, and negatively with weighted clause ratio.
Repair measures were also negatively linked with number of errors per 100 words and metadiscourse marker type. The qualitative analyses suggested that the pauses mainly occurred a) to facilitate access and retrieval of lexical and structural units, b) to reformulate units already produced, and c) to improve communicative effectiveness. A number of speech excerpts are presented to illustrate these patterns. It is hoped that the findings of this research offer a better understanding of the construct measured at B2 and C1 levels of the Aptis Speaking test, inform possible refinements of the Aptis Speaking rating scales, and enhance its rater training programme for the two highest levels of the test.
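The abstract above reports TTR and D as lexical diversity measures. As a minimal sketch, the plain type-token ratio (distinct word forms divided by total word tokens) can be computed as below; D, which corrects TTR's sensitivity to text length, requires a curve-fitting procedure not shown here. The sample sentence is invented for illustration:

```python
def type_token_ratio(text):
    """Type-token ratio: distinct word forms / total word tokens.
    A simple, length-sensitive index of lexical diversity."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0

sample = "the cat sat on the mat and the dog sat too"
print(round(type_token_ratio(sample), 3))  # 8 types over 11 tokens
```

Because raw TTR falls as texts get longer (common words recur), length-corrected indices such as D or MTLD are generally preferred when comparing speakers who produce different amounts of speech, which is why the study reports both TTR and D.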