• Aspects of fluency across assessed levels of speaking proficiency

      Tavakoli, Parvaneh; Nakatsuhara, Fumiyo; Hunter, Ann-Marie (Wiley, 2020-01-25)
      Recent research in second language acquisition suggests that a number of speed, breakdown, repair and composite measures reliably assess fluency and predict proficiency. However, there is little research evidence to indicate which measures best characterize fluency at each assessed level of proficiency, and which can consistently distinguish one level from the next. This study investigated fluency in the performances of 32 speakers on four tasks of the British Council’s Aptis Speaking test, which were awarded four different levels of proficiency (CEFR A2-C1). Using PRAAT, the performances were analysed for various aspects of utterance fluency across different levels of proficiency. The results suggest that speed and composite measures consistently distinguish fluency from the lowest to upper-intermediate levels (A2-B2), and many breakdown measures differentiate between the lowest level (A2) and the rest of the proficiency groups, with a few differentiating between lower (A2, B1) and higher levels (B2, C1). The varied use of repair measures at different levels suggests that a more complex process is at play. The findings imply that a detailed micro-analysis of fluency offers a more reliable understanding of the construct and its relationship with assessment of proficiency.
    • Assessing English on the global stage : the British Council and English language testing, 1941-2016

      Weir, Cyril J.; O'Sullivan, Barry (Equinox, 2017-07-06)
      This book tells the story of the British Council’s seventy-five year involvement in the field of English language testing. The first section of the book explores the role of the British Council in spreading British influence around the world through the export of British English language examinations and British expertise in language testing. Founded in 1934, the organisation formally entered the world of English language testing with the signing of an agreement with the University of Cambridge Local Examinations Syndicate (UCLES) in 1941. This agreement, which was to last until 1993, saw the British Council provide substantial English as a Foreign Language (EFL) expertise and technical and financial assistance to help UCLES develop their suite of English language tests. Perhaps the high points of this phase were the British Council-inspired Cambridge Diploma of English Studies, introduced in the 1940s, and the central role played by the British Council in the conceptualisation and development of the highly innovative English Language Testing Service (ELTS) in the 1970s, the precursor to the present day International English Language Testing System (IELTS). British Council support for the development of indigenous national English language tests around the world over the last thirty years further enhanced the promotion of English and the creation of soft power for Britain. In the early 1990s the focus of the British Council changed from test development to delivery of British examinations through its global network. However, by the early years of the 21st century, the organisation was actively considering a return to test development, a strategy that was realised with the founding of the Assessment Research Group in early 2012. This was followed later that year by the introduction of the Aptis English language testing service, the first major test developed in-house for over thirty years.
As well as setting the stage for the re-emergence of professional expertise in language testing within the organisation, these initiatives have resulted in a growing strategic influence for the organisation on assessment in English language education. This influence derives from a commitment to test localisation, the development and provision of flexible, accessible and affordable tests and an efficient delivery, marking and reporting system underpinned by an innovative socio-cognitive approach to language testing. This final period can be seen as a clear return by the British Council to using language testing as a tool for enhancing soft power for Britain: a return to the original raison d'être of the organisation.
    • The cognitive validity of reading and writing tests designed for young learners

      Field, John (Cambridge University Press, 2018-06-01)
      The notion of cognitive validity becomes considerably more complicated when one extends it to tests designed for Young Learners. It then becomes necessary to take full account of the level of cognitive development of the target population (their ability to handle certain mental operations and not others). It may also be necessary to include some consideration of their level of linguistic development in L1: in particular, the degree of proficiency they may have achieved in reading and writing. This chapter examines the extent to which awareness of the cognitive development of young learners up to the age of 12 should and does influence the decisions made by those designing tests of second language reading and writing. The limitations and strengths of young learners of this age range are matched against the various processing demands entailed in second language reading and writing and are then related to characteristics of the Young Learners tests offered by the Cambridge English examinations.
    • The cognitive validity of tests of listening and speaking designed for young learners

      Field, John (Cambridge University Press, 2018-06)
      The notion of cognitive validity becomes considerably more complicated when one extends it to tests designed for Young Learners. It then becomes necessary to take full account of the level of cognitive development of the target population (their ability to handle certain mental operations and not others). It may also be necessary to include some consideration of their level of linguistic development in L1: in particular, the degree of proficiency they may have achieved in each of the four skills. This chapter examines the extent to which awareness of the cognitive development of young learners up to the age of 12 should and does influence the decisions made by those designing tests of second language listening and speaking. The limitations and strengths of young learners of this age range are matched against the various processing demands entailed in second language listening and speaking and are then related to characteristics of the Young Learners tests offered by the Cambridge English examinations.
    • Comparing rating modes: analysing live, audio, and video ratings of IELTS Speaking Test performances

      Nakatsuhara, Fumiyo; Inoue, Chihiro; Taylor, Lynda; (Taylor & Francis, 2020-08-26)
      This mixed methods study compared IELTS examiners’ scores when assessing spoken performances under live and two ‘non-live’ testing conditions using audio and video recordings. Six IELTS examiners assessed 36 test-takers’ performances under the live, audio, and video rating conditions. Scores in the three rating modes were calibrated using the many-facet Rasch model (MFRM). For all three modes, examiners provided written justifications for their ratings, and verbal reports were also collected to gain insights into examiner perceptions towards performance under the audio and video conditions. Results showed that, for all rating criteria, audio ratings were significantly lower than live and video ratings. Examiners noticed more negative performance features under the two non-live rating conditions, compared to the live condition. However, richer information about test-taker performance in the video mode appeared to cause raters to rely less on such negative evidence than audio raters when awarding scores. Verbal report data showed that having visual information in the video-rating mode helped examiners to understand what the test-takers were saying, to comprehend better what test-takers were communicating using non-verbal means, and to understand with greater confidence the source of test-takers’ hesitation, pauses, and awkwardness.
    • CRELLA and the socio-cognitive approach to test validation

      Chan, Sathena Hiu Chong; University of Bedfordshire (2013-10-31)
    • Demonstrating the cognitive validity and face validity of PTE Academic Writing items Summarize Written Text and Write Essay

      Chan, Sathena Hiu Chong (Pearson, 2011-07-01)
      This study examines the cognitive validity of two item types of the Writing Section of the PTE Academic test – Summarize Written Text and Write Essay – within Weir’s (2005) socio-cognitive framework for test validation. The study focuses on cognitive validity by investigating and comparing the cognitive processes of a group of ESL test takers undertaking Summarize Written Text (an integrated writing item) and Write Essay (an independent writing item). Cognitive validity is a ‘measure of how closely it [a writing task] represents the cognitive processing involved in writing contexts beyond the test itself’ (Shaw and Weir, 2007:34). In addition, the study investigates test takers’ opinions regarding the two different writing item types: independent and integrated. Test takers’ scores on both items are compared to investigate whether the two performances correlate. The study uses a screen-capture technique to record test takers’ successive writing processes on both items, followed by retrospective stimulated recalls. The findings demonstrate that Summarize Written Text and Write Essay engage different cognitive processes that are essential in academic writing contexts. In particular, macro-planning and discourse synthesis processes such as selecting relevant ideas from source text are elicited by the Summarize Written Text item, whereas processes in micro-planning, monitoring and revising at low levels are activated on the Write Essay item. In terms of test performances, the results show that test takers in this study performed significantly better on Write Essay than on Summarize Written Text.
    • The discourse of the IELTS Speaking Test : interactional design and practice

      Seedhouse, Paul; Nakatsuhara, Fumiyo (Cambridge University Press, 2018-02-15)
      The volume provides a unique dual perspective on the evaluation of spoken discourse in that it combines a detailed portrayal of the design of a face-to-face speaking test with its actual implementation in interactional terms. Using many empirical extracts of interaction from authentic IELTS Speaking Tests, the book illustrates how the interaction is organised in relation to the institutional aim of ensuring valid assessment. The relationship between individual features of the interaction and grading criteria is examined in detail across a number of different performance levels.
    • The effects of single and double play upon listening test outcomes and cognitive processing

      Field, John; British Council (British Council, 2015-01-01)
      A report on a project investigating the effects of playing recorded material twice on test-takers’ scores and on their behaviour.
    • Establishing test form and individual task comparability: a case study of a semi-direct speaking test

      Weir, Cyril J.; Wu, Jessica R.W.; University of Luton; Language Training and Testing Center, Taiwan (SAGE, 2006-04-01)
      Examination boards are often criticized for their failure to provide evidence of comparability across forms, and few such studies are publicly available. This study aims to investigate the extent to which three forms of the General English Proficiency Test Intermediate Speaking Test (GEPTS-I) are parallel in terms of two types of validity evidence: parallel-forms reliability and content validity. The three trial test forms, each containing three different task types (read-aloud, answering questions and picture description), were administered to 120 intermediate-level EFL learners in Taiwan. The performance data from the different test forms were analysed using classical procedures and Multi-Faceted Rasch Measurement (MFRM). Various checklists were also employed to compare the tasks in different forms qualitatively in terms of content. The results showed that all three test forms were statistically parallel overall and Forms 2 and 3 could also be considered parallel at the individual task level. Moreover, sources of variation to account for the variable difficulty of tasks in Form 1 were identified by the checklists. Results of the study provide insights for further improvement in parallel-forms reliability of the GEPTS-I at the task level and offer a set of methodological procedures for other exam boards to consider.
    • European language testing in a global context: proceedings of the 2001 ALTE conference.

      Milanovic, Michael; Weir, Cyril J. (Cambridge University Press, 2004-01-01)
      The ALTE conference, European Language Testing in a Global Context, was held in Barcelona in 2001 in support of the European Year of Languages. The conference papers presented in this volume represent a small subset of the many excellent presentations made at that event. They have been selected to provide a flavour of the issues that the conference addressed.
    • Exploring language assessment and testing: language in action

      Green, Anthony (Routledge, 2020-12-30)
      Exploring Language Assessment and Testing offers a straightforward and accessible introduction that starts from real-world experiences and uses practical examples to introduce the reader to the academic field of language assessment and testing. Extensively updated, with additional features such as reader tasks (with extensive commentaries from the author), a glossary of key terms and an annotated further reading section, this second edition provides coverage of recent theoretical and technological developments and explores specific purposes for assessment. Including concrete models and examples to guide readers into the relevant literature, this book also offers practical guidance for educators and researchers on designing, developing and using assessments. Providing an inclusive and impartial survey of both classroom-based assessment by teachers and larger-scale testing, this is an indispensable introduction for postgraduate and advanced undergraduate students studying Language Education, Applied Linguistics and Language Assessment.
    • Exploring language assessment and testing: language in action

      Green, Anthony (Taylor and Francis, 2013-10-01)
      This book is an indispensable introduction to the areas of language assessment and testing, and will be of interest to language teachers as well as postgraduate and advanced undergraduate students studying Language Education, Applied Linguistics and Language Assessment.
    • Exploring performance across two delivery modes for the IELTS Speaking Test: face-to-face and video-conferencing delivery (Phase 2)

      Nakatsuhara, Fumiyo; Inoue, Chihiro; Berry, Vivien; Galaczi, Evelina D. (IELTS Partners, 2017-10-04)
      Face-to-face speaking assessment is widespread as a form of assessment, since it allows the elicitation of interactional skills. However, face-to-face speaking test administration is also logistically complex, resource-intensive and can be difficult to conduct in geographically remote or politically sensitive areas. Recent advances in video-conferencing technology now make it possible to engage in online face-to-face interaction more successfully than was previously the case, thus reducing dependency upon physical proximity. A major study was, therefore, commissioned to investigate how new technologies could be harnessed to deliver the face-to-face version of the IELTS Speaking test.  Phase 1 of the study, carried out in London in January 2014, presented results and recommendations of a small-scale initial investigation designed to explore what similarities and differences, in scores, linguistic output and test-taker and examiner behaviour, could be discerned between face-to-face and internet-based video-conferencing delivery of the Speaking test (Nakatsuhara, Inoue, Berry and Galaczi, 2016). The results of the analyses suggested that the speaking construct remains essentially the same across both delivery modes.  This report presents results from Phase 2 of the study, which was a larger-scale follow-up investigation designed to: (i) analyse test scores obtained using more sophisticated statistical methods than was possible in the Phase 1 study; (ii) investigate the effectiveness of the training for the video-conferencing-delivered test, which was developed based on findings from the Phase 1 study; (iii) gain insights into the issue of sound quality perception and its (perceived) effect; (iv) gain further insights into test-taker and examiner behaviours across the two delivery modes; and (v) confirm the results of the Phase 1 study. Phase 2 of the study was carried out in Shanghai, People’s Republic of China in May 2015.
Ninety-nine (99) test-takers each took two speaking tests under face-to-face and internet-based video-conferencing conditions. Performances were rated by 10 trained IELTS examiners. A convergent parallel mixed-methods design was used to allow for collection of an in-depth, comprehensive set of findings derived from multiple sources. The research included an analysis of rating scores under the two delivery conditions, test-takers’ linguistic output during the tests, as well as short interviews with test-takers following a questionnaire format. Examiners responded to two feedback questionnaires and participated in focus group discussions relating to their behaviour as interlocutors and raters, and to the effectiveness of the examiner training. Trained observers also took field notes from the test sessions and conducted interviews with the test-takers.  Many-Facet Rasch Model (MFRM) analysis of test scores indicated that, although the video-conferencing mode was slightly more difficult than the face-to-face mode, when the results of all analytic scoring categories were combined, the actual score difference was negligibly small, thus supporting the Phase 1 findings. Examination of language functions elicited from test-takers revealed that significantly more test-takers asked questions to clarify what the examiner said in the video-conferencing mode (63.3%) than in the face-to-face mode (26.7%) in Part 1 of the test. Sound quality was generally positively perceived in this study, being reported as 'Clear' or 'Very clear', although the examiners and observers tended to perceive it more positively than the test-takers. There did not seem to be any relationship between sound quality perceptions and the proficiency level of test-takers. While 71.7% of test-takers preferred the face-to-face mode, slightly more test-takers reported that they were more nervous in the face-to-face mode (38.4%) than in the video-conferencing mode (34.3%).  
All examiners found the training useful and effective, the majority of them (80%) reporting that the two modes gave test-takers equal opportunity to demonstrate their level of English proficiency. They also reported that it was equally easy for them to rate test-taker performance in face-to-face and video-conferencing modes.  The report concludes with a list of recommendations for further research, including suggestions for further examiner and test-taker training, resolution of technical issues regarding video-conferencing delivery and issues related to rating, before any decisions about deploying a video-conferencing mode of delivery for the IELTS Speaking test are made.
    • Exploring performance across two delivery modes for the same L2 speaking test: face-to-face and video-conferencing delivery: a preliminary comparison of test-taker and examiner behaviour

      Nakatsuhara, Fumiyo; Inoue, Chihiro; Berry, Vivien; Galaczi, Evelina D. (The IELTS Partners: British Council, Cambridge English Language Assessment and IDP: IELTS Australia, 2016-11-10)
      This report presents the results of a preliminary exploration and comparison of test-taker and examiner behaviour across two different delivery modes for an IELTS Speaking test: the standard face-to-face test administration, and test administration using Internet-based video-conferencing technology. The study sought to compare performance features across these two delivery modes with regard to two key areas:  • an analysis of test-takers’ scores and linguistic output on the two modes and their perceptions of the two modes  • an analysis of examiners’ test management and rating behaviours across the two modes, including their perceptions of the two conditions for delivering the speaking test.  Data were collected from 32 test-takers who took two standardised IELTS Speaking tests under face-to-face and internet-based video-conferencing conditions. Four trained examiners also participated in this study. The convergent parallel mixed methods research design included an analysis of interviews with test-takers, as well as their linguistic output (especially types of language functions) and rating scores awarded under the two conditions. Examiners provided written comments justifying the scores they awarded, completed a questionnaire and participated in verbal report sessions to elaborate on their test administration and rating behaviour. Three researchers also observed all test sessions and took field notes.  While the two modes generated similar test score outcomes, there were some differences in functional output and examiner interviewing and rating behaviours. This report concludes with a list of recommendations for further research, including examiner and test-taker training and resolution of technical issues, before any decisions about deploying (or not) a video-conferencing mode of IELTS Speaking test delivery are made.
    • Implementing a learning-oriented approach within English Language assessment in Hong Kong schools: practices, issues and complexities.

      Hamp-Lyons, Liz (Palgrave Macmillan, 2016-12-16)
      This paper provides an overview of the multiple studies carried out between 2005 and 2011 on the Hong Kong School-based Assessment (SBA), which was designed to implement an assessment for learning philosophy, and places the work within a learning-oriented language assessment (LOLA) paradigm (Hamp-Lyons & Green 2014) which is growing worldwide. The Hong Kong SBA continues to be used Hong Kong-wide to formatively assess the English as a second language speaking skills of all students in secondary years 4, 5 and 6. After discussing the structure and goals of this innovative assessment and its teacher language assessment literacy aims and processes, the chapter then discusses some of the constraints and issues which have inhibited the degree to which the intended consequences have transpired. It points to compulsory ‘statistical moderation’, which undermines teachers’ trust in the new system; and to local contextual issues such as heavy reliance on ‘cram schools’, competition among schools, and teachers’ perceptions of fairness as being ‘the same for everyone’.
    • An investigation into double-marking methods: comparing live, audio and video rating of performance on the IELTS Speaking Test

      Nakatsuhara, Fumiyo; Inoue, Chihiro; Taylor, Lynda (The IELTS Partners: British Council, IDP: IELTS Australia and Cambridge English Language Assessment, 2017-03-01)
      This study compared IELTS examiners’ scores when they assessed test-takers’ spoken performance under live and two non-live rating conditions using audio and video recordings. It also explored examiners’ perceptions towards test-takers’ performance in the two non-live rating modes.  This was a mixed-methods study that involved both existing and newly collected datasets. A total of six trained IELTS examiners assessed 36 test-takers’ performance under the live, audio and video rating conditions. Their scores in the three modes of rating were calibrated using the multifaceted Rasch model analysis.  In all modes of rating, the examiners were asked to make notes on why they awarded the scores that they did on each analytical category. The comments were quantitatively analysed in terms of the volume of positive and negative features of test-takers’ performance that examiners reported noticing when awarding scores under the three rating conditions.  Using selected test-takers’ audio and video recordings, examiners’ verbal reports were also collected to gain insights into their perceptions towards test-takers’ performance under the two non-live conditions.  The results showed that audio ratings were significantly lower than live and video ratings for all rating categories. Examiners noticed more negative performance features of test-takers under the two non-live rating conditions than the live rating condition. The verbal report data demonstrated how having visual information in the video-rating mode helped examiners to understand test-takers’ utterances, to see what was happening beyond what the test-takers were saying and to understand with more confidence the source of test-takers’ hesitation, pauses and awkwardness in their performance.  The results of this study have, therefore, offered a better understanding of the three modes of rating, and a recommendation was made regarding enhanced double-marking methods that could be introduced to the IELTS Speaking Test.
    • Language assessment literacy for learning-oriented language assessment

      Hamp-Lyons, Liz (Australian Association of Applied Linguistics, 2017-12-16)
      This small-scale, exploratory study examined a set of authentic speaking test video samples from the Cambridge English: First (First Certificate in English) speaking test, in order to learn whether, and where, opportunities might be revealed in, or inserted into, formal speaking tests to provide language assessment literacy opportunities for language teachers teaching test preparation courses, as well as for teachers training to become speaking test raters. We paid particular attention to some basic components of effective interaction that we would want an examiner or interlocutor to exhibit if they seek to encourage interactive responses from test candidates. Looking closely at body language (in particular eye contact), intonation, pacing and pausing, management of turn-taking, and elicitation of candidate-candidate interaction, we saw ways in which a shift in focus to view tests as learning opportunities is possible: we call this new focus learning-oriented language assessment (LOLA).
    • Language testing and validation: an evidence based approach

      Weir, Cyril J. (Palgrave, 2005-01-01)
      Tests for the measurement of language abilities must be constructed according to a coherent validity framework based on the latest developments in theory and practice. This innovative book, by a world authority on language testing, deals with all key aspects of language test design and implementation. It provides a road map to effective testing based on the latest approaches to test validation. It is a book for all MA students in Applied Linguistics or TESOL, and for professional language teachers.