• Researching participants taking IELTS Academic Writing Task 2 (AWT2) in paper mode and in computer mode in terms of score equivalence, cognitive validity and other factors

      Chan, Sathena Hiu Chong; Bax, Stephen; Weir, Cyril J. (British Council and IDP: IELTS Australia, 2017-08-01)
      Computer-based (CB) assessment is becoming more common in most university disciplines, and international language testing bodies now routinely use computers for many areas of English language assessment. Given that, in the near future, IELTS will also need to move towards offering CB options alongside traditional paper-based (PB) modes, the research reported here prepares for that possibility, building on research carried out some years ago which investigated the statistical comparability of the IELTS writing test between the two delivery modes, and offering a fresh look at the relevant issues. By means of questionnaires and interviews, the current study investigates the extent to which 153 test-takers’ cognitive processes, while completing IELTS Academic Writing in PB mode and in CB mode, compare with the real-world cognitive processes of students completing academic writing at university. A major contribution of our study is its use – for the first time in the academic literature – of data from research into cognitive processes within real-world academic settings as a comparison with cognitive processing during academic writing under test conditions. The most important conclusion from the study is that, according to the 5-facet MFRM analysis, there were no significant differences in the scores awarded by two independent raters for candidates’ performances on the tests taken under the two conditions, one paper-and-pencil and the other computer-based. Regarding the analytic scoring criteria, the differences in three areas (i.e. Task Achievement, Coherence and Cohesion, and Grammatical Range and Accuracy) were not significant, but the difference reported in Lexical Resources was significant, if slight. In summary, the difference in scores between the two modes is at an acceptable level. With respect to the cognitive processes students employ in performing under the two conditions of the test, results of the Cognitive Process Questionnaire (CPQ) survey indicate a similar pattern between the cognitive processes involved in writing on a computer and writing with paper and pencil. There were no noticeable major differences in the general tendency of the means of the questionnaire items reported for the two test modes. In summary, the cognitive processes were employed in a similar fashion under the two delivery conditions. Based on the interview data (n=30), it appears that the participants reported using most of the processes in a similar way in the two modes. Nevertheless, a few potential differences indicated by the interview data might be worth further investigation in future studies. The Computer Familiarity Questionnaire survey shows that these students are in general familiar with computer usage and that their overall reactions towards working with a computer are positive. Multiple regression analysis, used to find out whether computer familiarity had any effect on students’ performances in the two modes, suggested that test-takers who do not have a suitable familiarity profile might perform slightly worse in computer mode than those who do. In summary, the research presented here offers a unique comparison with real-world academic writing, and makes a significant contribution to the research base which IELTS and comparable international testing bodies will need to consider if they are to introduce CB test versions in future.
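      The score comparison above rests on a many-facet Rasch measurement (MFRM) analysis. As a minimal sketch of the general model behind such an analysis (the abstract does not list the exact five facets modelled, so the facet labels below are illustrative assumptions), the log-odds of test-taker n receiving category k rather than k-1 from rater j on criterion i in delivery mode m can be written as:

      \[
        \log\!\left(\frac{P_{nijmk}}{P_{nijm(k-1)}}\right) = B_n - C_j - D_i - A_m - F_k
      \]

      where B_n is the test-taker’s ability, C_j the rater’s severity, D_i the difficulty of the rating criterion, A_m the difficulty associated with the delivery mode, and F_k the step difficulty of scale category k. In these terms, a finding of no significant mode effect corresponds to the PB and CB estimates of A_m not differing significantly.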
    • Researching the cognitive validity of GEPT high-intermediate and advanced reading : an eye-tracking and stimulated recall study

      Bax, Stephen; Chan, Sathena Hiu Chong (Language Training and Testing Center (LTTC), 2016-07-01)
      It is important for any language test to establish its cognitive validity in order to ensure that the test elicits from test takers those cognitive processes which correspond to the processes which they would normally employ in the target real-life context (Weir 2005). This study investigates the cognitive validity of the GEPT Reading Test at two levels, High-Intermediate (CEFR B2) and Advanced (CEFR C1), using innovative eye-tracking technology and detailed stimulated recall interviews and surveys. Representative reading items were carefully selected from across all parts of the GEPT High-Intermediate Level Reading Test and the GEPT Advanced Level Reading Test. Taiwanese students (n=24) studying Master’s-level programmes at British universities were asked to complete the test items on a computer, while the Tobii X2 Eye Tracker was used to track their gaze behaviour during completion of the test items. Immediately after they had completed each individual part, they were asked to report the cognitive processes they employed by using a Reading Process Checklist, and a further eight participants (n=8) then took part in a detailed stimulated recall interview while viewing video footage of their gaze patterns. Taking into account all these sources of data, it was found that the High-Intermediate section of the GEPT test successfully elicited and tested an appropriate range of lower and higher cognitive processes, as defined in Khalifa and Weir (2009). It was also concluded that the Advanced sections of the test elicited the same set of cognitive processes as the High-Intermediate test, with the addition, in the final section, of the most difficult process of all in Khalifa and Weir's scheme. In summary, it is apparent that the two elements of the GEPT test which were researched in this project were successful in requiring of candidates the range of cognitive processing activity commensurate with High-Intermediate and Advanced reading levels respectively, which is an important element in establishing the cognitive validity of the GEPT test.
    • Researching the comparability of paper-based and computer-based delivery in a high-stakes writing test

      Chan, Sathena Hiu Chong; Bax, Stephen; Weir, Cyril J. (Elsevier, 2018-04-07)
      International language testing bodies are now moving rapidly towards using computers for many areas of English language assessment, despite the fact that research on comparability with paper-based assessment is still relatively limited in key areas. This study contributes to the debate by researching the comparability of a high-stakes EAP writing test (IELTS) in two delivery modes, paper-based (PB) and computer-based (CB). The study investigated 153 test takers' performances and their cognitive processes on IELTS Academic Writing Task 2 in the two modes, and the possible effect of computer familiarity on their test scores. Many-Facet Rasch Measurement (MFRM) was used to examine the difference in test takers' scores between the two modes, in relation to their overall and analytic scores. By means of questionnaires and interviews, we investigated the cognitive processes students employed under the two conditions of the test. A major contribution of our study is its use - for the first time in the computer-based writing assessment literature - of data from research into cognitive processes within real-world academic settings as a comparison with cognitive processing during academic writing under test conditions. In summary, this study offers important new insights into academic writing assessment in computer mode.
    • Restoring perspective on the IELTS test

      Green, Anthony (Oxford University Press, 2019-03-18)
      This article presents a response to William Pearson’s article, ‘Critical Perspectives on the IELTS Test’. It addresses his critique of the role of IELTS as a test for regulating international mobility and access to English-medium education, and evaluates his more specific prescriptions for improvements to the quality of the test itself.
    • Rethinking the second language listening test : from theory to practice

      Field, John (Equinox, 2019-03-01)
      The book begins with an account of the various processes that contribute to listening, in order to raise awareness of the difficulties faced by second language learners. This information feeds into a new set of descriptors of listening behaviour across proficiency levels and informs much of the discussion in later chapters. The main body of the book critically examines the various components of a listening test, challenging some of the false assumptions behind them and proposing practical alternatives. The discussion covers: the recording-as-text, the recording-as-speech, conventions of test delivery, standard task formats and item design. Major themes are the critical role played by the recorded material and the degree to which tests impose demands that go beyond those of real-world listening. The following section focuses on two types of listener whose needs differ from those of the general candidate: those aiming to demonstrate academic or professional proficiency in English, and young language learners, for whom level of cognitive development is an issue in test design. There is a brief reflection on the extent to which integrated listening tests reflect the reality of listening events. The book concludes with a report of a study into how feasible it is to identify the information load of a listening text, a factor potentially contributing to test difficulty.
    • Reviewing the suitability of English language tests for providing the GMC with evidence of doctors' English proficiency

      Taylor, Lynda; Chan, Sathena Hiu Chong (The General Medical Council, 2015-05-13)
      The research project described in this report set out to identify English language proficiency (ELP) tests which might be considered comparable to IELTS in terms of their suitability for satisfying the General Medical Council (the GMC) of the English language proficiency of doctors applying for registration and licensing in the UK. Through a process of consultation between CRELLA and the GMC, the specific aims of the IELTS Equivalence Research Project were established as follows:
      1. To identify a comprehensive list of other available tests of English language proficiency and/or communication skills apart from IELTS, including any that are specifically used within a medical context (UK and international).
      2. To consider how other professional regulatory bodies (both UK and international) check for and confirm an acceptable level of English language proficiency prior to entry into a technical, high-risk profession.
      3. To compare the list of tests identified in (1) above to IELTS with respect to their suitability on a range of essential quality criteria. IELTS was recognised, therefore, as constituting the criterion or standard of suitability against which other potentially suitable English language proficiency tests should be compared.
      4. To identify, should one or more tests be considered at least as suitable as IELTS, the scores on those tests equivalent to the GMC’s current requirements for the academic version of IELTS, as well as how the equivalent scores identified on alternative tests compare to the levels of the Common European Framework of Reference for Languages (2001).
    • The role of listening in oral interview tests

      Nakatsuhara, Fumiyo; University of Bedfordshire (2012-03-15)
    • The role of the L1 in testing L2 English

      Nakatsuhara, Fumiyo; Taylor, Lynda; Jaiyote, Suwimol (Cambridge University Press, 2018-11-28)
      This chapter compares and contrasts two research studies that addressed the role of the L1 in the assessment of L2 spoken English. The first is a small-scale, mixed-methods study which explored the impact of test-takers’ L1 backgrounds in the paired speaking task of a standardised test of general English provided by an international examination board (Nakatsuhara and Jaiyote, 2015). The key question in that research was how we can ensure fairness to test-takers who perform paired tests in shared and non-shared L1 pairs. The second is a large-scale, a priori test validation study conducted as part of the development of a new EAP (English for academic purposes) test offered by a national examination board, targeting only single L1 users (Nakatsuhara, 2014). Of particular interest is the way in which its pronunciation rating scale was developed and validated in the single-L1 context. In light of these examples of research into international and locally developed tests, this chapter aims to demonstrate the importance of the construct of a test and its score usage when reconsidering a) whether specific English varieties are considered to be construct-relevant or construct-irrelevant and b) what Englishes (rather than ‘standard’ English) should be elicited and assessed.
      Nakatsuhara, F. (2014). A Research Report on the Development of the Test of English for Academic Purposes (TEAP) Speaking Test for Japanese University Entrants – Study 1 & Study 2. Available online at: www.eiken.or.jp/teap/group/pdf/teap_speaking_report1.pdf
      Nakatsuhara, F. and Jaiyote, S. (2015). Exploring the impact of test-takers’ L1 backgrounds on paired speaking test performance: how do they perform in shared and non-shared L1 pairs? BAAL / Cambridge University Press Applied Linguistics Seminar, York St John University, UK (24–26/06/2015).
    • Scaling and scheming: the highs and lows of scoring writing

      Green, Anthony; University of Bedfordshire (2019-12-04)
    • Scoring validity of the Aptis speaking test : investigating fluency across tasks and levels of proficiency

      Tavakoli, Parvaneh; Nakatsuhara, Fumiyo; Hunter, Ann-Marie (British Council, 2017-11-16)
      Second language oral fluency has long been considered an important construct in communicative language ability (e.g. de Jong et al., 2012), and many speaking tests are designed to measure fluency aspects of candidates’ language (e.g. IELTS, TOEFL iBT, PTE Academic). Current research in second language acquisition suggests that a number of measures of speed, breakdown and repair fluency can reliably assess fluency and predict proficiency. However, there is little research evidence to indicate which measures best characterise fluency at each level of proficiency, and which can consistently distinguish one proficiency level from the next. This study is an attempt to help answer these questions. The study investigated fluency constructs across four different levels of proficiency (A2–C1) and four different semi-direct speaking test tasks performed by 32 candidates taking the Aptis Speaking test. Using PRAAT (Boersma & Weenink, 2013), we analysed 120 task performances on different aspects of utterance fluency, including speed, breakdown and repair measures, across different tasks and levels of proficiency. The results suggest that speed measures consistently distinguish fluency across different levels of proficiency, and many of the breakdown measures differentiate between lower (A2, B1) and higher (B2, C1) levels. The varied use of repair measures at different proficiency levels and tasks suggests that a more complex process is at play. The non-significant differences for most of the fluency measures across the four tasks suggest that fluency is not affected by task type in the Aptis Speaking test. The implications of the findings are discussed in relation to the Aptis Speaking test fluency rating scales and rater training materials.
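      To make the measure types concrete: utterance fluency analyses of this kind typically derive speed, breakdown and repair indices from syllable, pause and repair information extracted from the recordings (in this study, with PRAAT). The Python sketch below illustrates a few commonly used indices; the specific measures, pause threshold and formulas used in the study are not given in this abstract, so the definitions here are illustrative assumptions only.

      ```python
      from dataclasses import dataclass
      from typing import List

      @dataclass
      class Performance:
          """Timings (seconds) assumed to be extracted beforehand, e.g. with PRAAT."""
          total_time: float      # total response time, including pauses
          syllables: int         # number of syllables produced
          pauses: List[float]    # durations of silent pauses (threshold, e.g. 0.25 s, is an assumption)
          repairs: int           # repetitions, false starts and self-corrections

      def fluency_measures(p: Performance) -> dict:
          """Illustrative speed, breakdown and repair measures of utterance fluency."""
          pause_time = sum(p.pauses)
          phonation_time = p.total_time - pause_time
          return {
              # speed fluency
              "speech_rate": p.syllables / p.total_time,          # syllables per second overall
              "articulation_rate": p.syllables / phonation_time,  # syllables per second of speaking time
              # breakdown fluency
              "pauses_per_minute": 60 * len(p.pauses) / p.total_time,
              "mean_pause_length": pause_time / len(p.pauses) if p.pauses else 0.0,
              # repair fluency
              "repairs_per_minute": 60 * p.repairs / p.total_time,
          }

      # Example: one hypothetical 45-second task response
      print(fluency_measures(Performance(total_time=45.0, syllables=120,
                                         pauses=[0.4, 0.8, 0.3, 1.2], repairs=3)))
      ```

      In the study’s findings, it was the speed-type measures (such as speech and articulation rate) that distinguished proficiency levels most consistently.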
    • Second language listening: current ideas, current issues

      Field, John (Cambridge University Press, 2019-06-01)
      This chapter starts by mentioning the drawbacks of the approach conventionally adopted in L2 listening instruction – in particular, its focus on the products of listening rather than the processes that contribute to it. It then offers an overview of our present understanding of what those processes are, drawing upon research findings in psycholinguistics, phonetics and applied linguistics. Section 2 examines what constitutes proficient listening and how the performance of an L2 listener diverges from it; and Section 3 considers the perceptual problems caused by the nature of spoken input. Subsequent sections then cover various areas of research in L2 listening. Section 4 provides a brief summary of topics that have been of interest to researchers over the years; and Section 5 reviews the large body of research into listening strategies. Section 6 then covers a number of interesting issues that have come to the fore in recent studies: multimodality, levels of listening vocabulary, cross-language phoneme perception, the use of a variety of accents, the validity of playing a recording twice, text authenticity and listening anxiety. A final section identifies one or two recurring themes that have arisen, and considers how instruction is likely to develop in future.
    • Some evidence of the development of L2 reading-into-writing skills at three levels

      Chan, Sathena Hiu Chong; University of Bedfordshire (Castledown, 2018-09-05)
      While an integrated format has been widely incorporated into high-stakes writing assessment, there is relatively little research on the cognitive processing involved in integrated reading-into-writing tasks. Research which examines how the reading-into-writing construct is distinct from one level to the next is even scarcer. Using a writing process questionnaire, we examined and compared test takers’ cognitive processes on integrated reading-into-writing tasks at three levels. More specifically, the study aims to provide evidence of the predominant reading-into-writing processes appropriate at each level (i.e., the CEFR B1, B2, and C1 levels). The findings of the study reveal the core processes which are essential to the reading-into-writing construct at all three levels. There is also a clear progression in the reading-into-writing skills employed by the test takers across the three CEFR levels. A multiple regression analysis was used to examine the contribution of the individual processes to predicting the writers’ level of reading-into-writing ability. The findings provide empirical evidence concerning the cognitive validity of reading-into-writing tests and have important implications for task design and scoring at each level.
    • Study writing: a course in written English for academic purposes

      Hamp-Lyons, Liz; Heasley, Ben (Cambridge University Press, 2006-07-01)
      Study Writing is an ideal reference book for EAP students who want to write better academic essays, projects, research articles or theses. The book helps students at intermediate level develop their academic writing skills and strategies by:
      * introducing key concepts in academic writing, such as the role of generalizations and definitions, and their application
      * exploring the use of information structures, including those used to develop and present an argument
      * familiarizing learners with the characteristics of academic genres and analysing the grammar and vocabulary associated with them
      * encouraging students to seek feedback on their own writing and to analyse expert writers' texts in order to become more reflective and effective writers.
      This second edition has been updated to reflect modern thinking in the teaching of writing. It includes more recent texts in the disciplines presented and takes into account new media and the growth of online resources.
    • Testing four skills in Japan

      Green, Anthony; University of Bedfordshire (Japan Society of English Language Education, 2016-02-01)
      This paper considers arguments for the testing of spoken language skills in Japan and the contribution the use of such tests might make to language education. The Japanese government, recognising the importance of spontaneous social interaction in English to participation in regional and global communities, mandates the development of all ‘four skills’ (Reading, Writing, Listening and Speaking) in schools. However, university entrance tests continue to emphasize the written language. Because they control access to opportunities, entrance tests tend to dominate teaching and learning. They are widely believed to encourage traditional forms of teaching and to inhibit speaking and listening activities in the classroom. Comprehensive testing of spoken language skills should, in contrast, encourage (or at least not discourage) the teaching and learning of these skills. On the other hand, testing spoken language skills also represents a substantial challenge. New organisational structures are needed to support new testing formats and these will be unfamiliar to all involved, resulting in an increased risk of system failures. Introducing radical change to any educational system is likely to provoke a reaction from those who benefit most from the status quo. For this reason, critics will be ready to exploit any perceived shortcomings to reverse innovative policies. Experience suggests that radical changes in approaches to testing are unlikely to deliver benefits for the education system unless they are well supported by teacher training, new materials and public relations initiatives. The introduction of spoken language tests is no doubt essential to the success of Japan’s language policies, but is not without risk and needs to be carefully integrated with other aspects of the education system.
    • Testing speaking skills: why and how?

      Nakatsuhara, Fumiyo; Inoue, Chihiro; University of Bedfordshire (2013-09-16)
    • Three current, interconnected concerns for writing assessment

      Hamp-Lyons, Liz (Elsevier Ltd, 2014-09-26)
      Editorial
    • Topic and background knowledge effects on performance in speaking assessment

      Khabbazbashi, Nahal (Sage, 2015-08-10)
      This study explores the extent to which topic and background knowledge of topic affect spoken performance in a high-stakes speaking test. It is argued that evidence of a substantial influence may introduce construct-irrelevant variance and undermine test fairness. Data were collected from 81 non-native speakers of English who performed on 10 topics across three task types. Background knowledge and general language proficiency were measured using self-report questionnaires and C-tests respectively. Score data were analysed using many-facet Rasch measurement and multiple regression. Findings showed that for two of the three task types, the topics used in the study generally exhibited difficulty measures which were statistically distinct. However, the size of the differences in topic difficulties was too small to have a large practical effect on scores. Participants’ different levels of background knowledge were shown to have a systematic effect on performance. However, these statistically significant differences also failed to translate into practical significance. Findings hold implications for speaking performance assessment.