• Academic speaking: does the construct exist, and if so, how do we test it?

      Inoue, Chihiro; Nakatsuhara, Fumiyo; Lam, Daniel M. K.; Taylor, Lynda; University of Bedfordshire (2018-03-14)
    • Aspects of fluency across assessed levels of speaking proficiency

      Tavakoli, Parvaneh; Nakatsuhara, Fumiyo; Hunter, Ann-Marie (Wiley, 2020-01-25)
      Recent research in second language acquisition suggests that a number of speed, breakdown, repair and composite measures reliably assess fluency and predict proficiency. However, there is little research evidence to indicate which measures best characterize fluency at each assessed level of proficiency, and which can consistently distinguish one level from the next. This study investigated the fluency of 32 speakers performing four tasks of the British Council’s Aptis Speaking test, whose performances were awarded four different levels of proficiency (CEFR A2-C1). Using PRAAT, the performances were analysed for various aspects of utterance fluency across different levels of proficiency. The results suggest that speed and composite measures consistently distinguish fluency from the lowest to upper-intermediate levels (A2-B2), and many breakdown measures differentiate between the lowest level (A2) and the rest of the proficiency groups, with a few differentiating between lower (A2, B1) and higher levels (B2, C1). The varied use of repair measures at different levels suggests that a more complex process is at play. The findings imply that a detailed micro-analysis of fluency offers a more reliable understanding of the construct and its relationship with assessment of proficiency.
    • Cognitive validity in the testing of speaking

      Field, John; University of Bedfordshire (2013-11-17)
    • The cognitive validity of tests of listening and speaking designed for young learners

      Field, John (Cambridge University Press, 2018-06)
      The notion of cognitive validity becomes considerably more complicated when one extends it to tests designed for Young Learners. It then becomes necessary to take full account of the level of cognitive development of the target population (their ability to handle certain mental operations and not others). It may also be necessary to include some consideration of their level of linguistic development in L1: in particular, the degree of proficiency they may have achieved in each of the four skills. This chapter examines the extent to which awareness of the cognitive development of young learners up to the age of 12 should and does influence the decisions made by those designing tests of second language listening and speaking. The limitations and strengths of young learners of this age range are matched against the various processing demands entailed in second language listening and speaking and are then related to characteristics of the Young Learners tests offered by the Cambridge English examinations.
    • A comparative study of the variables used to measure syntactic complexity and accuracy in task-based research

      Inoue, Chihiro; University of Bedfordshire (Taylor & Francis (Routledge): SSH Titles, 2016-04-12)
      The constructs of complexity, accuracy and fluency (CAF) have been used extensively to investigate learner performance on second language tasks. However, a serious concern is that the variables used to measure these constructs are sometimes used conventionally without any empirical justification. It is crucial for researchers to understand how results might be different depending on which measurements are used, and accordingly, choose the most appropriate variables for their research aims. The first strand of this article examines the variables conventionally used to measure syntactic complexity in order to identify which may be the best indicators of different proficiency levels, following suggestions by Norris and Ortega. The second strand compares the three variables used to measure accuracy in order to identify which one is most valid. The data analysed were spoken performances by 64 Japanese EFL students on two picture-based narrative tasks, which were rated at Common European Framework of Reference for Languages (CEFR) A2 to B2 according to Rasch-adjusted ratings by seven human judges. The tasks performed were very similar, but had different degrees of what Loschky and Bley-Vroman term ‘task-essentialness’ for subordinate clauses. It was found that the variables used to measure syntactic complexity yielded results that were not consistent with suggestions by Norris and Ortega. The variable found to be the most valid for measuring accuracy was errors per 100 words. Analysis of transcripts revealed that results were strongly influenced by the differing degrees of task-essentialness for subordination between the two tasks, as well as the spread of errors across different units of analysis. This implies that the characteristics of test tasks need to be carefully scrutinised, followed by careful piloting, in order to ensure greater validity and reliability in task-based research.
    • Comparing rating modes: analysing live, audio, and video ratings of IELTS Speaking Test performances

      Nakatsuhara, Fumiyo; Inoue, Chihiro; Taylor, Lynda (Taylor & Francis, 2020-08-26)
      This mixed methods study compared IELTS examiners’ scores when assessing spoken performances under live and two ‘non-live’ testing conditions using audio and video recordings. Six IELTS examiners assessed 36 test-takers’ performances under the live, audio, and video rating conditions. Scores in the three rating modes were calibrated using the many-facet Rasch model (MFRM). For all three modes, examiners provided written justifications for their ratings, and verbal reports were also collected to gain insights into examiner perceptions towards performance under the audio and video conditions. Results showed that, for all rating criteria, audio ratings were significantly lower than live and video ratings. Examiners noticed more negative performance features under the two non-live rating conditions, compared to the live condition. However, richer information about test-taker performance in the video mode appeared to cause raters to rely less on such negative evidence than audio raters when awarding scores. Verbal report data showed that having visual information in the video-rating mode helped examiners to understand what the test-takers were saying, to comprehend better what test-takers were communicating using non-verbal means, and to understand with greater confidence the source of test-takers’ hesitation, pauses, and awkwardness.
    • A comparison of holistic, analytic, and part marking models in speaking assessment

      Khabbazbashi, Nahal; Galaczi, Evelina D. (SAGE, 2020-01-24)
      This mixed methods study examined holistic, analytic, and part marking models (MMs) in terms of their measurement properties and impact on candidate CEFR classifications in a semi-direct online speaking test. Speaking performances of 240 candidates were first marked holistically and by part (phase 1). On the basis of phase 1 findings – which suggested stronger measurement properties for the part MM – phase 2 focused on a comparison of part and analytic MMs. Speaking performances of 400 candidates were rated analytically and by part during that phase. Raters provided open comments on their marking experiences. Results suggested a significant impact of MM; approximately 30% and 50% of candidates in phases 1 and 2 respectively were awarded different (adjacent) CEFR levels depending on the choice of MM used to assign scores. There was a trend of higher CEFR levels with the holistic MM and lower CEFR levels with the part MM. While strong correlations were found between all pairings of MMs, further analyses revealed important differences. The part MM was shown to display superior measurement qualities particularly in allowing raters to make finer distinctions between different speaking ability levels. These findings have implications for the scoring validity of speaking tests.
    • Developing tools for learning oriented assessment of interactional competence: bridging theory and practice

      May, Lyn; Nakatsuhara, Fumiyo; Lam, Daniel M. K.; Galaczi, Evelina D. (SAGE Publications, 2019-10-01)
      In this paper we report on a project in which we developed tools to support the classroom assessment of learners’ interactional competence (IC) and provided learning oriented feedback in the context of preparation for a high-stakes face-to-face speaking test.  Six trained examiners provided stimulated verbal reports (n=72) on 12 paired interactions, focusing on interactional features of candidates’ performance. We thematically analyzed the verbal reports to inform a draft checklist and materials, which were then trialled by four experienced teachers. Informed by both data sources, the final product comprised (a) a detailed IC checklist with nine main categories and over 50 sub-categories, accompanying detailed description of each area and feedback to learners, which teachers can adapt to suit their teaching and testing contexts, and (b) a concise IC checklist with four categories and bite-sized feedback for real-time classroom assessment. IC, a key aspect of face-to-face communication, is under-researched and under-explored in second/foreign language teaching, learning, and assessment contexts. This in-depth treatment of it, therefore, stands to contribute to learning contexts through raising teachers’ and learners’ awareness of micro-level features of the construct, and to assessment contexts through developing a more comprehensive understanding of the construct.
    • The discourse of the IELTS Speaking Test: interactional design and practice

      Seedhouse, Paul; Nakatsuhara, Fumiyo (Cambridge University Press, 2018-02-15)
      The volume provides a unique dual perspective on the evaluation of spoken discourse in that it combines a detailed portrayal of the design of a face-to-face speaking test with its actual implementation in interactional terms. Using many empirical extracts of interaction from authentic IELTS Speaking Tests, the book illustrates how the interaction is organised in relation to the institutional aim of ensuring valid assessment. The relationship between individual features of the interaction and grading criteria is examined in detail across a number of different performance levels.
    • Establishing test form and individual task comparability: a case study of a semi-direct speaking test

      Weir, Cyril J.; Wu, Jessica R.W.; University of Luton; Language Training and Testing Center, Taiwan (SAGE, 2006-04-01)
      Examination boards are often criticized for their failure to provide evidence of comparability across forms, and few such studies are publicly available. This study aims to investigate the extent to which three forms of the General English Proficiency Test Intermediate Speaking Test (GEPTS-I) are parallel in terms of two types of validity evidence: parallel-forms reliability and content validity. The three trial test forms, each containing three different task types (read-aloud, answering questions and picture description), were administered to 120 intermediate-level EFL learners in Taiwan. The performance data from the different test forms were analysed using classical procedures and Multi-Faceted Rasch Measurement (MFRM). Various checklists were also employed to compare the tasks in different forms qualitatively in terms of content. The results showed that all three test forms were statistically parallel overall and Forms 2 and 3 could also be considered parallel at the individual task level. Moreover, sources of variation to account for the variable difficulty of tasks in Form 1 were identified by the checklists. Results of the study provide insights for further improvement in parallel-form reliability of the GEPTS-I at the task level and offer a set of methodological procedures for other exam boards to consider.
    • Exploring performance across two delivery modes for the IELTS Speaking Test: face-to-face and video-conferencing delivery (Phase 2)

      Nakatsuhara, Fumiyo; Inoue, Chihiro; Berry, Vivien; Galaczi, Evelina D. (IELTS Partners, 2017-10-04)
      Face-to-face speaking tests are widely used, since they allow the elicitation of interactional skills. However, face-to-face speaking test administration is also logistically complex, resource-intensive and can be difficult to conduct in geographically remote or politically sensitive areas. Recent advances in video-conferencing technology now make it possible to engage in online face-to-face interaction more successfully than was previously the case, thus reducing dependency upon physical proximity. A major study was, therefore, commissioned to investigate how new technologies could be harnessed to deliver the face-to-face version of the IELTS Speaking test.  Phase 1 of the study, carried out in London in January 2014, presented results and recommendations of a small-scale initial investigation designed to explore what similarities and differences, in scores, linguistic output and test-taker and examiner behaviour, could be discerned between face-to-face and internet-based video-conferencing delivery of the Speaking test (Nakatsuhara, Inoue, Berry and Galaczi, 2016). The results of the analyses suggested that the speaking construct remains essentially the same across both delivery modes.  This report presents results from Phase 2 of the study, which was a larger-scale follow-up investigation designed to: (i) analyse test scores obtained using more sophisticated statistical methods than was possible in the Phase 1 study; (ii) investigate the effectiveness of the training for the video-conferencing-delivered test which was developed based on findings from the Phase 1 study; (iii) gain insights into the issue of sound quality perception and its (perceived) effect; (iv) gain further insights into test-taker and examiner behaviours across the two delivery modes; and (v) confirm the results of the Phase 1 study. Phase 2 of the study was carried out in Shanghai, People’s Republic of China in May 2015.
Ninety-nine (99) test-takers each took two speaking tests under face-to-face and internet-based video-conferencing conditions. Performances were rated by 10 trained IELTS examiners. A convergent parallel mixed-methods design was used to allow for collection of an in-depth, comprehensive set of findings derived from multiple sources. The research included an analysis of rating scores under the two delivery conditions, test-takers’ linguistic output during the tests, as well as short interviews with test-takers following a questionnaire format. Examiners responded to two feedback questionnaires and participated in focus group discussions relating to their behaviour as interlocutors and raters, and to the effectiveness of the examiner training. Trained observers also took field notes from the test sessions and conducted interviews with the test-takers.  Many-Facet Rasch Model (MFRM) analysis of test scores indicated that, although the video-conferencing mode was slightly more difficult than the face-to-face mode, when the results of all analytic scoring categories were combined, the actual score difference was negligibly small, thus supporting the Phase 1 findings. Examination of language functions elicited from test-takers revealed that significantly more test-takers asked questions to clarify what the examiner said in the video-conferencing mode (63.3%) than in the face-to-face mode (26.7%) in Part 1 of the test. Sound quality was generally positively perceived in this study, being reported as 'Clear' or 'Very clear', although the examiners and observers tended to perceive it more positively than the test-takers. There did not seem to be any relationship between sound quality perceptions and the proficiency level of test-takers. While 71.7% of test-takers preferred the face-to-face mode, slightly more test-takers reported that they were more nervous in the face-to-face mode (38.4%) than in the video-conferencing mode (34.3%).  
All examiners found the training useful and effective, the majority of them (80%) reporting that the two modes gave test-takers equal opportunity to demonstrate their level of English proficiency. They also reported that it was equally easy for them to rate test-taker performance in face-to-face and video-conferencing modes.  The report concludes with a list of recommendations for further research, including suggestions for further examiner and test-taker training, resolution of technical issues regarding video-conferencing delivery and issues related to rating, before any decisions about deploying a video-conferencing mode of delivery for the IELTS Speaking test are made.
    • Exploring performance across two delivery modes for the same L2 speaking test: face-to-face and video-conferencing delivery: a preliminary comparison of test-taker and examiner behaviour

      Nakatsuhara, Fumiyo; Inoue, Chihiro; Berry, Vivien; Galaczi, Evelina D. (The IELTS Partners: British Council, Cambridge English Language Assessment and IDP: IELTS Australia, 2016-11-10)
      This report presents the results of a preliminary exploration and comparison of test-taker and examiner behaviour across two different delivery modes for an IELTS Speaking test: the standard face-to-face test administration, and test administration using Internet-based video-conferencing technology. The study sought to compare performance features across these two delivery modes with regard to two key areas:  • an analysis of test-takers’ scores and linguistic output on the two modes and their perceptions of the two modes  • an analysis of examiners’ test management and rating behaviours across the two modes, including their perceptions of the two conditions for delivering the speaking test.  Data were collected from 32 test-takers who took two standardised IELTS Speaking tests under face-to-face and internet-based video-conferencing conditions. Four trained examiners also participated in this study. The convergent parallel mixed methods research design included an analysis of interviews with test-takers, as well as their linguistic output (especially types of language functions) and rating scores awarded under the two conditions. Examiners provided written comments justifying the scores they awarded, completed a questionnaire and participated in verbal report sessions to elaborate on their test administration and rating behaviour. Three researchers also observed all test sessions and took field notes.  While the two modes generated similar test score outcomes, there were some differences in functional output and examiner interviewing and rating behaviours. This report concludes with a list of recommendations for further research, including examiner and test-taker training and resolution of technical issues, before any decisions about deploying (or not) a video-conferencing mode of delivery for the IELTS Speaking test are made.
    • Exploring the use of video-conferencing technology in the assessment of spoken language: a mixed-methods study

      Nakatsuhara, Fumiyo; Inoue, Chihiro; Berry, Vivien; Galaczi, Evelina D.; University of Bedfordshire; British Council; Cambridge English Language Assessment (Taylor & Francis, 2017-02-10)
      This research explores how internet-based video-conferencing technology can be used to deliver and conduct a speaking test, and what similarities and differences can be discerned between the standard and computer-mediated face-to-face modes. The context of the study is a high-stakes speaking test, and the motivation for the research is the need for test providers to keep under constant review the extent to which their tests are accessible and fair to a wide constituency of test takers. The study examines test-takers’ scores and linguistic output, and examiners’ test administration and rating behaviors across the two modes. A convergent parallel mixed-methods research design was used, analyzing test-takers’ scores and language functions elicited, examiners’ written comments, feedback questionnaires and verbal reports, as well as observation notes taken by researchers. While the two delivery modes generated similar test score outcomes, some differences were observed in test-takers’ functional output and the behavior of examiners who served as both raters and interlocutors.
    • The IELTS Speaking Test: what can we learn from examiner voices?

      Inoue, Chihiro; Khabbazbashi, Nahal; Lam, Daniel M. K.; Nakatsuhara, Fumiyo; University of Bedfordshire (2018-11-25)
    • Interactional competence with and without extended planning time in a group oral assessment

      Lam, Daniel M. K. (Routledge, Taylor & Francis Group, 2019-05-02)
      Linking one’s contribution to those of others is a salient feature demonstrating interactional competence in paired/group speaking assessments. Although such responses must be constructed spontaneously during real-time interaction, the amount and nature of pre-task preparation in paired/group speaking assessments may influence how such an ability (or lack thereof) manifests in learners’ interactional performance. Little previous research has examined the effect of planning time on interactional aspects of paired/group speaking task performance. Within the context of school-based assessment in Hong Kong, this paper analyzes the discourse of two group interactions performed by the same four student-candidates under two conditions: (a) with extended planning time (4–5 hours), and (b) without extended planning time (10 minutes), with the aim of exploring any differences in student-candidates’ performance of interactional competence in this assessment task. The analysis provides qualitative discourse evidence that extended planning time may impede the assessment task’s capacity to discriminate between stronger and weaker candidates’ ability to spontaneously produce responses contingent on previous speaker contribution. Implications for the implementation of preparation time for the group interaction task are discussed.
    • Investigating examiner interventions in relation to the listening demands they make on candidates in oral interview tests

      Nakatsuhara, Fumiyo (John Benjamins, 2018-08-08)
      Examiners intervene in second language oral interviews in order to elicit intended language functions, to probe a candidate’s proficiency level or to keep the interaction going. Interventions of this kind can affect the candidate’s output language and score, since the candidate is obliged to process them as a listener and respond to them as a speaker. This chapter reports on a study that examined forty audio-recorded interviews of the oral test of a major European examination board, with a view to examining examiner interventions (i.e., questions, comments) in relation to the listening demands they make upon candidates. Half of the interviews involved candidates who scored highly on the test while the other half featured low-scoring candidates. This enabled a comparison of the language and behaviour of the same examiner across candidate proficiency levels, to see how they were modified in response to the communicative competence of the candidate. The recordings were transcribed and analyzed with regard to a) types of examiner intervention in terms of linguistic and pragmatic features and b) the extent to which the interventions varied in response to the proficiency level of the candidate. The study provides a new insight into examiner-examinee interactions, by identifying how examiners are differentiating listening demands according to the task types and the perceived proficiency level of the candidate. It offers several implications about the ways in which examiner interventions engage candidates’ listening skills, and the ways in which listening skills can be more validly and reliably measured when using a format based on examiner-candidate interaction.
    • An investigation into double-marking methods: comparing live, audio and video rating of performance on the IELTS Speaking Test

      Nakatsuhara, Fumiyo; Inoue, Chihiro; Taylor, Lynda (The IELTS Partners: British Council, IDP: IELTS Australia and Cambridge English Language Assessment, 2017-03-01)
      This study compared IELTS examiners’ scores when they assessed test-takers’ spoken performance under live and two non-live rating conditions using audio and video recordings. It also explored examiners’ perceptions towards test-takers’ performance in the two non-live rating modes.  This was a mixed-methods study that involved both existing and newly collected datasets. A total of six trained IELTS examiners assessed 36 test-takers’ performance under the live, audio and video rating conditions. Their scores in the three modes of rating were calibrated using the multifaceted Rasch model analysis.  In all modes of rating, the examiners were asked to make notes on why they awarded the scores that they did on each analytical category. The comments were quantitatively analysed in terms of the volume of positive and negative features of test-takers’ performance that examiners reported noticing when awarding scores under the three rating conditions.  Using selected test-takers’ audio and video recordings, examiners’ verbal reports were also collected to gain insights into their perceptions towards test-takers’ performance under the two non-live conditions.  The results showed that audio ratings were significantly lower than live and video ratings for all rating categories. Examiners noticed more negative performance features of test-takers under the two non-live rating conditions than the live rating condition. The verbal report data demonstrated how having visual information in the video-rating mode helped examiners to understand test-takers’ utterances, to see what was happening beyond what the test-takers were saying and to understand with more confidence the source of test-takers’ hesitation, pauses and awkwardness in their performance.  The results of this study have, therefore, offered a better understanding of the three modes of rating, and a recommendation was made regarding enhanced double-marking methods that could be introduced to the IELTS Speaking Test.