Validating a set of Japanese EFL proficiency tests: demonstrating locally designed tests meet international standards
Authors
Dunlea, Jamie
Issue Date
2015-12
Subjects
Japanese EFL proficiency
EFL
English language assessment
English as a Foreign Language
language assessment
language testing
X162 Teaching English as a Foreign Language (TEFL)
Abstract
This study applied the latest developments in language testing validation theory to derive a core body of evidence that can contribute to the validation of a large-scale, high-stakes English as a Foreign Language (EFL) testing program in Japan. The testing program consists of a set of seven level-specific tests targeting different levels of proficiency. This core aspect of the program was selected as the main focus of this study. The socio-cognitive model of language test development and validation provided a coherent framework for the collection, analysis and interpretation of evidence. Three research questions targeted core elements of a validity argument identified in the literature on the socio-cognitive model. RQ 1 investigated the criterial contextual and cognitive features of tasks at different levels of proficiency. Expert judgment and automated analysis tools were used to analyze a large bank of items administered in operational tests across multiple years. RQ 2 addressed empirical item difficulty across the seven levels of proficiency. An innovative approach to vertical scaling was used to place previously administered items from all levels onto a single Rasch-based difficulty scale. RQ 3 used multiple standard-setting methods to investigate whether the seven levels could be meaningfully related to an external proficiency framework. In addition, the study identified three subsidiary goals: firstly, to evaluate the efficacy of applying international standards of best practice to a local context; secondly, to critically evaluate the model of validation; and thirdly, to generate insights directly applicable to operational quality assurance. The study provides evidence across all three research questions to support the claim that the seven levels in the program are distinct. At the same time, the results provide insights into how to strengthen explicit task specification to improve consistency across levels. This study is the largest application of the socio-cognitive model in terms of the amount of operational data analyzed, and thus makes a significant contribution to the ongoing study of validity theory in the context of language testing. While the study demonstrates the efficacy of the socio-cognitive model selected to drive the research design, it also provides recommendations for further refining the model, with implications for the theory and practice of language testing validation.
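For reference, vertical scaling of the kind described for RQ 2 typically rests on the dichotomous Rasch model, which expresses the probability that person j answers item i correctly as a function of person ability θ_j and item difficulty b_i (a standard formulation, given here for illustration rather than taken from the thesis):

P(X_ij = 1 | θ_j, b_i) = exp(θ_j − b_i) / (1 + exp(θ_j − b_i))

Placing items from all seven levels onto a single scale amounts to estimating every b_i on one common metric, so that empirical item difficulty can be compared directly across levels.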
Citation
Dunlea, J. (2015) 'Validating a set of Japanese EFL proficiency tests: demonstrating locally designed tests meet international standards'. PhD thesis. University of Bedfordshire.
Publisher
University of Bedfordshire
Type
Thesis or dissertation
Language
en
Description
A thesis submitted to the University of Bedfordshire in fulfillment of the requirements for the degree of Doctor of Philosophy
Collections
The following license files are associated with this item:
- Creative Commons
Except where otherwise noted, this item's license is described as http://creativecommons.org/licenses/by-nc-nd/4.0/
Related items
Showing items related by title, author, creator and subject.
- Linking writing and speaking in English as a Second Language assessment. Hamp-Lyons, Liz (Hampton Press, 2012-03)
- Developing a model for investigating the impact of language assessment within educational contexts by a public examination provider. Saville, N.D. (University of Bedfordshire, 2009-01)
There is no comprehensive model of language test or examination impact and how it might be investigated within educational contexts by a provider of high-stakes examinations, such as an international examinations board. This thesis addresses the development of such a model from the perspective of Cambridge ESOL, a provider of English language tests and examinations in over 100 countries. The starting point for the thesis is a discussion of examinations within educational processes generally and the role that examination boards, such as Cambridge ESOL, play within educational systems. The historical context and assessment tradition is an important part of this discussion. In the literature review, the effects and consequences of language tests and examinations are discussed with reference to the better-known concept of washback and how impact can be defined as a broader notion operating at both micro and macro levels. This is contextualised within the assessment literature on validity theory and the application of innovation theories within educational systems. Methodologically, the research is based on a meta-analysis employed to describe and review three impact projects. These three projects were carried out by researchers based in Cambridge to implement an approach to test impact which had emerged during the 1990s as part of the test development and validation procedures adopted by Cambridge ESOL. Based on the analysis, the main outcome and contribution to knowledge is an expanded model of impact designed to provide examination providers with a more effective “theory of action”. When applied within Cambridge ESOL, this model will allow anticipated impacts of the English language examinations to be monitored more effectively and will inform ongoing processes of innovation; this will lead to well-motivated improvements in the examinations and the related systems. Wider applications of the model in other assessment contexts are also suggested.
- The impact of computer interface design on Saudi students’ performance on an L2 reading test. Korevaar, Serge (University of Bedfordshire, 2015-01)
This study investigates the effect of testing mode on lower-level Saudi Arabian test-takers’ performance and cognitive processes when taking an L2 reading test on computer compared with its paper-based counterpart, from an interface design perspective. An interface was developed and implemented in the computer-based version of the L2 reading test used in this study, which was administered to 102 Saudi Arabian university students for quantitative analyses and to an additional eighteen for qualitative analyses. All participants took the same L2 reading test in two modes on two separate occasions in a within-subject design. Statistical tests such as correlations, group comparisons, and item analyses were employed to investigate test-mode effects on test-takers’ performance, while test-takers’ concurrent verbalizations were recorded as they took the reading test to investigate their cognitive processes. Strategies found in both modes were compared through their frequency of occurrence. In addition, a qualitative illustration of test-takers’ cognitive behavior was given to describe the processes involved in taking a lower-level L2 reading test. A mixed-methods approach was used for data collection, with questionnaires, think-aloud protocols, and post-experimental interviews as the main data collection instruments. Results on test-takers’ performance showed no significant difference between the two modes on overall reading performance; however, item-level analyses found significant differences on two of the test’s items. Further qualitative investigation into possible interface-design-related causes for these differences showed no identifiable relationship between test-takers’ performance and the computer-based testing mode. Analyses of cognitive processes showed significant differences in three of the cognitive processes employed by test-takers, indicating that test-takers had more difficulty processing text in the paper-based test than in the computer-based test. Together, the product and process analyses provide convincing supporting evidence for the cognitive validity, content validity, and context validity contributing to the construct validity of the computer-based test used in this study.