Readability for Second Language English Learners
Joon Suh Choi
Abstract
Readability assessment models play a crucial role in a wide range of contexts where it is important to confirm alignment between an individual’s reading level and the difficulty of a text. Despite their importance and utility, currently popular readability formulas are lacking in at least three respects. First, although the majority of English speakers worldwide are non-native English speakers (NNES; Ethnologue, 2022), most formulas were developed using text difficulty ratings gathered from a monolithic group of native English speakers (NES). Second, the corpora used to derive these models are generally either small in scale or lacking in genre diversity, limiting the models’ generalizability. Third, most models rely solely on surface-level linguistic features, which are not strongly grounded in theories of reading comprehension. This study addresses these limitations by developing a large-scale corpus with difficulty ratings collected from four different groups of NES and NNES, and by deriving more advanced readability assessment models from this corpus using linguistic features and large language models (LLMs). Through the derivation and analysis of these models, as well as analyses of the readability ratings from the four language groups, the study provides insights into differences in text difficulty perception across language groups and produces performant, interpretable readability assessment models for each group. These findings have important implications for theories of second language (L2) reading comprehension and for the practical use of readability assessment models across all relevant domains.
