It is common practice for Chinese language instructors to develop their own instruments for classroom assessment and to make important pedagogical decisions (e.g., assigning grades) based on the results. However, the quality of such instruments has rarely been discussed in the literature. This chapter focuses on the measurement quality of an instructor-developed test used as a final written exam in an undergraduate Chinese language course in the U.S. The test was designed to assess the linguistic knowledge taught in the course and contained 37 binary-scored (0/1) items and 17 constructed-response items. Two four-category rating scales were developed to evaluate the constructed responses. Examinees were 88 students enrolled in the Chinese course. Results showed acceptable overall measurement quality of the test, as indicated by measures of difficulty, discrimination, reliability, and Rasch model fit. The two rating scales, however, were found to include excessive score categories, suggesting measurement redundancy. The findings of this study are intended to raise awareness among Chinese as a second language (CSL) instructors of the potential limitations of their self-developed assessment instruments.


Author accepted manuscript version of a chapter published in:

Li, S., Feng, Y. & Wen, T. (2019). Measurement quality and rating scale functioning of a CSL classroom assessment instrument. In F. Yuan & S. Li (Eds.), Classroom Research on Chinese as a second language (pp. 211–236). New York: CRC Press.