Date of Award


Degree Type

Closed Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Educational Policy Studies

First Advisor

Carolyn F. Furlow - Chair

Second Advisor

Phillip Gagne

Third Advisor

T. Chris Oshima

Fourth Advisor

Christopher Domaleski

Abstract

In the field of education, decisions are influenced by the results of various high-stakes measures. Investigating the presence of differential item functioning (DIF) in a set of items ensures that results from these measures are valid. For example, if an item measuring math self-efficacy is identified as having DIF, this indicates that some characteristic (e.g., gender) other than the latent trait of interest may be affecting an examinee’s score on that item. Hierarchical generalized linear modeling (HGLM) enables the modeling of items nested within examinees, with person-level predictors added at level 2 for DIF detection. Unlike traditional DIF detection methods, which require a reference group and a focal group, HGLM allows a continuous person-level predictor to be modeled directly. Instead of dichotomizing a continuous variable associated with DIF into focal and reference groups, the continuous variable can be added at level 2. Further benefits of HGLM are discussed in this study. This study extends the work of Williams and Beretvas (2006), who illustrated the use of HGLM with polytomous items (PHGLM) for DIF detection. Williams and Beretvas compared the PHGLM with the generalized Mantel-Haenszel (GMH) procedure for DIF detection and found that the two performed similarly. In the present study, a Monte Carlo simulation was conducted to evaluate HGLM’s power to detect DIF and its associated Type I error rates, using the constrained form of Muraki’s Rating Scale Model (Muraki, 1990) as the generating model. The two methods were compared when DIF was associated with a continuous variable, which was dichotomized for the GMH and used as a continuous person-level predictor with PHGLM. Of additional interest was the comparison of HGLM’s performance with that of the GMH under a variety of DIF and sample size conditions.
Results showed that sample size, sample size ratio, and DIF magnitude substantially influenced power for both GMH and HGLM. Furthermore, the power of the GMH was comparable to that of HGLM in conditions with large sample sizes. On average, both DIF detection methods showed good Type I error control.
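
As an illustration of the level-2 approach described in the abstract, the following is a minimal sketch of a two-level logistic formulation for dichotomous items, of the kind commonly used for HGLM-based DIF detection. It is not the polytomous model fitted in this study. The symbols ($Y_{ij}$, $X_{qij}$, $W_j$, $\gamma$, $u_{0j}$, $\tau$) and the use of the last item as the reference item are assumptions made for exposition.

```latex
% Illustrative sketch only: dichotomous two-level model, not the study's PHGLM.
% Assumed notation: i = item (1..I), j = examinee, q = studied item,
% W_j = continuous person-level predictor, item I = reference item.
\begin{align*}
\text{Level 1 (items):}\quad
  & \log\frac{p_{ij}}{1 - p_{ij}}
    = \beta_{0j} + \sum_{q=1}^{I-1} \beta_{qj} X_{qij},
  \qquad p_{ij} = P(Y_{ij} = 1) \\
\text{Level 2 (examinees):}\quad
  & \beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j},
  \qquad u_{0j} \sim N(0, \tau) \\
  & \beta_{qj} = \gamma_{q0} + \gamma_{q1} W_j
  \qquad \text{(studied item } q\text{)}
\end{align*}
```

Here $X_{qij}$ equals 1 when $q = i$ and 0 otherwise, and $\gamma_{01}$ captures the overall association between $W_j$ and the latent trait. Under these assumptions, a nonzero $\gamma_{q1}$ indicates DIF on item $q$ related to $W_j$, with no need to split $W_j$ into focal and reference groups.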