Date of Award


Degree Type


Degree Name

Doctor of Philosophy (PhD)


Educational Policy Studies

First Advisor

T. Chris Oshima, Ph.D.

Second Advisor

Kristen L. Buras, Ph.D.

Third Advisor

Hongli Li, Ph.D.

Fourth Advisor

Keith D. Wright, Ph.D.


The examination of assessment items for potential bias is more important than ever. Items that function differently for examinees of equal ability from different groups are said to exhibit differential item functioning (DIF). Traditionally, DIF has been detected by comparing only two groups at a time. In racial/ethnic pairwise comparisons, White examinees were treated as the reference group and one minority group was treated as the focal group. This pairwise analysis was repeated for each minority group of interest. The practice of comparing minority examinees to White examinees must be troubled from a critical race theory perspective. To address the limitations of pairwise analyses, DIF methods that simultaneously analyze items for DIF based on multiple groups and/or multiple grouping factors have been developed. These methods include the generalized Mantel-Haenszel (GMH) statistic and multiple indicators, multiple causes (MIMIC) confirmatory factor analysis (CFA) models. Recently, a multiple-group non-compensatory DIF (MG-NCDIF) index that uses a random sample of all examinees as a base reference group was developed. This study compared the performance of the MG-NCDIF index with the GMH and MIMIC DIF detection methods in simulated conditions that modeled both uniform and non-uniform DIF. Additionally, the GMH and MIMIC methods, which have historically used a traditional reference group, were modeled using a base reference group. Overall, the MG-NCDIF method exhibited lower power and higher Type I error rates than the MIMIC method. The MG-NCDIF method did outperform the GMH method when non-uniform DIF was simulated via the discrimination (a) parameter only; however, when the difficulty (b) parameter was manipulated (to model uniform DIF, or non-uniform DIF in combination with manipulation of the a parameter), power was higher for the GMH index than for the MG-NCDIF index. Across analyses, GMH exhibited lower Type I error rates than MG-NCDIF.
All three methods exhibited higher power for detecting uniform DIF, and for detecting non-uniform DIF when both the a and b parameters were adjusted, than for detecting non-uniform DIF when the adjustment was made solely to the a parameter. A critical race theory framework guided this study.
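The distinction between uniform and non-uniform DIF drawn above can be illustrated with a minimal sketch. It assumes a two-parameter logistic (2PL) item response function, which the abstract does not specify, and uses hypothetical parameter values and helper names (`p_correct`, `dif_gap`) chosen only for illustration; it does not reproduce the study's simulation design.

```python
import math

def p_correct(theta, a, b):
    """Assumed 2PL item response function: probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical reference-group item parameters (illustrative values only).
A_REF, B_REF = 1.0, 0.0

def dif_gap(theta, a_focal, b_focal):
    """Reference minus focal probability of success at the same ability."""
    return p_correct(theta, A_REF, B_REF) - p_correct(theta, a_focal, b_focal)

thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Uniform DIF: only b differs, so the gap keeps the same sign at every ability.
uniform = [dif_gap(t, A_REF, B_REF + 0.5) for t in thetas]

# Non-uniform DIF via a only: the curves cross at theta = b, so the gap changes sign.
nonuniform = [dif_gap(t, A_REF * 0.5, B_REF) for t in thetas]

for t, u, n in zip(thetas, uniform, nonuniform):
    print(f"theta={t:+.1f}  uniform gap={u:+.3f}  non-uniform gap={n:+.3f}")
```

In this sketch the a-only gaps are negative below the crossing point and positive above it, so they partly cancel when averaged over ability. That cancellation may help explain why DIF produced by the a parameter alone is harder to detect.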

