Date of Award

Spring 5-7-2011

Degree Type


Degree Name

Doctor of Philosophy (PhD)


Educational Policy Studies

First Advisor

Chris T. Oshima, Ph.D.

Second Advisor

Phillip Gagné, Ph.D.

Third Advisor

Raymond Hart, Ph.D.

Fourth Advisor

Philo Hutcheson, Ph.D.


Standardized testing has been part of the American educational system for decades. It has been controversial since its inception, remains controversial today, and will continue to be so. Given current federal educational policies that support increased standardized testing, psychometricians, educators, and policy makers must seek ways to ensure that tests are not biased in favor of one group over another.

In measurement theory, a test item that behaves differently for two groups of examinees is said to exhibit differential item functioning (DIF). Differential item functioning, often conceptualized in the context of item response theory (IRT), describes test items that may favor one group over another after examinees have been matched on ability. It is important to determine whether an item functions significantly differently for one group than for another, regardless of why. Hypothesis testing is used to identify items with statistically significant DIF; an effect size measure quantifies the magnitude of a statistically significant difference.
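The distinction between a significance test and an effect size can be illustrated with the classical Mantel-Haenszel (MH) procedure. The sketch below computes, for one studied item, the sample MH common odds ratio α_MH across matched-score strata, its ETS delta transform Δ_MH = -2.35 ln(α_MH) (the effect size), and the continuity-corrected MH chi-square (the hypothesis test). The `Stratum` layout, the function names, and the simplified A/B/C rule are illustrative assumptions, not the software used in this study. The rule omits the operational ETS requirement that |Δ_MH| be significantly greater than 1 for a C classification.

```python
import math
from dataclasses import dataclass


@dataclass
class Stratum:
    """Hypothetical 2x2 table for one matched-score level k."""
    ref_correct: int    # A_k
    ref_wrong: int      # B_k
    focal_correct: int  # C_k
    focal_wrong: int    # D_k


def mantel_haenszel(strata):
    """Return (alpha_MH, delta_MH, chi2_MH) for one studied item."""
    num = den = 0.0
    sum_a = sum_ea = sum_var = 0.0
    for s in strata:
        a, b = s.ref_correct, s.ref_wrong
        c, d = s.focal_correct, s.focal_wrong
        t = a + b + c + d
        if t < 2:
            continue  # a stratum with < 2 examinees adds nothing and has no variance
        num += a * d / t
        den += b * c / t
        n_ref, n_foc = a + b, c + d
        m1, m0 = a + c, b + d  # column totals: correct, incorrect
        sum_a += a
        sum_ea += n_ref * m1 / t  # E(A_k) under no DIF
        sum_var += n_ref * n_foc * m1 * m0 / (t * t * (t - 1))  # Var(A_k)
    if den == 0 or num == 0 or sum_var == 0:
        raise ValueError("degenerate tables: alpha_MH or its test is undefined")
    alpha = num / den
    delta = -2.35 * math.log(alpha)  # ETS delta scale; negative favors reference
    chi2 = (abs(sum_a - sum_ea) - 0.5) ** 2 / sum_var  # continuity-corrected
    return alpha, delta, chi2


def ets_category(delta, chi2, crit=3.84):
    """SIMPLIFIED A/B/C rule (assumption): A negligible, B moderate, C large."""
    if chi2 < crit or abs(delta) < 1.0:
        return "A"
    if abs(delta) >= 1.5:
        return "C"
    return "B"
```

For a single stratum `Stratum(80, 20, 40, 60)`, α_MH = 6.0 and Δ_MH ≈ -4.21, so the item is both statistically significant and large in magnitude; when the odds of success are equal for the two groups in every stratum, α_MH = 1 and Δ_MH = 0 regardless of sample size.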

This study investigated adding an effect size measure, along with empirically observed power, to the noncompensatory differential item functioning (NCDIF) index of the differential functioning of items and tests (DFIT) framework. The Mantel-Haenszel (MH) parameter served as the benchmark for developing NCDIF’s effect size measure for reporting moderate and large DIF in test items. In addition, by modifying NCDIF’s unique method for determining statistical significance, the study makes NCDIF the first item-level DIF statistic for which both an effect size measure and empirical power can be reported.
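For readers unfamiliar with DFIT, NCDIF for item i is the expected squared difference between the focal-group and reference-group item characteristic curves, taken over the focal group's ability distribution: NCDIF_i = E_F[(P_iF(θ) - P_iR(θ))²]. The sketch below estimates that expectation by averaging over a sample of focal-group abilities under a three-parameter logistic (3PL) model with scaling constant D = 1.7. It assumes both groups' item parameters are already on a common scale, and the function names are hypothetical. It illustrates only the NCDIF index itself, not the effect size measure or significance test developed in this study.

```python
import math
import random


def p_3pl(theta, a, b, c=0.0, D=1.7):
    """3PL item characteristic curve; c = 0 reduces it to the 2PL."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def ncdif(focal_thetas, focal_params, ref_params):
    """Mean over focal examinees of (P_F(theta) - P_R(theta)) ** 2.

    focal_params and ref_params are (a, b, c) tuples for the same item,
    estimated separately in each group (assumed to share one scale).
    """
    total = 0.0
    for theta in focal_thetas:
        d = p_3pl(theta, *focal_params) - p_3pl(theta, *ref_params)
        total += d * d
    return total / len(focal_thetas)


# Usage: focal abilities drawn from N(0, 1) with a fixed seed.
rng = random.Random(7)
thetas = [rng.gauss(0.0, 1.0) for _ in range(2000)]
shifted = ncdif(thetas, (1.0, 0.5, 0.2), (1.0, 0.0, 0.2))  # harder for focal group
```

Because the differences are squared, NCDIF cannot cancel across ability levels; it is zero only when the two curves coincide wherever focal examinees are located.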

Furthermore, this study added substantially to the literature on effect size by also investigating the behavior of two other DIF measures: the Simultaneous Item Bias Test (SIBTEST) and the area measure. Finally, this study makes a significant contribution to the literature by verifying, in a large-scale simulation study, the accuracy of software developed by Roussos, Schnipke, and Pashley (1999) to calculate the true MH parameter. The accuracy of this software had not previously been verified.
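As an illustration of what an area measure quantifies, the sketch below approximates the unsigned area between the focal and reference item characteristic curves with the trapezoid rule on a bounded ability interval. The interval [-10, 10], the step count, and the function names are illustrative assumptions. When the two groups share the same a and c parameters, the curves never cross and the exact area is (1 - c)|b_F - b_R|, which gives a simple check. When the c parameters differ, the curves stay apart by |c_F - c_R| as θ decreases, so the area over the whole θ line is unbounded; that is one reason to integrate over a finite interval.

```python
import math


def p_3pl(theta, a, b, c=0.0, D=1.7):
    """3PL item characteristic curve; c = 0 reduces it to the 2PL."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def unsigned_area(focal_params, ref_params, lo=-10.0, hi=10.0, n=20000):
    """Trapezoid-rule approximation of the integral of |P_F - P_R| on [lo, hi].

    focal_params and ref_params are (a, b, c) tuples on a common scale.
    """
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        theta = lo + i * h
        diff = abs(p_3pl(theta, *focal_params) - p_3pl(theta, *ref_params))
        weight = 0.5 if i in (0, n) else 1.0  # trapezoid end-point weights
        total += weight * diff
    return total * h


# Equal a and c: the exact area is (1 - 0.2) * |0.5 - 0.0| = 0.4.
approx = unsigned_area((1.0, 0.5, 0.2), (1.0, 0.0, 0.2))
```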