Date of Award


Degree Type


Degree Name

Doctor of Philosophy (PhD)


Educational Policy Studies

First Advisor

T. C. Oshima, Ph.D.

Second Advisor

William Curlette

Third Advisor

Hongli Li, Ph.D.

Fourth Advisor

Frances McCarty, Ph.D.

Fifth Advisor

Teresa K. Snow, Ph.D.


This study uses simulated data to compare two methods of calculating Differential Test Functioning (DTF): Raju’s DFIT, a parametric method that measures the squared difference between two Test Characteristic Curves (Raju, van der Linden & Fleer, 1995), and a variance estimator based on the Mantel-Haenszel/Liu-Agresti (MH/LA) method, a non-parametric method implemented in the DIFAS program (Penfield, 2005).
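
As an illustration of the parametric index described above, the following minimal sketch computes a DFIT-style DTF value as the mean squared difference between the focal-group and reference-group Test Characteristic Curves, averaged over focal-group ability values. The three-parameter logistic item model, the scaling constant 1.7, the item parameters, the ability grid, and all function names are assumptions chosen for illustration; they are not taken from the study or from any DFIT software.

```python
import math

def item_prob(theta, a, b, c=0.0):
    """Probability of a correct response under a 3PL model (assumed here)."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

def tcc(theta, items):
    """Test Characteristic Curve: expected test score at ability theta."""
    return sum(item_prob(theta, a, b, c) for a, b, c in items)

def dtf(focal_thetas, ref_items, foc_items):
    """Mean squared TCC difference over the focal-group ability values."""
    diffs = [tcc(t, foc_items) - tcc(t, ref_items) for t in focal_thetas]
    return sum(d * d for d in diffs) / len(diffs)

# Hypothetical (a, b, c) item parameters; the second item is harder
# for the focal group, which is a simple form of uniform DIF.
ref_items = [(1.0, -0.5, 0.2), (1.2, 0.0, 0.2), (0.8, 0.5, 0.2)]
foc_items = [(1.0, -0.5, 0.2), (1.2, 0.5, 0.2), (0.8, 0.5, 0.2)]

# Hypothetical focal-group abilities on an evenly spaced grid from -2 to 2.
focal_thetas = [-2.0 + 0.5 * k for k in range(9)]

dtf_no_dif = dtf(focal_thetas, ref_items, ref_items)
dtf_with_dif = dtf(focal_thetas, ref_items, foc_items)
print(dtf_no_dif, dtf_with_dif)
```

With identical item parameters in both groups the index is zero, and it becomes positive once any item functions differently. Because the differences are summed before squaring, DIF in opposite directions across items can cancel at the test level.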

Most research on differential functioning has focused on Differential Item Functioning (DIF; Pae & Park, 2006), and theory and empirical studies indicate that DTF is the summation of DIF in a test (Donovan, Drasgow & Probst, 2000; Ellis & Mead, 2000; Nandakumar, 1993). Perhaps because of this, the measurement of DTF is under-investigated. The study of DTF is nevertheless important for several reasons. From a statistical viewpoint, items, when compared to tests, are small and unreliable samples (Gierl, Bisanz, Bisanz, Boughton, & Khaliq, 2001). As an aggregate measure of DIF, DTF can present an overall view of the effect of differential functioning, even when no single item exhibits significant DIF (Shealy & Stout, 1993b). Finally, decisions about examinees are made at the test level, not the item level (Ellis & Raju, 2003; Jones, 2000; Pae & Park, 2006; Roznowski & Reith, 1999; Zumbo, 2003).

Overall, both methods performed as expected, with some exceptions. DTF tended to increase with DIF magnitude and with sample size. The MH/LA method generally showed greater rates of DTF than DFIT. It was also especially sensitive to group distribution differences (impact), identifying them as DTF where DFIT did not. An empirical cutoff value seemed to work as a method of determining statistical significance for the MH/LA method. Plots of the MH/LA DTF indicator showed a tendency toward an F-distribution for equal reference and focal group sizes, and toward a normal distribution for unequal sample sizes. Areas for future research are identified.