Date of Award

Fall 1-5-2018

Degree Type


Degree Name

Master of Public Health (MPH)


Public Health

First Advisor

Ike Okosun

Second Advisor

Lucy Fike



Cluster Analysis In Epidemiology: An Application to Diabetic Dyslipidemia




INTRODUCTION: Cluster analysis is a rapidly growing area in biostatistics, but its application in epidemiology is inconsistent in the choice of methods, and research articles that perform cluster analyses often do not state why a particular method was chosen. Dyslipidemia is an important determinant of cardiovascular morbidity and mortality among subjects with diabetes. Although a clinical classification based on cardiovascular risk is available to guide the treatment of diabetic dyslipidemia, whether the lipid profiles of people with diabetes occur in clusters is not well understood.

AIM: This study sought to determine (a) whether the lipid profiles of adults with diabetes occur in clusters, (b) how the clusters differ across lipids, and (c) how the choice of cluster analysis method affects the clusters generated from the lipid profiles.

METHODS: Two-step, k-means, and hierarchical cluster analyses were applied to NHANES 2005-2014 data on adults with diabetes to compare the clusters formed by four blood lipids (total cholesterol, HDL, LDL, and triglyceride), both individually and jointly. For ease of comparison, the “single” cluster option (as opposed to the “range” option) was used in the hierarchical method, and the number of clusters specified in the hierarchical and k-means methods was set to match the number generated by the two-step agglomerative tool.
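Fixing the number of clusters does not force different methods to agree on who belongs where. As an illustration only, the following minimal Python sketch (standard library only) runs Lloyd's k-means and an agglomerative hierarchical clustering on one synthetic lipid-like variable with the same k and then reports the cluster sizes. The data are made up, not NHANES. Average linkage with an absolute-difference distance is an assumption, since the abstract does not state the linkage or distance used. The two-step procedure is not reproduced, and all function names are hypothetical.

```python
import random
from collections import Counter


def synthetic_values(seed=42):
    """Hypothetical data: three overlapping Gaussian groups of one lipid-like measure."""
    rng = random.Random(seed)
    groups = [(100, 20, 30), (180, 25, 20), (320, 60, 10)]  # (mean, sd, count), made up
    return [rng.gauss(mu, sd) for mu, sd, n in groups for _ in range(n)]


def kmeans_1d(values, k, iters=100, seed=0):
    """Lloyd's k-means on a single variable; returns one cluster label per value."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)  # initial centers drawn from the data
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(values, new_labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels


def average_linkage(values, k):
    """Agglomerative clustering (average linkage, assumed) cut at exactly k clusters."""
    clusters = [[i] for i in range(len(values))]  # start with singletons

    def dist(a, b):
        # Mean absolute difference over all cross-cluster pairs.
        return sum(abs(values[i] - values[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = dist(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]  # merge the closest pair
        del clusters[y]
    labels = [0] * len(values)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def cluster_sizes(labels):
    """Cluster membership sizes, largest first."""
    return sorted(Counter(labels).values(), reverse=True)


if __name__ == "__main__":
    data = synthetic_values()
    k = 3  # fixed in advance, mirroring the "single" cluster option
    print("k-means sizes:", cluster_sizes(kmeans_1d(data, k)))
    print("average-linkage sizes:", cluster_sizes(average_linkage(data, k)))
```

With overlapping groups the two size lists can differ even though k is identical. This mirrors the kind of method-dependent membership reported below, though the sketch makes no claim about the study's actual software settings.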

RESULTS: Using the two-step approach, total cholesterol yielded two clusters (n=1328 and 1000), HDL yielded three clusters (n=1072, 992 and 260), LDL yielded three clusters (n=964, 1071 and 293), triglyceride yielded three clusters (n=1160, 843 and 325), and all four lipids combined yielded three clusters (n=1083, 432 and 813). The corresponding cluster sizes using hierarchical clustering were 1484 and 844 for total cholesterol; 489, 1772 and 67 for HDL; 1111, 988 and 229 for LDL; 1761, 417 and 150 for triglyceride; and 1400, 891 and 37 for all four lipids combined. Using the k-means approach, the analogous cluster sizes were 1554 and 774 for total cholesterol; 95, 1468 and 765 for HDL; 1111, 943 and 274 for LDL; 1230, 808 and 290 for triglyceride; and 1092, 950 and 286 for all four lipids combined. Goodness-of-fit analysis indicated that the differences in cluster membership size across clustering methods were small.

DISCUSSION: Lipid profiles in adults with diabetes occur in clusters whose sizes vary depending on the clustering method used. The clinical implications of these differences warrant further study.