Date of Award


Degree Type


Degree Name

Doctor of Philosophy (PhD)



First Advisor

Ritu Aneja

Second Advisor

Yi Jiang

Third Advisor

Arkadiusz Gertych


Cancer recurrence is the major cause of cancer mortality. Despite tremendous research efforts, there is a dearth of biomarkers that reliably predict the risk of cancer recurrence. The biomarkers and tools currently available in the clinic are of limited use in accurately identifying patients at higher risk of recurrence. Consequently, cancer patients suffer from either under- or over-treatment. Recent advances in machine learning and image analysis have facilitated the development of techniques that translate digital images of tumors into a rich source of new data. Leveraging these computational advances, my work addresses the unmet need to find risk-predictive biomarkers for Triple Negative Breast Cancer (TNBC), Ductal Carcinoma In Situ (DCIS), and Pancreatic Neuroendocrine Tumors (PanNETs). I have developed unique, clinically facile models that determine the risk of recurrence, whether local, invasive, or metastatic, in these tumors. All models employ digitized images of hematoxylin and eosin (H&E) stained patient tumor samples as the primary source of data. The TNBC models (n=322) identified unique signatures from a panel of 133 protein biomarkers relevant to breast cancer to predict the site of metastasis (brain, lung, liver, or bone) for TNBC patients. Even our least significant model (bone metastasis) offered greater prognostic value than clinicopathological variables (hazard ratio [HR] of 5.123 vs. 1.397; p<0.05). A second model predicted 10-year recurrence risk in women with DCIS treated with breast-conserving surgery by identifying prognostically relevant features of tumor architecture from digitized H&E slides (n=344), using a novel two-step classification approach. In the validation cohort, our DCIS model provided a significantly higher HR (6.39) than any clinicopathological marker (p<0.05). The third model is a deep learning-based, multi-label (annotation followed by metastasis association) whole slide image analysis pipeline (n=90) that identified a PanNET high-risk group with a more than 8-fold higher risk of metastasis (versus the low-risk group; p<0.05), regardless of confounding clinical variables. These machine learning-based models may guide treatment decisions and demonstrate proof-of-principle that computational pathology has tremendous clinical utility.

Available for download on Saturday, May 01, 2021