Date of Award

Fall 12-11-2023

Degree Type


Degree Name

Doctor of Philosophy (PhD)



First Advisor

Ritu Aneja

Second Advisor

Jun Kong

Third Advisor

Emilius Adrianus Maria Janssen


Background: Triple-negative breast cancer (TNBC) is an aggressive breast cancer subtype that lacks expression of the estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2). Neoadjuvant chemotherapy (NAC), i.e., chemotherapy given before surgery to downstage the tumor, is part of the standard treatment approach for patients with TNBC. However, only 30-40% of patients with TNBC respond well to NAC and achieve a pathological complete response (pCR), i.e., absence of residual disease (RD). The remaining ~60-70% of patients, who do not respond well to NAC, are at a higher risk of disease recurrence and distant metastasis (Met).

Methods: In this study, we developed supervised machine learning (ML) models to distinguish between various histological components in hematoxylin and eosin (H&E)-stained whole slide images (WSIs) of annotated TNBC tissue and to identify features that can predict NAC response and metastasis. In the NAC study, we used H&E-stained WSIs of treatment-naïve biopsies from 85 patients (model development cohort) and 79 patients (validation cohort). Tile-level models took as input preprocessed WSI tiles, each represented by 55 texture features, and were trained and evaluated with a stratified 8-fold cross-validation strategy (the TNBC H&E histology pipeline). Patient-level models used the top eight graph-based features of paired histology classification maps and followed a leave-one-out cross-validation strategy. In the metastasis study, H&E-stained WSIs of adjuvant-treated resections from 115 patients were processed with the TNBC H&E histology pipeline. Patient-level models used graph-based features, normalized clinical features, and metastasis outcome data, together with the synthetic minority oversampling technique (SMOTE) and a nested 4-fold cross-validation strategy.
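As an illustration of the validation strategy described in the Methods, the following minimal Python sketch shows stratified fold assignment combined with SMOTE-style oversampling applied only to the training portion of each split. It is not the dissertation's implementation: the function names (`stratified_folds`, `smote`), the toy two-feature data, the binary labels, and the choice of three nearest neighbors are all assumptions made for the example, and the inner loop of a nested cross-validation (hyperparameter selection) is omitted.

```python
import math
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign sample indices to k folds with near-equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin so every fold gets a similar share of this class.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def smote(features, labels, minority, k_neighbors=3, seed=0):
    """Balance a binary training set by interpolating between a minority
    sample and one of its nearest minority-class neighbors."""
    rng = random.Random(seed)
    minority_rows = [x for x, y in zip(features, labels) if y == minority]
    n_majority = len(labels) - len(minority_rows)
    n_needed = max(0, n_majority - len(minority_rows))
    out_x, out_y = [list(x) for x in features], list(labels)
    if len(minority_rows) < 2:
        return out_x, out_y  # no neighbor available to interpolate toward
    for _ in range(n_needed):
        base = rng.choice(minority_rows)
        others = [row for row in minority_rows if row is not base]
        others.sort(key=lambda row: math.dist(base, row))
        neighbor = rng.choice(others[:k_neighbors])
        t = rng.random()  # position along the segment base -> neighbor
        out_x.append([b + t * (n - b) for b, n in zip(base, neighbor)])
        out_y.append(minority)
    return out_x, out_y


# Hypothetical toy data: 12 majority-class and 4 minority-class patients.
labels = [0] * 12 + [1] * 4
features = [[float(i), float(i % 3)] for i in range(16)]

folds = stratified_folds(labels, k=4)
for held_out, test_idx in enumerate(folds):
    train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
    train_x = [features[i] for i in train_idx]
    train_y = [labels[i] for i in train_idx]
    # Oversample the training split only; the held-out fold stays untouched.
    bal_x, bal_y = smote(train_x, train_y, minority=1)
    print(held_out, len(test_idx), bal_y.count(0), bal_y.count(1))
```

Oversampling inside the loop, after the split, keeps synthetic samples out of the held-out fold, which is the usual reason for pairing SMOTE with cross-validation in this order.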

Results: The NAC ML pipeline achieved 84.1% accuracy, and the Met ML pipeline achieved 80.1% accuracy. The histological class pairs with the strongest predictive ability for NAC response were tumor & tumor-infiltrating lymphocytes (TILs) for pCR and microvessel density & polyploid giant cancer cells (PGCCs) for RD. Similarly, the histological class pairs with the strongest predictive ability for Met were stroma & PGCCs and stroma & tumor. Adding clinical variables to the Met pipelines produced a statistically significant increase in the overall performance metrics of the Met ML models.

Conclusion: Our machine learning pipelines can robustly identify clinically relevant histological classes to predict NAC response and metastatic outcomes in TNBC patients.




Available for download on Friday, December 06, 2024