Current Research

Harnessing Electronic Health Records Data to Expedite the Diagnosis of Early Stage Ovarian Cancer

Early detection of epithelial ovarian cancer (EOC) remains a critical challenge in oncology: 5-year survival rates fall below 40% because most cases are diagnosed at a late stage. We present a machine learning framework that leverages electronic health records to detect early signs of EOC, potentially enabling intervention at a stage when survival rates exceed 95%. Using retrospective data from UW-Health and NIH All of Us, we constructed an ensemble of tree-based models. Cases and controls were matched 1:1 on demographics and healthcare system interactions, using an optimization-based algorithm that minimizes the distance between matches. Distressed controls were added to reflect real-world settings in which clinicians must distinguish EOC from similar conditions. All data were censored 90 days prior to diagnosis. Our models showed strong performance under stratified 10-fold cross-validation (90/10 train-test split, 30 iterations), with XGBoost and GradBoost achieving AUROC scores of 0.70-0.72 and positive predictive values of 0.61-0.64. SHAP analysis identified three key feature groups: socioeconomic factors (education, income, Community Deprivation Index), blood-based markers (carcinoembryonic antigen, hemoglobin, erythrocytes), and genetic markers. While these results do not directly identify early-stage EOC, they represent a key step toward earlier detection. Moving forward, we aim to develop multi-modal models capable of differentiating EOC stages, enabling the identification of stage-specific predictive features for optimal intervention timing.
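
The 90-day censoring step described above can be illustrated with a minimal Python sketch. It is not the project's actual pipeline: the function name `censor_records`, the record layout, and the example timeline are all hypothetical, and the choice to keep records dated exactly on the cutoff is an assumption. For controls, which have no diagnosis, some index date would be needed (for example the matched case's diagnosis date); that too is an assumption rather than something stated in the abstract.

```python
from datetime import date, timedelta


def censor_records(records, index_date, window_days=90):
    """Drop every record dated after (index_date - window_days).

    Hypothetical helper: `records` is a list of dicts with a "date" key.
    Records dated exactly on the cutoff are kept (an assumption).
    """
    cutoff = index_date - timedelta(days=window_days)
    return [r for r in records if r["date"] <= cutoff]


# Made-up patient timeline, for illustration only.
diagnosis_date = date(2020, 6, 1)
records = [
    {"date": date(2019, 11, 15), "code": "hemoglobin"},
    {"date": date(2020, 2, 10), "code": "cea"},
    {"date": date(2020, 4, 20), "code": "pelvic_ultrasound"},  # inside the window
    {"date": date(2020, 5, 28), "code": "ca125"},              # inside the window
]

kept = censor_records(records, diagnosis_date)
kept_codes = [r["code"] for r in kept]
print(kept_codes)
```

Censoring in this way keeps information recorded close to the diagnosis, such as the diagnostic work-up itself, out of the features, so that the models are evaluated on what would have been visible at least 90 days earlier.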