Devising a solution to the problems of Cancer awareness in Telangana
Using public-health datasets and machine-learning classifiers, the authors build an app that estimates women's risk of breast and cervical cancer, directs users to nearby screening centers, and supports data-driven awareness campaigns across Telangana.
1. What the method is
The study delivers a pipeline that cleans survey and clinical data, trains decision-tree and support-vector models to flag breast and cervical cancer susceptibility, and couples those predictions with geolocation APIs to recommend the nearest accredited hospital. The resulting toolkit feeds a user-facing awareness app and generates synthetic, fault-free datasets for future analysis.
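The cleaning stage of such a pipeline might look like the sketch below. It is an illustration only: the summary does not say how the authors deduplicated, balanced, or scaled their data, so random oversampling and z-score standardization are assumptions, and the field names (`age`, `pregnancies`, `label`) and sample records are hypothetical.

```python
import random
from statistics import mean, pstdev

def deduplicate(rows):
    """Drop exact duplicate records while keeping first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def oversample_minority(rows, label_key, seed=0):
    """Balance classes by resampling smaller classes with replacement.

    Assumption: the source only says the data were 'balanced'; random
    oversampling is one common way to do that, not necessarily theirs.
    """
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced

def standardize(rows, feature_keys):
    """Rescale each numeric feature to zero mean and unit variance."""
    stats = {}
    for key in feature_keys:
        values = [row[key] for row in rows]
        # Fall back to 1.0 so a constant column does not divide by zero.
        stats[key] = (mean(values), pstdev(values) or 1.0)
    return [
        {**row, **{k: (row[k] - stats[k][0]) / stats[k][1] for k in feature_keys}}
        for row in rows
    ]

# Hypothetical records, not taken from either dataset.
records = [
    {"age": 30, "pregnancies": 2, "label": 0},
    {"age": 30, "pregnancies": 2, "label": 0},  # exact duplicate
    {"age": 45, "pregnancies": 4, "label": 0},
    {"age": 52, "pregnancies": 3, "label": 1},
]
clean = standardize(
    oversample_minority(deduplicate(records), "label"),
    ["age", "pregnancies"],
)
```

In practice the scaling statistics would be computed on the training split only and then reused on the test split, so that no test information leaks into training.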
2. Why the method was developed
Telangana’s screening rates for women’s cancers sit below four percent despite rising incidence. Existing awareness drives are sporadic, and diagnostic facilities are unevenly distributed. The authors sought an automated way to identify high-risk individuals early, steer them toward appropriate clinics, and create a data loop that helps public-health officials focus outreach budgets where they will save the most lives.
3. Who should care
- State-level public-health directors
- Hospital network strategy & outreach teams
- Digital-health product managers building risk-screening apps
4. How the method works
Two open datasets—one from Venezuela for cervical cancer and one from the Breast Cancer Surveillance Consortium—are cleansed, deduplicated, and balanced. Key demographic and lifestyle variables are standardized, then fed into multiple classifiers. Decision trees win for cervical-cancer prediction; support-vector machines top the breast-cancer task. A geocoding layer (PositionStack + MapMyIndia) turns user latitude-longitude into a ranked list of oncology centers, while a simple clustering module highlights districts most in need of awareness drives and follow-up campaigns.
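The ranking step of the geocoding layer can be sketched as follows. This is a minimal illustration that assumes straight-line (haversine) distance as the ranking criterion; the summary does not state how the authors rank centers, and the sketch does not call PositionStack or MapMyIndia. The center names and coordinates are placeholders, not real facilities.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def rank_centers(user_lat, user_lon, centers):
    """Return (name, distance_km) pairs sorted nearest-first."""
    return sorted(
        (
            (name, haversine_km(user_lat, user_lon, lat, lon))
            for name, lat, lon in centers
        ),
        key=lambda pair: pair[1],
    )

# Placeholder centers with approximate city coordinates (hypothetical).
CENTERS = [
    ("Center A (Hyderabad area)", 17.385, 78.487),
    ("Center B (Warangal area)", 17.969, 79.594),
    ("Center C (Karimnagar area)", 18.439, 79.129),
]

ranked = rank_centers(18.0, 79.5, CENTERS)
for name, distance in ranked:
    print(f"{name}: {distance:.1f} km")
```

A production version would more likely rank by road distance or travel time from a routing service, since straight-line distance can mislead where road access is poor.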
5. How it was evaluated
After preprocessing, 688 cervical-cancer records and 15 203 breast-cancer records were split into training and test sets. Accuracy, confusion-matrix balance, and F1 scores were reported for four algorithms: decision tree, SVM, SGD, and random forest. Additional metrics included precision-recall curves and feature-importance rankings to gauge model interpretability.
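For reference, accuracy and F1 both follow directly from the confusion-matrix counts. The sketch below uses the standard binary-classification definitions; the labels are a made-up toy example, not the paper's data, and the function names are illustrative.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Toy labels for illustration only (1 = at risk, 0 = not at risk).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

On imbalanced screening data, accuracy alone can look strong even when many positive cases are missed, which is why F1 and the confusion matrix are worth reading alongside it.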
6. How it performed
Decision trees achieved 100 % training accuracy and 99.4 % test accuracy on cervical-cancer risk, while SVMs posted 99 % training and 98.9 % test accuracy for breast-cancer susceptibility. Confusion-matrix false-negative rates stayed below two percent—critical for public-health triage. *(Source: arXiv 2506.21500, 2025)*
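The false-negative rate is the share of truly at-risk cases that the model misses, FN / (FN + TP), which equals one minus recall. A short sketch with hypothetical counts (not figures from the paper):

```python
def false_negative_rate(false_negatives, true_positives):
    """Fraction of actual positive cases that the model failed to flag."""
    positives = false_negatives + true_positives
    if positives == 0:
        raise ValueError("no positive cases to evaluate")
    return false_negatives / positives

# Hypothetical counts: 3 missed cases out of 200 actual positives.
fnr = false_negative_rate(false_negatives=3, true_positives=197)
print(f"False-negative rate: {fnr:.1%}")
```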