
Histopathological evaluation of biopsies is the clinical gold standard, yet AI-based grading systems often fail when confronted with variations in tissue preparation and imaging. Using a large multicenter prostate cancer dataset spanning six cohorts across three countries, we systematically assess how differences in section thickness, staining protocols, and scanners affect AI performance. We show that such variations can significantly degrade grading accuracy, which poses important risks for the clinical deployment of AI in pathology. To address this, we develop PCAI, an AI-based grading framework trained on patient outcomes rather than on subjective expert annotations. PCAI incorporates algorithmic strategies, such as domain adversarial training and credibility-guided color adaptation, to improve robustness. With these strategies, the framework consistently outperforms both baseline AI models and experienced pathologists across multiple external test cohorts.
