Ashkan Pirmani, Edward De Brouwer, Adam Arany, Martijn Oldenhof, Antoine Passemiers, Axel Faes, Tomas Kalincik, et al. (73 authors)
npj Digital Medicine (15.357 IF) · 2025
Abstract
The fragmented nature of Real-World Data (RWD) presents significant challenges for research, particularly in low-prevalence diseases like Multiple Sclerosis (MS). Data dispersed across institutions and countries, coupled with restrictive privacy regulations, limits access to comprehensive datasets, delaying the development of predictive models and the understanding of critical outcomes. Federated Learning (FL) offers a collaborative solution by enabling model training without centralizing sensitive patient data, but it often fails to address variations across data providers. Personalized Federated Learning (PFL) adapts models to these differences, improving the precision and relevance of insights. Yet its potential to enhance predictive performance and guide clinical decisions in MS using routine clinical RWD remains largely unexplored. In this study, we evaluate standard FL alongside two personalization strategies for predicting confirmed MS disability progression over two years, using RWD from over 26,000 patients in the MSBase registry. These strategies are (1) AdaptiveDualBranchNet, a novel architecture that selectively exchanges key model parameters, enabling more nuanced adaptation across diverse clinical centers, and (2) fine-tuning the global FL model to better fit each site's local data. We benchmark these methods against fully centralized (pooled data) and local (site-specific) models to assess their relative performance and practicality. Our results show that standard FL methods trail both centralized and local baselines, highlighting the limitations of a uniform global model. In contrast, personalization yields a notable performance boost, with both PFL strategies surpassing all comparators. Specifically, the adaptive and fine-tuned versions of FedProx and FedAVG achieved the highest ROC-AUC scores: FedProx reached 0.8398 ± 0.0019 (adaptive) and 0.8375 ± 0.0019 (fine-tuned), and FedAVG reached 0.8384 ± 0.0014 (adaptive) and 0.8370 ± 0.0016 (fine-tuned). Among the standard FL methods, FedAdam and FedYogi performed best, with ROC-AUC scores of 0.7919 ± 0.0031 and 0.7910 ± 0.0028, respectively. These results underscore that personalization is not a luxury but a necessity, enabling FL to reach, and even exceed, the predictive power of traditional modeling approaches. By providing concrete guidelines for implementing PFL and introducing a new, flexible architecture, this study establishes a clear path toward unlocking FL's untapped potential. Ultimately, these advances support more impactful, privacy-aware predictive modeling that enhances clinical decision-making and patient care in MS and beyond.
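The idea of exchanging only part of a model can be illustrated with a toy sketch. The code below is not the paper's AdaptiveDualBranchNet, whose details the abstract does not give. It assumes a model split into a shared branch and a site-specific branch, with hypothetical names `core` and `local`, and applies a sample-size-weighted average (the usual FedAVG-style rule) to the shared branch only. All values and site sizes are made up for illustration.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def weighted_average(vectors: List[List[float]], weights: List[int]) -> List[float]:
    """Sample-size-weighted mean of equally sized parameter vectors."""
    total = sum(weights)
    return [
        sum(w * v[i] for v, w in zip(vectors, weights)) / total
        for i in range(len(vectors[0]))
    ]


def aggregate_shared(clients: List[Params], sizes: List[int],
                     shared_keys: List[str]) -> Params:
    """Average only the parameters named in shared_keys.

    Parameters under any other key are never sent to the server.
    """
    return {k: weighted_average([c[k] for c in clients], sizes) for k in shared_keys}


def apply_global(client: Params, global_params: Params) -> Params:
    """Overwrite the shared part of a client's model and keep its local part."""
    updated = dict(client)
    updated.update(global_params)
    return updated


# Two hypothetical sites: a shared "core" branch and a site-specific "local" branch.
site_a = {"core": [1.0, 2.0], "local": [0.5]}
site_b = {"core": [3.0, 4.0], "local": [-0.5]}
sizes = [100, 300]  # assumed number of training samples per site

global_core = aggregate_shared([site_a, site_b], sizes, shared_keys=["core"])
site_a = apply_global(site_a, global_core)
site_b = apply_global(site_b, global_core)
```

After one round, both sites hold the same `core` values while each keeps its own `local` values. The second strategy in the abstract, fine-tuning, would instead share every parameter and then continue training the resulting global model on each site's own data.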