View slides from Dr. Chen's related Distinguished Faculty talk.
Electronic health record (EHR) data hold great promise to improve patient health, but how can we accurately integrate evidence from large EHR datasets in hospitals that, due to privacy concerns, are not allowed to share their data? All of the traditional ways of dealing with these data have their merits, but all of them present drawbacks as well. Thus the call for the highly accurate new method that Yong Chen, PhD, and colleagues have recently introduced.
“Pooled analysis of patient-level data is the gold standard for accuracy. However, it requires researchers to go through lengthy data-sharing agreements and IRB approval. And it is clearly not privacy-preserving, as patient-level data are exchanged across hospitals,” Dr. Chen explains. Most commonly, researchers employ meta-analysis: They do separate analyses of their own data, and then pool the results together. However, this method introduces biases in important applications such as pharmacovigilance studies. Various distributed algorithms present yet another alternative: Researchers run separate analyses of their own data and iteratively update the model by exchanging their individual analysis results. “This procedure is the state of the art, but it is not communication-efficient: Researchers must exchange results in multiple rounds across hospitals,” Dr. Chen comments. An equally important issue is between-hospital heterogeneity, which many existing algorithms ignore. Such heterogeneity is common, because patient populations and standards of care vary across hospitals. Special attention is needed to account for this variation and to avoid biases in distributed analyses.
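To make the meta-analysis step concrete, here is a minimal sketch of one common way to pool site-level results: fixed-effect inverse-variance weighting. It illustrates the traditional approach described above, not Dr. Chen's method, and the function name and the two hospitals' numbers are hypothetical. Each hospital shares only an effect estimate and its standard error, never patient-level data.

```python
import math

def inverse_variance_meta(estimates, std_errors):
    """Fixed-effect inverse-variance pooling of per-site estimates.

    Hypothetical helper for illustration only. Each site contributes
    one effect estimate (for example, a log odds ratio) and its
    standard error.
    """
    # More precise sites (smaller standard error) get larger weights.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se

# Made-up summary results from two hospitals (assumed values).
site_estimates = [0.40, 0.10]
site_std_errors = [0.10, 0.20]

pooled, pooled_se = inverse_variance_meta(site_estimates, site_std_errors)
print(f"pooled estimate = {pooled:.3f}, standard error = {pooled_se:.3f}")
```

Because this weighting assumes every hospital is estimating the same underlying effect, it has no way to represent between-hospital heterogeneity.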
“Our methods, which are based on a novel distributed learning framework, require hospitals only to communicate their analysis results in one single round—yet they are highly accurate, preserve privacy and account for between-hospital heterogeneity,” he says. “My colleagues and I hope to advance next generation data-sharing and the way we integrate evidence from multiple data sources—thus helping to enable reproducible scientific discovery while protecting patients’ privacy.”