Cristin result ID: 1928590
Last modified: 25 August 2021, 10:37
NVI reporting year: 2021
Result
Academic article
2021

Benchmarking PySyft Federated Learning Framework on MIMIC-III Dataset

Contributors:
  • Andrius Budrionis
  • Magda Miara
  • Piotr Miara
  • Szymon Wilk and
  • Johan Gustav Bellika

Journal

IEEE Access
ISSN 2169-3536
e-ISSN 2169-3536
NVI level 1

About the result

Academic article
Year of publication: 2021
Published online: 2021
Open Access

Description

Title

Benchmarking PySyft Federated Learning Framework on MIMIC-III Dataset

Abstract

The adoption of advanced data analytics methods has been limited in industries governed by strict data reuse regulations, such as healthcare. Barriers to data access and sharing have affected numerous research and development initiatives in healthcare, resulting in major delays, extensive use of resources for data access, and findings originating from datasets that are too small to be generalizable. Federated machine learning presents a solution to the problems health data analytics projects are facing by providing a way of complying with strict regulatory requirements without sacrificing privacy. Computing frameworks supporting federated machine learning are still in their infancy, and their performance in realistic settings has been studied only to a limited extent. To expand the existing knowledge on federated learning in realistic deployment settings, three groups of experiments were designed, comparing the performance of a neural network-based model trained in a federated manner to that of an equivalent baseline model trained on centralized data storage. Experiments were conducted on the MIMIC-III dataset and modelled a binary classification problem predicting in-hospital mortality. The effects that varying amounts of data, number of computational nodes, and data distribution in the federated network had on model performance and on training and inference durations were studied. Experiments demonstrated predictive performance comparable to that of the baseline for models trained in federated settings in terms of area under the ROC curve and F1 scores. Data distribution across computing nodes showed minimal to no effect on model performance or on training and inference durations. However, federated model training and inference took approximately 9 and 40 times longer, respectively, than the equivalent tasks executed in centralized settings.
These results indicate that federated learning is a viable solution for enabling advanced data analytics in environments regulated by strict privacy requirements.

Contributors

Andrius Budrionis

  • Affiliation:
    Author
    at Nasjonalt senter for e-helseforskning ved Universitetssykehuset Nord-Norge HF

Magda Miara

  • Affiliation:
    Author
    at Politechnika Poznanska

Piotr Miara

  • Affiliation:
    Author
    at Politechnika Poznanska

Szymon Wilk

  • Affiliation:
    Author
    at Politechnika Poznanska

Johan Gustav Bellika

  • Affiliation:
    Author
    at Nasjonalt senter for e-helseforskning ved Universitetssykehuset Nord-Norge HF