ABSTRACT

Federated Learning (FL) promises to enhance data-driven health research by enabling collaborative machine learning across distributed datasets without direct data exchange. However, current FL implementations primarily reflect the data-sharing interests of institutional controllers rather than those of the individual patients whose data are at stake. Existing consent mechanisms, like broad consent under HIPAA or explicit consent under the GDPR, fail to provide patients with control over how their data are used. This article explores the integration of smart contracts (SCs) into FL as a mechanism for automating, enforcing, and documenting consent in data transactions. SCs, encoded in decentralized ledger technologies, can ensure that FL processes align with patient preferences by providing an immutably documented and dynamically updatable consent architecture. Integrating SCs into FL and swarm learning (SL) frameworks can mitigate ethico-legal concerns related to patient autonomy, data re-identification, and data use. This approach addresses persistent principal-agent asymmetries in biomedical data sharing by ensuring that patients, rather than data controllers alone, can specify the terms of access to insights derived from their health data. We discuss the implications of this model for regulatory compliance, data governance, and patient engagement, emphasizing its potential to foster public trust in health data ecosystems.

I. INTRODUCTION

Federated learning (FL) promises to enhance data-driven learning models while protecting data for data controllers and processors, offering a critical tool to address global health challenges by accruing the benefits of ‘big data’ without the need to duplicate, exchange or centralize it. While a promising solution, FL frameworks to date are more often implemented1 as mechanisms to observe data protection, sharing preferences and proprietary interests of institutions, organizations and corporations (data controllers), and do not inherently integrate mechanisms to observe data sharing preferences of individuals (patients; citizens) who are the primary subjects of those data (ie the identified or identifiable natural person that personal data relates to). Recently some of us2 presented a mechanism—smart contracts (SCs)—with the potential to empower patients to actively participate in decisions about how and with whom data controllers share their personal health information (PHI). Here, we propose how SCs may be integrated into FL and other variants of decentralized learning like swarm learning (SL)3 to ensure meaningful (ie democratized, transparent, and efficient) patient consent to the exchange of health information. This approach contributes to efforts to create ‘trusted infrastructures’ for data sharing supported by legislative initiatives in the US and Europe (Box 1).

Box 1. Legislative initiatives in the US and Europe for Trusted Data Processing Infrastructures

The European Union’s Data Governance Act,4 for example, focuses on increasing public trust in data intermediaries and thereby encouraging organizations, institutions and industry players to engage in ‘data altruism’ (voluntary data sharing) to fuel data-driven innovations. Moreover, the emerging European Health Data Space (EHDS)5 also aims to empower citizens to access their personal health data and share it with physicians and other health-care providers. It further defines procedures for the secondary use of electronic health data for research and development. A similar initiative in the U.S., the National Strategy to Advance Privacy-Preserving Data Sharing and Analytics (PPDSA), seeks to catalyze innovation and creativity by facilitating data sharing and analytics while protecting sensitive information. Together, these initiatives reflect a growing, transnational recognition of the need to support responsible forms of data collection and stewardship and to proactively counter sources of ethical hesitation around data exchange.

II. FEDERATED LEARNING AND THE CHALLENGE OF PATIENT-CENTRIC CONSENT

FL is a learning paradigm that involves training algorithms collaboratively (across non-co-located data sets) without exchanging the data itself.6 This approach could be used to balance the need for data sharing to improve research and care delivery for the common good with patients’ and clinical research participants’ right to data protection and consent. The requirement to obtain informed consent is rooted in respect for individual autonomy7 and outlined in the Helsinki Declaration as a process whose function is to allow individuals to decide, based on their own preferences and values, whether they want to enroll in a specific study.8 Although the principle of consent is widely embraced in theory within the scientific community, the demands of contemporary biomedical research for repurposing data for multiple uses beyond the original collection purpose stretch the interpretation of what counts as consensual sharing. Specifically, secondary data uses are permissible where individuals have provided ‘broad’ consent (under HIPAA) or where ‘explicit’ consent is justifiably broadened (under Recital 33 of the GDPR). These forms of consent serve as the primary binding agreements for certain types of secondary data use. In cases where data are de-identified (under HIPAA specifications) or anonymized (under GDPR), they enable hospitals, institutions, organizations or companies to buy, sell or exchange patient data. Meanwhile, patients lack visibility into the nature or extent of these exchanges.

Broad (or broadened explicit) consent thus introduces a significant principal-agent problem. Data subjects must place their trust in data controllers (entities who determine the means and purposes of data processing, as stipulated in Article 4(7) of the GDPR) to act as responsible stewards of their data, often without any evidence to support such trust. These forms of consent have been criticized as being outdated, serving as a ‘ritualized entry point’ to clinical care and research9 to protect the interests of data controllers (including not only healthcare centers and research institutions, but also pharmaceutical companies, device manufacturers and other industry players10) rather than constituting a meaningful documentation of patient understanding and agreement. More meaningful forms of consent would enable patient awareness of the nature of data exchange and participation in decisions about the specific contingencies for exchange11,12 (Box 2).

Box 2. What is ‘meaningful consent’?

Current approaches to consent (eg ‘broad’ consent under HIPAA and ‘broadened explicit’ consent under the GDPR) have been criticized as not reflecting the original intentions of consent (ie information, comprehension, and voluntariness, as outlined in the Belmont Report), nor serving other distinct and often underrecognized functions of consent, including the promotion of transparency, integrity, and trustworthiness, each holding intrinsic ethical value.13 Some also argue for greater data ‘apparency,’ or the knowledge that data are being accessed and shared, by whom and for what purposes.

While ‘dynamic’ consent has been proposed as a more meaningful alternative in that it notifies participants and seeks renewed consent as new opportunities for secondary data uses arise, this approach has likewise been criticized for burdening patients who may be uninterested or unwilling, or who may lack sufficient information, to make informed decisions about data sharing on a continual basis.14 Emerging data protection laws (eg Canadian privacy law, as well as the California Consumer Privacy Act (CCPA) and the California Privacy Rights Act (CPRA)) are calling for more ‘meaningful’ forms of consent that: (i) provide specific, standardized, plain-language information about the intended use of personal information; (ii) inform individuals about third parties with which their personal information may be shared; (iii) prohibit the bundling of consent to diverse or unrelated choices into a single contract; and (iv) ensure individuals are aware of what personal information is being collected, with which parties it is being shared, the purpose of its collection, use, or disclosure, and the potential risks and other consequences of these actions.

As progress toward meaningful consent remains stagnant, patient data are becoming more abundant, more sensitive due to difficulties in de-identifying certain types of data (eg genetic), and easier to re-identify, a practice permissible under current U.S. data protection laws. While FL provides a promising solution to deter unwanted attempts to re-identify patient data, or to draw inferences from them, by prohibiting direct access to (and copying and secondary exchange of) data, it continues to leave meaningful patient consent out of the equation. It may seem that FL solves principal-agent dynamics by ensuring that patient data are kept private and unduplicated, but such an assumption draws a false phenomenological distinction between sharing data whole cloth and sharing only insights from those data. Consider that a patient philosophically opposed to sharing personal mental health data with a neurotechnology company may not perceive a great difference between that company being able to see the (de-identified) data itself and its being able to see insights from those data. From an ethico-legal perspective, the knowledge to be gained is key to the intellectual property at stake. It is for this reason that the principal-agent problem does not disappear with FL but is instead exacerbated by FL’s capacity to facilitate and incentivize privacy-protected data exchange at larger scale, as long as the conditions and legal basis for exchange are compliant with HIPAA and GDPR. While broader data sharing is widely considered an essential benefit of FL, data controllers remain in control of data sharing decisions while data subjects remain largely unaware of the nature or purpose of these transactions, despite having legally consented to them. Rather than continue to accept this power imbalance as the status quo, emerging technological capacities provide us with options (perhaps even an ethical obligation) to deliver more effective and meaningful approaches to privacy and informed consent.

III. SMART CONTRACTS: TOOLS FOR MEANINGFUL PATIENT CONSENT IN FEDERATED LEARNING

In the context of FL, smart contracts (SCs) may help to address the principal-agent problem by providing a mechanism to automatically ensure (and immutably document and verify) that data transactions reflect and align with patient-level data sharing preferences. This approach takes advantage of the decentralized nature of FL to distribute patient trust among a network of entities participating in the FL (eg joint controllers, processors) and, more specifically, trust in the computations that govern their transactions, rather than requiring patients to trust a single data controller (eg their hospital). To explore this potential, we first describe what SCs are, then the various functions they may serve in clinical research, and finally how they may be integrated with FL/SL to ensure patient-level consent is automatically observed.

SCs are contractual agreements that are computationally encoded and automatically executed in line with pre-specified (‘if-then’) conditions15 (Box 3). SCs could allow patient consent preferences around data sharing to be computationally built into data execution environments, such that data become accessible only if the terms of a smart contract are met, while remaining inaccessible (via encryption or, later, post-quantum cryptography) to all other unspecified or disallowed entities or applications. A smart contract could be triggered if, for example, a data requester initiates a request that fulfills all the contract’s requirements, such as identity authentication of the data requester (eg via an access control list) or a match with certain data use purposes specified a priori by the patient. Once triggered, the node (a data controller) calls that function to be verified across the rest of the network. Once the conditions for data access are verified across nodes, the data are released and a state change is time-stamped and immutably recorded on the ledger, allowing data controllers visibility over who has requested their data and where such requests have been granted or denied. Further, because the transaction record is historicized on every node (or a critical majority, depending on the protocol), the likelihood of transaction corruption and/or unintended access to properly stored and encrypted data assets is significantly reduced and, some argue, eliminated. Notably, data (or data representations, ie models, in the case of FL) must be stored in a secure location (with the data controller or joint controllers) to avoid ‘backdoor’ access that circumvents interaction with the smart contract.
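The ‘if-then’ logic described above can be illustrated with a minimal Python sketch. It is not drawn from any cited platform: the class name `ConsentContract`, its fields and the example requester and purpose labels are all hypothetical, and a plain in-memory list stands in for the ledger, so the sketch shows only the gating and logging behavior, not network-wide verification or encryption.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentContract:
    """Hypothetical patient-level contract: release only if every condition holds."""
    patient_id: str
    allowed_requesters: set  # stand-in for an access control list
    allowed_purposes: set    # purposes specified a priori by the patient
    log: list = field(default_factory=list)  # stand-in for ledger entries

    def request_access(self, requester: str, purpose: str, timestamp: str) -> bool:
        # 'If' both conditions are met, 'then' access is granted.
        granted = (requester in self.allowed_requesters
                   and purpose in self.allowed_purposes)
        # Every request is recorded, whether granted or denied.
        self.log.append({"requester": requester, "purpose": purpose,
                         "granted": granted, "timestamp": timestamp})
        return granted


contract = ConsentContract("patient-1", {"hospital_a"}, {"oncology_research"})
print(contract.request_access("hospital_a", "oncology_research", "2024-01-01T00:00:00Z"))
print(contract.request_access("company_x", "marketing", "2024-01-02T00:00:00Z"))
```

In a real deployment the check would be executed and verified by the nodes of the network rather than by a single process; the sketch only makes the conditional structure of the contract concrete.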

Box 3. What are Smart Contracts?

Originally proposed by Szabo16 in 1996, a smart contract (SC) is a computer program whose distinctive feature is that it is stored and executed in decentralized ledger (ie blockchain) technologies (DLT). It is activated through a transaction executed in the DLT and, based on the given inputs, produces a deterministic outcome. The main benefit is that the underlying DLT represents an auditable trusted log, which renders transactions permanent, verifiable and resistant to censorship. While a full description of this technology is beyond the scope of this paper, a blockchain can be briefly defined as a network of nodes (eg computers in a peer-to-peer network) that collectively maintain a single, immutable record (ledger) of transactions that is shared and agreed upon by all parties in line with a protocol.
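To make the notion of a tamper-evident record concrete, the following Python sketch chains entries together with SHA-256 hashes so that altering any earlier entry is detectable. It is a single-process illustration only: the names `Ledger` and `block_hash` are hypothetical, and the sketch omits the network of nodes and the consensus protocol that a real blockchain relies on.

```python
import hashlib
import json


def block_hash(index, payload, prev_hash):
    """Deterministic SHA-256 digest over a block's contents."""
    body = json.dumps({"index": index, "payload": payload, "prev": prev_hash},
                      sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class Ledger:
    """Append-only list of blocks, each committing to its predecessor's hash."""
    GENESIS = "0" * 64

    def __init__(self):
        self.blocks = []

    def append(self, payload):
        prev = self.blocks[-1]["hash"] if self.blocks else self.GENESIS
        index = len(self.blocks)
        block = {"index": index, "payload": payload, "prev": prev,
                 "hash": block_hash(index, payload, prev)}
        self.blocks.append(block)
        return block

    def verify(self):
        """Recompute every hash and link; any edit to a past block fails here."""
        prev = self.GENESIS
        for i, b in enumerate(self.blocks):
            expected = block_hash(b["index"], b["payload"], b["prev"])
            if b["index"] != i or b["prev"] != prev or b["hash"] != expected:
                return False
            prev = b["hash"]
        return True


ledger = Ledger()
ledger.append({"event": "consent_granted", "patient": "p1"})
ledger.append({"event": "data_request", "requester": "hospital_a"})
print(ledger.verify())
```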

Smart contracts can thus serve the needs of data controllers as well as data subjects. They relieve data controllers of the burden of manually handling data distribution, offering cost-saving incentives for processing fine-grained access criteria. Smart contracts also offer a potential solution to the principal-agent problem by ensuring that data controllers’ sharing practices (automatically) align with the preferences of data subjects.

IV. USING SMART CONTRACTS TO ENSURE PATIENT CONSENT IS OBTAINED AND OBSERVED

Previous work has explored various uses of smart contracts in clinical research independent of FL, including to ensure scientific integrity in clinical trials such as proper adherence to trial recruitment guidelines and transparent implementation of patient consent17,18 (Box 4.1). Smart contracts could help guard against data manipulation by guaranteeing adherence to the rules of engagement (eg standards of research integrity, transparency, data sharing and reporting consistent with rules set by regulatory bodies or Institutional Review Boards19) and by providing an immutable record of trial history (Box 4.2). Certain models20 suggest that such rules of engagement may be computationally enforced not only at the entry point for organizational entities participating in a trial but also, if desired, at the level of each patient record,21 thus allowing for granular, selective data release in line with patient-led smart contracts. Still others have proposed the use of smart contracts as a means to register and manage data requests and retrievals in a blockchain-based network comprised of organizations wishing to exchange data,22 whereby specific data are exchanged in line with specific criteria like requester authenticity and intended data use purpose (Box 4.3). Others have further extended these approaches to include data query and retrieval capabilities (Box 4.4) within a research-oriented blockchain network23 in which SCs could allow participants to directly grant specific permissions for use of their data at the time of enrollment (via a participant-facing interface, encoded into a smart contract) and to help researchers send and receive queries to the trial database (researcher interface) in line with those preferences.24 These capacities are illustrated in Fig. 1.

Box 4.1. Collecting, Storing and Tracking Informed Consent to Participate

Smart contracts (SCs) may help to ensure scientific integrity in clinical trials, including proper adherence to trial recruitment guidelines and transparent implementation of patient consent (ie preventing back-dating consent or enrollment without consent).25,26 For example, SCs can be useful to collect, store and track patients’ informed consent in a secure, unfalsifiable and publicly verifiable way. SCs may automatically restrict study enrollment to patients who have provided documented and verified consent. Records assigning and revoking consent and permission are made visible to all parties, allowing real-time auditing, and offering a technological solution (time-stamped consent documented in a smart contract) to the U.S. Food and Drug Administration’s call27 for implementing mechanisms to ensure the most recent revised consent forms are in active use in a clinical trial.28 Moreover, smart contracts could enable consent tracking across potentially multiple versions of a study protocol, given that in many cases patient re-consent must be sought due to evolution of risks, significant changes in the research procedures, and worsening of a medical condition.

Box 4.2. Encoding Study Protocols to Ensure Stakeholders Play by the Rules

SCs could help guard against data manipulation (eg endpoint switching, data dredging, and selective publication) by outlining the rules of engagement at the entry points for a clinical trial, deterministically permitting various stakeholders to participate in the research in line with consensus rules encoded in SCs. Stakeholders could include regulators, medical (eg pharma) companies, contract research organizations or other entities, who enter into an agreement by becoming nodes (via smart contract transaction) in a private, permissioned blockchain network (ie with a restricted group of users) employing two layers of smart contracts to govern subsequent research. First, the ‘regulator’ contract (ie with rules potentially set by regulators or trial managers) would act as the primary entry point and hold a data structure containing clinical trial authorization details. Certain stakeholders could then enroll patients via a second-layer ‘trial’ smart contract, encrypted using public/private key encryption, and store their data in the authorized data structure. Depending on the protocol, this data structure could (only) contain anonymized patient information, consent documentation, and a container file allowing storage of successive clinical measurements documented with certain characteristics or formats (time-stamped, de-identified, string-encoded, etc.). This overall structure allows for more transparent data management in clinical trials and ensures that data are documented, stored and potentially shared (see Box 4.3) in ways that are consistent with rules set by regulatory bodies and/or agreed to by stakeholders managing the trial(s). Further, the decentralized ledger provides an immutable record of trial history.
Importantly, a data structure may integrate encryption not only at the entry point for organizational entities participating in the trial but also, if desired, at the level of each patient record,29 establishing a computational infrastructure needed for more granular, selective data release in line with patient-led SCs.
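The two-layer arrangement described in this box can be sketched in Python as follows. This is an illustrative simplification under stated assumptions, not the design of the cited model: `RegulatorContract` and `TrialContract` are hypothetical names, pseudonyms stand in for anonymized patient information, and encryption and ledger recording are omitted.

```python
class RegulatorContract:
    """First layer: holds trial authorization details (hypothetical structure)."""

    def __init__(self):
        self.authorized = {}  # trial_id -> set of authorized stakeholder ids

    def authorize(self, trial_id, stakeholders):
        self.authorized[trial_id] = set(stakeholders)

    def is_authorized(self, trial_id, stakeholder):
        return stakeholder in self.authorized.get(trial_id, set())


class TrialContract:
    """Second layer: enrollment and data entry, gated by the regulator contract."""

    def __init__(self, trial_id, regulator):
        self.trial_id = trial_id
        self.regulator = regulator
        self.records = {}  # pseudonym -> list of successive measurements

    def _check(self, stakeholder):
        if not self.regulator.is_authorized(self.trial_id, stakeholder):
            raise PermissionError("stakeholder not authorized for this trial")

    def enroll(self, stakeholder, pseudonym, consent_documented):
        self._check(stakeholder)
        if not consent_documented:
            raise ValueError("enrollment requires documented consent")
        self.records.setdefault(pseudonym, [])

    def add_measurement(self, stakeholder, pseudonym, measurement):
        self._check(stakeholder)
        if pseudonym not in self.records:
            raise KeyError("patient is not enrolled")
        self.records[pseudonym].append(measurement)


regulator = RegulatorContract()
regulator.authorize("trial-1", ["cro_a"])
trial = TrialContract("trial-1", regulator)
trial.enroll("cro_a", "pseudonym-001", consent_documented=True)
trial.add_measurement("cro_a", "pseudonym-001", {"visit": 1, "value": 7.2})
```

The design choice worth noting is that the trial contract never decides authorization itself; it defers to the regulator contract, which mirrors the idea of a primary entry point governing subsequent research.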

Box 4.3. Using Smart Contracts to Register and Manage Data Transactions

SCs can provide a means to register and manage data requests and retrievals in a blockchain-based network comprised of organizations that wish to exchange data30 (rather than collaborate in a research trial, as in Box 4.2). If a network’s primary purpose is to facilitate and document data exchange, smart contracts offer value as a means to register data assets (proof of existence) and immutably document their exchange according to a logic established in the consensus protocol. A data request specifying the intended purposes/uses for the data is broadcast to the network, and each node can review the request and ‘decide’ whether to share the data. Each node (organization) has its own authority certificate with an API and a second ‘authentication’ API that is integrated with a web server to handle requester authentication. A number of decentralized identity authentication and authorization schemes exist31,32 and could be adapted to this use case. This overall use case conveys the potential of using blockchain-based smart contracts to immutably register data transactions, including documentation of which specific data were exchanged in line with specific criteria like requester authenticity and intended data use purpose.
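Registration of a data asset as ‘proof of existence’ is commonly done by recording a cryptographic digest of the asset rather than the asset itself. The Python sketch below illustrates that idea together with the logging of a request’s stated purpose and outcome. The `DataRegistry` class and its method names are hypothetical, and an in-memory list stands in for the shared ledger.

```python
import hashlib


class DataRegistry:
    """Hypothetical registry: stores digests of data assets, never the data."""

    def __init__(self):
        self.assets = {}        # digest -> owning node
        self.transactions = []  # stand-in for ledger entries

    def register(self, owner, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.assets.setdefault(digest, owner)  # first registrant is kept
        return digest

    def exists(self, data: bytes) -> bool:
        """Anyone holding the data can check that it was registered."""
        return hashlib.sha256(data).hexdigest() in self.assets

    def record_exchange(self, digest, requester, purpose, approved):
        if digest not in self.assets:
            raise KeyError("unknown data asset")
        entry = {"asset": digest, "owner": self.assets[digest],
                 "requester": requester, "purpose": purpose,
                 "approved": approved}
        self.transactions.append(entry)
        return entry


registry = DataRegistry()
asset_id = registry.register("hospital_a", b"de-identified cohort export")
registry.record_exchange(asset_id, "university_b", "diabetes_research", True)
print(registry.exists(b"de-identified cohort export"))
```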

Box 4.4. Ensuring Data Query and Retrieval is Consistent with Patient Preferences

Uses 1–3 have been further extended to include research participation in a blockchain network with data query and retrieval capabilities.33 SCs have been proposed to improve clinical trial processes by helping patients enroll in clinical trials and grant specific permissions for use of their data (patient version) and to help researchers send and receive queries to a trial database (researcher version). Participants can assign permissions for which of their data can be viewed and by whom, and researcher requests for data are written into an immutable record. Following informed consent to participate, some frameworks could allow patients to use a web application to register in the study and set certain permissions on their data that become encoded in an SC. The patient-focused smart contract represents the full array of registered patients and the permissions set (by patients/participants themselves) for their data. Any permission edits they make (dynamic consent) via the application interface are likewise encoded. In their version of the app, researchers could retrieve study data from the database, with query results filtered by patients’ permissions, and the details of these transactions registered in the blockchain.
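The filtering of query results by patient permissions can be sketched as a simple function. The Python example below assumes, purely for illustration, that permissions are held as a mapping from patient to researcher to the set of fields that researcher may view; the function name `filter_query` and the sample field names are hypothetical.

```python
def filter_query(records, permissions, researcher, fields):
    """Return, per patient, only the requested fields that the patient
    has permitted this researcher to view."""
    results = {}
    for patient_id, record in records.items():
        allowed = permissions.get(patient_id, {}).get(researcher, set())
        visible = {f: record[f] for f in fields if f in allowed and f in record}
        if visible:  # patients with nothing visible are omitted entirely
            results[patient_id] = visible
    return results


records = {
    "p1": {"hba1c": 6.1, "genotype": "AA"},
    "p2": {"hba1c": 7.4, "genotype": "AG"},
}
permissions = {
    "p1": {"dr_lee": {"hba1c", "genotype"}},
    "p2": {"dr_lee": {"hba1c"}},
}
print(filter_query(records, permissions, "dr_lee", ["hba1c", "genotype"]))
```

A permission edit (dynamic consent) then amounts to updating the `permissions` mapping, after which subsequent queries are filtered against the new state.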

Proposed Use Cases for Smart Contracts. Figure description: Use case 1 (image from M. Benchoufi et al., Blockchain Protocols in Clinical Trials: Transparency and Traceability of Consent, 6 F1000Research (2017).) illustrates how smart contracts may be used to collect, store and track patients’ stated consent to participate in a research study, as well as static (eg broad) or dynamic preferences for secondary data sharing. Use case 2 (image from T. Nugent et al., Improving Data Transparency in Clinical Trials Using Blockchain Smart Contracts, 5 F1000Research (2016).) illustrates regulator contracts versus trial contracts, where the logic of the regulator contract enforces the trial contract and protocol (see Box 4.2). Use case 3 illustrates automatic tracking of data transactions using blockchain, governed by the trial protocol (Use case 2) and in line with patient consent preferences designated in Use case 1. Use case 4 shows selective data sharing according to consent preferences.
Figure 1


Such examples of smart contract utility (outside of FL) raise four important considerations. First, such blockchain-augmented models offer the capacity to ensure both that enrollment is contingent on explicit consent and that data query and retrieval practices observe the specific consent preferences laid out by each patient. They introduce a level of granularity in automating consent preferences that is absent from broad consent and less reliant on principal-agent trust. Second, blockchain-augmented models describe the patient smart contract as an ‘array’34 or ‘library’35 of documented patient preferences, which may be updated dynamically, offering a reliable mechanism for executing and tracking dynamic consent (though dynamic consent comes with its own challenges, see Box 2). These capacities offer significant advantages over current consent approaches. However, two further aspects introduce residual concerns. Critically, database servers typically remain centralized, necessitating trust from all parties in a central data controller who curates and stewards data, both during and after a trial. Further, in cases where requests are retrieved from this centralized server and sent to researchers for local analysis outside the system at the researcher’s site, researchers are able to obtain their queried data whole cloth,36 opening the possibility for unwanted or unintended data sharing for purposes potentially unrelated to the specific study goals, unrelated to patient benefits, and/or outside of the purview of the original consensus protocol for participation. This arrangement further creates potential for reidentification of data subjects using triangulation and data fusion techniques.

Thus, the use of smart contracts in the above instances circumvents neither the need to trust the controllers of the central database server nor the need to trust the research entities to whom the data are distributed, because once the data are released from governance by the smart contract through whole cloth data exchange, those data become susceptible again to unwanted exchange, reidentification and other unintended uses. Despite its advantages for addressing proximate concerns related to research integrity and observation of patient consent, the use of smart contracts and blockchain as outlined in these models is only a first step in addressing these more ultimate concerns. For smart contracts to truly work as a deterministic assurance of data security and exchange in line with patient preferences, data must only be accessible through the smart contract; that is, data must stay where they are, unduplicated and unduplicatable.

V. INTEGRATING BLOCKCHAIN AND FEDERATED LEARNING

This is precisely the task which FL is designed to fulfill. By ensuring that insights are shared from data rather than the data itself, FL protects the privacy of data subjects as well as proprietary or other sensitive information that may be contained in those data. FL also limits data duplication in line with GDPR principles of data minimization (Article 5(1)(c)). However, on its own, FL still cannot filter algorithmic learning across data sets according to patient preferences and also cannot ensure that filtering methods are impervious to influence. Incorporating smart contracts offers the potential to achieve both of these aims.

Smart contracts are already being integrated into decentralized learning frameworks like FL and SL, primarily as gateways for participation in FL and to set the rules of engagement, similar to the capacities noted above.37 Researchers and industry partners38 have already begun integrating blockchain more broadly into FL, allowing for algorithmic learning that is collaborative and decentralized (spread among several distinct but remote partners), secured (data controllers never fully expose their data) and traceable (model training histories are immutably recorded). To address residual risks of relying on a centralized service to orchestrate FL and fairly distribute models and metadata across the network, a more fully decentralized solution has emerged in the form of swarm learning (SL),39 which dispenses with a dedicated server, shares the parameters via the swarm network and builds algorithmic models independently on private data at the individual sites (or ‘swarm edge nodes’). The use of smart contracts in both FL and SL allows collaborators to immutably document relative contribution scores across nodes, which may help to disentangle participants’ contributions to a collaboratively trained model for epistemic purposes or in cases where determining (eg monetary) reward distribution is necessary if a model is licensed or sold. Existing FL40 and SL41 platforms also already allow users to orchestrate the execution of training tasks and to choose the desired permission regimes (which may differ across nodes) and track all operations on data assets. Every operation (ie computation) executed in the FL environment must meet all the permission constraints before it is added to the ledger and subsequently implemented.
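The interplay between permission checks and collaborative training can be illustrated with a small Python sketch of one aggregation round. It uses sample-weighted federated averaging, a common aggregation rule that we assume here for illustration rather than attribute to the cited platforms. The function names, the `permitted_nodes` set and the use of each node’s share of samples as a ‘contribution’ figure are all hypothetical; the sample share is a crude placeholder, not a contribution score in the sense used by those platforms.

```python
def federated_average(updates):
    """Sample-weighted mean of model weights.

    updates: list of (num_samples, weights) pairs, weights as equal-length lists.
    """
    total = sum(n for n, _ in updates)
    if total <= 0:
        raise ValueError("no samples contributed")
    dim = len(updates[0][1])
    return [sum(n * w[i] for n, w in updates) / total for i in range(dim)]


def run_round(node_updates, permitted_nodes):
    """Aggregate only updates from permitted nodes and return a round log.

    node_updates: dict mapping node id -> (num_samples, weights).
    """
    accepted = {k: v for k, v in node_updates.items() if k in permitted_nodes}
    global_weights = federated_average(list(accepted.values()))
    total = sum(n for n, _ in accepted.values())
    log = {
        "accepted": sorted(accepted),
        "rejected": sorted(set(node_updates) - set(accepted)),
        # Placeholder 'contribution': each accepted node's share of samples.
        "sample_share": {k: n / total for k, (n, _) in accepted.items()},
    }
    return global_weights, log


node_updates = {
    "site_a": (1, [0.0, 2.0]),
    "site_b": (3, [4.0, 2.0]),
    "site_c": (100, [9.0, 9.0]),  # not permitted, so excluded from the average
}
weights, log = run_round(node_updates, permitted_nodes={"site_a", "site_b"})
print(weights, log)
```

In a blockchain-based deployment, the round log is the kind of record that would be written to the ledger, so that which nodes participated in each round remains auditable.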

These examples demonstrate the feasibility of integrating blockchain-based smart contracts into FL/SL environments to manage site enrollment, enforce adherence to study or collaboration protocols, and filter algorithmic access to data in line with organizational preferences. However, these permissions remain specified by data controllers (at the organizational level) rather than directly by patients or research participants (at the level of an individual’s data/um). As such, patients and research participants continue to lack visibility and direct, granular or dynamic control over decisions about whether and how insights from their data are shared.

V.A. Smart Contracts for Patient-Level Consent in Blockchain-Based Federated and Swarm Learning

Here we highlight a model42 demonstrating the technical feasibility of integrating smart contracts into FL/SL as a way to manage access to a distinct individual’s data/um based on that individual’s consent preferences. In such a model, data controllers or processors (eg other organizational entities, government-appointed or private auditors, algorithm developers, etc.) constitute primary nodes in a private, permissioned blockchain network, similar to the scenarios described above; however, data subjects (patients; research participants) are also integrated as integral actors in the model design (Fig. 2). Patients receiving clinical care or participating in a clinical research study specify their consent preferences locally with an arbitrary level of granularity (eg consent to use of all data, consent to use data for certain treatment categories only, consent to use all data but only for certain types or foci of studies). If provided a relevant patient-facing API, patients can grant, revoke, renew or update their consent across the network directly, reducing the need for manual data entry by a local project manager and minimizing the possibility of manipulation, such as inclusion of fake or ineligible (non-consented; expired/revoked) data in an attempt to bias models. Use of a trusted third party to digitally manage consent could provide additional security and integrity, especially if such parties are entities with fiduciary duties. Consent may also be obtained via paper and later digitized to embed consent preferences into smart contracts (though this requires residual trust in the local manager and/or institution more broadly). Once consent preferences are linked to the localized data (via smart contracts), data access can be filtered, accompanied by a synchronized view of consent both locally and remotely in the FL (or SL) network.
When a collaborator deploys a model, results are returned with a list of consents that permit data use, facilitating auditing of which data was accessed for each learning round. Patients may also track the use of their data in a transparent and secure fashion, via a patient-facing interface listing studies in which they have been included, when and by whom.
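The flow described above (patients set preferences, each site filters its local data accordingly, and results are returned together with the list of consents that permitted use) can be sketched in Python. This is a toy illustration under stated assumptions, not the cited model’s implementation: `ConsentRegistry`, `local_round` and `aggregate` are hypothetical names, a per-site mean stands in for a trained model update, and the registry is an in-memory object rather than a smart contract on a ledger.

```python
class ConsentRegistry:
    """Hypothetical stand-in for patient-level consent smart contracts."""

    def __init__(self):
        self._purposes = {}  # patient_id -> set of permitted study purposes

    def grant(self, patient_id, purpose):
        self._purposes.setdefault(patient_id, set()).add(purpose)

    def revoke(self, patient_id, purpose):
        self._purposes.get(patient_id, set()).discard(purpose)

    def permits(self, patient_id, purpose):
        return purpose in self._purposes.get(patient_id, set())


def local_round(records, registry, purpose):
    """Compute a site-level statistic using only consented records.

    records: list of (patient_id, value) pairs held locally by one site.
    Returns None if no local record is consented for this purpose.
    """
    used = [(pid, v) for pid, v in records if registry.permits(pid, purpose)]
    if not used:
        return None
    return {"n": len(used),
            "mean": sum(v for _, v in used) / len(used),
            "consents": sorted(pid for pid, _ in used)}


def aggregate(results):
    """Combine site results and the list of consents that permitted data use."""
    results = [r for r in results if r is not None]
    total = sum(r["n"] for r in results)
    if total == 0:
        return None
    return {"mean": sum(r["n"] * r["mean"] for r in results) / total,
            "consents": sorted(c for r in results for c in r["consents"])}


registry = ConsentRegistry()
registry.grant("p1", "diabetes_research")
registry.grant("p2", "diabetes_research")
site_a = [("p1", 2.0), ("p3", 10.0)]  # p3 has not consented
site_b = [("p2", 4.0)]
round_1 = aggregate([local_round(s, registry, "diabetes_research")
                     for s in (site_a, site_b)])
print(round_1)
```

If a patient later revokes consent, the next round simply omits that patient’s record, and the returned consent list reflects the change, which is what makes per-round auditing possible.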

Encoding Patient Preferences to Ensure Representation in Federated Learning. Figure description: Step 1. Patient consent preferences are encoded into a smart contract that acts as a gatekeeper. Step 2. Patient data is stored in an encrypted database, accessible only through interaction with the smart contract. Step 3. Hospitals & research institutions enroll (via smart contract) in a federated learning study. Data sharing is filtered by patient consent preferences specified in Step 1. Step 4. Transaction logs (and, if desired, contribution scores) are immutably stored and auditable. Credits: Patient image created using ChatGPT4; All other images licensed under Creative Commons and credited (clockwise) to James Fok, Intana Silva, Royyan Razka, Siti Nurhayati, IconLion.
Figure 2


VI. BENEFITS AND IMPLICATIONS

VI.A. Identifying Data Value & Provenance

The capacity to transparently document and execute consent and to track secondary data use opens up a number of game-changing possibilities for participants in the health information exchange ecosystem. On a technical level, it permits greater scientific integrity, providing a mechanism for determining provenance in scientific workflows, identified as a key requirement for advancing AI/ML.43 Provenance can include information about the process and data used to derive a data product (eg an algorithmic model) and about a data set’s quality and authorship. These insights enable both validation and replication of results, critical for data-driven health research.44 Further, as ML models rapidly move to the forefront of learning health systems and play an increasing role in diagnosis and prognosis, a greater imperative (ethical, financial, etc.) arises to implement deterministic means to identify and audit factors contributing to or potentially detracting from data quality. SCs enable auditing and exploration of which data (and their attributes or pedigrees) have informed model development and at which stage(s). SCs could thus act as complements to current initiatives among the ML community to assign versions to datasets and models. The ability of smart contracts to document and quantify relative contributions to model performance could also prove useful for determining the relative quality and value of datasets. Tracking data provenance via SCs could also help to evaluate and ensure inclusivity in clinical trials under the United Nations’ Sustainable Development Goals. Finally, the nature of smart contracts as metadata also opens the possibility (depending on patient consent) for indexing data in ways that could facilitate data query and identification of relevant data for model training, significantly lowering computational costs incurred from unnecessary processing of irrelevant data.
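One way to picture the auditability described here is a hash-chained, append-only log, the basic structure that ledger technologies build on. The sketch below is a simplified illustration only: `ProvenanceLog`, `entry_hash` and the payload fields are hypothetical names, and a single-process Python list offers none of the replication or consensus guarantees of an actual distributed ledger.

```python
import hashlib
import json


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash a payload together with the previous entry's hash."""
    body = json.dumps(payload, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class ProvenanceLog:
    """Append-only log in which each entry commits to all earlier ones."""

    GENESIS = "0" * 64  # placeholder hash preceding the first entry

    def __init__(self):
        self.entries = []  # list of (payload, hash) pairs

    def append(self, payload: dict) -> str:
        prev = self.entries[-1][1] if self.entries else self.GENESIS
        digest = entry_hash(prev, payload)
        self.entries.append((payload, digest))
        return digest

    def verify(self) -> bool:
        """Recompute the chain and report whether every hash still matches."""
        prev = self.GENESIS
        for payload, digest in self.entries:
            if entry_hash(prev, payload) != digest:
                return False
            prev = digest
        return True


log = ProvenanceLog()
log.append({"round": 1, "site": "hospital_a", "consents": ["p1", "p4"]})
log.append({"round": 2, "site": "hospital_b", "consents": ["p2"]})
intact = log.verify()

# Simulate after-the-fact tampering with the first recorded round.
log.entries[0][0]["consents"].append("p9")
tampered_ok = log.verify()

print(intact, tampered_ok)  # True False
```

Because each hash covers the previous one, altering any earlier record invalidates the chain from that point on, which is what makes a record of 'which data informed which round' auditable after the fact.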

VI.B. Greater Public Trust (Lessness)

Use of SCs in this fashion stands to directly empower patients and research participants by involving them in the management of their own data. This aligns with ethical principles of autonomy (consent and control) and data protection central to clinical research under HIPAA and enshrined in the GDPR as fundamental rights. SCs in combination with FL could help to modernize outdated forms of consent by ensuring that decisions about access to PHI are patient-led and far less susceptible to unwanted data sharing and duplication.

Smart contracts also enhance data privacy and security for health data at the individual level, complementing the capabilities of FL to provide data protections at the organizational level, all while facilitating collective knowledge gains. These approaches to data stewardship could help to increase public trust in data intermediaries, a central goal of the EU’s recently passed Data Governance Act (DGA),45 or more specifically, to replace trust in intermediaries with confidence in computation and automation. The assumption underlying the DGA is that if data intermediaries employ certain technical infrastructures to preserve privacy, confidentiality and enable ‘safe’ reuse of data, data controllers (and presumably data subjects) will be more inclined to altruistically exchange data insights. The unique combination of SCs with FL constitutes one such trusted (or ‘trustless’) infrastructure with high potential.

VI.C. Enhanced Patient/Participant Engagement

An additional potential deserving of attention is that patients may be afforded the ability to transparently see where their data is going, once collected. If data are truly stored as a unique copy, accessible only under conditions specified by a smart contract (and by no other means), and if transactions are transparently tracked and conveyed in a format that is widely interpretable, then patients should be able to visualize the full scope of exchanges that involve their PHI (achieving ‘apparency’; see Box 2). Affording patients and research participants greater visibility into the nature of these exchanges permits greater awareness and, importantly, more meaningful decision making about whether, when, with whom and for which purposes they wish to share data. As such, this approach pays greater respect to individuals as participants in rather than subjects of research and reconceptualizes data sharing decisions as shared rather than unilaterally executed by a set of researchers, clinicians, institutions or companies.

Responding to an area of growing interest, SCs may also help to respect the legal rights of patients (under the GDPR) to have their data deleted if requested. While absolute deletion from data sets may require additional steps, patients could use SC interfaces to revoke consent for their data/um, rendering them computationally inaccessible to anyone. However, it remains unclear whether insights derived from their data could ever be removed from already-trained models.
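The distinction between blocking future access and undoing past use can be made concrete with a small sketch. The class and method names below (`ConsentLedger`, `run_round`, `rounds_including`) are hypothetical, and the example assumes a simple all-or-nothing consent flag purely for illustration.

```python
class ConsentLedger:
    """Toy ledger tracking active consents and per-round inclusion."""

    def __init__(self):
        self.active = set()   # patient IDs with currently valid consent
        self.history = []     # (round_number, tuple of included patient IDs)

    def grant(self, patient_id: str) -> None:
        self.active.add(patient_id)

    def revoke(self, patient_id: str) -> None:
        self.active.discard(patient_id)

    def run_round(self, round_number: int) -> tuple:
        # Only currently consented records are eligible for this round.
        included = tuple(sorted(self.active))
        self.history.append((round_number, included))
        return included

    def rounds_including(self, patient_id: str) -> list:
        """Rounds whose training set contained this patient's data."""
        return [n for n, ids in self.history if patient_id in ids]


ledger = ConsentLedger()
ledger.grant("p1")
ledger.grant("p2")

first = ledger.run_round(1)    # ('p1', 'p2')
ledger.revoke("p1")
second = ledger.run_round(2)   # ('p2',)

print(first, second, ledger.rounds_including("p1"))
```

After revocation, `p1` is excluded from round 2 onward, yet the history still shows that round 1 drew on that patient's data. The log makes the earlier use visible, but removing its influence from a model already trained in round 1 is a separate and unresolved problem.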

VII. INCENTIVES ALIGNMENT

A shift to greater patient visibility and control is likely to meet systemic resistance, as it dislocates current and often entrenched sources of control from a small set of powerful players in health information exchange.46 Decentralized learning systems like FL are not by themselves a threat to existing power dynamics that animate the health information exchange economy; however, the inclusion of patient-led SCs could upend these dynamics by enabling individuals to not only see data exchanges but also potentially control them. It should be noted that a growing number of other decentralized/blockchain frameworks propose using smart contracts to govern access to individual health records, where access is controlled directly and granularly by patients rather than by hospitals, research institutes or other organizations.47,48,49 However, arrangements that position only patients at the helm ignore the fact that multiple stakeholders are involved in the generation, maintenance, storage and processing of health data in ways that ethically (and often legally, in cases of intellectual property engaged at various stages of processing) preclude patients from being the sole controllers of their data. For this reason, SC terms should reflect an alignment of incentives across multiple relevant stakeholders, including physicians, hospitals, research institutions, community-based organizations, innovative industry partners and others. SCs offer new opportunities to explore fairer, more democratic and multi-stakeholder deliberation around health data exchange.

An additional consideration is that patients or research participants may not want such granular control over their health data; and the rest of society may not want them to have such control. From a sociocultural perspective, greater control comes with greater responsibility, which requires a minimum knowledge of the consequences of data sharing decisions. Lessons learned around dynamic consent50 suggest that data sharing decisions can place unwanted burden on individuals to provide ongoing ‘opinions’ while lacking confidence in their ability to understand or differentiate among the complex characteristics, goals and implications of research or exchange requests. Further, scholars51,52,53 have compellingly argued that we should consider health data as collective property (‘data commons’) instead of endorsing individual-level property rights. While granting direct control may help to protect individuals’ privacy and dignity, as described above, data sharing decisions have broader impacts for communities and nations. These entities should also help decide how data should be used and for whose benefit.54 Any implementation of smart contracts should thus be done not only with patient preferences but also, ideally, with community deliberation and input,55 fostering community engagement in clinical research.56 This may require a concerted effort among legal scholars, ethicists, and especially social scientists in the design of SCs to ensure they serve not only patient interests but those of distinct communities and broader society.

VIII. CONCLUSION

Expanded use of decentralized learning systems like FL promises to advance data protection at the institutional, organizational and industry levels and foster collaborative discovery. However, simultaneous attention must be given to integrating tools for privacy and data sharing preferences at the individual level. Combining smart contracts with FL (or SL) provides a concrete mechanism to balance the need for greater data accessibility with individuals’ rights to privacy and consent to data sharing. This approach offers more equitable and meaningful participation in decisions about data sharing and secondary data use. It further aligns with legislative initiatives in both the U.S. and E.U. designed to stimulate secure forms of data sharing, particularly for the continued development and refinement of algorithmic and other data-centric AI models which promise to generate novel insights for precision medicine and address global health priorities.

IX. SUMMARY TABLE

What was already known on the topic?

  • Federated learning (FL) is widely recognized as a promising solution to observe data protection, sharing preferences and proprietary interests of institutions, organizations and corporations (data controllers).

  • Smart contracts (SCs) are versatile contractual agreements that are computationally encoded and automatically executed in line with pre-specified (‘if-then’) conditions.

What has this study added to our knowledge?

  • Combining SCs with FL offers a concrete mechanism to observe data sharing preferences not only of data controllers but, importantly, of individuals (patients; citizens) who are the primary subjects of those data.

  • This paper described how SCs can be integrated into FL to ensure more meaningful consent and engagement in secondary data sharing decisions.

ACKNOWLEDGEMENTS

This work was supported by the National Institute of Mental Health of the National Institutes of Health under award number 3R01MH125958-02S1, by a Novo Nordisk Foundation Grant for a scientifically independent International Collaborative Bioscience Innovation & Law Programme (Inter-CeBIL programme - grant no. NNF23SA0087056), and by the European Union (Grant Agreement no. 101057321; the ‘CLASSICA project’). The views presented here are solely those of the authors and do not reflect those of the funders, who were not involved in the study design; in the collection, analysis, or interpretation of data; in the writing of the report; or in the decision to submit the paper for publication. Neither the NIH nor the European Union nor the granting authority can be held responsible for them. We would like to acknowledge Dr Stefano Ferretti for his review of an earlier version of this manuscript.

AUTHOR CONTRIBUTIONS

All authors were involved in the conceptualization of this article. KKQ wrote the original draft, and TM, MMC and MRA were involved in review and editing and in supervision.

Footnotes

1

N. Rieke et al., The Future of Digital Health with Federated Learning, 3 Npj Dig. Med. 1, 1–7 (2020).

2

K. Kostick-Quenet et al., How NFTs Could Transform Health Information Exchange, 375 Science 500, 500–02 (2022).

3

S. Warnat-Herresthal et al., Swarm Learning for Decentralized and Confidential Clinical Machine Learning, 594 Nature 265, 265–70 (2021).

4

European Commission, European Data Governance (Data Governance Act) (2020).

5

R. Raab et al., Federated Electronic Health Records for the European Health Data Space, The Lancet Digital Health (2023).

6

Rieke, supra note 1, at 1–7.

7

T.L. Beauchamp & J.F. Childress, Principles of Biomedical Ethics (Oxford Univ. Press 2001).

8

N.W. Dickert et al., Reframing Consent for Clinical Research: A Function-Based Approach, 17 Am. J. Bioethics 3, 3–11 (2017).

9

B.A. Koenig, Have We Asked Too Much of Consent?, 44 Hastings Ctr. Rep. 33, 33–34 (2014).

10

A. Dahi & M.C. Compagnucci, Device Manufacturers as Controllers—Expanding the Concept of ‘Controllership’ in the GDPR, 47 Comput. L. & Sec. Rev. 105762 (2022).

11

M. Schraefel et al., The Internet of Things: Interaction Challenges to Meaningful Consent at Scale, 24 Interactions 26, 26–33 (2017).

12

A. Cavoukian, Privacy in a Wireless World—The M-Commerce Challenge, 24 I Ways 22, 22–23 (2001).

13

Dickert, supra note 8, at 3–11.

14

K.S. Steinsbekk et al., Broad Consent Versus Dynamic Consent in Biobank Research: Is Passive Participation an Ethical Problem?, 21 Eur. J. Hum. Genetics 897, 897–902 (2013).

15

N. Szabo, Smart Contracts: Building Blocks for Digital Markets, Alamut.com (1996).

16

Id.

17

M. Benchoufi et al., Blockchain Protocols in Clinical Trials: Transparency and Traceability of Consent, 6 F1000Research (2017).

18

T. Nugent et al., Improving Data Transparency in Clinical Trials Using Blockchain Smart Contracts, 5 F1000Research (2016).

19

M.N. Galtier & C. Marini, Substra: A Framework for Privacy-Preserving, Traceable and Collaborative Machine Learning, arXiv Preprint arXiv:1910.11567 (2019).

20

Id.

21

Id.

22

Koscina, infra note 28, at 231–37.

23

Id.

24

Id.

25

Id.

26

Id.

27

U.S. Food & Drug Admin., Informed Consent: Guidance for IRBs, Clinical Investigators, and Sponsors (2023).

28

M. Koscina et al., Enabling Trust in Healthcare Data Exchange with a Federated Blockchain-Based Architecture, IEEE/WIC/ACM Int’l Conf. on Web Intelligence-Companion Vol., 231, 231–37 (2019).

29

Nugent, supra note 18.

30

Id.

31

B. Alamri et al., Blockchain-Based Identity Management Systems in Health IoT: A Systematic Review, 10 IEEE Access 59612, 59612–29 (2022).

32

M. Zichichi et al., Data Governance Through a Multi-DLT Architecture in View of the GDPR, 25 Cluster Computing 4515, 4515–42 (2022).

33

D.M. Maslove et al., Using Blockchain Technology to Manage Clinical Trials Data: A Proof-of-Concept Study, 6 JMIR Med. Informatics e11949 (2018).

34

Id.

35

M.N. Galtier & C. Marini, Substra: A Framework for Privacy-Preserving, Traceable and Collaborative Machine Learning, arXiv Preprint arXiv:1910.11567 (2019).

36

Maslove, supra note 33.

37

Nugent, supra note 18.

38

Galtier, supra note 35.

39

Warnat-Herresthal, supra note 3, at 265–70.

40

Galtier, supra note 35.

41

Warnat-Herresthal, supra note 3, at 265–70.

42

W. Fdhila et al., Challenges and Opportunities of Blockchain for Auditable Processes in the Healthcare Sector, Int’l Conf. on Bus. Process Mgmt. 68, 68–83 (2022).

43

A. Lavin et al., Simulation Intelligence: Towards a New Generation of Scientific Methods, arXiv Preprint arXiv:2112.03235 (2021).

44

M. Baker, 1500 Scientists Lift the Lid on Reproducibility, 533 Nature (2016).

45

European Commission, supra note 4.

46

K.D. Mandl & I.S. Kohane, Escaping the EHR Trap—The Future of Health IT, 366 N. Engl. J. Med. 2240, 2240–42 (2012).

47

Zichichi, supra note 32, at 4515–42.

48

E.Y. Chang et al., DeepLinQ: Distributed Multi-Layer Ledgers for Privacy-Preserving Data Sharing, 2018 IEEE Int’l Conf. on AI & Virtual Reality 173, 173–78 (2018).

49

G. Zyskind & O. Nathan, Decentralizing Privacy: Using Blockchain to Protect Personal Data, 2015 IEEE Sec. & Privacy Workshops 180, 180–84 (2015).

50

Steinsbekk, supra note 14, at 897–902.

51

S. Mills, Who Owns the Future? Data Trusts, Data Commons, and the Future of Data Ownership (2019).

52

J. Montgomery, Data Sharing and the Idea of Ownership, 23 New Bioethics 81, 81–86 (2017).

53

E. Ostrom, Governing the Commons: The Evolution of Institutions for Collective Action (Cambridge Univ. Press 1990).

54

T. Kukutai & J. Taylor, Indigenous Data Sovereignty: Toward an Agenda (ANU Press 2016).

55

T.K. Mackey et al., Establishing a Blockchain-Enabled Indigenous Data Sovereignty Framework for Genomic Data, 185 Cell 2626, 2626–31 (2022).

56

Benchoufi, supra note 17.

This is an Open Access article distributed under the terms of the Creative Commons Attribution NonCommercial-NoDerivs licence (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial reproduction and distribution of the work, in any medium, provided the original work is not altered or transformed in any way, and that the work is properly cited. For commercial re-use, please contact [email protected]