DuaDeep-SeqAffinity: Dual-Stream Deep Learning Framework for Sequence-Only Antigen-Antibody Affinity Prediction
Abstract
Predicting the binding affinity between antigens and antibodies is fundamental to drug discovery and vaccine development. Traditional computational approaches often rely on experimentally determined 3D structures, which are scarce and expensive to obtain. This paper introduces DuaDeep-SeqAffinity, a sequence-only deep learning framework that predicts affinity scores solely from the amino acid sequences of the antigen and antibody, using a dual-stream hybrid architecture. Our approach leverages pre-trained ESM-2 protein language model embeddings, combining 1D Convolutional Neural Networks (CNNs) for local motif detection with Transformer encoders for global contextual representation. A subsequent fusion module integrates the two feature sets, which are then passed to a fully connected network for final score regression. Experimental results show that DuaDeep-SeqAffinity outperforms both its individual architectural components and existing state-of-the-art (SOTA) methods. DuaDeep-SeqAffinity achieved a Pearson correlation of 0.688, an R^2 of 0.460, and a Root Mean Square Error (RMSE) of 0.737, surpassing the single-branch variants ESM-CNN and ESM-Transformer. Notably, the model achieved an Area Under the Curve (AUC) of 0.890, outperforming sequence-only benchmarks and even structure-sequence hybrid models. These findings indicate that high-fidelity sequence embeddings can capture essential binding patterns typically reserved for structural modeling. By eliminating the reliance on 3D structures, DuaDeep-SeqAffinity provides a scalable and efficient solution for high-throughput screening of large sequence libraries, which can accelerate the therapeutic discovery pipeline.
Summary
The paper addresses the problem of predicting antigen-antibody binding affinity, a key step in drug discovery and vaccine development. Traditional methods rely on 3D structures, which are expensive and time-consuming to obtain. The authors introduce DuaDeep-SeqAffinity, a sequence-only deep learning framework that predicts affinity scores directly from amino acid sequences. The approach uses a dual-stream architecture built on pre-trained ESM-2 protein language model embeddings. One stream employs 1D CNNs to capture local sequence motifs, while the other uses Transformer encoders for global contextual representation. The two feature sets are then fused and passed to a fully connected network that predicts the affinity score. The experimental results show that DuaDeep-SeqAffinity outperforms the single-branch models (ESM-CNN and ESM-Transformer) and existing state-of-the-art methods. Specifically, it achieved a Pearson correlation of 0.688, an R^2 of 0.460, and an RMSE of 0.737. It also achieved an AUC of 0.890, surpassing even sequence-structure hybrid models. These findings highlight the ability of high-fidelity sequence embeddings to capture binding patterns typically associated with structural modeling. By eliminating the need for 3D structures, DuaDeep-SeqAffinity offers a scalable and efficient solution for high-throughput screening, which can accelerate the therapeutic discovery pipeline.
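The dual-stream design described above can be illustrated with a small forward pass. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: random arrays stand in for ESM-2 per-residue embeddings, all layer sizes are toy values, the weights are untrained, and the Transformer stream is reduced to a single self-attention layer with a residual connection. The source does not say how the antibody and antigen are combined, so encoding each chain separately with shared weights and concatenating the results is an assumption. All function and parameter names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_relu(x, w, b):
    """Zero-padded ("same") 1D convolution along the sequence axis, then ReLU.
    x: (L, d_in), w: (k, d_in, d_out) with odd k, b: (d_out,)."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.stack([np.einsum("kd,kdo->o", xp[i:i + k], w)
                    for i in range(x.shape[0])]) + b
    return np.maximum(out, 0.0)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention with a residual connection."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return x + attn @ v

def encode_chain(emb, p):
    """Encode one chain: local stream (CNN) and global stream (attention)."""
    local = conv1d_relu(emb, p["conv_w"], p["conv_b"]).max(axis=0)         # (d_conv,)
    context = self_attention(emb, p["wq"], p["wk"], p["wv"]).mean(axis=0)  # (d,)
    return np.concatenate([local, context])  # fusion by concatenation (assumption)

def init_params(d, d_conv, k, hidden, scale=0.1):
    """Random, untrained parameters for illustration only."""
    chain_dim = d_conv + d
    return {
        "conv_w": rng.normal(0, scale, (k, d, d_conv)),
        "conv_b": np.zeros(d_conv),
        "wq": rng.normal(0, scale, (d, d)),
        "wk": rng.normal(0, scale, (d, d)),
        "wv": rng.normal(0, scale, (d, d)),
        "fc1_w": rng.normal(0, scale, (2 * chain_dim, hidden)),
        "fc1_b": np.zeros(hidden),
        "fc2_w": rng.normal(0, scale, hidden),
        "fc2_b": 0.0,
    }

def predict_affinity(ab_emb, ag_emb, p):
    """Fuse both chain encodings and regress a single affinity score."""
    fused = np.concatenate([encode_chain(ab_emb, p), encode_chain(ag_emb, p)])
    hidden = np.maximum(fused @ p["fc1_w"] + p["fc1_b"], 0.0)
    return float(hidden @ p["fc2_w"] + p["fc2_b"])

# Stand-ins for per-residue embeddings: (sequence length, embedding dim).
params = init_params(d=32, d_conv=16, k=3, hidden=24)
antibody = rng.normal(size=(120, 32))
antigen = rng.normal(size=(200, 32))
score = predict_affinity(antibody, antigen, params)
print(f"predicted affinity score (untrained): {score:.3f}")
```

Max pooling over the CNN output keeps the strongest local motif response regardless of position, while mean pooling over the attention output summarizes sequence-wide context; both poolings are illustrative choices that let chains of different lengths map to fixed-size vectors.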
Key Insights
- DuaDeep-SeqAffinity introduces a dual-stream hybrid architecture that combines CNNs for local motif detection with Transformers for global contextual representation in antigen-antibody affinity prediction.
- The model achieves a Pearson correlation of 0.688, an R^2 of 0.460, and an RMSE of 0.737, outperforming the single-branch variants ESM-CNN and ESM-Transformer.
- DuaDeep-SeqAffinity reaches an AUC of 0.890, outperforming both sequence-only benchmarks and structure-sequence hybrid models such as WALLE-Affinity (Ranking) (AUC = 0.866), which uses 3D structural information.
- The findings suggest that pre-trained protein language models (ESM-2) can capture essential binding patterns that typically require structural modeling, enabling accurate sequence-only affinity prediction.
- The authors perform an ablation study comparing DuaDeep-SeqAffinity to ESM-CNN and ESM-Transformer, showing that combining local and global feature extraction is crucial for optimal performance.
- The paper argues that architectural fusion is as critical as the choice of pre-trained embeddings: DuaDeep-SeqAffinity (AUC = 0.890) outperforms ranking models such as Mint (AUC = 0.775) and ESM-2 + AntiBERTy (AUC = 0.761) by a wide margin.
- The model is evaluated on the AbRank dataset, with data splits designed to test generalization to unseen antigen-antibody pairs.
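The metrics quoted in these points can be computed as follows. This is a minimal pure-Python sketch using the standard definitions; the toy values are invented for illustration and are not data from the paper. The source does not state how binary labels were derived for the AUC, so the `labels` argument here is an assumption (for example, binders versus non-binders under some threshold).

```python
import math

def pearson(y_true, y_pred):
    """Pearson correlation coefficient between targets and predictions."""
    n = len(y_true)
    mean_t, mean_p = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy values for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 1.5, 3.5, 3.5]
print(round(pearson(y_true, y_pred), 4))    # 0.8944
print(round(r_squared(y_true, y_pred), 4))  # 0.8
print(rmse(y_true, y_pred))                 # 0.5
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Under these definitions R^2 can never exceed the squared Pearson correlation, because the correlation is unaffected by any linear rescaling of the predictions while R^2 penalizes miscalibration. The reported figures are consistent with that bound: 0.688 squared is about 0.473, slightly above the reported R^2 of 0.460.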
Practical Implications
- DuaDeep-SeqAffinity provides a scalable and efficient solution for high-throughput screening of large sequence libraries, accelerating the therapeutic discovery pipeline by eliminating the reliance on 3D structures.
- The model can be used for rapid screening of novel antibody repertoires where structural templates are absent, enabling efficient discovery of high-affinity therapeutic leads.
- Researchers in antibody engineering, vaccine design, and personalized immunotherapy can use this tool to predict binding affinity and optimize therapeutic candidates.
- The framework can be extended to predict other protein-protein interactions by adapting the architecture and training data.
- Future research can integrate cross-attention mechanisms and ranking-based loss functions to further improve interaction modeling, and can incorporate residue-level interaction maps to improve model explainability.