A Hybrid NLP and Deep Learning Framework for Phishing Detection in Emails and URLs

Authors

  • Basheer Riskhan, School of Computing and Informatics, Albukhary International University, Kedah, Malaysia
  • Md Saiful Arefin, School of Computing and Informatics, Albukhary International University, Kedah, Malaysia
  • Mutasim Billah, School of Computing and Informatics, Albukhary International University, Kedah, Malaysia
  • Abdullah Al Hadi, School of Computing and Informatics, Albukhary International University, Kedah, Malaysia
  • Siti Shafrah Shahawai, School of Computing and Informatics, Albukhary International University, Kedah, Malaysia
  • Siva Raja Sindiramutty, School of Computer Science, Taylor’s University, Malaysia
  • Noor Zaman Jhanjhi, School of Computer Science, Taylor’s University, Malaysia

DOI:

https://doi.org/10.33150/JITDETS-9.2.5

Keywords:

Phishing Detection, Deep Learning, NLP, CNN‑BiLSTM, TF‑IDF, Hybrid Model, Cybersecurity

Abstract

Phishing attacks are constantly evolving, exploiting users through malicious URLs and misleading emails, while conventional rule‑based detection methods struggle to keep pace with new threats. To improve detection accuracy and adaptability, this study proposes a hybrid phishing detection framework that combines Deep Learning (DL) and Natural Language Processing (NLP) techniques. For email classification, the system uses TF‑IDF‑based feature extraction, including word‑ and character‑level n‑grams, domain encoding, and link‑count analysis; for URL analysis, it uses character‑level tokenisation and manually created structural features. Three deep learning architectures are developed for email detection: a Convolutional Neural Network (CNN), a Bidirectional Long Short‑Term Memory (BiLSTM) network, and a Hybrid CNN‑BiLSTM model. For URL classification, CNN, LSTM, and Hybrid CNN‑LSTM models are developed. The hybrid architectures efficiently capture intricate phishing patterns by combining sequential dependency learning with spatial feature extraction. The proposed framework is assessed on both primary email and large‑scale URL datasets, using stratified data partitioning and suitable preprocessing methods. The methodology addresses the drawbacks of static, single‑model systems in contemporary cybersecurity environments by demonstrating a scalable, flexible approach to phishing detection.
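
As a rough illustration of the URL preprocessing the abstract describes (character‑level tokenisation, manually created structural features, and stratified partitioning), the following is a minimal standard‑library sketch. The alphabet, the sequence length, the particular features, and the function names `encode_url`, `structural_features`, and `stratified_split` are all assumptions made for illustration; the abstract does not specify the authors' actual vocabulary, feature set, or split ratios.

```python
import random
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical character vocabulary (an assumption, not the paper's).
# Id 0 is reserved for padding and id 1 for characters outside the alphabet.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=%"
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(ALPHABET)}


def encode_url(url, max_len=32):
    """Lowercase a URL, truncate it to max_len, and map each character to an id.

    Shorter URLs are right-padded with 0 so every output has length max_len.
    """
    ids = [CHAR_TO_ID.get(ch, 1) for ch in url.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))


def structural_features(url):
    """Compute a few illustrative hand-crafted features (assumed, not the paper's list)."""
    host = urlparse(url).hostname or ""
    return {
        "length": len(url),
        "num_dots": url.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at": "@" in url,
        "uses_https": url.lower().startswith("https://"),
        # Crude dotted-quad check: three dots and only digits otherwise.
        "host_is_ip": host.count(".") == 3 and host.replace(".", "").isdigit(),
    }


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with each class split by the same fraction."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)
```

In a pipeline of this kind, the integer sequences from `encode_url` would typically feed an embedding layer ahead of the CNN or LSTM, while the structural features would be supplied as a separate numeric input; how the authors combine the two is not stated in the abstract.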

Published

2025-12-19

How to Cite

[1] Basheer Riskhan, “A Hybrid NLP and Deep Learning Framework for Phishing Detection in Emails and URLs”, J. ICT des. eng. technol. sci., vol. 9, no. 2, pp. 105‑116, Dec. 2025.

Section

Articles