A Hybrid NLP and Deep Learning Framework for Phishing Detection in Emails and URLs

Basheer Riskhan; Md Saiful Arefin; Mutasim Billah; Abdullah Al Hadi; Siti Shafrah Shahawai; Siva Raja Sindiramutty; Noor Zaman Jhanjhi

doi:10.33150/JITDETS-9.2.5

Authors

Basheer Riskhan, School of Computing and Informatics, Albukhary International University, Kedah, Malaysia
Md Saiful Arefin, School of Computing and Informatics, Albukhary International University, Kedah, Malaysia
Mutasim Billah, School of Computing and Informatics, Albukhary International University, Kedah, Malaysia
Abdullah Al Hadi, School of Computing and Informatics, Albukhary International University, Kedah, Malaysia
Siti Shafrah Shahawai, School of Computing and Informatics, Albukhary International University, Kedah, Malaysia
Siva Raja Sindiramutty, School of Computer Science, Taylor’s University, Malaysia
Noor Zaman Jhanjhi, School of Computer Science, Taylor’s University, Malaysia

Keywords:

Phishing Detection, Deep Learning, NLP, CNN‑BiLSTM, TF‑IDF, Hybrid Model, Cybersecurity

Abstract

Phishing attacks evolve constantly, exploiting users through malicious URLs and misleading emails, while conventional rule‑based detection methods struggle to keep pace with new threats. To improve detection accuracy and adaptability, this study proposes a hybrid phishing detection framework that combines Deep Learning (DL) and Natural Language Processing (NLP) techniques. For email classification, the system uses TF‑IDF‑based feature extraction, including word‑ and character‑level n‑grams, domain encoding, and link‑count analysis; for URL analysis, it uses character‑level tokenisation and manually created structural features. Three deep learning architectures are developed for email detection: a Convolutional Neural Network (CNN), a Bidirectional Long Short‑Term Memory (BiLSTM) network, and a hybrid CNN‑BiLSTM model. For URL classification, the corresponding models are a CNN, an LSTM, and a hybrid CNN‑LSTM. The hybrid architectures capture intricate phishing patterns by combining sequential dependency learning with spatial feature extraction. The proposed framework is assessed on both primary email and large‑scale URL datasets, with stratified data partitioning and suitable preprocessing methods. By demonstrating a scalable, flexible approach to phishing detection, the methodology addresses the drawbacks of static, single‑model systems in contemporary cybersecurity environments.
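The URL branch described in the abstract (character‑level tokenisation plus manually created structural features) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the character vocabulary, the padding length of 200, the function names `encode_url` and `structural_features`, and the particular features chosen.

```python
from urllib.parse import urlparse

# Hypothetical vocabulary (an assumption, not the paper's):
# id 0 is reserved for padding, id 1 for characters outside the alphabet.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=%"
PAD_ID, UNK_ID = 0, 1
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(ALPHABET)}


def encode_url(url, max_len=200):
    """Map a URL to a fixed-length list of character ids.

    The URL is lowercased, truncated to max_len characters,
    and right-padded with PAD_ID.
    """
    ids = [CHAR_TO_ID.get(ch, UNK_ID) for ch in url.lower()[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


def structural_features(url):
    """Illustrative handcrafted features; not the paper's feature list."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_dots": url.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
        # Dots in the host beyond the one separating domain and TLD.
        "num_subdomains": max(host.count(".") - 1, 0),
    }
```

In a setup like the one the abstract outlines, the id sequence from `encode_url` would feed an embedding layer ahead of the convolutional and recurrent layers, while the dictionary from `structural_features` would be supplied as a separate numeric input. How the paper actually combines the two is not stated in the abstract.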
