A Hybrid NLP and Deep Learning Framework for Phishing Detection in Emails and URLs
DOI:
https://doi.org/10.33150/JITDETS-9.2.5
Keywords:
Phishing Detection, Deep Learning, NLP, CNN‑BiLSTM, TF‑IDF, Hybrid Model, Cybersecurity
Abstract
Phishing attacks evolve constantly, exploiting users through malicious URLs and misleading emails, while conventional rule‑based detection methods struggle to keep pace with new threats. To improve detection accuracy and adaptability, this study proposes a hybrid phishing detection framework that combines Deep Learning (DL) and Natural Language Processing (NLP) techniques. For email classification, the system uses TF‑IDF‑based feature extraction, including word‑ and character‑level n‑grams, together with domain encoding and link‑count analysis; for URL analysis, it uses character‑level tokenisation and manually engineered structural features. Three deep learning architectures are developed for email detection: a Convolutional Neural Network (CNN), a Bidirectional Long Short‑Term Memory (BiLSTM) network, and a Hybrid CNN‑BiLSTM model. For URL classification, CNN, LSTM, and Hybrid CNN‑LSTM models are developed. The hybrid architectures capture intricate phishing patterns by combining sequential dependency learning with spatial feature extraction. The proposed framework is assessed on primary email and large‑scale URL datasets, using stratified data partitioning and suitable preprocessing methods. By demonstrating a scalable, flexible approach to phishing detection, the methodology addresses the drawbacks of static, single‑model systems in contemporary cybersecurity environments.
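As an illustration of the URL branch described in the abstract, the following is a minimal Python sketch of character‑level tokenisation combined with a few handcrafted structural features. The abstract does not list the actual vocabulary, sequence length, or feature set, so the names `encode_url` and `structural_features`, the default `max_len` of 200, and the specific features shown are assumptions chosen for illustration only.

```python
import ipaddress
import string
from urllib.parse import urlparse

# Assumed vocabulary: printable ASCII characters.
# ID 0 is reserved for padding and ID 1 for out-of-vocabulary characters.
PAD_ID, UNK_ID = 0, 1
VOCAB = {ch: i + 2 for i, ch in enumerate(string.printable)}


def encode_url(url, max_len=200):
    """Map each character to an integer ID, then truncate or right-pad to max_len."""
    ids = [VOCAB.get(ch, UNK_ID) for ch in url[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


def _is_ipv4(host):
    """Return True if the host is a literal IPv4 address rather than a domain name."""
    try:
        ipaddress.IPv4Address(host)
        return True
    except ValueError:
        return False


def structural_features(url):
    """Compute illustrative structural features (not the paper's actual feature set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parsed.scheme == "https"),
        "host_is_ipv4": int(_is_ipv4(host)),
    }


if __name__ == "__main__":
    example = "http://192.168.0.1/login@secure"
    print(encode_url(example, max_len=40))
    print(structural_features(example))
```

In a setup like the one the abstract outlines, the integer sequence would typically feed an embedding layer ahead of the CNN or LSTM, while the structural features could be concatenated with the learned representation before the final classifier; how the paper actually combines them is not stated in the abstract.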
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
