Abstract: This project presents a comparative analysis of spam detection models on three datasets, two of them custom-built, with the aim of improving detection accuracy. The data were preprocessed using tokenization, stemming, and stop word removal. Several models, including RNNs, SVM, Naive Bayes, and decision trees, were trained and compared on accuracy and precision. The goal is to identify the most effective method for detecting spam emails while minimizing false positives, so that legitimate emails reach the user. Accurate spam detection helps prevent phishing, malware, and other harmful activity. The findings can contribute to more precise and efficient spam detection technologies and, in turn, to safer and more reliable email communication.
Keywords: Naive Bayes, spam, logistic regression, Bag of Words, Term Frequency-Inverse Document Frequency, non-spam (ham), accuracy, precision, recall, F1-score.
DOI: 10.17148/IMRJR.2025.020403
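The preprocessing steps and the two comparison metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the stop word list, the `crude_stem` suffix stripper (a simple stand-in for a real stemmer such as Porter's), the function names, and the `"spam"`/`"ham"` labels are all assumptions made for illustration.

```python
import re

# Tiny illustrative stop word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "you", "for", "in"}

# Suffixes removed by the crude stemmer below (hypothetical, not Porter's rules).
SUFFIXES = ("ing", "ed", "ly", "s")


def crude_stem(token):
    """Strip the first matching suffix, keeping at least three leading characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize on runs of letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def accuracy(y_true, y_pred):
    """Fraction of emails whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision(y_true, y_pred, positive="spam"):
    """TP / (TP + FP): of the emails flagged as spam, the share that really are spam."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Toy labels: one legitimate email is wrongly flagged as spam (a false positive).
y_true = ["spam", "ham", "spam", "ham"]
y_pred = ["spam", "spam", "spam", "ham"]
tokens = preprocess("Claiming the free prizes")
acc = accuracy(y_true, y_pred)
prec = precision(y_true, y_pred)
```

In the toy labels, the single false positive lowers precision more than it lowers accuracy, which is why precision is a natural metric when the priority is keeping legitimate emails out of the spam folder.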