Comparison of Dimensionality Reduction Techniques to Improve Performance and Efficiency of Logistic Regression in Network Anomaly Detection

Keywords: dimensionality reduction, Logistic Regression, network anomaly detection, performance evaluation, Truncated Singular Value Decomposition

Abstract

Network anomaly detection identifies abnormal network traffic that may pose a security threat. This research aims to improve the performance and efficiency of Logistic Regression (LR) in network anomaly detection by applying dimensionality reduction techniques, namely Principal Component Analysis (PCA), Truncated Singular Value Decomposition (TSVD), t-Distributed Stochastic Neighbor Embedding (t-SNE), and Independent Component Analysis (ICA). Each method is evaluated on accuracy, precision, recall, F1-score, and computation time. The results show that TSVD performs best, with 95.86% accuracy, 0.96 precision, 0.96 recall, 0.95 F1-score, and a computation time of 13.83 seconds. In contrast, ICA performs worst, especially in precision, recall, and F1-score, with values of 0.73, 0.83, and 0.78, respectively. Although t-SNE produces competitive accuracy, it has a high computational cost, with an execution time of 1698.54 seconds. These findings show that choosing an appropriate dimensionality reduction algorithm not only improves detection performance but also supports data processing efficiency, which makes it highly relevant for large-scale network security scenarios.
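The comparison described in the abstract can be illustrated with a minimal sketch in Python using scikit-learn. This is not the authors' code: the synthetic dataset from `make_classification`, the component counts, the 70/30 split, the weighted metric averaging, and the helper name `evaluate` are all assumptions chosen for illustration. Because scikit-learn's `TSNE` has no `transform` method, the sketch embeds training and test rows together for t-SNE, which is a further assumption and not necessarily how the study handled it.

```python
import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA, FastICA, TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.manifold import TSNE
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def evaluate(reducer, X_train, X_test, y_train, y_test):
    """Reduce the features, fit Logistic Regression, return metrics and elapsed seconds."""
    start = time.perf_counter()
    if hasattr(reducer, "transform"):
        # PCA, TSVD, ICA: fit on the training rows only, then project the test rows.
        Z_train = reducer.fit_transform(X_train)
        Z_test = reducer.transform(X_test)
    else:
        # t-SNE cannot project unseen rows, so embed all rows together (assumption).
        Z = reducer.fit_transform(np.vstack([X_train, X_test]))
        Z_train, Z_test = Z[: len(X_train)], Z[len(X_train):]
    clf = LogisticRegression(max_iter=1000).fit(Z_train, y_train)
    y_pred = clf.predict(Z_test)
    elapsed = time.perf_counter() - start
    # Weighted averaging is an assumption; the abstract does not state the averaging mode.
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="weighted", zero_division=0
    )
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "seconds": elapsed,
    }


# Synthetic stand-in data (hypothetical); the study used a network traffic dataset.
X, y = make_classification(n_samples=400, n_features=20, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Component counts are illustrative, not the values used in the paper.
reducers = {
    "PCA": PCA(n_components=5, random_state=0),
    "TSVD": TruncatedSVD(n_components=5, random_state=0),
    "t-SNE": TSNE(n_components=2, random_state=0),
    "ICA": FastICA(n_components=5, random_state=0, max_iter=1000),
}
results = {
    name: evaluate(reducer, X_train, X_test, y_train, y_test)
    for name, reducer in reducers.items()
}
for name, m in results.items():
    print(
        f"{name:6s} acc={m['accuracy']:.4f} prec={m['precision']:.2f} "
        f"rec={m['recall']:.2f} f1={m['f1']:.2f} time={m['seconds']:.2f}s"
    )
```

The timing here covers both the reduction step and the classifier fit, so a slow reducer such as t-SNE dominates its row. Numbers from this synthetic run are not comparable to the figures reported in the abstract.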

Author Biographies

Mokhamad Isna Marzuki Ahfa, Universitas Yudharta Pasuruan

Department of Informatics Engineering

Lukman Hakim, Universitas Yudharta Pasuruan

Department of Informatics Engineering

Muhammad Imron Rosadi, Universitas Yudharta Pasuruan

Department of Informatics Engineering

Published
2025-01-14
How to Cite
Ahfa, M. I. M., Hakim, L., & Rosadi, M. I. (2025). Comparison of Dimensionality Reduction Techniques to Improve Performance and Efficiency of Logistic Regression in Network Anomaly Detection. Journal of Information Technology and Cyber Security, 3(1), 1-13. https://doi.org/10.30996/jitcs.12212
Section
Research Article