CLASSIFICATION OF THE INDONESIAN TRANSLATION OF THE QUR’AN USING THE SUPPORT VECTOR MACHINE (SVM) ALGORITHM
Abstract
This study classifies verses of the Indonesian translation of the Qur'an so that verses with the same meaning are grouped under a common topic. The translated verses are labeled with six categories: education, motivation, social, history, politics, and science (mathematics). The proposed method applies Chi-Square and Principal Component Analysis (PCA) feature selection and uses a Support Vector Machine (SVM) classifier to assign each translated verse to one of the six categories. The first stage is preprocessing, in which each document is weighted using TF-IDF. Once the weights are obtained, the best classification model for labeling the verses is sought by comparing SVM with and without feature selection. The SVM without feature selection achieves an AUC of 83.3%, compared with 73.3% with Chi-Square feature selection and 63.3% with PCA. The best model for classifying the Indonesian translation of Qur'anic verses is therefore the SVM without feature selection, with the highest AUC of 83.3%.
Keywords: Feature Selection Techniques; Holy Qur’an; Support Vector Machine (SVM) Algorithm; AUC; F1-score.
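The comparison described in the abstract (TF-IDF weighting, then SVM with no feature selection, with Chi-Square, and with PCA, each scored by AUC) can be sketched with scikit-learn as follows. This is only an illustrative sketch, not the authors' implementation: the linear kernel, the values of `k_best` and `n_components`, the one-vs-rest macro-averaged AUC, the three-category subset, and the toy English sentences (placeholders, not actual verses) are all assumptions.

```python
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, label_binarize
from sklearn.svm import SVC


def to_dense(matrix):
    """Convert the sparse TF-IDF matrix to a dense array before PCA."""
    return matrix.toarray()


def build_models(k_best=10, n_components=3):
    """Return the three pipelines compared in the abstract (settings assumed)."""
    return {
        "svm": make_pipeline(TfidfVectorizer(), SVC(kernel="linear")),
        "chi2+svm": make_pipeline(
            TfidfVectorizer(), SelectKBest(chi2, k=k_best), SVC(kernel="linear")
        ),
        "pca+svm": make_pipeline(
            TfidfVectorizer(),
            FunctionTransformer(to_dense),
            PCA(n_components=n_components),
            SVC(kernel="linear"),
        ),
    }


def macro_auc(model, texts, labels):
    """One-vs-rest AUC per class from SVM decision scores, macro-averaged."""
    y_true = label_binarize(labels, classes=model.classes_)
    scores = model.decision_function(texts)
    return roc_auc_score(y_true, scores, average="macro")


# Hypothetical placeholder documents standing in for labeled verses.
train_texts = [
    "students learn reading and writing from their teacher",
    "the teacher explains the lesson to young students",
    "pupils study the lesson and learn from books",
    "the ancient kingdom fell after a long war",
    "past nations built cities and the old kingdom ruled",
    "the war between ancient nations lasted many years",
    "count the numbers and measure the distance",
    "the sum of two numbers equals the total",
    "measure the weight and count the total",
]
train_labels = ["education"] * 3 + ["history"] * 3 + ["science"] * 3
test_texts = [
    "young pupils learn the lesson",
    "the old kingdom and its war",
    "count the sum of the numbers",
]
test_labels = ["education", "history", "science"]

results = {}
for name, model in build_models().items():
    model.fit(train_texts, train_labels)
    results[name] = macro_auc(model, test_texts, test_labels)

for name, auc in results.items():
    print(f"{name}: AUC = {auc:.3f}")
```

On a real corpus the same three pipelines would be fitted on the labeled verses and scored on a held-out split. Note that PCA projects the TF-IDF features onto new components rather than selecting a subset of them, although the study groups it with feature selection.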
Authors whose manuscripts are published agree to the following provisions:
- The publication rights to all journal material published on the Konvergensi Teknologi Informasi & Komunikasi website are held by the editorial board with the author's knowledge (moral rights remain with the author).
- The formal legal provisions for access to digital articles of this electronic journal are subject to the terms of the Creative Commons Attribution-ShareAlike (CC BY-SA) license, which means that Konvergensi Teknologi Informasi & Komunikasi reserves the right to store, reformat, manage in a database, maintain, and publish articles without requesting permission from the author, provided that the author remains named as the copyright owner.
- Manuscripts published in print and electronically are open access for educational, research, and library purposes. The editorial board is not liable for copyright violations arising from uses beyond these purposes.