Dialect Classification of the Javanese Language Using the K-Nearest Neighbor
Abstract
Indonesia is rich in ethnic and cultural diversity, and each group is reflected in its own linguistic characteristics. One way to help preserve the Javanese language is to study its dialects. This study classifies three main dialects spoken on the island of Java (East Java, Central Java, and West Java) using text data collected from online sources. The classification pipeline consists of preprocessing (tokenizing, case folding, and word weighting), class balancing with the Synthetic Minority Oversampling Technique (SMOTE), and classification with the K-Nearest Neighbor (K-NN) algorithm. Evaluated with 10-fold cross-validation, the best-performing configuration reached an accuracy of 94.05%, a precision of 95.83%, and a recall of 94.44%. These findings support computational linguistics research and the preservation of regional languages, and they underline the role of dialect recognition in developing linguistic technology applications.
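The pipeline named in the abstract (case folding, tokenizing, word weighting, SMOTE balancing, K-NN, and k-fold cross-validation) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation. The abstract does not say which weighting scheme or distance measure was used, so TF-IDF and cosine similarity are assumptions here. The oversampler is a simplified SMOTE variant that interpolates toward the single nearest same-class neighbor. The toy corpus, its placeholder words, and the labels `A`, `B`, `C` are invented for illustration, and all function names are hypothetical. For brevity the IDF weights are fitted on the whole toy corpus, whereas a stricter setup would fit them inside each training fold.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    """Case folding followed by a simple alphabetic tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(token_lists):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(token_lists)
    df = Counter(t for toks in token_lists for t in set(toks))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def vectorize(tokens, idf):
    """Sparse TF-IDF vector as a {term: weight} dict; unseen terms are dropped."""
    return {t: tf * idf[t] for t, tf in Counter(tokens).items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def smote_like(vectors, labels, rng):
    """Simplified SMOTE: interpolate between a sample and its nearest same-class
    neighbor until every class matches the size of the largest class."""
    by_class = {}
    for v, y in zip(vectors, labels):
        by_class.setdefault(y, []).append(v)
    target = max(len(vs) for vs in by_class.values())
    out_v, out_y = list(vectors), list(labels)
    for y, vs in by_class.items():
        for _ in range(target - len(vs)):
            a = rng.choice(vs)
            b = max((v for v in vs if v is not a), key=lambda v: cosine(a, v), default=a)
            gap = rng.random()
            out_v.append({t: a.get(t, 0.0) + gap * (b.get(t, 0.0) - a.get(t, 0.0))
                          for t in set(a) | set(b)})
            out_y.append(y)
    return out_v, out_y


def knn_predict(train_v, train_y, query, k):
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(zip(train_v, train_y), key=lambda p: cosine(query, p[0]), reverse=True)
    return Counter(y for _, y in ranked[:k]).most_common(1)[0][0]


def cross_val_accuracy(vectors, labels, k, folds, seed=0):
    """k-fold cross-validation; oversampling is applied to the training folds only."""
    rng = random.Random(seed)
    idx = list(range(len(vectors)))
    rng.shuffle(idx)
    correct = 0
    for f in range(folds):
        test = idx[f::folds]
        held = set(test)
        tr_v = [vectors[i] for i in idx if i not in held]
        tr_y = [labels[i] for i in idx if i not in held]
        tr_v, tr_y = smote_like(tr_v, tr_y, rng)
        correct += sum(knn_predict(tr_v, tr_y, vectors[i], k) == labels[i] for i in test)
    return correct / len(vectors)


# Invented toy corpus with placeholder words and labels (not real dialect data).
corpus = [
    ("alpha beta gamma", "A"), ("alpha gamma delta", "A"),
    ("beta alpha delta", "A"), ("gamma beta alpha", "A"),
    ("kappa lambda mu", "B"), ("lambda kappa nu", "B"), ("mu kappa lambda", "B"),
    ("sigma tau phi", "C"), ("tau sigma chi", "C"),
]
texts, labels = zip(*corpus)
tokens = [tokenize(t) for t in texts]
idf = fit_idf(tokens)
vectors = [vectorize(t, idf) for t in tokens]

balanced_v, balanced_y = smote_like(vectors, list(labels), random.Random(0))
prediction = knn_predict(balanced_v, balanced_y, vectorize(tokenize("Alpha BETA"), idf), k=3)
accuracy = cross_val_accuracy(vectors, list(labels), k=1, folds=3)
print(prediction, Counter(balanced_y), round(accuracy, 2))
```

Oversampling inside each training fold, rather than before the split, keeps synthetic neighbors of held-out documents out of the training data, which would otherwise inflate the cross-validated scores.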
Copyright (c) 2024 The Authors

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Published in JITCS: Journal of Information Technology and Cyber Security.