Dialect Classification of the Javanese Language Using the K-Nearest Neighbor

Brilliant Filby; Utomo Pujianto; Jehad A. H. Hammad; Aji Prasetya Wibawa

doi:10.30996/jitcs.12213

Authors

Brilliant Filby, Universitas Negeri Malang
Utomo Pujianto, Universitas Negeri Malang (ORCID: https://orcid.org/0000-0001-9195-363X)
Jehad A. H. Hammad, Al-Quds Open University (ORCID: https://orcid.org/0000-0002-2108-4069)
Aji Prasetya Wibawa, Universitas Negeri Malang (ORCID: https://orcid.org/0000-0002-6653-2697)

DOI:

https://doi.org/10.30996/jitcs.12213

Keywords:

case folding, Javanese dialect, K-Nearest Neighbor, Natural Language Processing, Synthetic Minority Oversampling Technique, tokenizing

Abstract

Indonesia is rich in ethnic and cultural diversity, and each group is reflected in its own linguistic characteristics. One way to preserve the Javanese language is to study its dialects. This study aims to classify three main Javanese dialects on Java Island (East Java, Central Java, and West Java) using text data collected from online sources. The classification process consists of preprocessing (tokenizing, case folding, and word weighting), data balancing with the Synthetic Minority Oversampling Technique (SMOTE), and classification with the K-Nearest Neighbor (K-NN) algorithm. The study highlights the importance of dialect recognition for preserving the Javanese language and for developing language technology applications. Testing with 10-fold cross-validation showed the best performance at , with an accuracy of 94.05%, a precision of 95.83%, and a recall of 94.44%. These findings support computational linguistics research and the preservation of regional languages.
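The pipeline named in the abstract (tokenizing, case folding, word weighting, SMOTE, K-NN) can be sketched in plain Python as below. This is a minimal illustrative sketch under stated assumptions, not the authors' implementation: "word weighting" is assumed to be TF-IDF, K-NN is assumed to use cosine similarity, SMOTE's neighbour search uses Euclidean distance, the K values are arbitrary, the sentences are invented toy data rather than the study's corpus, and every function name (`preprocess`, `fit_idf`, `balance`, `knn_predict`, `predict`, and so on) is hypothetical.

```python
import math
import random
import re
from collections import Counter


def preprocess(text):
    """Case folding, then a simple regex tokenizer that keeps letter runs only."""
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(token_lists):
    """Assumed weighting scheme: IDF = ln(n / df) + 1, learned from training docs."""
    n = len(token_lists)
    df = Counter(t for toks in token_lists for t in set(toks))
    return {t: math.log(n / d) + 1.0 for t, d in df.items()}


def vectorize(tokens, idf):
    """Sparse TF-IDF vector (term -> weight); terms unseen in training are dropped."""
    return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def euclidean(a, b):
    terms = set(a) | set(b)
    return math.sqrt(sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in terms))


def smote(minority, n_new, rng, k=5):
    """Each synthetic vector lies on the segment between a random minority sample
    and one of its k nearest (Euclidean) minority neighbours.
    Needs at least two samples in the minority class."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [v for v in minority if v is not base]
        neighbour = rng.choice(sorted(others, key=lambda v: euclidean(base, v))[:k])
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append({t: base.get(t, 0.0) + gap * (neighbour.get(t, 0.0) - base.get(t, 0.0))
                          for t in set(base) | set(neighbour)})
    return synthetic


def balance(vectors, labels, k=5, seed=0):
    """Oversample every smaller class up to the size of the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_vecs, out_labels = list(vectors), list(labels)
    for label, count in counts.items():
        if count < target:
            minority = [v for v, lab in zip(vectors, labels) if lab == label]
            new = smote(minority, target - count, rng, k)
            out_vecs += new
            out_labels += [label] * len(new)
    return out_vecs, out_labels


def knn_predict(train_vecs, train_labels, query, k=3):
    """Majority vote among the k most cosine-similar training vectors (cosine is an
    assumed choice); a tied vote goes to the label seen first in similarity order."""
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(query, pair[0]), reverse=True)
    return Counter(lab for _, lab in ranked[:k]).most_common(1)[0][0]


# Hypothetical toy sentences, invented for illustration (not the study's data).
train = [
    ("Koen arep nang endi, rek?", "East Java"),
    ("Ayo rek, mangan bareng arek-arek", "East Java"),
    ("Koen wis mangan ta?", "East Java"),
    ("Kowe arep menyang ngendi?", "Central Java"),
    ("Aku ora ngerti piye carane", "Central Java"),
    ("Kowe wis mangan durung?", "Central Java"),
    ("Sira arep maring endi?", "West Java"),
    ("Isun beli weruh", "West Java"),
]
tokens = [preprocess(text) for text, _ in train]
idf = fit_idf(tokens)
vecs, labels = balance([vectorize(t, idf) for t in tokens], [lab for _, lab in train])


def predict(text, k=3):
    return knn_predict(vecs, labels, vectorize(preprocess(text), idf), k)


print(predict("Koen arep nang pasar, rek"))  # East Java
```

Here the minority "West Java" class receives one synthetic vector so that all three classes have three training examples. In a 10-fold evaluation such as the one the abstract reports, `balance` would normally be called on each training fold only, after splitting, so that synthetic points derived from held-out documents cannot leak into training.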

Author Biographies

Brilliant Filby, Universitas Negeri Malang

Department of Informatics Engineering

Utomo Pujianto, Universitas Negeri Malang

Department of Informatics Engineering

Jehad A. H. Hammad, Al-Quds Open University

Department of Computer Information Systems

Aji Prasetya Wibawa, Universitas Negeri Malang

Department of Electrical and Informatics Engineering

Published

2024-12-31

How to Cite

Filby, B., Pujianto, U., Hammad, J. A. H., & Wibawa, A. P. (2024). Dialect Classification of the Javanese Language Using the K-Nearest Neighbor. Journal of Information Technology and Cyber Security, 2(2), 111–122. https://doi.org/10.30996/jitcs.12213

Issue

Vol. 2 No. 2 (2024): July

Section

Research Article

License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Copyright Notice based on COPE (Committee on Publication Ethics) for JITCS: Journal of Information Technology and Cyber Security

Ownership and Copyright:
1. JITCS: Journal of Information Technology and Cyber Security respects the intellectual property rights of authors. The copyright for individual articles published in JITCS is retained by the respective authors, unless otherwise specified.
2. The articles published in JITCS are licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0), which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial, and no modifications or adaptations are made.
3. JITCS serves as the initial publisher of the articles, providing them with the first publication platform.
Permissions and Usage:
1. Distribution for Non-Commercial Purposes: Permitted: Users are allowed to distribute the article for non-commercial purposes, provided the original work is properly cited and no modifications or adaptations are made.
2. Distribution for Commercial Purposes: Not Permitted: The article may not be distributed for any commercial purposes without obtaining prior written permission from the author(s).
3. Inclusion in a Collective Work (e.g., Anthology) for Non-Commercial Purposes: Permitted: Users are allowed to include the article in a collective work, such as an anthology, as long as the use is non-commercial and the work remains unchanged.
4. Inclusion in a Collective Work for Commercial Purposes: Not Permitted: The article may not be included in any collective work or anthology intended for commercial purposes without prior permission from the author(s).
5. Creation and Distribution of Revised Versions, Adaptations, or Derivative Works (e.g., Translation) for Non-Commercial Purposes: Not Permitted: Users may not create or distribute revised versions, adaptations, or derivative works, including translations, for non-commercial purposes.
6. Creation and Distribution of Revised Versions, Adaptations, or Derivative Works for Commercial Purposes: Not Permitted: Users may not create or distribute revised versions, adaptations, or derivative works, including translations, for commercial purposes.
7. Text or Data Mining for Non-Commercial Purposes: Permitted: Users are permitted to engage in text or data mining of the article for non-commercial research purposes, provided the original work is properly attributed.
8. Text or Data Mining for Commercial Purposes: Not Permitted: Users may not engage in text or data mining of the article for commercial purposes without obtaining explicit permission from the author(s).
Attribution and Citation:
1. Proper attribution and citation of the published work should be provided when using or referring to content from JITCS. This includes clearly indicating the authors, the title of the article, the journal name (JITCS), the volume/issue number, the publication year, and the article's DOI (Digital Object Identifier) when available.
2. When adapting or modifying the published content, proper attribution to the original source should be given, and the adapted or modified content should be shared under the same CC BY-NC-ND 4.0 license.
Plagiarism and Copyright Infringement:
1. JITCS considers plagiarism and copyright infringement to be serious ethical violations. Authors are responsible for ensuring that their submitted work is original and does not infringe upon the copyright or intellectual property rights of others.
2. Any allegations of plagiarism or copyright infringement will be investigated promptly and thoroughly. If proven, appropriate actions, including rejection of the manuscript, retraction of the published article, or other corrective measures, will be taken.
Open Access Licensing:
1. JITCS supports open access publishing and encourages authors to consider publishing their work under the CC BY-NC-ND 4.0 license to promote the dissemination and use of knowledge in the field of information technology and cyber security.
2. The specific terms and conditions of the CC BY-NC-ND 4.0 license will be clearly indicated on the published articles.
Policy Review: This Copyright Notice will be periodically reviewed and updated to ensure its continued relevance and compliance with copyright laws, ethical standards, and open access principles in scholarly publishing. Any updates or revisions to the notice will be communicated to the relevant stakeholders.

By adhering to this Copyright Notice, JITCS aims to protect the rights of authors, promote proper attribution and citation practices, and facilitate the responsible and legal use of the published content in accordance with the CC BY-NC-ND 4.0 license.

ISSN
ISSN (Print)	: 2987-3878
ISSN (Online)	: 2987-386X
