Web Scraping and Winnowing Algorithms for Plagiarism Detection of Final Project Titles

Main Article Content

Neng Ika Kurniati, Alam Rahmatulloh, Ridwan Nur Qomar


Plagiarism in research can occur accidentally or intentionally. It violates copyright and harms others. When students submit research titles, for example for a final project, many have their submissions rejected repeatedly and are considered to have plagiarized because the proposed title already exists. A system is therefore needed that detects the similarity between a title to be submitted and existing titles, with the aim of reducing the occurrence of plagiarism. This study uses the Winnowing algorithm to compute the percentage similarity between titles. Google Scholar is used as the source of previously published research titles, which serve as comparison titles. Web scraping with cURL (Client URL) and the Simple HTML DOM parser is used to retrieve title data from Google Scholar. The results show that applying the Winnowing algorithm to data from Google Scholar yields a similarity percentage together with a category of mild, moderate, or severe plagiarism, which also supports early detection as a means of preventing plagiarism.
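
The similarity computation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the k-gram length `k`, the window size `w`, the CRC32 hash, and the Jaccard ratio used to turn fingerprints into a percentage are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import re
import zlib


def kgram_hashes(text, k):
    """Hash every k-character substring of the normalized text."""
    # Normalize: lowercase and keep only letters and digits (assumed preprocessing).
    cleaned = re.sub(r"[^a-z0-9]", "", text.lower())
    return [
        zlib.crc32(cleaned[i:i + k].encode("utf-8"))
        for i in range(len(cleaned) - k + 1)
    ]


def winnow(hashes, w):
    """Keep the minimum hash of each sliding window of w consecutive hashes."""
    if not hashes:
        return set()
    if len(hashes) < w:
        return {min(hashes)}
    # Positions are ignored here; only the set of selected hash values is kept.
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def similarity_percent(title_a, title_b, k=5, w=4):
    """Jaccard similarity of the two fingerprint sets, as a percentage."""
    fa = winnow(kgram_hashes(title_a, k), w)
    fb = winnow(kgram_hashes(title_b, k), w)
    if not fa and not fb:
        return 0.0  # both titles are shorter than k characters
    return 100.0 * len(fa & fb) / len(fa | fb)


if __name__ == "__main__":
    proposed = "Plagiarism detection of final project titles"
    existing = "Plagiarism detection of thesis titles"
    print(f"{similarity_percent(proposed, existing):.1f}%")
```

In a full system, each title retrieved from Google Scholar would be passed as `title_b`, and the resulting percentage would then be mapped to the mild, moderate, or severe category using thresholds that the abstract does not specify.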



Article Details

How to Cite
KURNIATI, Neng Ika; RAHMATULLOH, Alam; QOMAR, Ridwan Nur. Web Scraping and Winnowing Algorithms for Plagiarism Detection of Final Project Titles. Lontar Komputer : Jurnal Ilmiah Teknologi Informasi, [S.l.], p. 73-83, aug. 2019. ISSN 2541-5832. Available at: <https://ojs.unud.ac.id/index.php/lontar/article/view/48024>. Date accessed: 21 sep. 2019. doi: https://doi.org/10.24843/LKJITI.2019.v10.i02.p02.

