METODE WEIGHTED MAXIMUM CAPTURING UNTUK KLASTERISASI DOKUMEN BERBASIS FREQUENT ITEMSETS (Weighted Maximum Capturing Method for Document Clustering Based on Frequent Itemsets)

  • Gede Aditra Pradnyana Jurusan Teknik Informatika, Fakultas Teknologi Informasi, Institut Teknologi Sepuluh Nopember Kampus ITS Keputih Sukolilo, Surabaya, Jawa Timur
  • Arif Djunaidy Jurusan Sistem Informasi, Fakultas Teknologi Informasi, Institut Teknologi Sepuluh Nopember Kampus ITS Keputih Sukolilo, Surabaya, Jawa Timur

Abstract

Document clustering based on frequent itemsets is a relatively new document clustering approach that can be used to overcome the high dimensionality of the documents being clustered. Maximum capturing is a frequent-itemset-based document clustering algorithm that produces better clustering quality than other similar algorithms. However, maximum capturing still has two weaknesses: it does not account for the weight of a word (item) within the frequent itemsets when computing document similarity, and its cluster formation process does not take into account global information about the clusters already formed. This research develops a new frequent-itemset-based document clustering method, the weighted maximum capturing (WMC) method, to address these deficiencies of maximum capturing so that the quality of the clustering results can be improved. In WMC, document similarity is computed by combining cosine similarity with the Jaccard coefficient, based on the number of frequent itemsets that the documents share, so that the weight of the items in the itemsets is taken into account, while cluster construction adapts the single-linkage agglomerative hierarchical clustering algorithm. Experimental results show that WMC achieves a better F-measure and purity than the earlier method: an F-measure of 0.723, an improvement of 2.8%, and a purity of 0.73, an improvement of 3.3%.
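
The abstract does not state exactly how the cosine similarity and the Jaccard coefficient are combined, so the following is only an illustrative sketch, not the authors' formula. It assumes each document is represented by a set of frequent itemsets plus a dictionary of item weights (for example TF-IDF values), and it assumes the two measures are multiplied. The names `jaccard`, `cosine_on_shared_items`, and `combined_similarity` are hypothetical.

```python
import math


def jaccard(itemsets_a, itemsets_b):
    """Jaccard coefficient over two sets of frequent itemsets."""
    union = itemsets_a | itemsets_b
    if not union:
        return 0.0
    return len(itemsets_a & itemsets_b) / len(union)


def cosine_on_shared_items(weights_a, weights_b, shared_itemsets):
    """Cosine similarity restricted to items that occur in shared itemsets."""
    items = set()
    for itemset in shared_itemsets:
        items |= itemset
    dot = sum(weights_a.get(t, 0.0) * weights_b.get(t, 0.0) for t in items)
    norm_a = math.sqrt(sum(weights_a.get(t, 0.0) ** 2 for t in items))
    norm_b = math.sqrt(sum(weights_b.get(t, 0.0) ** 2 for t in items))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def combined_similarity(doc_a, doc_b):
    """Assumed combination: Jaccard on itemsets times cosine on item weights."""
    shared = doc_a["itemsets"] & doc_b["itemsets"]
    j = jaccard(doc_a["itemsets"], doc_b["itemsets"])
    c = cosine_on_shared_items(doc_a["weights"], doc_b["weights"], shared)
    return j * c


# Toy documents; the itemsets and weights are made up for illustration.
doc1 = {
    "itemsets": {frozenset({"data"}), frozenset({"data", "mining"})},
    "weights": {"data": 1.0, "mining": 2.0},
}
doc2 = {
    "itemsets": {frozenset({"data"}), frozenset({"data", "mining"})},
    "weights": {"data": 1.0, "mining": 2.0},
}
doc3 = {
    "itemsets": {frozenset({"network"})},
    "weights": {"network": 3.0},
}

sim_same = combined_similarity(doc1, doc2)
sim_disjoint = combined_similarity(doc1, doc3)
```

In this sketch, two documents with identical itemsets and weights score close to 1, and documents with no shared frequent itemsets score 0. A pairwise similarity of this kind could then be fed to a single-linkage agglomerative procedure, as the abstract describes for the cluster construction step.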

Author Biographies

Gede Aditra Pradnyana, Jurusan Teknik Informatika, Fakultas Teknologi Informasi, Institut Teknologi Sepuluh Nopember Kampus ITS Keputih Sukolilo, Surabaya, Jawa Timur
Arif Djunaidy, Jurusan Sistem Informasi, Fakultas Teknologi Informasi, Institut Teknologi Sepuluh Nopember Kampus ITS Keputih Sukolilo, Surabaya, Jawa Timur
Published
2013-09-01
How to Cite
PRADNYANA, Gede Aditra; DJUNAIDY, Arif. METODE WEIGHTED MAXIMUM CAPTURING UNTUK KLASTERISASI DOKUMEN BERBASIS FREQUENT ITEMSETS. Jurnal Ilmu Komputer, [S.l.], v. 6, n. 2, sep. 2013. ISSN 2622-321X. Available at: <https://ojs.unud.ac.id/index.php/jik/article/view/8408>. Date accessed: 19 apr. 2024.
Section
Articles

Keywords

document clustering; frequent itemsets; weighted maximum capturing; cosine similarity; Jaccard coefficient