POSITIONAL TEXT GRAPH APPROACH FOR CLUSTER REPRESENTATIVE SENTENCE SELECTION IN MULTI-DOCUMENT SUMMARIZATION

  • I Putu Gede Hendra Suputra Jurusan Teknik Informatika, Fakultas Teknologi Informasi, Institut Teknologi Sepuluh Nopember (ITS), Kampus ITS, Sukolilo, Surabaya 60111, Indonesia
  • Agus Zainal Arifin Jurusan Teknik Informatika, Fakultas Teknologi Informasi, Institut Teknologi Sepuluh Nopember (ITS), Kampus ITS, Sukolilo, Surabaya 60111, Indonesia
  • Anny Yuniarti Jurusan Teknik Informatika, Fakultas Teknologi Informasi, Institut Teknologi Sepuluh Nopember (ITS), Kampus ITS, Sukolilo, Surabaya 60111, Indonesia

Abstract

Coverage and saliency are major problems in automatic text summarization. Sentence clustering approaches can provide good coverage of all topics, but the key consideration is selecting the important sentence that best represents each cluster's topic. The salient sentences selected as constituents of the final summary should be information-dense so that they convey the important information contained in the cluster. The information density of a sentence can be mined by extracting the sentence information density (SID) feature, which is built from a positional text graph of every sentence in the cluster. This paper proposes a cluster representative sentence selection strategy that uses the positional text graph approach in multi-document summarization. Three concepts are used in this paper: (1) sentence clustering based on similarity based histogram clustering, (2) cluster ordering based on cluster importance, and (3) representative sentence selection based on the sentence information density feature score. The candidate summary sentence is the sentence with the greatest sentence information density feature score in a cluster. Trials were conducted on the DUC 2004 Task 2 dataset. ROUGE-1 was used as the performance metric to compare the SID feature with another method, Local Importance and Global Importance (LIGI). The test results showed that the SID feature outperformed the LIGI method on ROUGE-1, with the greatest average ROUGE-1 value achieved by the SID feature being 0.3915.
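
To make the evaluation metric concrete, the following is a minimal sketch of ROUGE-1 recall, the unigram-overlap score commonly reported for DUC data. It assumes lowercased whitespace tokenization and a single reference summary. The abstract does not state the exact ROUGE configuration used in the experiments, so the function name `rouge1_recall` and these preprocessing choices are illustrative assumptions only.

```python
from collections import Counter


def rouge1_recall(candidate: str, reference: str) -> float:
    """Unigram recall of a candidate summary against one reference summary.

    Assumption: tokens are lowercased and split on whitespace; no stemming
    or stopword removal is applied.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    total_ref = sum(ref_counts.values())
    if total_ref == 0:
        return 0.0
    # Clipped overlap: a reference unigram is matched at most as many
    # times as it occurs in the candidate.
    overlap = sum(min(count, cand_counts[tok]) for tok, count in ref_counts.items())
    return overlap / total_ref


if __name__ == "__main__":
    print(rouge1_recall("the cat sat", "the cat sat on the mat"))
```

In this toy call, three of the six reference unigrams are matched by the candidate, so the score is 0.5. Averaging such scores over all document sets gives a figure comparable in kind to the 0.3915 reported above.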

Published
2013-09-01
How to Cite
SUPUTRA, I Putu Gede Hendra; ARIFIN, Agus Zainal; YUNIARTI, Anny. PENDEKATAN POSITIONAL TEXT GRAPH UNTUK PEMILIHAN KALIMAT REPRESENTATIF CLUSTER PADA PERINGKASAN MULTI-DOKUMEN. Jurnal Ilmu Komputer, [S.l.], v. 6, n. 2, sep. 2013. ISSN 2622-321X. Available at: <https://ojs.unud.ac.id/index.php/jik/article/view/8410>. Date accessed: 20 nov. 2024.
Section
Articles

Keywords

multi-document summarization; sentence clustering; similarity based histogram clustering; sentence information density; positional text graph