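A POSITIONAL TEXT GRAPH APPROACH FOR CLUSTER REPRESENTATIVE SENTENCE SELECTION IN MULTI-DOCUMENT SUMMARIZATION (original title: PENDEKATAN POSITIONAL TEXT GRAPH UNTUK PEMILIHAN KALIMAT REPRESENTATIF CLUSTER PADA PERINGKASAN MULTI-DOKUMEN)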

I Putu Gede Hendra Suputra; Agus Zainal Arifin; Anny Yuniarti

I Putu Gede Hendra Suputra, Department of Informatics Engineering, Faculty of Information Technology, Institut Teknologi Sepuluh Nopember (ITS), Kampus ITS, Sukolilo, Surabaya 60111, Indonesia
Agus Zainal Arifin, Department of Informatics Engineering, Faculty of Information Technology, Institut Teknologi Sepuluh Nopember (ITS), Kampus ITS, Sukolilo, Surabaya 60111, Indonesia
Anny Yuniarti, Department of Informatics Engineering, Faculty of Information Technology, Institut Teknologi Sepuluh Nopember (ITS), Kampus ITS, Sukolilo, Surabaya 60111, Indonesia

Abstract

Coverage and saliency are major problems in automatic text summarization. Sentence clustering approaches can provide good coverage of all topics, but the selection of important sentences that represent each cluster's topic still needs to be considered. The salient sentences selected as constituents of the final summary should be information-dense so that they convey the important information contained in the cluster. The information density of a sentence can be mined by extracting the sentence information density (SID) feature, which is built from a positional text graph of every sentence in the cluster. This paper proposes a cluster representative sentence selection strategy that uses the positional text graph approach in multi-document summarization. Three concepts are used in this paper: (1) sentence clustering based on similarity-based histogram clustering, (2) cluster ordering based on cluster importance, and (3) representative sentence selection based on the sentence information density feature score. The candidate summary sentence is the sentence with the greatest sentence information density feature score in a cluster. Trials were conducted on the DUC 2004 Task 2 dataset. ROUGE-1 was used as the performance metric to compare the SID feature with another method, Local Importance and Global Importance (LIGI). The test results show that the SID feature outperformed the LIGI method in terms of ROUGE-1, where the greatest average ROUGE-1 value achieved by the SID feature is 0.3915.
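The selection and evaluation steps described in the abstract can be illustrated with a minimal sketch. It assumes that each cluster already arrives as a list of (sentence, score) pairs, where the score stands in for the SID feature score (the abstract does not give the SID formula, so it is not computed here), and that the clusters are already ordered by importance. The function names `select_representatives` and `rouge_1_recall` are hypothetical, and the ROUGE-1 function is a simplified unigram-recall version with whitespace tokenization, not the official ROUGE toolkit used in the evaluation.

```python
from collections import Counter


def select_representatives(clusters):
    """Pick the highest-scoring sentence from each cluster.

    `clusters` is a list of clusters; each cluster is a non-empty list of
    (sentence, score) pairs. The score is a placeholder for the SID feature
    score, which is assumed to be computed elsewhere.
    """
    return [max(cluster, key=lambda pair: pair[1])[0] for cluster in clusters]


def rouge_1_recall(candidate, reference):
    """Simplified ROUGE-1 recall: clipped unigram overlap / reference length.

    Tokenization is lowercase whitespace splitting (an assumption); no
    stemming or stopword removal is applied.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    overlap = sum(min(n, cand_counts[tok]) for tok, n in ref_counts.items())
    return overlap / total
```

A summary would then be assembled by joining the selected sentences in cluster order and scoring the result against a reference summary with `rouge_1_recall`.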


