Topic Modeling of Hotel Reviews Using the BERTopic Method with the c-TF-IDF Procedure
Abstract
User reviews on travel guidance services are useful textual data for other users. Knowing which topics are discussed in user reviews of hotel products lets a travel guidance service provider group those reviews by topic, and topic modeling methods can perform this grouping. This study applies the BERTopic method to model topics in user reviews of hotel products on the TripAdvisor travel guidance service, using secondary data in the form of hotel reviews from the TripAdvisor site. Topic modeling with BERTopic proceeds in four stages: document embedding, dimensionality reduction (UMAP), clustering (HDBSCAN), and topic representation with c-TF-IDF. The BERTopic model produced 78 topics with a topic coherence of 0.07287 and a topic diversity of 0.496154. The lower the number of topics to be generated, the lower the resulting topic coherence and topic diversity.
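The last stage of the pipeline and the topic diversity metric can be illustrated with a minimal sketch in plain Python. It assumes the documents have already been embedded, reduced, and clustered, so the input is simply a mapping from topic id to tokenized reviews. The weighting follows the c-TF-IDF formula from the BERTopic paper, W(t, c) = tf(t, c) · log(1 + A / tf(t)), where A is the average number of words per class. The function names, the toy reviews, and the use of raw (unnormalized) term frequencies are assumptions made for illustration; this is not the study's code or the BERTopic library API.

```python
import math
from collections import Counter


def c_tf_idf(docs_per_topic):
    """Class-based TF-IDF weights for each topic.

    docs_per_topic: dict mapping topic id -> list of token lists
    (hypothetical input; the clustering step is assumed to be done).
    """
    # Treat all documents of one topic as a single class-level bag of words.
    class_counts = {
        topic: Counter(tok for doc in docs for tok in doc)
        for topic, docs in docs_per_topic.items()
    }
    # tf(t): frequency of each term across all classes.
    total_counts = Counter()
    for counts in class_counts.values():
        total_counts.update(counts)
    # A: average number of words per class.
    avg_words = sum(total_counts.values()) / len(class_counts)
    return {
        topic: {
            term: tf * math.log(1 + avg_words / total_counts[term])
            for term, tf in counts.items()
        }
        for topic, counts in class_counts.items()
    }


def top_words(weights, k):
    """Top-k terms per topic; ties are broken alphabetically."""
    return {
        topic: [w for w, _ in sorted(ws.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for topic, ws in weights.items()
    }


def topic_diversity(topics):
    """Share of unique words among the top words of all topics."""
    words = [w for ws in topics.values() for w in ws]
    return len(set(words)) / len(words)


# Toy clusters standing in for HDBSCAN output (made-up data).
clusters = {
    0: [["room", "clean", "bed"], ["room", "spacious", "clean"]],
    1: [["breakfast", "buffet", "coffee"], ["breakfast", "room", "tasty"]],
}
weights = c_tf_idf(clusters)
topics = top_words(weights, k=2)
print(topics)                   # {0: ['clean', 'room'], 1: ['breakfast', 'buffet']}
print(topic_diversity(topics))  # 1.0
```

In the toy output, "room" occurs in both clusters and is therefore down-weighted relative to "clean", which occurs only in topic 0. A diversity of 1.0 means no top word is shared between topics. As a rough reading of the reported figure, and only under the assumption that diversity was computed over the top 10 words of each of the 78 topics (the abstract does not state this), 0.496154 would correspond to 387 unique words out of 780.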
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors submitting a manuscript do so on the understanding that, if it is accepted for publication, the copyright of the article shall be assigned to JNATIA (Jurnal Nasional Teknologi Informasi dan Aplikasinya) as the publisher of the journal. Copyright encompasses exclusive rights to reproduce and deliver the article in all forms and media, as well as translations. Reproduction of any part of this journal (printed or online) is allowed only with written permission from JNATIA (Jurnal Nasional Teknologi Informasi dan Aplikasinya). The Editorial Board of JNATIA (Jurnal Nasional Teknologi Informasi dan Aplikasinya) makes every effort to ensure that no wrong or misleading data, opinions, or statements are published in the journal.