Sentiment Analysis of Hotel Reviews Using Logistic Regression and Random Forest Methods
Abstract
The hospitality industry plays a crucial role in tourism. The internet has changed how customers choose hotels, and online reviews have become a key reference. Sentiment analysis of these reviews helps in understanding customer preferences and satisfaction, which supports sustainable strategies. This study compares Logistic Regression and Random Forest models using 3,000 training samples from 149 hotels in the top 15 destinations on TripAdvisor as of September 2022. Logistic Regression achieved higher accuracy (94.33%) than Random Forest (91.33%) and was therefore the preferred model for classifying 551,294 reviews as positive, negative, or neutral. Customer sentiment declined during the early pandemic period in 2020 but improved post-pandemic in 2022, supported by global campaigns and the preservation of tourist attractions. Culinary experiences and festivals also drew visitors. These findings are intended to assist hotel practitioners and policymakers in improving services and strategies in the hospitality and tourism sectors.
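The model-selection step summarized above (score each classifier on held-out labels, then keep the more accurate one) can be illustrated with a minimal sketch. The function names `accuracy` and `select_model`, as well as the six toy labels and predictions, are hypothetical and chosen only for illustration; they are not the study's data, features, or code.

```python
from typing import Dict, Sequence, Tuple


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on an empty label set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def select_model(
    y_true: Sequence[str], predictions: Dict[str, Sequence[str]]
) -> Tuple[str, Dict[str, float]]:
    """Score every candidate model and return the most accurate one."""
    scores = {name: accuracy(y_true, preds) for name, preds in predictions.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy held-out labels and predictions (hypothetical, for illustration only).
y_true = ["positive", "negative", "neutral", "positive", "negative", "positive"]
predictions = {
    "logistic_regression": [
        "positive", "negative", "neutral", "positive", "negative", "neutral",
    ],
    "random_forest": [
        "positive", "negative", "positive", "positive", "neutral", "neutral",
    ],
}

best_model, scores = select_model(y_true, predictions)
print(best_model, scores)
```

In practice the predictions would come from classifiers fitted on the training reviews, and the selected model would then label the full review corpus.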