---
license: apache-2.0
datasets:
- nhull/tripadvisor-split-dataset-v2
language:
- en
pipeline_tag: text-classification
tags:
- sentiment-analysis
- logistic-regression
- text-classification
- hotel-reviews
- tripadvisor
- nlp
---
# Logistic Regression Sentiment Analysis Model
This model is a **Logistic Regression** classifier trained on the **TripAdvisor sentiment analysis dataset**. It takes the text of a hotel review as input and predicts a sentiment rating on a 1-5 star scale.
## Model Details
- **Model Type**: Logistic Regression
- **Task**: Sentiment Analysis
- **Input**: A hotel review (text)
- **Output**: Sentiment rating (1-5 stars)
- **Training Dataset**: [nhull/tripadvisor-split-dataset-v2](https://huggingface.co/datasets/nhull/tripadvisor-split-dataset-v2)
## Intended Use
This model is designed to classify hotel reviews by sentiment, assigning each review a star rating between 1 and 5.
---
**The model will return a sentiment rating** between 1 and 5 stars, where:
- 1: Very bad
- 2: Bad
- 3: Neutral
- 4: Good
- 5: Very good
---
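
As a rough illustration of the input/output contract described above, here is a minimal sketch of a text classifier of this kind in scikit-learn. The card does not say how reviews are vectorized or how the published model is packaged, so the TF-IDF features, the toy reviews, and the names `RATING_LABELS`, `classifier`, and `describe` are all assumptions made for illustration. This is not the released model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Label descriptions as listed in this card.
RATING_LABELS = {1: "Very bad", 2: "Bad", 3: "Neutral", 4: "Good", 5: "Very good"}

# Hypothetical toy reviews, for illustration only (not taken from the dataset).
train_texts = [
    "Filthy room, rude staff, never again.",
    "Disappointing stay, the room was noisy and small.",
    "Average hotel, nothing special but acceptable.",
    "Nice location and friendly staff, good value.",
    "Absolutely wonderful, spotless room and superb service.",
]
train_ratings = [1, 2, 3, 4, 5]

# Assumption: TF-IDF features feeding a multiclass logistic regression.
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
classifier.fit(train_texts, train_ratings)


def describe(review: str) -> str:
    """Predict a 1-5 star rating for one review and attach its label."""
    rating = int(classifier.predict([review])[0])
    return f"{rating} stars ({RATING_LABELS[rating]})"


print(describe("The staff were friendly and the room was spotless."))
```

With only five training examples the prediction itself is not meaningful. The point is the shape of the interface: raw review text in, an integer rating from 1 to 5 out.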
### Dataset
The dataset used for training, validation, and testing is [nhull/tripadvisor-split-dataset-v2](https://huggingface.co/datasets/nhull/tripadvisor-split-dataset-v2). It consists of:
- **Training Set**: 30,400 reviews
- **Validation Set**: 1,600 reviews
- **Test Set**: 8,000 reviews
All splits are balanced across five sentiment labels.
---
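
The split sizes above can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in this card; the names `SPLIT_SIZES` and `NUM_LABELS` are illustrative. With balanced splits, the test set holds 1,600 reviews per label, which matches the Support column of the classification report.

```python
# Split sizes as stated in this card.
SPLIT_SIZES = {"train": 30_400, "validation": 1_600, "test": 8_000}
NUM_LABELS = 5  # ratings 1 through 5

total = sum(SPLIT_SIZES.values())
for name, size in SPLIT_SIZES.items():
    share = size / total
    per_label = size // NUM_LABELS  # exact, since every split is balanced
    print(f"{name:<10} {size:>6} reviews  {share:6.1%}  {per_label} per label")
```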
### Test Performance
On average, the predicted rating differs from the true rating by `0.44` stars (mean absolute error).
- **Test Accuracy**: 61.05%
- **Classification Report**:
| Label | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| 1.0 | 0.70 | 0.73 | 0.71 | 1600 |
| 2.0 | 0.52 | 0.50 | 0.51 | 1600 |
| 3.0 | 0.57 | 0.54 | 0.55 | 1600 |
| 4.0 | 0.55 | 0.54 | 0.55 | 1600 |
| 5.0 | 0.71 | 0.74 | 0.72 | 1600 |
| **Accuracy** | - | - | **0.61** | 8000 |
| **Macro avg** | 0.61 | 0.61 | 0.61 | 8000 |
| **Weighted avg** | 0.61 | 0.61 | 0.61 | 8000 |
- **Confusion Matrix**:
| True \\ Predicted | 1 | 2 | 3 | 4 | 5 |
|-------------------|-------|-------|-------|-------|-------|
| 1 | 1165 | 384 | 41 | 3 | 7 |
| 2 | 432 | 805 | 315 | 31 | 17 |
| 3 | 61 | 314 | 857 | 311 | 57 |
| 4 | 3 | 48 | 264 | 870 | 415 |
| 5 | 6 | 10 | 32 | 365 | 1187 |
---
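
The headline numbers in this section can be recomputed from the confusion matrix alone. The sketch below copies the counts from the table above and derives accuracy, the average star distance between true and predicted ratings, and per-class precision, recall, and F1. Variable and function names are illustrative.

```python
# Rows are true ratings 1-5, columns are predicted ratings 1-5 (counts from this card).
CONFUSION = [
    [1165, 384, 41, 3, 7],
    [432, 805, 315, 31, 17],
    [61, 314, 857, 311, 57],
    [3, 48, 264, 870, 415],
    [6, 10, 32, 365, 1187],
]

n = len(CONFUSION)
cells = [(t, p) for t in range(n) for p in range(n)]
total = sum(CONFUSION[t][p] for t, p in cells)
correct = sum(CONFUSION[k][k] for k in range(n))
accuracy = correct / total

# Star distance between predicted and true rating, averaged over all reviews.
mean_abs_error = sum(abs(p - t) * CONFUSION[t][p] for t, p in cells) / total
# Signed version: positive means predictions are too high on average.
mean_signed_error = sum((p - t) * CONFUSION[t][p] for t, p in cells) / total
# Misclassifications that land exactly one star away from the true rating.
adjacent_errors = sum(CONFUSION[t][p] for t, p in cells if abs(p - t) == 1)


def per_class(k: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 for the rating at index k (rating k + 1)."""
    true_positive = CONFUSION[k][k]
    precision = true_positive / sum(CONFUSION[t][k] for t in range(n))
    recall = true_positive / sum(CONFUSION[k])
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


print(f"accuracy            = {accuracy:.4f}")           # 0.6105
print(f"mean absolute error = {mean_abs_error:.3f}")     # 0.436
print(f"mean signed error   = {mean_signed_error:.3f}")  # 0.006
print(f"adjacent errors     = {adjacent_errors} of {total - correct}")  # 2800 of 3116
for k in range(n):
    precision, recall, f1 = per_class(k)
    print(f"rating {k + 1}: P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

The mean absolute error of about 0.44 stars matches the figure quoted at the top of this section, while the signed error is close to zero. Most misclassifications (2,800 of 3,116) fall on a rating adjacent to the true one.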
## Files Included
- **`validation_results_log_regression.csv`**: Contains correctly classified reviews with their real and predicted labels.
---
## Limitations
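- The model performs best on extreme ratings (1 and 5 stars, F1 of 0.71 and 0.72) and struggles with intermediate ratings (2, 3, and 4 stars, F1 between 0.51 and 0.55). Most misclassifications land on a rating adjacent to the true one.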
- The model was trained on the **TripAdvisor** dataset and may not generalize well to reviews from other sources or domains.
- The model does not handle sarcasm or humor well, and shorter reviews may lead to less accurate predictions.