---
license: mit
datasets:
  - nhull/tripadvisor-split-dataset-v2
language:
  - en
pipeline_tag: text-classification
tags:
  - sentiment-analysis
  - logistic-regression
  - text-classification
  - hotel-reviews
  - tripadvisor
  - nlp
---

# Logistic Regression Sentiment Analysis Model

This model is a Logistic Regression classifier trained on the TripAdvisor sentiment analysis dataset. Given the text of a hotel review, it predicts a sentiment rating on a 1-5 star scale.

## Model Details

- **Model Type:** Logistic Regression
- **Task:** Sentiment Analysis
- **Input:** A hotel review (text)
- **Output:** Sentiment rating (1-5 stars)
- **Dataset Used:** TripAdvisor sentiment dataset (balanced labels)
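
The card does not include the training script. For orientation, below is a minimal sketch of how a comparable scikit-learn pipeline could be assembled; the vectorizer choice and hyperparameters are assumptions, not the actual training configuration.

```python
# Hypothetical sketch of a TF-IDF + Logistic Regression text classifier.
# The real model's features and hyperparameters are not documented in this card.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

pipeline = make_pipeline(
    TfidfVectorizer(),                   # turn raw review text into TF-IDF features
    LogisticRegression(max_iter=1000),   # linear classifier over the 1-5 star labels
)
# pipeline.fit(train_reviews, train_labels)  # train_reviews: list[str], train_labels: list[int]
```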

## Intended Use

This model is designed to classify hotel reviews by sentiment, assigning each review a star rating between 1 and 5 that reflects the sentiment expressed in the text.

## How to Use the Model

1. Install the required dependencies (the example below downloads the model with `huggingface_hub` and loads a scikit-learn model with `joblib`):

   ```bash
   pip install huggingface_hub scikit-learn joblib
   ```
2. Download the model from the Hugging Face Hub, load it, and use it to predict sentiment:

   ```python
   from huggingface_hub import hf_hub_download
   import joblib

   # Download the serialized model from the Hugging Face Hub
   model_path = hf_hub_download(
       repo_id="your-username/logistic-regression-model",
       filename="logistic_regression_model.joblib",
   )

   # Load the scikit-learn model
   model = joblib.load(model_path)

   # Predict the sentiment of a single review
   def predict_sentiment(review):
       return model.predict([review])[0]

   review = "This hotel was fantastic. The service was great and the room was clean."
   print(f"Predicted sentiment: {predict_sentiment(review)}")
   ```
3. The model returns a sentiment rating between 1 and 5 stars (a mapping to descriptive labels is shown below), where:

   - 1: Very bad
   - 2: Bad
   - 3: Neutral
   - 4: Good
   - 5: Very good
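
To turn the numeric prediction into one of these descriptive labels, a small helper like the following can be used (hypothetical; it assumes the model returns labels 1.0-5.0, matching the classification report below):

```python
# Hypothetical mapping from the numeric rating to a descriptive label.
# Assumes model.predict returns labels in {1.0, ..., 5.0}.
LABELS = {1: "Very bad", 2: "Bad", 3: "Neutral", 4: "Good", 5: "Very good"}

rating = int(predict_sentiment(review))  # predict_sentiment from the example above
print(f"{rating} stars ({LABELS[rating]})")
```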

## Model Evaluation

- **Test Accuracy:** 61.05% on the test set.

- **Classification Report (Test Set):**

| Label        | Precision | Recall | F1-score | Support |
|--------------|-----------|--------|----------|---------|
| 1.0          | 0.70      | 0.73   | 0.71     | 1600    |
| 2.0          | 0.52      | 0.50   | 0.51     | 1600    |
| 3.0          | 0.57      | 0.54   | 0.55     | 1600    |
| 4.0          | 0.55      | 0.54   | 0.55     | 1600    |
| 5.0          | 0.71      | 0.74   | 0.72     | 1600    |
| Accuracy     | -         | -      | 0.61     | 8000    |
| Macro avg    | 0.61      | 0.61   | 0.61     | 8000    |
| Weighted avg | 0.61      | 0.61   | 0.61     | 8000    |
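
The report above could be reproduced along these lines. This is a sketch only: it assumes the dataset's test split exposes the review text and label columns under the names used below, and that `model` is the pipeline loaded in the usage example (requires `pip install datasets`).

```python
# Sketch: recompute the test-set classification report.
# Column names ("review", "label") are assumptions about the dataset schema.
from datasets import load_dataset
from sklearn.metrics import classification_report

test = load_dataset("nhull/tripadvisor-split-dataset-v2", split="test")
y_pred = model.predict(test["review"])
print(classification_report(test["label"], y_pred))
```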

**Cross-validation Scores:**

| Metric                          | Value |
|---------------------------------|-------|
| Cross-validation scores (5 folds) | 0.61463816, 0.609375, 0.62072368, 0.59703947, 0.59835526 |
| Mean cross-validation score     | 0.6080 |
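
A comparable 5-fold cross-validation could be run as sketched below, under the same column-name assumptions; `cross_val_score` refits a clone of the pipeline on each fold.

```python
# Sketch: 5-fold cross-validation on the train split (column names assumed).
from datasets import load_dataset
from sklearn.model_selection import cross_val_score

train = load_dataset("nhull/tripadvisor-split-dataset-v2", split="train")
scores = cross_val_score(model, train["review"], train["label"], cv=5)
print(scores, scores.mean())
```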

## Limitations

- The model performs well on the extreme ratings (1 and 5 stars) but struggles with the intermediate ratings (2, 3, and 4 stars).
- The model was trained on the TripAdvisor dataset and may not generalize well to reviews from other sources or domains.
- The model handles sarcasm and humor poorly, and very short reviews tend to produce less accurate predictions.