---
license: mit
datasets:
  - nhull/tripadvisor-split-dataset-v2
language:
  - en
pipeline_tag: text-classification
tags:
  - sentiment-analysis
  - logistic-regression
  - text-classification
  - hotel-reviews
  - tripadvisor
  - nlp
---

# Logistic Regression Sentiment Analysis Model

This model is a Logistic Regression classifier trained on the TripAdvisor sentiment analysis dataset. It takes a hotel review as text input and predicts its sentiment as a rating from 1 to 5 stars.

## Model Details

- Model Type: Logistic Regression
- Task: Sentiment Analysis (text classification)
- Input: A hotel review (text)
- Output: Sentiment rating (1-5 stars)
- Dataset: nhull/tripadvisor-split-dataset-v2 (TripAdvisor hotel reviews, balanced labels)

## Intended Use

This model classifies hotel reviews by sentiment, assigning each review a star rating from 1 (most negative) to 5 (most positive).

## How to Use the Model

1. Install the required dependencies (`scikit-learn` is needed to load the serialized model, and `huggingface_hub` to download it):

   ```bash
   pip install joblib scikit-learn huggingface_hub
   ```
2. Download and load the model from Hugging Face, then use it to predict sentiment:

   ```python
   from huggingface_hub import hf_hub_download
   import joblib

   # Download the model file from the Hugging Face Hub
   model_path = hf_hub_download(
       repo_id="your-username/logistic-regression-model",
       filename="logistic_regression_model.joblib",
   )

   # Load the serialized model
   model = joblib.load(model_path)

   # Predict the sentiment (1-5 stars) of a single review
   def predict_sentiment(review):
       return model.predict([review])[0]

   review = "This hotel was fantastic. The service was great and the room was clean."
   print(f"Predicted sentiment: {predict_sentiment(review)}")
   ```
3. The model returns a sentiment rating between 1 and 5 stars, where:

   - 1: Very bad
   - 2: Bad
   - 3: Neutral
   - 4: Good
   - 5: Very good
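If a human-readable label is useful, the numeric prediction can be mapped to the descriptions above. The `describe` helper below is a hypothetical convenience function for illustration, not part of the released model:

```python
# Map the model's numeric output (1-5) to the descriptions listed above.
# `describe` is a hypothetical helper, not shipped with the model.
LABELS = {1: "Very bad", 2: "Bad", 3: "Neutral", 4: "Good", 5: "Very good"}

def describe(rating):
    """Return a human-readable string for a 1-5 star rating."""
    rating = int(rating)  # the model may return labels as floats (e.g. 5.0)
    return f"{rating} star(s): {LABELS[rating]}"

print(describe(5.0))  # → "5 star(s): Very good"
```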

## Model Evaluation

- Test Accuracy: 61.05%

- Classification Report (Test Set):
| Label        | Precision | Recall | F1-score | Support |
|--------------|-----------|--------|----------|---------|
| 1.0          | 0.70      | 0.73   | 0.71     | 1600    |
| 2.0          | 0.52      | 0.50   | 0.51     | 1600    |
| 3.0          | 0.57      | 0.54   | 0.55     | 1600    |
| 4.0          | 0.55      | 0.54   | 0.55     | 1600    |
| 5.0          | 0.71      | 0.74   | 0.72     | 1600    |
| Accuracy     | -         | -      | 0.61     | 8000    |
| Macro avg    | 0.61      | 0.61   | 0.61     | 8000    |
| Weighted avg | 0.61      | 0.61   | 0.61     | 8000    |
- Cross-validation Scores (5-fold): [0.61463816, 0.609375, 0.62072368, 0.59703947, 0.59835526]
- Mean Cross-validation Score: 0.6080
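Scores like these can be produced with scikit-learn's `cross_val_score`. The sketch below shows the general recipe; the TF-IDF features, pipeline configuration, and tiny stand-in corpus are assumptions for illustration, not the author's verified training setup (which used the actual TripAdvisor dataset):

```python
# Sketch of a 5-fold cross-validation run for a text classifier.
# ASSUMPTIONS: TF-IDF features + LogisticRegression; the corpus below is a
# tiny synthetic stand-in, so the scores it yields are not the ones reported.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

texts = ["awful stay", "bad room", "okay visit", "nice hotel", "great service"] * 10
labels = [1, 2, 3, 4, 5] * 10

pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipeline, texts, labels, cv=5)  # one accuracy per fold
print(f"Scores: {scores}, mean: {scores.mean():.4f}")
```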