---
license: mit
datasets:
- nhull/tripadvisor-split-dataset-v2
language:
- en
pipeline_tag: text-classification
tags:
- sentiment-analysis
- logistic-regression
- text-classification
- hotel-reviews
- tripadvisor
- nlp
---

# Logistic Regression Sentiment Analysis Model

This model is a **Logistic Regression** classifier trained on the **TripAdvisor sentiment analysis dataset**. It takes the text of a hotel review as input and predicts its sentiment as a rating from 1 to 5 stars.

## Model Details

- **Model Type**: Logistic Regression
- **Task**: Sentiment Analysis
- **Input**: A hotel review (text)
- **Output**: Sentiment rating (1-5 stars)
- **Dataset Used**: TripAdvisor sentiment dataset (balanced labels)

## Intended Use

This model is designed to classify hotel reviews by sentiment. It assigns each review a star rating between 1 and 5 that reflects the sentiment the review expresses.

## How to Use the Model

1. **Install the required dependencies**:

   ```bash
   # huggingface_hub is needed for the download step below; scikit-learn is
   # assumed to be required to unpickle the classifier.
   pip install joblib huggingface_hub scikit-learn
   ```

2. **Download and load the model**:

   You can download the model from Hugging Face and use it to predict sentiment. Example code to download and use the model:

   ```python
   from huggingface_hub import hf_hub_download
   import joblib

   # Download the model from Hugging Face
   # (replace the placeholder repo_id with the actual repository ID)
   model_path = hf_hub_download(
       repo_id="your-username/logistic-regression-model",
       filename="logistic_regression_model.joblib",
   )

   # Load the model
   model = joblib.load(model_path)

   # Predict the sentiment of a single review
   def predict_sentiment(review):
       return model.predict([review])[0]

   review = "This hotel was fantastic. The service was great and the room was clean."
   print(f"Predicted sentiment: {predict_sentiment(review)}")
   ```

3. **The model will return a sentiment rating** between 1 and 5 stars, where:
   - 1: Very bad
   - 2: Bad
   - 3: Neutral
   - 4: Good
   - 5: Very good

## Model Evaluation

- **Test Accuracy**: 61.05% on the test set.
- **Classification Report** (Test Set):

| Label | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| 1.0 | 0.70 | 0.73 | 0.71 | 1600 |
| 2.0 | 0.52 | 0.50 | 0.51 | 1600 |
| 3.0 | 0.57 | 0.54 | 0.55 | 1600 |
| 4.0 | 0.55 | 0.54 | 0.55 | 1600 |
| 5.0 | 0.71 | 0.74 | 0.72 | 1600 |
| **Accuracy** | - | - | **0.61** | 8000 |
| **Macro avg** | 0.61 | 0.61 | 0.61 | 8000 |
| **Weighted avg** | 0.61 | 0.61 | 0.61 | 8000 |

### Cross-validation Scores:

| Metric | Value |
|--------|-------|
| **Logistic Regression Cross-validation scores** | [0.61463816, 0.609375, 0.62072368, 0.59703947, 0.59835526] |
| **Logistic Regression Mean Cross-validation score** | 0.6080 |

## Limitations

- The model performs best on extreme ratings (F1-scores of 0.71 and 0.72 for 1 and 5 stars) and struggles with intermediate ratings (F1-scores of 0.51 to 0.55 for 2, 3, and 4 stars).
- The model was trained on the **TripAdvisor** dataset and may not generalize well to reviews from other sources or domains.
- The model does not handle sarcasm or humor well, and shorter reviews may lead to less accurate predictions.
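
The gap between extreme and intermediate ratings noted above can be read directly off the classification report. The following is a minimal sketch that recomputes it, along with the mean cross-validation score, from the figures reported in this card. The variable and function names are illustrative and are not part of the model's API.

```python
from statistics import mean

# Per-class F1-scores copied from the classification report above.
f1_by_label = {1: 0.71, 2: 0.51, 3: 0.55, 4: 0.55, 5: 0.72}

# Cross-validation scores copied from the table above.
cv_scores = [0.61463816, 0.609375, 0.62072368, 0.59703947, 0.59835526]


def mean_f1(labels):
    """Average the reported F1-score over the given star ratings."""
    return mean(f1_by_label[label] for label in labels)


extreme_f1 = mean_f1([1, 5])          # 1-star and 5-star reviews
intermediate_f1 = mean_f1([2, 3, 4])  # 2-, 3-, and 4-star reviews
macro_f1 = mean_f1(f1_by_label)       # unweighted mean over all five classes

print(f"Extreme ratings F1:      {extreme_f1:.3f}")
print(f"Intermediate ratings F1: {intermediate_f1:.3f}")
print(f"Macro F1:                {macro_f1:.2f}")
print(f"Mean CV score:           {mean(cv_scores):.4f}")
```

With the reported numbers, the extreme classes average an F1-score of about 0.715 against about 0.537 for the intermediate classes, and the cross-validation scores average to the 0.6080 listed in the table.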