---
license: mit
datasets:
  - nhull/tripadvisor-split-dataset-v2
language:
  - en
pipeline_tag: text-classification
tags:
  - sentiment-analysis
  - logistic-regression
  - text-classification
  - hotel-reviews
  - tripadvisor
  - nlp
---

# Logistic Regression Sentiment Analysis Model

This model is a Logistic Regression classifier trained on the TripAdvisor sentiment analysis dataset. Given the text of a hotel review, it predicts a sentiment rating on a 1-5 star scale.

## Model Details

- **Model Type:** Logistic Regression
- **Task:** Sentiment Analysis
- **Input:** A hotel review (text)
- **Output:** Sentiment rating (1-5 stars)
- **Dataset Used:** TripAdvisor sentiment dataset (balanced labels)
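
The card does not include the training script. For orientation, below is a minimal sketch of how a comparable scikit-learn pipeline could be assembled; the vectorizer choice and hyperparameters are assumptions, not the actual training configuration.

```python
# Hypothetical sketch of a TF-IDF + Logistic Regression text classifier.
# The real model's features and hyperparameters are not documented in this card.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

pipeline = make_pipeline(
    TfidfVectorizer(),                   # turn raw review text into TF-IDF features
    LogisticRegression(max_iter=1000),   # linear classifier over the 1-5 star labels
)
# pipeline.fit(train_reviews, train_labels)  # train_reviews: list[str], train_labels: list[int]
```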

## Intended Use

This model is designed to classify hotel reviews by sentiment, assigning each review a star rating between 1 and 5 that reflects the sentiment expressed in the text.

## How to Use the Model

1. Install the required dependencies (the example below downloads the model with `huggingface_hub` and loads a scikit-learn model with `joblib`):

   ```bash
   pip install huggingface_hub scikit-learn joblib
   ```
2. Download the model from the Hugging Face Hub, load it, and use it to predict sentiment:

   ```python
   from huggingface_hub import hf_hub_download
   import joblib

   # Download the serialized model from the Hugging Face Hub
   model_path = hf_hub_download(
       repo_id="your-username/logistic-regression-model",
       filename="logistic_regression_model.joblib",
   )

   # Load the scikit-learn model
   model = joblib.load(model_path)

   # Predict the sentiment of a single review
   def predict_sentiment(review):
       return model.predict([review])[0]

   review = "This hotel was fantastic. The service was great and the room was clean."
   print(f"Predicted sentiment: {predict_sentiment(review)}")
   ```
3. The model returns a sentiment rating between 1 and 5 stars (a mapping to descriptive labels is shown below), where:

   - 1: Very bad
   - 2: Bad
   - 3: Neutral
   - 4: Good
   - 5: Very good
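
To turn the numeric prediction into one of these descriptive labels, a small helper like the following can be used (hypothetical; it assumes the model returns labels 1.0-5.0, matching the classification report below):

```python
# Hypothetical mapping from the numeric rating to a descriptive label.
# Assumes model.predict returns labels in {1.0, ..., 5.0}.
LABELS = {1: "Very bad", 2: "Bad", 3: "Neutral", 4: "Good", 5: "Very good"}

rating = int(predict_sentiment(review))  # predict_sentiment from the example above
print(f"{rating} stars ({LABELS[rating]})")
```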

## Model Evaluation

- **Test Accuracy:** 61.05% on the test set.

- **Classification Report (Test Set):**

| Label        | Precision | Recall | F1-score | Support |
|--------------|-----------|--------|----------|---------|
| 1.0          | 0.70      | 0.73   | 0.71     | 1600    |
| 2.0          | 0.52      | 0.50   | 0.51     | 1600    |
| 3.0          | 0.57      | 0.54   | 0.55     | 1600    |
| 4.0          | 0.55      | 0.54   | 0.55     | 1600    |
| 5.0          | 0.71      | 0.74   | 0.72     | 1600    |
| Accuracy     | -         | -      | 0.61     | 8000    |
| Macro avg    | 0.61      | 0.61   | 0.61     | 8000    |
| Weighted avg | 0.61      | 0.61   | 0.61     | 8000    |
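
The report above could be reproduced along these lines. This is a sketch only: it assumes the dataset's test split exposes the review text and label columns under the names used below, and that `model` is the pipeline loaded in the usage example (requires `pip install datasets`).

```python
# Sketch: recompute the test-set classification report.
# Column names ("review", "label") are assumptions about the dataset schema.
from datasets import load_dataset
from sklearn.metrics import classification_report

test = load_dataset("nhull/tripadvisor-split-dataset-v2", split="test")
y_pred = model.predict(test["review"])
print(classification_report(test["label"], y_pred))
```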

**Cross-validation Scores:**

| Metric                          | Value |
|---------------------------------|-------|
| Cross-validation scores (5 folds) | 0.61463816, 0.609375, 0.62072368, 0.59703947, 0.59835526 |
| Mean cross-validation score     | 0.6080 |
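
A comparable 5-fold cross-validation could be run as sketched below, under the same column-name assumptions; `cross_val_score` refits a clone of the pipeline on each fold.

```python
# Sketch: 5-fold cross-validation on the train split (column names assumed).
from datasets import load_dataset
from sklearn.model_selection import cross_val_score

train = load_dataset("nhull/tripadvisor-split-dataset-v2", split="train")
scores = cross_val_score(model, train["review"], train["label"], cv=5)
print(scores, scores.mean())
```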

## Limitations

- The model performs well on the extreme ratings (1 and 5 stars) but struggles with the intermediate ratings (2, 3, and 4 stars).
- The model was trained on the TripAdvisor dataset and may not generalize well to reviews from other sources or domains.
- The model handles sarcasm and humor poorly, and very short reviews tend to produce less accurate predictions.