nhull/logistic-regression-model

@@ -1,5 +1,5 @@
 ---
-license: mit
 datasets:
 - nhull/tripadvisor-split-dataset-v2
 language:
@@ -24,49 +24,38 @@ This model is a **Logistic Regression** classifier trained on the **TripAdvisor
 - **Task**: Sentiment Analysis
 - **Input**: A hotel review (text)
 - **Output**: Sentiment rating (1-5 stars)
-- **Dataset Used**: TripAdvisor sentiment dataset (balanced labels)
 ## Intended Use
 This model is designed to classify hotel reviews based on their sentiment. It assigns a star rating between 1 and 5 to a review, indicating the sentiment expressed in the review.
-## How to Use the Model
-1. **Install the required dependencies**:
-    ```bash
-    pip install joblib
-    ```
-2. **Download and load the model**:
-    You can download the model from Hugging Face and use it to predict sentiment.
-    Example code to download and use the model:
-    ```python
-    from huggingface_hub import hf_hub_download
-    import joblib
-    # Download model from Hugging Face
-    model_path = hf_hub_download(repo_id="your-username/logistic-regression-model", filename="logistic_regression_model.joblib")
-    # Load the model
-    model = joblib.load(model_path)
-    # Predict sentiment of a review
-    def predict_sentiment(review):
-        return model.predict([review])[0]
-    review = "This hotel was fantastic. The service was great and the room was clean."
-    print(f"Predicted sentiment: {predict_sentiment(review)}")
-    ```
-3. **The model will return a sentiment rating** between 1 and 5 stars, where:
-   - 1: Very bad
-   - 2: Bad
-   - 3: Neutral
-   - 4: Good
-   - 5: Very good
-## Model Evaluation
 - **Test Accuracy**: 61.05% on the test set.
@@ -82,13 +71,8 @@ This model is designed to classify hotel reviews based on their sentiment. It as
 | **Accuracy** | -   | -      | **0.61**  | 8000    |
 | **Macro avg** | 0.61 | 0.61   | 0.61     | 8000    |
 | **Weighted avg** | 0.61 | 0.61 | 0.61     | 8000    |
-### Cross-validation Scores:
-| Metric                             | Value                                      |
-|------------------------------------|--------------------------------------------|
-| **Logistic Regression Cross-validation scores** | [0.61463816, 0.609375, 0.62072368, 0.59703947, 0.59835526] |
-| **Logistic Regression Mean Cross-validation score** | 0.6080                                     |
 ## Limitations

 ---
+license: apache-2.0
 datasets:
 - nhull/tripadvisor-split-dataset-v2
 language:
 - **Task**: Sentiment Analysis
 - **Input**: A hotel review (text)
 - **Output**: Sentiment rating (1-5 stars)
+- **Training Dataset**: [nhull/tripadvisor-split-dataset-v2](https://huggingface.co/datasets/nhull/tripadvisor-split-dataset-v2)
 ## Intended Use
 This model is designed to classify hotel reviews based on their sentiment. It assigns a star rating between 1 and 5 to a review, indicating the sentiment expressed in the review.
+---
+**The model will return a sentiment rating** between 1 and 5 stars, where:
+   - 1: Very bad
+   - 2: Bad
+   - 3: Neutral
+   - 4: Good
+   - 5: Very good
+---
+### Dataset
+The dataset used for training, validation, and testing is [nhull/tripadvisor-split-dataset-v2](https://huggingface.co/datasets/nhull/tripadvisor-split-dataset-v2). It consists of:
+- **Training Set**: 30,400 reviews
+- **Validation Set**: 1,600 reviews
+- **Test Set**: 8,000 reviews
+All splits are balanced across five sentiment labels.
+---
+### Test Performance
+On average, the model's predicted rating is `0.44` stars higher than the true rating.
 - **Test Accuracy**: 61.05% on the test set.
 | **Accuracy** | -   | -      | **0.61**  | 8000    |
 | **Macro avg** | 0.61 | 0.61   | 0.61     | 8000    |
 | **Weighted avg** | 0.61 | 0.61 | 0.61     | 8000    |
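Reviewer note: the average over-prediction quoted for the test set reads like a mean signed error, that is, the mean of (predicted rating minus true rating). The card does not say how the figure was computed, so the following is only a minimal sketch under that assumption. The function name `mean_signed_error` and the sample ratings are hypothetical and are not taken from the model or its test set.

```python
from statistics import mean


def mean_signed_error(y_true, y_pred):
    """Average of (predicted - true); a positive value means predictions run high."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return mean(p - t for t, p in zip(y_true, y_pred))


# Hypothetical star ratings, for illustration only (not from the test set).
y_true = [1, 2, 3, 4, 5]
y_pred = [2, 2, 4, 4, 5]

bias = mean_signed_error(y_true, y_pred)
print(f"Mean signed error: {bias:+.2f} stars")
```

Unlike accuracy, a signed error keeps the direction of the mistakes, so it can show a systematic tilt toward higher ratings that a 61% accuracy figure alone would hide.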
+---
 ## Limitations