nhull committed on
Commit ae1c3ae · verified · 1 Parent(s): e5036f8

Update README.md

Files changed (1)
  1. README.md +20 -36
README.md CHANGED
@@ -1,5 +1,5 @@
  ---
- license: mit
  datasets:
  - nhull/tripadvisor-split-dataset-v2
  language:
@@ -24,49 +24,38 @@ This model is a **Logistic Regression** classifier trained on the **TripAdvisor
  - **Task**: Sentiment Analysis
  - **Input**: A hotel review (text)
  - **Output**: Sentiment rating (1-5 stars)
- - **Dataset Used**: TripAdvisor sentiment dataset (balanced labels)

  ## Intended Use

  This model is designed to classify hotel reviews based on their sentiment. It assigns a star rating between 1 and 5 to a review, indicating the sentiment expressed in the review.

- ## How to Use the Model

- 1. **Install the required dependencies**:
- ```bash
- pip install joblib
- ```

- 2. **Download and load the model**:
- You can download the model from Hugging Face and use it to predict sentiment.

- Example code to download and use the model:
- ```python
- from huggingface_hub import hf_hub_download
- import joblib

- # Download model from Hugging Face
- model_path = hf_hub_download(repo_id="your-username/logistic-regression-model", filename="logistic_regression_model.joblib")

- # Load the model
- model = joblib.load(model_path)

- # Predict sentiment of a review
- def predict_sentiment(review):
-     return model.predict([review])[0]

- review = "This hotel was fantastic. The service was great and the room was clean."
- print(f"Predicted sentiment: {predict_sentiment(review)}")
- ```

- 3. **The model will return a sentiment rating** between 1 and 5 stars, where:
- - 1: Very bad
- - 2: Bad
- - 3: Neutral
- - 4: Good
- - 5: Very good

- ## Model Evaluation

  - **Test Accuracy**: 61.05% on the test set.
 
@@ -82,13 +71,8 @@ This model is designed to classify hotel reviews based on their sentiment. It as
  | **Accuracy** | - | - | **0.61** | 8000 |
  | **Macro avg** | 0.61 | 0.61 | 0.61 | 8000 |
  | **Weighted avg** | 0.61 | 0.61 | 0.61 | 8000 |
-
- ### Cross-validation Scores:

- | Metric | Value |
- |------------------------------------|--------------------------------------------|
- | **Logistic Regression Cross-validation scores** | [0.61463816, 0.609375, 0.62072368, 0.59703947, 0.59835526] |
- | **Logistic Regression Mean Cross-validation score** | 0.6080 |

  ## Limitations
 
 
  ---
+ license: apache-2.0
  datasets:
  - nhull/tripadvisor-split-dataset-v2
  language:
 
  - **Task**: Sentiment Analysis
  - **Input**: A hotel review (text)
  - **Output**: Sentiment rating (1-5 stars)
+ - **Trained Dataset**: [nhull/tripadvisor-split-dataset-v2](https://huggingface.co/datasets/nhull/tripadvisor-split-dataset-v2)

  ## Intended Use

  This model is designed to classify hotel reviews based on their sentiment. It assigns a star rating between 1 and 5 to a review, indicating the sentiment expressed in the review.

+ ---

+ **The model will return a sentiment rating** between 1 and 5 stars, where:
+ - 1: Very bad
+ - 2: Bad
+ - 3: Neutral
+ - 4: Good
+ - 5: Very good

+ ---

+ ### Dataset

+ The dataset used for training, validation, and testing is [nhull/tripadvisor-split-dataset-v2](https://huggingface.co/datasets/nhull/tripadvisor-split-dataset-v2). It consists of:

+ - **Training Set**: 30,400 reviews
+ - **Validation Set**: 1,600 reviews
+ - **Test Set**: 8,000 reviews

+ All splits are balanced across five sentiment labels.
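
The split sizes listed above can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: `split_sizes` and `per_class_count` are hypothetical names, and it assumes that "balanced" means exactly equal counts for each of the five labels, which the README does not state explicitly.

```python
# Split sizes as listed in the README (number of reviews per split).
split_sizes = {"train": 30_400, "validation": 1_600, "test": 8_000}
NUM_LABELS = 5  # star ratings 1 through 5


def per_class_count(size: int, num_labels: int = NUM_LABELS) -> int:
    """Reviews per label, assuming perfectly equal class counts (an assumption)."""
    if size % num_labels != 0:
        raise ValueError("size is not evenly divisible by the number of labels")
    return size // num_labels


total = sum(split_sizes.values())
for name, size in split_sizes.items():
    share = size / total  # fraction of all reviews that fall in this split
    print(f"{name}: {size} reviews ({share:.0%} of {total}), "
          f"{per_class_count(size)} per label")
```

Under that equal-count assumption, the test split would hold 1,600 reviews per label, which matches the total support of 8,000 reported in the evaluation table.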
 
 
+ ---

+ ### Test Performance

+ On average, the model's predictions are `0.44` too high.

  - **Test Accuracy**: 61.05% on the test set.
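
The README does not say how the `0.44` over-prediction figure was computed. One plausible reading is the mean signed error, that is, the average of predicted minus true star ratings, where a positive value means the predictions run high. The sketch below illustrates that metric on made-up toy labels; `mean_signed_error`, `y_true`, and `y_pred` are hypothetical names, and the numbers are not the model's actual predictions.

```python
def mean_signed_error(y_true, y_pred):
    """Average of (predicted - true); a positive result means predictions run high."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p - t for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy example only: five reviews with true and predicted star ratings.
y_true = [1, 2, 3, 4, 5]
y_pred = [2, 2, 4, 4, 5]

bias = mean_signed_error(y_true, y_pred)
print(f"Mean signed error: {bias:+.2f} stars")
```

Unlike accuracy, a signed average keeps the direction of the errors, so it can show a systematic tilt toward higher ratings even when the share of exact matches stays the same.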
 
 
  | **Accuracy** | - | - | **0.61** | 8000 |
  | **Macro avg** | 0.61 | 0.61 | 0.61 | 8000 |
  | **Weighted avg** | 0.61 | 0.61 | 0.61 | 8000 |

+ ---

  ## Limitations