boltuix/NeuroFeel

boltuix committed on May 25

Commit a59f42d (verified) · 1 parent: 86247af

Update README.md

Files changed (1)

README.md +90 -169

@@ -231,108 +231,7 @@ Confidence: 85.63%
 *Note*: Fine-tune the model for domain-specific tasks to boost accuracy.
-## Evaluation
-NeuroFeel was evaluated on an emotion classification task using 13 short-text samples relevant to IoT, social media, and mental health contexts. The model predicts one of 13 emotion labels, with success defined as the correct label being predicted.
-### Test Sentences
-| Sentence | Expected Emotion |
-|----------|------------------|
-| I love you so much! | Love |
-| This is absolutely disgusting! | Disgust |
-| I'm so happy with my new phone! | Happiness |
-| Why does this always break? | Anger |
-| I feel so alone right now. | Sadness |
-| What just happened?! | Surprise |
-| I'm terrified of this update failing. | Fear |
-| Meh, it's just okay. | Neutral |
-| I shouldn't have said that. | Shame |
-| I feel bad for forgetting. | Guilt |
-| Wait, what does this mean? | Confusion |
-| I really want that new gadget! | Desire |
-| Oh sure, like that's gonna work. | Sarcasm |
-### Evaluation Code
-```python
-from transformers import pipeline
-# Load the fine-tuned NeuroFeel model
-sentiment_analysis = pipeline("text-classification", model="boltuix/NeuroFeel")
-# Define label-to-emoji mapping
-label_to_emoji = {
-    "Sadness": "😢",
-    "Anger": "😠",
-    "Love": "❤️",
-    "Surprise": "😲",
-    "Fear": "😱",
-    "Happiness": "😄",
-    "Neutral": "😐",
-    "Disgust": "🤢",
-    "Shame": "🙈",
-    "Guilt": "😔",
-    "Confusion": "😕",
-    "Desire": "🔥",
-    "Sarcasm": "😏"
-}
-# Test data
-tests = [
-    ("I love you so much!", "Love"),
-    ("This is absolutely disgusting!", "Disgust"),
-    ("I'm so happy with my new phone!", "Happiness"),
-    ("Why does this always break?", "Anger"),
-    ("I feel so alone right now.", "Sadness"),
-    ("What just happened?!", "Surprise"),
-    ("I'm terrified of this update failing.", "Fear"),
-    ("Meh, it's just okay.", "Neutral"),
-    ("I shouldn't have said that.", "Shame"),
-    ("I feel bad for forgetting.", "Guilt"),
-    ("Wait, what does this mean?", "Confusion"),
-    ("I really want that new gadget!", "Desire"),
-    ("Oh sure, like that's gonna work.", "Sarcasm")
-]
-results = []
-# Run tests
-for text, expected in tests:
-    result = sentiment_analysis(text)[0]
-    predicted = result["label"].capitalize()
-    confidence = result["score"]
-    emoji = label_to_emoji.get(predicted, "❓")
-    results.append({
-        "sentence": text,
-        "expected": expected,
-        "predicted": predicted,
-        "confidence": confidence,
-        "emoji": emoji,
-        "pass": predicted == expected
-    })
-# Print results
-for r in results:
-    status = "✅ PASS" if r["pass"] else "❌ FAIL"
-    print(f"\n🔍 {r['sentence']}")
-    print(f"🎯 Expected: {r['expected']}")
-    print(f"🔝 Predicted: {r['predicted']} {r['emoji']} (Confidence: {r['confidence']:.4f})")
-    print(status)
-# Summary
-pass_count = sum(r["pass"] for r in results)
-print(f"\n🎯 Total Passed: {pass_count}/{len(tests)}")
-```
-### Sample Results (Hypothetical)
-- **Sentence**: I love you so much!
-  **Expected**: Love
-  **Predicted**: Love ❤️ (Confidence: 0.8563)
-  **Result**: ✅ PASS
-- **Sentence**: I feel so alone right now.
-  **Expected**: Sadness
-  **Predicted**: Sadness 😢 (Confidence: 0.8021)
-  **Result**: ✅ PASS
-- **Total Passed**: ~12/13 (varies with fine-tuning).
 NeuroFeel excels in classifying a wide range of emotions in short texts, particularly in IoT, social media, and mental health contexts. Fine-tuning enhances performance on subtle emotions like Sarcasm or Shame.
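The pass/fail tally from the removed evaluation script can be separated from the model call, which makes the scoring logic checkable without downloading any weights. The following is a minimal sketch, not the author's code: `score_predictions` is a hypothetical helper, and `fake_predict` is a stub that stands in for the `pipeline` call.

```python
from typing import Callable, Iterable


def score_predictions(
    tests: Iterable[tuple[str, str]],
    predict: Callable[[str], str],
) -> tuple[int, list[dict]]:
    """Return (pass_count, per-sentence records) for (text, expected) pairs."""
    records = []
    for text, expected in tests:
        # Mirror the README's normalization of the predicted label
        predicted = predict(text).capitalize()
        records.append({
            "sentence": text,
            "expected": expected,
            "predicted": predicted,
            "pass": predicted == expected,
        })
    pass_count = sum(r["pass"] for r in records)
    return pass_count, records


# Stub predictor (assumption): replaces the real model for illustration
def fake_predict(text: str) -> str:
    return "love" if "love" in text.lower() else "neutral"


sample_tests = [
    ("I love you so much!", "Love"),
    ("Meh, it's just okay.", "Neutral"),
    ("Why does this always break?", "Anger"),
]
passed, rows = score_predictions(sample_tests, fake_predict)
print(f"Total Passed: {passed}/{len(rows)}")  # Total Passed: 2/3
```

To score the real model, the stub could be replaced with a function that returns the `label` field of the pipeline's first result for each sentence.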
@@ -408,86 +307,108 @@ To adapt NeuroFeel for custom emotion detection tasks:
 1. **Prepare Dataset**: Collect labeled data with 13 emotion categories.
 2. **Fine-Tune with Hugging Face**:
    ```python
-    # !pip install transformers datasets torch --upgrade
-    import torch
-    from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
-    from datasets import Dataset
     import pandas as pd
-    # 1. Prepare the sample emotion dataset
-    data = {
-        "text": [
-            "I love you so much!",
-            "This is absolutely disgusting!",
-            "I'm so happy with my new phone!",
-            "Why does this always break?",
-            "I feel so alone right now."
-        ],
-        "label": [2, 7, 5, 1, 0]  # Emotions: 0 to 12
-    }
-    df = pd.DataFrame(data)
-    dataset = Dataset.from_pandas(df)
-    # 2. Load tokenizer and model
-    model_name = "boltuix/NeuroFeel"
-    tokenizer = AutoTokenizer.from_pretrained(model_name)
-    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=13)
-    # 3. Tokenize the dataset
-    def tokenize_function(examples):
-        return tokenizer(examples["text"], padding="max_length", truncation=True, max_length=64)
-    tokenized_dataset = dataset.map(tokenize_function, batched=True)
-    # 4. Manually convert all fields to PyTorch tensors
-    def to_torch_format(example):
-        return {
-            "input_ids": torch.tensor(example["input_ids"]),
-            "attention_mask": torch.tensor(example["attention_mask"]),
-            "label": torch.tensor(example["label"])
-        }
-    tokenized_dataset = tokenized_dataset.map(to_torch_format)
-    # 5. Define training arguments
     training_args = TrainingArguments(
-        output_dir="./neurofeel_results",
         num_train_epochs=5,
         per_device_train_batch_size=16,
-        logging_dir="./neurofeel_logs",
         logging_steps=10,
-        save_steps=100,
-        eval_strategy="no",
-        learning_rate=2e-5,
         report_to="none"
     )
-    # 6. Initialize Trainer
     trainer = Trainer(
         model=model,
         args=training_args,
-        train_dataset=tokenized_dataset,
     )
-    # 7. Fine-tune the model
     trainer.train()
-    # 8. Save the fine-tuned model
-    model.save_pretrained("./fine_tuned_neurofeel")
-    tokenizer.save_pretrained("./fine_tuned_neurofeel")
-    # 9. Example inference
-    text = "I'm thrilled with the update!"
-    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=64)
-    model.eval()
-    with torch.no_grad():
-        outputs = model(**inputs)
-        logits = outputs.logits
-        predicted_class = torch.argmax(logits, dim=1).item()
-    labels = ["Sadness", "Anger", "Love", "Surprise", "Fear", "Happiness", "Neutral", "Disgust", "Shame", "Guilt", "Confusion", "Desire", "Sarcasm"]
-    print(f"Predicted emotion for '{text}': {labels[predicted_class]}")
    ```
 3. **Deploy**: Export to ONNX or TensorFlow Lite for edge devices.

 *Note*: Fine-tune the model for domain-specific tasks to boost accuracy.
 NeuroFeel excels in classifying a wide range of emotions in short texts, particularly in IoT, social media, and mental health contexts. Fine-tuning enhances performance on subtle emotions like Sarcasm or Shame.
 1. **Prepare Dataset**: Collect labeled data with 13 emotion categories.
 2. **Fine-Tune with Hugging Face**:
    ```python
     import pandas as pd
+    from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments
+    from sklearn.model_selection import train_test_split
+    import torch
+    from torch.utils.data import Dataset
+    # === 1. Load and preprocess data ===
+    dataset_path = '/content/dataset.csv'
+    df = pd.read_csv(dataset_path)
+    # Drop rows with a missing label; the raw CSV names this column 'Label'
+    df = df.dropna(subset=['Label'])
+    df.columns = ['text', 'label']  # Normalize column names (assumes exactly two columns: text, then label)
+    # === 2. Encode labels ===
+    labels = sorted(df["label"].unique())
+    label_to_id = {label: idx for idx, label in enumerate(labels)}
+    id_to_label = {idx: label for label, idx in label_to_id.items()}
+    df['label'] = df['label'].map(label_to_id)
+    # === 3. Train/val split ===
+    train_texts, val_texts, train_labels, val_labels = train_test_split(
+        df['text'].tolist(), df['label'].tolist(), test_size=0.2, random_state=42
+    )
+    # === 4. Tokenizer ===
+    tokenizer = BertTokenizer.from_pretrained("boltuix/NeuroBERT-Pro")
+    # === 5. Dataset class ===
+    class SentimentDataset(Dataset):
+        def __init__(self, texts, labels, tokenizer, max_length=128):
+            self.texts = texts
+            self.labels = labels
+            self.tokenizer = tokenizer
+            self.max_length = max_length
+        def __len__(self):
+            return len(self.texts)
+        def __getitem__(self, idx):
+            encoding = self.tokenizer(
+                self.texts[idx],
+                padding='max_length',
+                truncation=True,
+                max_length=self.max_length,
+                return_tensors='pt'
+            )
+            return {
+                'input_ids': encoding['input_ids'].squeeze(0),
+                'attention_mask': encoding['attention_mask'].squeeze(0),
+                'labels': torch.tensor(self.labels[idx], dtype=torch.long)
+            }
+    # === 6. Load datasets ===
+    train_dataset = SentimentDataset(train_texts, train_labels, tokenizer)
+    val_dataset = SentimentDataset(val_texts, val_labels, tokenizer)
+    # === 7. Load model ===
+    model = BertForSequenceClassification.from_pretrained(
+        "boltuix/NeuroBERT-Pro",
+        num_labels=len(label_to_id)
+    )
+    # Optional: Ensure tensor layout is contiguous
+    for param in model.parameters():
+        param.data = param.data.contiguous()
+    # === 8. Training arguments ===
     training_args = TrainingArguments(
+        output_dir='./results',
+        run_name="NeuroFeel",
         num_train_epochs=5,
         per_device_train_batch_size=16,
+        per_device_eval_batch_size=16,
+        warmup_steps=500,
+        weight_decay=0.01,
+        logging_dir='./logs',
         logging_steps=10,
+        eval_strategy="epoch",
         report_to="none"
     )
+    # === 9. Trainer setup ===
     trainer = Trainer(
         model=model,
         args=training_args,
+        train_dataset=train_dataset,
+        eval_dataset=val_dataset
     )
+    # === 10. Train and evaluate ===
     trainer.train()
+    trainer.evaluate()
+    # === 11. Save model and label mappings ===
+    model.config.label2id = label_to_id
+    model.config.id2label = id_to_label
+    model.config.num_labels = len(label_to_id)
+    model.save_pretrained("./neuro-feel")
+    tokenizer.save_pretrained("./neuro-feel")
+    print("✅ Training complete. Model and tokenizer saved to ./neuro-feel")
    ```
 3. **Deploy**: Export to ONNX or TensorFlow Lite for edge devices.
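
The label-encoding step in the added script (sorted unique labels mapped to integer ids, with the inverse map stored on the model config) can be illustrated without pandas or transformers. The following is a minimal sketch under stated assumptions: `build_label_maps` is a hypothetical helper name, and the label list is made up for illustration.

```python
def build_label_maps(raw_labels):
    """Build deterministic label<->id maps from an iterable of label strings."""
    # Sorting makes the mapping independent of row order in the data
    labels = sorted(set(raw_labels))
    label_to_id = {label: idx for idx, label in enumerate(labels)}
    id_to_label = {idx: label for label, idx in label_to_id.items()}
    return label_to_id, id_to_label


# Made-up labels for illustration only
raw = ["Love", "Anger", "Sadness", "Love", "Anger"]
label_to_id, id_to_label = build_label_maps(raw)

encoded = [label_to_id[label] for label in raw]   # strings -> integer ids
decoded = [id_to_label[i] for i in encoded]       # integer ids -> strings

print(label_to_id)  # {'Anger': 0, 'Love': 1, 'Sadness': 2}
print(encoded)      # [1, 0, 2, 1, 0]
```

Because the ids are derived from whichever labels appear in the data, a dataset that lacks one emotion changes the id assigned to every label sorted after it. Saving `id2label` and `label2id` with the checkpoint, as the script does, keeps decoding consistent with training.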