boltuix committed on
Commit a59f42d · verified · 1 Parent(s): 86247af

Update README.md

Files changed (1)
  1. README.md +90 -169
README.md CHANGED
@@ -231,108 +231,7 @@ Confidence: 85.63%

  *Note*: Fine-tune the model for domain-specific tasks to boost accuracy.

- ## Evaluation
-
- NeuroFeel was evaluated on an emotion classification task using 13 short-text samples relevant to IoT, social media, and mental health contexts. The model predicts one of 13 emotion labels, with success defined as the correct label being predicted.
-
- ### Test Sentences
- | Sentence | Expected Emotion |
- |----------|------------------|
- | I love you so much! | Love |
- | This is absolutely disgusting! | Disgust |
- | I'm so happy with my new phone! | Happiness |
- | Why does this always break? | Anger |
- | I feel so alone right now. | Sadness |
- | What just happened?! | Surprise |
- | I'm terrified of this update failing. | Fear |
- | Meh, it's just okay. | Neutral |
- | I shouldn't have said that. | Shame |
- | I feel bad for forgetting. | Guilt |
- | Wait, what does this mean? | Confusion |
- | I really want that new gadget! | Desire |
- | Oh sure, like that's gonna work. | Sarcasm |
-
- ### Evaluation Code
- ```python
- from transformers import pipeline

- # Load the fine-tuned NeuroFeel model
- sentiment_analysis = pipeline("text-classification", model="boltuix/NeuroFeel")
-
- # Define label-to-emoji mapping
- label_to_emoji = {
-     "Sadness": "😢",
-     "Anger": "😠",
-     "Love": "❤️",
-     "Surprise": "😲",
-     "Fear": "😱",
-     "Happiness": "😄",
-     "Neutral": "😐",
-     "Disgust": "🤢",
-     "Shame": "🙈",
-     "Guilt": "😔",
-     "Confusion": "😕",
-     "Desire": "🔥",
-     "Sarcasm": "😏"
- }
-
- # Test data
- tests = [
-     ("I love you so much!", "Love"),
-     ("This is absolutely disgusting!", "Disgust"),
-     ("I'm so happy with my new phone!", "Happiness"),
-     ("Why does this always break?", "Anger"),
-     ("I feel so alone right now.", "Sadness"),
-     ("What just happened?!", "Surprise"),
-     ("I'm terrified of this update failing.", "Fear"),
-     ("Meh, it's just okay.", "Neutral"),
-     ("I shouldn't have said that.", "Shame"),
-     ("I feel bad for forgetting.", "Guilt"),
-     ("Wait, what does this mean?", "Confusion"),
-     ("I really want that new gadget!", "Desire"),
-     ("Oh sure, like that's gonna work.", "Sarcasm")
- ]
-
- results = []
-
- # Run tests
- for text, expected in tests:
-     result = sentiment_analysis(text)[0]
-     predicted = result["label"].capitalize()
-     confidence = result["score"]
-     emoji = label_to_emoji.get(predicted, "❓")
-     results.append({
-         "sentence": text,
-         "expected": expected,
-         "predicted": predicted,
-         "confidence": confidence,
-         "emoji": emoji,
-         "pass": predicted == expected
-     })
-
- # Print results
- for r in results:
-     status = "✅ PASS" if r["pass"] else "❌ FAIL"
-     print(f"\n🔍 {r['sentence']}")
-     print(f"🎯 Expected: {r['expected']}")
-     print(f"🔝 Predicted: {r['predicted']} {r['emoji']} (Confidence: {r['confidence']:.4f})")
-     print(status)
-
- # Summary
- pass_count = sum(r["pass"] for r in results)
- print(f"\n🎯 Total Passed: {pass_count}/{len(tests)}")
- ```
-
- ### Sample Results (Hypothetical)
- - **Sentence**: I love you so much!
-   **Expected**: Love
-   **Predicted**: Love ❤️ (Confidence: 0.8563)
-   **Result**: ✅ PASS
- - **Sentence**: I feel so alone right now.
-   **Expected**: Sadness
-   **Predicted**: Sadness 😢 (Confidence: 0.8021)
-   **Result**: ✅ PASS
- - **Total Passed**: ~12/13 (varies with fine-tuning).

  NeuroFeel excels in classifying a wide range of emotions in short texts, particularly in IoT, social media, and mental health contexts. Fine-tuning enhances performance on subtle emotions like Sarcasm or Shame.

@@ -408,86 +307,108 @@ To adapt NeuroFeel for custom emotion detection tasks:
  1. **Prepare Dataset**: Collect labeled data with 13 emotion categories.
  2. **Fine-Tune with Hugging Face**:
  ```python
- # !pip install transformers datasets torch --upgrade
-
- import torch
- from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
- from datasets import Dataset
  import pandas as pd
-
- # 1. Prepare the sample emotion dataset
- data = {
-     "text": [
-         "I love you so much!",
-         "This is absolutely disgusting!",
-         "I'm so happy with my new phone!",
-         "Why does this always break?",
-         "I feel so alone right now."
-     ],
-     "label": [2, 7, 5, 1, 0] # Emotions: 0 to 12
- }
- df = pd.DataFrame(data)
- dataset = Dataset.from_pandas(df)
-
- # 2. Load tokenizer and model
- model_name = "boltuix/NeuroFeel"
- tokenizer = AutoTokenizer.from_pretrained(model_name)
- model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=13)
-
- # 3. Tokenize the dataset
- def tokenize_function(examples):
-     return tokenizer(examples["text"], padding="max_length", truncation=True, max_length=64)
-
- tokenized_dataset = dataset.map(tokenize_function, batched=True)
-
- # 4. Manually convert all fields to PyTorch tensors
- def to_torch_format(example):
-     return {
-         "input_ids": torch.tensor(example["input_ids"]),
-         "attention_mask": torch.tensor(example["attention_mask"]),
-         "label": torch.tensor(example["label"])
-     }
-
- tokenized_dataset = tokenized_dataset.map(to_torch_format)
-
- # 5. Define training arguments
  training_args = TrainingArguments(
-     output_dir="./neurofeel_results",
      num_train_epochs=5,
      per_device_train_batch_size=16,
-     logging_dir="./neurofeel_logs",
      logging_steps=10,
-     save_steps=100,
-     eval_strategy="no",
-     learning_rate=2e-5,
      report_to="none"
  )
-
- # 6. Initialize Trainer
  trainer = Trainer(
      model=model,
      args=training_args,
-     train_dataset=tokenized_dataset,
  )
-
- # 7. Fine-tune the model
  trainer.train()
-
- # 8. Save the fine-tuned model
- model.save_pretrained("./fine_tuned_neurofeel")
- tokenizer.save_pretrained("./fine_tuned_neurofeel")
-
- # 9. Example inference
- text = "I'm thrilled with the update!"
- inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=64)
- model.eval()
- with torch.no_grad():
-     outputs = model(**inputs)
-     logits = outputs.logits
-     predicted_class = torch.argmax(logits, dim=1).item()
-
- labels = ["Sadness", "Anger", "Love", "Surprise", "Fear", "Happiness", "Neutral", "Disgust", "Shame", "Guilt", "Confusion", "Desire", "Sarcasm"]
- print(f"Predicted emotion for '{text}': {labels[predicted_class]}")
  ```
  3. **Deploy**: Export to ONNX or TensorFlow Lite for edge devices.
 
  1. **Prepare Dataset**: Collect labeled data with 13 emotion categories.
  2. **Fine-Tune with Hugging Face**:
  ```python
  import pandas as pd
+ from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments
+ from sklearn.model_selection import train_test_split
+ import torch
+ from torch.utils.data import Dataset
+
+ # === 1. Load and preprocess data ===
+ dataset_path = '/content/dataset.csv'
+ df = pd.read_csv(dataset_path)
+ # Use the correct original column name 'Label' in dropna
+ df = df.dropna(subset=['Label']) # Ensure no missing labels
+ df.columns = ['text', 'label'] # Normalize column names
+
+ # === 2. Encode labels ===
+ labels = sorted(df["label"].unique())
+ label_to_id = {label: idx for idx, label in enumerate(labels)}
+ id_to_label = {idx: label for label, idx in label_to_id.items()}
+ df['label'] = df['label'].map(label_to_id)
+
+ # === 3. Train/val split ===
+ train_texts, val_texts, train_labels, val_labels = train_test_split(
+     df['text'].tolist(), df['label'].tolist(), test_size=0.2, random_state=42
+ )
+
+ # === 4. Tokenizer ===
+ tokenizer = BertTokenizer.from_pretrained("boltuix/NeuroBERT-Pro")
+
+ # === 5. Dataset class ===
+ class SentimentDataset(Dataset):
+     def __init__(self, texts, labels, tokenizer, max_length=128):
+         self.texts = texts
+         self.labels = labels
+         self.tokenizer = tokenizer
+         self.max_length = max_length
+
+     def __len__(self):
+         return len(self.texts)
+
+     def __getitem__(self, idx):
+         encoding = self.tokenizer(
+             self.texts[idx],
+             padding='max_length',
+             truncation=True,
+             max_length=self.max_length,
+             return_tensors='pt'
+         )
+         return {
+             'input_ids': encoding['input_ids'].squeeze(0),
+             'attention_mask': encoding['attention_mask'].squeeze(0),
+             'labels': torch.tensor(self.labels[idx], dtype=torch.long)
+         }
+
+ # === 6. Load datasets ===
+ train_dataset = SentimentDataset(train_texts, train_labels, tokenizer)
+ val_dataset = SentimentDataset(val_texts, val_labels, tokenizer)
+
+ # === 7. Load model ===
+ model = BertForSequenceClassification.from_pretrained(
+     "boltuix/NeuroBERT-Pro",
+     num_labels=len(label_to_id)
+ )
+
+ # Optional: Ensure tensor layout is contiguous
+ for param in model.parameters():
+     param.data = param.data.contiguous()
+
+ # === 8. Training arguments ===
  training_args = TrainingArguments(
+     output_dir='./results',
+     run_name="NeuroFeel",
      num_train_epochs=5,
      per_device_train_batch_size=16,
+     per_device_eval_batch_size=16,
+     warmup_steps=500,
+     weight_decay=0.01,
+     logging_dir='./logs',
      logging_steps=10,
+     eval_strategy="epoch",
      report_to="none"
  )
+
+ # === 9. Trainer setup ===
  trainer = Trainer(
      model=model,
      args=training_args,
+     train_dataset=train_dataset,
+     eval_dataset=val_dataset
  )
+
+ # === 10. Train and evaluate ===
  trainer.train()
+ trainer.evaluate()
+
+ # === 11. Save model and label mappings ===
+ model.config.label2id = label_to_id
+ model.config.id2label = id_to_label
+ model.config.num_labels = len(label_to_id)
+
+ model.save_pretrained("./neuro-feel")
+ tokenizer.save_pretrained("./neuro-feel")
+
+ print("✅ Training complete. Model and tokenizer saved to ./neuro-feel")
  ```
  3. **Deploy**: Export to ONNX or TensorFlow Lite for edge devices.
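
The updated script calls `trainer.evaluate()` without a `compute_metrics` callback, so the evaluation output reports the loss and timing figures but no accuracy. If a pass-rate figure like the one in the removed Evaluation section is still wanted, one option is to pass an accuracy callback as `Trainer(..., compute_metrics=compute_metrics)`. The sketch below is an assumption and not part of this commit: `compute_metrics` is a hypothetical helper, written in plain Python so it runs without NumPy, and it relies on the evaluation prediction unpacking into `(logits, labels)`.

```python
def compute_metrics(eval_pred):
    """Return top-1 accuracy from (logits, labels). Hypothetical helper, not in the commit."""
    logits, labels = eval_pred
    # The index of the largest logit in each row is the predicted class id.
    predictions = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    correct = sum(int(pred == int(gold)) for pred, gold in zip(predictions, labels))
    return {"accuracy": correct / len(predictions)}


# Toy check with three samples and two classes (made-up numbers).
toy_logits = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
toy_labels = [1, 0, 0]
metrics = compute_metrics((toy_logits, toy_labels))
print(metrics)
```

With the callback in place, the returned key would show up in the evaluation output prefixed with `eval_`, alongside the loss.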
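
Step 2 of the new script derives class ids from `sorted(df["label"].unique())`, so the ids follow the alphabetical order of whatever label strings the CSV contains. That order differs from the hard-coded list in the removed inference snippet, where `Sadness` was 0 and `Anger` was 1. A minimal sketch of the mapping, assuming the CSV uses the 13 capitalized label names from the old README (the commit does not show the dataset, and `build_label_maps` is a hypothetical helper):

```python
# Label names as listed in the removed inference snippet (old id order).
EMOTIONS = [
    "Sadness", "Anger", "Love", "Surprise", "Fear", "Happiness", "Neutral",
    "Disgust", "Shame", "Guilt", "Confusion", "Desire", "Sarcasm",
]


def build_label_maps(raw_labels):
    """Mirror step 2 of the new script: sorted unique labels -> consecutive ids."""
    labels = sorted(set(raw_labels))
    label_to_id = {label: idx for idx, label in enumerate(labels)}
    id_to_label = {idx: label for label, idx in label_to_id.items()}
    return label_to_id, id_to_label


label_to_id, id_to_label = build_label_maps(EMOTIONS)

# Encode a few labels and decode them again to confirm the round trip.
encoded = [label_to_id[name] for name in ["Love", "Anger", "Sadness"]]
decoded = [id_to_label[i] for i in encoded]
print(encoded)  # [7, 0, 9]
print(decoded)  # ['Love', 'Anger', 'Sadness']
```

Because step 11 stores both mappings in `model.config`, downstream code can read label names from the saved config instead of relying on a fixed list order.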