Text Classification · Transformers · Safetensors · English · bert · fill-mask

Tags: BERT, bert-mini, transformer, pre-training, nlp, tiny-bert, edge-ai, low-resource, micro-nlp, quantized, general-purpose, offline-assistant, intent-detection, real-time, embedded-systems, command-classification, voice-ai, eco-ai, english, lightweight, mobile-nlp, ner, semantic-search, contextual-ai, smart-devices, wearable-ai, privacy-first
Update README.md
README.md CHANGED
@@ -132,8 +132,8 @@ from transformers import pipeline
 mlm_pipeline = pipeline("fill-mask", model="boltuix/bert-mini")
 
 # Test example
-result = mlm_pipeline("
-print(result[0]["sequence"]) # Example output: "
+result = mlm_pipeline("The train arrived at the [MASK] on time.")
+print(result[0]["sequence"])  # Example output: "The train arrived at the station on time."
 ```
 
 ## Quickstart: Text Classification
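For context on the updated example: the `fill-mask` pipeline returns a list of candidate dicts with keys `score`, `token`, `token_str`, and `sequence`, so the new `result[0]["sequence"]` line picks the top completion. The sketch below shows how the remaining candidates could be inspected; the token strings and scores in the comment are illustrative, not measured output.

```python
# Print the top-3 fill-mask candidates with their confidence scores.
# Illustrative output: "station: 0.41", "platform: 0.12", "stop: 0.07"
for candidate in result[:3]:
    print(f'{candidate["token_str"]}: {candidate["score"]:.3f}')
```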
@@ -319,73 +319,76 @@ To adapt `bert-mini` for custom tasks (e.g., specific IoT commands):
 1. **Prepare Dataset**: Collect labeled data (e.g., commands with intents or masked sentences).
 2. **Fine-Tune with Hugging Face**:
 ```python
-[previous fine-tuning example: 67 lines removed; their content is not shown in this view]
+# Install the datasets library
+!pip install datasets
+import torch
+from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments
+from datasets import Dataset
+import pandas as pd
+
+# Prepare sample dataset
+data = {
+    "text": [
+        "Turn on the fan",
+        "Switch off the light",
+        "Invalid command",
+        "Activate the air conditioner",
+        "Turn off the heater",
+        "Gibberish input"
+    ],
+    "label": [1, 1, 0, 1, 1, 0]  # 1 for valid IoT commands, 0 for invalid
+}
+df = pd.DataFrame(data)
+dataset = Dataset.from_pandas(df)
+
+# Load tokenizer and model
+model_name = "boltuix/bert-mini"
+tokenizer = BertTokenizer.from_pretrained(model_name)
+model = BertForSequenceClassification.from_pretrained(model_name, num_labels=2)
+
+# Tokenize dataset
+def tokenize_function(examples):
+    return tokenizer(examples["text"], padding="max_length", truncation=True, max_length=64)
+
+tokenized_dataset = dataset.map(tokenize_function, batched=True)
+tokenized_dataset.set_format("torch", columns=["input_ids", "attention_mask", "label"])
+
+# Define training arguments
+training_args = TrainingArguments(
+    output_dir="./bert_mini_results",
+    num_train_epochs=5,
+    per_device_train_batch_size=2,
+    logging_dir="./bert_mini_logs",
+    logging_steps=10,
+    save_steps=100,
+    # Changed evaluation_strategy to eval_strategy
+    eval_strategy="no",  # Use 'no', 'steps', or 'epoch'
+    learning_rate=3e-5,
+)
+
+# Initialize Trainer
+trainer = Trainer(
+    model=model,
+    args=training_args,
+    train_dataset=tokenized_dataset,
+)
+
+# Fine-tune
+trainer.train()
+
+# Save model
+model.save_pretrained("./fine_tuned_bert_mini")
+tokenizer.save_pretrained("./fine_tuned_bert_mini")
+
+# Example inference
+text = "Turn on the light"
+inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=64)
+model.eval()
+with torch.no_grad():
+    outputs = model(**inputs)
+logits = outputs.logits
+predicted_class = torch.argmax(logits, dim=1).item()
+print(f"Predicted class for '{text}': {'Valid IoT Command' if predicted_class == 1 else 'Invalid Command'}")
 ```
 3. **Deploy**: Export to ONNX or TensorFlow Lite for edge devices.
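The new step 3 mentions ONNX or TensorFlow Lite export, but the diff does not show the export itself. Below is a minimal ONNX sketch using `torch.onnx.export`, assuming the `./fine_tuned_bert_mini` directory produced by step 2; the output file name, opset version, and dynamic-axis names are illustrative choices, not part of the README.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Load the fine-tuned classifier saved in step 2
model_dir = "./fine_tuned_bert_mini"
tokenizer = BertTokenizer.from_pretrained(model_dir)
model = BertForSequenceClassification.from_pretrained(model_dir)
model.eval()

# A dummy input fixes the traced signature (same max_length as training)
dummy = tokenizer(
    "Turn on the light",
    return_tensors="pt",
    padding="max_length",
    truncation=True,
    max_length=64,
)

# Export to ONNX; batch size is left dynamic for edge runtimes
torch.onnx.export(
    model,
    (dummy["input_ids"], dummy["attention_mask"]),
    "bert_mini_intent.onnx",  # illustrative file name
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch"},
        "attention_mask": {0: "batch"},
        "logits": {0: "batch"},
    },
    opset_version=14,
)
```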