# Model Card for BERT Slot Filling Model

This BERT-based model performs slot filling on natural language sentences, extracting specific pieces of information for applications such as chatbots and virtual assistants. For example:

- Input: Transfer $500 from checking to student savings
- Output: transfer **[$500:B-amount]** from **[checking:B-account-from]** to **[student:B-account-to]** **[savings:I-account-to]**

The model was trained on the dataset at https://github.com/SunLemuria/JointBERT-Tensorflow1/blob/master/data/

For background, see *BERT for Joint Intent Classification and Slot Filling*: https://arxiv.org/pdf/1902.10909.pdf

## Model Details

The model predicts the following slot tags (BIO scheme):
| Tag | Definition |
|-----|------------|
| B-account-from | Start of the source account in a transaction. |
| I-account-from | Continuation of the source account in a transaction. |
| B-account-to | Start of the target account in a transaction. |
| I-account-to | Continuation of the target account in a transaction. |
| B-bill_type | Start of the type of bill or service. |
| I-bill_type | Continuation of the type of bill or service. |
| B-transaction-from | Start of the origin of a transaction or fraud. |
| I-transaction-from | Continuation of the origin of a transaction or fraud. |
| B-transaction-to | Start of the destination of a transaction or fraud. |
| I-transaction-to | Continuation of the destination of a transaction or fraud. |
| B-amount | Start of a specified amount of money. |
| I-amount | Continuation of a specified amount of money. |
| B-timeRange | Start of a specific time range or date. |
| I-timeRange | Continuation of a specific time range or date. |
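
A `B-` tag marks the first token of a slot and the matching `I-` tag marks each following token of the same slot. The short sketch below is illustrative only (the helper `group_spans` is not part of the released code); it shows how predicted BIO tags can be merged back into complete slot values, using the example from the introduction:

```python
# Illustrative helper (not part of the released model code): merge BIO tags into slot spans.
def group_spans(tokens, tags):
    """Group (token, BIO-tag) pairs into (slot_name, text) spans."""
    spans = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag opens a new span for the slot named after the prefix.
            spans.append([tag[2:], [token]])
        elif tag.startswith("I-") and spans and spans[-1][0] == tag[2:]:
            # An I- tag continues the most recent span of the same slot.
            spans[-1][1].append(token)
        # "O" tokens do not belong to any slot and are skipped.
    return [(name, " ".join(words)) for name, words in spans]

tokens = ["transfer", "$500", "from", "checking", "to", "student", "savings"]
tags = ["O", "B-amount", "O", "B-account-from", "O", "B-account-to", "I-account-to"]
print(group_spans(tokens, tags))
# [('amount', '$500'), ('account-from', 'checking'), ('account-to', 'student savings')]
```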
### Model Description

- **Developed by:** Andy González
- **Model type:** Token Classification
- **Language(s) (NLP):** English
- **Finetuned from model [optional]:** bert-base-uncased

## How to Get Started with the Model

```python
# pip install torch transformers
import os

import requests
import torch
from transformers import BertForTokenClassification, BertTokenizerFast

# URL and local file for the slot label list
slots_url = 'https://huggingface.co/andgonzalez/bert-uncased-slot-filling/raw/main/slots.txt'
slots_file = 'slots.txt'
device = "cpu"

# Download and save the slot labels if the file does not exist yet
if not os.path.exists(slots_file):
    response = requests.get(slots_url)
    response.raise_for_status()
    with open(slots_file, 'w') as file:
        file.write(response.text)

# Read the slot labels
with open(slots_file, 'r') as file:
    slot_labels = file.read().splitlines()

# Load the tokenizer and the model
tokenizer = BertTokenizerFast.from_pretrained('andgonzalez/bert-uncased-slot-filling')
model = BertForTokenClassification.from_pretrained('andgonzalez/bert-uncased-slot-filling')
model.to(device)
model.eval()

# Example sentence
sentence = "Transfer $500 from checking to student savings"
inputs = tokenizer(sentence, truncation=True, padding='max_length', max_length=20, return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)

# Turn the logits into per-token label predictions
logits = outputs.logits
predictions = torch.argmax(logits, dim=2).squeeze().cpu().numpy()
words = tokenizer.convert_ids_to_tokens(inputs["input_ids"].squeeze().cpu().numpy())

# Format the sentence, merging "$" with the following number and skipping special tokens
skip_next = False
formatted_sentence = []
for i, (word, pred) in enumerate(zip(words, predictions)):
    if word not in ['[PAD]', '[SEP]', '[CLS]']:
        label = slot_labels[pred]
        if word == "$" and i + 1 < len(words) and words[i + 1].replace("##", "").isdigit():
            # Join the "$" token with the digits that follow it (e.g. "$" + "500")
            next_word = words[i + 1].replace("##", "")
            combined_word = word + next_word
            formatted_word = f'[{combined_word}:{label}]'
            formatted_sentence.append(formatted_word)
            skip_next = True
        elif skip_next:
            # The digits were already merged with the preceding "$", so skip them
            skip_next = False
            continue
        elif not word.startswith("##"):
            # Wrap slot tokens in [word:label]; leave "O" tokens as plain words
            if label != 'O':
                formatted_word = f'[{word}:{label}]'
            else:
                formatted_word = word
            formatted_sentence.append(formatted_word)

formatted_sentence = ' '.join(formatted_sentence)
print(formatted_sentence)
```

## Training Details

### Metrics

- Metrics used: Precision, Recall, F1-Score

### Results

- **Epoch 1:** Loss: 1.3253, Precision: 0.5862, Recall: 0.5758, F1-Score: 0.5633
- **Epoch 2:** Loss: 0.3507, Precision: 0.7491, Recall: 0.7476, F1-Score: 0.7374
- **Epoch 3:** Loss: 0.2156, Precision: 0.8180, Recall: 0.8138, F1-Score: 0.8007
- **Epoch 4:** Loss: 0.1593, Precision: 0.8252, Recall: 0.8274, F1-Score: 0.8173
- **Epoch 5:** Loss: 0.1236, Precision: 0.8613, Recall: 0.8549, F1-Score: 0.8466
- **Epoch 6:** Loss: 0.0961, Precision: 0.8839, Recall: 0.8810, F1-Score: 0.8786
- **Epoch 7:** Loss: 0.0787, Precision: 0.8795, Recall: 0.8917, F1-Score: 0.8808
- **Epoch 8:** Loss: 0.0644, Precision: 0.8956, Recall: 0.8958, F1-Score: 0.8911
- **Epoch 9:** Loss: 0.0542, Precision: 0.8889, Recall: 0.9012, F1-Score: 0.8913
- **Epoch 10:** Loss: 0.0468, Precision: 0.8980, Recall: 0.9007, F1-Score: 0.8935
- **Best Model:** Epoch 8, Test Loss: 0.1588

### Plots

![image/png](https://cdn-uploads.huggingface.co/production/uploads/63c8634b814338722a66fcfb/vRyiH2wa3T53xr6ev4g1C.png)
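
This card does not state which implementation was used to compute the precision, recall, and F1 scores in the Results section above. A common choice for slot filling is span-level evaluation with the `seqeval` package; the sketch below is a minimal, assumed example of that approach, not the exact evaluation script used for this model:

```python
# pip install seqeval
# Illustrative only: the exact evaluation code for this model is not published here.
from seqeval.metrics import precision_score, recall_score, f1_score

# Gold and predicted BIO tag sequences, one inner list per sentence (toy example).
y_true = [["O", "B-amount", "O", "B-account-from", "O", "B-account-to", "I-account-to"]]
y_pred = [["O", "B-amount", "O", "B-account-from", "O", "B-account-to", "O"]]

print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F1-Score: ", f1_score(y_true, y_pred))
```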