---
library_name: peft
metrics:
- precision
- recall
- f1
- accuracy
base_model: NousResearch/Llama-2-7b-hf
model-index:
- name: billm-llama-7b-conll03-ner
  results: []
license: mit
datasets:
- conll2003
language:
- en
pipeline_tag: token-classification
---

# billm-llama-7b-conll03-ner

<a href="https://arxiv.org/abs/2310.01208">
    <img src="https://img.shields.io/badge/Arxiv-2310.01208-blue.svg?style=flat-square" alt="https://arxiv.org/abs/2310.01208" />
</a>
<a href="https://arxiv.org/abs/2311.05296">
    <img src="https://img.shields.io/badge/Arxiv-2311.05296-yellow.svg?style=flat-square" alt="https://arxiv.org/abs/2311.05296" />
</a>

This model is a fine-tuned version of [NousResearch/Llama-2-7b-hf](https://huggingface.co/NousResearch/Llama-2-7b-hf) for named entity recognition on CoNLL-2003, trained using [BiLLM](https://github.com/WhereIsAI/BiLLM).

It achieves the following results on the evaluation set after the final (10th) epoch:
- Loss: 0.1664
- Precision: 0.9243
- Recall: 0.9395
- F1: 0.9319
- Accuracy: 0.9860
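
As a quick sanity check, the reported F1 can be recomputed as the harmonic mean of the reported precision and recall. The sketch below assumes the standard definition F1 = 2PR / (P + R); the helper name `f1_from_pr` is illustrative and not part of BiLLM or the evaluation code. Because precision and recall are themselves rounded to four decimals, the recomputed value should agree with the reported one only to about three decimals.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the evaluation results listed above.
reported_precision = 0.9243
reported_recall = 0.9395
reported_f1 = 0.9319

recomputed_f1 = f1_from_pr(reported_precision, reported_recall)
print(f"recomputed F1: {recomputed_f1:.4f} (reported: {reported_f1:.4f})")
```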

## Inference

```bash
python -m pip install -U billm==0.1.1
```

```python
from transformers import AutoTokenizer, pipeline
from peft import PeftModel, PeftConfig
# The base model is Llama-2, so use BiLLM's Llama token-classification class
from billm import LlamaForTokenClassification


# CoNLL-2003 label set (BIO tagging scheme)
label2id = {'O': 0, 'B-PER': 1, 'I-PER': 2, 'B-ORG': 3, 'I-ORG': 4, 'B-LOC': 5, 'I-LOC': 6, 'B-MISC': 7, 'I-MISC': 8}
id2label = {v: k for k, v in label2id.items()}
model_id = 'WhereIsAI/billm-llama-7b-conll03-ner'
tokenizer = AutoTokenizer.from_pretrained(model_id)
# The PEFT config records which base model the adapter was trained on
peft_config = PeftConfig.from_pretrained(model_id)
model = LlamaForTokenClassification.from_pretrained(
    peft_config.base_model_name_or_path,
    num_labels=len(label2id), id2label=id2label, label2id=label2id
)
# Load the adapter weights on top of the base model
model = PeftModel.from_pretrained(model, model_id)
# merge_and_unload is necessary for inference
model = model.merge_and_unload()

token_classifier = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
sentence = "I live in Hong Kong. I am a student at Hong Kong PolyU."
tokens = token_classifier(sentence)
print(tokens)
```
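
With `aggregation_strategy="simple"`, the `transformers` token-classification pipeline returns a list of dicts, one per merged entity span, with the keys `entity_group`, `score`, `word`, `start`, and `end`. A minimal sketch of grouping such output by entity type follows. The helper `group_entities` and the sample records are hypothetical and written by hand for illustration; they are not actual output of this model. The sketch also assumes the character offsets are populated, which requires a fast tokenizer.

```python
from collections import defaultdict


def group_entities(sentence, spans, min_score=0.0):
    """Group aggregated pipeline spans by entity type.

    Entity text is sliced from the sentence with the character offsets
    rather than taken from the `word` field.
    """
    grouped = defaultdict(list)
    for span in spans:
        if span["score"] >= min_score:
            grouped[span["entity_group"]].append(sentence[span["start"]:span["end"]])
    return dict(grouped)


sentence = "I live in Hong Kong. I am a student at Hong Kong PolyU."

# Hand-written sample in the pipeline's output format (NOT real model output).
sample_spans = [
    {"entity_group": "LOC", "score": 0.99, "word": "Hong Kong", "start": 10, "end": 19},
    {"entity_group": "ORG", "score": 0.97, "word": "Hong Kong PolyU", "start": 39, "end": 54},
]

print(group_entities(sentence, sample_spans))
```

Passing a `min_score` threshold drops low-confidence spans before grouping; the right threshold depends on the application.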

## Training Details

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 10
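
To make the schedule concrete, the sketch below computes the learning rate under a linear decay from 0.0002 to zero over the whole run. It assumes no warmup, a single device, and no gradient accumulation, none of which is stated in this card. The figure of 14,041 training examples is the size of the standard CoNLL-2003 training split and is likewise an assumption about the data used here. `linear_lr` is an illustrative helper, not a BiLLM or `transformers` function.

```python
import math

LEARNING_RATE = 2e-4
TRAIN_BATCH_SIZE = 8
NUM_EPOCHS = 10
TRAIN_EXAMPLES = 14_041  # assumption: standard CoNLL-2003 train split size

# Assumes one device and no gradient accumulation (not stated in the card).
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / TRAIN_BATCH_SIZE)
total_steps = steps_per_epoch * NUM_EPOCHS


def linear_lr(step: int, base_lr: float = LEARNING_RATE, total: int = total_steps) -> float:
    """Linear decay from base_lr at step 0 to 0 at the final step (no warmup assumed)."""
    return base_lr * max(0.0, (total - step) / total)


for epoch in (0, 5, 10):
    step = epoch * steps_per_epoch
    print(f"epoch {epoch:>2} (step {step:>5}): lr = {linear_lr(step):.2e}")
```

Under these assumptions the step counts work out to 1,756 per epoch and 17,560 in total, which is consistent with the step column in the training results.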

### Training results

| Training Loss | Epoch | Step  | Validation Loss | Precision | Recall | F1     | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:------:|:------:|:--------:|
| 0.048         | 1.0   | 1756  | 0.0971          | 0.8935    | 0.9082 | 0.9008 | 0.9813   |
| 0.0217        | 2.0   | 3512  | 0.0963          | 0.9182    | 0.9301 | 0.9241 | 0.9852   |
| 0.0113        | 3.0   | 5268  | 0.1081          | 0.9265    | 0.9348 | 0.9306 | 0.9858   |
| 0.0038        | 4.0   | 7024  | 0.1477          | 0.9216    | 0.9379 | 0.9297 | 0.9858   |
| 0.0016        | 5.0   | 8780  | 0.1617          | 0.9199    | 0.9370 | 0.9284 | 0.9855   |
| 0.0007        | 6.0   | 10536 | 0.1618          | 0.9235    | 0.9390 | 0.9312 | 0.9859   |
| 0.0005        | 7.0   | 12292 | 0.1644          | 0.9245    | 0.9395 | 0.9319 | 0.9860   |
| 0.0004        | 8.0   | 14048 | 0.1662          | 0.9248    | 0.9393 | 0.9320 | 0.9861   |
| 0.0003        | 9.0   | 15804 | 0.1664          | 0.9248    | 0.9395 | 0.9321 | 0.9861   |
| 0.0003        | 10.0  | 17560 | 0.1664          | 0.9243    | 0.9395 | 0.9319 | 0.9860   |
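
Training loss keeps falling over the run while validation loss rises after epoch 2 and F1 levels off near 0.93, a pattern that may indicate overfitting in the later epochs. A small sketch that picks out the epoch with the lowest validation loss and the one with the highest F1, using values copied from the table above (the variable names are illustrative):

```python
# (epoch, validation_loss, f1) copied from the training results table
history = [
    (1, 0.0971, 0.9008),
    (2, 0.0963, 0.9241),
    (3, 0.1081, 0.9306),
    (4, 0.1477, 0.9297),
    (5, 0.1617, 0.9284),
    (6, 0.1618, 0.9312),
    (7, 0.1644, 0.9319),
    (8, 0.1662, 0.9320),
    (9, 0.1664, 0.9321),
    (10, 0.1664, 0.9319),
]

best_by_loss = min(history, key=lambda row: row[1])
best_by_f1 = max(history, key=lambda row: row[2])

print(f"lowest validation loss: epoch {best_by_loss[0]} ({best_by_loss[1]:.4f})")
print(f"highest F1: epoch {best_by_f1[0]} ({best_by_f1[2]:.4f})")
```

The evaluation results reported at the top of this card match the epoch 10 row, not either of these two epochs.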

### Framework versions

- PEFT 0.9.0
- Transformers 4.38.2
- PyTorch 2.0.1
- Datasets 2.16.0
- Tokenizers 0.15.0

## Citation

```bibtex
@inproceedings{li2024bellm,
    title = "BeLLM: Backward Dependency Enhanced Large Language Model for Sentence Embeddings",
    author = "Li, Xianming and Li, Jing",
    booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics",
    year = "2024",
    publisher = "Association for Computational Linguistics"
}

@article{li2023label,
    title={Label supervised llama finetuning},
    author={Li, Zongxi and Li, Xianming and Liu, Yuzhang and Xie, Haoran and Li, Jing and Wang, Fu-lee and Li, Qing and Zhong, Xiaoqin},
    journal={arXiv preprint arXiv:2310.01208},
    year={2023}
}
```
|