Update README.md
README.md
@@ -28,6 +28,60 @@ Our B2NER models, trained on B2NERD, outperform GPT-4 by 6.8-12.0 F1 points and
- 💾 Model (LoRA Adapters): This repo hosts the LoRA adapter of the B2NER model built on InternLM2.5-7B. See the [20B model](https://huggingface.co/Umean/B2NER-Internlm2-20B-LoRA) for a 20B adapter.

## Sample Usage - Quick Demo

Here we show how to use our LoRA adapter to run a quick demo on custom input. You can also refer to `src/demo.ipynb` in our GitHub repo for more examples to reuse in your own demo.

- Download (or locate locally) our LoRA checkpoint and its corresponding backbone model.
- Load the model and tokenizer.

```python
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base (backbone) model and tokenizer; replace with your own path or Hub name
base_model_path = "/path/to/backbone_model"
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # use one dtype consistently for the backbone
    device_map="auto",  # place weights on available GPU(s); requires `accelerate`
)
tokenizer = AutoTokenizer.from_pretrained(base_model_path, trust_remote_code=True)

# Load and apply the LoRA adapter; point the path to the directory containing adapter_config.json
lora_weight_path = "/path/to/adapter"
config = PeftConfig.from_pretrained(lora_weight_path)  # config.base_model_name_or_path names the expected backbone
model = PeftModel.from_pretrained(base_model, lora_weight_path)
```
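
If you are unsure which backbone a downloaded adapter expects, the adapter directory itself records it. Below is a minimal sketch, assuming the standard PEFT layout in which `adapter_config.json` stores the backbone under `base_model_name_or_path` (the same value that `PeftConfig.from_pretrained` exposes as `config.base_model_name_or_path`). The helper name `expected_backbone` is hypothetical; it only reads JSON, so it runs without loading any model weights.

```python
import json
from pathlib import Path


def expected_backbone(adapter_dir):
    """Return the backbone name recorded in a PEFT adapter directory (hypothetical helper).

    Assumes the standard PEFT layout: an adapter_config.json file whose
    "base_model_name_or_path" field names the model the adapter was trained on.
    """
    config_file = Path(adapter_dir) / "adapter_config.json"
    if not config_file.is_file():
        raise FileNotFoundError(f"no adapter_config.json in {adapter_dir}")
    with config_file.open(encoding="utf-8") as f:
        config = json.load(f)
    # Return None if the field is missing instead of raising a KeyError
    return config.get("base_model_name_or_path")
```

For example, `print(expected_backbone("/path/to/adapter"))` before loading. The recorded value may be a Hub model ID or a local path, depending on how the adapter was saved.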

- Set `text` and `labels` for your NER demo, build the instruction, and generate the answer. Below are an English example and a Chinese example using our B2NER-InternLM2.5-7B model (both examples are out-of-domain data).

```python
## English Example ##
# Input your own text and target entity labels. The model extracts the entities in the text that belong to the provided label set.
text = "what is a good 1990 s romance movie starring kelsy grammer"
labels = ["movie genre", "year or time period", "movie title", "movie actor", "movie age rating"]

instruction_template_en = "Given the label set of entities, please recognize all the entities in the text. The answer format should be \"entity label: entity; entity label: entity\". \nLabel Set: {labels_str} \n\nText: {text} \nAnswer:"
labels_str = ", ".join(labels)
final_instruction = instruction_template_en.format(labels_str=labels_str, text=text)
inputs = tokenizer([final_instruction], return_tensors="pt").to(model.device)  # inputs must be on the model's device
output = model.generate(**inputs, max_length=500)
generated_text = tokenizer.decode(output[0].tolist(), skip_special_tokens=True)
print(generated_text.split("Answer:")[-1])
# year or time period: 1990 s; movie genre: romance; movie actor: kelsy grammer


## Chinese Example ##
# As above, input your own text and target entity labels; the Chinese prompt template is used here.
text = "暴雪中国时隔多年之后再次举办了官方比赛,而Moon在星际争霸2中发挥不是很理想,对此Infi感觉Moon是哪里出了问题呢?"
labels = ["人名", "作品名->文字作品", "作品名->游戏作品", "作品名->影像作品", "组织机构名->政府机构", "组织机构名->公司", "组织机构名->其它", "地名"]

instruction_template_zh = "给定实体的标签范围,请识别文本中属于这些标签的所有实体。答案格式为 \"实体标签: 实体; 实体标签: 实体\"。\n标签范围: {labels_str}\n\n文本: {text} \n答案:"
labels_str = ", ".join(labels)
final_instruction = instruction_template_zh.format(labels_str=labels_str, text=text)
inputs = tokenizer([final_instruction], return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_length=500)
generated_text = tokenizer.decode(output[0].tolist(), skip_special_tokens=True)
print(generated_text.split("答案:")[-1])
# 组织机构名->公司: 暴雪中国; 人名: Moon; 作品名->游戏作品: 星际争霸2; 人名: Infi
```
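
To reuse the demo on many inputs, the steps around the model call can be factored into plain-Python helpers. The sketch below is hypothetical (`TEMPLATES`, `build_instruction` and `parse_answer` are not part of the B2NER repo): it copies the two prompt templates verbatim and assumes answers follow the "entity label: entity; entity label: entity" format requested in the prompt, so an entity that itself contains `;` would be split incorrectly.

```python
# Prompt templates copied verbatim from the examples above
TEMPLATES = {
    "en": "Given the label set of entities, please recognize all the entities in the text. The answer format should be \"entity label: entity; entity label: entity\". \nLabel Set: {labels_str} \n\nText: {text} \nAnswer:",
    "zh": "给定实体的标签范围,请识别文本中属于这些标签的所有实体。答案格式为 \"实体标签: 实体; 实体标签: 实体\"。\n标签范围: {labels_str}\n\n文本: {text} \n答案:",
}


def build_instruction(text, labels, lang="en"):
    """Fill the English ("en") or Chinese ("zh") template with a text and its label set."""
    return TEMPLATES[lang].format(labels_str=", ".join(labels), text=text)


def parse_answer(answer, labels=None):
    """Split an answer like "label: entity; label: entity" into (label, entity) pairs.

    Segments without a colon or with an empty label/entity are skipped; if
    `labels` is given, pairs whose label is outside that set are dropped too.
    """
    pairs = []
    for segment in answer.split(";"):
        # Split on the first colon only, so an entity containing ":" stays intact
        label, sep, entity = segment.partition(":")
        label, entity = label.strip(), entity.strip()
        if not sep or not label or not entity:
            continue
        if labels is not None and label not in labels:
            continue
        pairs.append((label, entity))
    return pairs
```

For example, applied to the English answer shown above, `parse_answer(generated_text.split("Answer:")[-1], labels)` would return `[("year or time period", "1990 s"), ("movie genre", "romance"), ("movie actor", "kelsy grammer")]`. Passing `labels` guards against the model emitting a label that was not in the requested set.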

## Cite
```
@article{yang2024beyond,