Umean
/

B2NER-Internlm2.5-7B-LoRA

PEFT

Safetensors

English

Chinese

Umean committed on Jul 24, 2024

Commit

00bc5a6

verified ·

1 Parent(s): 51837d2

Update README.md

README.md +54 -0

@@ -28,6 +28,60 @@ Our B2NER models, trained on B2NERD, outperform GPT-4 by 6.8-12.0 F1 points and

  - 💾 Model (LoRA Adapters): Current repo saves the B2NER model LoRA adapter based on InternLM2.5-7B. See [20B model](https://huggingface.co/Umean/B2NER-Internlm2-20B-LoRA) for a 20B adapter.
+## Sample Usage - Quick Demo
+This section shows how to use the provided LoRA adapter for a quick demo with custom input. See `src/demo.ipynb` in the GitHub repo for more examples that you can reuse in your own demo.
+ - Prepare/download our LoRA checkpoint and the corresponding backbone model.
+ - Load the model & tokenizer.
+```python
+import torch
+from peft import PeftModel, PeftConfig
+from transformers import AutoModelForCausalLM, AutoTokenizer
+# Load the base model and tokenizer, use your own path/name
+base_model_path = "/path/to/backbone_model"
+base_model = AutoModelForCausalLM.from_pretrained(base_model_path,
+                                                  trust_remote_code=True, torch_dtype=torch.float16)
+tokenizer = AutoTokenizer.from_pretrained(base_model_path, trust_remote_code=True)
+# Load and apply the LoRA adapter. Point lora_weight_path to the directory that contains adapter_config.json.
+lora_weight_path = "/path/to/adapter"
+# config.base_model_name_or_path records the backbone name stored with the adapter
+config = PeftConfig.from_pretrained(lora_weight_path)
+# The adapter is applied on top of the base model loaded above (float16)
+model = PeftModel.from_pretrained(base_model, lora_weight_path)
+model.eval()
+```
+ - Set `text` and `labels` for your NER demo, then build the instruction and generate the answer. Below are an English example and a Chinese example based on our B2NER-InternLM2.5-7B (both examples use out-of-domain data).
+```python
+## English Example ##
+# Input your own text and target entity labels. The model extracts the entities in the text that belong to the provided label set.
+text = "what is a good 1990 s romance movie starring kelsy grammer"
+labels = ["movie genre", "year or time period", "movie title", "movie actor", "movie age rating"]
+instruction_template_en = "Given the label set of entities, please recognize all the entities in the text. The answer format should be \"entity label: entity; entity label: entity\". \nLabel Set: {labels_str} \n\nText: {text} \nAnswer:"
+labels_str = ", ".join(labels)
+final_instruction = instruction_template_en.format(labels_str=labels_str, text=text)
+inputs = tokenizer([final_instruction], return_tensors="pt")
+output = model.generate(**inputs, max_length=500)
+generated_text = tokenizer.decode(output[0].tolist(), skip_special_tokens=True)
+print(generated_text.split("Answer:")[-1])
+# year or time period: 1990 s; movie genre: romance; movie actor: kelsy grammer
+## Chinese Example ##
+# Input your own text and target entity labels. The model extracts the entities in the text that belong to the provided label set.
+text = "暴雪中国时隔多年之后再次举办了官方比赛，而Moon在星际争霸2中发挥不是很理想，对此Infi感觉Moon是哪里出了问题呢？"
+labels = ["人名", "作品名->文字作品", "作品名->游戏作品", "作品名->影像作品", "组织机构名->政府机构", "组织机构名->公司", "组织机构名->其它", "地名"]
+instruction_template_zh = "给定实体的标签范围，请识别文本中属于这些标签的所有实体。答案格式为 \"实体标签: 实体; 实体标签: 实体\"。\n标签范围: {labels_str}\n\n文本: {text} \n答案:"
+labels_str = ", ".join(labels)
+final_instruction = instruction_template_zh.format(labels_str=labels_str, text=text)
+inputs = tokenizer([final_instruction], return_tensors="pt")
+output = model.generate(**inputs, max_length=500)
+generated_text = tokenizer.decode(output[0].tolist(), skip_special_tokens=True)
+print(generated_text.split("答案:")[-1])
+# 组织机构名->公司: 暴雪中国; 人名: Moon; 作品名->游戏作品: 星际争霸2; 人名: Infi
+```
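The demo above prints the answer as one string in the `entity label: entity; entity label: entity` format. For downstream use it is often handier to have structured pairs, so here is a minimal sketch of a parser for that string. The helper name `parse_ner_answer` is hypothetical and is not part of this commit or the B2NER repo. It assumes that labels contain no `:` and that entities contain no `;`.

```python
from typing import List, Tuple


def parse_ner_answer(answer: str) -> List[Tuple[str, str]]:
    """Split 'label: entity; label: entity' into (label, entity) pairs.

    Hypothetical helper, not part of the B2NER codebase.
    """
    pairs = []
    for chunk in answer.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # ignore empty pieces, e.g. after a trailing ';'
        # Split on the first ':' only, so an entity may itself contain ':'
        label, sep, entity = chunk.partition(":")
        if not sep:
            continue  # skip malformed pieces that have no 'label: entity' shape
        label, entity = label.strip(), entity.strip()
        if label and entity:
            pairs.append((label, entity))
    return pairs


if __name__ == "__main__":
    # Sample answer string copied from the English example above
    sample = " year or time period: 1990 s; movie genre: romance; movie actor: kelsy grammer"
    for label, entity in parse_ner_answer(sample):
        print(f"{label} -> {entity}")
```

In the demo, the input would be `generated_text.split("Answer:")[-1]` (or `split("答案:")[-1]` for the Chinese prompt). Malformed pieces are dropped silently here; a stricter variant could raise an error instead.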
 ## Cite
 ```
 @article{yang2024beyond,