anthonyyazdaniml committed on
Commit ea181c2 · verified · 1 Parent(s): 6c87fee

Update README.md

Files changed (1)
  1. README.md +91 -3
README.md CHANGED
@@ -1,3 +1,91 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ language:
+ - en
+ base_model:
+ - aaditya/Llama3-OpenBioLLM-8B
+ ---
+
+ # OpenBioLLM-Text2Graph-8B
+
+ This model is a biomedical annotation model designed to generate named entity annotations from unlabeled biomedical text.
+ It was introduced in the paper [GLiNER-BioMed: A Suite of Efficient Models for Open Biomedical Named Entity Recognition](https://arxiv.org/abs/2504.00676).
+
+ This model enables **high-throughput, cost-efficient synthetic biomedical NER data generation**, serving as the synthetic annotation backbone for [GLiNER-BioMed models](https://huggingface.co/collections/knowledgator/gliner-biomed-67ecf1b7cc62e673dbc8b57f).
+
+ ## Usage
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ import torch
+
+ model_name = "Ihor/OpenBioLLM-Text2Graph-8B"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ tokenizer.chat_template = "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|end_of_text|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}"
+
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     device_map="auto",
+     torch_dtype=torch.bfloat16
+ )
+
+ MESSAGES = [
+     {
+         "role": "system",
+         "content": (
+             "You are an advanced assistant trained to process biomedical text for Named Entity Recognition (NER) and Relation Extraction (RE). "
+             "Your task is to analyze user-provided text and provided entities selecting all unique and contextually relevant entities, and infer directed relationships. "
+             "between these entities based on the context. Ensure that all relations exist only between annotated entities. "
+             "Entities and relationships should be human-readable and natural, reflecting real-world concepts and connections. "
+             "Output the annotated data in JSON format, structured as follows:\n\n"
+             """{"entities": [{"id": 0, "text": "ner_string_0", "type": "ner_type_string_0"}, {"id": 1, "text": "ner_string_1", "type": "ner_type_string_1"}], "relations": [{"head": 0, "tail": 1, "type": "re_type_string_0"}]}"""
+             "\n\nEnsure that the output captures all significant entities and their directed relationships in a clear and concise manner."
+         ),
+     },
+     {
+         "role": "user",
+         "content": (
+             'Here is a text input: "John received 3mg of aspirin." '
+             """Here is the list of input entities: ['John', '3mg', 'aspirin']"""
+             "Analyze this text, select and classify the entities, and extract their relationships as per your instructions."
+         ),
+     }
+ ]
+
+ chat_prompt = tokenizer.apply_chat_template(
+     MESSAGES, tokenize=False, add_generation_prompt=True
+ )
+
+ inputs = tokenizer(chat_prompt, return_tensors="pt").to(model.device)
+
+ outputs = model.generate(
+     **inputs,
+     max_new_tokens=3000,
+     do_sample=False,
+     eos_token_id=tokenizer.eos_token_id,
+     pad_token_id=tokenizer.eos_token_id,
+     return_dict_in_generate=True
+ )
+
+ prompt_len = inputs["input_ids"].shape[-1]
+ generated_ids = outputs.sequences[0][prompt_len:]
+ response = tokenizer.decode(generated_ids, skip_special_tokens=True)
+ print(response)
+ ```
+
+ ## Citation
+
+ If you use this model, please cite:
+
+ ```bibtex
+ @misc{yazdani2025glinerbiomedsuiteefficientmodels,
+       title={GLiNER-BioMed: A Suite of Efficient Models for Open Biomedical Named Entity Recognition},
+       author={Anthony Yazdani and Ihor Stepanov and Douglas Teodoro},
+       year={2025},
+       eprint={2504.00676},
+       archivePrefix={arXiv},
+       primaryClass={cs.CL},
+       url={https://arxiv.org/abs/2504.00676},
+ }
+ ```
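
The usage snippet in this diff prints the raw `response` string. Because the system prompt asks for JSON with `entities` and `relations`, a downstream step could parse that string roughly as sketched here. This is only a minimal illustration under assumptions: `parse_annotation` is a hypothetical helper that is not part of the repository, the sample string is hand-written to match the schema in the system prompt (it is not model output), and the entity and relation types in it are made up.

```python
import json


def parse_annotation(response: str) -> dict:
    """Parse a response that follows the JSON schema from the system prompt.

    Hypothetical helper for illustration; not part of the model repository.
    """
    # Take the span from the first "{" to the last "}" in case the
    # response carries extra text around the JSON object.
    start, end = response.find("{"), response.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in response")
    data = json.loads(response[start:end + 1])

    entities = data.get("entities", [])
    entity_ids = {entity["id"] for entity in entities}
    # Keep only relations whose head and tail refer to annotated entities,
    # mirroring the constraint stated in the system prompt.
    relations = [
        rel for rel in data.get("relations", [])
        if rel["head"] in entity_ids and rel["tail"] in entity_ids
    ]
    return {"entities": entities, "relations": relations}


# Hand-written sample in the documented schema; NOT real model output.
# The types ("person", "drug", "received") are illustrative assumptions.
sample_response = (
    '{"entities": [{"id": 0, "text": "John", "type": "person"}, '
    '{"id": 1, "text": "aspirin", "type": "drug"}], '
    '"relations": [{"head": 0, "tail": 1, "type": "received"}, '
    '{"head": 0, "tail": 7, "type": "dangling"}]}'
)

parsed = parse_annotation(sample_response)
by_id = {entity["id"]: entity for entity in parsed["entities"]}
for rel in parsed["relations"]:
    head, tail = by_id[rel["head"]], by_id[rel["tail"]]
    print(f'{head["text"]} -[{rel["type"]}]-> {tail["text"]}')
# prints: John -[received]-> aspirin
```

Looking entities up by their `id` field rather than by list position keeps the sketch correct even if the ids in a response are not contiguous; the relation pointing at the unknown id 7 is dropped instead of raising.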