+---
+license: apache-2.0
+language:
+- en
+base_model:
+- aaditya/Llama3-OpenBioLLM-8B
+---
+# OpenBioLLM-Text2Graph-8B
+OpenBioLLM-Text2Graph-8B is a biomedical annotation model that generates named entity annotations from unlabeled biomedical text.
+It was introduced in the paper [GLiNER-BioMed: A Suite of Efficient Models for Open Biomedical Named Entity Recognition](https://arxiv.org/abs/2504.00676).
+This model enables **high-throughput, cost-efficient synthetic biomedical NER data generation**, serving as the synthetic annotation backbone for [GLiNER-BioMed models](https://huggingface.co/collections/knowledgator/gliner-biomed-67ecf1b7cc62e673dbc8b57f).
+## Usage
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+model_name = "Ihor/OpenBioLLM-Text2Graph-8B"
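+tokenizer = AutoTokenizer.from_pretrained(model_name)
+# Override the chat template: Llama 3 style role headers, with each turn closed by <|end_of_text|>.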
+tokenizer.chat_template = "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|end_of_text|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}"
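+# Load the weights in bfloat16; device_map="auto" places them on the available devices (requires accelerate).
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    device_map="auto",
+    torch_dtype=torch.bfloat16
+)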
+MESSAGES = [
+    {
+        "role": "system",
+        "content": (
+            "You are an advanced assistant trained to process biomedical text for Named Entity Recognition (NER) and Relation Extraction (RE). "
+            "Your task is to analyze user-provided text and provided entities selecting all unique and contextually relevant entities, and infer directed relationships. "
+            "between these entities based on the context. Ensure that all relations exist only between annotated entities. "
+            "Entities and relationships should be human-readable and natural, reflecting real-world concepts and connections. "
+            "Output the annotated data in JSON format, structured as follows:\n\n"
+            """{"entities": [{"id": 0, "text": "ner_string_0", "type": "ner_type_string_0"}, {"id": 1, "text": "ner_string_1", "type": "ner_type_string_1"}], "relations": [{"head": 0, "tail": 1, "type": "re_type_string_0"}]}"""
+            "\n\nEnsure that the output captures all significant entities and their directed relationships in a clear and concise manner."
+        ),
+    },
+    {
+        "role": "user",
+        "content": (
+            'Here is a text input: "John received 3mg of aspirin." '
+            """Here is the list of input entities: ['John', '3mg', 'aspirin']"""
+            "Analyze this text, select and classify the entities, and extract their relationships as per your instructions."
+        ),
+    }
+]
+# Render the messages into a single prompt string that ends with the assistant header.
+chat_prompt = tokenizer.apply_chat_template(
+    MESSAGES, tokenize=False, add_generation_prompt=True
+)
+inputs = tokenizer(chat_prompt, return_tensors="pt").to(model.device)
+# Greedy decoding (do_sample=False) keeps the annotation deterministic for a given input.
+outputs = model.generate(
+    **inputs,
+    max_new_tokens=3000,
+    do_sample=False,
+    eos_token_id=tokenizer.eos_token_id,
+    pad_token_id=tokenizer.eos_token_id,
+    return_dict_in_generate=True
+)
+# Slice off the prompt tokens so that only the newly generated annotation is decoded.
+prompt_len = inputs["input_ids"].shape[-1]
+generated_ids = outputs.sequences[0][prompt_len:]
+response = tokenizer.decode(generated_ids, skip_special_tokens=True)
+print(response)
+```
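The system prompt above asks for JSON with `entities` and `relations`, so the decoded `response` can usually be turned into a small graph structure. The following is a minimal sketch of that post-processing step, not part of the model's official tooling: the helper names `parse_graph` and `to_triples` are hypothetical, and `SAMPLE_RESPONSE` is a hand-written string in the documented schema (including one deliberately invalid relation), not real model output.

```python
import json

# Hand-written example in the schema from the system prompt; NOT real model output.
# The last relation points at a non-existent entity id to exercise the filter below.
SAMPLE_RESPONSE = (
    '{"entities": [{"id": 0, "text": "John", "type": "patient"}, '
    '{"id": 1, "text": "3mg", "type": "dosage"}, '
    '{"id": 2, "text": "aspirin", "type": "drug"}], '
    '"relations": [{"head": 0, "tail": 2, "type": "received"}, '
    '{"head": 2, "tail": 1, "type": "has dosage"}, '
    '{"head": 0, "tail": 5, "type": "dangling"}]}'
)


def parse_graph(response: str) -> dict:
    """Parse the first JSON object found in `response` (hypothetical helper).

    Returns a dict with `entities` keyed by id and a list of `relations`
    whose head and tail both refer to annotated entities.
    """
    start = response.find("{")
    if start == -1:
        raise ValueError("no JSON object found in response")
    # raw_decode parses one JSON value and ignores any text that follows it.
    graph, _ = json.JSONDecoder().raw_decode(response[start:])
    entities = {e["id"]: e for e in graph.get("entities", [])}
    relations = [
        r for r in graph.get("relations", [])
        if r["head"] in entities and r["tail"] in entities
    ]
    return {"entities": entities, "relations": relations}


def to_triples(graph: dict) -> list:
    """Turn id-based relations into (head text, relation type, tail text) tuples."""
    ents = graph["entities"]
    return [
        (ents[r["head"]]["text"], r["type"], ents[r["tail"]]["text"])
        for r in graph["relations"]
    ]


graph = parse_graph(SAMPLE_RESPONSE)
for triple in to_triples(graph):
    print(triple)
```

In practice, `parse_graph(response)` would be called on the string printed by the usage example. Generated text is not guaranteed to be valid JSON, so callers may want to catch `ValueError` (which `json.JSONDecodeError` subclasses) and skip or retry such samples.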
+## Citation
+If you use this model, please cite:
+```bibtex
+@misc{yazdani2025glinerbiomedsuiteefficientmodels,
+      title={GLiNER-BioMed: A Suite of Efficient Models for Open Biomedical Named Entity Recognition},
+      author={Anthony Yazdani and Ihor Stepanov and Douglas Teodoro},
+      year={2025},
+      eprint={2504.00676},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2504.00676},
+}
+```