---
license: mit
datasets:
- Umean/B2NERD
language:
- en
- zh
library_name: peft
---

# B2NER

We present B2NERD, a cohesive and efficient dataset refined from 54 existing English and Chinese datasets that improves LLMs' generalization on the challenging Open NER task.

Our B2NER models, trained on B2NERD, outperform GPT-4 by 6.8-12.0 F1 points and surpass previous methods on 3 out-of-domain benchmarks spanning 15 datasets and 6 languages.

- Paper: [Beyond Boundaries: Learning a Universal Entity Taxonomy across Datasets and Languages for Open Named Entity Recognition](http://arxiv.org/abs/2406.11192)
- GitHub Repo: https://github.com/UmeanNever/B2NER
- Data: See the data section below. You can download the data from [HuggingFace](https://huggingface.co/datasets/Umean/B2NERD) or [Google Drive](https://drive.google.com/file/d/11Wt4RU48i06OruRca2q_MsgpylzNDdjN/view?usp=drive_link); a minimal download sketch follows this list.
- Model (LoRA Adapters): This repo hosts the B2NER LoRA adapter based on InternLM2.5-7B. See [20B model](https://huggingface.co/Umean/B2NER-Internlm2-20B-LoRA) for the InternLM2-20B adapter; a loading sketch appears after the note below.
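
If you just need the raw B2NERD files from the Hub, a minimal download sketch (assuming the `huggingface_hub` client; the file layout and preprocessing are described in the GitHub repo) might look like this:

```python
# Minimal sketch: fetch the B2NERD data files from the Hugging Face Hub.
# Assumes the files are used directly (e.g., with the repo's training scripts)
# rather than loaded via datasets.load_dataset().
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="Umean/B2NERD", repo_type="dataset")
print(local_dir)  # local path containing the downloaded dataset files
```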

**See the GitHub repo for more information on model usage and this work.**
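
As a rough illustration, the adapter can be attached to its base model with `transformers` + `peft` roughly as below. The base-model ID and this adapter's repo ID are assumptions; check the GitHub repo for the exact base checkpoint and the instruction/prompt format expected at inference time.

```python
# Minimal sketch: attach the B2NER LoRA adapter to an InternLM2.5-7B base model.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "internlm/internlm2_5-7b-chat"        # assumed base; use the checkpoint the adapter was trained on
adapter_id = "Umean/B2NER-Internlm2.5-7B-LoRA"  # placeholder: replace with this repo's actual ID

tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(base_id, trust_remote_code=True, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id).eval()
```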

# Cite

```
@article{yang2024beyond,
  title={Beyond Boundaries: Learning a Universal Entity Taxonomy across Datasets and Languages for Open Named Entity Recognition},
  author={Yang, Yuming and Zhao, Wantong and Huang, Caishuang and Ye, Junjie and Wang, Xiao and Zheng, Huiyuan and Nan, Yang and Wang, Yuran and Xu, Xueying and Huang, Kaixin and others},
  journal={arXiv preprint arXiv:2406.11192},
  year={2024}
}
```