Update README.md
README.md
CHANGED
@@ -1,11 +1,11 @@
 ---
 license: apache-2.0
 ---
-# 🏷️ EAI-
+# 🏷️ EAI-Distill-0.5b
 
 ## Model Description
 
-EAI-
+EAI-Distill-0.5b is a fine-tuned version of Qwen2.5-0.5B-Instruct designed for document classification across 12 taxonomic categories. This model is optimized for high-throughput classification of web documents and produces structured metadata for large-scale dataset curation.
 
 The model classifies documents across the following dimensions:
 - **Free Decimal Correspondence (FDC)**: Subject matter classification based on the Dewey Decimal System
@@ -35,8 +35,8 @@ from transformers import AutoTokenizer, AutoModelForCausalLM
 import random
 
 # Load model and tokenizer
-tokenizer = AutoTokenizer.from_pretrained("
-model = AutoModelForCausalLM.from_pretrained("
+tokenizer = AutoTokenizer.from_pretrained("EssentialAI/EAI-Distill-0.5b", trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained("EssentialAI/EAI-Distill-0.5b")
 
 def chunk_text(text, max_char_per_doc=30000):
     if len(text) <= max_char_per_doc:
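Note on chunk_text: the second hunk ends right after the function's length check, so the rest of its body is not visible in this diff. As a rough illustration only, here is a minimal sketch of character-based chunking. The name chunk_text_sketch and the fixed-window splitting strategy are assumptions made for this note; the README's actual function may behave differently (the nearby `import random` hints that it might sample chunks rather than keep all of them).

```python
def chunk_text_sketch(text, max_char_per_doc=30000):
    """Hypothetical helper: split text into consecutive pieces of at most
    max_char_per_doc characters. Not the README's implementation."""
    if max_char_per_doc <= 0:
        raise ValueError("max_char_per_doc must be positive")
    # Short documents pass through as a single chunk, mirroring the
    # length check that is visible in the diff.
    if len(text) <= max_char_per_doc:
        return [text]
    # Assumption: cut at fixed character offsets; the last chunk may be shorter.
    return [
        text[start:start + max_char_per_doc]
        for start in range(0, len(text), max_char_per_doc)
    ]
```

Joining the returned chunks with an empty string reproduces the input, so nothing is dropped. A sampling-based variant would trade that property for a bounded prompt size.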