Safetensors · qwen2
Research-EAI committed · verified
Commit 1150f1a · 1 Parent(s): e5c15c1

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -1,11 +1,11 @@
 ---
 license: apache-2.0
 ---
-# 🏷️ EAI-Taxonomy-0.5b
+# 🏷️ EAI-Distill-0.5b
 
 ## 📋 Model Description
 
-EAI-Taxonomy-0.5b is a fine-tuned version of Qwen2.5-0.5B-Instruct designed for document classification across 12 taxonomic categories. This model is optimized for high-throughput classification of web documents and produces structured metadata for large-scale dataset curation.
+EAI-Distill-0.5b is a fine-tuned version of Qwen2.5-0.5B-Instruct designed for document classification across 12 taxonomic categories. This model is optimized for high-throughput classification of web documents and produces structured metadata for large-scale dataset curation.
 
 The model classifies documents across the following dimensions:
 - **📚 Free Decimal Correspondence (FDC)**: Subject matter classification based on the Dewey Decimal System
@@ -35,8 +35,8 @@ from transformers import AutoTokenizer, AutoModelForCausalLM
 import random
 
 # Load model and tokenizer
-tokenizer = AutoTokenizer.from_pretrained("your-org/EAI-Taxonomy-0.5b", trust_remote_code=True)
-model = AutoModelForCausalLM.from_pretrained("your-org/EAI-Taxonomy-0.5b")
+tokenizer = AutoTokenizer.from_pretrained("EssentialAI/EAI-Distill-0.5b", trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained("EssentialAI/EAI-Distill-0.5b")
 
 def chunk_text(text, max_char_per_doc=30000):
     if len(text) <= max_char_per_doc:
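The diff cuts off at the start of `chunk_text`, so the rest of the helper is not shown here. Below is a minimal, hypothetical sketch of what such a helper could look like, assuming plain character-boundary splitting; the model card's actual implementation may differ (for instance, the `import random` above suggests it may sample chunks rather than keep them all).

```python
def chunk_text(text, max_char_per_doc=30000):
    """Split a document into character chunks that fit the model's input budget.

    Hypothetical sketch: returns the document as a single-element list if it
    already fits, otherwise contiguous slices of at most max_char_per_doc
    characters. Not the model card's verbatim implementation.
    """
    if len(text) <= max_char_per_doc:
        return [text]
    # Slice the document into consecutive windows of max_char_per_doc chars.
    return [text[i:i + max_char_per_doc]
            for i in range(0, len(text), max_char_per_doc)]
```

Each chunk can then be classified independently (e.g. passed through the tokenizer and model loaded above), with the per-chunk labels aggregated into document-level metadata.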