JetLM/SDAR-1.7B-Chat

ybian-umd committed on Aug 20

Commit 553d10a · verified · 1 Parent(s): ddbde3b

Update README.md

Files changed (1)

README.md +53 -0

@@ -26,6 +26,59 @@ library_name: transformers
 > - **Fair Comparisons:** In rigorously controlled experiments, SDAR achieves **on-par general task performance** with strong AR baselines, ensuring credibility and reproducibility.
 > - **Superior Learning Efficiency:** On complex scientific reasoning tasks (e.g., GPQA, ChemBench, Physics), SDAR shows **clear gains over AR models** of the same scale, approaching or even exceeding leading closed-source systems.
+# Inference
+## Using the tailored inference engine [JetEngine](https://github.com/Labman42/JetEngine)
+JetEngine provides more efficient inference than the built-in implementation.
+```bash
+git clone https://github.com/Labman42/JetEngine.git
+cd JetEngine
+pip install .
+```
+The following example shows how to quickly load a model with JetEngine and run a prompt end-to-end.
+```python
+import os
+from jetengine import LLM, SamplingParams
+from transformers import AutoTokenizer
+model_path = os.path.expanduser("/path/to/your/sdar-model")
+tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
+# Initialize the LLM
+llm = LLM(
+    model_path,
+    enforce_eager=True,
+    tensor_parallel_size=1,
+    mask_token_id=151669,   # Optional: only needed for masked/diffusion models
+    block_length=4
+)
+# Set sampling/generation parameters
+sampling_params = SamplingParams(
+    temperature=1.0,
+    topk=0,
+    topp=1.0,
+    max_tokens=256,
+    remasking_strategy="low_confidence_dynamic",
+    block_length=4,
+    denoising_steps=4,
+    dynamic_threshold=0.9
+)
+# Prepare a simple chat-style prompt
+prompt = tokenizer.apply_chat_template(
+    [{"role": "user", "content": "Explain what reinforcement learning is in simple terms."}],
+    tokenize=False,
+    add_generation_prompt=True
+)
+# Generate text
+outputs = llm.generate_streaming([prompt], sampling_params)
+```
 # Performance
 ### SDAR v.s. Qwen
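
Note on the sampling parameters: the added README example sets `remasking_strategy="low_confidence_dynamic"`, `block_length=4`, `denoising_steps=4`, and `dynamic_threshold=0.9` without explaining them. The toy below is a rough sketch of one plausible reading, not JetEngine's actual code: it assumes that in each denoising step every still-masked position whose confidence exceeds the threshold is committed, and that if too few positions qualify, the most confident ones are committed instead so that each step makes progress. It does not import JetEngine; the helper names `select_positions` and `denoise_block` and the per-step quota rule are hypothetical.

```python
from typing import List


def select_positions(confidences: List[float], masked: List[bool],
                     threshold: float, quota: int) -> List[int]:
    """Choose which masked positions to commit in one denoising step.

    Assumed rule (not confirmed by the README): commit every masked position
    above the threshold; if fewer than `quota` qualify, fall back to the
    `quota` most confident masked positions.
    """
    candidates = [i for i in range(len(confidences)) if masked[i]]
    confident = [i for i in candidates if confidences[i] > threshold]
    if len(confident) >= quota:
        return confident
    ranked = sorted(candidates, key=lambda i: confidences[i], reverse=True)
    return sorted(ranked[:quota])


def denoise_block(step_confidences: List[List[float]], block_length: int = 4,
                  denoising_steps: int = 4,
                  threshold: float = 0.9) -> List[List[int]]:
    """Return, per step, the positions of one block that get committed."""
    masked = [True] * block_length
    # Hypothetical even schedule: at least this many tokens per step.
    quota = max(1, block_length // denoising_steps)
    schedule = []
    for conf in step_confidences[:denoising_steps]:
        if not any(masked):
            break  # block finished early because confident tokens were batched
        chosen = select_positions(conf, masked, threshold, quota)
        for i in chosen:
            masked[i] = False
        schedule.append(chosen)
    return schedule


# Made-up confidences for a 4-token block over three steps.
confs = [
    [0.95, 0.97, 0.40, 0.30],  # two positions clear the 0.9 threshold
    [0.00, 0.00, 0.60, 0.50],  # none clear it, so the best one is committed
    [0.00, 0.00, 0.00, 0.92],  # the last position clears it
]
print(denoise_block(confs))
```

Under these assumptions the block finishes in three steps rather than four, because the first step commits two tokens at once. That is the kind of saving a dynamic threshold is meant to provide.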