|
--- |
|
library_name: transformers |
|
tags: [] |
|
--- |
|
# Eagle Speculative Decoding Model Trained with BaldEagle |
|
BaldEagle Repo: https://github.com/NickL77/BaldEagle/ |
|
|
|
Experimental model with training-time test from Eagle 3 |
|
|
|
11.7% faster, 8.4% greater acceptance rate than Eagle 2 baseline |
|
- see below for baseline |
|
|
|
**Benchmarking w/ sglang** |
|
|
|
Increasing `speculative-num-steps` from 5 -> 8 based on https://github.com/SafeAILab/EAGLE/issues/209 |
|
|
|
``` |
|
python3 -m sglang.launch_server \ |
|
--model meta-llama/Meta-Llama-3-8B-Instruct \ |
|
--speculative-algo EAGLE \ |
|
--speculative-draft NickL77/BaldEagle-TTT-Llama-3.1-8B-Instruct-alpha \ |
|
--speculative-num-steps 8 \ |
|
--speculative-eagle-topk 8 \ |
|
--speculative-num-draft-tokens 64 \ |
|
--dtype bfloat16 \ |
|
--port 30000 \ |
|
--mem-fraction-static 0.65 |
|
``` |
|
|
|
> #questions: 80, Throughput: 169.49 token/s, Acceptance length: 3.98 |
|
> |
|
> runtime: 4 min 50 sec |
|
|
|
With `speculative-num-steps` equals 5. |
|
|
|
``` |
|
python3 -m sglang.launch_server \ |
|
--model meta-llama/Meta-Llama-3-8B-Instruct \ |
|
--speculative-algo EAGLE \ |
|
--speculative-draft NickL77/BaldEagle-TTT-Llama-3.1-8B-Instruct-alpha \ |
|
--speculative-num-steps 5 \ |
|
--speculative-eagle-topk 8 \ |
|
--speculative-num-draft-tokens 64 \ |
|
--dtype bfloat16 \ |
|
--port 30000 \ |
|
--mem-fraction-static 0.65 |
|
``` |
|
|
|
> #questions: 80, Throughput: 165.10 token/s, Acceptance length: 3.86 |
|
> |
|
> runtime: 5 min 10 sec |
|
|
|
|
|
Baseline: https://huggingface.co/NickL77/BaldEagle-Llama-3.1-8B-Instruct |
|
> #questions: 80, Throughput: 156.33 token/s, Acceptance length: 3.57 |
|
> |
|
> runtime: 5 min 24 sec |