---
library_name: transformers
tags: []
---

# Eagle Speculative Decoding Model Trained with BaldEagle

BaldEagle repo: https://github.com/NickL77/BaldEagle/

Experimental model using training-time test (TTT) from EAGLE-3.

11.7% faster and 8.4% higher throughput than the Eagle 2 baseline; see the benchmark results below.

**Benchmarking w/ sglang**

Increasing `speculative-num-steps` from 5 to 8, based on https://github.com/SafeAILab/EAGLE/issues/209:

```
python3 -m sglang.launch_server \
    --model meta-llama/Meta-Llama-3-8B-Instruct \
    --speculative-algo EAGLE \
    --speculative-draft NickL77/BaldEagle-TTT-Llama-3.1-8B-Instruct-alpha \
    --speculative-num-steps 8 \
    --speculative-eagle-topk 8 \
    --speculative-num-draft-tokens 64 \
    --dtype bfloat16 \
    --port 30000 \
    --mem-fraction-static 0.65
```

> #questions: 80, Throughput: 169.49 token/s, Acceptance length: 3.98
>
> runtime: 4 min 50 sec

With `speculative-num-steps` set to 5:

```
python3 -m sglang.launch_server \
    --model meta-llama/Meta-Llama-3-8B-Instruct \
    --speculative-algo EAGLE \
    --speculative-draft NickL77/BaldEagle-TTT-Llama-3.1-8B-Instruct-alpha \
    --speculative-num-steps 5 \
    --speculative-eagle-topk 8 \
    --speculative-num-draft-tokens 64 \
    --dtype bfloat16 \
    --port 30000 \
    --mem-fraction-static 0.65
```

> #questions: 80, Throughput: 165.10 token/s, Acceptance length: 3.86
>
> runtime: 5 min 10 sec

Baseline: https://huggingface.co/NickL77/BaldEagle-Llama-3.1-8B-Instruct

> #questions: 80, Throughput: 156.33 token/s, Acceptance length: 3.57
>
> runtime: 5 min 24 sec
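As a quick sanity check, the headline improvement figures can be recomputed from the benchmark numbers in this card (TTT model at `speculative-num-steps 8` vs. the Eagle 2 baseline); the dictionary layout below is just for illustration:

```python
# Benchmark results copied from this card.
baseline = {"throughput": 156.33, "accept_len": 3.57, "runtime_s": 5 * 60 + 24}
ttt = {"throughput": 169.49, "accept_len": 3.98, "runtime_s": 4 * 60 + 50}

def pct_gain(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old

# Runtime speedup: the baseline run took longer, so compare baseline vs. TTT runtime.
speedup = pct_gain(baseline["runtime_s"], ttt["runtime_s"])
throughput_gain = pct_gain(ttt["throughput"], baseline["throughput"])
accept_gain = pct_gain(ttt["accept_len"], baseline["accept_len"])

print(f"speedup:           {speedup:.1f}%")          # 11.7%
print(f"throughput:       +{throughput_gain:.1f}%")  # +8.4%
print(f"acceptance length: +{accept_gain:.1f}%")     # +11.5%
```

Note that the 8.4% figure matches the throughput gain, while the acceptance-length gain works out to about 11.5%.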