	---
	library_name: transformers
	tags: []
	---
	# Eagle Speculative Decoding Model Trained with BaldEagle
	BaldEagle Repo: https://github.com/NickL77/BaldEagle/

	Experimental model trained with the training-time test (TTT) technique from Eagle 3.

	With `speculative-num-steps` set to 8: 11.7% faster by runtime, 8.4% higher throughput, and 11.5% longer acceptance length than the Eagle 2 baseline
	- see below for baseline

	Benchmarking with sglang

	`speculative-num-steps` is increased from 5 to 8, following https://github.com/SafeAILab/EAGLE/issues/209:

	```
	python3 -m sglang.launch_server \
	--model meta-llama/Meta-Llama-3-8B-Instruct \
	--speculative-algo EAGLE \
	--speculative-draft NickL77/BaldEagle-TTT-Llama-3.1-8B-Instruct-alpha \
	--speculative-num-steps 8 \
	--speculative-eagle-topk 8 \
	--speculative-num-draft-tokens 64 \
	--dtype bfloat16 \
	--port 30000 \
	--mem-fraction-static 0.65
	```

	> #questions: 80, Throughput: 169.49 token/s, Acceptance length: 3.98
	>
	> runtime: 4 min 50 sec
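
Once the server from the command above is running, a quick way to sanity-check it is to send a single prompt. The sketch below is a minimal example, not part of the benchmark: it assumes the server is reachable at `localhost:30000` and uses sglang's native `/generate` endpoint with only the Python standard library. The helper names `build_generate_request` and `generate` are hypothetical.

```python
import json
import urllib.request

# Assumption: the sglang server launched above is running on this host and port.
BASE_URL = "http://localhost:30000"


def build_generate_request(prompt, max_new_tokens=64):
    """Build a POST request for sglang's native /generate endpoint."""
    payload = {
        "text": prompt,
        "sampling_params": {
            "temperature": 0,  # greedy decoding for repeatable output
            "max_new_tokens": max_new_tokens,
        },
    }
    return urllib.request.Request(
        f"{BASE_URL}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, max_new_tokens=64, timeout=60):
    """Send the prompt to the server and return the generated text."""
    request = build_generate_request(prompt, max_new_tokens)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["text"]
```

Calling `generate("Explain speculative decoding in one sentence.")` should return the completion as a string, provided the server is up.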

	With `speculative-num-steps` set to 5:

	```
	python3 -m sglang.launch_server \
	--model meta-llama/Meta-Llama-3-8B-Instruct \
	--speculative-algo EAGLE \
	--speculative-draft NickL77/BaldEagle-TTT-Llama-3.1-8B-Instruct-alpha \
	--speculative-num-steps 5 \
	--speculative-eagle-topk 8 \
	--speculative-num-draft-tokens 64 \
	--dtype bfloat16 \
	--port 30000 \
	--mem-fraction-static 0.65
	```

	> #questions: 80, Throughput: 165.10 token/s, Acceptance length: 3.86
	>
	> runtime: 5 min 10 sec


	Baseline (Eagle 2): https://huggingface.co/NickL77/BaldEagle-Llama-3.1-8B-Instruct
	> #questions: 80, Throughput: 156.33 token/s, Acceptance length: 3.57
	>
	> runtime: 5 min 24 sec