Models: 781

Active filters: compressed-tensors

nm-testing/tinyllama-oneshot-w4a16-channel-v2

Text Generation • Updated Oct 9, 2024 • 5.05k • 1

nm-testing/tinyllama-oneshot-w4a16-channel-v3

Text Generation • Updated Jun 7, 2024 • 63

nm-testing/tinyllama-oneshot-w8w8-test-static-shape-change

Text Generation • Updated Oct 9, 2024 • 16.1k

nm-testing/tinyllama-oneshot-w8a8-test-static-shape-change-v3

Text Generation • Updated Aug 30, 2024 • 6

nm-testing/tinyllama-oneshot-w8a8-channel-dynamic-token-v2

Text Generation • Updated Oct 9, 2024 • 6.38k

nm-testing/tinyllama-oneshot-w8-channel-a8-tensor

Text Generation • Updated Oct 9, 2024 • 6.45k

nm-testing/llama-3-instruct-w8a8-dyn-per-token-test

Text Generation • Updated Oct 9, 2024 • 3

nm-testing/Meta-Llama-3-8B-Instruct-W8A8-Dyn-Per-Token

Text Generation • Updated Oct 9, 2024 • 5

nm-testing/Meta-Llama-3-8B-Instruct-W8A8-Dyn-Per-Token-2048-Samples

Text Generation • Updated Oct 9, 2024 • 5.51k

nm-testing/tinyllama-oneshot-w8a16-per-channel

Text Generation • Updated Oct 9, 2024 • 4.66k

nm-testing/Meta-Llama-3-8B-Instruct-W8-Channel-A8-Dynamic-Per-Token-Test

Text Generation • Updated Oct 9, 2024 • 151

nm-testing/Meta-Llama-3-8B-Instruct-W4-Group128-A16-Test

Text Generation • Updated Oct 9, 2024 • 79

nm-testing/Meta-Llama-3-8B-FP8-compressed-tensors-test

Text Generation • Updated Oct 9, 2024 • 5.65k

nm-testing/Meta-Llama-3-8B-FP8-compressed-tensors-test-bos

Text Generation • Updated Oct 9, 2024 • 8

nm-testing/TinyLlama-1.1B-compressed-tensors-kv-cache-scheme

Text Generation • Updated Oct 9, 2024 • 31.9k

nm-testing/Meta-Llama-3-8B-Instruct-W4A16-compressed-tensors-test

Text Generation • Updated Oct 9, 2024 • 77

nm-testing/Qwen2-0.5B-Instruct

Text Generation • Updated Oct 9, 2024 • 75

neuralmagic/Llama-2-7b-chat-quantized.w8a8

Text Generation • Updated Oct 9, 2024 • 633 • 1

neuralmagic/Meta-Llama-3-8B-Instruct-quantized.w8a8

Text Generation • Updated Oct 9, 2024 • 1.54k • 2

neuralmagic/Qwen2-1.5B-Instruct-quantized.w8a8

Text Generation • Updated Oct 9, 2024 • 1.33k

nm-testing/Qwen2-1.5B-Instruct-W8A16-Channelwise

Text Generation • Updated Oct 9, 2024 • 79

neuralmagic/Qwen2-0.5B-Instruct-quantized.w8a8

Text Generation • Updated Oct 9, 2024 • 243

neuralmagic/Phi-3-medium-128k-instruct-quantized.w4a16

Text Generation • Updated Oct 9, 2024 • 422k • 3

neuralmagic/Qwen2-7B-Instruct-quantized.w8a8

Text Generation • Updated Oct 9, 2024 • 519

neuralmagic/Meta-Llama-3-70B-Instruct-quantized.w8a8

Text Generation • Updated Oct 9, 2024 • 186

neuralmagic/Qwen2-72B-Instruct-quantized.w8a8

Text Generation • Updated Oct 9, 2024 • 121 • 1

nm-testing/Meta-Llama-3-8B-Instruct-FP8-K-V

Text Generation • Updated Oct 9, 2024 • 4

nm-testing/Meta-Llama-3-8B-Instruct-W8A8-FP8-Channelwise-compressed-tensors

Text Generation • Updated Oct 9, 2024 • 1.11k

nm-testing/Meta-Llama-3-8B-Instruct-Non-Uniform-compressed-tensors

Text Generation • Updated Oct 9, 2024 • 3

nm-testing/nonuniform

Text Generation • Updated Oct 9, 2024 • 5