Models: 152

Active filters: arxiv: 2210.17323

daedalus314/Marx-3B-V2-GPTQ

Text Generation • Updated Oct 12, 2023 • 8

TRAC-MTRY/traclm-v2-7b-instruct-GPTQ

Text Generation • Updated Dec 22, 2023

iproskurina/bloom-1b7-GPTQ-4bit-g128

Text Generation • Updated Sep 24, 2024 • 116

iproskurina/bloom-3b-GPTQ-4bit-g128

Text Generation • Updated Sep 24, 2024 • 90

iproskurina/bloom-560m-GPTQ-4bit-g128

Text Generation • Updated Sep 24, 2024 • 109

iproskurina/bloom-1b1-GPTQ-4bit-g128

Text Generation • Updated Sep 24, 2024 • 111

iproskurina/bloom-7b1-GPTQ-4bit-g128

Text Generation • Updated Sep 24, 2024 • 12 • 2

iproskurina/opt-350m-GPTQ-4bit-g128

Text Generation • Updated Sep 24, 2024 • 118

iproskurina/opt-1.3b-GPTQ-4bit-g128

Text Generation • Updated Sep 24, 2024 • 144

iproskurina/opt-2.7b-GPTQ-4bit-g128

Text Generation • Updated Sep 24, 2024 • 91

iproskurina/opt-6.7b-GPTQ-4bit-g128

Text Generation • Updated Sep 24, 2024 • 105

iproskurina/opt-13b-GPTQ-4bit-g128

Text Generation • Updated Sep 24, 2024 • 18

neuralmagic/zephyr-7b-beta-marlin

Text Generation • Updated Mar 6, 2024 • 223

neuralmagic/TinyLlama-1.1B-Chat-v1.0-marlin

Text Generation • Updated Mar 6, 2024 • 4.89k • 1

neuralmagic/OpenHermes-2.5-Mistral-7B-marlin

Text Generation • Updated Mar 6, 2024 • 799 • 2

neuralmagic/Nous-Hermes-2-Yi-34B-marlin

Text Generation • Updated Mar 6, 2024 • 15 • 5

softmax/Llama-2-70b-chat-hf-marlin

Text Generation • Updated Mar 17, 2024 • 24

softmax/falcon-180B-chat-marlin

Text Generation • Updated Mar 21, 2024 • 10

smpanaro/Llama-2-7b-NuGPTQ

Text Generation • Updated Oct 12, 2024 • 17 • 1

TRAC-MTRY/traclm-v3-7b-instruct-GPTQ

Text Generation • Updated May 2, 2024

astronomer/Llama-3-8B-Instruct-GPTQ-8-Bit

Text Generation • Updated Apr 22, 2024 • 487 • 25

astronomer/Llama-3-8B-GPTQ-8-Bit

Text Generation • Updated Apr 22, 2024 • 81 • 2

astronomer/Llama-3-8B-GPTQ-4-Bit

Text Generation • Updated Apr 22, 2024 • 109 • 6

SwastikM/Llama-2-7B-Chat-text2code

Text Generation • Updated May 19, 2024 • 39 • 4

davidxmle/Llama-3-8B-Instruct-GPTQ-4-Bit-Debug

Text Generation • Updated Apr 30, 2024 • 12

drbh/flash-attention-pre-compile

Updated May 29, 2024

neuralmagic/Llama-2-7b-chat-quantized.w8a8

Text Generation • Updated Oct 9, 2024 • 614 • 1

neuralmagic/Meta-Llama-3-8B-Instruct-quantized.w8a8

Text Generation • Updated Oct 9, 2024 • 5.6k • 2

neuralmagic/Qwen2-1.5B-Instruct-quantized.w8a8

Text Generation • Updated Oct 9, 2024 • 1.57k

neuralmagic/Qwen2-0.5B-Instruct-quantized.w8a8

Text Generation • Updated Oct 9, 2024 • 253