This repository contains GGUF-format files for Yi 34B, llamafied for compatibility with standard LLaMA tooling and based on the chargoddard/Yi-34B-Llama repository.

Building on chargoddard's work:

  • Tensors have been renamed to match the standard LLaMA naming scheme.
  • The model can be loaded without trust_remote_code, but the tokenizer cannot.
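In practice, this means the llamafied weights should load with the stock transformers LLaMA classes, while the tokenizer still needs trust_remote_code=True. A minimal sketch, assuming the chargoddard/Yi-34B-Llama repo id and default parameters (not verified against the actual repo contents):

```python
def load_yi_llama(repo_id="chargoddard/Yi-34B-Llama"):
    """Load the llamafied Yi-34B weights and tokenizer.

    Imports are kept inside the function so the sketch can be read
    (and the function defined) without transformers installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Renamed tensors mean no custom modeling code is needed for the weights.
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    # The tokenizer still relies on custom code, hence trust_remote_code=True.
    tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
    return model, tokenizer
```

Note this loads the original safetensors/PyTorch weights; the GGUF files in this repository are intended for llama.cpp-compatible runtimes instead.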

Converted & Quantized Files

Yi-34B-Llamafied Model Options

The following tables list the available Yi-34B-Llamafied model files with their respective quantization methods and characteristics.

Key:

  • Size: File size relative to the original.
  • Quality Loss: The amount of quality loss due to quantization.
| Q-Method | File Name           | Size       | Quality Loss  | Recommended     |
|----------|---------------------|------------|---------------|-----------------|
| Q2       | Yi-34B-Llama_Q2_K   | Smallest   | Extreme       | not recommended |
| Q3       | Yi-34B-Llama_Q3_K_S | Very small | Very high     |                 |
| Q3       | Yi-34B-Llama_Q3_K_M | Very small | Very high     |                 |
| Q3       | Yi-34B-Llama_Q3_K_L | Small      | Substantial   |                 |
| Q4       | Yi-34B-Llama_Q4_K_S | Small      | Significant   |                 |
| Q4       | Yi-34B-Llama_Q4_K_M | Medium     | Balanced      | recommended     |
| Q5       | Yi-34B-Llama_Q5_K_S | Large      | Low           | recommended     |
| Q5       | Yi-34B-Llama_Q5_K_M | Large      | Very low      | recommended     |
| Q6       | Yi-34B-Llama_Q6_K   | Very large | Extremely low |                 |
| Q8       | Yi-34B-Llama_Q8_0   | Very large | Extremely low | not recommended |

Please choose the model that best suits your needs based on the size and quality loss trade-offs.
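The qualitative trade-offs in the table above can be encoded as a simple lookup for scripting the choice. This helper is a hypothetical aid of my own (the numeric ranks are not official measurements; only the file names come from the table):

```python
# Qualitative trade-offs from the table above, encoded as ranks.
# File names are from the table; the ranking scheme is a hypothetical aid.
QUANTS = [
    # (file name, size rank 1=smallest, quality-loss rank 1=lowest, recommended)
    ("Yi-34B-Llama_Q2_K",   1, 8, False),
    ("Yi-34B-Llama_Q3_K_S", 2, 7, False),
    ("Yi-34B-Llama_Q3_K_M", 2, 7, False),
    ("Yi-34B-Llama_Q3_K_L", 3, 6, False),
    ("Yi-34B-Llama_Q4_K_S", 3, 5, False),
    ("Yi-34B-Llama_Q4_K_M", 4, 4, True),
    ("Yi-34B-Llama_Q5_K_S", 5, 3, True),
    ("Yi-34B-Llama_Q5_K_M", 5, 2, True),
    ("Yi-34B-Llama_Q6_K",   6, 1, False),
    ("Yi-34B-Llama_Q8_0",   6, 1, False),
]

def best_quant(max_size_rank):
    """Pick the lowest-quality-loss recommended file within a size budget,
    falling back to any file that fits if no recommended one does."""
    fits = [q for q in QUANTS if q[1] <= max_size_rank]
    recommended = [q for q in fits if q[3]] or fits
    return min(recommended, key=lambda q: q[2])[0]
```

For example, with room for a "Large" file the helper lands on the Q5_K_M variant, matching the table's recommendation.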

Model size: 34.4B params
Architecture: llama
Quantizations: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit
