---
pipeline_tag: text-generation
---

# Yi-34B-Llama-GGUF

This repository contains GGUF-format files for Yi-34B converted to the standard LLaMA architecture, based on the chargoddard/Yi-34B-Llama repository.
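To fetch a single quantized file rather than cloning the whole repository, you can use `huggingface_hub`. A minimal sketch, assuming this repo's id is `simuarc/Yi-34B-Llama-GGUF` (adjust if it differs):

```python
# Minimal sketch: download one quantized file from the Hub.
# The repo id below is an assumption based on this repository's name.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="simuarc/Yi-34B-Llama-GGUF",
    filename="Yi-34B-Llama_Q4_K_M.gguf",  # any file listed below
)
print(path)  # local path to the downloaded GGUF file
```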

Based on chargoddard's work:

- Tensors have been renamed to match standard LLaMA naming.
- The model can be loaded without trust_remote_code, but the tokenizer cannot (see the sketch below).
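A minimal sketch of what that means in practice with transformers, against the upstream chargoddard/Yi-34B-Llama repo (not the GGUF files here):

```python
# Minimal sketch against the upstream chargoddard/Yi-34B-Llama repo.
# A 34B model needs substantial RAM/VRAM; device_map="auto" (requires
# the accelerate package) spreads the weights across available devices.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "chargoddard/Yi-34B-Llama"

# Works without trust_remote_code: tensors follow standard LLaMA naming.
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")

# The tokenizer still ships custom code, so trust_remote_code is required.
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
```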

## Converted & Quantized Files

### Q2

- `Yi-34B-Llama_Q2_K.gguf` with quantization method Q2_K (smallest, extreme quality loss - not recommended).

### Q3

- `Yi-34B-Llama_Q3_K_S.gguf` with quantization method Q3_K_S (very small, very high quality loss).
- `Yi-34B-Llama_Q3_K_M.gguf` with quantization method Q3_K_M (very small, very high quality loss).
- `Yi-34B-Llama_Q3_K_L.gguf` with quantization method Q3_K_L (small, substantial quality loss).

### Q4

- `Yi-34B-Llama_Q4_K_S.gguf` with quantization method Q4_K_S (small, significant quality loss).
- `Yi-34B-Llama_Q4_K_M.gguf` with quantization method Q4_K_M (medium, balanced quality - recommended).

### Q5

- `Yi-34B-Llama_Q5_K_S.gguf` with quantization method Q5_K_S (large, low quality loss - recommended).
- `Yi-34B-Llama_Q5_K_M.gguf` with quantization method Q5_K_M (large, very low quality loss - recommended).

### Q6

- `Yi-34B-Llama_Q6_K.gguf` with quantization method Q6_K (very large, extremely low quality loss).

### Q8

- `Yi-34B-Llama_Q8_0.gguf` with quantization method Q8_0 (very large, extremely low quality loss - not recommended).
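Once downloaded, any of these files can be run with llama.cpp or its Python bindings. A minimal sketch using llama-cpp-python with the recommended Q4_K_M file; the generation parameters are illustrative, not prescriptive:

```python
# Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="Yi-34B-Llama_Q4_K_M.gguf",
    n_ctx=4096,       # context window
    n_gpu_layers=-1,  # offload all layers if built with GPU support
)

out = llm("Q: What is the capital of France? A:", max_tokens=32)
print(out["choices"][0]["text"])
```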