---
license: apache-2.0
library_name: transformers
base_model:
- EVA-UNIT-01/EVA-Qwen2.5-32B-v0.2
datasets:
- jondurbin/gutenberg-dpo-v0.1
- nbeerbower/gutenberg2-dpo
- nbeerbower/gutenberg-moderne-dpo
---
# Quantization
Quantized using the default exllamav2 quantization script/dataset, with the following changes:
- The context length for both the calibration and quantization phases was forced to 8192, as the script does not respect the CLI length arguments by default and simply uses 512/2048 as the context lengths.
- Fewer rows were used, but since each row is much longer, far more total data went into calibration.
- A few rows from an "extra" dataset, containing examples of long, coherent text that use this model's chat tokens, were added to the calibration data.
The goal is less degradation from quantization at long context. Otherwise, I tried to stay as close to the default exl2 quantization parameters as possible, as straying too far from them only seems to degrade performance.
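
For reference, a minimal Python sketch of an equivalent setup is below. It builds a small "extra" calibration parquet (long prose plus a ChatML-formatted example) and passes it to exllamav2's `convert.py` with 8192-token rows for both phases. Note that this upload forced the lengths inside the script itself rather than via the CLI; the paths, row counts, bits-per-weight target, and single-`text`-column parquet schema here are illustrative assumptions, and the flag names should be checked against your installed exllamav2 version.

```python
# Minimal sketch of the quantization setup described above (assumptions noted inline).
import subprocess
import pandas as pd

# Stand-ins for the "extra" rows: long, coherent prose plus an example that uses
# the model's ChatML-style chat tokens. In practice each row would be thousands
# of tokens long.
long_passage = "Call me Ishmael. Some years ago, never mind how long precisely..."
chat_example = (
    "<|im_start|>user\nContinue the story in the same style.<|im_end|>\n"
    "<|im_start|>assistant\nThe harbor lay grey and silent beneath the fog...<|im_end|>\n"
)

# A single 'text' column is an assumption about the parquet schema convert.py reads.
pd.DataFrame({"text": [long_passage, chat_example]}).to_parquet("extra_calibration.parquet")

# Flag names follow exllamav2's convert.py; row counts and the 4.0 bpw target
# are illustrative, not the values used for this upload.
subprocess.run(
    [
        "python", "exllamav2/convert.py",
        "-i", "EVA-Gutenberg3-Qwen2.5-32B",          # unquantized model directory
        "-o", "work",                                 # scratch/working directory
        "-cf", "EVA-Gutenberg3-Qwen2.5-32B-exl2",     # compiled output directory
        "-b", "4.0",                                  # target bits per weight (example)
        "-c", "extra_calibration.parquet",            # custom calibration parquet
        "-l", "8192", "-ml", "8192",                  # 8192-token rows for both phases
        "-r", "40", "-mr", "8",                       # fewer, longer rows (illustrative)
    ],
    check=True,
)
```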
![image/png](https://huggingface.co/nbeerbower/mistral-nemo-gutenberg3-12B/resolve/main/gutenberg3.png?download=true)
# EVA-Gutenberg3-Qwen2.5-32B
[EVA-UNIT-01/EVA-Qwen2.5-32B-v0.2](https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-32B-v0.2) finetuned on [jondurbin/gutenberg-dpo-v0.1](https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1), [nbeerbower/gutenberg2-dpo](https://huggingface.co/datasets/nbeerbower/gutenberg2-dpo), and [nbeerbower/gutenberg-moderne-dpo](https://huggingface.co/datasets/nbeerbower/gutenberg-moderne-dpo).
### Method
[ORPO tuned](https://mlabonne.github.io/blog/posts/2024-04-19_Fine_tune_Llama_3_with_ORPO.html) with 8x A100 for 2 epochs.
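
The linked post describes ORPO with Hugging Face TRL; a minimal sketch of that style of run is below. The model and dataset names come from this card, but every hyperparameter is a placeholder rather than the configuration actually used, and only one of the three Gutenberg datasets is loaded for brevity.

```python
# Minimal ORPO sketch with Hugging Face TRL, in the style of the linked guide.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_name = "EVA-UNIT-01/EVA-Qwen2.5-32B-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# One of the three Gutenberg DPO datasets from the card (prompt/chosen/rejected
# columns); the actual finetune combined all three.
dataset = load_dataset("jondurbin/gutenberg-dpo-v0.1", split="train")

config = ORPOConfig(
    output_dir="eva-gutenberg3-orpo",
    num_train_epochs=2,             # the card states 2 epochs
    per_device_train_batch_size=1,  # placeholder; actual batch size unknown
    gradient_accumulation_steps=8,  # placeholder
    learning_rate=5e-6,             # placeholder
    beta=0.1,                       # ORPO lambda; placeholder
    max_length=8192,                # placeholder sequence lengths
    max_prompt_length=4096,
    bf16=True,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL versions take `tokenizer=` instead
)
trainer.train()
```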