	---
	license: apache-2.0
	datasets:
	- gair-prox/RedPajama-pro
	language:
	- en
	tags:
	- llama
	---

	# RedPJ-ProX-1.7B

	<p align="center">
	<img src="prox-teaser.png">
	</p>

	[ArXiv](http://arxiv.org/abs/2409.17115) \| [Models](https://huggingface.co/gair-prox/RedPJ-ProX-1.7B) \| [Data](https://huggingface.co/datasets/gair-prox/RedPajama-pro) \| [Code](https://github.com/GAIR-NLP/program-every-example)

	RedPJ-ProX-1.7B is a small (1.7B-parameter) language model. It was trained for 50B tokens on [RedPajama-V2-pro](https://huggingface.co/datasets/gair-prox/RedPajama-pro), a ProX-refined version of the RedPajama-V2 corpus.

	## Evaluations

	ProX models are evaluated on 10 language model benchmarks in a zero-shot setting. In the table, "raw" denotes a model trained on the original, unrefined data, and "ours" denotes RedPJ-ProX-1.7B, trained on the ProX-refined data. Scores are zero-shot accuracies (%), and AVG is the mean over the ten benchmarks.

	| Model | ARC-c | ARC-e | CSQA | HellaSwag | MMLU | OBQA | PIQA | SIQA | WinoGrande | SciQ | AVG |
	|-------|-------|-------|------|-----------|------|------|------|------|------------|------|-----|
	| raw   | 26.9  | 51.4  | 32.4 | 47.3      | 29.3 | 32.2 | 69.7 | 39.6 | 52.1       | 79.1 | 46.0 |
	| ours  | 31.1  | 60.7  | 29.8 | 51.0      | 31.7 | 33.2 | 70.9 | 39.2 | 53.3       | 79.1 | 48.0 |
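The AVG column above matches the unweighted mean of the ten benchmark scores in each row. The minimal Python sketch below recomputes it from the table and reports the per-benchmark change from the raw-data model to the ProX model. The scores are copied from the table; the helper names (`macro_average`, `per_benchmark_delta`) are illustrative only and are not part of any ProX tooling.

```python
# Zero-shot scores copied from the evaluation table ("raw" vs "ours").
BENCHMARKS = ["ARC-c", "ARC-e", "CSQA", "HellaSwag", "MMLU",
              "OBQA", "PIQA", "SIQA", "WinoGrande", "SciQ"]

SCORES = {
    "raw":  [26.9, 51.4, 32.4, 47.3, 29.3, 32.2, 69.7, 39.6, 52.1, 79.1],
    "ours": [31.1, 60.7, 29.8, 51.0, 31.7, 33.2, 70.9, 39.2, 53.3, 79.1],
}


def macro_average(scores: list[float]) -> float:
    """Unweighted mean over benchmarks, rounded to one decimal like the AVG column."""
    return round(sum(scores) / len(scores), 1)


def per_benchmark_delta(ours: list[float], raw: list[float]) -> dict[str, float]:
    """Score difference (ours - raw) per benchmark, in percentage points."""
    return {name: round(o - r, 1) for name, o, r in zip(BENCHMARKS, ours, raw)}


if __name__ == "__main__":
    for label, scores in SCORES.items():
        print(f"{label}: AVG = {macro_average(scores)}")
    print(per_benchmark_delta(SCORES["ours"], SCORES["raw"]))
```

Running it prints `raw: AVG = 46.0` and `ours: AVG = 48.0`, matching the table. The deltas show that the largest gain is on ARC-e (+9.3), while CSQA and SIQA drop slightly and SciQ is unchanged.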

	## Citation
	```bibtex
	@article{zhou2024programming,
	  title={Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale},
	  author={Zhou, Fan and Wang, Zengzhi and Liu, Qian and Li, Junlong and Liu, Pengfei},
	  journal={arXiv preprint arXiv:2409.17115},
	  year={2024}
	}
	```