|
---
license: cc-by-nc-4.0
language:
- en
---
|
# Jellyfish-8B |
|
|
<img src="https://i.imgur.com/E1vqCIw.png" alt="PicToModel" width="330"/> |
|
|
|
|
|
## Model Details |
|
Jellyfish-8B is a large language model with 8 billion parameters.

We fine-tuned the [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) model on datasets pertinent to data preprocessing tasks.

The training data consists of two parts:

* the Jellyfish-13B training data

* GPT-4-generated reasoning data for data preprocessing tasks
|
|
|
|
|
|
More details about the model can be found in the [Jellyfish paper](https://arxiv.org/abs/2312.01678). |
|
|
|
- **Developed by:** Haochen Zhang, Yuyang Dong, Chuan Xiao, Masafumi Oyamada |
|
- **Contact:** [email protected]
|
- **Funded by:** NEC Corporation, Osaka University |
|
- **Language(s) (NLP):** English |
|
- **License:** Non-Commercial Creative Commons license (CC BY-NC-4.0) |
|
- **Finetuned from model:** [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) |
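
A minimal loading sketch with Hugging Face `transformers` is shown below. The repository id `NECOUDBFM/Jellyfish-8B` and the half-precision settings are assumptions; adjust them to your environment.

```python
# Minimal loading sketch (assumes transformers, torch, and accelerate are installed).
# "NECOUDBFM/Jellyfish-8B" is an assumed repository id; replace it if the model
# is hosted under a different name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NECOUDBFM/Jellyfish-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # halves memory on GPU; drop on CPU-only setups
    device_map="auto",          # requires the accelerate package
)
```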
|
## Citation |
|
|
|
If you find our work useful, please give us credit by citing: |
|
|
|
```
@article{zhang2023jellyfish,
  title={Jellyfish: A Large Language Model for Data Preprocessing},
  author={Zhang, Haochen and Dong, Yuyang and Xiao, Chuan and Oyamada, Masafumi},
  journal={arXiv preprint arXiv:2312.01678},
  year={2023}
}
```
|
|
|
## Performance on seen tasks |
|
|
|
| Task | Type | Dataset | Non-LLM SoTA<sup>1</sup> | GPT-3.5<sup>2</sup> | GPT-4<sup>2</sup> | Jellyfish-13B | Jellyfish-7B | Jellyfish-8B |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| Entity Matching | Seen | Fodors-Zagats | 100 | 100 | 100 | 100 | 100 | 92.68 |
| Entity Matching | Seen | Beer | 94.37 | 96.30 | 100 | 96.77 | 96.55 | 96.30 |
| Entity Matching | Seen | iTunes-Amazon | 97.06 | 96.43 | 100 | 98.11 | 96.30 | 92.00 |
| Entity Matching | Seen | DBLP-ACM | 98.99 | 96.99 | 97.44 | 98.98 | 98.88 | 98.76 |
| Entity Matching | Seen | DBLP-GoogleScholar | 95.60 | 76.12 | 91.87 | 98.51 | 95.15 | 93.20 |
| Entity Matching | Seen | Amazon-Google | 75.58 | 66.53 | 74.21 | 81.34 | 80.83 | 74.49 |
| Entity Matching | Unseen | Walmart-Amazon | 86.76 | 86.17 | 90.27 | 89.42 | 85.64 | 89.97 |
| Entity Matching | Unseen | Abt-Buy | 89.33 | -- | 92.77 | 89.58 | 82.38 | 92.54 |
| Data Imputation | Seen | Restaurant | 77.20 | 94.19 | 97.67 | 94.19 | 88.37 | 87.21 |
| Data Imputation | Seen | Buy | 96.50 | 98.46 | 100 | 100 | 96.62 | 92.31 |
| Data Imputation | Unseen | Flipkart | 68.00 | -- | 89.94 | 81.68 | 79.44 | 90.17 |
| Data Imputation | Unseen | Phone | 86.70 | -- | 90.79 | 87.21 | 85.00 | 83.92 |
| Error Detection | Seen | Hospital | 94.40 | 90.74 | 90.74 | 95.59 | 96.27 | 80.72 |
| Error Detection | Seen | Adult | 99.10 | 92.01 | 92.01 | 99.33 | 91.96 | 81.72 |
| Error Detection | Unseen | Flights | 81.00 | -- | 83.48 | 82.52 | 66.92 | 75.18 |
| Error Detection | Unseen | Rayyan | 79.00 | -- | 81.95 | 90.65 | 69.82 | 91.54 |
| Schema Matching | Seen | Synthea | 38.50 | 57.14 | 66.67 | 36.36 | 44.44 | 27.27 |
| Schema Matching | Seen | MIMIC | 20.00 | -- | 40.00 | 40.00 | 40.00 | 34.04 |
| Schema Matching | Unseen | CMS | 50.00 | -- | 19.35 | 59.29 | 13.79 | 56.72 |
|
|
|
_For GPT-3.5 and GPT-4, we used the few-shot approach on all datasets. For the Jellyfish models, the few-shot approach is disabled on seen datasets and enabled on unseen datasets._

_We use accuracy as the metric for data imputation and the F1 score for the other tasks._
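
As a toy illustration of the two metrics (not the paper's evaluation code), assuming scikit-learn is available:

```python
# Toy metric illustration with made-up labels; not the paper's evaluation script.
from sklearn.metrics import accuracy_score, f1_score

y_true = [1, 0, 1, 1, 0]  # gold labels (e.g., match / non-match)
y_pred = [1, 0, 0, 1, 0]  # model predictions
print(accuracy_score(y_true, y_pred))  # metric used for data imputation
print(f1_score(y_true, y_pred))        # metric used for the other tasks
```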
|
|
|
## Performance on unseen tasks |
|
|
|
### Column Type Annotation |
|
|
|
| Dataset | RoBERTa (159 shots)<sup>1</sup> | GPT-3.5<sup>1</sup> | GPT-4 | Jellyfish-13B | Jellyfish-7B | Jellyfish-8B |
| ---- | ---- | ---- | ---- | ---- | ---- |
| SOTAB | 79.20 | 89.47 | 91.55 | 82.00 | 80.89 | 67.21 |
|
|
|
_Few-shot is disabled for the Jellyfish models._
|
|
|
1. Results from [Column Type Annotation using ChatGPT](https://arxiv.org/abs/2306.00745) |
|
|
|
### Attribute Value Extraction |
|
|
|
| Dataset | Stable Beluga 2 70B<sup>1</sup> | SOLAR 70B<sup>1</sup> | GPT-3.5<sup>1</sup> | GPT-4<sup>1</sup> | Jellyfish-13B | Jellyfish-7B | Jellyfish-8B |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| AE-110k | 52.10 | 49.20 | 61.30 | 55.50 | 58.12 | 76.85 | 69.78 |
| OA-Mine | 50.80 | 55.20 | 62.70 | 68.90 | 55.96 | 76.04 | 78.83 |
|
|
|
|
|
## Prompt Template |
|
```
[INST]:

<prompt> (without the <>)

[/INST]
```
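
Continuing from the loading sketch in Model Details, below is a hypothetical entity-matching prompt wrapped in this template; the task wording is illustrative only, not the exact prompt used during training.

```python
# Wrap an illustrative entity-matching instruction in the template above;
# reuses the `tokenizer` and `model` from the loading sketch.
prompt = (
    "[INST]:\n\n"
    "You are tasked with determining whether two records refer to the same "
    "real-world entity.\n"
    'Record A: [name: "Starbucks Coffee", addr: "601 Union St"]\n'
    'Record B: [name: "Starbucks", addr: "601 Union Street"]\n'
    'Answer with "Yes" or "No".\n\n'
    "[/INST]"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=8, do_sample=False)
# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:],
                       skip_special_tokens=True))
```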
|
|