	#### \[EN\] Upload guide (`jsonl`)
	Basic Requirements
	* Upload one `jsonl` file per model (e.g., five files to compare five LLMs)
	* ⚠️ Important: All `jsonl` files must have the same number of rows
	* ⚠️ Important: Each file must use a single `model_id` on every row, and no two files may share the same `model_id`
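
The basic requirements above can be checked before uploading. The following is a minimal sketch, not part of VARCO Arena itself: `load_jsonl` and `check_basic_requirements` are hypothetical helper names, and the sketch assumes that "unique" means one `model_id` per file with no `model_id` reused in another file.

```python
import json


def load_jsonl(path):
    """Read a jsonl file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def check_basic_requirements(paths):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    rows_per_file = {str(p): load_jsonl(p) for p in paths}

    # All files must have the same number of rows.
    counts = {p: len(rows) for p, rows in rows_per_file.items()}
    if len(set(counts.values())) > 1:
        problems.append(f"row counts differ: {counts}")

    # Assumption: one model_id per file, and no model_id shared between files.
    seen = {}
    for p, rows in rows_per_file.items():
        ids = {row.get("model_id") for row in rows}
        if len(ids) != 1:
            problems.append(f"{p}: expected one model_id, found {sorted(map(str, ids))}")
        for model_id in ids:
            if model_id in seen:
                problems.append(f"{p}: model_id {model_id!r} already used in {seen[model_id]}")
            else:
                seen[model_id] = p
    return problems
```

Calling `check_basic_requirements(["model1.jsonl", "model2.jsonl"])` returns an empty list when the files pass both checks.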

	Required Fields
	* Per Model Fields
	  * `model_id`: Unique identifier for the model (recommendation: keep it short)
	  * `generated`: The LLM's response to the test instruction

	* Required only for Translation (the `translation_pair` prompt needs these; see `streamlit_app_local/user_submit/mt/llama5.jsonl`)
	  * `source_lang`: Input language (e.g. Korean, KR, kor, ...)
	  * `target_lang`: Output language (e.g. English, EN, ...)

	* Common Fields (must be identical across all files)
	  * `instruction`: The input prompt or test instruction given to the model
	  * `task`: Category label used to group results (useful when using different evaluation prompts per task)
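
The field rules above could be verified with a short script. This is a sketch under assumptions: `check_fields` is a hypothetical helper, and "identical across all files" is read as "row `i` of every file carries the same `instruction` and `task`", which the guide does not state explicitly.

```python
PER_MODEL_FIELDS = ("model_id", "generated")
COMMON_FIELDS = ("instruction", "task")
TRANSLATION_FIELDS = ("source_lang", "target_lang")


def check_fields(rows_by_file, translation=False):
    """rows_by_file maps a file name to its list of parsed rows (dicts)."""
    required = PER_MODEL_FIELDS + COMMON_FIELDS
    if translation:
        required += TRANSLATION_FIELDS

    problems = []
    for name, rows in rows_by_file.items():
        for i, row in enumerate(rows):
            missing = [k for k in required if k not in row]
            if missing:
                problems.append(f"{name} row {i}: missing {missing}")

    # Assumption: common fields must match row by row, in the same order.
    names = list(rows_by_file)
    if names:
        reference = rows_by_file[names[0]]
        for name in names[1:]:
            for i, (ref, row) in enumerate(zip(reference, rows_by_file[name])):
                for k in COMMON_FIELDS:
                    if ref.get(k) != row.get(k):
                        problems.append(f"{name} row {i}: {k!r} differs from {names[0]}")
    return problems
```

Pass `translation=True` when the files are meant for the `translation_pair` prompt so that `source_lang` and `target_lang` are also required.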

	Example Format
	```python
	# model1.jsonl
	{"model_id": "model1", "task": "directions", "instruction": "Where should I go?", "generated": "Over there"}
	{"model_id": "model1", "task": "arithmetic", "instruction": "1+1", "generated": "2"}

	# model2.jsonl
	{"model_id": "model2", "task": "directions", "instruction": "Where should I go?", "generated": "Head north"}
	{"model_id": "model2", "task": "arithmetic", "instruction": "1+1", "generated": "3"}
	...
	..
	.
	```
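
Files in the format above can be produced with Python's standard `json` module. The sketch below is illustrative only: `write_jsonl` is a hypothetical helper, and the `# model1.jsonl` comment lines in the example are labels for the reader, so they are not written into the files.

```python
import json


def write_jsonl(path, model_id, examples, outputs):
    """Write one row per (example, output) pair for a single model."""
    if len(examples) != len(outputs):
        raise ValueError("examples and outputs must have the same length")
    with open(path, "w", encoding="utf-8") as f:
        for example, generated in zip(examples, outputs):
            row = {
                "model_id": model_id,
                "task": example["task"],
                "instruction": example["instruction"],
                "generated": generated,
            }
            # ensure_ascii=False keeps non-ASCII text (e.g. Korean) readable.
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


# Shared test cases: reusing one list keeps the common fields identical.
examples = [
    {"task": "directions", "instruction": "Where should I go?"},
    {"task": "arithmetic", "instruction": "1+1"},
]
```

For instance, `write_jsonl("model1.jsonl", "model1", examples, ["Over there", "2"])` writes the first file shown above, one JSON object per line.
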
	Use Case Example
	If you want to compare different prompting strategies for the same model:
	* Use the same `instruction` values across files so that every strategy is tested on the same scenarios.
	* The `generated` responses will differ from file to file, one file per prompting strategy.
	* Use descriptive `model_id` values like "prompt1", "prompt2", etc.
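
The prompting-strategy use case might be set up as follows. This is a sketch, not the tool's own workflow: `build_strategy_rows` is a hypothetical helper, the prompt templates are made up, and `fake_generate` stands in for a real model call.

```python
import json


def build_strategy_rows(test_cases, strategies, generate):
    """Return {model_id: rows}, one row list per prompting strategy.

    `generate` is a hypothetical callable that sends a prompt to your model
    and returns its text response.
    """
    rows_by_id = {}
    for model_id, template in strategies.items():
        rows_by_id[model_id] = [
            {
                "model_id": model_id,
                "task": case["task"],
                # The stored instruction stays the same for every strategy.
                "instruction": case["instruction"],
                "generated": generate(template.format(instruction=case["instruction"])),
            }
            for case in test_cases
        ]
    return rows_by_id


def fake_generate(prompt):
    # Stand-in for a real model call so the sketch runs on its own.
    return f"(response to: {prompt})"


test_cases = [{"task": "arithmetic", "instruction": "1+1"}]
strategies = {
    "prompt1": "{instruction}",
    "prompt2": "Think step by step. {instruction}",
}
rows_by_id = build_strategy_rows(test_cases, strategies, fake_generate)
for model_id, rows in rows_by_id.items():
    print(model_id, json.dumps(rows[0], ensure_ascii=False))
```

Each entry of `rows_by_id` can then be written to its own `jsonl` file, giving one upload per strategy.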