Update README.md
Browse files
README.md
CHANGED
@@ -17,16 +17,16 @@ model-index:
     metrics:
     - type: bleu
       name: ko2en
-      value:
+      value: 7.07
     - type: bleu
       name: ko2en-cot
-      value:
+      value: 9.19
     - type: bleu
       name: en2ko (ko-mecab)
-      value:
+      value: 13.08
     - type: bleu
       name: en2ko-cot (ko-mecab)
-      value:
+      value: 9.35
   - task:
       type: automatic-speech-recognition
     dataset:
@@ -35,7 +35,7 @@ model-index:
     metrics:
     - type: cer
       name: test CER
-      value:
+      value: 7.02
 language:
 - ko
 ---
@@ -47,4 +47,69 @@ model is trained only 174 steps on zeroth train set, and main purpose is to chec
 
 ## Evaluation
 
-ASR on zeroth-test set and fleurs ko <-> en speech translation result
+Results for ASR on the zeroth test set and for ko <-> en speech translation on FLEURS. The evaluation script is [here](https://gist.github.com/seastar105/d1d8983b27611370528e3b194dcc5577#file-evaluate-py); evaluation ran on a single A40.
+
+
+| Model                 | zeroth-test (CER) | fleurs-ko2en (BLEU) | fleurs-ko2en-cot (BLEU) | fleurs-en2ko (BLEU) | fleurs-en2ko-cot (BLEU) |
+|-----------------------|-------------------|---------------------|-------------------------|---------------------|-------------------------|
+| original              | 195.92            | 5.62                | 2.45                    | 6.87                | 4.35                    |
+| finetune (this model) | 7.02              | 7.07                | 9.19                    | 13.08               | 9.35                    |
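+
+A rough sketch of how such numbers can be computed follows; the linked gist is the authoritative script, and the `evaluate`/`sacrebleu` usage here is an assumption. Korean BLEU is tokenized with mecab, matching the `ko-mecab` names in the metadata:
+
+```python
+# Hedged sketch: CER for ASR outputs, mecab-tokenized BLEU for translations.
+# Library choices are assumptions, not necessarily the gist's exact code.
+import evaluate   # pip install evaluate jiwer
+import sacrebleu  # pip install "sacrebleu[ko]"
+
+preds = ["모델 출력 문장"]  # hypothetical example strings
+refs = ["참조 문장"]
+
+# character error rate between predictions and references
+cer = evaluate.load("cer").compute(predictions=preds, references=refs)
+
+# sacrebleu's ko-mecab tokenizer segments Korean before BLEU scoring
+bleu = sacrebleu.corpus_bleu(preds, [refs], tokenize="ko-mecab").score
+print(cer, bleu)
+```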
+
+
+## Example script
+
+```python
+from datasets import load_dataset
+from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig
+
+orig_model_path = "microsoft/Phi-4-multimodal-instruct"
+ft_model_path = "seastar105/Phi-4-mm-inst-zeroth-kor"
+# processor and generation config come from the base model, weights from the finetune
+generation_config = GenerationConfig.from_pretrained(orig_model_path, 'generation_config.json')
+processor = AutoProcessor.from_pretrained(orig_model_path, trust_remote_code=True)
+max_new_tokens = 256  # assumed value; adjust as needed
+
+model = AutoModelForCausalLM.from_pretrained(
+    ft_model_path,
+    trust_remote_code=True,
+    torch_dtype='auto',
+    _attn_implementation='flash_attention_2',  # requires flash-attn
+).cuda()
+user_prompt = '<|user|>'
+assistant_prompt = '<|assistant|>'
+prompt_suffix = '<|end|>'
+
+# task prompts are from the technical report; <|audio_1|> marks where the audio clip is attached
+asr_prompt = f'{user_prompt}<|audio_1|>Transcribe the audio clip into text.{prompt_suffix}{assistant_prompt}'
+ast_ko_prompt = f'{user_prompt}<|audio_1|>Translate the audio to Korean.{prompt_suffix}{assistant_prompt}'
+ast_cot_ko_prompt = f'{user_prompt}<|audio_1|>Transcribe the audio to text, and then translate the audio to Korean. Use <sep> as a separator between the original transcript and the translation.{prompt_suffix}{assistant_prompt}'
+ast_en_prompt = f'{user_prompt}<|audio_1|>Translate the audio to English.{prompt_suffix}{assistant_prompt}'
+ast_cot_en_prompt = f'{user_prompt}<|audio_1|>Transcribe the audio to text, and then translate the audio to English. Use <sep> as a separator between the original transcript and the translation.{prompt_suffix}{assistant_prompt}'
+asr_ds = load_dataset("kresnik/zeroth_korean", split="test")
+ast_ds = load_dataset("seastar105/fleurs_ko_en_test", split="train")
+
+# ASR
+item = asr_ds[0]
+audio = (item["audio"]["array"], item["audio"]["sampling_rate"])
+inputs = processor(text=asr_prompt, audios=[audio], return_tensors='pt').to(model.device)
+generate_ids = model.generate(
+    **inputs,
+    max_new_tokens=max_new_tokens,
+    generation_config=generation_config,
+)
+generate_ids = generate_ids[:, inputs['input_ids'].shape[1]:]
+response = processor.batch_decode(
+    generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
+)[0]  # "… 아동들이 사랑을 제대로 못 받고 크면 매우 심각한 결과가 초래된다는 결론을 내렸습니다"
+
+# AST, EN -> KO
+item = ast_ds[-1]
+audio = (item["en_audio"]["array"], item["en_audio"]["sampling_rate"])
+inputs = processor(text=ast_ko_prompt, audios=[audio], return_tensors='pt').to(model.device)
+generate_ids = model.generate(
+    **inputs,
+    max_new_tokens=max_new_tokens,
+    generation_config=generation_config,
+)
+generate_ids = generate_ids[:, inputs['input_ids'].shape[1]:]
+response = processor.batch_decode(
+    generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
+)[0]  # "가장 쉽게 접근 가능한 식물 영양소 …과 légumes에서 접근 가능한 단백질이었을 것이다 … 하지만 이것들은 … 동물처럼 우리에게 소화하기 어렵습니다만 그것들이 … 있다면요"
+```
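+
+The CoT prompts return the transcript and the translation in a single pass, separated by `<sep>` (per the prompt wording above). A small follow-on sketch, reusing the variables from the script; the split logic is an assumption based on that prompt:
+
+```python
+# AST with CoT: expect "transcript <sep> translation" in one response
+inputs = processor(text=ast_cot_ko_prompt, audios=[audio], return_tensors='pt').to(model.device)
+generate_ids = model.generate(
+    **inputs,
+    max_new_tokens=max_new_tokens,
+    generation_config=generation_config,
+)
+generate_ids = generate_ids[:, inputs['input_ids'].shape[1]:]
+response = processor.batch_decode(
+    generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
+)[0]
+transcript, _, translation = response.partition("<sep>")
+print(transcript.strip(), translation.strip(), sep="\n")
+```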