---
datasets:
- kresnik/zeroth_korean
metrics:
- bleu
- cer
base_model:
- microsoft/Phi-4-multimodal-instruct
model-index:
- name: Phi-4-mm-inst-zeroth-kor
  results:
  - task:
      type: speech-to-text-translation
    dataset:
      type: seastar105/fleurs_ko_en_test
      name: fleurs (ko-en test intersection)
    metrics:
    - type: bleu
      name: ko2en
      value: To-be-filled
    - type: bleu
      name: ko2en-cot
      value: To-be-filled
    - type: bleu
      name: en2ko (ko-mecab)
      value: To-be-filled
    - type: bleu
      name: en2ko-cot (ko-mecab)
      value: To-be-filled
  - task:
      type: automatic-speech-recognition
    dataset:
      type: kresnik/zeroth_korean
      name: zeroth_korean test
    metrics:
    - type: cer
      name: test CER
      value: To-be-filled
language:
- ko
---
This model is fine-tuned from [microsoft/Phi-4-multimodal-instruct](https://huggingface.co/microsoft/Phi-4-multimodal-instruct) on the [kresnik/zeroth_korean](https://huggingface.co/datasets/kresnik/zeroth_korean) dataset for only 1 epoch.

The fine-tuning script is [here](https://gist.github.com/seastar105/d1d8983b27611370528e3b194dcc5577#file-main-py), adapted from the Phi-4 repository example.

The model was trained for only 174 steps on the zeroth train set; the main purpose is to check whether Korean-only ASR training can transfer to other speech tasks (e.g. speech-to-text translation).

## Evaluation

ASR results on the zeroth test set and fleurs ko <-> en speech-translation results will be filled in.
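The ASR metric reported here is CER (character error rate): the character-level edit distance between hypothesis and reference, divided by the reference length. As a rough illustration only (not the actual evaluation script, which is expected to use a library such as `evaluate` or `jiwer`), a minimal pure-Python sketch:

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein distance / reference length."""
    ref, hyp = list(reference), list(hypothesis)
    # One-row dynamic-programming Levenshtein distance over characters.
    prev = list(range(len(hyp) + 1))
    for i, rc in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hc in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,              # deletion
                curr[j - 1] + 1,          # insertion
                prev[j - 1] + (rc != hc), # substitution (free if match)
            )
        prev = curr
    return prev[len(hyp)] / max(len(ref), 1)

print(cer("안녕하세요", "안녕하세요"))  # exact match -> 0.0
print(cer("abc", "abd"))                 # one substitution over 3 chars
```

Note that CER is usually computed on normalized text (whitespace and punctuation handling differ between toolkits), so scores are only comparable when the same normalization is applied.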