balaji1312 committed
Commit 7f5d7b4 · 1 Parent(s): a6dd6b8

Update model

Files changed (1)
  1. README.md +1390 -2

README.md CHANGED
@@ -1,2 +1,1390 @@
- # balaji1312/asr_train_asr_wavlm_transformer_raw_en_bpe1024_valid.acc.best
- This model was uploaded to Hugging Face by balaji1312.
+ ---
+ tags:
+ - espnet
+ - audio
+ - automatic-speech-recognition
+ language:
+ datasets:
+ -
+ license: cc-by-4.0
+ ---
+
+ ## ESPnet2 ASR model
+
+ ### `balaji1312/jibo_kids_wavlm_aed_transformer`
+
+ This model was trained using the `jibo_kids` recipe in [espnet](https://github.com/espnet/espnet/).
+
+ ### Demo: How to use in ESPnet2
+
+ Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
+ if you haven't done that already.
+
+ ```bash
+ cd espnet
+
+ pip install -e .
+ cd egs2/jibo_kids/asr1
+ ./run.sh --skip_data_prep false --skip_train true --download_model balaji1312/jibo_kids_wavlm_aed_transformer
+ ```
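
For inference from Python rather than through `run.sh`, the following is a minimal sketch, not part of the committed README. It assumes that `espnet`, `espnet_model_zoo`, `soundfile`, and `s3prl` (for the WavLM frontend) are installed and that the model tag used in the command above resolves on Hugging Face. The names `transcribe` and `wav_path` are hypothetical, and the 16 kHz check is an assumption based on `fs: 16k` in the training config.

```python
def transcribe(wav_path, model_tag="balaji1312/jibo_kids_wavlm_aed_transformer"):
    """Return the 1-best transcript for a mono WAV file (hypothetical helper)."""
    # Imported lazily so this function can be defined without ESPnet installed.
    import soundfile
    from espnet2.bin.asr_inference import Speech2Text

    # from_pretrained downloads and caches the model; it relies on espnet_model_zoo.
    speech2text = Speech2Text.from_pretrained(model_tag)

    speech, rate = soundfile.read(wav_path)
    if rate != 16000:
        # Assumption: the model expects 16 kHz input, as suggested by `fs: 16k`.
        raise ValueError(f"expected 16 kHz audio, got {rate} Hz")

    # Each n-best entry is (text, tokens, token_ids, hypothesis).
    nbests = speech2text(speech)
    text, _tokens, _token_ids, _hypothesis = nbests[0]
    return text
```

A call such as `transcribe("utterance.wav")` would then return the decoded text, where `utterance.wav` is a placeholder path.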
+
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Thu Jan 30 06:18:01 EST 2025`
+ - python version: `3.9.19 (main, May 6 2024, 19:43:03) [GCC 11.2.0]`
+ - espnet version: `espnet 202402`
+ - pytorch version: `pytorch 2.4.0`
+ - Git hash: `c46aa9a594ff83d52cbf61d84c5650325d1ce527`
+ - Commit date: `Sun Oct 13 14:39:31 2024 -0400`
+
+ ## exp/asr_train_asr_wavlm_transformer_raw_en_bpe1024
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_asr_model_valid.acc.ave/test|1044|3686|34.9|41.4|23.7|20.9|86.0|60.2|
+ |decode_asr_asr_model_valid.acc.best/test|1044|3686|56.1|31.4|12.5|8.1|52.0|62.3|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_asr_model_valid.acc.ave/test|1044|16215|54.8|17.3|28.0|25.5|70.7|60.2|
+ |decode_asr_asr_model_valid.acc.best/test|1044|16215|75.4|8.1|16.6|9.4|34.1|62.3|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_asr_model_valid.acc.ave/test|1044|5220|45.9|29.6|24.5|21.3|75.4|60.2|
+ |decode_asr_asr_model_valid.acc.best/test|1044|5220|64.5|18.0|17.5|10.4|45.9|62.3|
+
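
The columns in the tables above follow the usual scoring convention: `Corr + Sub + Del` accounts for every reference unit (100%), and `Err = Sub + Del + Ins`. The sketch below is an illustrative check of that arithmetic on the test-set rows. The function name `check_row` and the rounding tolerance are assumptions, since the table values are rounded to one decimal place.

```python
def check_row(corr, sub, dele, ins, err, tol=0.15):
    """Check Corr+Sub+Del == 100 and Err == Sub+Del+Ins, up to rounding."""
    reference_ok = abs(corr + sub + dele - 100.0) <= tol
    error_ok = abs(sub + dele + ins - err) <= tol
    return reference_ok and error_ok

# (Corr, Sub, Del, Ins, Err) copied from the test-set rows above.
rows = {
    "WER valid.acc.ave": (34.9, 41.4, 23.7, 20.9, 86.0),
    "WER valid.acc.best": (56.1, 31.4, 12.5, 8.1, 52.0),
    "CER valid.acc.ave": (54.8, 17.3, 28.0, 25.5, 70.7),
    "CER valid.acc.best": (75.4, 8.1, 16.6, 9.4, 34.1),
    "TER valid.acc.ave": (45.9, 29.6, 24.5, 21.3, 75.4),
    "TER valid.acc.best": (64.5, 18.0, 17.5, 10.4, 45.9),
}

results = {name: check_row(*values) for name, values in rows.items()}
for name, ok in results.items():
    print(f"{name}: {'consistent' if ok else 'inconsistent'}")
```

Because insertions are counted against the reference length, `Err` can exceed `100 - Corr`, which is why the `valid.acc.ave` WER reaches 86.0 even though 34.9% of the words are correct.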
+ ## exp/asr_train_asr_wavlm_transformer_raw_en_bpe1024/decode_asr_asr_model_valid.acc.best
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |org/dev|853|2372|59.8|31.2|8.9|7.2|47.3|64.0|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |org/dev|853|9855|78.3|7.3|14.3|8.4|30.1|64.0|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |org/dev|853|3590|68.2|16.2|15.6|6.4|38.3|64.0|
+
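
The `Sub`, `Del`, and `Ins` counts behind tables like these come from aligning each hypothesis against its reference. The following is a simplified sketch of that idea using a plain unit-cost edit distance. It is an assumption-laden illustration rather than the scoring tool's exact algorithm (which may weight edit types differently), and `align_counts` and the example sentences are hypothetical.

```python
def align_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) for a minimum-edit alignment."""
    # Each cell holds (total_edits, subs, dels, ins) for ref[:i] versus hyp[:j].
    prev = [(j, 0, 0, j) for j in range(len(hyp) + 1)]
    for i in range(1, len(ref) + 1):
        cur = [(i, 0, i, 0)]
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                diagonal = prev[j - 1]  # match: no extra cost
            else:
                t, s, d, n = prev[j - 1]
                diagonal = (t + 1, s + 1, d, n)  # substitution
            t, s, d, n = prev[j]
            deletion = (t + 1, s, d + 1, n)  # reference word missing in hypothesis
            t, s, d, n = cur[j - 1]
            insertion = (t + 1, s, d, n + 1)  # extra word in hypothesis
            # Tuples compare by total edits first, so the minimum is an optimal path.
            cur.append(min(diagonal, deletion, insertion))
        prev = cur
    _, subs, dels, ins = prev[-1]
    return subs, dels, ins

ref = "I BRUSH MY TEETH EVERY DAY".split()
hyp = "I BRUSH TEEF EVERY DAY NOW".split()
subs, dels, ins = align_counts(ref, hyp)
wer = (subs + dels + ins) / len(ref)
print(subs, dels, ins, wer)
```

Running the same routine over characters or BPE tokens instead of words gives the corresponding CER or TER counts.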
+ ## exp/asr_train_asr_wavlm_transformer_raw_en_bpe1024/decode_asr_asr_model_valid.acc.ave
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |org/dev|853|2372|39.9|39.2|21.0|15.8|75.9|60.3|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |org/dev|853|9855|56.7|15.4|27.9|19.1|62.4|60.3|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |org/dev|853|3590|51.5|25.8|22.7|13.6|62.1|60.3|
+
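
The results report two decoding setups, `valid.acc.best` and `valid.acc.ave`, and the gap between them is easier to read as a relative change. The snippet below is a small illustrative calculation using the WER `Err` values from the tables; the helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(baseline, improved):
    """Relative error reduction in percent, going from baseline to improved."""
    return 100.0 * (baseline - improved) / baseline

# WER "Err" values: valid.acc.ave as baseline, valid.acc.best as the improved system.
test_reduction = relative_reduction(86.0, 52.0)
dev_reduction = relative_reduction(75.9, 47.3)
print(f"test: {test_reduction:.1f}% relative, dev: {dev_reduction:.1f}% relative")
```

On these numbers the `valid.acc.best` checkpoint has a word error rate roughly 38 to 40% lower, in relative terms, than the `valid.acc.ave` one on both sets.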
+ ## ASR config
+
+ <details><summary>expand</summary>
+
+ ```
+ config: conf/tuning/train_asr_wavlm_transformer.yaml
+ print_config: false
+ log_level: INFO
+ drop_last_iter: false
+ dry_run: false
+ iterator_type: sequence
+ valid_iterator_type: null
+ output_dir: exp/asr_train_asr_wavlm_transformer_raw_en_bpe1024
+ ngpu: 1
+ seed: 2022
+ num_workers: 4
+ num_att_plot: 0
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: null
+ dist_rank: null
+ local_rank: 0
+ dist_master_addr: null
+ dist_master_port: null
+ dist_launcher: null
+ multiprocessing_distributed: false
+ unused_parameters: false
+ sharded_ddp: false
+ use_deepspeed: false
+ deepspeed_config: null
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: false
+ use_tf32: false
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 100
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - valid
+     - acc
+     - max
+ keep_nbest_models: 4
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 4
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: true
+ log_interval: 400
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ use_adapter: false
+ adapter: lora
+ save_strategy: all
+ adapter_conf: {}
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param:
+ - frontend.upstream
+ num_iters_per_epoch: null
+ batch_size: 20
+ valid_batch_size: null
+ batch_bins: 1200000
+ valid_batch_bins: null
+ category_sample_size: 10
+ train_shape_file:
+ - exp/asr_stats_raw_en_bpe1024/train/speech_shape
+ - exp/asr_stats_raw_en_bpe1024/train/text_shape.bpe
+ valid_shape_file:
+ - exp/asr_stats_raw_en_bpe1024/valid/speech_shape
+ - exp/asr_stats_raw_en_bpe1024/valid/text_shape.bpe
+ batch_type: numel
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 150
+ sort_in_batch: descending
+ shuffle_within_batch: false
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ chunk_excluded_key_prefixes: []
+ chunk_default_fs: null
+ chunk_max_abs_length: null
+ chunk_discard_short_samples: true
+ train_data_path_and_name_and_type:
+ -   - dump/raw/train/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/train/text
+     - text
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/dev/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/dev/text
+     - text
+     - text
+ multi_task_dataset: false
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ allow_multi_rates: false
+ valid_max_cache_size: null
+ exclude_weight_decay: false
+ exclude_weight_decay_conf: {}
+ optim: adam
+ optim_conf:
+     lr: 0.002
+     weight_decay: 1.0e-06
+ scheduler: warmuplr
+ scheduler_conf:
+     warmup_steps: 15000
+ token_list:
+ - <blank>
+ - <unk>
+ - .
+ - ▁I
+ - ▁AND
+ - ''''
+ - ▁A
+ - ▁YOU
+ - S
+ - ▁IT
+ - T
+ - ▁TO
+ - ▁THE
+ - ▁LIKE
+ - ▁THAT
+ - ▁NO
+ - ▁BECAUSE
+ - ▁ONE
+ - ▁THEN
+ - ▁DON
+ - ▁TEETH
+ - ▁TWO
+ - ▁FIVE
+ - ▁KNOW
+ - ▁MY
+ - ▁SO
+ - ▁YOUR
+ - ▁IS
+ - ▁THEM
+ - ▁DO
+ - ▁SIX
+ - ▁THREE
+ - ▁G
+ - ▁U
+ - ▁TEN
+ - ▁FOUR
+ - ▁GET
+ - ▁O
+ - ▁K
+ - ▁B
+ - ▁L
+ - ▁N
+ - ▁S
+ - ▁E
+ - ▁M
+ - ▁BRUSH
+ - ▁THIS
+ - ▁T
+ - ▁CAN
+ - ▁SEVEN
+ - ▁EIGHT
+ - ▁C
+ - ▁HAVE
+ - ▁PUT
+ - ▁MAKE
+ - ▁W
+ - ▁J
+ - ▁F
+ - ▁IN
+ - ▁P
+ - ▁NINE
+ - ▁Y
+ - ▁D
+ - ▁V
+ - ▁OKAY
+ - ▁Q
+ - ▁Z
+ - ▁ZERO
+ - ▁IF
+ - ▁H
+ - ▁WHAT
+ - ▁COUNT
+ - ING
+ - ▁R
+ - ▁X
+ - ▁OF
+ - ▁HOW
+ - ▁
+ - ▁WANT
+ - ▁COLOR
+ - ▁JUST
+ - ▁WITH
+ - ▁ON
+ - N
+ - ▁AN
+ - ▁MIX
+ - ▁COLORS
+ - ▁THEY
+ - ▁YEAH
+ - ▁YES
+ - ▁UP
+ - ▁BLUE
+ - ▁BY
+ - ▁GO
+ - M
+ - ▁THERE
+ - ▁ALL
+ - ▁OR
+ - ▁CLEAN
+ - ED
+ - ▁SEE
+ - ▁BUT
+ - ▁USE
+ - ▁FOR
+ - ▁BE
+ - ▁TOOTHPASTE
+ - ▁WAS
+ - ▁UM
+ - ▁LETTER
+ - ▁NEED
+ - ▁HE
+ - ▁WILL
+ - ▁PLUS
+ - ▁DOG
+ - ▁RED
+ - RE
+ - ▁PURPLE
+ - ▁NOT
+ - ▁CAVITIES
+ - ▁OH
+ - ▁ARE
+ - ▁THINK
+ - ▁WHY
+ - ▁SHE
+ - ▁DID
+ - ▁HAT
+ - Y
+ - ▁PAINT
+ - ▁BRUSHING
+ - ▁BOX
+ - ▁TOOTHBRUSH
+ - ▁SICK
+ - ▁OUT
+ - ▁ME
+ - ▁JUG
+ - ▁DOES
+ - ▁FLU
+ - ▁MAKES
+ - ▁WIG
+ - ▁SH
+ - ▁MAN
+ - ▁WE
+ - ▁MORE
+ - OULD
+ - ▁PLAY
+ - ▁SOME
+ - ▁JIBO
+ - ▁GREEN
+ - ▁VAN
+ - ▁NUMBER
+ - ▁YELLOW
+ - ▁REALLY
+ - D
+ - ▁WHITE
+ - ▁PINK
+ - ▁WATER
+ - ▁QUIZ
+ - ▁NOW
+ - ▁UH
+ - ▁DIFFERENT
+ - ▁RIGHT
+ - IND
+ - ▁SAY
+ - ▁TREE
+ - LL
+ - CH
+ - ▁HELP
+ - ▁HUNDRED
+ - ▁LOOK
+ - ▁COULD
+ - ▁COUNTING
+ - ▁WAY
+ - ▁MAYBE
+ - ▁EASY
+ - ▁WOULD
+ - ▁BLACK
+ - ▁TAKE
+ - ▁HER
+ - ▁LI
+ - E
+ - TTLE
+ - F
+ - ▁AL
+ - ▁THING
+ - ▁ELSE
+ - ▁WELL
+ - LY
+ - ▁TOGETHER
+ - ▁WHEN
+ - ▁SIDE
+ - ▁CAVITY
+ - ▁FIRST
+ - ▁DOWN
+ - ▁DAY
+ - ▁OTHER
+ - ▁HERE
+ - ▁CUBES
+ - ▁COUNTED
+ - ▁EVERY
+ - ▁SA
+ - ▁TELL
+ - ▁DAD
+ - ▁ORANGE
+ - ▁SAME
+ - ▁SOMETIMES
+ - ▁MANY
+ - OTHER
+ - ID
+ - ▁WON
+ - ▁BIT
+ - ▁HI
+ - ▁TOO
+ - ▁TIME
+ - UH
+ - ▁WAIT
+ - ▁NOTHING
+ - ▁FALL
+ - ▁NAME
+ - ▁LOT
+ - ▁THAN
+ - ▁EH
+ - ▁MEAN
+ - ▁NEW
+ - W
+ - H
+ - ▁TOOTH
+ - ER
+ - ▁FLOSS
+ - ▁START
+ - ▁BROWN
+ - ▁STACK
+ - ▁NOPE
+ - ▁GOOD
+ - A
+ - L
+ - ▁LET
+ - ▁WHI
+ - O
+ - ▁ALREADY
+ - ▁INAUDIBLE
+ - ▁MOUTH
+ - ▁EAT
+ - ▁HAS
+ - ▁DONE
+ - ▁THOSE
+ - ▁BETTER
+ - ▁FUN
+ - ▁GERMS
+ - TO
+ - ▁UMM
+ - CK
+ - SO
+ - EVEN
+ - ▁WASH
+ - ▁ACTUALLY
+ - ▁DRINK
+ - ▁FRIEND
+ - ▁REMEMBER
+ - ▁SUGAR
+ - ▁SOMETHING
+ - ▁HARD
+ - ▁COME
+ - ▁PAINTING
+ - ▁SPI
+ - ▁AT
+ - I
+ - TER
+ - ▁MUCH
+ - ▁GUESS
+ - ▁HIM
+ - ▁HA
+ - IGHT
+ - Z
+ - ▁FRO
+ - ▁IMPORTANT
+ - ▁AGAIN
+ - ▁STUFF
+ - ▁BACK
+ - ▁BUGS
+ - ▁NIGHT
+ - ▁ADD
+ - G
+ - ▁EA
+ - HIS
+ - K
+ - EVER
+ - ▁TH
+ - ▁DARK
+ - ▁FORGOT
+ - ▁MOM
+ - BODY
+ - ▁UHHUH
+ - ▁BAD
+ - ▁TURN
+ - ▁ANY
+ - AH
+ - EL
+ - U
+ - AKING
+ - VERY
+ - ▁GONNA
+ - ▁FOUAH
+ - ▁SURE
+ - ▁PULL
+ - ▁LONG
+ - ▁KEEP
+ - ES
+ - P
+ - ▁WAYS
+ - TING
+ - ALLY
+ - VE
+ - ONE
+ - ▁QUESTION
+ - ▁PAPER
+ - ▁STU
+ - YTHING
+ - ▁SHOW
+ - ▁CALLED
+ - ▁LOVE
+ - ▁MM
+ - ▁TRY
+ - ▁BYE
+ - ▁TOP
+ - LD
+ - ▁MMM
+ - ▁PE
+ - ▁NUMBERS
+ - BLE
+ - PLE
+ - ▁CUBE
+ - OUT
+ - R
+ - ▁BOTTOM
+ - ▁FAVORITE
+ - ▁SPANISH
+ - ▁TONGUE
+ - ▁SCHOOL
+ - ▁TWENTY
+ - ▁MHM
+ - ▁FRONT
+ - ▁STAY
+ - ▁SPELL
+ - ▁TEEF
+ - ▁LAST
+ - ▁GUM
+ - ▁HOLD
+ - TY
+ - ▁GROUPS
+ - ▁OFF
+ - ▁EQUALS
+ - ▁FINGERS
+ - ▁QUI
+ - RAB
+ - ▁MEANS
+ - AW
+ - ▁UHH
+ - IT
+ - WEE
+ - ▁CH
+ - ▁AM
+ - ▁SI
+ - RY
+ - SIX
+ - ▁WI
+ - ▁BEAUTIFUL
+ - ▁DENTIST
+ - ▁HEALTHY
+ - ▁HURT
+ - ▁ZEWO
+ - ▁KNEW
+ - ▁MATH
+ - ▁BOY
+ - ▁HOLE
+ - ▁DIRTY
+ - ▁YET
+ - ▁EX
+ - ▁STARTED
+ - ▁LIGHT
+ - ▁THESE
+ - ▁CU
+ - B
+ - ▁THINGS
+ - ▁GRA
+ - ▁WHO
+ - ▁TWOS
+ - ▁CIRCLE
+ - ▁YO
+ - ▁FINGER
+ - ▁BA
+ - CE
+ - OTH
+ - X
+ - IR
+ - MOST
+ - ▁LEARN
+ - FIVE
+ - CI
+ - ▁ANSWER
+ - ▁EASIER
+ - ▁LAUGHS
+ - ▁MORNING
+ - ▁MOUTHWASH
+ - ▁PICTURE
+ - ▁RINSE
+ - ▁FORGET
+ - ▁SISTER
+ - ▁THOUGH
+ - ▁TALKING
+ - ▁GROW
+ - ▁WHERE
+ - ▁MINUTES
+ - ▁SUP
+ - ▁WISH
+ - ▁OUR
+ - ▁STI
+ - ▁FLOSSING
+ - SIC
+ - EPT
+ - ▁BIG
+ - PER
+ - ▁AH
+ - TH
+ - TEN
+ - EN
+ - ▁FAI
+ - ▁ONES
+ - ▁EQUAL
+ - ▁SP
+ - KAY
+ - SIDE
+ - WAYS
+ - ▁AROUND
+ - ▁PRETTY
+ - ▁RAINBOW
+ - ▁VIOLET
+ - ▁LEFT
+ - ▁GIRL
+ - ▁SENSE
+ - ▁SOUND
+ - ▁EYES
+ - ▁EVERYTHING
+ - ▁GUY
+ - ▁SHINY
+ - ▁ELEVEN
+ - ▁READY
+ - ▁STICK
+ - ▁FROG
+ - ▁FOOD
+ - ▁KEY
+ - DE
+ - ▁PL
+ - ▁PART
+ - OVE
+ - ▁PR
+ - ▁ROT
+ - ▁TEE
+ - ▁WERE
+ - VER
+ - ▁DIS
+ - ▁HEY
+ - USH
+ - OH
+ - IN
+ - ISH
+ - OVER
+ - EEN
+ - ▁MIND
+ - ▁AB
+ - SE
+ - SH
+ - DENTAL
+ - OOL
+ - ET
+ - AR
+ - ICK
+ - NA
+ - ENT
+ - ▁BU
+ - AT
+ - UNTI
+ - OW
+ - OK
+ - ▁EL
+ - ▁MA
+ - ▁QU
+ - ▁WOR
+ - ▁SIN
+ - AKE
+ - AND
+ - ▁PRETEND
+ - ▁BUS
+ - ▁PLA
+ - ▁CALL
+ - ▁ONETWOTHREEFOUR
+ - ▁CLASS
+ - ▁CONNECT
+ - ▁DISCOVER
+ - ▁HOUSE
+ - ▁RABBIT
+ - ▁SQUEEZE
+ - ▁THOUSAND
+ - ▁ROBOT
+ - ▁SCRUB
+ - ▁SMELL
+ - EXT
+ - ▁BROTHER
+ - ▁PILE
+ - ▁BOTTLE
+ - ▁PAINTBRUSH
+ - IMA
+ - ▁CROCODILE
+ - ▁JUMP
+ - ▁CANNOT
+ - ▁TWICE
+ - ▁STOP
+ - UNCH
+ - ▁SKIN
+ - ▁TUR
+ - ▁MOVING
+ - IES
+ - ▁FAST
+ - ▁PRETENDING
+ - EEP
+ - ▁SHAKING
+ - ▁MAY
+ - ▁FAKE
+ - ▁AWAY
+ - ▁DI
+ - ▁HAPP
+ - ▁DUH
+ - OO
+ - ▁JUH
+ - LE
+ - ▁HUH
+ - ▁BUH
+ - BOOK
+ - WENT
+ - ▁CA
+ - OSE
+ - EM
+ - IC
+ - AG
+ - ▁LETTERS
+ - IS
+ - EW
+ - ONG
+ - V
+ - AL
+ - PAY
+ - REE
+ - EE
+ - ▁TIMES
+ - ▁SPIN
+ - UR
+ - CU
+ - GER
+ - ▁TR
+ - ▁AW
+ - UGH
+ - UT
+ - ▁BL
+ - ▁SL
+ - ▁FORT
+ - ▁GE
+ - EA
+ - ▁TA
+ - GU
+ - ▁FINISH
+ - ▁UN
+ - READ
+ - THER
+ - DAY
+ - ▁BLA
+ - ▁ARTIST
+ - ▁BACKWARDS
+ - ▁DOCTOR
+ - ▁DREAMS
+ - ▁EXPLA
+ - ▁MIDDLE
+ - ▁MOUSE
+ - ▁PROB
+ - ▁RINSING
+ - ▁STRAIGHT
+ - ▁SUNFLOWER
+ - ▁TOOTHPICK
+ - ▁TWELVE
+ - ▁VULTURE
+ - ▁CONFUS
+ - TION
+ - ▁HOME
+ - ▁OPEN
+ - ▁SORRY
+ - ▁BORING
+ - ▁MINE
+ - ▁ENOUGH
+ - ▁HELLO
+ - ▁BORED
+ - RITE
+ - ▁TOWER
+ - ▁BUIL
+ - ▁ODD
+ - ▁UNP
+ - ▁APPLY
+ - ▁ANYMORE
+ - ▁FOUW
+ - APE
+ - OUNT
+ - ▁FIFT
+ - ▁ZEBRA
+ - ▁LION
+ - ▁BLAH
+ - ▁BLOCK
+ - ▁COP
+ - ▁HMM
+ - ▁ASK
+ - ▁BAB
+ - ▁DARKER
+ - ▁HEAR
+ - ▁CHO
+ - ▁CLOSE
+ - ▁JACK
+ - ▁FULL
+ - ▁CUP
+ - ▁WHE
+ - ▁IDEA
+ - ▁PIRATES
+ - ▁SPE
+ - ▁HEAD
+ - ▁GIVE
+ - ▁END
+ - DER
+ - ▁HAND
+ - ▁BOXES
+ - ▁BEST
+ - ▁LEARNING
+ - ▁MESS
+ - ▁MOST
+ - ▁FLA
+ - LIT
+ - ▁AC
+ - ▁AHW
+ - ▁FUH
+ - ▁LU
+ - ▁SSS
+ - OWN
+ - ▁PUH
+ - ▁PW
+ - POS
+ - ▁CIRCLES
+ - ENS
+ - LK
+ - ▁PLAYING
+ - AIL
+ - AP
+ - PIT
+ - NG
+ - ▁LETTU
+ - IK
+ - DDING
+ - HH
+ - PPER
+ - ▁GW
+ - ABL
+ - OL
+ - ▁KID
+ - DING
+ - ▁KA
+ - ERS
+ - ▁FI
+ - LIP
+ - ▁SE
+ - ▁TREES
+ - UN
+ - ▁RO
+ - ATE
+ - ND
+ - ▁FO
+ - ICE
+ - IF
+ - HW
+ - AY
+ - ▁BIGGE
+ - UST
+ - ▁DE
+ - ▁KI
+ - ▁LOS
+ - ▁THA
+ - ▁PAN
+ - IL
+ - MB
+ - ▁BOO
+ - SPE
+ - 'NO'
+ - ACK
+ - ▁FIN
+ - C
+ - ▁GROUP
+ - ▁GERM
+ - EAD
+ - ▁SOMETIME
+ - LZ
+ - IVE
+ - UP
+ - TWO
+ - HIRT
+ - HRO
+ - JELLYFISH
+ - ▁PAR
+ - PART
+ - IBO
+ - WHAT
+ - KEY
+ - FOUR
+ - AME
+ - ANGE
+ - EC
+ - TIME
+ - ▁REAL
+ - ELEPHANT
+ - ▁BATHROOM
+ - ▁BIVY
+ - ▁BRACES
+ - ▁FLOWER
+ - ▁GARFIELD
+ - ▁GARGLE
+ - ▁KOALA
+ - ▁PROBLEMS
+ - ▁SEVENEIGHTNINE
+ - ▁STINKY
+ - ▁SWORD
+ - ▁UPPERCASE
+ - EMBER
+ - FUL
+ - ▁SEPARAT
+ - ▁BEFORE
+ - ▁BROKE
+ - ▁LOUD
+ - ▁MONSTER
+ - ▁MOUF
+ - ▁POOP
+ - ▁SHINNY
+ - ▁DRAW
+ - ▁MAILBOX
+ - ▁HUNGRY
+ - ▁BREAK
+ - ▁SARA
+ - ▁JOB
+ - ▁WATCH
+ - ▁SPARKL
+ - ▁SHORT
+ - ▁WEEK
+ - ▁BIRD
+ - ▁MOMMY
+ - ▁LOOSE
+ - ▁GREAT
+ - ▁PRETTIER
+ - ▁SMIL
+ - ▁FACE
+ - ▁HAV
+ - ▁PIECE
+ - ▁FUNNY
+ - ▁UNDER
+ - ▁SLOWER
+ - ACT
+ - ▁PLEA
+ - ▁VEHA
+ - ▁PEAR
+ - ▁FEEL
+ - ▁SPIDER
+ - ▁WORSE
+ - ▁SWI
+ - ▁AYE
+ - UNU
+ - ▁EVER
+ - ▁HOPE
+ - ▁SIGN
+ - AK
+ - UIZ
+ - ▁SOFT
+ - ▁POP
+ - ▁TEEH
+ - ▁DEH
+ - IBLE
+ - ▁SIDEWAYS
+ - ROT
+ - ▁ORDER
+ - ▁FINISHED
+ - ▁JELLYFISH
+ - ▁FELL
+ - KEU
+ - ▁IMPO
+ - HEAD
+ - UM
+ - ▁PRESS
+ - ▁SECONDS
+ - ▁LEA
+ - ▁MOLD
+ - LLUH
+ - ▁READ
+ - ▁ONETWO
+ - ▁LINE
+ - FE
+ - ▁FOH
+ - ▁HOT
+ - ▁FOU
+ - ▁MOH
+ - ▁DEN
+ - ▁WIN
+ - ▁NINETY
+ - IRTY
+ - ▁TWEE
+ - OUR
+ - IRED
+ - TLE
+ - ▁HEH
+ - ▁JU
+ - PASTE
+ - ▁FEVER
+ - ▁WR
+ - ▁PAI
+ - MINT
+ - TEEN
+ - ▁WASHING
+ - ▁BI
+ - ▁NAH
+ - DY
+ - ▁RA
+ - ▁DA
+ - AHW
+ - ▁YUH
+ - ULL
+ - ▁WL
+ - UHTY
+ - ▁SHO
+ - ▁CUH
+ - ASTE
+ - OOD
+ - ▁LAM
+ - ▁CI
+ - OLD
+ - UNN
+ - NUH
+ - OCK
+ - US
+ - ▁SM
+ - MPLE
+ - ▁HIT
+ - ▁THRO
+ - ▁DEU
+ - HOLE
+ - ▁THINKING
+ - UBB
+ - ▁FU
+ - ▁PI
+ - ▁SMO
+ - ▁VO
+ - AN
+ - UG
+ - ▁HM
+ - UE
+ - GLE
+ - ▁MOV
+ - LI
+ - ▁BLU
+ - PORT
+ - ▁WED
+ - ▁TRI
+ - ▁CHE
+ - CA
+ - ▁SC
+ - ▁STO
+ - ▁BED
+ - ▁TELLS
+ - ▁MI
+ - OR
+ - TTER
+ - NES
+ - OUGH
+ - ▁AR
+ - ROW
+ - UA
+ - AB
+ - IG
+ - OF
+ - MAN
+ - RK
+ - OUN
+ - ROUGH
+ - LUH
+ - DENT
+ - ▁PIE
+ - LAP
+ - KUH
+ - OT
+ - RSE
+ - ▁LA
+ - ▁PAST
+ - ▁ANOTH
+ - OP
+ - EP
+ - ▁LATE
+ - AM
+ - LU
+ - ▁WOO
+ - HUH
+ - ▁CER
+ - OU
+ - IPP
+ - ▁CO
+ - EH
+ - TE
+ - WHERE
+ - ASH
+ - PPY
+ - WAY
+ - RO
+ - SHE
+ - OST
+ - AIN
+ - ▁SECOND
+ - ▁PIRATE
+ - ▁MINUTE
+ - ABET
+ - ▁DIFFEREN
+ - BE
+ - IGH
+ - ▁COO
+ - ▁WHA
+ - BRUSH
+ - TRA
+ - ▁PRES
+ - ▁TRYIN
+ - ▁GIV
+ - OPE
+ - SHIN
+ - STRO
+ - SIGN
+ - ▁PLU
+ - ZEBRA
+ - LION
+ - CROCODILE
+ - LATE
+ - UF
+ - EQUALS
+ - COME
+ - UBE
+ - J
+ - TOGETHER
+ - MAYBE
+ - BOX
+ - CLEAN
+ - THEY
+ - JIBO
+ - EASY
+ - MOUTH
+ - ▁TALK
+ - ▁SKI
+ - WHY
+ - WICE
+ - CAUSE
+ - UMP
+ - TRIC
+ - ▁CLOS
+ - ▁SEVE
+ - ▁DIRT
+ - ▁NUMB
+ - YOU
+ - PRI
+ - ▁JIB
+ - ETTIER
+ - FFERENT
+ - ERCASE
+ - ROOM
+ - ▁DIFF
+ - ▁JELLY
+ - ▁SEVENEIGHT
+ - ORGE
+ - ▁YELL
+ - DRA
+ - ▁SLOW
+ - ▁MON
+ - ▁BUG
+ - YPE
+ - ▁BRU
+ - COL
+ - PUS
+ - WO
+ - INET
+ - NGRY
+ - BRUSHING
+ - ▁CUB
+ - OCTO
+ - HIC
+ - UDE
+ - RUB
+ - MOR
+ - LOCK
+ - ▁BR
+ - YOUR
+ - ▁STR
+ - ▁KNE
+ - ▁CRO
+ - ▁BO
+ - UALLY
+ - ▁TOOTHB
+ - ▁ANYMO
+ - UKU
+ - ▁GUE
+ - MA
+ - ENTY
+ - PHA
+ - ▁QUE
+ - PF
+ - KE
+ - NOW
+ - ▁LAS
+ - ▁SHIN
+ - ARN
+ - GE
+ - ▁MAIL
+ - RUSHING
+ - Q
+ - <sos/eos>
+ init: null
+ input_size: null
+ ctc_conf:
+     dropout_rate: 0.0
+     ctc_type: builtin
+     reduce: true
+     ignore_nan_grad: null
+     zero_infinity: true
+     brctc_risk_strategy: exp
+     brctc_group_strategy: end
+     brctc_risk_factor: 0.0
+ joint_net_conf: null
+ use_preprocessor: true
+ use_lang_prompt: false
+ use_nlp_prompt: false
+ token_type: bpe
+ bpemodel: data/en_token_list/bpe_unigram1024/bpe.model
+ non_linguistic_symbols: null
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ short_noise_thres: 0.5
+ aux_ctc_tasks: []
+ frontend: s3prl
+ frontend_conf:
+     frontend_conf:
+         upstream: wavlm_large
+     download_dir: ./hub
+     multilayer_feature: true
+     fs: 16k
+ specaug: specaug
+ specaug_conf:
+     apply_time_warp: true
+     time_warp_window: 5
+     time_warp_mode: bicubic
+     apply_freq_mask: true
+     freq_mask_width_range:
+     - 0
+     - 27
+     num_freq_mask: 2
+     apply_time_mask: true
+     time_mask_width_ratio_range:
+     - 0.0
+     - 0.05
+     num_time_mask: 5
+ normalize: utterance_mvn
+ normalize_conf: {}
+ model: espnet
+ model_conf:
+     ctc_weight: 0.3
+     lsm_weight: 0.1
+     length_normalized_loss: false
+     extract_feats_in_collect_stats: false
+ preencoder: linear
+ preencoder_conf:
+     input_size: 1024
+     output_size: 80
+ encoder: transformer
+ encoder_conf:
+     output_size: 256
+     attention_heads: 4
+     linear_units: 1024
+     num_blocks: 18
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     attention_dropout_rate: 0.1
+     input_layer: conv2d2
+     normalize_before: true
+ postencoder: null
+ postencoder_conf: {}
+ decoder: transformer
+ decoder_conf:
+     attention_heads: 4
+     linear_units: 2048
+     num_blocks: 6
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     self_attention_dropout_rate: 0.1
+     src_attention_dropout_rate: 0.1
+ preprocessor: default
+ preprocessor_conf: {}
+ required:
+ - output_dir
+ - token_list
+ version: '202402'
+ distributed: false
+ ```
+
+ </details>
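
The config pairs `optim_conf.lr: 0.002` with `scheduler: warmuplr` and `warmup_steps: 15000`. As a hedged illustration of what that combination implies, the sketch below implements the Noam-style warmup rule that this scheduler name usually refers to: the rate rises linearly for `warmup_steps` updates, peaks at the base rate, then decays with the inverse square root of the step. The function name `warmup_lr` is hypothetical, and the exact behaviour should be confirmed against the installed ESPnet version.

```python
def warmup_lr(step, base_lr=0.002, warmup_steps=15000):
    """Learning rate at optimizer update `step` (1-based) under Noam-style warmup."""
    if step < 1:
        raise ValueError("step must be >= 1")
    return base_lr * warmup_steps**0.5 * min(step**-0.5, step * warmup_steps**-1.5)

for step in (1, 7500, 15000, 60000):
    print(step, f"{warmup_lr(step):.6g}")
```

Under this rule the rate reaches 0.002 at update 15000 and has halved by update 60000. With `accum_grad: 4`, one update would correspond to four mini-batches, assuming the scheduler is stepped once per optimizer update.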
+
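
The `model_conf` section sets `ctc_weight: 0.3`, which in hybrid CTC/attention training interpolates the two branch losses. The following minimal sketch shows that interpolation on plain numbers. `hybrid_loss` and the example loss values are hypothetical, and label smoothing (`lsm_weight: 0.1`) is assumed to have been applied already when the attention loss was computed.

```python
def hybrid_loss(loss_ctc, loss_att, ctc_weight=0.3):
    """Weighted sum of the CTC and attention-decoder losses."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_att

# Hypothetical per-batch losses, for illustration only.
total = hybrid_loss(loss_ctc=2.0, loss_att=1.0)
print(total)
```

With a weight of 0.3 the attention decoder contributes 70% of the training signal, while the CTC branch mainly encourages monotonic alignments in the encoder.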
+
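
Entries in `token_list` are SentencePiece-style subword pieces (`token_type: bpe`), where a leading `▁` marks the start of a word. The sketch below shows how a sequence of such pieces maps back to text. `detokenize` is a hypothetical helper written for illustration, not the tokenizer the recipe uses, and the example pieces are taken from the list in the config.

```python
WORD_START = "\u2581"  # the "▁" marker that prefixes word-initial pieces

def detokenize(pieces):
    """Join subword pieces and turn word-start markers back into spaces."""
    return "".join(pieces).replace(WORD_START, " ").strip()

pieces = ["▁I", "▁USE", "▁TOOTH", "PASTE"]
text = detokenize(pieces)
print(text)
```

Pieces without the marker, such as `PASTE` or `ING`, attach to the preceding piece, which is how a 1024-entry vocabulary can still cover words it does not list whole.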
+
+ ### Citing ESPnet
+
+ ```bibtex
+ @inproceedings{watanabe2018espnet,
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   title={{ESPnet}: End-to-End Speech Processing Toolkit},
+   year={2018},
+   booktitle={Proceedings of Interspeech},
+   pages={2207--2211},
+   doi={10.21437/Interspeech.2018-1456},
+   url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ }
+ ```
+
+ or arXiv:
+
+ ```bibtex
+ @misc{watanabe2018espnet,
+   title={ESPnet: End-to-End Speech Processing Toolkit},
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   year={2018},
+   eprint={1804.00015},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```