openai/whisper-large

Automatic Speech Recognition

hf-asr-leaderboard


ArthurZ (HF Staff) committed on Oct 7, 2022

Commit 7e3649f · 1 Parent(s): e00fe1d

Update README.md

Files changed (1)

README.md +4 -5

@@ -210,15 +210,14 @@ The "<|en|>" token is used to specify that the speech is in english and should b
 >>> # load dummy dataset and read soundfiles
 >>> ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
->>> # tokenize
 >>> input_features = processor(ds[0]["audio"]["array"], return_tensors="pt").input_features
->>> # retrieve logits
->>> logits = model(input_features).logits
 >>> # take argmax and decode
 >>> predicted_ids = torch.argmax(logits, dim=-1)
 >>> transcription = processor.batch_decode(predicted_ids)
-['<|startoftranscript|><|en|><|notimestamps|> Mr']
 ```
 ### French to French

 >>> # load dummy dataset and read soundfiles
 >>> ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
 >>> input_features = processor(ds[0]["audio"]["array"], return_tensors="pt").input_features
+>>> # Generate logits
+>>> logits = model(input_features, decoder_input_ids=torch.tensor([[50258]])).logits
 >>> # take argmax and decode
 >>> predicted_ids = torch.argmax(logits, dim=-1)
 >>> transcription = processor.batch_decode(predicted_ids)
+['<|en|>']
 ```
 ### French to French
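
Note on the updated English example: with a single decoder input token (`50258`), the model emits one row of logits, so the argmax-and-decode step yields exactly one token, which is why the expected output shrinks to `['<|en|>']`. The sketch below illustrates that one-step greedy decode in plain Python. It does not call the real model: `toy_logits`, `TOY_VOCAB`, and the score values are hypothetical stand-ins, and the token IDs other than `50258` are assumed to match the multilingual Whisper vocabulary.

```python
# Toy illustration only: real logits come from the Whisper model, not this table.
# Token IDs are assumptions about the multilingual Whisper vocabulary.
SOT, EN, NO_TS = 50258, 50259, 50363

# Hypothetical three-entry vocabulary standing in for the real tokenizer.
TOY_VOCAB = {
    SOT: "<|startoftranscript|>",
    EN: "<|en|>",
    NO_TS: "<|notimestamps|>",
}


def toy_logits(decoder_input_ids):
    """Return one made-up score row per decoder input position."""
    next_token = {SOT: EN, EN: NO_TS}
    rows = []
    for tok in decoder_input_ids:
        target = next_token.get(tok, SOT)
        rows.append({t: (1.0 if t == target else 0.0) for t in TOY_VOCAB})
    return rows


def greedy_decode(decoder_input_ids):
    """Take the argmax of every row and join the decoded tokens."""
    rows = toy_logits(decoder_input_ids)
    predicted_ids = [max(row, key=row.get) for row in rows]
    return "".join(TOY_VOCAB[i] for i in predicted_ids)


# One decoder input token in, one predicted token out.
transcription = [greedy_decode([SOT])]
print(transcription)
```

Feeding more decoder tokens produces one prediction per position, so a single forward pass like this only previews the next token; producing a full transcription requires an autoregressive loop.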