clr committed · Commit aaec87b · 1 Parent(s): 3c81006

Update app.py

Files changed (1)
  1. app.py +17 -14
app.py CHANGED
@@ -64,20 +64,20 @@ def f1(langname,lang_aligner):
 bl = gr.Blocks()
 
 with bl:
-
+    gr.Markdown(
+        """
+        # Demo under construction
+        ## 1. Choose a language to load
+        ## 2. See a small sample of the selected corpus
+        ## 3. Click the button below to view time-aligned prosody information for a random example
+        """ )
     with gr.Row():
+        lloadr = gr.Dropdown(["Faroese", "Icelandic"], label="Language")#, info="Loading the dataset takes some time")
         gr.Markdown(
             """
-            # Demo under construction
-            ## 1. Choose a language to load
-            ## 2. See a small sample of the selected corpus
-            ## 3. Click the button below to view time-aligned prosody information for a random example (from the whole corpus, not necessarily the shown sample)
-
-            Pitch is shown in dark blue and loudness is the light orange line.
-            The pitch estimation, and the time-alignment of words to audio, are completely automated and there will be some inaccuracy.
-            More information below.
+            Pitch is shown in dark blue and loudness is the light orange line. The pitch estimation, and the time-alignment of words to audio, are completely automated and there will be some inaccuracy.
+            The random example can be from the whole corpus, not necessarily one of the visible rows. More information below.
             """ )
-        lloadr = gr.Dropdown(["Faroese", "Icelandic"], label="Language")#, info="Loading the dataset takes some time")
 
     align_func = gr.State()#value=ctcalign.aligner(model_path="carlosdanielhernandezmena/wav2vec2-large-xlsr-53-icelandic-ep10-1000h",model_word_separator = '|',model_blank_token = '[PAD]'))
 
@@ -106,16 +106,19 @@ with bl:
         # ABOUT
 
         The Icelandic corpus is [samromur-asr](https://huggingface.co/datasets/language-and-voice-lab/samromur_asr), and Faroese uses [ravnursson-asr](https://huggingface.co/datasets/carlosdanielhernandezmena/ravnursson_asr).
-
+
+        ### Forced alignment
+        The prosody graphs are marked with time-alignments for the words found by [CTC decoding](https://pytorch.org/audio/main/tutorials/forced_alignment_tutorial.html). This uses wav2vec-2.0 based models ([Faroese](https://huggingface.co/carlosdanielhernandezmena/wav2vec2-large-xlsr-53-faroese-100h), [Icelandic](https://huggingface.co/carlosdanielhernandezmena/wav2vec2-large-xlsr-53-icelandic-ep10-1000h)) and tends to be more robust than Montreal Forced Aligner.
+        However, this aligner does not contain any phoneme representation, and therefore, segment alignments are for orthographic characters rather than phonemes. Especially in languages with shallow orthography, these letter alignments probably indicate something about the timing of sounds in a word, but the exact durations should not be taken too seriously especially in cases like doubled or silent letters.
+
         ### Pitch tracking (F0 estimation)
         Estimated pitch is shown in blue on the graphs, as tracked by [REAPER](https://github.com/google/REAPER).
 
         ### Intensity
-        The orange line is root mean squared energy, which reflects loudness and is also a good indication of syllable placement, as it should line up with vowels and similar sounds.
-
-        [ABOUT CTC ALIGNMENT - TODO]
+        The orange line is root mean squared energy, which reflects loudness and is also a good indication of syllable placement, as it should correspond to vowels and similar sounds.
 
         This is a work-in-progress basic demo for automatic prosodic annotation in Faroese and Icelandic.
+        So far, you cannot select or upload your own choice of sentence for analysis, nor search the corpora. Also, it does not display well when the sentence is too long. In that case, or if there are serious errors in the automated analyses, try another random sentence.
         Contact caitlinr@ru.is / https://github.com/catiR/ when things break, or with ideas/suggestions about how to apply this.
         The source code is available under the Files tab at the top of the Space.
         """
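
The Intensity text in this diff describes the orange line as root mean squared energy. As a rough illustration of that quantity only, and not the code used in app.py, a frame-wise RMS computation might look like the sketch below. The function name `rms_energy` and the default frame and hop sizes (400 and 160 samples, i.e. 25 ms and 10 ms at an assumed 16 kHz sample rate) are hypothetical choices made for this example.

```python
import math


def rms_energy(samples, frame_length=400, hop_length=160):
    """Return one RMS value per frame of a mono signal.

    `samples` is a sequence of floats. The frame and hop sizes are
    illustrative assumptions, not values taken from app.py.
    """
    if frame_length <= 0 or hop_length <= 0:
        raise ValueError("frame_length and hop_length must be positive")

    energies = []
    # Slide a window over the signal. The max(..., 1) keeps a single
    # (shorter) frame when the signal is shorter than frame_length.
    last_start = max(len(samples) - frame_length + 1, 1)
    for start in range(0, last_start, hop_length):
        frame = samples[start:start + frame_length]
        if not frame:  # empty input: nothing to measure
            break
        mean_square = sum(x * x for x in frame) / len(frame)
        energies.append(math.sqrt(mean_square))
    return energies
```

Louder stretches such as vowels give larger values, which is why the curve tends to peak around syllable nuclei. A full-amplitude sine wave, for instance, has an RMS of about 0.707 over whole periods.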