Spaces: cjayic/sovits-overwatch2

cjayic committed on Dec 26, 2022

Commit b131625

1 Parent(s): 6d4d33a

added some descriptions

Files changed (2)

README.md +7 -4
app.py +14 -4

README.md CHANGED

@@ -1,7 +1,7 @@
 ---
-title: Sovits Ow2
-emoji: 👀
-colorFrom: gray
+title: SOVITS | Overwatch 2
+emoji: 🗣️
+colorFrom: orange
 colorTo: gray
 sdk: gradio
 sdk_version: 3.15.0
@@ -10,4 +10,7 @@ pinned: false
 python_version: 3.7
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# SOVITS OW2 - Voice Conversion Model
+This is a [SOVITS model](https://github.com/Francis-Komizu/Sovits) trained on every Overwatch 2 hero up to Kiriko (exception Bastion, please forgive me). The model was trained for 195000 iterations.
+It's not too great to be honest, unlike Soft-VC it doesn't appear to adjust the voice pitch to the target speaker. I added a pitch shift option, but it's pretty slow and doesn't really improve things most of the time, use at your own risk.
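
Note on the pitch shift option: the slider in app.py is labelled in semitones ("+12 = up one octave"). The diff does not show how app.py applies the shift, so the following is only a minimal sketch of the arithmetic behind that label, assuming 12-tone equal temperament. The helper names `semitones_to_ratio` and `shift_frequency` are hypothetical and do not appear in the repository.

```python
def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio for a shift of `semitones` in 12-tone equal temperament."""
    # Each semitone multiplies the frequency by 2 ** (1 / 12),
    # so 12 semitones double it (one octave up).
    return 2.0 ** (semitones / 12.0)


def shift_frequency(freq_hz: float, semitones: float) -> float:
    """Frequency obtained by shifting `freq_hz` by `semitones` (hypothetical helper)."""
    return freq_hz * semitones_to_ratio(semitones)


if __name__ == "__main__":
    # The slider in app.py ranges from -12 to +12 in steps of 1.
    for n in (-12, 0, 7, 12):
        print(n, round(semitones_to_ratio(n), 4))
```

Under this assumption, the slider's range of -12 to +12 corresponds to anything from halving to doubling the input voice's fundamental frequency, which is why matching the input pitch to the target voice by ear may work as well as the slider.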

app.py CHANGED

@@ -28,7 +28,7 @@ _ = net_g.eval()
 _ = utils.load_checkpoint("logs/ow2/G_195000.pth", net_g, None)
-def infer(audio, speaker_id, pitch_shift, length_scale, noise_scale=.667, noise_scale_w=0.8):
+def infer(md, audio, speaker_id, pitch_shift, length_scale, noise_scale=.667, noise_scale_w=0.8):
     fname = audio
     source, sr = torchaudio.load(fname)
@@ -53,14 +53,24 @@ def infer(audio, speaker_id, pitch_shift, length_scale, noise_scale=.667, noise_
 demo = gradio.Interface(
     fn=infer,
     inputs=[
+        gradio.Markdown(
+        """
+        # SOVITS | Overwatch 2
+        Upload any voice recording and turn it into a mangled approximation of any* Overwatch 2 Hero!
+        SOVITS doesn't really appear to adjust the pitch to the target speaker, so it helps to have your input voice at a similar pitch to the target voice.
+        I added a pitch shift option to preprocess the input voice, but it's slow and sometimes outright broken, use at your own risk.
+        ( * up to Kiriko and without Bastion. Please forgive. )
+        """),
         gradio.Audio(label="Input Audio", type="filepath"),
         gradio.Dropdown(label="Target Voice", choices=["Ana", "Ashe", "Baptiste", "Brigitte", "Cassidy", "Doomfist", "D.Va", "Echo", "Genji", "Hanzo", "Junker Queen", "Junkrat", "Kiriko", "Lúcio", "Mei", "Mercy", "Moira", "Orisa", "Pharah", "Reaper", "Reinhardt", "Roadhog", "Sigma", "Sojourn", "Soldier_ 76", "Sombra", "Symmetra", "Torbjörn", "Tracer", "Widowmaker", "Winston", "Zarya", "Zenyatta"], type="index", value="Ana"),
-        gradio.Slider(label="Pitch Shift Input (+12 = up one octave)", minimum=-12.0, maximum=12.0, value=0, step=1),
-        gradio.Slider(label="Length Factor", minimum=0.1, maximum=2.0, value=1.0),
+        gradio.Slider(label="Pitch Shift Input (+12 = up one octave, ⚠️ broken AF ⚠️)", minimum=-12.0, maximum=12.0, value=0, step=1),
+        gradio.Slider(label="Length Factor (higher = slower speech)", minimum=0.1, maximum=2.0, value=1.0),
         gradio.Slider(label="Noise Scale (higher = more expressive and erratic)", minimum=0.0, maximum=2.0, value=.667),
         gradio.Slider(label="Noise Scale W (higher = more variation in cadence)", minimum=0.0, maximum=2.0, value=.8)
     ],
     outputs=[gradio.Audio(label="Audio as Target Voice")],
 )
 #demo.launch(share=True)
-demo.launch(server_name="0.0.0.0")
+demo.launch(server_name="0.0.0.0")
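
Note on the "Target Voice" dropdown: because it is declared with `type="index"`, Gradio passes the position of the selected choice to `infer` rather than its label, so `speaker_id` arrives as an integer. The sketch below reproduces that mapping in plain Python using the choices list from the diff. The helper `speaker_index` is hypothetical, and the idea that the checkpoint's speaker IDs follow this same order is an assumption, since the training configuration is not part of this commit.

```python
# Choices copied verbatim from the gradio.Dropdown in app.py.
HEROES = [
    "Ana", "Ashe", "Baptiste", "Brigitte", "Cassidy", "Doomfist", "D.Va",
    "Echo", "Genji", "Hanzo", "Junker Queen", "Junkrat", "Kiriko", "Lúcio",
    "Mei", "Mercy", "Moira", "Orisa", "Pharah", "Reaper", "Reinhardt",
    "Roadhog", "Sigma", "Sojourn", "Soldier_ 76", "Sombra", "Symmetra",
    "Torbjörn", "Tracer", "Widowmaker", "Winston", "Zarya", "Zenyatta",
]


def speaker_index(name: str) -> int:
    """Return the integer a type="index" dropdown would pass for `name`.

    Hypothetical helper: assumes the model's speaker IDs use the same
    ordering as the dropdown, which this commit does not confirm.
    """
    try:
        return HEROES.index(name)
    except ValueError:
        raise ValueError(f"unknown target voice: {name!r}") from None


if __name__ == "__main__":
    for hero in ("Ana", "Kiriko", "Zenyatta"):
        print(hero, speaker_index(hero))
```

One consequence of index-based selection is that reordering or inserting entries in the choices list would silently change which voice each label selects, so any future hero would presumably need to be appended at the end.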