Commit fa298e8 (parent: 083ef90) by csteinmetz1 · README formatting
<div align="center">

# RemFx

General Purpose Audio Effect Removal

[arXiv](https://arxiv.org/abs/1234.56789) · [Open in Colab](https://colab.research.google.com/drive/1LoLgL1YHzIQfILEayDmRUZzDZzJpD6rD) · [Evaluation dataset](https://zenodo.org/record/8187288) · [License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)

Listening examples can be found [here](https://csteinmetz1.github.io/RemFX/).

<img width="450px" src="remfx-headline.jpg">

## Abstract

</div>

Although the design and application of audio effects is well understood, the inverse problem of removing these effects is significantly more challenging and far less studied. Recently, deep learning has been applied to audio effect removal; however, existing approaches have focused on narrow formulations considering only one effect or source type at a time. In realistic scenarios, multiple effects are applied with varying source content. This motivates a more general task, which we refer to as general purpose audio effect removal. We developed a dataset for this task using five audio effects across four different sources and used it to train and evaluate a set of existing architectures. We found that no single model performed optimally on all effect types and sources. To address this, we introduced <b>RemFX</b>, an approach designed to mirror the compositionality of applied effects. We first trained a set of the best-performing effect-specific removal models and then leveraged an audio effect classification model to dynamically construct a graph of our models at inference. We found our approach to outperform single-model baselines, although examples with many effects present remain challenging.
```bibtex
@inproceedings{rice2023remfx,
    title={General Purpose Audio Effect Removal},
    author={Rice, Matthew and Steinmetz, Christian J. and Fazekas, George and Reiss, Joshua D.},
    booktitle={IEEE Workshop on Applications of Signal Processing to Audio and Acoustics},
    year={2023}
}
```
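The compositional approach from the abstract — detect which effects are present, then chain the matching effect-specific removal models — can be sketched in a few lines. This is a hypothetical illustration of the idea only, not the repo's actual API; the detector and removal functions below are toy stand-ins for the trained models.

```python
# Hypothetical sketch of RemFX-style compositional effect removal:
# a classifier decides which effects are present, and the matching
# effect-specific removal models are chained dynamically at inference.
# All functions here are toy stand-ins, not the repo's models.

def detect_effects(audio):
    # A real system would run an audio effect classification model here.
    # This stand-in pretends every input has distortion, then reverb.
    return ["distortion", "reverb"]

def remove_distortion(audio):
    return [x * 0.5 for x in audio]  # placeholder for a trained model

def remove_reverb(audio):
    return [x * 0.9 for x in audio]  # placeholder for a trained model

REMOVAL_MODELS = {
    "distortion": remove_distortion,
    "reverb": remove_reverb,
}

def remfx_remove(audio):
    # Build the processing chain from the detected effects and apply it.
    for effect in detect_effects(audio):
        audio = REMOVAL_MODELS[effect](audio)
    return audio
```

The key point is the dynamic chain: the set and order of removal models is decided per input, rather than a single fixed model handling every effect combination.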
## Setup

```
git clone https://github.com/mhrice/RemFx.git
cd RemFx
```

<b>Please run the setup code before running any scripts.</b>
All scripts should be launched from the top level after installing.
## Usage

This repo can be used for many different tasks. Here are some examples. Ensure you have run the setup code before running any scripts.

### Run RemFX Detect on a single file

Here we will attempt to detect, then remove effects that are present in an audio file. For the best results, use a file from our [evaluation dataset](https://zenodo.org/record/8187288). We support detection and removal of the following effects: chorus, delay, distortion, dynamic range compression, and reverb.

First, we need to download the PyTorch checkpoints from [Zenodo](https://zenodo.org/record/8218621). Then run the detect script; the repo contains an example file `example.wav`:

```
scripts/remfx_detect.sh example.wav -o dry.wav
```
### Download the [General Purpose Audio Effect Removal evaluation datasets](https://zenodo.org/record/8187288)

We provide a script to download and unzip the datasets used in Table 4 of the paper.

```
scripts/download_eval_datasets.sh
```

### Download the starter datasets

If you'd like to train your own model and/or generate a dataset, you can download the starter datasets using the following command:
Note: if training, this process will be done automatically at the start of training. To disable this, set `render_files=False` in the config or command-line, and set `render_root={path/to/dataset}` if it is in a custom location.
## Experimental parameters

Some relevant dataset/training parameter descriptions:
- `num_kept_effects={[min, max]}` range of <b>kept</b> effects to apply to each file. Inclusive.
- `num_removed_effects={[min, max]}` range of <b>removed</b> effects to apply to each file. Inclusive.
- `datamodule.train_batch_size={batch_size}`. Change batch size (default: varies).
- `logger=wandb`. Use Weights & Biases logger (default: csv). Ensure you set the wandb environment variables (see training section).
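To illustrate the `num_kept_effects` / `num_removed_effects` semantics, here is a rough sketch of how an inclusive `[min, max]` range might be sampled per file during dataset generation. The function name, effect list, and sampling logic are invented for illustration and do not mirror the repo's implementation.

```python
# Hypothetical illustration of inclusive [min, max] effect ranges:
# per file, sample how many effects are kept and how many are removed,
# then draw distinct effects for each group. Not the repo's actual code.
import random

EFFECTS = ["chorus", "delay", "distortion", "compression", "reverb"]

def sample_effect_split(num_kept, num_removed, seed=None):
    rng = random.Random(seed)
    # randint is inclusive on both ends, matching the "Inclusive" note above.
    n_kept = rng.randint(*num_kept)
    n_removed = rng.randint(*num_removed)
    chosen = rng.sample(EFFECTS, n_kept + n_removed)
    return chosen[:n_kept], chosen[n_kept:]
```

For example, `sample_effect_split((0, 2), (1, 3))` yields between 0 and 2 kept effects and between 1 and 3 removed effects, with no effect appearing in both groups.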
### Effect Removal Models

- `umx`
- `demucs`
- `tcn`
- `dcunet`
- `dptnet`

### Effect Classification Models

- `cls_vggish`
- `cls_panns_pt`
- `cls_wav2vec2`
- `cls_wav2clip`

### Effects

- `delay`
- `distortion`
- `chorus`