# General Purpose Audio Effect Removal
Removing multiple audio effects from multiple sources using compositional audio effect removal and source separation and speech enhancement models.
This repo contains the code for the paper [General Purpose Audio Effect Removal](https://arxiv.org/abs/2110.00484). (Todo: Link broken, Add video, Add img, citation, license)
# Setup
```
git clone https://github.com/mhrice/RemFx.git
cd RemFx
git submodule update --init --recursive
pip install -e . ./umx
pip install --no-deps hearbaseline
```
Because hearbaseline's dependencies (namely numpy/numba) are incompatible with our other packages, hearbaseline must be installed with no dependencies.
# Usage
This repo can be used for many different tasks. Here are some examples.
## Run RemFX Detect on a single file
First, download the checkpoints from [Zenodo](https://zenodo.org/record/8179396):
```
scripts/download_checkpoints.sh
```
Then run the detect script. This repo contains an example file `example.wav` from our test dataset which contains 2 effects (chorus and delay) applied to a guitar.
```
scripts/remfx_detect.sh example.wav -o dry.wav
```
## Download the [General Purpose Audio Effect Removal evaluation datasets](https://zenodo.org/record/8187288)
```
scripts/download_eval_datasets.sh
```
## Download the starter datasets
```
python scripts/download.py vocalset guitarset dsd100 idmt-smt-drums
```
By default, the starter datasets are downloaded to `./data/remfx-data`. To change this, pass `--output_dir={path/to/datasets}` to `download.py`.
Then set the dataset root:
```
export DATASET_ROOT={path/to/datasets}
```
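If you drive the scripts from Python, a small check like the one below can catch a missing or misspelled dataset root before a long run starts. This is a minimal sketch: `resolve_dataset_root` is a hypothetical helper and is not part of this repo.

```python
import os
from pathlib import Path


def resolve_dataset_root(env=None):
    """Return the dataset root from DATASET_ROOT, or raise if it is unusable.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    value = env.get("DATASET_ROOT")
    if not value:
        raise RuntimeError("DATASET_ROOT is not set; export it first.")
    root = Path(value).expanduser()
    if not root.is_dir():
        raise RuntimeError(f"DATASET_ROOT points to a missing directory: {root}")
    return root
```

Calling `resolve_dataset_root()` with no arguments reads the real environment; passing a dictionary makes the check easy to exercise in isolation.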
## Training
Before training, it is important that you have downloaded the starter datasets (see above) and set `$DATASET_ROOT`.
This project uses the [pytorch-lightning](https://www.pytorchlightning.ai/index.html) framework and [hydra](https://hydra.cc/) for configuration management. All experiments are defined in `cfg/exp/`. To train with an existing experiment, run:
```
python scripts/train.py +exp={experiment_name}
```
Here are some selected experiment types from the paper, which use different datasets and configurations. See `cfg/exp/` for a full list of experiments and parameters.
| Experiment Type | Config Name | Example |
| ----------------------- | ------------ | ----------------- |
| Effect-specific | {effect} | +exp=chorus |
| Effect-specific + FXAug | {effect}_aug | +exp=chorus_aug |
| Monolithic (1 FX) | 5-1 | +exp=5-1 |
| Monolithic (<=5 FX) | 5-5_full | +exp=5-5_full |
| Classifier | 5-5_full_cls | +exp=5-5_full_cls |
To change the configuration, simply edit the experiment file, or override the configuration on the command line. A description of some of these variables is in the Misc. section below.
You can also create a custom experiment by creating a new experiment file in `cfg/exp/` and overriding the default parameters in `config.yaml`.
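Command-line overrides are `key=value` pairs appended after `+exp=...`. The sketch below only assembles such a command as an argument list; `build_train_command` is a hypothetical helper, and the batch size of 16 is an arbitrary illustrative value, not a recommended setting.

```python
import shlex


def build_train_command(experiment, overrides=None):
    """Assemble the argument list for scripts/train.py with Hydra-style overrides.

    Hypothetical helper: it builds the command but does not run it.
    """
    args = ["python", "scripts/train.py", f"+exp={experiment}"]
    for key, value in (overrides or {}).items():
        args.append(f"{key}={value}")
    return args


# Illustrative values only: skip rendering and change the batch size.
cmd = build_train_command(
    "chorus_aug",
    {"render_files": False, "datamodule.train_batch_size": 16},
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` if you want to launch training from a script.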
At the end of training, the train script will automatically evaluate the test set using the best checkpoint (by validation loss). If epoch 0 is not finished, it will throw an error. To evaluate a specific checkpoint, run
```
python scripts/test.py +exp={experiment_name} +ckpt_path="{path/to/checkpoint}" render_files=False
```
The checkpoints will be saved in `./logs/ckpts/{timestamp}`.
Metrics and hyperparameters will be logged in `./lightning_logs/{timestamp}`.
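To choose a checkpoint to pass to `test.py`, you can rank the saved files by the validation loss embedded in their names. This sketch assumes filenames follow the `epoch=05-valid_loss=8.623.ckpt` pattern that appears in the evaluation example in this README; `best_checkpoint` is a hypothetical helper, and files that do not match the pattern are ignored.

```python
import re
from pathlib import Path

# Assumed filename pattern, e.g. "epoch=05-valid_loss=8.623.ckpt".
CKPT_PATTERN = re.compile(r"epoch=(\d+)-valid_loss=(\d+(?:\.\d+)?)\.ckpt$")


def best_checkpoint(paths):
    """Return the path whose filename encodes the lowest validation loss."""
    scored = []
    for path in paths:
        match = CKPT_PATTERN.search(Path(path).name)
        if match:
            scored.append((float(match.group(2)), str(path)))
    if not scored:
        raise ValueError("no checkpoint filenames matched the expected pattern")
    # min() compares the loss first, so the lowest-loss path wins.
    return min(scored)[1]
```

For a real run, something like `best_checkpoint(Path("logs/ckpts").rglob("*.ckpt"))` would scan every timestamped directory.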
By default, the dataset needed for the experiment is generated before training.
If you have generated the dataset separately (see Generate other datasets), be sure to set `render_files=False` in the config or on the command line, and set `render_root={path/to/dataset}` if it is in a custom location.
Also note that the training assumes you have a GPU. To train on CPU, set `accelerator=null` in the config or command-line.
### Logging
The default logger is CSV.
To use the WANDB logger, set the following environment variables:
- `export WANDB_PROJECT={desired_wandb_project}`
- `export WANDB_ENTITY={your_wandb_username}`
## PANNs pretrained
```
wget https://zenodo.org/record/6332525/files/hear2021-panns_hear.pth
```
## Evaluate models on the General Purpose Audio Effect Removal evaluation datasets (Table 4 from the paper)
First download the General Purpose Audio Effect Removal evaluation datasets (see above).
To use the pretrained RemFX model, download the checkpoints
```
scripts/download_checkpoints.sh
```
Then run the evaluation script. Select the RemFX configuration from `remfx_oracle`, `remfx_detect`, and `remfx_all`, then select N, the number of effects to remove (passed as `N-N`).
```
scripts/eval.sh remfx_detect 0-0
scripts/eval.sh remfx_detect 1-1
scripts/eval.sh remfx_detect 2-2
scripts/eval.sh remfx_detect 3-3
scripts/eval.sh remfx_detect 4-4
scripts/eval.sh remfx_detect 5-5
```
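The six invocations above differ only in N, so they can be generated in a loop. The following is a minimal sketch that builds the argument lists without running them; `eval_commands` is a hypothetical helper, and the upper bound of 5 simply mirrors the commands listed above.

```python
# The three RemFX configurations named in this README.
REMFX_CONFIGURATIONS = ("remfx_oracle", "remfx_detect", "remfx_all")


def eval_commands(configuration, max_effects=5):
    """Build one scripts/eval.sh argument list per N in 0..max_effects."""
    if configuration not in REMFX_CONFIGURATIONS:
        raise ValueError(f"unknown RemFX configuration: {configuration}")
    return [
        ["scripts/eval.sh", configuration, f"{n}-{n}"]
        for n in range(max_effects + 1)
    ]


for command in eval_commands("remfx_detect"):
    print(" ".join(command))
```

Each list can be handed to `subprocess.run(command, check=True)` to run the evaluations one after another.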
To evaluate a custom monolithic model, first train a model (see Training).
Then run the evaluation script with the config used and the checkpoint path.
```
scripts/eval.sh distortion_aug 0-0 -ckpt "logs/ckpts/2023-07-26-10-10-27/epoch\=05-valid_loss\=8.623.ckpt"
```
To eval a custom effect-specific model as part of the inference chain, first train a model (see Training), then edit `cfg/exp/remfx_{desired_configuration}.yaml -> ckpts -> {effect}`.
Then run the evaluation script.
```
scripts/eval.sh remfx_detect 0-0
```
The script assumes that RemFX_eval_datasets is in the top-level directory.
Metrics and hyperparams will be logged in `./lightning_logs/{timestamp}`
## Generate other datasets
The datasets used in the experiments are custom-generated from the starter datasets. In short, for each training/val/testing example, we select a random 5.5s segment from one of the starter datasets and apply a random number of effects to it. The number of effects applied is controlled by the `num_kept_effects` and `num_removed_effects` parameters. The effects applied are controlled by the `effects_to_keep` and `effects_to_remove` parameters.
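The sampling described above can be sketched as follows. This is an illustration of the idea only, not the repo's generation code: `sample_example` is a hypothetical function, and the choice to draw effects without replacement, the 48 kHz sample rate, and the `[0, 4]` kept range in the usage lines are all assumptions.

```python
import random


def sample_example(num_samples, sample_rate, effects_to_keep, effects_to_remove,
                   num_kept_effects, num_removed_effects,
                   segment_seconds=5.5, rng=None):
    """Pick a random segment and random kept/removed effects for one example.

    num_kept_effects and num_removed_effects are inclusive [min, max] ranges.
    """
    rng = rng or random.Random()
    segment_len = int(segment_seconds * sample_rate)
    if num_samples < segment_len:
        raise ValueError("source file is shorter than the requested segment")
    # Random start so that the whole segment fits inside the file.
    start = rng.randint(0, num_samples - segment_len)
    # randint is inclusive on both ends, matching the inclusive ranges.
    n_kept = rng.randint(*num_kept_effects)
    n_removed = rng.randint(*num_removed_effects)
    return {
        "start": start,
        "end": start + segment_len,
        "kept": rng.sample(effects_to_keep, n_kept),
        "removed": rng.sample(effects_to_remove, n_removed),
    }


example = sample_example(
    num_samples=48000 * 30,          # a 30 s file at an assumed 48 kHz
    sample_rate=48000,
    effects_to_keep=["distortion", "reverb", "compressor", "delay"],
    effects_to_remove=["chorus"],
    num_kept_effects=[0, 4],         # assumed lower bound of 0
    num_removed_effects=[1, 1],
    rng=random.Random(0),
)
print(example)
```

Passing a seeded `random.Random` keeps the draw reproducible, which is convenient when comparing generated splits.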
Before generating datasets, it is important that you have downloaded the starter datasets (see above) and set `$DATASET_ROOT`.
To generate one of the datasets used in the paper, use one of the experiments defined in `cfg/exp/`.
For example, to generate the `chorus` FXAug dataset, which includes files with 5 possible effects, up to 4 kept effects (distortion, reverb, compression, delay), and 1 removed effect (chorus), run:
```
python scripts/generate_dataset.py +exp=chorus_aug
```
See the Misc. section below for a description of the parameters.
By default, files are rendered to `{render_root} / processed / {string_of_effects} / {train|val|test}`.
If training, this process will be done automatically at the start of training. To disable this, set `render_files=False` in the config or command-line, and set `render_root={path/to/dataset}` if it is in a custom location.
# Misc.
## Experimental parameters
Descriptions of some relevant dataset and training parameters:
- `num_kept_effects={[min, max]}` Range of **kept** effects to apply to each file. Inclusive.
- `num_removed_effects={[min, max]}` Range of **removed** effects to apply to each file. Inclusive.
- `model={model}` architecture to use (see 'Effect Removal Models/Effect Classification Models')
- `effects_to_keep={[effect]}` Effects to apply but not remove (see 'Effects'). Used for FXAug.
- `effects_to_remove={[effect]}` Effects to remove (see 'Effects')
- `accelerator=null/'gpu'` Use GPU (1 device) (default: null)
- `render_files=True/False` Render files. Disable to skip rendering stage (default: True)
- `render_root={path/to/dir}` Root directory to render files to (default: ./data)
- `datamodule.train_batch_size={batch_size}` Change batch size (default: varies)
### Effect Removal Models
- `umx`
- `demucs`
- `tcn`
- `dcunet`
- `dptnet`
### Effect Classification Models
- `cls_vggish`
- `cls_panns_pt`
- `cls_wav2vec2`
- `cls_wav2clip`
### Effects
- `chorus`
- `compressor`
- `distortion`
- `reverb`
- `delay`