AEROMamba: Efficient Audio Super-Resolution

AI-Generated README - Original: GitHub | Demo


Model Overview

Architecture: Hybrid GAN + Mamba SSM
Task: 11.025 kHz → 44.1 kHz audio upsampling
Key Improvements:

  • 14x faster inference vs AERO
  • 5x less GPU memory usage
  • 66.47 subjective score (vs AERO's 60.03)

Checkpoint: MUSDB18-HQ Model


Quick Start

# Installation
pip install torch==1.12.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip install causal-conv1d==1.1.2 mamba-ssm==1.1.3

# Inference
from src.models.aeromamba import AEROMamba
import torchaudio

model = AEROMamba.load_from_checkpoint("checkpoint.th")
lr_audio, sr = torchaudio.load("low_res.wav")  # expects 11.025 kHz input; sr holds the file's sample rate
hr_audio = model(lr_audio)  # 44.1 kHz output (4x upsampling)

Performance (MUSDB18)

Metric         Low-Res   AERO    AEROMamba
ViSQOL ↑       1.82      2.90    2.93
LSD ↓          3.98      1.34    1.23
Subjective ↑   38.22     60.03   66.47
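
The LSD row reports log-spectral distance, where lower is better. The sketch below shows one common formulation: the per-frame RMS difference of log10 power spectra, averaged over frames. The STFT settings (`n_fft`, `hop`, Hann window) and the function names are assumptions for illustration; the settings behind the numbers in the table are not stated here.

```python
import numpy as np

def stft_power(x, n_fft=2048, hop=512):
    """Power spectrogram |STFT|^2 with a Hann window, shape (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def log_spectral_distance(reference, estimate, eps=1e-10):
    """Mean over frames of the RMS log10-power difference across frequency bins."""
    ref = np.log10(stft_power(reference) + eps)
    est = np.log10(stft_power(estimate) + eps)
    per_frame = np.sqrt(np.mean((ref - est) ** 2, axis=1))
    return float(np.mean(per_frame))

# Toy signals: a full-band reference and a copy missing its high-frequency tone.
sr = 44100
t = np.arange(sr) / sr
full_band = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 12000 * t)
band_limited = np.sin(2 * np.pi * 440 * t)

lsd_same = log_spectral_distance(full_band, full_band)
lsd_diff = log_spectral_distance(full_band, band_limited)
print(lsd_same, lsd_diff)
```

An identical pair scores zero, and the score grows as the estimate loses high-frequency content, which is the kind of error a super-resolution model is meant to reduce.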

Hardware: about 14x faster than AERO on an RTX 3090 (inference time 0.087 s vs AERO's 1.246 s)
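
As a quick sanity check, the speedup can be recomputed from the two RTX 3090 timings quoted above. The variable names are illustrative only.

```python
# Inference times on an RTX 3090, as quoted in this README.
aero_seconds = 1.246
aeromamba_seconds = 0.087

speedup = aero_seconds / aeromamba_seconds
print(f"speedup: {speedup:.1f}x")
```

The ratio comes out slightly above 14, consistent with the rounded figure.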


Training Data

MUSDB18-HQ:

  • 150 full-track music recordings
  • 44.1 kHz originals → 11.025 kHz downsampled pairs
  • 87.5/12.5 train-val split
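
To illustrate how a low-resolution/high-resolution pair can be derived from a 44.1 kHz original, here is a minimal NumPy sketch: low-pass filter below the new Nyquist frequency, then keep every fourth sample. The windowed-sinc filter, the tap count, and the function names are assumptions for illustration; the resampler used to build the actual training pairs is not stated here, and a library resampler would normally be preferred.

```python
import numpy as np

def lowpass_fir(cutoff_hz, sr, num_taps=129):
    """Windowed-sinc low-pass filter taps, normalised to unit DC gain."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    fc = cutoff_hz / sr  # cutoff in cycles per sample
    taps = 2 * fc * np.sinc(2 * fc * n) * np.hamming(num_taps)
    return taps / taps.sum()

def make_lowres(hr, sr_hr=44100, factor=4):
    """Band-limit hr to the new Nyquist frequency, then decimate by `factor`."""
    sr_lr = sr_hr / factor
    taps = lowpass_fir(cutoff_hz=sr_lr / 2, sr=sr_hr)
    filtered = np.convolve(hr, taps, mode="same")  # anti-aliasing filter
    return filtered[::factor], sr_lr

# One second of a 440 Hz tone standing in for a music excerpt.
sr_hr = 44100
t = np.arange(sr_hr) / sr_hr
hr = np.sin(2 * np.pi * 440 * t)

lr, sr_lr = make_lowres(hr, sr_hr=sr_hr, factor=4)
print(len(hr), len(lr), sr_lr)
```

The pair `(lr, hr)` then serves as model input and training target respectively.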

Citation

@inproceedings{Abreu2024lamir,
  author    = {Wallace Abreu and Luiz Wagner Pereira Biscainho},
  title     = {AEROMamba: Efficient Audio SR with GANs and SSMs},
  booktitle = {Proc. Latin American Music IR Workshop},
  year      = {2024}
}

This README was AI-generated based on original project materials. For training code and OLA inference scripts, visit the GitHub repo.
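
For readers unfamiliar with OLA (overlap-add) inference, the idea can be sketched as follows: split a long input into overlapping chunks, upsample each chunk independently, and cross-fade the outputs where they overlap. This is a generic illustration, not the repository's script; `ola_upsample`, `upsample_fn`, and the chunk and overlap sizes are hypothetical, and `upsample_fn` stands in for a call to the model.

```python
import numpy as np

def ola_upsample(lr, upsample_fn, factor=4, chunk=4096, overlap=1024):
    """Upsample `lr` chunk by chunk and blend the overlapping outputs."""
    hop = chunk - overlap
    out = np.zeros(len(lr) * factor)
    weight = np.zeros(len(lr) * factor)
    for start in range(0, len(lr), hop):
        piece = lr[start : start + chunk]
        hr_piece = upsample_fn(piece)  # expected length: len(piece) * factor
        # Hann window with its zero endpoints dropped, so every weight is > 0.
        win = np.hanning(len(hr_piece) + 2)[1:-1]
        lo = start * factor
        out[lo : lo + len(hr_piece)] += hr_piece * win
        weight[lo : lo + len(hr_piece)] += win
    # Normalising by the summed window weights makes the cross-fade gain-neutral.
    return out / weight

# Stand-in "model": sample repetition, so the expected result is easy to check.
rng = np.random.default_rng(0)
lr = rng.standard_normal(10000)
hr = ola_upsample(lr, lambda x: np.repeat(x, 4))
print(hr.shape)
```

Chunked processing keeps peak memory bounded by the chunk size rather than the track length, and the cross-fade hides discontinuities at chunk boundaries.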
