Model Details

Distill-NeuCodec is a version of NeuCodec with a compatible, distilled encoder.

The distilled encoder is 10x smaller in parameter count and uses ~7.5x fewer MACs at inference time.

The distilled model replaces the original encoder with a smaller, compatible one: codes produced by the distilled encoder can be decoded by the original NeuCodec decoder.

Our work largely extends X-Codec2.0 and SQCodec.
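
To see where the savings come from, you can print per-submodule parameter counts. This is a minimal sketch that assumes only that the loaded model is a standard torch.nn.Module; submodule names vary, so it enumerates them rather than assuming an encoder attribute:

from neucodec import DistillNeuCodec

model = DistillNeuCodec.from_pretrained("neuphonic/distill-neucodec")
# Print the parameter count of each top-level submodule
for name, child in model.named_children():
    print(f"{name}: {sum(p.numel() for p in child.parameters()):,} params")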

Get Started

Use the code below to get started with the model.

To install from PyPI in a dedicated environment (Python 3.10 or above):

conda create -n neucodec python=3.10
conda activate neucodec
pip install neucodec
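
You can then verify the install with a quick import check:

python -c "from neucodec import DistillNeuCodec; print('ok')"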

Then, to use the model in Python:

import librosa
import torch
import torchaudio
from torchaudio import transforms as T
from neucodec import DistillNeuCodec
 
model = DistillNeuCodec.from_pretrained("neuphonic/distill-neucodec")
model.eval().cuda()  # drop .cuda() to run on CPU
 
y, sr = torchaudio.load(librosa.ex("libri1"))  # (C, T) at the file's native rate
if sr != 16_000:
    y = T.Resample(sr, 16_000)(y)
y = y[None, ...]  # (B, 1, T_16); add the batch dim whether or not we resampled

with torch.no_grad():
    fsq_codes = model.encode_code(y)
    # fsq_codes = model.encode_code(librosa.ex("libri1")) # or directly pass your filepath!
    print(f"Codes shape: {fsq_codes.shape}")  
    recon = model.decode_code(fsq_codes).cpu() # (B, 1, T_24)

torchaudio.save("reconstructed.wav", recon[0, :, :], 24_000)
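
As a rough sanity check on the compression, you can relate the number of codes to the reconstructed duration. This sketch assumes fsq_codes is shaped (B, 1, T_codes), matching the print above:

duration_s = recon.shape[-1] / 24_000  # reconstruction is 24 kHz
print(f"~{fsq_codes.shape[-1] / duration_s:.1f} codes per second of audio")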

Training Details

The model was trained on the same data as the full model, with an additional distillation loss (MSE between the distilled and original encoder outputs).
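
In code, the added term is straightforward. This is a minimal sketch of the objective described above; variable and function names are illustrative, not taken from the training code:

import torch
import torch.nn.functional as F

def distillation_loss(student_feats: torch.Tensor, teacher_feats: torch.Tensor) -> torch.Tensor:
    # MSE between the distilled (student) encoder's outputs and the frozen
    # original (teacher) encoder's outputs; the teacher is not updated.
    return F.mse_loss(student_feats, teacher_feats.detach())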
