# MFIR - Multi-Frame Image Restoration
A PyTorch model for multi-frame image restoration through temporal fusion and feature-level alignment. MFIR aligns and fuses features from multiple degraded frames to produce a high-quality restored image.
## Model Description
MFIR takes 2-16 degraded frames of the same scene and combines them into a single high-quality output. Unlike single-image restoration methods, which struggle with heavily degraded inputs, MFIR leverages complementary information across multiple frames: each frame captures slightly different details, and the model learns to extract and merge the best parts of each.
## Architecture
```
Input Frames (B, N, 3, H, W)
           │
           ▼
┌──────────────────────┐
│    Shared Encoder    │  ResNet-style feature extraction
└──────────────────────┘
           │
           ▼
┌──────────────────────┐
│      Deformable      │  Align frames using learned offsets
│      Alignment       │  (3-layer cascade)
└──────────────────────┘
           │
           ▼
┌──────────────────────┐
│  Temporal Attention  │  Multi-head attention fusion
│        Fusion        │  (4 heads)
└──────────────────────┘
           │
           ▼
┌──────────────────────┐
│       Decoder        │  PixelShuffle upsampling
└──────────────────────┘
           │
           ▼
   Output (B, 3, H, W)
```
### Key Components
| Component | Description |
|---|---|
| Shared Encoder | Multi-scale feature extraction with residual blocks. 4x spatial downsampling. |
| Deformable Alignment | Cascaded deformable convolutions (3 layers) to align frames to reference. More robust than optical flow for degraded inputs. |
| Temporal Attention Fusion | Multi-head attention (4 heads) where reference frame is query, all frames are key/value. Learns per-pixel frame contributions. |
| Decoder | Progressive upsampling with PixelShuffle (2 stages, 4x total). |
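
To make the two middle stages concrete, here is a minimal PyTorch sketch of one deformable-alignment stage and of the attention fusion. It is a sketch under assumptions, not the repository's code: the class names `AlignBlock` and `AttentionFusion` are made up here, and the deformable convolution is borrowed from `torchvision.ops`.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class AlignBlock(nn.Module):
    """One deformable-alignment stage (hypothetical sketch): predict
    sampling offsets from the (frame, reference) feature pair, then
    warp the frame's features toward the reference. MFIR cascades
    three such stages."""
    def __init__(self, channels, deform_groups=8, k=3):
        super().__init__()
        # 2 offsets (x, y) per kernel tap, per deformable group
        self.offset_conv = nn.Conv2d(2 * channels, 2 * deform_groups * k * k,
                                     kernel_size=3, padding=1)
        self.deform = DeformConv2d(channels, channels, kernel_size=k,
                                   padding=k // 2)

    def forward(self, feat, ref):
        offset = self.offset_conv(torch.cat([feat, ref], dim=1))
        return self.deform(feat, offset)

class AttentionFusion(nn.Module):
    """Per-pixel temporal fusion: the reference frame's features act
    as the query, all frames act as keys/values, so the attention
    weights decide how much each frame contributes at each location."""
    def __init__(self, channels, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, feats, ref_idx=0):
        # feats: (B, N, C, H, W) aligned frame features
        B, N, C, H, W = feats.shape
        # Treat every pixel as its own length-N sequence of frame features
        tokens = feats.permute(0, 3, 4, 1, 2).reshape(B * H * W, N, C)
        query = tokens[:, ref_idx:ref_idx + 1]   # reference frame only
        fused, _ = self.attn(query, tokens, tokens)
        return fused.reshape(B, H, W, C).permute(0, 3, 1, 2)  # (B, C, H, W)
```

The key design choice is visible in `AttentionFusion.forward`: each pixel becomes its own length-N sequence, so attention weights are learned per pixel, which is the per-pixel frame contribution described in the table above.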
## Usage

### Installation

```bash
pip install torch torchvision huggingface_hub
```
### Inference

```python
import torch
from huggingface_hub import hf_hub_download

# Download checkpoint
checkpoint_path = hf_hub_download(
    repo_id="marduk-ra/MFIR",
    filename="temporal_fusion_model.pth",
)

# Load checkpoint onto whichever device is available
device = "cuda" if torch.cuda.is_available() else "cpu"
ckpt = torch.load(checkpoint_path, map_location=device, weights_only=False)

# Model architecture code available at:
# https://github.com/marduk-ra/MFIR
from model import FeatureFusionModel, FeatureFusionConfig

config = FeatureFusionConfig.from_dict(ckpt["config"])
model = FeatureFusionModel(config)
model.load_state_dict(ckpt["state_dict"])
model.to(device)
model.eval()

# Inference
# frames: (batch, num_frames, 3, height, width) tensor in [0, 1]
with torch.no_grad():
    result = model(frames.to(device), ref_idx=0)

output = result["output"]  # (batch, 3, height, width)
```
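
The snippet above assumes a ready-made `frames` tensor. If your burst is stored as image files, something along these lines builds the expected input (the file names and the 256-pixel size are placeholders; any of the training resolutions is a sensible choice):

```python
import torch
from torchvision.io import read_image
from torchvision.transforms.functional import resize

# Hypothetical file names; replace with your own burst
paths = [f"frame_{i}.png" for i in range(5)]

frames = torch.stack([
    resize(read_image(p).float() / 255.0, [256, 256], antialias=True)
    for p in paths
]).unsqueeze(0)  # (1, num_frames, 3, 256, 256), values in [0, 1]
```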
### Web Demo
Try the model directly in your browser:
## Model Details
| Parameter | Value |
|---|---|
| Input Channels | 3 (RGB) |
| Output Channels | 3 (RGB) |
| Max Frames | 16 |
| Min Frames | 2 |
| Encoder Channels | [64, 128, 256] |
| Deformable Groups | 8 |
| Deformable Layers | 3 |
| Attention Heads | 4 |
| Fusion Type | Attention |
| Parameters | ~10M |
| Checkpoint Size | 42 MB |
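
To sanity-check the parameter count against this table, count them on the loaded model from the Usage section:

```python
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # should be roughly 10M
```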
## Example
Input Frames (5 degraded images):
Output (restored):
Five degraded input frames are fused into a single high-quality output.
The model works best when:
- Frames have slight variations (different noise patterns, blur, etc.)
- Frames are roughly aligned (same scene)
- Input resolution matches the training resolution (see the padding workaround below)
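
On the last point: since the encoder downsamples by 4x and the decoder upsamples by 4x, input sides that are multiples of 4 are the safe choice. That constraint is inferred from the architecture, not stated by the authors, so treat this workaround as a hedged sketch: reflect-pad before inference, crop afterwards.

```python
import torch
import torch.nn.functional as F

B, N, C, H, W = frames.shape
pad_h, pad_w = (-H) % 4, (-W) % 4

# F.pad with mode="reflect" pads the last two dims of a 4D tensor,
# so fold (batch, frame) together first, then restore the shape
padded = F.pad(frames.flatten(0, 1), (0, pad_w, 0, pad_h), mode="reflect")
padded = padded.view(B, N, C, H + pad_h, W + pad_w)

with torch.no_grad():
    output = model(padded, ref_idx=0)["output"][..., :H, :W]  # crop back
```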
## Training

The model was trained on a custom dataset with the following specifications:

**Dataset:**
- 16,000 high-resolution source images
- Each image was used to generate 8 degraded input frames
- Multi-scale training: 128, 256, 512, and 1024 pixel resolutions
**Degradation Pipeline** (a rough sketch in code follows this list):
- Random spatial shifts (simulating camera shake)
- Motion blur with varying kernel sizes and directions
- Gaussian and Poisson noise with random intensity
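
A loose sketch of such a pipeline, for intuition only: the kernel size, shift range, and noise levels below are illustrative placeholders, not the actual training values.

```python
import torch
import torch.nn.functional as F

def degrade(clean, num_frames=8, max_shift=4, noise_sigma=0.03):
    """Generate num_frames degraded copies of one clean image
    (3, H, W) in [0, 1]. Illustrative sketch, not the training code."""
    frames = []
    for _ in range(num_frames):
        img = clean.unsqueeze(0)  # (1, 3, H, W)

        # Random spatial shift (camera shake)
        dy, dx = torch.randint(-max_shift, max_shift + 1, (2,)).tolist()
        img = torch.roll(img, shifts=(dy, dx), dims=(2, 3))

        # Simple linear motion blur along a random axis
        k = 5
        kernel = torch.zeros(1, 1, k, k)
        if torch.rand(()) < 0.5:
            kernel[0, 0, k // 2, :] = 1.0 / k  # horizontal streak
        else:
            kernel[0, 0, :, k // 2] = 1.0 / k  # vertical streak
        img = F.conv2d(img, kernel.expand(3, 1, k, k), padding=k // 2, groups=3)

        # Gaussian noise with random intensity, then Poisson shot noise
        img = img + noise_sigma * torch.rand(()).item() * torch.randn_like(img)
        img = torch.poisson(img.clamp(0, 1) * 255.0) / 255.0

        frames.append(img.squeeze(0).clamp(0, 1))
    return torch.stack(frames)  # (num_frames, 3, H, W)
```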
**Training Configuration:**
- Total epochs: 150 (progressive training)
- Optimizer: AdamW
- Loss: L1 + Perceptual (VGG) + SSIM + Color Correction (combined as sketched below)
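
The exact loss weights and the form of the color-correction term are not published, so the following is only a plausible shape for the objective. It assumes the `pytorch_msssim` package for SSIM, and the weights `w` are invented for illustration:

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights
from pytorch_msssim import ssim  # pip install pytorch-msssim

# Frozen VGG16 features for the perceptual term
vgg = vgg16(weights=VGG16_Weights.DEFAULT).features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def color_loss(pred, target):
    # Assumed form: match per-channel color statistics
    return F.l1_loss(pred.mean(dim=(2, 3)), target.mean(dim=(2, 3)))

def total_loss(pred, target, w=(1.0, 0.1, 0.2, 0.1)):  # illustrative weights
    return (w[0] * F.l1_loss(pred, target)
            + w[1] * F.l1_loss(vgg(pred), vgg(target))
            + w[2] * (1.0 - ssim(pred, target, data_range=1.0))
            + w[3] * color_loss(pred, target))
```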
## Limitations
- Requires multiple frames of the same scene
- Performance depends on frame quality variation
- GPU recommended for real-time processing
## Citation

```bibtex
@software{karaarslan2026mfir,
  author = {Karaarslan, Veli},
  title  = {MFIR: Multi-Frame Image Restoration},
  year   = {2026},
  url    = {https://github.com/allcodernet/MFIR}
}
```
## License

MIT License - see LICENSE.

## Author

Veli Karaarslan - 2026