
FSM-Net

An Efficient Frequency-Spatial Network for Real-World Deblurring

CVPR 2026 - NTIRE Workshop

Vinh-Thuan Ly


  • University of Science, VNU-HCM, Ho Chi Minh City, Vietnam
    Vietnam National University, Ho Chi Minh City, Vietnam

Abstract

Real-world image deblurring demands both high-fidelity restoration and computational efficiency, a balance existing methods often struggle to achieve. In this paper, we propose FSM-Net (Frequency-Spatial Multi-branch Network), a highly efficient solution that secured 2nd place in the NTIRE 2026 Challenge on Efficient Real-World Deblurring.

FSM-Net takes a dual-domain approach: a novel Frequency Attention module explicitly recovers high-frequency structural details via FFT, while a Cross-Gated Vision E-Branchformer at the bottleneck captures global dependencies with linear complexity. To ensure robust convergence, we employ a progressive curriculum training strategy guided by a composite loss function that combines Multi-Scale Charbonnier, Structural Edge, and Frequency Consistency losses.

Evaluated on the RSBlur benchmark, FSM-Net achieves 33.144 dB PSNR with only 4.94M parameters and 159.35 GMACs (at 1920x1200 resolution). By pushing the Pareto frontier of efficiency and quality, FSM-Net establishes a strong baseline for resource-constrained image restoration.

Performance vs. Parameters

Pareto Frontier on the RSBlur Benchmark.

Real-world image deblurring is a fundamental yet challenging task in computer vision, serving as a crucial pre-processing step for downstream applications. While recent advancements have been dominated by Vision Transformers (ViTs) and large-scale CNNs, their massive parameter counts make them highly impractical for edge deployment.

The NTIRE 2026 Challenge on Efficient Real-World Deblurring highlights a critical paradigm shift: modern networks must deliver high-fidelity restoration while strictly minimizing computational complexity. As illustrated by the Pareto frontier above, FSM-Net establishes a new optimal trade-off between restoration quality and computational complexity compared to existing architectures.

Method

FSM-Net Overall Architecture.

Frequency-Spatial Block

Severe motion blur acts as a low-pass filter that degrades high-frequency structural details. We introduce a novel FSMBlock equipped with Frequency Attention. By applying the Fast Fourier Transform (FFT), our model explicitly suppresses noise and enhances sharp edges directly in the frequency domain.
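As a rough illustration of reweighting features in the frequency domain, the NumPy sketch below transforms a 2D map with a real FFT, rescales each frequency bin by a gain, and transforms back. The names `frequency_gate` and `radial_gain`, and the fixed hand-made gain, are assumptions made for this example only; in FSM-Net the Frequency Attention weights are learned, and the module's actual design is not reproduced here.

```python
import numpy as np

def frequency_gate(x, gain):
    """Rescale the spectrum of x (H, W) by a per-frequency gain.

    gain must match the shape of np.fft.rfft2(x), i.e. (H, W // 2 + 1).
    """
    spec = np.fft.rfft2(x)                 # real-input 2D FFT
    spec = spec * gain                     # per-frequency reweighting
    return np.fft.irfft2(spec, s=x.shape)  # back to the spatial domain

def radial_gain(h, w, boost=1.5):
    """Hypothetical fixed gain: 1 at DC, rising linearly to `boost` at the highest frequency."""
    fy = np.fft.fftfreq(h)[:, None]        # vertical frequencies (cycles/pixel)
    fx = np.fft.rfftfreq(w)[None, :]       # horizontal frequencies (non-negative half)
    radius = np.sqrt(fx ** 2 + fy ** 2)
    return 1.0 + (boost - 1.0) * radius / radius.max()

rng = np.random.default_rng(0)
img = rng.random((8, 8))
identity = frequency_gate(img, np.ones((8, 5)))     # unit gain leaves the input unchanged
sharpened = frequency_gate(img, radial_gain(8, 8))  # amplifies high frequencies
```

Because the gain is 1 at the DC bin, the mean intensity is preserved while the non-DC (detail) components are amplified. A learned module would predict the gain from the features instead of fixing it by hand.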

Cross-Gated Bottleneck

At the network's bottleneck, we integrate a Cross-Gated Vision E-Branchformer Block. This module processes information through two parallel branches: a local CNN for precise spatial detail recovery and a 4-head Transposed Attention branch for capturing global context with linear complexity.
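The linear-complexity claim for transposed attention can be seen in a small sketch. The version below assumes a Restormer-style formulation, in which attention is computed between channels rather than between pixels; the source does not spell out the exact formulation, so this is an illustration rather than the FSM-Net block. For brevity the feature map is used directly as query, key, and value, whereas a real layer would apply learned projections first.

```python
import numpy as np

def transposed_attention(q, k, v, num_heads=4):
    """Channel-wise ("transposed") self-attention.

    q, k, v: arrays of shape (C, N), with N = H * W flattened pixels.
    Each head builds a (C/heads, C/heads) attention map, so the cost
    grows linearly with N instead of quadratically.
    """
    c, n = q.shape
    d = c // num_heads
    q = q.reshape(num_heads, d, n)
    k = k.reshape(num_heads, d, n)
    v = v.reshape(num_heads, d, n)
    # L2-normalise along the pixel axis so logits are cosine similarities
    q = q / (np.linalg.norm(q, axis=-1, keepdims=True) + 1e-8)
    k = k / (np.linalg.norm(k, axis=-1, keepdims=True) + 1e-8)
    logits = q @ k.transpose(0, 2, 1)             # (heads, d, d)
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=-1, keepdims=True)      # softmax over channels
    out = attn @ v                                # (heads, d, n)
    return out.reshape(c, n), attn

rng = np.random.default_rng(0)
feat = rng.standard_normal((16, 32 * 32))  # 16 channels, 32x32 feature map
out, attn = transposed_attention(feat, feat, feat)
```

Here the attention map has shape (4, 4, 4) regardless of image size; only the two matrix products scale with the number of pixels.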

Composite Loss & EMA

To ensure robust convergence, we employ a progressive curriculum training strategy guided by a composite loss function comprising Multi-Scale Charbonnier Loss, Structural Edge Loss, and Frequency Consistency Loss. An Exponential Moving Average (EMA) is also integrated to ensure training stability and improve generalization.
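A minimal NumPy sketch of how the three loss terms and the EMA update could fit together is shown below. The loss weights, the finite-difference edge operator, the average-pooling pyramid, and the EMA decay are all placeholder assumptions for illustration; the values and operators used to train FSM-Net are not stated on this page.

```python
import numpy as np

def charbonnier(pred, target, eps=1e-3):
    """Charbonnier loss: a smooth, robust variant of L1."""
    return np.mean(np.sqrt((pred - target) ** 2 + eps ** 2))

def downsample2(x):
    """2x2 average pooling for (H, W) arrays with even H and W."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def multi_scale_charbonnier(pred, target, scales=3):
    """Average the Charbonnier loss over a small image pyramid."""
    total = 0.0
    for _ in range(scales):
        total += charbonnier(pred, target)
        pred, target = downsample2(pred), downsample2(target)
    return total / scales

def edge_loss(pred, target):
    """Charbonnier distance between finite-difference gradients (stand-in edge operator)."""
    gx = charbonnier(np.diff(pred, axis=1), np.diff(target, axis=1))
    gy = charbonnier(np.diff(pred, axis=0), np.diff(target, axis=0))
    return gx + gy

def frequency_loss(pred, target):
    """Mean absolute difference between the 2D spectra."""
    return np.mean(np.abs(np.fft.rfft2(pred) - np.fft.rfft2(target)))

def composite_loss(pred, target, w_edge=0.05, w_freq=0.01):
    # Weights are placeholders, not values from the paper.
    return (multi_scale_charbonnier(pred, target)
            + w_edge * edge_loss(pred, target)
            + w_freq * frequency_loss(pred, target))

def ema_update(ema, params, decay=0.999):
    """EMA of parameter arrays: ema <- decay * ema + (1 - decay) * params."""
    for name in ema:
        ema[name] = decay * ema[name] + (1.0 - decay) * params[name]
    return ema

rng = np.random.default_rng(0)
target = rng.random((16, 16))
pred = target + 0.1 * rng.standard_normal((16, 16))
loss_value = composite_loss(pred, target)
perfect = composite_loss(target, target)  # small but non-zero because of eps
ema = ema_update({"w": np.zeros(3)}, {"w": np.ones(3)}, decay=0.9)
```

In a training loop, `ema_update` would be called after each optimizer step, and the averaged weights would be used for evaluation.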

Visual Results

Qualitative evaluation on the NTIRE 2026 test set. FSM-Net effectively recovers sharp structural details and suppresses complex motion artifacts. Our Frequency Attention module accurately reconstructs dense high-frequency details, while explicitly eliminating ghosting artifacts.

Visual results of FSM-Net showing before and after deblurring

Comprehensive Benchmark

Table 2. Comprehensive Benchmark of State-of-the-Art Methods Across Multiple Datasets. For cross-dataset evaluation (RealBlur-R, RealBlur-J, and GoPro), FSM-Net was fine-tuned for only 5 epochs from the RSBlur pre-trained weights. "-" indicates that the metric or result is not explicitly reported or available for that specific configuration. FSM-Net achieves an exceptional balance between high-fidelity restoration and computational efficiency across diverse real-world and synthetic degradation profiles.

Dataset performance is reported as PSNR ↑ / SSIM ↑.

| Method | Param (M) ↓ | RSBlur | RealBlur-R | RealBlur-J | GoPro |
|---|---|---|---|---|---|
| SRN-Deblur | 3.76 | 32.53 / 0.8398 | 38.65 / 0.9652 | 31.38 / 0.9091 | 28.36 / 0.9150 |
| NAFNet-C16-L28 | 4.35 | 32.42 / 0.8400 | - | - | - |
| DeblurGAN-v2 | 6.09 | - | 36.44 / 0.9347 | 29.69 / 0.8703 | - |
| MiMO-UNet | 6.10 | 32.73 / 0.8457 | - | - | - |
| MiMO-UNet+ | 16.1 | 33.37 / 0.8560 | - | - | - |
| MPRNet | 20.1 | 33.61 / 0.8614 | - | - | 30.96 / 0.9390 |
| Nah et al. | 22.6 | - | 32.51 / 0.8406 | 27.87 / 0.8274 | 29.08 / 0.9135 |
| Restormer | 26.1 | 33.69 / 0.8628 | - | - | 31.22 / 0.9420 |
| Uformer-B | 50.9 | 33.98 / 0.8660 | - | - | 30.83 / 0.9520 |
| FSM-Net (Ours) | 4.94 | 33.16 / 0.8533 | 36.95 / 0.9585 | 29.45 / 0.8840 | 30.60 / 0.9068 |

Ablation Study

We conduct an accelerated 20-epoch ablation study at high resolution (1920x1200) to evaluate the individual contributions of our proposed modules under strict computational constraints. Deploying both the Frequency Attention (FAttn) and Vision E-Branchformer (Ebranch) yields the best balance of quality and efficiency.

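| Model | FAttn | Ebranch | Param (M) | MACs (G) | PSNR ↑ |
|---|---|---|---|---|---|
| Baseline | | | 4.35 | 146.33 | 31.39 |
| + FAttn | ✓ | | 4.66 | 155.66 | 31.57 |
| + Ebranch | | ✓ | 4.93 | 159.35 | 31.60 |
| FSM-Net | ✓ | ✓ | 4.94 | 159.35 | 31.83 |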
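To make the cost of each gain explicit, the short script below computes the relative overhead and PSNR improvement of every ablation row against the baseline. The numbers are copied from the ablation table; the helper name `delta_vs_baseline` is just an illustrative choice.

```python
# Rows copied from the ablation table: (params in M, MACs in G, PSNR in dB)
ablation = {
    "Baseline": (4.35, 146.33, 31.39),
    "+ FAttn": (4.66, 155.66, 31.57),
    "+ Ebranch": (4.93, 159.35, 31.60),
    "FSM-Net": (4.94, 159.35, 31.83),
}

def delta_vs_baseline(name):
    """Return (extra params %, extra MACs %, PSNR gain in dB) relative to the baseline."""
    p0, m0, q0 = ablation["Baseline"]
    p, m, q = ablation[name]
    return (100 * (p - p0) / p0, 100 * (m - m0) / m0, q - q0)

for name in ablation:
    dp, dm, dq = delta_vs_baseline(name)
    print(f"{name:10s} params +{dp:5.1f}%  MACs +{dm:5.1f}%  PSNR +{dq:.2f} dB")
```

For the full model this works out to roughly 13.6% more parameters and 8.9% more MACs for a 0.44 dB PSNR gain over the baseline.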

Challenge Leaderboard

Evaluated on the NTIRE 2026 Efficient Real-World Deblurring Challenge (Public test phase). FSM-Net secured 2nd place overall with a PSNR of 33.144 dB and an SSIM of 0.8516, operating with a sub-second latency of just 0.276s.

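| Rank | Team Name | PSNR ↑ | SSIM ↑ | Time (s) |
|---|---|---|---|---|
| 1 | jingjing et al. | 33.390 | 0.8585 | N/A* |
| 2 | RobinLy (Ours) | 33.144 | 0.8516 | 0.276 |
| 3 | weichow | 32.883 | 0.8467 | N/A* |
| 4 | licheng (Xiaomi) | 32.805 | 0.8473 | N/A* |
| 5 | zzhlttcyy | 32.657 | 0.8450 | N/A |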

BibTeX

@InProceedings{Thuan_2026_CVPR,
    author    = {Thuan, Ly},
    title     = {FSM-Net: An Efficient Frequency-Spatial Network for Real-World Deblurring},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2026},
    pages     = {2572-2581}
}