We have developed the Channel-Partitioned Attention Transformer (CPAT), a novel deep learning framework that significantly enhances the performance of Single Image Super-Resolution (SISR). This work was presented at the 35th British Machine Vision Conference (BMVC 2024, Rank A) in Glasgow, UK.
In CPAT, we introduce two key components: the Channel-Partitioned Windowed Self-Attention (CPWin-SA) and the Spatial-Frequency Interaction Module (SFIM). CPWin-SA extends attention windows across both height and width dimensions, enabling the model to capture long-range dependencies while retaining computational efficiency. SFIM complements this by integrating spatial and frequency-domain information, which allows for effective recovery of fine-grained textures and sharp edges. Unlike prior dual-branch approaches such as FreqNet and SFMNet, SFIM selectively leverages frequency features with lower computational overhead, making it both efficient and accurate.
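To make these two components more concrete, the minimal PyTorch sketch below illustrates the underlying ideas under simplifying assumptions: the channels are split into two groups that attend within height-extended and width-extended windows respectively, and a toy spatial-frequency block fuses a convolutional branch with an FFT-based branch. The module names (WindowAttention, ChannelPartitionedAttention, SpatialFrequencyFusion), the even channel split, the strip width of 8, and the 1x1 convolution over the spectrum are illustrative choices of ours, not the official CPAT implementation.

```python
# Illustrative sketch only; window shapes, partition ratios, and the fusion
# design in the actual CPAT model differ (see the paper).
import torch
import torch.nn as nn


class WindowAttention(nn.Module):
    """Multi-head self-attention applied inside non-overlapping windows."""

    def __init__(self, dim, win_h=None, win_w=None, heads=4):
        super().__init__()
        self.win_h, self.win_w = win_h, win_w
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        B, C, H, W = x.shape
        wh, ww = self.win_h or H, self.win_w or W  # None means "span the full axis"
        # Partition the feature map into (wh x ww) windows: (B * num_windows, wh * ww, C)
        x = x.view(B, C, H // wh, wh, W // ww, ww)
        x = x.permute(0, 2, 4, 3, 5, 1).reshape(-1, wh * ww, C)
        out, _ = self.attn(x, x, x)
        # Reverse the window partition back to (B, C, H, W)
        out = out.view(B, H // wh, W // ww, wh, ww, C)
        return out.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)


class ChannelPartitionedAttention(nn.Module):
    """Split channels in half: one half attends within height-extended (tall)
    windows, the other within width-extended (wide) windows."""

    def __init__(self, dim, strip=8):
        super().__init__()
        self.tall = WindowAttention(dim // 2, win_h=None, win_w=strip)  # full height
        self.wide = WindowAttention(dim // 2, win_h=strip, win_w=None)  # full width

    def forward(self, x):
        xh, xw = x.chunk(2, dim=1)
        return torch.cat([self.tall(xh), self.wide(xw)], dim=1)


class SpatialFrequencyFusion(nn.Module):
    """Toy spatial-frequency interaction: a 3x3 spatial branch plus a 1x1
    convolution over the real/imaginary parts of the 2D FFT, then a fusion conv."""

    def __init__(self, dim):
        super().__init__()
        self.spatial = nn.Conv2d(dim, dim, 3, padding=1)
        self.freq = nn.Conv2d(2 * dim, 2 * dim, 1)
        self.fuse = nn.Conv2d(2 * dim, dim, 1)

    def forward(self, x):
        B, C, H, W = x.shape
        spec = torch.fft.rfft2(x, norm="ortho")          # complex: (B, C, H, W//2 + 1)
        spec = self.freq(torch.cat([spec.real, spec.imag], dim=1))
        real, imag = spec.chunk(2, dim=1)
        freq_feat = torch.fft.irfft2(torch.complex(real, imag), s=(H, W), norm="ortho")
        return self.fuse(torch.cat([self.spatial(x), freq_feat], dim=1))


if __name__ == "__main__":
    feat = torch.randn(1, 64, 32, 32)                    # (B, C, H, W) feature map
    fused = SpatialFrequencyFusion(64)(ChannelPartitionedAttention(64)(feat))
    print(fused.shape)                                   # torch.Size([1, 64, 32, 32])
```

The sketch only conveys the partition-then-attend pattern and the spatial-frequency fusion idea; the precise window geometry and interaction mechanism used by CPWin-SA and SFIM are specified in the paper.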
Conventional Swin Transformer-based methods often suffer from restricted receptive fields and limited ability to represent frequency-domain cues. CPAT addresses both challenges simultaneously by expanding receptive fields and incorporating frequency-domain learning, while still maintaining a lean architecture with 20.39M parameters and 329.04G FLOPs.
Benchmark evaluations further highlight the effectiveness of our approach. CPAT delivers a +0.31 dB PSNR improvement on Urban100 (×2 SR) over HAT and gains of more than 0.7 dB over SwinIR across multiple scales, all at a computational cost comparable to or lower than that of existing state-of-the-art methods.
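For reference, the sketch below shows how PSNR gains of this kind are commonly measured in SISR work: PSNR in dB on the luminance (Y) channel, after cropping a border equal to the scale factor. This reflects the usual evaluation convention rather than code released with CPAT; the helper names are ours.

```python
# Hypothetical helper illustrating the common Y-channel PSNR convention for SISR.
import numpy as np


def rgb_to_y(img):
    """ITU-R BT.601 luminance from an RGB image with values in [0, 255]."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return 16.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0


def psnr_y(sr, hr, scale):
    """PSNR (dB) between a super-resolved image and its ground truth, Y channel only,
    with a border of `scale` pixels cropped on each side."""
    sr_y = rgb_to_y(sr.astype(np.float64))[scale:-scale, scale:-scale]
    hr_y = rgb_to_y(hr.astype(np.float64))[scale:-scale, scale:-scale]
    mse = np.mean((sr_y - hr_y) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```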
These advancements demonstrate CPAT’s strong potential for a wide range of applications, including media, surveillance, defense, and medical imaging, where recovering fine details from low-resolution inputs is critical. By extending attention windows across the height and width dimensions and effectively leveraging frequency-domain features, we obtain a more comprehensive representation of image content. As a result, CPAT consistently outperforms existing state-of-the-art approaches in both quantitative benchmarks and qualitative visual evaluations.