DLGSANet: Lightweight Dynamic Local and Global Self-Attention Networks for Image Super-Resolution

Nanjing University of Science and Technology

Abstract

We propose an effective lightweight dynamic local and global self-attention network (DLGSANet) for image super-resolution. Our method exploits the properties of Transformers while keeping computational costs low. Motivated by the network designs of Transformers, we develop a simple yet effective multi-head dynamic local self-attention (MHDLSA) module to extract local features efficiently. In addition, we note that existing Transformers usually explore all similarities of the tokens between the queries and keys for feature aggregation. However, not all the tokens from the queries are relevant to those in the keys, so using all the similarities does not effectively facilitate high-resolution image reconstruction. To overcome this problem, we develop a sparse global self-attention (SparseGSA) module that selects the most useful similarity values so that the most useful global features can be better utilized for high-resolution image reconstruction. We further develop a hybrid dynamic-Transformer block (HDTB) that integrates the MHDLSA and SparseGSA modules for both local and global feature exploration. To ease network training, we formulate the HDTBs into a residual hybrid dynamic-Transformer group (RHDTG). By embedding the RHDTGs into an end-to-end trainable network, we show that the proposed method has fewer network parameters and lower computational costs while achieving accuracy competitive with state-of-the-art methods.

Params vs FLOPs vs PSNR

Image super-resolution comparisons (x4) in terms of accuracy, network parameters, and floating point operations (FLOPs) from the Urban100 dataset. The area of each circle denotes the number of network parameters. Our model (DLGSANet) achieves comparable performance while having fewer network parameters (<5M) and lower FLOPs.

Framework

The proposed lightweight dynamic local and global self-attention network (DLGSANet) mainly contains a shallow feature extraction module, six residual hybrid dynamic-Transformer groups (RHDTGs) for both local and global feature extraction, and a high-resolution image reconstruction module. The shallow feature extraction module uses a convolutional layer with a filter size of 3x3 pixels to extract features from the input low-resolution image. Each RHDTG takes the hybrid dynamic-Transformer block (HDTB) as its basic module, where each HDTB contains the multi-head dynamic local self-attention (MHDLSA) and the sparse global self-attention (SparseGSA). The high-resolution image reconstruction module contains a convolutional layer with a filter size of 3x3 pixels, followed by a PixelShuffle operation for upsampling. The figure shows an overview of the proposed DLGSANet for SISR.

Network architecture of the proposed DLGSANet. It mainly contains a shallow feature extraction module, six residual hybrid dynamic-Transformer groups (RHDTGs) for both local and global feature extraction, and a high-resolution image reconstruction module.
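The pipeline above (shallow 3x3 convolution, six residual groups, then a 3x3 convolution plus PixelShuffle for reconstruction) can be sketched in PyTorch as follows. This is a minimal structural sketch, not the paper's implementation: the HDTB internals are stand-in convolutions for MHDLSA/SparseGSA, and the channel width (64) and blocks-per-group count are illustrative assumptions.

```python
# Structural sketch of the DLGSANet pipeline; block internals are placeholders.
import torch
import torch.nn as nn

class HDTB(nn.Module):
    """Hybrid dynamic-Transformer block (placeholder internals).
    The 3x3 conv stands in for MHDLSA (local); the 1x1 conv for SparseGSA (global)."""
    def __init__(self, dim):
        super().__init__()
        self.local_branch = nn.Conv2d(dim, dim, 3, padding=1)
        self.global_branch = nn.Conv2d(dim, dim, 1)
    def forward(self, x):
        return x + self.global_branch(self.local_branch(x))

class RHDTG(nn.Module):
    """Residual hybrid dynamic-Transformer group: several HDTBs with a skip."""
    def __init__(self, dim, num_blocks=4):  # blocks-per-group is an assumption
        super().__init__()
        self.blocks = nn.Sequential(*[HDTB(dim) for _ in range(num_blocks)])
    def forward(self, x):
        return x + self.blocks(x)

class DLGSANetSketch(nn.Module):
    def __init__(self, dim=64, num_groups=6, scale=4):
        super().__init__()
        self.shallow = nn.Conv2d(3, dim, 3, padding=1)      # shallow feature extraction
        self.groups = nn.Sequential(*[RHDTG(dim) for _ in range(num_groups)])
        self.reconstruct = nn.Sequential(                   # 3x3 conv + PixelShuffle upsampling
            nn.Conv2d(dim, 3 * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
        )
    def forward(self, lr):
        return self.reconstruct(self.groups(self.shallow(lr)))

lr = torch.randn(1, 3, 32, 32)
sr = DLGSANetSketch(scale=4)(lr)
print(sr.shape)  # torch.Size([1, 3, 128, 128])
```

Note how the residual connections at both the block and group level match the paper's motivation of easing network training.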

SparseGSA

We note that using the softmax function keeps all the self-attention values for feature aggregation. However, if the tokens from the query and key are irrelevant to each other, keeping the self-attention values of these tokens may adversely affect the feature aggregation. In contrast, using ReLU removes the non-positive self-attention values, so only the ones that correspond to the main structures and details are preserved, which leads to better results, as shown in the figure.

Using SparseGSA removes useless self-attention values and thus leads to better features for high-resolution image reconstruction.
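The softmax-vs-ReLU distinction above can be illustrated with a short sketch: replacing the softmax with a ReLU zeroes out negative query-key similarities, so only the positive (presumably relevant) ones contribute to aggregation. The tensor shapes and the renormalization step are illustrative assumptions, not the exact SparseGSA implementation.

```python
# Sketch of ReLU-based sparse attention vs. dense softmax attention.
import torch
import torch.nn.functional as F

def sparse_global_attention(q, k, v):
    """q, k, v: (batch, heads, tokens, channels)."""
    sim = q @ k.transpose(-2, -1)        # token-to-token similarities
    attn = F.relu(sim)                   # drop negative similarities entirely
    # Renormalize the surviving weights (clamp guards an all-zero row).
    attn = attn / attn.sum(dim=-1, keepdim=True).clamp(min=1e-6)
    return attn @ v                      # aggregate values with the sparse weights

q = k = v = torch.randn(1, 2, 8, 16)
out = sparse_global_attention(q, k, v)
print(out.shape)  # torch.Size([1, 2, 8, 16])
```

By contrast, `softmax(sim)` assigns every token a strictly positive weight, so even irrelevant tokens always contribute a little to the aggregated feature.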

Quantitative Evaluations

Super Resolution x2
| Model | Params (M) | FLOPs (G) | Set5 (PSNR/SSIM) | Set14 (PSNR/SSIM) | B100 (PSNR/SSIM) | Urban100 (PSNR/SSIM) | Manga109 (PSNR/SSIM) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| EDSR | 40.73 | 9387 | 38.11/0.9602 | 33.92/0.9195 | 32.32/0.9013 | 32.93/0.9351 | 39.10/0.9773 |
| RDN | 22.12 | 5098 | 38.24/0.9614 | 34.01/0.9212 | 32.34/0.9017 | 32.89/0.9353 | 39.18/0.9780 |
| RCAN | 15.44 | 3530 | 38.27/0.9614 | 34.12/0.9216 | 32.41/0.9027 | 33.34/0.9384 | 39.44/0.9786 |
| SAN | 15.86 | 3050 | 38.31/0.9620 | 34.07/0.9213 | 32.42/0.9028 | 33.10/0.9370 | 39.32/0.9792 |
| HAN | 63.6 | 14551 | 38.27/0.9614 | 34.16/0.9217 | 32.41/0.9027 | 33.35/0.9385 | 39.46/0.9785 |
| NLSA | 41.79 | 9632 | 38.34/0.9618 | 34.08/0.9231 | 32.43/0.9027 | 33.42/0.9394 | 39.59/0.9789 |
| SwinIR | 11.75 | 2301 | 38.35/0.9620 | 34.14/0.9227 | 32.44/0.9030 | 33.40/0.9393 | 39.60/0.9792 |
| ELAN | 8.25 | 1965 | 38.36/0.9620 | 34.20/0.9228 | 32.45/0.9030 | 33.44/0.9391 | 39.62/0.9793 |
| DLGSANet (Ours) | 4.73 | 1097 | 38.34/0.9617 | 34.25/0.9231 | 32.38/0.9025 | 33.41/0.9393 | 39.57/0.9789 |
Super Resolution x3
| Model | Params (M) | FLOPs (G) | Set5 (PSNR/SSIM) | Set14 (PSNR/SSIM) | B100 (PSNR/SSIM) | Urban100 (PSNR/SSIM) | Manga109 (PSNR/SSIM) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| EDSR | 43.68 | 4470 | 34.65/0.9280 | 30.52/0.8462 | 29.25/0.8093 | 28.80/0.8653 | 34.17/0.9476 |
| RDN | 22.3 | 2282 | 34.71/0.9296 | 30.57/0.8468 | 29.26/0.8093 | 28.80/0.8653 | 34.13/0.9484 |
| RCAN | 15.62 | 1586 | 34.74/0.9299 | 30.65/0.8482 | 29.32/0.8111 | 29.09/0.8702 | 34.44/0.9499 |
| SAN | 15.89 | 1620 | 34.75/0.9300 | 30.59/0.8476 | 29.33/0.8112 | 28.93/0.8671 | 34.30/0.9494 |
| HAN | 64.34 | 6534 | 34.75/0.9299 | 30.67/0.8483 | 29.32/0.8110 | 29.10/0.8705 | 34.48/0.9500 |
| NLSA | 44.74 | 4579 | 34.85/0.9306 | 30.70/0.8485 | 29.34/0.8117 | 29.25/0.8726 | 34.57/0.9508 |
| SwinIR | 11.93 | 1026 | 34.89/0.9312 | 30.77/0.8503 | 29.37/0.8124 | 29.29/0.8744 | 34.74/0.9518 |
| ELAN | 8.27 | 874 | 34.90/0.9313 | 30.80/0.8504 | 29.38/0.8124 | 29.32/0.8745 | 34.73/0.9517 |
| DLGSANet (Ours) | 4.74 | 486 | 34.95/0.9310 | 30.77/0.8501 | 29.38/0.8121 | 29.43/0.8761 | 34.76/0.9517 |
Super Resolution x4
| Model | Params (M) | FLOPs (G) | Set5 (PSNR/SSIM) | Set14 (PSNR/SSIM) | B100 (PSNR/SSIM) | Urban100 (PSNR/SSIM) | Manga109 (PSNR/SSIM) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| EDSR | 43.09 | 2895 | 32.46/0.8968 | 28.80/0.7876 | 27.71/0.7420 | 26.64/0.8033 | 31.02/0.9148 |
| RDN | 22.27 | 1310 | 32.47/0.8990 | 28.81/0.7871 | 27.72/0.7419 | 26.61/0.8028 | 31.00/0.9151 |
| RCAN | 15.59 | 918 | 32.63/0.9002 | 28.87/0.7889 | 27.77/0.7436 | 26.82/0.8087 | 31.22/0.9173 |
| SAN | 15.86 | 937 | 32.64/0.9003 | 28.92/0.7888 | 27.78/0.7436 | 26.79/0.8068 | 31.18/0.9169 |
| HAN | 64.19 | 3776 | 32.64/0.9002 | 28.90/0.7890 | 27.80/0.7442 | 26.85/0.8094 | 31.42/0.9177 |
| NLSA | 44.15 | 2956 | 32.59/0.9000 | 28.87/0.7891 | 27.78/0.7444 | 26.96/0.8109 | 31.27/0.9184 |
| SwinIR | 11.9 | 584 | 32.72/0.9021 | 28.94/0.7914 | 27.83/0.7459 | 27.07/0.8164 | 31.67/0.9226 |
| ELAN | 8.31 | 494 | 32.75/0.9022 | 28.96/0.7914 | 27.83/0.7459 | 27.13/0.8167 | 31.68/0.9226 |
| DLGSANet (Ours) | 4.76 | 274 | 32.80/0.9021 | 28.95/0.7907 | 27.85/0.7464 | 27.17/0.8175 | 31.68/0.9219 |

Visual Results