We propose an effective lightweight dynamic local and global self-attention network (DLGSANet) for image super-resolution. Our method exploits the properties of Transformers while keeping computational costs low. Motivated by the network designs of Transformers, we develop a simple yet effective multi-head dynamic local self-attention (MHDLSA) module to extract local features efficiently. In addition, we note that existing Transformers usually explore all the similarities between the query and key tokens for feature aggregation. However, not all query tokens are relevant to those in the keys, so using all the similarities does not effectively facilitate high-resolution image reconstruction. To overcome this problem, we develop a sparse global self-attention (SparseGSA) module that selects the most useful similarity values so that the most informative global features can be better utilized for high-resolution image reconstruction. We then develop a hybrid dynamic-Transformer block (HDTB) that integrates MHDLSA and SparseGSA for both local and global feature exploration. To ease network training, we formulate the HDTBs into a residual hybrid dynamic-Transformer group (RHDTG). By embedding the RHDTGs into an end-to-end trainable network, we show that the proposed method has fewer network parameters and lower computational costs than state-of-the-art methods while achieving competitive accuracy.
Image super-resolution comparisons (×4) in terms of accuracy, network parameters, and floating-point operations (FLOPs) on the Urban100 dataset. The area of each circle denotes the number of network parameters. Our model (DLGSANet) achieves comparable performance while having fewer network parameters (<5M) and lower FLOPs.
The proposed lightweight dynamic local and global self-attention network (DLGSANet) mainly contains a shallow feature extraction module, six residual hybrid dynamic-Transformer groups (RHDTGs) for both local and global feature extraction, and a high-resolution image reconstruction module. The shallow feature extraction module uses a convolutional layer with a 3×3 filter to extract features from the input low-resolution image. Each RHDTG takes the hybrid dynamic-Transformer block (HDTB) as its basic module, and each HDTB contains the multi-head dynamic local self-attention (MHDLSA) and the sparse global self-attention (SparseGSA). The high-resolution image reconstruction module contains a convolutional layer with a 3×3 filter, followed by a PixelShuffle operation for upsampling. The figure below shows an overview of the proposed DLGSANet for SISR.
Network architecture of the proposed DLGSANet. It mainly contains a shallow feature extraction module, six residual hybrid dynamic-Transformer groups (RHDTGs) for both local and global feature extraction, and a high-resolution image reconstruction module.
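For concreteness, the overall data flow can be sketched in a few lines of PyTorch. This is a minimal sketch, not the paper's implementation: the `MHDLSA` and `SparseGSA` bodies are placeholders (`nn.Identity`), and names such as `DLGSANet`, `dim`, and the number of HDTBs per group (`num_blocks=4`) are assumptions for illustration. Only the shallow-feature → RHDTGs → PixelShuffle reconstruction pipeline is taken from the text above.

```python
import torch
import torch.nn as nn

class HDTB(nn.Module):
    """Hybrid dynamic-Transformer block: local (MHDLSA) then global (SparseGSA).
    Both attention bodies below are placeholders, not the paper's modules."""
    def __init__(self, dim):
        super().__init__()
        self.local_attn = nn.Identity()   # stands in for MHDLSA
        self.global_attn = nn.Identity()  # stands in for SparseGSA

    def forward(self, x):
        x = x + self.local_attn(x)
        x = x + self.global_attn(x)
        return x

class RHDTG(nn.Module):
    """Residual hybrid dynamic-Transformer group: a stack of HDTBs with a
    residual connection to ease training. num_blocks is an assumption."""
    def __init__(self, dim, num_blocks=4):
        super().__init__()
        self.blocks = nn.Sequential(*[HDTB(dim) for _ in range(num_blocks)])

    def forward(self, x):
        return x + self.blocks(x)

class DLGSANet(nn.Module):
    def __init__(self, dim=64, num_groups=6, scale=4):
        super().__init__()
        # Shallow feature extraction: one 3x3 convolution on the LR input.
        self.head = nn.Conv2d(3, dim, 3, padding=1)
        # Six RHDTGs for local and global feature extraction.
        self.body = nn.Sequential(*[RHDTG(dim) for _ in range(num_groups)])
        # Reconstruction: 3x3 convolution followed by PixelShuffle upsampling.
        self.tail = nn.Sequential(
            nn.Conv2d(dim, 3 * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, lr):
        feat = self.head(lr)
        feat = self.body(feat)
        return self.tail(feat)

sr = DLGSANet()(torch.randn(1, 3, 48, 48))  # -> torch.Size([1, 3, 192, 192])
```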
We note that using the softmax function keeps all the self-attention values for the feature aggregation. However, if the tokens from the query and key are not relevant to each other, using the self-attention values of these tokens may adversely affect the feature aggregation. In contrast, using ReLU removes some self-attention values by zeroing out the negative similarities, so that only those corresponding to the main structures and details are preserved, which leads to better results, as shown in the figure below.
Using SparseGSA removes useless self-attention values and thus leads to better features for high-resolution image reconstruction.
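The difference between the two normalizations can be sketched directly. The PyTorch snippet below is a minimal illustration, assuming single-head attention over flattened spatial tokens and an L1 renormalization after the ReLU; the paper's exact SparseGSA formulation may differ, and `sparse_global_attention` and its arguments are hypothetical names.

```python
import torch
import torch.nn.functional as F

def sparse_global_attention(q, k, v, eps=1e-6):
    """Global self-attention with ReLU sparsification instead of softmax.

    q, k, v: (B, N, C) tensors of flattened spatial tokens.
    Softmax assigns every query-key pair a positive weight, so irrelevant
    tokens still contribute to the aggregation; ReLU zeroes out negative
    similarities, keeping only the most useful ones. The L1 renormalization
    of each row is an assumption of this sketch.
    """
    scale = q.shape[-1] ** -0.5
    sim = torch.bmm(q, k.transpose(1, 2)) * scale         # (B, N, N) similarities
    attn = F.relu(sim)                                    # drop negative similarities
    attn = attn / (attn.sum(dim=-1, keepdim=True) + eps)  # renormalize rows
    return torch.bmm(attn, v)                             # aggregate values

# For comparison: dense softmax attention keeps all weights strictly positive.
q = k = v = torch.randn(1, 16, 8)
out_sparse = sparse_global_attention(q, k, v)
out_dense = torch.bmm(torch.softmax(torch.bmm(q, k.transpose(1, 2)) / 8 ** 0.5, -1), v)
```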
Super Resolution ×2

| Model | Params (M) | FLOPs (G) | Set5 (PSNR/SSIM) | Set14 (PSNR/SSIM) | B100 (PSNR/SSIM) | Urban100 (PSNR/SSIM) | Manga109 (PSNR/SSIM) |
|---|---|---|---|---|---|---|---|
| EDSR | 40.73 | 9387 | 38.11/0.9602 | 33.92/0.9195 | 32.32/0.9013 | 32.93/0.9351 | 39.10/0.9773 |
| RDN | 22.12 | 5098 | 38.24/0.9614 | 34.01/0.9212 | 32.34/0.9017 | 32.89/0.9353 | 39.18/0.9780 |
| RCAN | 15.44 | 3530 | 38.27/0.9614 | 34.12/0.9216 | 32.41/0.9027 | 33.34/0.9384 | 39.44/0.9786 |
| SAN | 15.86 | 3050 | 38.31/0.9620 | 34.07/0.9213 | 32.42/0.9028 | 33.10/0.9370 | 39.32/0.9792 |
| HAN | 63.6 | 14551 | 38.27/0.9614 | 34.16/0.9217 | 32.41/0.9027 | 33.35/0.9385 | 39.46/0.9785 |
| NLSA | 41.79 | 9632 | 38.34/0.9618 | 34.08/0.9231 | 32.43/0.9027 | 33.42/0.9394 | 39.59/0.9789 |
| SwinIR | 11.75 | 2301 | 38.35/0.9620 | 34.14/0.9227 | 32.44/0.9030 | 33.40/0.9393 | 39.60/0.9792 |
| ELAN | 8.25 | 1965 | 38.36/0.9620 | 34.20/0.9228 | 32.45/0.9030 | 33.44/0.9391 | 39.62/0.9793 |
| DLGSANet (Ours) | 4.73 | 1097 | 38.34/0.9617 | 34.25/0.9231 | 32.38/0.9025 | 33.41/0.9393 | 39.57/0.9789 |
Super Resolution ×3

| Model | Params (M) | FLOPs (G) | Set5 (PSNR/SSIM) | Set14 (PSNR/SSIM) | B100 (PSNR/SSIM) | Urban100 (PSNR/SSIM) | Manga109 (PSNR/SSIM) |
|---|---|---|---|---|---|---|---|
| EDSR | 43.68 | 4470 | 34.65/0.9280 | 30.52/0.8462 | 29.25/0.8093 | 28.80/0.8653 | 34.17/0.9476 |
| RDN | 22.3 | 2282 | 34.71/0.9296 | 30.57/0.8468 | 29.26/0.8093 | 28.80/0.8653 | 34.13/0.9484 |
| RCAN | 15.62 | 1586 | 34.74/0.9299 | 30.65/0.8482 | 29.32/0.8111 | 29.09/0.8702 | 34.44/0.9499 |
| SAN | 15.89 | 1620 | 34.75/0.9300 | 30.59/0.8476 | 29.33/0.8112 | 28.93/0.8671 | 34.30/0.9494 |
| HAN | 64.34 | 6534 | 34.75/0.9299 | 30.67/0.8483 | 29.32/0.8110 | 29.10/0.8705 | 34.48/0.9500 |
| NLSA | 44.74 | 4579 | 34.85/0.9306 | 30.70/0.8485 | 29.34/0.8117 | 29.25/0.8726 | 34.57/0.9508 |
| SwinIR | 11.93 | 1026 | 34.89/0.9312 | 30.77/0.8503 | 29.37/0.8124 | 29.29/0.8744 | 34.74/0.9518 |
| ELAN | 8.27 | 874 | 34.90/0.9313 | 30.80/0.8504 | 29.38/0.8124 | 29.32/0.8745 | 34.73/0.9517 |
| DLGSANet (Ours) | 4.74 | 486 | 34.95/0.9310 | 30.77/0.8501 | 29.38/0.8121 | 29.43/0.8761 | 34.76/0.9517 |
Super Resolution ×4

| Model | Params (M) | FLOPs (G) | Set5 (PSNR/SSIM) | Set14 (PSNR/SSIM) | B100 (PSNR/SSIM) | Urban100 (PSNR/SSIM) | Manga109 (PSNR/SSIM) |
|---|---|---|---|---|---|---|---|
| EDSR | 43.09 | 2895 | 32.46/0.8968 | 28.80/0.7876 | 27.71/0.7420 | 26.64/0.8033 | 31.02/0.9148 |
| RDN | 22.27 | 1310 | 32.47/0.8990 | 28.81/0.7871 | 27.72/0.7419 | 26.61/0.8028 | 31.00/0.9151 |
| RCAN | 15.59 | 918 | 32.63/0.9002 | 28.87/0.7889 | 27.77/0.7436 | 26.82/0.8087 | 31.22/0.9173 |
| SAN | 15.86 | 937 | 32.64/0.9003 | 28.92/0.7888 | 27.78/0.7436 | 26.79/0.8068 | 31.18/0.9169 |
| HAN | 64.19 | 3776 | 32.64/0.9002 | 28.90/0.7890 | 27.80/0.7442 | 26.85/0.8094 | 31.42/0.9177 |
| NLSA | 44.15 | 2956 | 32.59/0.9000 | 28.87/0.7891 | 27.78/0.7444 | 26.96/0.8109 | 31.27/0.9184 |
| SwinIR | 11.9 | 584 | 32.72/0.9021 | 28.94/0.7914 | 27.83/0.7459 | 27.07/0.8164 | 31.67/0.9226 |
| ELAN | 8.31 | 494 | 32.75/0.9022 | 28.96/0.7914 | 27.83/0.7459 | 27.13/0.8167 | 31.68/0.9226 |
| DLGSANet (Ours) | 4.76 | 274 | 32.80/0.9021 | 28.95/0.7907 | 27.85/0.7464 | 27.17/0.8175 | 31.68/0.9219 |