We propose an effective lightweight dynamic local and global self-attention network (DLGSANet) for image super-resolution. Our method exploits the properties of Transformers while keeping computational costs low. Motivated by the network designs of Transformers, we develop a simple yet effective multi-head dynamic local self-attention (MHDLSA) module to extract local features efficiently. In addition, we note that existing Transformers usually explore all the similarities between the query and key tokens for feature aggregation. However, not all query tokens are relevant to those in the keys, so using all the similarities does not effectively facilitate high-resolution image reconstruction. To overcome this problem, we develop a sparse global self-attention (SparseGSA) module that selects the most useful similarity values so that the most informative global features can be better utilized for high-resolution image reconstruction. We then develop a hybrid dynamic-Transformer block (HDTB) that integrates MHDLSA and SparseGSA for both local and global feature exploration. To ease network training, we formulate the HDTBs into a residual hybrid dynamic-Transformer group (RHDTG). By embedding the RHDTGs into an end-to-end trainable network, we show that the proposed method has fewer network parameters and lower computational costs than state-of-the-art methods while achieving competitive accuracy.
Image super-resolution comparisons (×4) in terms of accuracy, network parameters, and floating-point operations (FLOPs) on the Urban100 dataset. The area of each circle denotes the number of network parameters. Our model (DLGSANet) achieves comparable performance while having fewer network parameters (<5M) and lower FLOPs.
The proposed lightweight dynamic local and global self-attention network (DLGSANet) mainly contains a shallow feature extraction module, six residual hybrid dynamic-Transformer groups (RHDTGs) for both local and global feature extraction, and a high-resolution image reconstruction module. The shallow feature extraction module uses a convolutional layer with a 3×3 filter to extract features from the input low-resolution image. Each RHDTG takes the hybrid dynamic-Transformer block (HDTB) as its basic module, and each HDTB contains the multi-head dynamic local self-attention (MHDLSA) and the sparse global self-attention (SparseGSA). The high-resolution image reconstruction module contains a convolutional layer with a 3×3 filter, followed by a PixelShuffle operation for upsampling. The figure below shows an overview of the proposed DLGSANet for SISR.
Network architecture of the proposed DLGSANet. It mainly contains a shallow feature extraction module, six residual hybrid dynamic-Transformer groups (RHDTGs) for both local and global feature extraction, and a high-resolution image reconstruction module.
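For concreteness, the overall data flow can be sketched in a few lines of PyTorch. This is a minimal sketch, not the paper's implementation: the `MHDLSA` and `SparseGSA` bodies are placeholders (`nn.Identity`), and names such as `DLGSANet`, `dim`, and the number of HDTBs per group (`num_blocks=4`) are assumptions for illustration. Only the shallow-feature → RHDTGs → PixelShuffle reconstruction pipeline is taken from the text above.

```python
import torch
import torch.nn as nn

class HDTB(nn.Module):
    """Hybrid dynamic-Transformer block: local (MHDLSA) then global (SparseGSA).
    Both attention bodies below are placeholders, not the paper's modules."""
    def __init__(self, dim):
        super().__init__()
        self.local_attn = nn.Identity()   # stands in for MHDLSA
        self.global_attn = nn.Identity()  # stands in for SparseGSA

    def forward(self, x):
        x = x + self.local_attn(x)
        x = x + self.global_attn(x)
        return x

class RHDTG(nn.Module):
    """Residual hybrid dynamic-Transformer group: a stack of HDTBs with a
    residual connection to ease training. num_blocks is an assumption."""
    def __init__(self, dim, num_blocks=4):
        super().__init__()
        self.blocks = nn.Sequential(*[HDTB(dim) for _ in range(num_blocks)])

    def forward(self, x):
        return x + self.blocks(x)

class DLGSANet(nn.Module):
    def __init__(self, dim=64, num_groups=6, scale=4):
        super().__init__()
        # Shallow feature extraction: one 3x3 convolution on the LR input.
        self.head = nn.Conv2d(3, dim, 3, padding=1)
        # Six RHDTGs for local and global feature extraction.
        self.body = nn.Sequential(*[RHDTG(dim) for _ in range(num_groups)])
        # Reconstruction: 3x3 convolution followed by PixelShuffle upsampling.
        self.tail = nn.Sequential(
            nn.Conv2d(dim, 3 * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, lr):
        feat = self.head(lr)
        feat = self.body(feat)
        return self.tail(feat)

sr = DLGSANet()(torch.randn(1, 3, 48, 48))  # -> torch.Size([1, 3, 192, 192])
```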
We note that using the softmax function keeps all the self-attention values for the feature aggregation. However, if the tokens from the query and key are not relevant to each other, using the self-attention values of these tokens may adversely affect the feature aggregation. In contrast, using ReLU removes some self-attention values by zeroing out the negative similarities, so that only those corresponding to the main structures and details are preserved, which leads to better results, as shown in the figure below.
Using SparseGSA removes useless self-attention values and thus leads to better features for high-resolution image reconstruction.
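The difference between the two normalizations can be sketched directly. The PyTorch snippet below is a minimal illustration, assuming single-head attention over flattened spatial tokens and an L1 renormalization after the ReLU; the paper's exact SparseGSA formulation may differ, and `sparse_global_attention` and its arguments are hypothetical names.

```python
import torch
import torch.nn.functional as F

def sparse_global_attention(q, k, v, eps=1e-6):
    """Global self-attention with ReLU sparsification instead of softmax.

    q, k, v: (B, N, C) tensors of flattened spatial tokens.
    Softmax assigns every query-key pair a positive weight, so irrelevant
    tokens still contribute to the aggregation; ReLU zeroes out negative
    similarities, keeping only the most useful ones. The L1 renormalization
    of each row is an assumption of this sketch.
    """
    scale = q.shape[-1] ** -0.5
    sim = torch.bmm(q, k.transpose(1, 2)) * scale         # (B, N, N) similarities
    attn = F.relu(sim)                                    # drop negative similarities
    attn = attn / (attn.sum(dim=-1, keepdim=True) + eps)  # renormalize rows
    return torch.bmm(attn, v)                             # aggregate values

# For comparison: dense softmax attention keeps all weights strictly positive.
q = k = v = torch.randn(1, 16, 8)
out_sparse = sparse_global_attention(q, k, v)
out_dense = torch.bmm(torch.softmax(torch.bmm(q, k.transpose(1, 2)) / 8 ** 0.5, -1), v)
```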
Super Resolution ×2

| Model | Params (M) | FLOPs (G) | Set5 (PSNR/SSIM) | Set14 (PSNR/SSIM) | B100 (PSNR/SSIM) | Urban100 (PSNR/SSIM) | Manga109 (PSNR/SSIM) |
|---|---|---|---|---|---|---|---|
| EDSR | 40.73 | 9387 | 38.11/0.9602 | 33.92/0.9195 | 32.32/0.9013 | 32.93/0.9351 | 39.10/0.9773 |
| RDN | 22.12 | 5098 | 38.24/0.9614 | 34.01/0.9212 | 32.34/0.9017 | 32.89/0.9353 | 39.18/0.9780 |
| RCAN | 15.44 | 3530 | 38.27/0.9614 | 34.12/0.9216 | 32.41/0.9027 | 33.34/0.9384 | 39.44/0.9786 |
| SAN | 15.86 | 3050 | 38.31/0.9620 | 34.07/0.9213 | 32.42/0.9028 | 33.10/0.9370 | 39.32/0.9792 |
| HAN | 63.6 | 14551 | 38.27/0.9614 | 34.16/0.9217 | 32.41/0.9027 | 33.35/0.9385 | 39.46/0.9785 |
| NLSA | 41.79 | 9632 | 38.34/0.9618 | 34.08/0.9231 | 32.43/0.9027 | 33.42/0.9394 | 39.59/0.9789 |
| SwinIR | 11.75 | 2301 | 38.35/0.9620 | 34.14/0.9227 | 32.44/0.9030 | 33.40/0.9393 | 39.60/0.9792 |
| ELAN | 8.25 | 1965 | 38.36/0.9620 | 34.20/0.9228 | 32.45/0.9030 | 33.44/0.9391 | 39.62/0.9793 |
| DLGSANet (Ours) | 4.73 | 1097 | 38.34/0.9617 | 34.25/0.9231 | 32.38/0.9025 | 33.41/0.9393 | 39.57/0.9789 |
Super Resolution ×3

| Model | Params (M) | FLOPs (G) | Set5 (PSNR/SSIM) | Set14 (PSNR/SSIM) | B100 (PSNR/SSIM) | Urban100 (PSNR/SSIM) | Manga109 (PSNR/SSIM) |
|---|---|---|---|---|---|---|---|
| EDSR | 43.68 | 4470 | 34.65/0.9280 | 30.52/0.8462 | 29.25/0.8093 | 28.80/0.8653 | 34.17/0.9476 |
| RDN | 22.3 | 2282 | 34.71/0.9296 | 30.57/0.8468 | 29.26/0.8093 | 28.80/0.8653 | 34.13/0.9484 |
| RCAN | 15.62 | 1586 | 34.74/0.9299 | 30.65/0.8482 | 29.32/0.8111 | 29.09/0.8702 | 34.44/0.9499 |
| SAN | 15.89 | 1620 | 34.75/0.9300 | 30.59/0.8476 | 29.33/0.8112 | 28.93/0.8671 | 34.30/0.9494 |
| HAN | 64.34 | 6534 | 34.75/0.9299 | 30.67/0.8483 | 29.32/0.8110 | 29.10/0.8705 | 34.48/0.9500 |
| NLSA | 44.74 | 4579 | 34.85/0.9306 | 30.70/0.8485 | 29.34/0.8117 | 29.25/0.8726 | 34.57/0.9508 |
| SwinIR | 11.93 | 1026 | 34.89/0.9312 | 30.77/0.8503 | 29.37/0.8124 | 29.29/0.8744 | 34.74/0.9518 |
| ELAN | 8.27 | 874 | 34.90/0.9313 | 30.80/0.8504 | 29.38/0.8124 | 29.32/0.8745 | 34.73/0.9517 |
| DLGSANet (Ours) | 4.74 | 486 | 34.95/0.9310 | 30.77/0.8501 | 29.38/0.8121 | 29.43/0.8761 | 34.76/0.9517 |
Super Resolution ×4

| Model | Params (M) | FLOPs (G) | Set5 (PSNR/SSIM) | Set14 (PSNR/SSIM) | B100 (PSNR/SSIM) | Urban100 (PSNR/SSIM) | Manga109 (PSNR/SSIM) |
|---|---|---|---|---|---|---|---|
| EDSR | 43.09 | 2895 | 32.46/0.8968 | 28.80/0.7876 | 27.71/0.7420 | 26.64/0.8033 | 31.02/0.9148 |
| RDN | 22.27 | 1310 | 32.47/0.8990 | 28.81/0.7871 | 27.72/0.7419 | 26.61/0.8028 | 31.00/0.9151 |
| RCAN | 15.59 | 918 | 32.63/0.9002 | 28.87/0.7889 | 27.77/0.7436 | 26.82/0.8087 | 31.22/0.9173 |
| SAN | 15.86 | 937 | 32.64/0.9003 | 28.92/0.7888 | 27.78/0.7436 | 26.79/0.8068 | 31.18/0.9169 |
| HAN | 64.19 | 3776 | 32.64/0.9002 | 28.90/0.7890 | 27.80/0.7442 | 26.85/0.8094 | 31.42/0.9177 |
| NLSA | 44.15 | 2956 | 32.59/0.9000 | 28.87/0.7891 | 27.78/0.7444 | 26.96/0.8109 | 31.27/0.9184 |
| SwinIR | 11.9 | 584 | 32.72/0.9021 | 28.94/0.7914 | 27.83/0.7459 | 27.07/0.8164 | 31.67/0.9226 |
| ELAN | 8.31 | 494 | 32.75/0.9022 | 28.96/0.7914 | 27.83/0.7459 | 27.13/0.8167 | 31.68/0.9226 |
| DLGSANet (Ours) | 4.76 | 274 | 32.80/0.9021 | 28.95/0.7907 | 27.85/0.7464 | 27.17/0.8175 | 31.68/0.9219 |