With the advancement of sensors, image and video processing have developed for use in the visual sensing area. Among them, video super-resolution (VSR) aims to reconstruct high-resolution sequences from low-resolution sequences. To use consecutive contexts within a low-resolution sequence, VSR learns the spatial and temporal characteristics of multiple frames of the low-resolution sequence. As one of the convolutional neural network-based VSR methods, we propose a deformable convolution-based alignment network (DCAN) to generate high-resolution sequences four times the size of the low-resolution sequences. The proposed method consists of a feature extraction block, two different alignment blocks that use deformable convolution, and an up-sampling block. Experimental results show that the proposed DCAN achieved better performance in both the peak signal-to-noise ratio and the structural similarity index measure than the compared methods. The proposed DCAN also significantly reduces network complexity, measured by the number of network parameters, the total memory, and the inference time, compared with the latest method.

Sensors are used in a wide range of fields, such as autonomous driving, robotics, Internet of Things, medical, satellite, military, and surveillance. The development of sensors leads to miniaturization and increased performance. Image and video sensors are essentially used to handle the visual aspect. Although image and video sensors have been developed to work with low latency and low complexity, they often operate in environments with low network bandwidth, which limits the quality of input images and videos. Therefore, various image and video processing methods, such as super-resolution (SR), deblurring, and denoising, are used for restoration.

SR aims to generate high-resolution (HR) data from low-resolution (LR) data. Although the initial SR methods, based on pixel-wise interpolation algorithms such as bicubic, bilinear, and nearest neighbor, are straightforward and intuitive, they are limited in reconstructing high-frequency textures in the interpolated HR area. With the development of deep learning technologies, image and video SR methods are currently investigated using convolutional neural networks (CNNs) and recurrent neural networks (RNNs). Although deep learning-based SR methods have superior performance, the parameter counts and memory requirements of the networks have grown as the methods have developed. Thus, methods for reducing network complexity have been proposed for sensors with limited memory and for devices with limited computing resources, such as smartphones. In this paper, we propose a deformable convolution-based alignment network (DCAN) with a lightweight structure, which improves perceptual quality over previous methods as measured by the peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM). Through a variety of ablation studies, we also investigate the trade-off between the network complexity and the video super-resolution (VSR) performance in optimizing the proposed network. The contributions of this study are summarized as follows:

Figure 1. CNN-based image and video super-resolution schemes: (a) single-image SR (SISR), (b) video SR (VSR) to generate multiple high-resolution frames, and (c) VSR to generate a single high-resolution frame.

To overcome the limitations of the previous VSR schemes, recent VSR methods have been designed to generate single HR frames from multiple LR frames, as shown in Figure 1c. Note that the generated single HR frame corresponds to the current LR frame. To improve the VSR performance in this approach, it is important that the neighboring LR frames be aligned to contain as much context of the current LR frame as possible before conducting CNN operations at the stage of input feature extraction. As one of the alignment methods, optical flow can be applied to each neighboring LR frame to perform pixel-level prediction through two-dimensional (2D) pixel adjustment. Although this scheme can provide better VSR performance than the conventional VSR schemes, as in Figure 1b, all input LR frames, including the aligned neighboring frames, are generally used with the same weights. This means that the VSR network generates a single HR frame without considering the priorities between them. In addition, the alignment processes generally make the VSR networks more complicated due to the increase in total memory size and number of parameters. The exponential increase in GPU performance has enabled the development of more sophisticated networks with deeper and denser CNN architectures.
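For readers unfamiliar with the PSNR metric used here to compare methods, the following is a minimal sketch of the standard definition, PSNR = 10 · log10(MAX² / MSE). The function name `psnr`, the list-of-rows image representation, and the default peak value of 255 (8-bit images) are assumptions made for illustration; the article does not say how its own evaluation was implemented.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Return the PSNR in dB between two equally sized grayscale images.

    Images are given as lists of rows of pixel values. `max_value` is the
    peak pixel value (255 assumes 8-bit data; this is an assumption).
    """
    if len(reference) != len(estimate):
        raise ValueError("images must have the same height")

    squared_error = 0.0
    count = 0
    for ref_row, est_row in zip(reference, estimate):
        if len(ref_row) != len(est_row):
            raise ValueError("images must have the same width")
        for r, e in zip(ref_row, est_row):
            squared_error += (r - e) ** 2
            count += 1

    if count == 0:
        raise ValueError("images must not be empty")

    mse = squared_error / count  # mean squared error over all pixels
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means the reconstructed frame is closer to the ground-truth HR frame in the mean-squared-error sense. For video, one common choice is to average the per-frame values over a sequence, although evaluation protocols differ between papers.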
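The idea of aligning a neighboring LR frame to the current frame by per-pixel 2D adjustment can be illustrated with a small backward-warping sketch. This is not the DCAN alignment block, which uses deformable convolution, and it is not taken from the paper: the function name `warp_nearest`, the `(dx, dy)` flow layout, nearest-neighbor sampling, and border clamping are all simplifying assumptions chosen to keep the example short.

```python
def warp_nearest(frame, flow):
    """Backward-warp a grayscale frame with a per-pixel displacement field.

    frame: list of rows of pixel values (the neighboring frame).
    flow:  list of rows of (dx, dy) displacements, one per output pixel.
    Each output pixel (x, y) is read from frame at (x + dx, y + dy),
    rounded to the nearest pixel and clamped to the image border.
    """
    height = len(frame)
    width = len(frame[0])
    warped = []
    for y in range(height):
        row = []
        for x in range(width):
            dx, dy = flow[y][x]
            # Source position in the neighboring frame, kept inside the image.
            src_x = min(max(int(round(x + dx)), 0), width - 1)
            src_y = min(max(int(round(y + dy)), 0), height - 1)
            row.append(frame[src_y][src_x])
        warped.append(row)
    return warped


# Hypothetical example: content moved one pixel to the right between frames,
# so every output pixel reads from one pixel further right.
neighbor = [[1, 2, 3],
            [4, 5, 6]]
shift_right = [[(1, 0)] * 3 for _ in range(2)]
aligned = warp_nearest(neighbor, shift_right)
```

Practical flow-based methods typically use bilinear sampling so that the warp is differentiable. Deformable convolution instead learns sampling offsets for each kernel position on feature maps, which is the approach the proposed network takes.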