- The backbone in Fig. 1 consists of one data-processing stage (Input) and seven feature-extraction stages (S1 to S7). The input block includes two 3×3 convolution layers with stride 2 and a batch normalization layer. Each stage is composed of multiple MBConvBlocks. In the notation MBConvn, k×k, the first 1×1 convolution expands the number of input feature-map channels by a factor of n, and k×k is the kernel size of the depthwise convolution. The structure of the MBConvBlock is shown in Fig. 2, and the structure of the SENet [10] module inside the MBConvBlock is presented in Fig. 3. The effectiveness of deformable convolution for detecting irregularly shaped objects has been demonstrated in [11]. Therefore, three deformable convolution blocks are added to the first three feature-extraction stages of the backbone to improve feature extraction for small objects. Deformable convolution adds a learned offset to each sampling position of a standard convolution, so the kernel's sampling grid is no longer fixed: it can deform and extend over a larger region, with the offsets learned during training. Moreover, deformable convolution can easily replace the vanilla convolution blocks of a conventional CNN.
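The offset mechanism of deformable convolution described above can be illustrated with a minimal single-channel sketch in plain Python. It assumes stride 1, zero padding, and offsets passed in directly; in a full deformable convolution layer the offsets are produced by an extra convolution over the input and learned end to end, which this sketch omits. The names `bilinear_sample`, `deform_conv2d_single` and `zero_offsets` are hypothetical helpers, not taken from the post or from any library.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]
Offsets = List[List[List[Tuple[float, float]]]]  # [row][col][tap] -> (dy, dx)


def bilinear_sample(x: Grid, py: float, px: float) -> float:
    """Bilinearly interpolate x at a fractional position; points outside x read as 0."""
    h, w = len(x), len(x[0])
    y0, x0 = math.floor(py), math.floor(px)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                # Weight falls off linearly with distance to each integer neighbour.
                value += (1 - abs(py - yy)) * (1 - abs(px - xx)) * x[yy][xx]
    return value


def deform_conv2d_single(x: Grid, weight: Grid, offsets: Offsets) -> Grid:
    """Single-channel k x k deformable convolution, stride 1, 'same' zero padding.

    Hypothetical helper: offsets[i][j][t] is the (dy, dx) shift for kernel tap t
    (row-major) at output position (i, j). In a real layer these offsets would be
    predicted by an extra convolution and learned; here they are given directly.
    """
    h, w = len(x), len(x[0])
    k = len(weight)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for m in range(k):
                for n in range(k):
                    dy, dx = offsets[i][j][m * k + n]
                    # Regular grid position (i + m - r, j + n - r) shifted by the offset.
                    acc += weight[m][n] * bilinear_sample(x, i + m - r + dy, j + n - r + dx)
            out[i][j] = acc
    return out


def zero_offsets(h: int, w: int, k: int) -> Offsets:
    """All-zero offsets: the layer then behaves like an ordinary convolution."""
    return [[[(0.0, 0.0)] * (k * k) for _ in range(w)] for _ in range(h)]


if __name__ == "__main__":
    x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
    # Shift every tap half a pixel to the right.
    half_right = [[[(0.0, 0.5)] * 9 for _ in range(3)] for _ in range(3)]
    print(deform_conv2d_single(x, identity, zero_offsets(3, 3, 3)))
    print(deform_conv2d_single(x, identity, half_right))
```

With all offsets set to zero the function reduces to an ordinary convolution with zero padding. A fractional offset such as (0, 0.5) blends neighbouring pixels through bilinear interpolation, which keeps the output differentiable with respect to the offsets and is what allows them to be learned by gradient descent.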
- EfficientNet's compound scaling method adjusts the width and depth of the network consistently with the input resolution, which addresses the problem of network design at different image resolutions, especially high resolutions. Moreover, we apply an attention mechanism in the detection framework. The attention mechanism makes the network focus on important regions such as object areas while suppressing background areas, which effectively reduces the influence of background noise on object detection.
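The post does not specify the form of its attention module beyond the SENet block inside the MBConvBlock. As one illustration of weighting object areas up and background areas down, the sketch below computes a CBAM-style spatial attention map: at each location it takes the mean and the max over channels, combines them, passes the result through a sigmoid, and multiplies every channel by that weight. In CBAM the two pooled maps go through a learned 7×7 convolution; here that convolution is replaced by fixed, hypothetical scalar weights (`w_avg`, `w_max`, `bias`) so the example stays self-contained. This is an assumed stand-in, not the authors' module.

```python
import math
from typing import List, Tuple

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def spatial_attention(features: FeatureMap, w_avg: float = 1.0, w_max: float = 1.0,
                      bias: float = 0.0) -> Tuple[FeatureMap, List[List[float]]]:
    """CBAM-style spatial attention with a 1x1 stand-in for the learned 7x7 conv.

    w_avg, w_max and bias are hypothetical fixed weights; a trained model would learn them.
    Returns the reweighted features and the attention map, whose values lie in [0, 1].
    """
    c, h, w = len(features), len(features[0]), len(features[0][0])
    attn = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            column = [features[ch][i][j] for ch in range(c)]
            # Pool across channels: mean and max response at this location.
            avg, peak = sum(column) / c, max(column)
            attn[i][j] = sigmoid(w_avg * avg + w_max * peak + bias)
    # Broadcast the same spatial weight to every channel.
    out = [[[features[ch][i][j] * attn[i][j] for j in range(w)] for i in range(h)]
           for ch in range(c)]
    return out, attn


if __name__ == "__main__":
    # Two channels, one row: a weak "background" pixel and a strong "object" pixel.
    feats = [[[0.1, 4.0]], [[0.0, 2.0]]]
    reweighted, attn_map = spatial_attention(feats, bias=-2.0)
    print(attn_map)
    print(reweighted)
```

Locations where the channels respond strongly receive weights close to 1 and keep their features, while weakly responding locations are pushed toward 0, which is the suppression effect on background areas described above.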
- First, we use EfficientNet (Tan, Le, 2019) as the backbone network of the proposed framework. For large scenes containing objects of various sizes, EfficientNet balances network width, depth and input resolution through its compound scaling method, so the network architecture stays matched to the input resolution. In addition, because the scenes are large and the objects sparse, we try to make the network focus on object areas rather than background areas.
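The compound scaling rule mentioned above ties depth, width and input resolution to a single coefficient φ: depth scales as α^φ, width as β^φ and resolution as γ^φ, under the constraint α·β²·γ² ≈ 2, so FLOPs grow by roughly 2^φ. The sketch below uses α = 1.2, β = 1.1, γ = 1.15, the values reported for the EfficientNet-B0 baseline in the EfficientNet paper (Tan, Le, 2019). The baseline depth, width and resolution in the example are hypothetical, and the rounding is deliberately simple: released EfficientNet code additionally rounds channel counts to hardware-friendly multiples, which is left out here.

```python
import math
from typing import Tuple

# Coefficients reported for the EfficientNet-B0 baseline (Tan, Le, 2019).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution


def compound_scale(phi: float, base_depth: int, base_width: int,
                   base_resolution: int) -> Tuple[int, int, int]:
    """Scale depth (layers), width (channels) and input resolution with one coefficient phi.

    Plain rounding only; real EfficientNet code also rounds channels to
    hardware-friendly multiples, which is left out here.
    """
    depth = math.ceil(base_depth * ALPHA ** phi)
    width = round(base_width * BETA ** phi)
    resolution = round(base_resolution * GAMMA ** phi)
    return depth, width, resolution


def flops_factor(phi: float) -> float:
    """Approximate FLOPs growth: FLOPs scale with depth * width^2 * resolution^2."""
    return (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi


if __name__ == "__main__":
    # Hypothetical baseline stage: 3 layers, 32 channels, 224x224 input.
    for phi in range(4):
        print(phi, compound_scale(phi, 3, 32, 224), round(flops_factor(phi), 2))
```

Because all three dimensions are driven by the same φ, increasing the input resolution for large scenes automatically brings a proportionate increase in depth (larger receptive field) and width (more channels for finer patterns), instead of scaling one dimension in isolation.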
From: https://www.cnblogs.com/lwp-nicol/p/17411823.html