Res2Net: a new deep learning multi-scale architecture for improved object detection with existing backbones
As Google Brain’s EfficientNet paper showed, there are rapidly diminishing returns on investment for scaling up various aspects of CNN architectures (width, depth, resolution).
A new paper by Gao, Cheng, Zhao et al. (Res2Net: A New Multi-scale Backbone Architecture), however, shows that multi-scale processing within a given block, rather than the usual layer-by-layer scaling, is a largely unexplored direction with additional payoffs, especially for object recognition and segmentation.
Most architectures leverage scale on a layer-by-layer basis. The innovation here is to employ a hierarchical, cascading set of feature groups (the number of groups is termed the “scale”) within a single residual block, replacing the generic single 3x3 kernel.
Towards that end, the authors rebuilt the bottleneck block of the common ResNet architecture, replacing the standard 1x1–3x3–1x1 convolution layout with a hierarchical, residual arrangement of 3x3 convolutions (scale = 4 in the paper’s base configuration). The middle convolution thus moves from single-branch to multi-branch, and this change creates “Res2Net”.
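The data flow inside the modified block can be sketched in a few lines. This is a minimal NumPy illustration, not the authors’ implementation: `conv3x3` here is a fixed smoothing stand-in for a learned 3x3 convolution, and the function names are my own. The key idea it demonstrates is the hierarchy: the input channels are split into `scale` groups, the first group passes through untouched, and each subsequent group is convolved after adding the previous group’s output, so later groups see progressively larger receptive fields.

```python
import numpy as np

def conv3x3(x):
    # Stand-in for a learned 3x3 convolution: a fixed 5-point spatial
    # smoothing, just to keep the example runnable without a DL framework.
    out = x.copy()
    out[:, 1:-1, 1:-1] = (x[:, :-2, 1:-1] + x[:, 2:, 1:-1] +
                          x[:, 1:-1, :-2] + x[:, 1:-1, 2:] +
                          x[:, 1:-1, 1:-1]) / 5.0
    return out

def res2net_block(x, scale=4):
    # x: (channels, H, W); channels must divide evenly by `scale`.
    groups = np.split(x, scale, axis=0)
    outputs = [groups[0]]        # first group is passed through as-is
    prev = None
    for xi in groups[1:]:
        # Each later group receives the previous group's convolved output,
        # forming the hierarchical cascade that widens the receptive field.
        inp = xi if prev is None else xi + prev
        prev = conv3x3(inp)
        outputs.append(prev)
    # Concatenate the groups back to the original channel count.
    return np.concatenate(outputs, axis=0)

x = np.random.randn(8, 16, 16)
y = res2net_block(x, scale=4)
print(y.shape)  # (8, 16, 16)
```

In the full block this multi-branch stage sits between the usual 1x1 convolutions of the bottleneck, so the overall input/output shapes of the block are unchanged and it drops into existing ResNet backbones.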
A diagram makes the change clear: