Res2Net: a new deep learning multi-scale architecture for improved object detection with existing backbones
As Google Brain’s EfficientNet paper showed, there are rapidly diminishing returns on investment for scaling up various aspects of CNN architectures (width, depth, resolution).
A new paper by Gao, Cheng, Zhao et al. (Res2Net: A New Multi-scale Backbone Architecture), however, shows that multi-scale processing within a given block, rather than the usual layer-by-layer scaling, is a largely unexplored direction with additional payoffs, especially for object recognition and segmentation.
Most architectures leverage scale on a layer-by-layer basis. The innovation here is to employ a hierarchical, cascading set of feature groups (the number of groups is termed the “scale”) within a single residual block, replacing the generic single 3x3 kernel.
Towards that end, the authors rebuilt the bottleneck block of the common ResNet architecture, replacing the standard 1x1–3x3–1x1 convolution layout with a hierarchical, residual arrangement of 3x3 convolutions (scale = 4 in the paper’s base configuration). The middle convolution thus moves from single-branch to multi-branch, and this change creates “Res2Net”.
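The data flow inside the modified block can be sketched in a few lines. This is a minimal NumPy illustration, not the authors’ implementation: `conv3x3` here is a fixed smoothing stand-in for a learned 3x3 convolution, and the function names are my own. The key idea it demonstrates is the hierarchy: the input channels are split into `scale` groups, the first group passes through untouched, and each subsequent group is convolved after adding the previous group’s output, so later groups see progressively larger receptive fields.

```python
import numpy as np

def conv3x3(x):
    # Stand-in for a learned 3x3 convolution: a fixed 5-point spatial
    # smoothing, just to keep the example runnable without a DL framework.
    out = x.copy()
    out[:, 1:-1, 1:-1] = (x[:, :-2, 1:-1] + x[:, 2:, 1:-1] +
                          x[:, 1:-1, :-2] + x[:, 1:-1, 2:] +
                          x[:, 1:-1, 1:-1]) / 5.0
    return out

def res2net_block(x, scale=4):
    # x: (channels, H, W); channels must divide evenly by `scale`.
    groups = np.split(x, scale, axis=0)
    outputs = [groups[0]]        # first group is passed through as-is
    prev = None
    for xi in groups[1:]:
        # Each later group receives the previous group's convolved output,
        # forming the hierarchical cascade that widens the receptive field.
        inp = xi if prev is None else xi + prev
        prev = conv3x3(inp)
        outputs.append(prev)
    # Concatenate the groups back to the original channel count.
    return np.concatenate(outputs, axis=0)

x = np.random.randn(8, 16, 16)
y = res2net_block(x, scale=4)
print(y.shape)  # (8, 16, 16)
```

In the full block this multi-branch stage sits between the usual 1x1 convolutions of the bottleneck, so the overall input/output shapes of the block are unchanged and it drops into existing ResNet backbones.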
A diagram makes the change clear: