
Progressive Sprinkles: A new data augmentation for CNNs (and how it helps achieve a new 98+% accuracy on the NIH Malaria dataset)

Less Wright
5 min read · Jul 20, 2019

While trying to beat the state of the art for the NIH Malaria dataset (97%), I faced a dilemma: most of the standard ‘heavy’ data augmentation methods (CutOut, RICAP, and CutMix) wouldn’t work. The reason is that the visual clues that a cell is infected appear only at some random location within the cell, while the rest of the cell looks perfectly normal.

Progressive Sprinkles in action: squares are randomly, and progressively, placed on the cell image to force the AI to adapt and improve its classification skills.
Images with and without progressive sprinkles.
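To make the idea in the captions above concrete, here is a minimal NumPy sketch. It is not the author's implementation: the function names, the square size, and the linear schedule are all hypothetical, and it assumes that "progressively" means the number of squares grows as training advances.

```python
import numpy as np


def sprinkle_count(epoch, total_epochs, max_sprinkles=20):
    """Hypothetical linear schedule: later epochs get more squares."""
    frac = (epoch + 1) / total_epochs
    return max(1, round(max_sprinkles * frac))


def progressive_sprinkles(image, n_sprinkles, size=4, rng=None):
    """Return a copy of `image` (H x W x C) with small black squares
    placed at random positions. The square size is an assumed default."""
    rng = np.random.default_rng() if rng is None else rng
    out = image.copy()
    h, w = out.shape[:2]
    size = min(size, h, w)  # keep each square inside the image
    for _ in range(n_sprinkles):
        # The upper bound of integers() is exclusive, so the square always fits.
        top = rng.integers(0, h - size + 1)
        left = rng.integers(0, w - size + 1)
        out[top:top + size, left:left + size] = 0
    return out


# Usage: an all-white 32x32 RGB image stands in for a cell image.
image = np.full((32, 32, 3), 255, dtype=np.uint8)
n = sprinkle_count(epoch=4, total_epochs=10)
augmented = progressive_sprinkles(image, n, rng=np.random.default_rng(0))
```

Because each square is small, any single one can hide only a small part of the image, which is the property the captions describe.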

Thus, if you randomly clip a ‘healthy’ half with CutMix or RICAP, or block out the infected area with a large black block via CutOut, you’d be showing the CNN a portion of the image that looks like a perfectly clean cell while teaching it that the cell was in fact infected.

That, of course, would not result in an intelligent classifier.

Examples of Cutout and CutMix (from the CutMix paper). Neither would work here, because either could lose or block the ‘infected’ part of the cell.
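The failure mode described above can be shown with a toy example. This is a sketch under stated assumptions, not code from the Cutout paper: the helper names, the 64x64 image, and the position of the ‘infected’ region are all hypothetical, and the block position is passed in explicitly for repeatability, whereas Cutout picks it at random.

```python
import numpy as np


def cutout(image, top, left, size):
    """Return a copy of a 2-D image with one size x size block zeroed out."""
    out = image.copy()
    out[top:top + size, left:left + size] = 0
    return out


def visible_fraction(image, region):
    """Fraction of pixels in region = (top, left, height, width) left nonzero."""
    top, left, h, w = region
    patch = image[top:top + h, left:left + w]
    return np.count_nonzero(patch) / patch.size


# Toy stand-in for a cell: every pixel is 1, and the (hypothetical)
# infected region is an 8x8 patch at row 10, column 10.
cell = np.ones((64, 64), dtype=np.uint8)
infected = (10, 10, 8, 8)

hidden = cutout(cell, top=4, left=4, size=32)    # block covers the patch
spared = cutout(cell, top=40, left=40, size=32)  # block misses the patch

print(visible_fraction(hidden, infected))  # 0.0
print(visible_fraction(spared, infected))  # 1.0
```

In the first case the image is still labeled ‘infected’ even though nothing that indicates infection remains visible.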

Thinking this over, I realized that if, instead of blocking out one large random block à la Cutout, which could leave no usable information to learn from (i.e. it…
