A recent bug discovery in PyTorch + NumPy got me thinking: how much does this bug impact adversarial robustness?
A couple of months ago, a post on Reddit highlighted a bug in PyTorch + NumPy that affects how data augmentation works (see image above): when a DataLoader uses multiple worker processes, every worker inherits the same NumPy random state, so np.random-based augmentations are silently duplicated across workers. Since nearly all of my projects use this combination, I read through the linked blog by Tanel Pärnamaa to see what it was all about. I was a bit shocked that it took our community this long to notice a bug this severe! Nearly all data-loaders use more than one worker, yet clearly not many people sit down to debug data augmentation at this level of their ML pipeline (otherwise we would not all have missed this for so long).
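For concreteness, here's a minimal sketch of the standard fix: re-seed NumPy inside every worker via `worker_init_fn`. The toy dataset below is just a stand-in for any `__getitem__` that calls `np.random`; it is not the pipeline I actually trained with.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset

class ToyAugmentedDataset(Dataset):
    """Stand-in for a dataset whose __getitem__ uses np.random for augmentation."""
    def __len__(self):
        return 1000

    def __getitem__(self, idx):
        # np.random here is exactly what the bug duplicates across workers
        return np.random.uniform(-1, 1, size=3).astype(np.float32)

def seed_worker(worker_id):
    # Workers are forked from the parent process, so they all inherit the same
    # NumPy RNG state. torch.initial_seed() already differs per worker
    # (base seed + worker_id), so use it to re-seed NumPy in each worker.
    np.random.seed(torch.initial_seed() % 2**32)

loader = DataLoader(ToyAugmentedDataset(), batch_size=4, num_workers=2,
                    worker_init_fn=seed_worker)
```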
Reading through this bug, I remembered how (proper) data augmentation had been proposed by authors at DeepMind as a means to reduce robust overfitting.
I chose the CIFAR-10 dataset: small enough to iterate experiments fast and challenging enough to observe performance gains.
Interestingly, standard training with the fixed data-augmentation pipeline hurt performance a bit, compared to using faulty augmentation:
| Model | Standard Accuracy (%) | Robust Accuracy (ε = 8/255) (%) |
|---|---|---|
| Standard | 89.140 | 0.000 |
| Standard (augmentation) | 94.720 | 0.000 |
| Standard (fixed augmentation) | 94.620 | 0.000 |
Not thinking much about the 0.1% performance drop (probably statistical noise, right?), I ran adversarial training with both the faulty and the fixed augmentation pipelines:
| Model | Standard Accuracy (%) | Robust Accuracy (ε = 8/255) (%) | Robust Accuracy (ε = 16/255) (%) |
|---|---|---|---|
| Robust | 79.520 | 44.370 | 15.680 |
| Robust (augmentation) | 86.320 | 51.400 | 17.480 |
| Robust (fixed augmentation) | 86.730 | 51.880 | 17.570 |
As visible here, there’s an absolute 0.48% gain in robust accuracy for $\epsilon=\frac{8}{255}$ and a 0.09% gain for $\epsilon=\frac{16}{255}$ when using the fixed augmentation pipeline. Although the 0.09% is not very significant, the 0.48% improvement seems non-trivial, especially compared to the kind of performance differences reported on benchmarks for this dataset. Additionally, accuracy on clean data sees an improvement as well: an absolute 0.41% change.
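For readers who haven't implemented it, here is a rough sketch of the kind of $L_\infty$ PGD adversarial training these numbers come from. The step size, number of steps, and the training-step structure below are common defaults for illustration, not necessarily the exact settings behind the tables above.

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8/255, alpha=2/255, steps=7):
    """L_inf PGD attack (alpha and steps are illustrative defaults)."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        # Ascend along the sign of the gradient, then project back into the
        # eps-ball around x and clip to the valid image range [0, 1].
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = ((x + delta).clamp(0, 1) - x).detach().requires_grad_(True)
    return (x + delta).detach()

def adv_train_step(model, x, y, optimizer):
    # Standard adversarial training: train on PGD examples instead of clean ones.
    x_adv = pgd_linf(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```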
Not wanting to make any claims based on experiments on just the $L_\infty$ norm, I reran the same set of experiments for the $L_2$ norm:
| Model | Standard Accuracy (%) | Robust Accuracy (ε = 0.5) (%) | Robust Accuracy (ε = 1) (%) |
|---|---|---|---|
| Robust | 78.190 | 61.740 | 42.830 |
| Robust (augmentation) | 80.560 | 67.200 | 51.140 |
| Robust (fixed augmentation) | 81.070 | 67.620 | 51.220 |
Performance gains appear in this case as well. Accuracy on clean data bumps up by 0.51%, while robust accuracy improves by 0.42% at $\epsilon=0.5$ and by 0.08% at $\epsilon=1$.
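The only thing that changes for the $L_2$ threat model is the attack's step and projection; a rough sketch (again with an illustrative step size and step count):

```python
import torch
import torch.nn.functional as F

def pgd_l2(model, x, y, eps=0.5, alpha=0.1, steps=7):
    """L_2 PGD attack (alpha and steps are illustrative defaults)."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        # Step along the per-example L2-normalized gradient...
        grad_norm = grad.flatten(1).norm(dim=1).clamp(min=1e-12).view(-1, 1, 1, 1)
        delta = delta + alpha * grad / grad_norm
        # ...then project delta back onto the L2 ball of radius eps.
        delta_norm = delta.flatten(1).norm(dim=1).clamp(min=1e-12).view(-1, 1, 1, 1)
        delta = delta * (eps / delta_norm).clamp(max=1.0)
        delta = ((x + delta).clamp(0, 1) - x).detach().requires_grad_(True)
    return (x + delta).detach()
```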
Fixing data augmentation can have a non-trivial (and positive) impact when training for robustness. Anyone training robust models (especially with adversarial training, since that is what I tested on) should fix their data-loaders.