Diffusion-based Defences for Adversarial Audio — “zero” Sample
This page compares versions of the same spoken digit “zero”: the clean recording, adversarially perturbed variants (PGD and FakeBob), and the outputs of three purification methods, namely MultiDiff (our defence), AudioPure, and DualPure.
It is easier to hear the difference between the samples when wearing headphones.
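A minimal sketch of the two PGD variants, assuming the number after the norm (10 or 100) is the iteration count, as the PGD10 to PGD100 columns of Table 1 suggest. The toy linear classifier, the `eps` and `alpha` values, and the waveform range [-1, 1] are illustrative assumptions, not the settings used in these experiments. The variants differ only in the step and the projection: L∞ takes a sign step and clips each sample to the eps-box, while L2 takes a normalised gradient step and rescales the whole perturbation back onto the eps-ball.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def ce_loss(x, W, b, label):
    """Cross-entropy of a toy linear classifier (hypothetical stand-in for the audio model)."""
    return -np.log(softmax(W @ x + b)[label])

def ce_grad(x, W, b, label):
    """Analytic gradient of ce_loss with respect to the input x."""
    p = softmax(W @ x + b)
    p[label] -= 1.0
    return W.T @ p

def pgd(x0, W, b, label, eps, alpha, steps, norm="linf"):
    """Untargeted PGD: ascend the loss, then project back into the eps-ball around x0."""
    x = x0.copy()
    for _ in range(steps):
        g = ce_grad(x, W, b, label)
        if norm == "linf":
            x = x + alpha * np.sign(g)                  # steepest-ascent step under L-inf
            x = np.clip(x, x0 - eps, x0 + eps)          # project onto the L-inf ball
        else:
            x = x + alpha * g / (np.linalg.norm(g) + 1e-12)  # normalised step under L2
            delta = x - x0
            norm_delta = np.linalg.norm(delta)
            if norm_delta > eps:
                delta *= eps / norm_delta               # project onto the L2 ball
            x = x0 + delta
        x = np.clip(x, -1.0, 1.0)                       # keep samples in the waveform range
    return x

# Usage on random data; sizes, eps and step sizes are illustrative assumptions.
rng = np.random.default_rng(0)
W, b = rng.normal(size=(10, 100)), np.zeros(10)
x0 = rng.uniform(-0.5, 0.5, size=100)
label = int(np.argmax(W @ x0 + b))
x_linf = pgd(x0, W, b, label, eps=0.01, alpha=0.002, steps=10, norm="linf")
x_l2 = pgd(x0, W, b, label, eps=0.2, alpha=0.05, steps=10, norm="l2")
```

Both projections keep the perturbation inside its budget, so a higher iteration count gives the attack more steps to search within the same ball rather than a larger perturbation.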
Clean
PGD-Linf 10
PGD-Linf 100
PGD-L2 10
PGD-L2 100
FAKEBOB-Spd20
FAKEBOB-Spd200
MultiDiff: Clean
MultiDiff: PGD-Linf 10
MultiDiff: PGD-Linf 100
MultiDiff: PGD-L2 10
MultiDiff: PGD-L2 100
MultiDiff: FAKEBOB-Spd20
MultiDiff: FAKEBOB-Spd200
AudioPure: Clean
AudioPure: PGD-Linf 10
AudioPure: PGD-Linf 100
AudioPure: PGD-L2 10
AudioPure: PGD-L2 100
AudioPure: FAKEBOB-Spd20
AudioPure: FAKEBOB-Spd200
DualPure (wave): Clean
DualPure (wave): PGD-Linf 10
DualPure (wave): PGD-Linf 100
DualPure (wave): PGD-L2 10
DualPure (wave): PGD-L2 100
DualPure (wave): FAKEBOB-Spd20
DualPure (wave): FAKEBOB-Spd200
Note: the DualPure audio examples above contain only the first, wave-domain stage of its purification; its second stage operates in the frequency domain.
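For readers unfamiliar with wave-domain diffusion purification, this is a sketch of the general idea written with standard DDPM equations: add Gaussian noise to the (possibly adversarial) waveform up to a small diffusion step `t_star`, then run the reverse process back to step 0. It is not the AudioPure, DualPure, or MultiDiff code. `eps_model` stands in for a trained noise-prediction network such as DiffWave and is a zero-output placeholder here; the linear noise schedule, `t_star`, and the 16 kHz sample rate are assumptions.

```python
import numpy as np

def linear_betas(T=200, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule beta_1..beta_T (values are an illustrative assumption)."""
    return np.linspace(beta_start, beta_end, T)

def purify(x_adv, eps_model, t_star, betas, rng):
    """Diffuse x_adv forward to step t_star, then denoise back to step 0 (DDPM ancestral sampling)."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    # Forward process in closed form: x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) noise.
    a = alpha_bar[t_star - 1]
    x = np.sqrt(a) * x_adv + np.sqrt(1.0 - a) * rng.standard_normal(x_adv.shape)
    # Reverse process; array index i holds the parameters of diffusion step i + 1.
    for i in range(t_star - 1, -1, -1):
        eps_hat = eps_model(x, i)  # predicted noise at this step
        mean = (x - betas[i] / np.sqrt(1.0 - alpha_bar[i]) * eps_hat) / np.sqrt(alphas[i])
        if i > 0:
            var = betas[i] * (1.0 - alpha_bar[i - 1]) / (1.0 - alpha_bar[i])  # posterior variance
            x = mean + np.sqrt(var) * rng.standard_normal(x.shape)
        else:
            x = mean  # the final step to x_0 adds no noise
    return x

# Usage; the zero-output model is a hypothetical placeholder for a trained network.
rng = np.random.default_rng(0)
x_adv = rng.uniform(-0.1, 0.1, size=16000)  # one second of audio at an assumed 16 kHz
x_pur = purify(x_adv, lambda x, i: np.zeros_like(x), t_star=2, betas=linear_betas(), rng=rng)
```

The intent of purification is that a trained network's reverse steps wash out the injected Gaussian noise together with much of the adversarial perturbation; the zero placeholder only demonstrates the shapes and the update rule.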
Spd: samples per draw (samples_per_draw in Table 1)
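FAKEBOB is a black-box attack, so it cannot read gradients and must estimate them from model queries. As far as we understand its implementation, it uses NES-style estimation in which `samples_per_draw` sets how many random perturbations are queried per gradient estimate. A minimal sketch of such an estimator with antithetic Gaussian samples; `score_fn`, `sigma`, and the linear test function are hypothetical.

```python
import numpy as np

def nes_gradient(score_fn, x, sigma, samples_per_draw, rng):
    """NES-style gradient estimate of score_fn at x from black-box queries only.

    Uses antithetic pairs (u, -u), so samples_per_draw queries give samples_per_draw // 2 pairs.
    """
    pairs = samples_per_draw // 2
    grad = np.zeros_like(x)
    for _ in range(pairs):
        u = rng.standard_normal(x.shape)
        grad += (score_fn(x + sigma * u) - score_fn(x - sigma * u)) * u
    return grad / (2.0 * sigma * pairs)

# Usage with a hypothetical linear score whose true gradient is c.
rng = np.random.default_rng(0)
c = np.array([1.0, 2.0, 3.0, 4.0])
g_hat = nes_gradient(lambda z: float(c @ z), np.zeros(4), sigma=1e-3, samples_per_draw=200, rng=rng)
cosine = float(g_hat @ c / (np.linalg.norm(g_hat) * np.linalg.norm(c)))
```

More samples per draw lower the variance of each estimate at the cost of more queries per iteration, which is consistent with the undefended model dropping from 87.2 to 36.6 when Spd goes from 20 to 200 in Table 1.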
Table 1: Accuracy (%) of each defence under adaptive attacks (experiment run 5 times)
L∞ and L2 columns are white-box PGD attacks; FAKEBOB columns are black-box attacks run for iter=200 with the given samples per draw (Spd).
Defence | Clean | L∞ PGD10 | L∞ PGD30 | L∞ PGD50 | L∞ PGD70 | L∞ PGD100 | L2 PGD10 | L2 PGD30 | L2 PGD50 | L2 PGD70 | L2 PGD100 | FAKEBOB Spd=20 | FAKEBOB Spd=200 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
None | 98 | 0.2 | 0 | 0 | 0 | 0 | 1.2 | 0 | 0 | 0 | 0 | 87.2 | 36.6 |
AudioPure (DiffWave) | 95.2 | 80 | 72 | 66.8 | 65.2 | 63 | 59.6 | 40 | 35.4 | 31.2 | 29.2 | 78.6 | 77.62* |
DualPure | 95.3 | 84.4 | 80 | 76.2 | 74.8 | 74 | 55 | 46.6 | 44 | 43 | 41.2 | 85.6 | 86 |
MultiDiff — rand_init = 3, select = logit_margin | 97 | 87.2 | 83 | 82.6 | 82.6 | 80.6 | 54.2 | 44.8 | 42.8 | 43 | 41.6 | 93 | 93 |
MultiDiff — rand_init = 8, select = logit_margin | 97 | 87.6 | 84.4 | 84 | 83.8 | 83.2 | 57.6 | 48.6 | 46.4 | 45 | 43.8 | 95.6 | 95.6 |
Note: '*' marks a result computed on 53 test samples instead of 100 because of limited resources.
Green cells indicate best performance for that column.
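The `rand_init` and `select = logit_margin` settings in Table 1 are not defined on this page. One plausible reading, sketched here purely as an assumption and not as the MultiDiff implementation: purify the input `rand_init` times with independent noise, classify each purified candidate, and keep the prediction whose top logit beats the runner-up by the widest margin. `purify_fn` and `classify_fn` are hypothetical stand-ins, and the usage lines feed pre-computed logits so the selection is easy to follow.

```python
import numpy as np

def logit_margin(logits):
    """Gap between the largest and second-largest logit."""
    top_two = np.sort(logits)[-2:]
    return float(top_two[1] - top_two[0])

def select_by_logit_margin(x, purify_fn, classify_fn, rand_init, rng):
    """Run rand_init stochastic purifications and keep the most confident prediction."""
    best_pred, best_margin = None, -np.inf
    for _ in range(rand_init):
        logits = classify_fn(purify_fn(x, rng))  # a real purify_fn would draw fresh noise per call
        margin = logit_margin(logits)
        if margin > best_margin:
            best_pred, best_margin = int(np.argmax(logits)), margin
    return best_pred, best_margin

# Usage with hypothetical stand-ins: purify_fn returns pre-computed candidates and
# classify_fn treats each candidate as its own logit vector.
candidates = iter([np.array([0.1, 0.2, 0.0]), np.array([2.0, 0.5, 0.1]), np.array([0.3, 0.9, 0.8])])
pred, margin = select_by_logit_margin(
    x=np.zeros(3),
    purify_fn=lambda x, rng: next(candidates),
    classify_fn=lambda z: z,
    rand_init=3,
    rng=np.random.default_rng(0),
)
```

Under this reading, a larger `rand_init` means more candidates to choose from, which would fit both the small accuracy gains and the higher latency of rand_init = 8.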
Table 2: Latency of the different defence methods
Defence | Time |
---|---|
AudioPure (DiffWave) | 0.4986s |
DualPure | 0.0660s |
MultiDiff, rand_init = 3 | 0.0678s |
MultiDiff, rand_init = 8 | 0.1235s |
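Table 2 does not state the hardware or whether times are per sample. The ratios here are plain arithmetic on the reported numbers: MultiDiff with rand_init = 3 is about 7.35x faster than AudioPure and close to DualPure, while rand_init = 8 takes about 1.82x the time of rand_init = 3. The `median_latency` helper is a hypothetical example of how such timings could be collected, not the procedure used for the table.

```python
import statistics
import time

def median_latency(fn, arg, repeats=20):
    """Median wall-clock time of fn(arg) over several runs (hypothetical timing helper)."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        times.append(time.perf_counter() - start)
    return statistics.median(times)

# Latencies from Table 2, in seconds.
latency_s = {
    "AudioPure (DiffWave)": 0.4986,
    "DualPure": 0.0660,
    "MultiDiff, rand_init = 3": 0.0678,
    "MultiDiff, rand_init = 8": 0.1235,
}
baseline = latency_s["AudioPure (DiffWave)"]
speedup_vs_audiopure = {name: round(baseline / t, 2) for name, t in latency_s.items()}
cost_of_8_vs_3 = round(latency_s["MultiDiff, rand_init = 8"] / latency_s["MultiDiff, rand_init = 3"], 2)
```

Taking the median rather than the mean keeps one-off stalls (first-call warm-up, scheduler noise) from dominating the reported latency.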