Diffusion-based Defences for Adversarial Audio — “zero” Sample

This page compares versions of the same spoken digit “zero”: the clean recording, adversarially perturbed variants (PGD, FakeBob), and the outputs of three purification methods: MultiDiff (our defence), AudioPure, and DualPure.

It is easier to hear the difference between the samples when wearing headphones.

Clean

PGD-Linf 10

PGD-Linf 100

PGD-L2 10

PGD-L2 100

FAKEBOB-Spd20

FAKEBOB-Spd200

MultiDiff: Clean

MultiDiff: PGD-Linf 10

MultiDiff: PGD-Linf 100

MultiDiff: PGD-L2 10

MultiDiff: PGD-L2 100

MultiDiff: FAKEBOB-Spd20

MultiDiff: FAKEBOB-Spd200

AudioPure: Clean

AudioPure: PGD-Linf 10

AudioPure: PGD-Linf 100

AudioPure: PGD-L2 10

AudioPure: PGD-L2 100

AudioPure: FAKEBOB-Spd20

AudioPure: FAKEBOB-Spd200

DualPure (wave): Clean

DualPure (wave): PGD-Linf 10

DualPure (wave): PGD-Linf 100

DualPure (wave): PGD-L2 10

DualPure (wave): PGD-L2 100

DualPure (wave): FAKEBOB-Spd20

DualPure (wave): FAKEBOB-Spd200

Note: the audio examples for DualPure above show only the first step of their purification in the wave domain; their second defence operates in the frequency domain.

Spd: Samples per draw

Table 1: Performance of different methods against adaptive attacks (each experiment was run 5 times)

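Column groups: Clean; L∞ white box (PGD10 to PGD100); L2 white box (PGD10 to PGD100); FAKEBOB black box (iter = 200, with samples_per_draw = 20 or 200).

| Defence | Clean | L∞ PGD10 | L∞ PGD30 | L∞ PGD50 | L∞ PGD70 | L∞ PGD100 | L2 PGD10 | L2 PGD30 | L2 PGD50 | L2 PGD70 | L2 PGD100 | FAKEBOB Spd20 | FAKEBOB Spd200 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| None | 98 | 0.2 | 0 | 0 | 0 | 0 | 1.2 | 0 | 0 | 0 | 0 | 87.2 | 36.6 |
| AudioPure (DiffWave) | 95.2 | 80 | 72 | 66.8 | 65.2 | 63 | 59.6 | 40 | 35.4 | 31.2 | 29.2 | 78.6 | 77.62* |
| DualPure | 95.3 | 84.4 | 80 | 76.2 | 74.8 | 74 | 55 | 46.6 | 44 | 43 | 41.2 | 85.6 | 86 |
| MultiDiff, rand_init = 3, select = logit_margin | 97 | 87.2 | 83 | 82.6 | 82.6 | 80.6 | 54.2 | 44.8 | 42.8 | 43 | 41.6 | 93 | 93 |
| MultiDiff, rand_init = 8, select = logit_margin | 97 | 87.6 | 84.4 | 84 | 83.8 | 83.2 | 57.6 | 48.6 | 46.4 | 45 | 43.8 | 95.6 | 95.6 |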

Note: '*' marks a result that could not be calculated with the same number of test samples due to limited resources; it was calculated with 53 samples instead of 100.

Green cells indicate best performance for that column.
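The MultiDiff rows are labelled with rand_init and select = logit_margin, but this page does not describe the selection rule itself. The sketch below shows one plausible reading, stated as an assumption: purify the input from several random initialisations, score each candidate by the gap between its two largest classifier logits, and keep the candidate with the largest gap. The function names and the example logits are hypothetical and are not taken from the MultiDiff implementation.

```python
def logit_margin(logits):
    """Gap between the largest and second-largest logit of one candidate."""
    second, first = sorted(logits)[-2:]
    return first - second


def select_by_margin(candidate_logits):
    """Return the index of the candidate with the largest logit margin.

    ASSUMPTION: "select = logit_margin" is read here as "keep the most
    confidently classified candidate"; the source does not confirm this.
    """
    margins = [logit_margin(row) for row in candidate_logits]
    best = max(range(len(margins)), key=margins.__getitem__)
    return best, margins


# Hypothetical classifier logits for rand_init = 3 purified candidates.
candidate_logits = [
    [2.0, 1.5, 0.1],
    [4.0, 0.5, 0.2],
    [1.0, 0.9, 0.8],
]
best, margins = select_by_margin(candidate_logits)
print("selected candidate:", best)
```

Under this reading, a larger rand_init gives the selector more candidates to choose from, at the cost of more purification passes.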

Table 2: Latency of the different defence methods

| Defence | Time (s) |
|---|---|
| AudioPure (DiffWave) | 0.4986 |
| DualPure | 0.0660 |
| MultiDiff, rand_init = 3 | 0.0678 |
| MultiDiff, rand_init = 8 | 0.1235 |
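To make the latency figures easier to compare, the following sketch expresses each method's time relative to a chosen baseline. The numbers are copied from Table 2; treating the two MultiDiff entries as seconds is an assumption, since the source omits the unit on those rows. The helper name relative_latency is illustrative.

```python
# Latencies copied from Table 2.
# ASSUMPTION: the MultiDiff values are in seconds like the other rows.
latency_s = {
    "AudioPure (DiffWave)": 0.4986,
    "DualPure": 0.0660,
    "MultiDiff, rand_init = 3": 0.0678,
    "MultiDiff, rand_init = 8": 0.1235,
}


def relative_latency(latencies, baseline):
    """Divide every latency by the latency of the baseline method."""
    base = latencies[baseline]
    return {name: t / base for name, t in latencies.items()}


ratios = relative_latency(latency_s, "DualPure")
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the DualPure latency")
```

Choosing a different baseline key, for example "AudioPure (DiffWave)", gives the same comparison from the slowest method's point of view.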