Diffusion-based Defences for Adversarial Audio — “zero” Sample

This page presents several versions of the same spoken digit, “zero”: the clean recording, adversarially perturbed variants (PGD and FakeBob attacks), and the output of each purification method, namely MultiDiff (our defence), AudioPure, and DualPure.

It is easier to hear the difference between the samples when wearing headphones.

Clean

PGD-Linf 10

PGD-Linf 100

PGD-L2 10

PGD-L2 100

FAKEBOB-Spd20

FAKEBOB-Spd200

MultiDiff: Clean

MultiDiff: PGD-Linf 10

MultiDiff: PGD-Linf 100

MultiDiff: PGD-L2 10

MultiDiff: PGD-L2 100

MultiDiff: FAKEBOB-Spd20

MultiDiff: FAKEBOB-Spd200

AudioPure: Clean

AudioPure: PGD-Linf 10

AudioPure: PGD-Linf 100

AudioPure: PGD-L2 10

AudioPure: PGD-L2 100

AudioPure: FAKEBOB-Spd20

AudioPure: FAKEBOB-Spd200

DualPure (wave): Clean

DualPure (wave): PGD-Linf 10

DualPure (wave): PGD-Linf 100

DualPure (wave): PGD-L2 10

DualPure (wave): PGD-L2 100

DualPure (wave): FAKEBOB-Spd20

DualPure (wave): FAKEBOB-Spd200

Note: the DualPure examples above contain only the output of its first purification step, which operates in the wave domain; its second step operates in the frequency domain and is not included in these samples.

Spd: samples per draw (the FakeBob samples_per_draw setting)

Table 1: Performance of each defence method against adaptive attacks (each experiment was run 5 times)

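Defence	Clean	PGD10 (L∞, white-box)	PGD30 (L∞, white-box)	PGD50 (L∞, white-box)	PGD70 (L∞, white-box)	PGD100 (L∞, white-box)	PGD10 (L2, white-box)	PGD30 (L2, white-box)	PGD50 (L2, white-box)	PGD70 (L2, white-box)	PGD100 (L2, white-box)	FakeBob (black-box), iter=200, samples_per_draw=20	FakeBob (black-box), iter=200, samples_per_draw=200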
None	98	0.2	0	0	0	0	1.2	0	0	0	0	87.2	36.6
AudioPure (DiffWave)	95.2	80	72	66.8	65.2	63	59.6	40	35.4	31.2	29.2	78.6	77.62*
DualPure	95.3	84.4	80	76.2	74.8	74	55	46.6	44	43	41.2	85.6	86
MultiDiff — rand_init = 3, select = logit_margin	97	87.2	83	82.6	82.6	80.6	54.2	44.8	42.8	43	41.6	93	93
MultiDiff — rand_init = 8, select = logit_margin	97	87.6	84.4	84	83.8	83.2	57.6	48.6	46.4	45	43.8	95.6	95.6

Note: '*' marks a result that was calculated on 53 test samples instead of 100 because of limited resources.

Green cells indicate best performance for that column.
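
The MultiDiff rows in Table 1 are labelled with rand_init and select = logit_margin, but this page does not describe the selection rule. One plausible reading, offered here only as an assumption, is that the input is purified from several random initialisations and the candidate whose classifier output has the largest gap between its top two logits is kept. The sketch below illustrates that reading; the function names and the example logits are hypothetical and are not taken from the MultiDiff code.

```python
import numpy as np


def logit_margin(logits):
    """Gap between the largest and second-largest logit in each row."""
    ordered = np.sort(logits, axis=-1)
    return ordered[..., -1] - ordered[..., -2]


def select_by_logit_margin(candidate_logits):
    """Return (candidate index, predicted class) for the largest-margin candidate.

    Assumption: one row of classifier logits per purified candidate,
    i.e. shape (rand_init, n_classes).
    """
    logits = np.asarray(candidate_logits, dtype=float)
    best = int(np.argmax(logit_margin(logits)))
    return best, int(np.argmax(logits[best]))


# Hypothetical classifier logits for rand_init = 3 purified candidates.
example_logits = [
    [2.0, 1.5, 0.0],
    [0.1, 4.0, 0.5],
    [1.0, 1.0, 1.0],
]
best_index, predicted_class = select_by_logit_margin(example_logits)
```

Under this reading, a larger rand_init gives the selector more candidates to choose from at the cost of more purification runs, which would be consistent with the higher latency reported for rand_init = 8 in Table 2.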

Table 2: Latency of the different defence methods

Defence	Time
AudioPure (DiffWave)	0.4986s
DualPure	0.0660s
MultiDiff, rand_init = 3	0.0678s
MultiDiff, rand_init = 8	0.1235s
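
To make the latency figures easier to compare, the short sketch below expresses each method's time relative to AudioPure (DiffWave). It uses only the values from Table 2 and assumes that all four are in seconds; the helper name speedup_over is hypothetical.

```python
# Latencies copied from Table 2 (assumed to all be in seconds).
latency_s = {
    "AudioPure (DiffWave)": 0.4986,
    "DualPure": 0.0660,
    "MultiDiff, rand_init = 3": 0.0678,
    "MultiDiff, rand_init = 8": 0.1235,
}


def speedup_over(baseline, latencies):
    """Return how many times faster each method is than the baseline."""
    base = latencies[baseline]
    return {name: base / t for name, t in latencies.items()}


speedups = speedup_over("AudioPure (DiffWave)", latency_s)
for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x")
```

By this calculation MultiDiff is roughly 7.4 times faster than AudioPure with rand_init = 3 and roughly 4.0 times faster with rand_init = 8, while DualPure is roughly 7.6 times faster.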