Danskin's Theorem:
$$\frac{\partial}{\partial \theta}\max_\delta L(f_\theta(x+\delta,y))=\frac{\partial}{\partial \theta}L(f_\theta(x+\delta^*,y))$$
We only care about the worst-case attack samples: for a batch of samples, first find the inner maximizer $\delta^*=\arg\max_\delta L(f_\theta(x+\delta,y))$, then do gradient descent on $\theta$ using $\delta^*$ (as sketched below).
Models become more semantically meaningful through adversarial training.
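A minimal PyTorch-style sketch of this procedure (the names `model`, `criterion`, `optimizer` and the values of `eps`, `alpha`, `pgd_steps` are illustrative assumptions, not from the notes): the inner loop approximates $\delta^*$ with projected gradient ascent, and the outer step uses Danskin's theorem by differentiating the loss at $x+\delta^*$ with respect to $\theta$ only.

```python
import torch

def pgd_delta(model, criterion, x, y, eps, alpha, steps):
    """Approximate the inner maximizer delta* with L-infinity PGD (ascent on delta)."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = criterion(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()   # gradient ascent step on delta
            delta.clamp_(-eps, eps)              # project back into the eps-ball
        delta.grad.zero_()
    return delta.detach()

def adversarial_training_step(model, criterion, optimizer, x, y,
                              eps=8/255, alpha=2/255, pgd_steps=10):
    """One outer GD step on theta: by Danskin's theorem we only need the
    gradient of the loss evaluated at x + delta*."""
    delta_star = pgd_delta(model, criterion, x, y, eps, alpha, pgd_steps)
    optimizer.zero_grad()                        # clear grads accumulated by the inner loop
    loss = criterion(model(x + delta_star), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```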
Successive Halving (SH) Algorithm (*)
Assume $v_1 > v_2 \ge \dots \ge v_n$ with $v_i\in[0,1]$, and let $\Delta_i = v_1 - v_i$. With probability $1-\delta$, the algorithm finds the best arm with budget $B=O\left(\max_{i>1}\frac{i}{\Delta_i^2}\log n\,\log(\log n/\delta)\right)$ (a code sketch follows below).
Hyperparameter tuning
Randomness is essential.
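A minimal sketch of the halving loop under simple assumptions (Bernoulli arm rewards, a fixed total pull budget `B`; the `pull` interface and the example values are illustrative, not from the notes): split the budget across $\lceil\log_2 n\rceil$ rounds, pull every surviving arm equally, and keep the better half each round.

```python
import math
import random

def successive_halving(arms, B, pull):
    """arms: candidate identifiers; B: total pull budget;
    pull(arm) -> stochastic reward in [0, 1]."""
    survivors = list(arms)
    rounds = math.ceil(math.log2(len(survivors)))
    for _ in range(rounds):
        if len(survivors) == 1:
            break
        # Spend this round's share of the budget evenly over the survivors.
        pulls_per_arm = max(1, B // (len(survivors) * rounds))
        means = {a: sum(pull(a) for _ in range(pulls_per_arm)) / pulls_per_arm
                 for a in survivors}
        # Keep the better half (rounded up).
        survivors.sort(key=lambda a: means[a], reverse=True)
        survivors = survivors[:math.ceil(len(survivors) / 2)]
    return survivors[0]

# Illustrative use: arm i has Bernoulli(v_i) reward, arm 0 is best.
v = [0.9, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
best = successive_halving(range(len(v)), B=4000,
                          pull=lambda i: float(random.random() < v[i]))
print("selected arm:", best)
```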
Database: a histogram in $\mathbb{N}^{|X|}$ for a discrete set $X$ (the categories).
$M$ is $(\epsilon,\delta)$-differentially private if for all $x,y\in \mathbb{N}^{|X|}$ with $|x-y|_1\le 1$ and all $S\subseteq M(\mathbb{N}^{|X|})$,
$$\Pr[M(x)\in S]\le e^{\epsilon}\Pr[M(y)\in S]+\delta$$
Differential privacy is immune to post-processing. Proof:
With an $(\epsilon,0)$-differentially private mechanism (DP mechanism), the voting result will not change much when one voter changes their vote:
$$\begin{aligned} E_{a\sim f(M(x))}[u_i(a)]&=\sum_{a\in A} u_i(a)\Pr_{f(M(x))}[a]\\&\le \sum_{a\in A}u_i(a)\exp(\epsilon)\Pr_{f(M(y))}[a]\\&=\exp(\epsilon)\,E_{a\sim f(M(y))}[u_i(a)] \end{aligned}$$
Here $f$ is a (post-processing) map to the feature, and we only consider the expected voting utility $u_i(a)$ of the feature $a$.
Laplace Mechanism: achieves $(\epsilon,0)$-DP by simply adding random Laplace noise to $f(x)$, $x\in \mathbb{N}^{|X|}$.
Assume $f:\mathbb{N}^{|X|}\to \mathbb{R}^{k}$ with $\ell_1$-sensitivity $\Delta f=\max_{|x-y|_1\le 1}|f(x)-f(y)|_1$.
Definition: $M_L(x,f,\epsilon)=f(x)+(Y_1,\dots,Y_k)$, where the $Y_i$ are i.i.d. $\mathrm{Lap}(\Delta f/\epsilon)$.
Proof. The probability density of $M_L(x,f,\epsilon)$ is $p_x(z)\propto\prod_{i=1}^k\exp\left(-\frac{\epsilon|f(x)_i-z_i|}{\Delta f}\right)$, and it is easy to show $p_x(z)/p_y(z)\le \exp(\epsilon)$.
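A minimal numpy sketch of this mechanism (the function name, the histogram example, and the parameter values are illustrative assumptions, not from the notes), assuming the $\ell_1$-sensitivity `delta_f` of the query is known:

```python
import numpy as np

def laplace_mechanism(x, f, epsilon, delta_f, rng=None):
    """Return f(x) + (Y_1, ..., Y_k) with Y_i i.i.d. Laplace(delta_f / epsilon)."""
    if rng is None:
        rng = np.random.default_rng()
    fx = np.asarray(f(x), dtype=float)
    noise = rng.laplace(loc=0.0, scale=delta_f / epsilon, size=fx.shape)
    return fx + noise

# Illustrative use: a histogram query has l1-sensitivity 1
# (adding/removing one record changes exactly one count by 1).
histogram = np.array([40, 25, 10, 3])
private_hist = laplace_mechanism(histogram, f=lambda h: h, epsilon=0.5, delta_f=1.0)
```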
Accuracy: for $\delta\in(0,1]$,
$$\Pr\left[|f(x)-M_L(x,f,\epsilon)|_\infty\ge \ln\left(\frac{k}{\delta}\right)\cdot \left(\frac{\Delta f}{\epsilon}\right)\right]\le \delta$$
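A quick Monte Carlo check of this tail bound (the values of $k$, $\epsilon$, $\Delta f$, $\delta$ below are illustrative): the bound follows from a union bound over $k$ Laplace tails, so the empirical failure rate should land near or below $\delta$.

```python
import numpy as np

k, epsilon, delta_f, delta = 20, 0.5, 1.0, 0.05
threshold = np.log(k / delta) * (delta_f / epsilon)

rng = np.random.default_rng(0)
trials = 100_000
# The error f(x) - M_L(x, f, eps) is just the Laplace noise vector itself.
noise = rng.laplace(scale=delta_f / epsilon, size=(trials, k))
failure_rate = np.mean(np.max(np.abs(noise), axis=1) >= threshold)
print(f"empirical failure rate: {failure_rate:.4f}  (bound: delta = {delta})")
```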