Intelligence is positioning
InfoNCE loss:

$$L(q,p_1,\{p_i\}_{i=2}^N)=-\log \frac{\exp(-\|f(q)-f(p_1)\|^2/(2\tau))}{\sum_{i=1}^{N}\exp(-\|f(q)-f(p_i)\|^2/(2\tau))}$$
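The loss above is a softmax over (negative, temperature-scaled) squared distances, with the positive $p_1$ in slot 0. A minimal numpy sketch (function name and array layout are my own; row 0 of `ps` is assumed to hold the positive $f(p_1)$):

```python
import numpy as np

def infonce_gaussian(q, ps, tau=0.5):
    """InfoNCE with a Gaussian kernel:
    L = -log exp(-||f(q)-f(p_1)||^2/(2*tau)) / sum_i exp(-||f(q)-f(p_i)||^2/(2*tau)).
    `q` is the query embedding f(q); `ps[i]` is f(p_{i+1}), with the positive in row 0."""
    d2 = np.sum((ps - q) ** 2, axis=1)       # squared distance to each p_i
    logits = -d2 / (2.0 * tau)
    logits -= logits.max()                   # shift for numerical stability
    log_prob = logits - np.log(np.exp(logits).sum())  # log-softmax
    return -log_prob[0]                      # negative log-probability of the positive
```

Pulling the positive closer to the query (relative to the negatives) lowers the loss, which is exactly the contrastive pressure the formula encodes.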
Learn $Z=f(x)$: map the original data points into a space where semantic similarity is captured naturally.
Suppose we are given a similarity matrix $\pi$ over the dataset in advance, where $\pi_{i,j}$ is the similarity between data points $i$ and $j$. We want the similarity matrix $K_Z$ of the embeddings $f(x)$ to match the manually specified $\pi$ on the raw data $x$. Letting $W_X\sim \pi$ and $W_Z\sim K_Z$, we want these two neighbor samples to agree.
Minimize the cross-entropy loss: $H_{\pi}^{k}(Z)=-\mathbb{E}_{W_X\sim P(\cdot ;\pi)}[\log P(W_Z=W_X;K_Z)]$
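One concrete way to realize this objective: row-normalize both kernels into neighbor distributions and take the cross-entropy between them, averaged over anchor points. This is a sketch under that assumption (the function name is mine; the source does not fix the normalization):

```python
import numpy as np

def neighbor_cross_entropy(pi, K_Z, eps=1e-12):
    """H = -E_{W_X ~ P(.;pi)}[log P(W_Z = W_X; K_Z)].
    Each row of `pi` / `K_Z` is turned into a neighbor distribution over the
    other points; the loss is the average per-anchor cross-entropy."""
    P = pi / pi.sum(axis=1, keepdims=True)     # target neighbor distribution (data space)
    Q = K_Z / K_Z.sum(axis=1, keepdims=True)   # model neighbor distribution (embedding space)
    return -(P * np.log(Q + eps)).sum(axis=1).mean()
```

By Gibbs' inequality the loss is minimized (per row) when $K_Z$ induces the same neighbor distribution as $\pi$, which is the matching condition stated above.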
Data visualization: map data into a low-dimensional space (e.g. 2D).
SNE: same as NCA, want $q_{i,j}\sim \exp(-\|f(x_i)-f(x_j)\|^2/(2\sigma^2))$ to be similar to $p_{i,j}\sim \exp(-\|x_i-x_j\|^2/(2\sigma_i^2))$
Crowding problem: a low-dimensional map has too little volume for moderately distant pairs, so the Gaussian $q$ squeezes them together.
Solved by t-SNE: let $q_{i,j}\sim (1+\|y_j-y_i\|^2)^{-1}$ (Student t-distribution)
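The heavy-tailed $q$ is easy to compute in closed form. A minimal sketch of the low-dimensional affinities (function name mine; normalization over all pairs, as in symmetric t-SNE):

```python
import numpy as np

def tsne_q(Y):
    """Low-dimensional affinities q_{ij} proportional to (1 + ||y_i - y_j||^2)^(-1),
    i.e. a Student t-kernel with one degree of freedom, normalized over all pairs.
    The heavy tail lets moderately distant pairs sit farther apart, easing crowding."""
    d2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
    num = 1.0 / (1.0 + d2)
    np.fill_diagonal(num, 0.0)   # q_{ii} is defined as 0
    return num / num.sum()
```

Compared with the Gaussian in SNE, $(1+d^2)^{-1}$ decays polynomially, so a pair at moderate distance keeps non-negligible probability mass and the optimizer is not forced to crush it into the center of the map.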