Why Vanilla GAN is unstable
Loss Functions for Gen and Dis
The discriminator's loss is:
$dis_{loss} = -E_{x \sim P_r}[\log D(x)] - E_{x \sim P_g}[\log(1-D(x))]$ (1)
Here $P_r$ is the distribution of the real input and $P_g$ is the distribution of the generator's output.
For the generator, the loss takes one of two forms:
$$gen_{loss} = E_{x \sim P_g}[\log(1-D(x))]$$ (2)
$$gen_{loss} = E_{x \sim P_g}[-\log D(x)]$$ (3)
(2) is the original form; (3) is the improved one.
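To make the three losses concrete, here is a minimal sketch that evaluates (1), (2), and (3) as sample means over discriminator outputs. The function names and the example values in `d_real` and `d_fake` are illustrative assumptions, not taken from any particular GAN implementation.

```python
import math

def dis_loss(d_real, d_fake):
    """Equation (1): -E[log D(x)] on real samples - E[log(1 - D(x))] on generated samples."""
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return -real_term - fake_term

def gen_loss_original(d_fake):
    """Equation (2): E[log(1 - D(x))] over generated samples."""
    return sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)

def gen_loss_improved(d_fake):
    """Equation (3): E[-log D(x)] over generated samples."""
    return sum(-math.log(d) for d in d_fake) / len(d_fake)

# Hypothetical discriminator outputs in (0, 1); the values are made up for illustration.
d_real = [0.9, 0.8, 0.95]   # D(x) on real samples
d_fake = [0.1, 0.2, 0.05]   # D(x) on generated samples

print("dis_loss          :", dis_loss(d_real, d_fake))
print("gen_loss (eq. 2)  :", gen_loss_original(d_fake))
print("gen_loss (eq. 3)  :", gen_loss_improved(d_fake))
```

Both generator losses decrease as D(x) on generated samples moves toward 1, so they push the generator in the same direction; they differ in how steeply they do so.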
The problem in (1) and (2)
Conclusion: the better the discriminator, the worse the generator (worse here means that its gradient vanishes).
For (1), with the generator fixed, the optimal discriminator can be derived as follows.
A sample x may come from either the real input or the generator, so its contribution to (1) is
$-P_r(x)\log D(x) - P_g(x)\log[1-D(x)]$
Setting the derivative with respect to D(x) to zero:
$-\cfrac{P_r(x)}{D(x)} + \cfrac{P_g(x)}{1-D(x)} = 0$ $\Rightarrow$
$D^*(x) = \cfrac{P_r(x)}{P_r(x)+P_g(x)}$ (4)
This formula is easy to interpret: it is the relative probability that sample x came from the real input rather than from the generator.
If $P_r(x)=0$ and $P_g(x) \neq 0$, then the optimal D(x) equals 0.
If $P_r(x) = P_g(x)$, then the optimal D(x) is 0.5 (the discriminator cannot distinguish fake from real).
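The closed form in (4) can be sanity-checked numerically. The sketch below minimizes the per-sample loss by brute force over a grid of D values and compares the result with $P_r(x)/(P_r(x)+P_g(x))$. The density values passed in are arbitrary assumptions chosen for illustration, and `grid_argmin` is a hypothetical helper, not part of any library.

```python
import math

def pointwise_loss(d, p_r, p_g):
    """Per-sample contribution to (1): -P_r(x) log D(x) - P_g(x) log(1 - D(x))."""
    return -p_r * math.log(d) - p_g * math.log(1.0 - d)

def optimal_d(p_r, p_g):
    """Closed-form optimum from equation (4)."""
    return p_r / (p_r + p_g)

def grid_argmin(p_r, p_g, steps=9999):
    """Brute-force minimizer over D in (0, 1); used only as a sanity check."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return min(grid, key=lambda d: pointwise_loss(d, p_r, p_g))

# Illustrative density values at a single point x (assumed, not from the text).
for p_r, p_g in [(0.3, 0.1), (0.2, 0.2), (0.05, 0.4)]:
    print(p_r, p_g, optimal_d(p_r, p_g), grid_argmin(p_r, p_g))
```

In each case the grid minimizer should agree with the closed form up to the grid resolution, including the $P_r(x) = P_g(x)$ case, where both give 0.5.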
Problem in Vanilla GAN
If the discriminator is very good, the generator's performance can be very bad.
The reason shows up in the generator's loss function when the discriminator is optimal.
Add a term that does not depend on the generator to formula (2):
$E_{x \sim P_r}[\log D(x)] + E_{x \sim P_g}[\log(1-D(x))]$
Minimizing this expression is equivalent to minimizing (2), and it is exactly the negative of the discriminator's loss (1). Substituting the optimal discriminator (4), we find:
$E_{x \sim P_r}\log{\cfrac{P_r(x)}{\frac{1}{2}[P_r(x)+P_g(x)]}} + E_{x \sim P_g}\log{\cfrac{P_g(x)}{\frac{1}{2}[P_r(x)+P_g(x)]}} - 2\log 2$ (5)
According to the definition of the JS divergence, $JS(P_r||P_g) = \frac{1}{2}KL(P_r||M) + \frac{1}{2}KL(P_g||M)$ with $M = \frac{1}{2}(P_r+P_g)$,
(5) equals
$2JS(P_r||P_g) - 2\log 2$ (6)
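The step from (5) to (6) can be checked on a small discrete example. The sketch below plugs the optimal discriminator (4) into $E_{x \sim P_r}[\log D(x)] + E_{x \sim P_g}[\log(1-D(x))]$ and compares the result with $2JS(P_r||P_g) - 2\log 2$. The two three-point distributions are assumptions chosen only for illustration.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    """Jensen-Shannon divergence, using the mixture M = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def value_at_optimal_d(p_r, p_g):
    """E_{P_r}[log D*(x)] + E_{P_g}[log(1 - D*(x))] with D* from equation (4)."""
    total = 0.0
    for pr, pg in zip(p_r, p_g):
        if pr + pg == 0:
            continue  # x has no mass under either distribution
        d_star = pr / (pr + pg)
        if pr > 0:
            total += pr * math.log(d_star)
        if pg > 0:
            total += pg * math.log(1.0 - d_star)
    return total

# Illustrative distributions over three outcomes (assumed values).
p_r = [0.5, 0.3, 0.2]
p_g = [0.1, 0.4, 0.5]

lhs = value_at_optimal_d(p_r, p_g)
rhs = 2 * js(p_r, p_g) - 2 * math.log(2)
print(lhs, rhs)
```

The two printed numbers should agree up to floating-point error, which is the content of (6).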
In summary: when the discriminator is optimal, minimizing the generator's loss is equivalent to minimizing the JS divergence between $P_r$ and $P_g$.
However, when $P_r$ and $P_g$ have no overlapping support (or only negligible overlap), $JS(P_r||P_g)$ is the constant $\log 2$ no matter how far apart the two distributions are, and a constant loss means a vanishing gradient.
Therefore, the generator cannot learn anything from it.
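A toy example makes the constant value visible. In the sketch below, $P_r$ is a point mass at position 0 and $P_g$ is a point mass at position k. As long as the two do not overlap, the JS divergence stays at $\log 2$ however close or far k is, so moving the generator's mass does not change the loss. The helper `point_mass` and the grid size are assumptions made for illustration.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    """Jensen-Shannon divergence, using the mixture M = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def point_mass(position, size=8):
    """Hypothetical helper: all probability mass on one grid position."""
    return [1.0 if i == position else 0.0 for i in range(size)]

p_r = point_mass(0)

# Generator mass at positions 1..5: never overlaps with P_r.
values = [js(p_r, point_mass(k)) for k in range(1, 6)]
print(values)            # the same value for every k
print(math.log(2))       # the constant the text refers to

# Only exact overlap changes the value.
print(js(p_r, point_mass(0)))
```

Because the loss is identical for k = 1 and k = 5, it gives the generator no signal about which direction brings $P_g$ closer to $P_r$.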
Summary
When the discriminator is too good, the generator's gradient vanishes and it cannot train well.