Why Vanilla GAN is unstable

Loss Functions for Gen and Dis

For the discriminator, the loss is:

$dis_{loss} = -E_{x \sim P_r}[\log D(x)] - E_{x \sim P_g}[\log(1-D(x))]$ (1)

Here $P_r$ is the real data distribution and $P_g$ is the generator's output distribution.

For the generator, the loss is:

$$gen_{loss} = E_{x \sim P_g}[\log(1-D(x))]$$ (2)

$$gen_{loss} = E_{x \sim P_g}[-\log D(x)]$$ (3)

Equation (2) is the original generator loss; (3) is the improved ("non-saturating") version.
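To see why (3) helps, compare the gradient of each loss with respect to $D(x)$ when the discriminator confidently rejects a fake, i.e. $D(x) \to 0$. A minimal sketch (the value `1e-3` is a made-up illustration, not from the text):

```python
import math

def grad_original(d):
    # d/dD of log(1 - D): stays near -1 as D -> 0 (saturates)
    return -1.0 / (1.0 - d)

def grad_nonsaturating(d):
    # d/dD of -log D: blows up as D -> 0 (strong learning signal)
    return -1.0 / d

d = 1e-3  # discriminator is nearly certain the sample is fake
print(grad_original(d))       # ~ -1.001: almost no gradient for the generator
print(grad_nonsaturating(d))  # ~ -1000: large gradient early in training
```

Early in training the generator is poor, so $D(x)$ on fakes is near 0; under loss (2) the gradient is tiny exactly when the generator most needs to learn, which is why (3) is preferred in practice.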

The problem with (1) and (2)

Conclusion: the better the discriminator, the worse the generator trains (here "worse" means its gradient vanishes).

For (1), if the generator is fixed, we can derive the optimal discriminator.

A sample x may come from either the real distribution or the generator, so the pointwise loss is:

$-P_r(x)\log D(x) - P_g(x)\log[1-D(x)]$

Differentiate with respect to D(x) and set the derivative to zero:

$-\cfrac{P_r(x)}{D(x)} + \cfrac{P_g(x)}{1-D(x)} = 0$ $\Rightarrow$

$D^*(x) = \cfrac{P_r(x)}{P_r(x)+P_g(x)}$ (4)

This formula has an intuitive reading: it is the probability that sample x came from the real distribution rather than from the generator.

If $P_r(x)=0$ and $P_g(x) \neq 0$, then the optimal D(x) equals 0.

If $P_r(x) = P_g(x)$, then the optimal D(x) is 0.5 (the discriminator cannot distinguish real from fake).
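Formula (4) can be sanity-checked numerically: fix pointwise densities $P_r(x)$ and $P_g(x)$, scan $D(x)$ over a grid, and confirm the pointwise loss is minimized at $P_r/(P_r+P_g)$. A small sketch with made-up density values:

```python
import math

def pointwise_loss(pr, pg, d):
    # The pointwise discriminator loss: -P_r(x) log D(x) - P_g(x) log(1 - D(x))
    return -pr * math.log(d) - pg * math.log(1.0 - d)

pr, pg = 0.3, 0.7  # hypothetical density values at one point x
grid = [i / 10000 for i in range(1, 10000)]  # D(x) candidates in (0, 1)
d_best = min(grid, key=lambda d: pointwise_loss(pr, pg, d))
d_star = pr / (pr + pg)  # the closed-form optimum from (4)
print(d_best, d_star)    # both close to 0.3
```

The grid search recovers the closed-form optimum, matching the derivation above.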

Problem in Vanilla GAN

If the discriminator is trained well, the generator's performance can become really bad.

The reason: when the discriminator is optimal, the generator's loss function takes a special form.

Add a term independent of the generator to formula (2):

$E_{x \sim P_r}[\log D(x)] + E_{x \sim P_g}[\log(1-D(x))]$

Minimizing this expression over the generator is equivalent to minimizing equation (2), because the added term does not depend on the generator; it is also the negative of the discriminator's loss (1). Substituting the optimal discriminator (4) into it, we find:

$E_{x \sim P_r}\log{\cfrac{P_r(x)}{\frac{1}{2}[P_r(x)+P_g(x)]}} + E_{x \sim P_g}\log{\cfrac{P_g(x)}{\frac{1}{2}[P_r(x)+P_g(x)]}} - 2\log 2$ (5)

According to the definition of the JS divergence, this equals

$2JS(P_r \| P_g) - 2\log 2$ (6)
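The step from (5) to (6) can be checked numerically for discrete distributions, since each expectation in (5) is exactly a KL term of the JS divergence. The `kl`/`js` helpers below are illustrative, and the two example distributions are made up:

```python
import math

def kl(p, q):
    # KL divergence for discrete distributions (natural log)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    # JS(p || q) = 1/2 KL(p || m) + 1/2 KL(q || m), m = (p + q) / 2
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

pr = [0.5, 0.3, 0.2]  # hypothetical discrete P_r
pg = [0.1, 0.4, 0.5]  # hypothetical discrete P_g
m = [(pi + qi) / 2 for pi, qi in zip(pr, pg)]
lhs = kl(pr, m) + kl(pg, m) - 2 * math.log(2)  # expression (5)
rhs = 2 * js(pr, pg) - 2 * math.log(2)         # expression (6)
print(abs(lhs - rhs) < 1e-9)  # the two expressions agree
```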

Here, the summary is: when the discriminator is optimal, minimizing the generator's loss amounts to minimizing the JS divergence between $P_r$ and $P_g$.

However, whenever $P_r$ and $P_g$ have (nearly) disjoint supports, which is typical when both are low-dimensional manifolds in a high-dimensional space, $JS(P_r \| P_g) = \log 2$. A constant loss means the gradient is zero.

Therefore, the generator cannot learn anything from it.
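The constant can be verified directly: for two discrete distributions with disjoint supports, the JS divergence evaluates to $\log 2$ regardless of the particular probabilities. A small sketch with made-up bins:

```python
import math

def kl(p, q):
    # KL divergence for discrete distributions (natural log)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Disjoint supports: P_r lives on the first two bins, P_g on the last two.
pr = [0.6, 0.4, 0.0, 0.0]
pg = [0.0, 0.0, 0.7, 0.3]
print(js(pr, pg))  # ~0.6931, i.e. log 2, no matter how the mass is split
```

Because the value is the same constant for every non-overlapping pair, moving $P_g$ around (without creating overlap) does not change the loss at all, which is exactly the vanishing-gradient problem described above.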

Summary

When the discriminator is too good, the generator's gradient vanishes and it cannot learn.