My First Introduction to GANs (Generative Adversarial Networks)

“it’s the coolest idea of Machine Learning in the last 20 years” – Yann LeCun (one of the fathers of Deep Learning)

GANs, or Generative Adversarial Networks, are a class of neural network architectures composed of two separate deep neural networks competing against each other: the generator and the discriminator.

Their goal is to generate new data points that resemble the data points in the training set.

Here is the original GAN paper by Ian Goodfellow (@goodfellow_ian) and co-authors: “Generative Adversarial Nets” (arXiv:1406.2661).

Let’s take a theoretical example: the process of money counterfeiting. In this process, we can imagine two types of agents: a criminal and a cop. Let us look at their competing objectives:

Criminal’s Objective: The main objective of the criminal is to come up with increasingly sophisticated ways of counterfeiting money, such that the cop cannot distinguish counterfeit money from real money.

Cop’s Objective: The main objective of the cop is to come up with increasingly sophisticated ways of distinguishing counterfeit money from real money.

As this process progresses, the cop develops more and more sophisticated technology to detect counterfeit money, and the criminal develops more and more sophisticated technology to produce it. This is the basis of what is called an Adversarial Process.

Idea of GAN:


The basic idea behind GANs is actually very simple. At its core, a GAN includes two agents with competing objectives.
This relatively simple setup results in both agents coming up with increasingly complex ways to outdo each other. This kind of situation can be modeled in Game Theory as a minimax game.

“The generator will try to generate fake images that fool the discriminator into thinking that they’re real. And the discriminator will try to distinguish between a real and a generated image as best as it could when an image is fed.”

They both get stronger together until the discriminator cannot distinguish between the real and the generated images any more.

Generative Adversarial Networks take advantage of adversarial processes to train two neural networks that compete with each other until a desirable equilibrium is reached. In this case, we have a Generator Network G(Z), which takes random noise as input and tries to generate data very close to the dataset we have. The other network is the Discriminator Network D(X), which takes either real or generated data as input and tries to discriminate between the two.

At its core, the discriminator implements a binary classifier: it outputs the probability that the input data actually comes from the real dataset (as opposed to the synthetic, or fake, data).
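For reference, the minimax game between G and D can be written as a single value function. The form below follows the original paper; the symbols p_data (the data distribution), p_z (the noise prior) and p_g (the distribution of generated samples) are the paper's notation and do not appear in the code in this post.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]

% For a fixed generator, the optimal discriminator is
D^{*}_{G}(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}
% which equals 1/2 everywhere once p_g = p_{\text{data}}.
```

The second expression is why the equilibrium is described as the point where the discriminator can no longer tell real from generated data: its best possible answer becomes a coin flip.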


GAN Implementation Using TensorFlow:

By the definition of GAN, we need two nets. They could be anything, be it a sophisticated net like a convnet or just a two-layer neural net. Let’s keep it simple first and use two-layer nets for both of them. We’ll use TensorFlow for this purpose.

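import numpy as np
import tensorflow as tf  # TensorFlow 1.x graph-mode API

X_dim = 28 * 28  # flattened MNIST image
Z_dim = 100      # size of the noise vector
h_dim = 128      # hidden layer width (assumed value)


def xavier_init(size):
    # Helper not shown in this post; a common Xavier-style initializer is assumed.
    stddev = 1. / np.sqrt(size[0] / 2.)
    return tf.random_normal(shape=size, stddev=stddev)


# Discriminator Net
X = tf.placeholder(tf.float32, shape=[None, X_dim], name='X')
D_W1 = tf.Variable(xavier_init([X_dim, h_dim]), name='D_W1')
D_b1 = tf.Variable(tf.zeros(shape=[h_dim]), name='D_b1')
D_W2 = tf.Variable(xavier_init([h_dim, 1]), name='D_W2')
D_b2 = tf.Variable(tf.zeros(shape=[1]), name='D_b2')
theta_D = [D_W1, D_W2, D_b1, D_b2]

# Generator Net
Z = tf.placeholder(tf.float32, shape=[None, Z_dim], name='Z')
G_W1 = tf.Variable(xavier_init([Z_dim, h_dim]), name='G_W1')
G_b1 = tf.Variable(tf.zeros(shape=[h_dim]), name='G_b1')
G_W2 = tf.Variable(xavier_init([h_dim, X_dim]), name='G_W2')
G_b2 = tf.Variable(tf.zeros(shape=[X_dim]), name='G_b2')
theta_G = [G_W1, G_W2, G_b1, G_b2]


def generator(z):
    G_h1 = tf.nn.relu(tf.matmul(z, G_W1) + G_b1)
    G_log_prob = tf.matmul(G_h1, G_W2) + G_b2
    G_prob = tf.nn.sigmoid(G_log_prob)  # pixel intensities in (0, 1)
    return G_prob


def discriminator(x):
    D_h1 = tf.nn.relu(tf.matmul(x, D_W1) + D_b1)
    D_logit = tf.matmul(D_h1, D_W2) + D_b2
    D_prob = tf.nn.sigmoid(D_logit)  # probability that x is real
    return D_prob, D_logit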

Above, generator(z) takes a 100-dimensional vector and returns a 784-dimensional vector, which is a flattened MNIST image (28×28).

The discriminator(x) takes MNIST image(s) and returns, for each image, a scalar representing the probability that it is a real MNIST image.
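To make the shapes concrete without needing TensorFlow, here is a minimal pure-Python sketch of the same two-layer layout. It is only an illustration: the hidden width of 128, the Gaussian initialization and the helper names init, dense, relu and sigmoid are assumptions made for this example, and the discriminator here returns only the probability.

```python
import math
import random

random.seed(0)

Z_DIM, H_DIM, X_DIM = 100, 128, 28 * 28  # H_DIM is an assumed hidden width


def init(rows, cols):
    """Small random weight matrix stored as a list of rows."""
    return [[random.gauss(0.0, 0.05) for _ in range(cols)] for _ in range(rows)]


def dense(x, W, b):
    """Compute x @ W + b for a single input vector x."""
    return [sum(xi * W[i][j] for i, xi in enumerate(x)) + b[j]
            for j in range(len(b))]


def relu(v):
    return [max(0.0, a) for a in v]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


G_W1, G_b1 = init(Z_DIM, H_DIM), [0.0] * H_DIM
G_W2, G_b2 = init(H_DIM, X_DIM), [0.0] * X_DIM
D_W1, D_b1 = init(X_DIM, H_DIM), [0.0] * H_DIM
D_W2, D_b2 = init(H_DIM, 1), [0.0]


def generator(z):
    """Noise vector (100 values) -> fake image (784 values in (0, 1))."""
    return sigmoid(dense(relu(dense(z, G_W1, G_b1)), G_W2, G_b2))


def discriminator(x):
    """Image (784 values) -> probability that the image is real."""
    return sigmoid(dense(relu(dense(x, D_W1, D_b1)), D_W2, D_b2))[0]


z = [random.uniform(-1.0, 1.0) for _ in range(Z_DIM)]
fake_image = generator(z)
p_real = discriminator(fake_image)
print(len(z), len(fake_image), p_real)
```

With untrained weights the probability is essentially arbitrary; the point is only that a 100-value noise vector goes in, a 784-value image comes out, and the discriminator reduces that image to a single number between 0 and 1.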

Now, let’s declare the adversarial process for training this GAN, following the training algorithm (Algorithm 1) from the paper:

G_sample = generator(Z)
D_real, D_logit_real = discriminator(X)
D_fake, D_logit_fake = discriminator(G_sample)

D_loss = -tf.reduce_mean(tf.log(D_real) + tf.log(1. - D_fake))
G_loss = -tf.reduce_mean(tf.log(D_fake))


Above, we put a negative sign on the loss functions because the objectives need to be maximized, whereas TensorFlow’s optimizer can only do minimization.
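To get a feel for the numbers these two losses produce, here is a small pure-Python sketch that evaluates the same formulas on hand-picked discriminator outputs. The function name gan_losses and the example probabilities are made up for illustration.

```python
import math


def gan_losses(d_real, d_fake):
    """Mean D_loss and G_loss from lists of discriminator probabilities."""
    n = len(d_real)
    d_loss = -sum(math.log(r) + math.log(1.0 - f)
                  for r, f in zip(d_real, d_fake)) / n
    g_loss = -sum(math.log(f) for f in d_fake) / len(d_fake)
    return d_loss, g_loss


# Discriminator is confident and correct: real -> ~1, fake -> ~0
strong_d = gan_losses([0.9, 0.95], [0.1, 0.05])

# Discriminator cannot tell the difference: everything -> 0.5
balanced = gan_losses([0.5, 0.5], [0.5, 0.5])

print("strong discriminator:", strong_d)
print("balanced:", balanced)
```

When the discriminator wins, D_loss is small and G_loss is large. At the balanced point, where every output is 0.5, D_loss equals log 4 and G_loss equals log 2, so loss curves hovering around those values are one rough sign that neither network is dominating.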

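Also, as per the paper’s suggestion, it’s better to maximize tf.reduce_mean(tf.log(D_fake)) instead of minimizing tf.reduce_mean(tf.log(1. - D_fake)) as written in the paper’s algorithm, because the latter gives the generator very weak gradients early in training, when the discriminator easily rejects its samples.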
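The difference between the two generator objectives is easiest to see in their gradients. The sketch below is an illustration and not part of the training code: it compares the derivative of each objective with respect to the discriminator's logit for a generated sample, using the closed forms that follow from D = sigmoid(logit). The function names are hypothetical.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def grad_saturating(logit):
    """d/d(logit) of log(1 - sigmoid(logit)), the objective G would minimize."""
    return -sigmoid(logit)


def grad_non_saturating(logit):
    """d/d(logit) of -log(sigmoid(logit)), the form used for G_loss."""
    return -(1.0 - sigmoid(logit))


# Early in training the discriminator rejects fakes confidently,
# i.e. the logit for a generated sample is strongly negative.
for logit in (-5.0, 0.0, 5.0):
    print(logit, sigmoid(logit), grad_saturating(logit), grad_non_saturating(logit))
```

At a logit of -5 the discriminator assigns the fake a probability below 1%, and the log(1 - D) form passes back a gradient of under 0.01 in magnitude, while the -log(D) form passes back one close to 1. That is the situation in which the generator most needs a learning signal.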

Then we train the networks one at a time, alternating between the two loss functions above.

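# Only update D(X)'s parameters, so var_list = theta_D
D_solver = tf.train.AdamOptimizer().minimize(D_loss, var_list=theta_D)
# Only update G(Z)'s parameters, so var_list = theta_G
G_solver = tf.train.AdamOptimizer().minimize(G_loss, var_list=theta_G)


def sample_Z(m, n):
    '''Uniform prior for G(Z)'''
    return np.random.uniform(-1., 1., size=[m, n])


# Assumed setup (not shown in this post): batch size, MNIST loader and session.
from tensorflow.examples.tutorials.mnist import input_data
mb_size = 128
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
sess = tf.Session()
sess.run(tf.global_variables_initializer())

for it in range(1000000):
    X_mb, _ = mnist.train.next_batch(mb_size)

    # One discriminator update, then one generator update
    _, D_loss_curr = sess.run([D_solver, D_loss], feed_dict={X: X_mb, Z: sample_Z(mb_size, Z_dim)})
    _, G_loss_curr = sess.run([G_solver, G_loss], feed_dict={Z: sample_Z(mb_size, Z_dim)})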

And we’re done! We can watch the training process, and the GAN is learning how to write handwritten digits on its own!



We start with random noise, and as training goes on, G(Z) moves closer and closer to p(data). This is evidenced by the samples generated by G(Z) looking more and more like the MNIST data.

Use Cases:
Among several use cases, generative models may be applied to:
Generating realistic artwork samples (video/image/audio).
Simulation and planning using time-series data.
Statistical inference.

Please find the complete code implementation in TensorFlow at my git repo:


Reference: Goodfellow et al., “Generative Adversarial Nets” (arXiv:1406.2661).

