CODE A NEURAL NETWORK IN PLAIN NUMPY Part 2: Planar data classification with one hidden layer

In the last post we built a neural network with only two layers, an input layer and an output layer, which behaves like a logistic regression algorithm. In this post we are going to code a neural network with one more layer: a hidden layer.

You will learn how to:

  • Implement a 2-class classification neural network with a single hidden layer
  • Use units with a non-linear activation function, such as tanh
  • Compute the cross entropy loss
  • Implement forward and backward propagation
# Package imports
import numpy as np
import matplotlib.pyplot as plt
from testCases_v2 import *
import sklearn
import sklearn.datasets
import sklearn.linear_model
from planar_utils import plot_decision_boundary, sigmoid, load_planar_dataset, load_extra_datasets
%matplotlib inline
np.random.seed(1) # set a seed so that the results are consistent

Dataset

First, let’s get the dataset you will work on. The following code will load a “flower” 2-class dataset into variables X and Y.

In [16]:

X, Y = load_planar_dataset()

Visualize the dataset using matplotlib. The data looks like a “flower” with some red (label y=0) and some blue (y=1) points. Your goal is to build a model to fit this data. In other words, we want the classifier to define regions as either red or blue.

In [17]:

plt.scatter(X[0, :], X[1, :], c=Y[0], s=40, cmap=plt.cm.Spectral)

Out[17]:

<matplotlib.collections.PathCollection at 0x27c7e1ee7f0>

You have:

  • a numpy-array (matrix) X that contains your features (x1, x2)
  • a numpy-array (vector) Y that contains your labels (red:0, blue:1).

Let’s first get a better sense of what our data looks like.

Exercise:

How many training examples do you have? And what are the shapes of the variables X and Y?

In [19]:

X.shape,Y.shape

Out[19]:

((2, 400), (1, 400))

In [25]:

X.T.shape,Y.T.shape

Out[25]:

((400, 2), (400, 1))
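
So there are m = 400 training examples: X stacks the two input features as rows, with one column per example (shape (2, 400)), and Y holds one label per example (shape (1, 400)).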

Simple Logistic Regression

Before building a full neural network, let’s first see how logistic regression performs on this problem. You can use sklearn’s built-in functions to do that. Run the code below to train a logistic regression classifier on the dataset.

In [22]:

clf = sklearn.linear_model.LogisticRegressionCV()

In [27]:

clf.fit(X.T,Y.T)
anaconda3\envs\tf\lib\site-packages\sklearn\utils\validation.py:761: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y = column_or_1d(y, warn=True)
anaconda3\envs\tf\lib\site-packages\sklearn\model_selection\_split.py:2053: FutureWarning: You should specify a value for 'cv' instead of relying on the default value. The default value will change from 3 to 5 in version 0.22.
  warnings.warn(CV_WARNING, FutureWarning)

Out[27]:

LogisticRegressionCV(Cs=10, class_weight=None, cv='warn', dual=False,
           fit_intercept=True, intercept_scaling=1.0, max_iter=100,
           multi_class='warn', n_jobs=None, penalty='l2',
           random_state=None, refit=True, scoring=None, solver='lbfgs',
           tol=0.0001, verbose=0)

You can now plot the decision boundary of this model. Run the code below.

In [29]:

# Plot the decision boundary for logistic regression
plot_decision_boundary(lambda x: clf.predict(x), X, Y[0])
plt.title("Logistic Regression")

# Print accuracy
LR_predictions = clf.predict(X.T)
print ('Accuracy of logistic regression: %d ' % float((np.dot(Y, LR_predictions) + np.dot(1 - Y,1 - LR_predictions)) / float(Y.size) * 100) +
       '% ' + "(percentage of correctly labelled datapoints)")
Accuracy of logistic regression: 47 % (percentage of correctly labelled datapoints)
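
(In the print statement above, np.dot(Y, LR_predictions) counts the examples correctly predicted as 1 and np.dot(1 - Y, 1 - LR_predictions) counts the examples correctly predicted as 0; their sum divided by Y.size gives the fraction of correctly labelled points.)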

Interpretation: The dataset is not linearly separable, so logistic regression doesn’t perform well. Hopefully a neural network will do better. Let’s try this now!

The general methodology to build a Neural Network is to:

1. Define the neural network structure (# of input units, # of hidden units, etc.)

2. Initialize the model’s parameters

3. Loop:
   – Implement forward propagation
   – Compute the loss
   – Implement backward propagation to get the gradients
   – Update the parameters (gradient descent)
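
As a preview, the training loop described in step 3 will look roughly like this (a minimal sketch; the helper functions are implemented one by one in the rest of this post and then assembled into nn_model()):

# Minimal sketch of the training loop (see nn_model() below for the final version)
parameters = initialize_parameters(n_x, n_h, n_y)
for i in range(num_iterations):
    A2, cache = forward_propagation(X, parameters)          # forward propagation
    cost = compute_cost(A2, Y, parameters)                  # compute the loss
    grads = backward_propagation(parameters, cache, X, Y)   # backward propagation
    parameters = update_parameters(parameters, grads)       # gradient descent update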

Neural Network model

Exercise:

Define three variables:

  • n_x: the size of the input layer
  • n_h: the size of the hidden layer (set this to 4)
  • n_y: the size of the output layer

In [34]:

def layer_sizes(X, Y):
    n_x = X.shape[0]  # size of the input layer
    n_h = 4           # size of the hidden layer (hard-coded to 4)
    n_y = Y.shape[0]  # size of the output layer
    return (n_x, n_h, n_y)

In [35]:

X_assess, Y_assess = layer_sizes_test_case()
(n_x, n_h, n_y) = layer_sizes(X_assess, Y_assess)
print("The size of the input layer is: n_x = " + str(n_x))
print("The size of the hidden layer is: n_h = " + str(n_h))
print("The size of the output layer is: n_y = " + str(n_y))
The size of the input layer is: n_x = 5
The size of the hidden layer is: n_h = 4
The size of the output layer is: n_y = 2

Initialize the model’s parameters

Exercise:

Implement the function initialize_parameters().

Instructions: Make sure your parameters’ sizes are right. Initialize the weight matrices with random values: use np.random.randn(a, b) * 0.01 to randomly initialize a matrix of shape (a, b). Initialize the bias vectors as zeros: use np.zeros((a, b)) to initialize a matrix of shape (a, b) with zeros.

In [97]:

def initialize_parameters(n_x, n_h, n_y):
    np.random.seed(2)  # seed fixed so that the results are reproducible
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))

    assert (W1.shape == (n_h, n_x))
    assert (b1.shape == (n_h, 1))
    assert (W2.shape == (n_y, n_h))
    assert (b2.shape == (n_y, 1))

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}

    return parameters

In [48]:

n_x, n_h, n_y = initialize_parameters_test_case()

parameters = initialize_parameters(n_x, n_h, n_y)
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))
W1 = [[-0.00416758 -0.00056267]
 [-0.02136196  0.01640271]
 [-0.01793436 -0.00841747]
 [ 0.00502881 -0.01245288]]
b1 = [[0.]
 [0.]
 [0.]
 [0.]]
W2 = [[-0.01057952 -0.00909008  0.00551454  0.02292208]]
b2 = [[0.]]
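
Note that the weights are initialized to small random values rather than zeros: if all weights started at zero, every hidden unit would compute the same function and receive the same gradient, so the units would never differentiate (the symmetry would never be broken). The biases can safely start at zero.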

Exercise:

Implement forward_propagation().

In [58]:

def forward_propagation(X, parameters):
    # Retrieve the parameters
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]

    # Forward pass: hidden layer uses tanh, output layer uses sigmoid
    Z1 = np.dot(W1, X) + b1
    A1 = np.tanh(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)

    assert (A2.shape == (1, X.shape[1]))
    cache = {"Z1": Z1,
             "A1": A1,
             "Z2": Z2,
             "A2": A2}
    return A2, cache
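
For reference, the forward pass above computes, for the whole batch of examples stacked as columns of X (the biases are broadcast across columns):

$$Z_1 = W_1 X + b_1, \qquad A_1 = \tanh(Z_1), \qquad Z_2 = W_2 A_1 + b_2, \qquad A_2 = \sigma(Z_2),$$

where $\sigma$ denotes the sigmoid function.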

In [59]:

X_assess, parameters = forward_propagation_test_case()

A2, cache = forward_propagation(X_assess, parameters)

# Note: we use the mean here just to make sure that your output matches ours. 
print(np.mean(cache['Z1']), np.mean(cache['A1']), np.mean(cache['Z2']), np.mean(cache['A2']))
0.26281864019752443 0.09199904522700109 -1.3076660128732143 0.21287768171914198

Exercise:

Implement compute_cost() to compute the value of the cost J.

In [67]:

def compute_cost(A2, Y, parameters):
    m = Y.shape[1]  # number of examples
    # Cross-entropy cost, averaged over the m examples
    logprobs = np.multiply(Y, np.log(A2)) + np.multiply(1 - Y, np.log(1 - A2))
    cost = (-1 / m) * np.sum(logprobs)
    cost = np.squeeze(cost)  # makes sure cost is the dimension we expect
    assert isinstance(cost, float)
    return cost
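
The quantity being computed is the cross-entropy cost, averaged over the m training examples:

$$J = -\frac{1}{m} \sum_{i=1}^{m} \Big( y^{(i)} \log a_2^{(i)} + \big(1 - y^{(i)}\big) \log\big(1 - a_2^{(i)}\big) \Big),$$

where $a_2^{(i)}$ is the output activation for the i-th example.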

In [68]:

A2, Y_assess, parameters = compute_cost_test_case()
print("cost = " + str(compute_cost(A2, Y_assess, parameters)))
cost = 0.6930587610394646

Exercise:

Implement the function backward_propagation()

In [75]:

def backward_propagation(parameters, cache, X, Y):
    m = Y.shape[1]  # number of examples
    A1 = cache["A1"]
    A2 = cache["A2"]
    W2 = parameters["W2"]

    # Output layer
    dZ2 = A2 - Y
    dW2 = (1 / m) * np.dot(dZ2, A1.T)
    db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)

    # Hidden layer: (1 - A1**2) is the derivative of tanh(Z1)
    dZ1 = np.multiply(np.dot(W2.T, dZ2), 1 - np.power(A1, 2))
    dW1 = (1 / m) * np.dot(dZ1, X.T)
    db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True)

    grads = {"dW1": dW1,
             "db1": db1,
             "dW2": dW2,
             "db2": db2}

    return grads
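
These lines implement the backpropagation formulas for this two-layer network (the term $1 - A_1^2$ is the derivative of tanh, and $\odot$ denotes element-wise multiplication):

$$dZ_2 = A_2 - Y, \qquad dW_2 = \frac{1}{m}\, dZ_2 A_1^T, \qquad db_2 = \frac{1}{m} \sum_{\text{columns}} dZ_2,$$

$$dZ_1 = \big(W_2^T dZ_2\big) \odot \big(1 - A_1^2\big), \qquad dW_1 = \frac{1}{m}\, dZ_1 X^T, \qquad db_1 = \frac{1}{m} \sum_{\text{columns}} dZ_1.$$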

In [76]:

parameters, cache, X_assess, Y_assess = backward_propagation_test_case()

grads = backward_propagation(parameters, cache, X_assess, Y_assess)
print ("dW1 = "+ str(grads["dW1"]))
print ("db1 = "+ str(grads["db1"]))
print ("dW2 = "+ str(grads["dW2"]))
print ("db2 = "+ str(grads["db2"]))
dW1 = [[ 0.00301023 -0.00747267]
 [ 0.00257968 -0.00641288]
 [-0.00156892  0.003893  ]
 [-0.00652037  0.01618243]]
db1 = [[ 0.00176201]
 [ 0.00150995]
 [-0.00091736]
 [-0.00381422]]
dW2 = [[ 0.00078841  0.01765429 -0.00084166 -0.01022527]]
db2 = [[-0.16655712]]

Exercise:

Implement the update rule using gradient descent. You have to use (dW1, db1, dW2, db2) in order to update (W1, b1, W2, b2).

In [80]:

def update_parameters(parameters, grads, learning_rate=1.2):
    # Retrieve parameters and gradients
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]

    dW1 = grads["dW1"]
    db1 = grads["db1"]
    dW2 = grads["dW2"]
    db2 = grads["db2"]

    # Gradient descent update for each parameter
    W1 = W1 - learning_rate * dW1
    b1 = b1 - learning_rate * db1
    W2 = W2 - learning_rate * dW2
    b2 = b2 - learning_rate * db2

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}

    return parameters
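
The update applied above is plain gradient descent with learning rate $\alpha$ (1.2 by default here): for each parameter $\theta \in \{W_1, b_1, W_2, b_2\}$,

$$\theta \leftarrow \theta - \alpha \, d\theta.$$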

In [81]:

parameters, grads = update_parameters_test_case()
parameters = update_parameters(parameters, grads)

print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))
W1 = [[-0.00643025  0.01936718]
 [-0.02410458  0.03978052]
 [-0.01653973 -0.02096177]
 [ 0.01046864 -0.05990141]]
b1 = [[-1.02420756e-06]
 [ 1.27373948e-05]
 [ 8.32996807e-07]
 [-3.20136836e-06]]
W2 = [[-0.01041081 -0.04463285  0.01758031  0.04747113]]
b2 = [[0.00010457]]

Exercise:

Build your neural network model in nn_model().

In [85]:

def nn_model(X, Y, n_h, num_iterations=10000, print_cost=True):
    np.random.seed(3)
    # Take n_x and n_y from the data; n_h is passed in as an argument
    n_x = layer_sizes(X, Y)[0]
    n_y = layer_sizes(X, Y)[2]
    parameters = initialize_parameters(n_x, n_h, n_y)
    for i in range(num_iterations):
        A2, cache = forward_propagation(X, parameters)          # forward propagation
        cost = compute_cost(A2, Y, parameters)                  # cross-entropy cost
        grads = backward_propagation(parameters, cache, X, Y)   # gradients
        parameters = update_parameters(parameters, grads)       # gradient descent step
        # Print the cost every 1000 iterations
        if print_cost and i % 1000 == 0:
            print("Cost after iteration %i: %f" % (i, cost))
    return parameters

In [87]:

X_assess, Y_assess = nn_model_test_case()

parameters = nn_model(X_assess, Y_assess, 4, num_iterations=10000, print_cost=True)
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))
Cost after iteration 0: 0.692739
Cost after iteration 1000: 0.000218
Cost after iteration 2000: 0.000107
Cost after iteration 3000: 0.000071
Cost after iteration 4000: 0.000053
Cost after iteration 5000: 0.000042
Cost after iteration 6000: 0.000035
Cost after iteration 7000: 0.000030
Cost after iteration 8000: 0.000026
Cost after iteration 9000: 0.000023
W1 = [[-0.65848169  1.21866811]
 [-0.76204273  1.39377573]
 [ 0.5792005  -1.10397703]
 [ 0.76773391 -1.41477129]]
b1 = [[ 0.287592  ]
 [ 0.3511264 ]
 [-0.2431246 ]
 [-0.35772805]]
W2 = [[-2.45566237 -3.27042274  2.00784958  3.36773273]]
b2 = [[0.20459656]]

Exercise:

Build predict() and use your trained model to make predictions. Run forward propagation and threshold the output activation A2 at 0.5.

In [90]:

def predict(parameters, X):
    # Forward propagate, then threshold the output activation at 0.5:
    # predict 1 (blue) if A2 > 0.5, otherwise 0 (red)
    A2, cache = forward_propagation(X, parameters)
    predictions = np.round(A2)
    return predictions

In [91]:

parameters, X_assess = predict_test_case()

predictions = predict(parameters, X_assess)
print("predictions mean = " + str(np.mean(predictions)))
predictions mean = 0.6666666666666666

It is time to run the model and see how it performs on the planar dataset. Run the following code to test your model with a single hidden layer.

In [93]:

# Build a model with a n_h-dimensional hidden layer
parameters = nn_model(X, Y, n_h = 4, num_iterations = 10000, print_cost=True)

# Plot the decision boundary
plot_decision_boundary(lambda x: predict(parameters, x.T), X, Y[0])
plt.title("Decision Boundary for hidden layer size " + str(4))
Cost after iteration 0: 0.693048
Cost after iteration 1000: 0.288083
Cost after iteration 2000: 0.254385
Cost after iteration 3000: 0.233864
Cost after iteration 4000: 0.226792
Cost after iteration 5000: 0.222644
Cost after iteration 6000: 0.219731
Cost after iteration 7000: 0.217504
Cost after iteration 8000: 0.219504
Cost after iteration 9000: 0.218571

Out[93]:

Text(0.5, 1.0, 'Decision Boundary for hidden layer size 4')

In [94]:

# Print accuracy
predictions = predict(parameters, X)
print ('Accuracy: %d' % float((np.dot(Y,predictions.T) + np.dot(1-Y,1-predictions.T))/float(Y.size)*100) + '%')
Accuracy: 90%
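
As an optional follow-up (not part of the notebook output above, just a sketch reusing nn_model() and predict()), you could retrain the model with several hidden layer sizes and compare accuracies:

# Optional sketch: compare accuracy for a few hidden layer sizes
for hidden_size in [1, 2, 3, 4, 5, 20, 50]:
    parameters = nn_model(X, Y, n_h=hidden_size, num_iterations=5000, print_cost=False)
    predictions = predict(parameters, X)
    accuracy = float((np.dot(Y, predictions.T) + np.dot(1 - Y, 1 - predictions.T)) / float(Y.size) * 100)
    print("Hidden layer of size %d: accuracy %.1f%%" % (hidden_size, accuracy))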

References:

Deep Learning Specialization, Coursera: https://www.coursera.org/

For completeness, here is the code of the planar_utils.py helper file:

import matplotlib.pyplot as plt
import numpy as np
import sklearn
import sklearn.datasets
import sklearn.linear_model

def plot_decision_boundary(model, X, y):
    # Set min and max values and give it some padding
    x_min, x_max = X[0, :].min() - 1, X[0, :].max() + 1
    y_min, y_max = X[1, :].min() - 1, X[1, :].max() + 1
    h = 0.01
    # Generate a grid of points with distance h between them
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    # Predict the function value for the whole grid
    Z = model(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    # Plot the contour and training examples
    plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral)
    plt.ylabel('x2')
    plt.xlabel('x1')
    plt.scatter(X[0, :], X[1, :], c=y, cmap=plt.cm.Spectral)


def sigmoid(x):
    """
    Compute the sigmoid of x
    Arguments:
    x -- A scalar or numpy array of any size.
    Return:
    s -- sigmoid(x)
    """
    s = 1/(1+np.exp(-x))
    return s

def load_planar_dataset():
    np.random.seed(1)
    m = 400 # number of examples
    N = int(m/2) # number of points per class
    D = 2 # dimensionality
    X = np.zeros((m,D)) # data matrix where each row is a single example
    Y = np.zeros((m,1), dtype='uint8') # labels vector (0 for red, 1 for blue)
    a = 4 # maximum ray of the flower

    for j in range(2):
        ix = range(N*j,N*(j+1))
        t = np.linspace(j*3.12,(j+1)*3.12,N) + np.random.randn(N)*0.2 # theta
        r = a*np.sin(4*t) + np.random.randn(N)*0.2 # radius
        X[ix] = np.c_[r*np.sin(t), r*np.cos(t)]
        Y[ix] = j

    X = X.T
    Y = Y.T

    return X, Y

def load_extra_datasets():
    N = 200
    noisy_circles = sklearn.datasets.make_circles(n_samples=N, factor=.5, noise=.3)
    noisy_moons = sklearn.datasets.make_moons(n_samples=N, noise=.2)
    blobs = sklearn.datasets.make_blobs(n_samples=N, random_state=5, n_features=2, centers=6)
    gaussian_quantiles = sklearn.datasets.make_gaussian_quantiles(mean=None, cov=0.5, n_samples=N, n_features=2, n_classes=2, shuffle=True, random_state=None)
    no_structure = np.random.rand(N, 2), np.random.rand(N, 2)

    return noisy_circles, noisy_moons, blobs, gaussian_quantiles, no_structure
