CODE A NEURAL NETWORK IN PLAIN NUMPY Part 2: Planar data classification with one hidden layer

In the last post we built a neural network with only two layers, an input layer and an output layer, which is essentially a logistic regression model. In this post we are going to code a neural network with one more layer: a hidden layer.

You will learn how to:

  • Implement a 2-class classification neural network with a single hidden layer
  • Use units with a non-linear activation function, such as tanh
  • Compute the cross entropy loss
  • Implement forward and backward propagation
# Package imports
import numpy as np
import matplotlib.pyplot as plt
from testCases_v2 import *
import sklearn
import sklearn.datasets
import sklearn.linear_model
from planar_utils import plot_decision_boundary, sigmoid, load_planar_dataset, load_extra_datasets
%matplotlib inline
np.random.seed(1) # set a seed so that the results are consistent

Dataset

First, let’s get the dataset you will work on. The following code will load a “flower” 2-class dataset into variables X and Y.

In [16]:

X, Y = load_planar_dataset()

Visualize the dataset using matplotlib. The data looks like a “flower” with some red (label y=0) and some blue (y=1) points. Your goal is to build a model to fit this data. In other words, we want the classifier to define regions as either red or blue.

In [17]:

plt.scatter(X[0, :], X[1, :], c=Y[0], s=40, cmap=plt.cm.Spectral)

Out[17]:

<matplotlib.collections.PathCollection at 0x27c7e1ee7f0>

You have:

  • a numpy-array (matrix) X that contains your features (x1, x2)
  • a numpy-array (vector) Y that contains your labels (red:0, blue:1).

Let’s first get a better sense of what our data looks like.

Exercise:

How many training examples do you have? In addition, what is the shape of the variables X and Y?

In [19]:

X.shape,Y.shape

Out[19]:

((2, 400), (1, 400))

In [25]:

X.T.shape,Y.T.shape

Out[25]:

((400, 2), (400, 1))
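
In short, each example is stored as a column of X, and Y holds one 0/1 label per example. A quick check you can run (assuming X and Y are loaded as above):

m = X.shape[1]  # number of training examples (400), one example per column
print('I have m = %d training examples' % m)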

Simple Logistic Regression

In [22]:

clf = sklearn.linear_model.LogisticRegressionCV()

Before building a full neural network, let’s first see how logistic regression performs on this problem. You can use sklearn’s built-in functions to do that. Run the code below to train a logistic regression classifier on the dataset.

In [27]:

clf.fit(X.T,Y.T)
anaconda3\envs\tf\lib\site-packages\sklearn\utils\validation.py:761: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y = column_or_1d(y, warn=True)
anaconda3\envs\tf\lib\site-packages\sklearn\model_selection\_split.py:2053: FutureWarning: You should specify a value for 'cv' instead of relying on the default value. The default value will change from 3 to 5 in version 0.22.
  warnings.warn(CV_WARNING, FutureWarning)

Out[27]:

LogisticRegressionCV(Cs=10, class_weight=None, cv='warn', dual=False,
           fit_intercept=True, intercept_scaling=1.0, max_iter=100,
           multi_class='warn', n_jobs=None, penalty='l2',
           random_state=None, refit=True, scoring=None, solver='lbfgs',
           tol=0.0001, verbose=0)

You can now plot the decision boundary of this model. Run the code below.

In [29]:

# Plot the decision boundary for logistic regression
plot_decision_boundary(lambda x: clf.predict(x), X, Y[0])
plt.title("Logistic Regression")

# Print accuracy
LR_predictions = clf.predict(X.T)
print ('Accuracy of logistic regression: %d ' % float((np.dot(Y, LR_predictions) + np.dot(1 - Y,1 - LR_predictions)) / float(Y.size) * 100) +
       '% ' + "(percentage of correctly labelled datapoints)")
Accuracy of logistic regression: 47 % (percentage of correctly labelled datapoints)

Interpretation: The dataset is not linearly separable, so logistic regression doesn’t perform well. Hopefully a neural network will do better. Let’s try this now!

The general methodology to build a Neural Network is to:

1. Define the neural network structure (# of input units, # of hidden units, etc.)

2. Initialize the model’s parameters

3. Loop (see the sketch after this list):
   • Implement forward propagation
   • Compute the loss
   • Implement backward propagation to get the gradients
   • Update the parameters (gradient descent)
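
Put together, the training loop we will build looks roughly like this (just a sketch; every helper used here is implemented step by step below):

n_x, n_h, n_y = layer_sizes(X, Y)
parameters = initialize_parameters(n_x, n_h, n_y)
for i in range(num_iterations):
    A2, cache = forward_propagation(X, parameters)          # forward pass
    cost = compute_cost(A2, Y, parameters)                  # cross-entropy cost
    grads = backward_propagation(parameters, cache, X, Y)   # gradients
    parameters = update_parameters(parameters, grads)       # gradient descent step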

Neural Network model
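
With the data stored column-wise, the model with one hidden layer computes (these are exactly the operations coded in forward_propagation() below):

$$Z^{[1]} = W^{[1]} X + b^{[1]}, \qquad A^{[1]} = \tanh\left(Z^{[1]}\right)$$

$$Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}, \qquad A^{[2]} = \sigma\left(Z^{[2]}\right)$$

The prediction for an example is 1 if its output activation in A2 is greater than 0.5, and 0 otherwise.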

Exercise:

Define three variables:

  • n_x: the size of the input layer
  • n_h: the size of the hidden layer (set this to 4)
  • n_y: the size of the output layer

In [34]:

def layer_sizes(X, Y):
    n_x = X.shape[0]  # size of the input layer (number of features)
    n_h = 4           # size of the hidden layer, hard-coded to 4
    n_y = Y.shape[0]  # size of the output layer
    return (n_x, n_h, n_y)

In [35]:

X_assess, Y_assess = layer_sizes_test_case()
(n_x, n_h, n_y) = layer_sizes(X_assess, Y_assess)
print("The size of the input layer is: n_x = " + str(n_x))
print("The size of the hidden layer is: n_h = " + str(n_h))
print("The size of the output layer is: n_y = " + str(n_y))
The size of the input layer is: n_x = 5
The size of the hidden layer is: n_h = 4
The size of the output layer is: n_y = 2

Initialize the model’s parameters

Exercise:

Implement the function initialize_parameters().

Instructions:

  • Make sure your parameters’ sizes are right: they should match the layer sizes (n_x, n_h, n_y) defined above.
  • Initialize the weight matrices with small random values. Use np.random.randn(a,b) * 0.01 to randomly initialize a matrix of shape (a,b).
  • Initialize the bias vectors as zeros. Use np.zeros((a,b)) to initialize a matrix of shape (a,b) with zeros.

In [97]:

def initialize_parameters(n_x, n_h, n_y):
    np.random.seed(2) 
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))
    assert (W1.shape == (n_h, n_x))
    assert (b1.shape == (n_h, 1))
    assert (W2.shape == (n_y, n_h))
    assert (b2.shape == (n_y, 1))
    
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    
    return parameters            

In [48]:

n_x, n_h, n_y = initialize_parameters_test_case()

parameters = initialize_parameters(n_x, n_h, n_y)
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))
W1 = [[-0.00416758 -0.00056267]
 [-0.02136196  0.01640271]
 [-0.01793436 -0.00841747]
 [ 0.00502881 -0.01245288]]
b1 = [[0.]
 [0.]
 [0.]
 [0.]]
W2 = [[-0.01057952 -0.00909008  0.00551454  0.02292208]]
b2 = [[0.]]

Exercise:

Implement forward_propagation(), computing Z1, A1, Z2 and A2 from the model equations above and caching them for backpropagation.

In [58]:

def forward_propagation(X, parameters):
    # retrieve parameters
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]

    # forward pass: hidden layer (tanh), then output layer (sigmoid)
    Z1 = np.dot(W1, X) + b1
    A1 = np.tanh(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)
    
    assert(A2.shape == (1, X.shape[1]))
    cache = {"Z1": Z1,
             "A1": A1,
             "Z2": Z2,
             "A2": A2}
    return A2, cache    

In [59]:

X_assess, parameters = forward_propagation_test_case()

A2, cache = forward_propagation(X_assess, parameters)

# Note: we use the mean here just to make sure that your output matches ours. 
print(np.mean(cache['Z1']), np.mean(cache['A1']), np.mean(cache['Z2']), np.mean(cache['A2']))
0.26281864019752443 0.09199904522700109 -1.3076660128732143 0.21287768171914198

Exercise:

Implement compute_cost() to compute the value of the cost J.
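
For reference, this is the cross-entropy cost over the m training examples (the quantity the code below computes):

$$J = -\frac{1}{m}\sum_{i=1}^{m}\left[\, y^{(i)}\log a^{[2](i)} + \left(1-y^{(i)}\right)\log\left(1-a^{[2](i)}\right) \right]$$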

In [67]:

def compute_cost(A2, Y, parameters):
    m = Y.shape[1]  # number of examples
    # cross-entropy cost
    logprobs = np.multiply(Y, np.log(A2)) + np.multiply(1 - Y, np.log(1 - A2))
    cost = -np.sum(logprobs) / m
    cost = np.squeeze(cost)  # makes sure cost is the dimension we expect
    assert(isinstance(cost, float))
    return cost

In [68]:

A2, Y_assess, parameters = compute_cost_test_case()
print("cost = " + str(compute_cost(A2, Y_assess, parameters)))
cost = 0.6930587610394646

Exercise:

Implement the function backward_propagation()
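
For reference, these are the vectorized gradients the function computes, where * denotes element-wise multiplication and (1 − A1²) is the derivative of tanh evaluated at Z1:

$$dZ^{[2]} = A^{[2]} - Y, \quad dW^{[2]} = \frac{1}{m}\, dZ^{[2]} A^{[1]T}, \quad db^{[2]} = \frac{1}{m}\sum_{i=1}^{m} dZ^{[2](i)}$$

$$dZ^{[1]} = \left(W^{[2]T} dZ^{[2]}\right) * \left(1 - \left(A^{[1]}\right)^{2}\right), \quad dW^{[1]} = \frac{1}{m}\, dZ^{[1]} X^{T}, \quad db^{[1]} = \frac{1}{m}\sum_{i=1}^{m} dZ^{[1](i)}$$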

In [75]:

def backward_propagation(parameters, cache, X, Y):
    m = Y.shape[1]
    A1 = cache["A1"]
    A2 = cache["A2"]
    W2 = parameters["W2"]

    # output layer gradients
    dZ2 = A2 - Y
    dW2 = (1 / m) * np.dot(dZ2, A1.T)
    db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)

    # hidden layer gradients; (1 - A1**2) is the derivative of tanh
    dZ1 = np.multiply(np.dot(W2.T, dZ2), 1 - np.power(A1, 2))
    dW1 = (1 / m) * np.dot(dZ1, X.T)
    db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True)
    
    grads = {"dW1": dW1,
             "db1": db1,
             "dW2": dW2,
             "db2": db2}
    
    return grads

In [76]:

parameters, cache, X_assess, Y_assess = backward_propagation_test_case()

grads = backward_propagation(parameters, cache, X_assess, Y_assess)
print ("dW1 = "+ str(grads["dW1"]))
print ("db1 = "+ str(grads["db1"]))
print ("dW2 = "+ str(grads["dW2"]))
print ("db2 = "+ str(grads["db2"]))
dW1 = [[ 0.00301023 -0.00747267]
 [ 0.00257968 -0.00641288]
 [-0.00156892  0.003893  ]
 [-0.00652037  0.01618243]]
db1 = [[ 0.00176201]
 [ 0.00150995]
 [-0.00091736]
 [-0.00381422]]
dW2 = [[ 0.00078841  0.01765429 -0.00084166 -0.01022527]]
db2 = [[-0.16655712]]

Exercise:

Implement the update rule using gradient descent: each parameter is updated by subtracting the learning rate times its gradient.
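
The general update for each parameter, with learning rate α (1.2 by default in the code below), is

$$\theta := \theta - \alpha \, d\theta$$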

That is, you have to use (dW1, db1, dW2, db2) in order to update (W1, b1, W2, b2).

In [80]:

def update_parameters(parameters, grads, learning_rate = 1.2):
    dW1 = grads["dW1"]
    db1 = grads["db1"]
    dW2 = grads["dW2"]
    db2 = grads["db2"]

    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]

    # gradient descent update for each parameter
    W1 = W1 - learning_rate * dW1
    b1 = b1 - learning_rate * db1
    W2 = W2 - learning_rate * dW2
    b2 = b2 - learning_rate * db2
    
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    
    return parameters

In [81]:

parameters, grads = update_parameters_test_case()
parameters = update_parameters(parameters, grads)

print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))
W1 = [[-0.00643025  0.01936718]
 [-0.02410458  0.03978052]
 [-0.01653973 -0.02096177]
 [ 0.01046864 -0.05990141]]
b1 = [[-1.02420756e-06]
 [ 1.27373948e-05]
 [ 8.32996807e-07]
 [-3.20136836e-06]]
W2 = [[-0.01041081 -0.04463285  0.01758031  0.04747113]]
b2 = [[0.00010457]]

Exercise:

Build your neural network model in nn_model().

In [85]:

def nn_model(X, Y, n_h, num_iterations = 10000, print_cost=True):
    np.random.seed(3)
    n_x, _, n_y = layer_sizes(X, Y)  # keep the n_h passed in as an argument
    parameters = initialize_parameters(n_x, n_h, n_y)
    for i in range(num_iterations):
        A2, cache = forward_propagation(X, parameters)
        cost = compute_cost(A2, Y, parameters)
        grads = backward_propagation(parameters, cache, X, Y)
        parameters = update_parameters(parameters, grads)
        # Print the cost every 1000 iterations
        if print_cost and i % 1000 == 0:
            print ("Cost after iteration %i: %f" % (i, cost))
    return parameters

In [87]:

X_assess, Y_assess = nn_model_test_case()

parameters = nn_model(X_assess, Y_assess, 4, num_iterations=10000, print_cost=True)
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))
Cost after iteration 0: 0.692739
Cost after iteration 1000: 0.000218
Cost after iteration 2000: 0.000107
Cost after iteration 3000: 0.000071
Cost after iteration 4000: 0.000053
Cost after iteration 5000: 0.000042
Cost after iteration 6000: 0.000035
Cost after iteration 7000: 0.000030
Cost after iteration 8000: 0.000026
Cost after iteration 9000: 0.000023
W1 = [[-0.65848169  1.21866811]
 [-0.76204273  1.39377573]
 [ 0.5792005  -1.10397703]
 [ 0.76773391 -1.41477129]]
b1 = [[ 0.287592  ]
 [ 0.3511264 ]
 [-0.2431246 ]
 [-0.35772805]]
W2 = [[-2.45566237 -3.27042274  2.00784958  3.36773273]]
b2 = [[0.20459656]]

Exercise:

Use your model to predict by building predict(). Use forward propagation to predict results.
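
The prediction is simply a threshold on the output activation: since A2 lies in (0, 1), np.round(A2) implements

$$\hat{y}^{(i)} = \begin{cases} 1 & \text{if } a^{[2](i)} > 0.5 \\ 0 & \text{otherwise} \end{cases}$$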

In [90]:

def predict(parameters, X):
    # forward propagate, then threshold the output activation at 0.5
    A2, cache = forward_propagation(X, parameters)
    predictions = np.round(A2)
    return predictions

In [91]:

parameters, X_assess = predict_test_case()

predictions = predict(parameters, X_assess)
print("predictions mean = " + str(np.mean(predictions)))
predictions mean = 0.6666666666666666

It is time to run the model and see how it performs on the planar dataset. Run the following code to test your model with a single hidden layer of 4 units.

In [93]:

# Build a model with a n_h-dimensional hidden layer
parameters = nn_model(X, Y, n_h = 4, num_iterations = 10000, print_cost=True)

# Plot the decision boundary
plot_decision_boundary(lambda x: predict(parameters, x.T), X, Y[0])
plt.title("Decision Boundary for hidden layer size " + str(4))
Cost after iteration 0: 0.693048
Cost after iteration 1000: 0.288083
Cost after iteration 2000: 0.254385
Cost after iteration 3000: 0.233864
Cost after iteration 4000: 0.226792
Cost after iteration 5000: 0.222644
Cost after iteration 6000: 0.219731
Cost after iteration 7000: 0.217504
Cost after iteration 8000: 0.219504
Cost after iteration 9000: 0.218571

Out[93]:

Text(0.5, 1.0, 'Decision Boundary for hidden layer size 4')

In [94]:

# Print accuracy
predictions = predict(parameters, X)
print ('Accuracy: %d' % float((np.dot(Y,predictions.T) + np.dot(1-Y,1-predictions.T))/float(Y.size)*100) + '%')
Accuracy: 90%

References:

Deep Learning Specialization, Coursera: https://www.coursera.org/

The code of the planar_utils.py helper file is below:

In [104]:

import matplotlib.pyplot as plt
import numpy as np
import sklearn
import sklearn.datasets
import sklearn.linear_model

def plot_decision_boundary(model, X, y):
    # Set min and max values and give it some padding
    x_min, x_max = X[0, :].min() - 1, X[0, :].max() + 1
    y_min, y_max = X[1, :].min() - 1, X[1, :].max() + 1
    h = 0.01
    # Generate a grid of points with distance h between them
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    # Predict the function value for the whole grid
    Z = model(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    # Plot the contour and training examples
    plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral)
    plt.ylabel('x2')
    plt.xlabel('x1')
    plt.scatter(X[0, :], X[1, :], c=y, cmap=plt.cm.Spectral)

def sigmoid(x):
    """
    Compute the sigmoid of x
    Arguments:
    x -- A scalar or numpy array of any size.
    Return:
    s -- sigmoid(x)
    """
    s = 1/(1+np.exp(-x))
    return s

def load_planar_dataset():
    np.random.seed(1)
    m = 400 # number of examples
    N = int(m/2) # number of points per class
    D = 2 # dimensionality
    X = np.zeros((m,D)) # data matrix where each row is a single example
    Y = np.zeros((m,1), dtype='uint8') # labels vector (0 for red, 1 for blue)
    a = 4 # maximum radius of the flower

    for j in range(2):
        ix = range(N*j,N*(j+1))
        t = np.linspace(j*3.12,(j+1)*3.12,N) + np.random.randn(N)*0.2 # theta
        r = a*np.sin(4*t) + np.random.randn(N)*0.2 # radius
        X[ix] = np.c_[r*np.sin(t), r*np.cos(t)]
        Y[ix] = j

    X = X.T
    Y = Y.T

    return X, Y

def load_extra_datasets():
    N = 200
    noisy_circles = sklearn.datasets.make_circles(n_samples=N, factor=.5, noise=.3)
    noisy_moons = sklearn.datasets.make_moons(n_samples=N, noise=.2)
    blobs = sklearn.datasets.make_blobs(n_samples=N, random_state=5, n_features=2, centers=6)
    gaussian_quantiles = sklearn.datasets.make_gaussian_quantiles(mean=None, cov=0.5, n_samples=N, n_features=2, n_classes=2, shuffle=True, random_state=None)
    no_structure = np.random.rand(N, 2), np.random.rand(N, 2)

    return noisy_circles, noisy_moons, blobs, gaussian_quantiles, no_structure
