In the last post we saw a neural network with only two layers, an input layer and an output layer, which behaves like a logistic regression algorithm. In this post we are going to code a neural network with one more layer: a hidden layer.
You will learn how to:
- Implement a 2-class classification neural network with a single hidden layer
- Use units with a non-linear activation function, such as tanh
- Compute the cross entropy loss
- Implement forward and backward propagation
# Package imports
import numpy as np
import matplotlib.pyplot as plt
from testCases_v2 import *
import sklearn
import sklearn.datasets
import sklearn.linear_model
from planar_utils import plot_decision_boundary, sigmoid, load_planar_dataset, load_extra_datasets

%matplotlib inline

np.random.seed(1)  # set a seed so that the results are consistent
Dataset
First, let’s get the dataset you will work on. The following code will load a “flower” 2-class dataset into the variables X and Y.
In [16]:
X, Y = load_planar_dataset()
Visualize the dataset using matplotlib. The data looks like a “flower” with some red (label y=0) and some blue (y=1) points. Your goal is to build a model to fit this data. In other words, we want the classifier to define regions as either red or blue.
In [17]:
plt.scatter(X[0, :], X[1, :], c=Y[0], s=40, cmap=plt.cm.Spectral)
Out[17]:
<matplotlib.collections.PathCollection at 0x27c7e1ee7f0>

You have:
- a numpy-array (matrix) X that contains your features (x1, x2)
- a numpy-array (vector) Y that contains your labels (red:0, blue:1).
Let's first get a better sense of what our data looks like.
Exercise:
How many training examples do you have? In addition, what are the shapes of the variables X and Y?
In [19]:
X.shape,Y.shape
Out[19]:
((2, 400), (1, 400))
In [25]:
X.T.shape,Y.T.shape
Out[25]:
((400, 2), (400, 1))
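The second dimension of both shapes is the number of training examples, so there are 400 of them, each with 2 features and 1 label. A quick way to read this off directly (a small sketch on top of the cells above, not part of the original notebook) is:

m = X.shape[1]  # number of training examples (columns of X)
print("The dataset has m = " + str(m) + " training examples.")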
Simple Logistic Regression
Before building a full neural network, let's first see how logistic regression performs on this problem. You can use sklearn's built-in functions to do that. Run the code below to train a logistic regression classifier on the dataset.
In [22]:
clf = sklearn.linear_model.LogisticRegressionCV();
In [27]:
clf.fit(X.T, Y.T)
anaconda3\envs\tf\lib\site-packages\sklearn\utils\validation.py:761: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). y = column_or_1d(y, warn=True)
anaconda3\envs\tf\lib\site-packages\sklearn\model_selection\_split.py:2053: FutureWarning: You should specify a value for 'cv' instead of relying on the default value. The default value will change from 3 to 5 in version 0.22. warnings.warn(CV_WARNING, FutureWarning)
Out[27]:
LogisticRegressionCV(Cs=10, class_weight=None, cv='warn', dual=False, fit_intercept=True, intercept_scaling=1.0, max_iter=100, multi_class='warn', n_jobs=None, penalty='l2', random_state=None, refit=True, scoring=None, solver='lbfgs', tol=0.0001, verbose=0)
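As the DataConversionWarning above suggests, sklearn expects the labels as a 1-D array. If you want to silence that warning, a minimal variant of the same call (not part of the original notebook cells) is:

# Flatten Y from shape (400, 1) to (400,) before fitting, as the warning recommends
clf = sklearn.linear_model.LogisticRegressionCV()
clf.fit(X.T, Y.T.ravel())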
You can now plot the decision boundary of this model. Run the code below.
In [29]:
# Plot the decision boundary for logistic regression
plot_decision_boundary(lambda x: clf.predict(x), X, Y[0])
plt.title("Logistic Regression")

# Print accuracy
LR_predictions = clf.predict(X.T)
print('Accuracy of logistic regression: %d ' % float((np.dot(Y, LR_predictions) + np.dot(1 - Y, 1 - LR_predictions)) / float(Y.size) * 100) + '% ' + "(percentage of correctly labelled datapoints)")
Accuracy of logistic regression: 47 % (percentage of correctly labelled datapoints)

Interpretation: The dataset is not linearly separable, so logistic regression doesn’t perform well. Hopefully a neural network will do better. Let’s try this now!
The general methodology to build a Neural Network is to:
1. Define the neural network structure (# of input units, # of hidden units, etc.).
2. Initialize the model’s parameters
3. Loop (see the sketch below):
   - Implement forward propagation
   - Compute the loss
   - Implement backward propagation to get the gradients
   - Update the parameters (gradient descent)
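Concretely, once the helper functions are implemented, the whole methodology boils down to the sketch below. The function names are the ones built step by step in the rest of this post, so treat this as an outline rather than runnable code at this point:

# Rough outline of the training loop assembled at the end of this post (nn_model)
num_iterations = 10000

n_x, n_h, n_y = layer_sizes(X, Y)                          # 1. define the structure
parameters = initialize_parameters(n_x, n_h, n_y)          # 2. initialize parameters

for i in range(num_iterations):                            # 3. loop
    A2, cache = forward_propagation(X, parameters)         #    forward propagation
    cost = compute_cost(A2, Y, parameters)                 #    compute the loss
    grads = backward_propagation(parameters, cache, X, Y)  #    backward propagation
    parameters = update_parameters(parameters, grads)      #    update (gradient descent)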
Neural Network model
Exercise:
Define three variables:
- n_x: the size of the input layer
- n_h: the size of the hidden layer (set this to 4)
- n_y: the size of the output layer
In [34]:
def layer_sizes(X, Y):
    n_x = X.shape[0]  # size of the input layer
    n_h = 4           # size of the hidden layer
    n_y = Y.shape[0]  # size of the output layer
    return (n_x, n_h, n_y)
In [35]:
X_assess, Y_assess = layer_sizes_test_case()
(n_x, n_h, n_y) = layer_sizes(X_assess, Y_assess)
print("The size of the input layer is: n_x = " + str(n_x))
print("The size of the hidden layer is: n_h = " + str(n_h))
print("The size of the output layer is: n_y = " + str(n_y))
The size of the input layer is: n_x = 5
The size of the hidden layer is: n_h = 4
The size of the output layer is: n_y = 2
Initialize the model’s parameters
Exercise:
Implement the function initialize_parameters()
Instructions:
- Make sure your parameters' sizes are right. Refer to the neural network figure above if needed.
- You will initialize the weight matrices with random values. Use np.random.randn(a, b) * 0.01 to randomly initialize a matrix of shape (a, b).
- You will initialize the bias vectors as zeros. Use np.zeros((a, b)) to initialize a matrix of shape (a, b) with zeros.
In [97]:
def initialize_parameters(n_x, n_h, n_y):
    np.random.seed(2)

    W1 = np.random.randn(n_h, n_x) * .01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * .01
    b2 = np.zeros((n_y, 1))

    assert (W1.shape == (n_h, n_x))
    assert (b1.shape == (n_h, 1))
    assert (W2.shape == (n_y, n_h))
    assert (b2.shape == (n_y, 1))

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    return parameters
In [48]:
n_x, n_h, n_y = initialize_parameters_test_case()
parameters = initialize_parameters(n_x, n_h, n_y)
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))
W1 = [[-0.00416758 -0.00056267]
 [-0.02136196  0.01640271]
 [-0.01793436 -0.00841747]
 [ 0.00502881 -0.01245288]]
b1 = [[0.]
 [0.]
 [0.]
 [0.]]
W2 = [[-0.01057952 -0.00909008  0.00551454  0.02292208]]
b2 = [[0.]]
Exercise:
Implement forward_propagation().
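For reference, the vectorized forward pass over all m examples, with a tanh hidden layer and a sigmoid output unit, is (this is exactly what the code below computes):

$$Z^{[1]} = W^{[1]} X + b^{[1]}, \qquad A^{[1]} = \tanh\big(Z^{[1]}\big)$$
$$Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}, \qquad A^{[2]} = \sigma\big(Z^{[2]}\big)$$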
In [58]:
def forward_propagation(X, parameters):
    # Retrieve the parameters
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]

    # Forward propagation: tanh hidden layer, sigmoid output
    Z1 = np.dot(W1, X) + b1
    A1 = np.tanh(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)

    assert (A2.shape == (1, X.shape[1]))

    cache = {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}
    return A2, cache
In [59]:
X_assess, parameters = forward_propagation_test_case()
A2, cache = forward_propagation(X_assess, parameters)
# Note: we use the mean here just to make sure that your output matches ours.
print(np.mean(cache['Z1']), np.mean(cache['A1']), np.mean(cache['Z2']), np.mean(cache['A2']))
0.26281864019752443 0.09199904522700109 -1.3076660128732143 0.21287768171914198
Exercise:
Implement compute_cost() to compute the value of the cost J.
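The cost J is the cross-entropy loss averaged over the m training examples; this is the formula the code below implements:

$$J = -\frac{1}{m}\sum_{i=1}^{m}\Big[\,y^{(i)}\log a^{[2](i)} + \big(1-y^{(i)}\big)\log\big(1-a^{[2](i)}\big)\Big]$$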
In [67]:
def compute_cost(A2, Y, parameters):
    m = Y.shape[1]  # number of examples

    # Cross-entropy cost
    cost = (-1 / m) * np.sum(np.multiply(Y, np.log(A2)) + np.multiply(1 - Y, np.log(1 - A2)))

    cost = np.squeeze(cost)  # makes sure cost is the dimension we expect
    assert (isinstance(cost, float))
    return cost
In [68]:
A2, Y_assess, parameters = compute_cost_test_case()
print("cost = " + str(compute_cost(A2, Y_assess, parameters)))
cost = 0.6930587610394646
Exercise:
Implement the function backward_propagation()
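Using the values cached during forward propagation, the vectorized gradients are the ones below (⊙ denotes the element-wise product; since the hidden activation is tanh, its derivative is 1 − A<sup>[1]2</sup>). These are exactly the formulas the code below implements:

$$dZ^{[2]} = A^{[2]} - Y, \qquad dW^{[2]} = \frac{1}{m}\, dZ^{[2]} A^{[1]\top}, \qquad db^{[2]} = \frac{1}{m}\sum_{i} dZ^{[2](i)}$$
$$dZ^{[1]} = \big(W^{[2]\top} dZ^{[2]}\big) \odot \big(1 - A^{[1]\,2}\big), \qquad dW^{[1]} = \frac{1}{m}\, dZ^{[1]} X^{\top}, \qquad db^{[1]} = \frac{1}{m}\sum_{i} dZ^{[1](i)}$$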
In [75]:
def backward_propagation(parameters, cache, X, Y):
    m = Y.shape[1]

    # Retrieve the activations from the cache
    A1 = cache["A1"]
    A2 = cache["A2"]

    # Backward propagation: compute dW1, db1, dW2, db2
    dZ2 = A2 - Y
    dW2 = (1 / m) * np.dot(dZ2, A1.T)
    db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)
    dZ1 = np.multiply(np.dot(parameters["W2"].T, dZ2), 1 - np.power(A1, 2))
    dW1 = (1 / m) * np.dot(dZ1, X.T)
    db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True)

    grads = {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}
    return grads
In [76]:
parameters, cache, X_assess, Y_assess = backward_propagation_test_case()
grads = backward_propagation(parameters, cache, X_assess, Y_assess)
print("dW1 = " + str(grads["dW1"]))
print("db1 = " + str(grads["db1"]))
print("dW2 = " + str(grads["dW2"]))
print("db2 = " + str(grads["db2"]))
dW1 = [[ 0.00301023 -0.00747267]
 [ 0.00257968 -0.00641288]
 [-0.00156892  0.003893  ]
 [-0.00652037  0.01618243]]
db1 = [[ 0.00176201]
 [ 0.00150995]
 [-0.00091736]
 [-0.00381422]]
dW2 = [[ 0.00078841  0.01765429 -0.00084166 -0.01022527]]
db2 = [[-0.16655712]]
Exercise:
Implement the update rule.
Use gradient descent.
You have to use (dW1, db1, dW2, db2) in order to update (W1, b1, W2, b2).
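The gradient descent rule applied to each parameter, with learning rate α (set to 1.2 in the code below), is:

$$W^{[l]} \leftarrow W^{[l]} - \alpha\, dW^{[l]}, \qquad b^{[l]} \leftarrow b^{[l]} - \alpha\, db^{[l]}, \qquad l \in \{1, 2\}$$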
In [80]:
def update_parameters(parameters, grads, learning_rate=1.2):
    # Retrieve the gradients
    dW1 = grads["dW1"]
    db1 = grads["db1"]
    dW2 = grads["dW2"]
    db2 = grads["db2"]

    # Retrieve the parameters
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]

    # Gradient descent update for each parameter
    W1 = W1 - learning_rate * dW1
    b1 = b1 - learning_rate * db1
    W2 = W2 - learning_rate * dW2
    b2 = b2 - learning_rate * db2

    parameters = {"W1": W1, "b1": b1, "W2": W2, "b2": b2}
    return parameters
In [81]:
parameters, grads = update_parameters_test_case()
parameters = update_parameters(parameters, grads)
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))
W1 = [[-0.00643025  0.01936718]
 [-0.02410458  0.03978052]
 [-0.01653973 -0.02096177]
 [ 0.01046864 -0.05990141]]
b1 = [[-1.02420756e-06]
 [ 1.27373948e-05]
 [ 8.32996807e-07]
 [-3.20136836e-06]]
W2 = [[-0.01041081 -0.04463285  0.01758031  0.04747113]]
b2 = [[0.00010457]]
Exercise:
Build your neural network model in nn_model().
In [85]:
def nn_model(X, Y, n_h, num_iterations=10000, print_cost=True):
    np.random.seed(3)

    # Use layer_sizes() for the input/output sizes; keep the n_h passed as an argument
    n_x, _, n_y = layer_sizes(X, Y)

    parameters = initialize_parameters(n_x, n_h, n_y)

    # Gradient descent loop
    for i in range(num_iterations):
        A2, cache = forward_propagation(X, parameters)
        cost = compute_cost(A2, Y, parameters)
        grads = backward_propagation(parameters, cache, X, Y)
        parameters = update_parameters(parameters, grads)

        # Print the cost every 1000 iterations
        if print_cost and i % 1000 == 0:
            print("Cost after iteration %i: %f" % (i, cost))

    return parameters
In [87]:
X_assess, Y_assess = nn_model_test_case()
parameters = nn_model(X_assess, Y_assess, 4, num_iterations=10000, print_cost=True)
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))
Cost after iteration 0: 0.692739
Cost after iteration 1000: 0.000218
Cost after iteration 2000: 0.000107
Cost after iteration 3000: 0.000071
Cost after iteration 4000: 0.000053
Cost after iteration 5000: 0.000042
Cost after iteration 6000: 0.000035
Cost after iteration 7000: 0.000030
Cost after iteration 8000: 0.000026
Cost after iteration 9000: 0.000023
W1 = [[-0.65848169  1.21866811]
 [-0.76204273  1.39377573]
 [ 0.5792005  -1.10397703]
 [ 0.76773391 -1.41477129]]
b1 = [[ 0.287592  ]
 [ 0.3511264 ]
 [-0.2431246 ]
 [-0.35772805]]
W2 = [[-2.45566237 -3.27042274  2.00784958  3.36773273]]
b2 = [[0.20459656]]
Exercise:
Use your model to predict by building predict(). Use forward propagation to predict results.
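Since the output unit is a sigmoid, predictions are obtained by thresholding the output activation at 0.5, which is what np.round does for values between 0 and 1:

$$\hat{y}^{(i)} = \begin{cases} 1 & \text{if } a^{[2](i)} > 0.5 \\ 0 & \text{otherwise} \end{cases}$$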
In [90]:
def predict(parameters, X):
    # Forward propagate and threshold the output probabilities at 0.5
    A2, cache = forward_propagation(X, parameters)
    predictions = np.round(A2)
    return predictions
In [91]:
parameters, X_assess = predict_test_case()
predictions = predict(parameters, X_assess)
print("predictions mean = " + str(np.mean(predictions)))
predictions mean = 0.6666666666666666
It is time to run the model and see how it performs on a planar dataset. Run the following code to test your model with a single hidden layer.
In [93]:
# Build a model with a n_h-dimensional hidden layer
parameters = nn_model(X, Y, n_h=4, num_iterations=10000, print_cost=True)

# Plot the decision boundary
plot_decision_boundary(lambda x: predict(parameters, x.T), X, Y[0])
plt.title("Decision Boundary for hidden layer size " + str(4))
Cost after iteration 0: 0.693048
Cost after iteration 1000: 0.288083
Cost after iteration 2000: 0.254385
Cost after iteration 3000: 0.233864
Cost after iteration 4000: 0.226792
Cost after iteration 5000: 0.222644
Cost after iteration 6000: 0.219731
Cost after iteration 7000: 0.217504
Cost after iteration 8000: 0.219504
Cost after iteration 9000: 0.218571
Out[93]:
Text(0.5, 1.0, 'Decision Boundary for hidden layer size 4')

In [94]:
# Print accuracy
predictions = predict(parameters, X)
print('Accuracy: %d' % float((np.dot(Y, predictions.T) + np.dot(1 - Y, 1 - predictions.T)) / float(Y.size) * 100) + '%')
Accuracy: 90%
References:
Deep Learning Specialization, Coursera: https://www.coursera.org/
The code for the planar_utils.py file is below:
In [104]:
import matplotlib.pyplot as plt
import numpy as np
import sklearn
import sklearn.datasets
import sklearn.linear_model


def plot_decision_boundary(model, X, y):
    # Set min and max values and give it some padding
    x_min, x_max = X[0, :].min() - 1, X[0, :].max() + 1
    y_min, y_max = X[1, :].min() - 1, X[1, :].max() + 1
    h = 0.01
    # Generate a grid of points with distance h between them
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    # Predict the function value for the whole grid
    Z = model(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    # Plot the contour and training examples
    plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral)
    plt.ylabel('x2')
    plt.xlabel('x1')
    plt.scatter(X[0, :], X[1, :], c=y, cmap=plt.cm.Spectral)


def sigmoid(x):
    """
    Compute the sigmoid of x

    Arguments:
    x -- A scalar or numpy array of any size.

    Return:
    s -- sigmoid(x)
    """
    s = 1 / (1 + np.exp(-x))
    return s


def load_planar_dataset():
    np.random.seed(1)
    m = 400  # number of examples
    N = int(m / 2)  # number of points per class
    D = 2  # dimensionality
    X = np.zeros((m, D))  # data matrix where each row is a single example
    Y = np.zeros((m, 1), dtype='uint8')  # labels vector (0 for red, 1 for blue)
    a = 4  # maximum ray of the flower

    for j in range(2):
        ix = range(N * j, N * (j + 1))
        t = np.linspace(j * 3.12, (j + 1) * 3.12, N) + np.random.randn(N) * 0.2  # theta
        r = a * np.sin(4 * t) + np.random.randn(N) * 0.2  # radius
        X[ix] = np.c_[r * np.sin(t), r * np.cos(t)]
        Y[ix] = j

    X = X.T
    Y = Y.T
    return X, Y


def load_extra_datasets():
    N = 200
    noisy_circles = sklearn.datasets.make_circles(n_samples=N, factor=.5, noise=.3)
    noisy_moons = sklearn.datasets.make_moons(n_samples=N, noise=.2)
    blobs = sklearn.datasets.make_blobs(n_samples=N, random_state=5, n_features=2, centers=6)
    gaussian_quantiles = sklearn.datasets.make_gaussian_quantiles(mean=None, cov=0.5, n_samples=N, n_features=2, n_classes=2, shuffle=True, random_state=None)
    no_structure = np.random.rand(N, 2), np.random.rand(N, 2)

    return noisy_circles, noisy_moons, blobs, gaussian_quantiles, no_structure