
You will learn to:
- Build the general architecture of a learning algorithm, including:
  - Initializing parameters
  - Calculating the cost function and its gradient
  - Using an optimization algorithm (gradient descent)
- Gather all three functions above into a main model() function, in the right order.
Overview of the Problem set
Problem Statement: You are given a dataset containing:
- a training set of m_train images labeled as cat (y=1) or non-cat (y=0)
- a test set of m_test images labeled as cat or non-cat
- each image is of shape (num_px, num_px, 3), where 3 is for the three channels (RGB). Thus, each image is square (height = width = num_px).
You will build a simple image-recognition algorithm that can correctly classify pictures as cat or non-cat.
Show me the code
import numpy as np
import matplotlib.pyplot as plt
import h5py
import scipy
from PIL import Image
from scipy import ndimage
from lr_utils import load_dataset  # helper module; its code is listed at the end of this post
%matplotlib inline
Let’s get more familiar with the dataset.
In [183]:
# Loading the data (cat/non-cat)
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
In [28]:
train_set_x_orig.shape
Out[28]:
(209, 64, 64, 3)
In [29]:
train_set_y.shape
Out[29]:
(1, 209)
In [21]:
classes
Out[21]:
array([b'non-cat', b'cat'], dtype='|S7')
In [184]:
# Example of a picture
index = 200
plt.imshow(train_set_x_orig[index])
Out[184]:
<matplotlib.image.AxesImage at 0x22c6a3ba240>

Exercise: 1
Find the values for:
- m_train (number of training examples)
- m_test (number of test examples)
- num_px (= height = width of a training image)

Remember that train_set_x_orig is a numpy array of shape (m_train, num_px, num_px, 3). For instance, you can access m_train by writing train_set_x_orig.shape[0].
In [186]:
m_train = train_set_x_orig.shape[0]
m_test = test_set_x_orig.shape[0]
num_px = train_set_x_orig.shape[1]
print ("Number of training examples: m_train = " + str(m_train))
print ("Number of testing examples: m_test = " + str(m_test))
print ("Height/Width of each image: num_px = " + str(num_px))
print ("Each image is of size: (" + str(num_px) + ", " + str(num_px) + ", 3)")
print ("train_set_x shape: " + str(train_set_x_orig.shape))
print ("train_set_y shape: " + str(train_set_y.shape))
print ("test_set_x shape: " + str(test_set_x_orig.shape))
print ("test_set_y shape: " + str(test_set_y.shape))
Number of training examples: m_train = 209
Number of testing examples: m_test = 50
Height/Width of each image: num_px = 64
Each image is of size: (64, 64, 3)
train_set_x shape: (209, 64, 64, 3)
train_set_y shape: (1, 209)
test_set_x shape: (50, 64, 64, 3)
test_set_y shape: (1, 50)
Exercise: 2
Reshape the training and test data sets so that images of size (num_px, num_px, 3) are flattened into single vectors of shape (num_px * num_px * 3, 1); the data is then standardized a few cells below.
In [66]:
train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[1] * train_set_x_orig.shape[1] * 3, train_set_x_orig.shape[0])
In [67]:
train_set_x_flatten.shape
Out[67]:
(12288, 209)
In [54]:
test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[1] * test_set_x_orig.shape[1] * 3, test_set_x_orig.shape[0])
In [56]:
test_set_x_flatten.shape
Out[56]:
(12288, 50)
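A note on this reshape: calling reshape directly to (num_px * num_px * 3, m) produces the right shape, but because NumPy fills the new array in row-major order, each resulting column mixes pixels from several different images rather than holding a single image. A common alternative (shown here as an illustrative sketch for comparison, not one of the cells above) is to flatten each image into a row and then transpose, so that every column is exactly one image:
In [ ]:
# Alternative flattening (sketch): keep each image's pixels together.
# reshape(m, -1) flattens each image to one row; .T turns those rows into columns.
train_set_x_flatten_alt = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T
test_set_x_flatten_alt = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1).T
print(train_set_x_flatten_alt.shape)  # (12288, 209)
print(test_set_x_flatten_alt.shape)   # (12288, 50)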
To represent color images, the red, green and blue channels (RGB) must be specified for each pixel, so a pixel value is actually a vector of three numbers ranging from 0 to 255. Let’s standardize our dataset; for image data, simply dividing every pixel value by 255 (the maximum channel value) is a common and effective choice.
In [69]:
train_set_x = train_set_x_flatten / 255.
test_set_x = test_set_x_flatten / 255.
General Architecture of the learning algorithm
- Initialize the parameters of the model
- Learn the parameters for the model by minimizing the cost
- Use the learned parameters to make predictions (on the test set)
- Analyse the results and conclude
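For reference, the underlying model is plain logistic regression; the standard formulas (summarized here rather than quoted from the notebook) are:
- For one example x: a = sigmoid(w.T x + b), interpreted as the probability that the image is a cat
- Loss for one example: L(a, y) = -(y * log(a) + (1 - y) * log(1 - a))
- Cost: J = (1/m) * sum of the losses over all m training examples
- Gradients: dw = (1/m) * X (A - Y).T and db = (1/m) * sum(A - Y)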
Building the parts of our algorithm
The main steps for building a Neural Network are:
1. Define the model structure (such as the number of input features)
2. Initialize the model’s parameters
3. Loop:
   - Calculate the current loss (forward propagation)
   - Calculate the current gradient (backward propagation)
   - Update the parameters (gradient descent)
You often build 1-3 separately and then integrate them into one function we call model().
Exercise: 3
Implement the sigmoid() function.
In [188]:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
In [189]:
print ("sigmoid([0, 2]) = " + str(sigmoid(np.array([0,2]))))
sigmoid([0, 2]) = [0.5 0.88079708]
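Side note (not part of the assignment): for very negative z, np.exp(-z) overflows and NumPy emits a RuntimeWarning, although the returned value is still effectively 0. If that ever matters, SciPy's expit is a numerically robust drop-in replacement (scipy is already imported above):
In [ ]:
# Numerically robust sigmoid via SciPy; same values as sigmoid() above.
from scipy.special import expit
print(expit(np.array([0, 2])))  # [0.5        0.88079708]
print(expit(-1000.))            # 0.0, without an overflow warning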
Initializing parameters
Exercise: 4
Implement parameter initialization. You have to initialize w as a vector of zeros. If you don’t know which NumPy function to use, look up np.zeros() in the NumPy documentation.
In [192]:
def initialize_with_zeros(dim):
    w = np.zeros((dim, 1))
    b = 0
    return w, b
In [193]:
dim = 5
w, b = initialize_with_zeros(dim)
print ("w = " + str(w))
print ("b = " + str(b))
w = [[0.]
 [0.]
 [0.]
 [0.]
 [0.]]
b = 0
Forward and Backward propagation
Exercise: 5
Implement a function propagate() that computes the cost function and its gradient: forward propagation gives the activations A and the cost, and backward propagation gives the gradients dw and db.
In [194]:
def propagate(w, b, X, Y):
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)               # forward propagation
    yloga = np.multiply(Y, np.log(A))
    ylogaa = np.multiply((1 - Y), np.log(1 - A))
    cost = (-1 / m) * (np.sum(yloga + ylogaa))    # cross-entropy cost
    dw = (1 / m) * (np.dot(X, (A - Y).T))         # backward propagation
    db = (1 / m) * (np.sum(A - Y))
    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())
    grads = {"dw": dw, "db": db}
    return grads, cost
In [195]:
w, b, X, Y = np.array([[1.],[2.]]), 2., np.array([[1.,2.,-1.],[3.,4.,-3.2]]), np.array([[1,0,1]])
grads, cost = propagate(w, b, X, Y)
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
print ("cost = " + str(cost))
dw = [[0.99845601]
 [2.39507239]]
db = 0.001455578136784208
cost = 5.801545319394553
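To sanity-check the analytic gradients, a quick finite-difference comparison can be run against propagate(), reusing the test values from the cell above (an illustrative sketch; the eps value and checking only b are arbitrary choices, not part of the assignment):
In [ ]:
# Finite-difference check: the numerical estimate of db should closely match propagate()'s db.
eps = 1e-7
grads, _ = propagate(w, b, X, Y)
_, cost_plus = propagate(w, b + eps, X, Y)
_, cost_minus = propagate(w, b - eps, X, Y)
db_approx = (cost_plus - cost_minus) / (2 * eps)
print("analytic db  =", grads["db"])
print("numerical db =", db_approx)  # the two values should agree to several decimal places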
Optimization
You have initialized your parameters. You are also able to compute a cost function and its gradient. Now, you want to update the parameters using gradient descent.
Exercise: 6
Write down the optimization function. The goal is to learn w and b by minimizing the cost function J. For a parameter θ, the update rule is θ = θ − α * dθ, where α is the learning rate.
In [196]:
def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
    costs = []
    for i in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)
        dw = grads['dw']
        db = grads['db']
        w = w - learning_rate * dw       # gradient-descent update
        b = b - learning_rate * db
        # Record the cost every 100 training iterations
        if i % 100 == 0:
            costs.append(cost)
        # Print the cost every 100 training iterations
        if print_cost and i % 100 == 0:
            print ("Cost after iteration %i: %f" %(i, cost))
    params = {"w": w, "b": b}
    grads = {"dw": dw, "db": db}
    return params, grads, costs
In [197]:
params, grads, costs = optimize(w, b, X, Y, num_iterations = 100, learning_rate = 0.009, print_cost = False)
print ("w = " + str(params["w"]))
print ("b = " + str(params["b"]))
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
w = [[0.19033591]
 [0.12259159]]
b = 1.9253598300845747
dw = [[0.67752042]
 [1.41625495]]
db = 0.21919450454067652
Exercise: 7
We are now able to learn w and b for a dataset X. Next, implement the predict() function.
In [147]:
def predict(w, b, X):
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    A = sigmoid(np.dot(w.T, X) + b)
    for i in range(A.shape[1]):
        # Convert the probability A[0, i] into an actual prediction using a 0.5 threshold
        if A[0, i] > 0.5:
            Y_prediction[0, i] = 1
        else:
            Y_prediction[0, i] = 0
    assert(Y_prediction.shape == (1, m))
    return Y_prediction
In [148]:
w = np.array([[0.1124579],[0.23106775]])
b = -0.3
X = np.array([[1.,-1.1,-3.2],[1.2,2.,0.1]])
print ("predictions = " + str(predict(w, b, X)))
predictions = [[1. 1. 0.]]
Merge all functions into a model
You will now see how the overall model is structured by putting all the building blocks (the functions implemented in the previous parts) together, in the right order.
Exercise: Implement the model function. Use the following notation:
- Y_prediction_test for your predictions on the test set
- Y_prediction_train for your predictions on the train set
- w, costs, grads for the outputs of optimize()
In [198]:
def model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.5, print_cost = False):
    dim = X_train.shape[0]
    w, b = initialize_with_zeros(dim)
    params, grads, costs = optimize(w, b, X_train, Y_train, num_iterations = num_iterations, learning_rate = learning_rate, print_cost = print_cost)
    Y_prediction_train = predict(params["w"], params["b"], X_train)
    Y_prediction_test = predict(params["w"], params["b"], X_test)
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train": Y_prediction_train,
         "w": params["w"],
         "b": params["b"],
         "learning_rate": learning_rate,
         "num_iterations": num_iterations}
    return d
In [203]:
d=model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 2000, learning_rate = 0.005, print_cost = True)
Cost after iteration 0: 0.693147
Cost after iteration 100: 0.709726
Cost after iteration 200: 0.657712
Cost after iteration 300: 0.614611
Cost after iteration 400: 0.578001
Cost after iteration 500: 0.546372
Cost after iteration 600: 0.518331
Cost after iteration 700: 0.492852
Cost after iteration 800: 0.469259
Cost after iteration 900: 0.447139
Cost after iteration 1000: 0.426262
Cost after iteration 1100: 0.406617
Cost after iteration 1200: 0.388723
Cost after iteration 1300: 0.374678
Cost after iteration 1400: 0.365826
Cost after iteration 1500: 0.358532
Cost after iteration 1600: 0.351612
Cost after iteration 1700: 0.345012
Cost after iteration 1800: 0.338704
Cost after iteration 1900: 0.332664
train accuracy: 91.38755980861244 %
test accuracy: 34.0 %
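The training accuracy is much higher than the test accuracy, which points to overfitting on this small dataset. The test accuracy here is also noticeably lower than what this exercise usually reaches; that would be consistent with the pixel-mixing effect of the direct reshape discussed after Exercise 2, so it is worth re-running the model with the transposed flattening to see whether the test score improves.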
In [162]:
# Plot learning curve (with costs)
costs = np.squeeze(d['costs'])
plt.plot(costs)
plt.ylabel('cost')
plt.xlabel('iterations (per hundreds)')
plt.title("Learning rate =" + str(d["learning_rate"]))
plt.show()

In [165]:
classes
Out[165]:
array([b'non-cat', b'cat'], dtype='|S7')
In [175]:
# Example of a picture that was wrongly classified.
index = 2
plt.imshow(test_set_x[:, index].reshape((num_px, num_px, 3)))
# print ("y = " + str(test_set_y[0,index]) + ", you predicted that it is a \"" + classes[d["Y_prediction_test"][0,index]].decode("utf-8") + "\" picture.")
Out[175]:
<matplotlib.image.AxesImage at 0x22c6a2d2f60>

In [ ]:
learning_rates = [0.01, 0.001, 0.0001]
models = {}
for i in learning_rates:
    print ("learning rate is: " + str(i))
    # print_cost = True so that the cost every 100 iterations is printed for each learning rate (see the output below)
    models[str(i)] = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 1500, learning_rate = i, print_cost = True)
    print ('\n' + "-------------------------------------------------------" + '\n')

for i in learning_rates:
    plt.plot(np.squeeze(models[str(i)]["costs"]), label = str(models[str(i)]["learning_rate"]))

plt.ylabel('cost')
plt.xlabel('iterations (hundreds)')

legend = plt.legend(loc='upper center', shadow=True)
frame = legend.get_frame()
frame.set_facecolor('0.90')
plt.show()
learning rate is: 0.01
Cost after iteration 0: 0.693147
Cost after iteration 100: 2.321788
Cost after iteration 200: 3.011239
Cost after iteration 300: 0.483519
Cost after iteration 400: 1.297533
Cost after iteration 500: 1.215430
Cost after iteration 600: 1.135770
Cost after iteration 700: 0.901737
Cost after iteration 800: 0.821976
Cost after iteration 900: 0.791033
Cost after iteration 1000: 0.762400
Cost after iteration 1100: 0.736228
Cost after iteration 1200: 0.711983
Cost after iteration 1300: 0.689076
Cost after iteration 1400: 0.667013
train accuracy: 71.29186602870814 %
test accuracy: 64.0 %

-------------------------------------------------------

learning rate is: 0.001
Cost after iteration 0: 0.693147
Cost after iteration 100: 0.605784
Cost after iteration 200: 0.589938
Cost after iteration 300: 0.577890
Cost after iteration 400: 0.567791
Cost after iteration 500: 0.559013
Cost after iteration 600: 0.551207
Cost after iteration 700: 0.544146
Cost after iteration 800: 0.537671
Cost after iteration 900: 0.531668
Cost after iteration 1000: 0.526054
Cost after iteration 1100: 0.520764
Cost after iteration 1200: 0.515752
Cost after iteration 1300: 0.510979
Cost after iteration 1400: 0.506416
train accuracy: 74.16267942583733 %
test accuracy: 34.0 %

-------------------------------------------------------

learning rate is: 0.0001
Cost after iteration 0: 0.693147
Cost after iteration 100: 0.636292
Cost after iteration 200: 0.630322
Cost after iteration 300: 0.625487
Cost after iteration 400: 0.621470
Cost after iteration 500: 0.618051
Cost after iteration 600: 0.615075
Cost after iteration 700: 0.612432
Cost after iteration 800: 0.610042
Cost after iteration 900: 0.607850
Cost after iteration 1000: 0.605814
Cost after iteration 1100: 0.603904
Cost after iteration 1200: 0.602098
Cost after iteration 1300: 0.600377
Cost after iteration 1400: 0.598731
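As the printed costs show, a learning rate of 0.01 makes the cost oscillate in the early iterations before it settles down, while 0.0001 converges very slowly; 0.001 sits in between. In general, a learning rate that is too large can make the cost oscillate or even diverge, and one that is too small needs many more iterations to reach a good cost.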
lr_utils.py code
import numpy as np
import h5py
def load_dataset():
    train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:])  # your train set features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:])  # your train set labels
    test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:])  # your test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:])  # your test set labels
    classes = np.array(test_dataset["list_classes"][:])  # the list of classes
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))
    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes
REFERENCES:
- Coursera: coursera.org
- Deep Learning Specialization