Code a Neural Network in plain NumPy: Part 1 (with no Hidden Layer)

You will learn to:

Build the general architecture of a learning algorithm, including:
Initializing parameters
Calculating the cost function and its gradient
Using an optimization algorithm (gradient descent)

Gather all three functions above into a main model function, in the right order.

Overview of the Problem set

Problem Statement: You are given a dataset containing:

- a training set of m_train images labeled as cat (y=1) or non-cat (y=0)
- a test set of m_test images labeled as cat or non-cat
- each image is of shape (num_px, num_px, 3) where 3 is for the 3 channels (RGB). Thus, each image is square (height = num_px) and (width = num_px).

You will build a simple image-recognition algorithm that can correctly classify pictures as cat or non-cat.

Show me the code

import numpy as np
import matplotlib.pyplot as plt
import h5py
import scipy
from PIL import Image
from scipy import ndimage
from lr_utils import load_dataset # for this lib git link is provided
%matplotlib inline

Let’s get more familiar with the dataset.In [183]:

# Loading the data (cat/non-cat)
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()

In [28]:

train_set_x_orig.shape

Out[28]:

(209, 64, 64, 3)

In [29]:

train_set_y.shape

Out[29]:

(1, 209)

In [21]:

classes

Out[21]:

array([b'non-cat', b'cat'], dtype='|S7')

In [184]:

# Example of a picture
index = 200
plt.imshow(train_set_x_orig[index])

Out[184]:

<matplotlib.image.AxesImage at 0x22c6a3ba240>

Exercise: 1

Find the values for:

  • m_train (number of training examples)
  • m_test (number of test examples)
  • num_px (= height = width of a training image) Remember that train_set_x_orig is a numpy-array of shape (m_train, num_px, num_px, 3). For instance, you can access m_train by writing train_set_x_orig.shape[0].

In [186]:

m_train = train_set_x_orig.shape[0]
m_test = test_set_x_orig.shape[0]
num_px = train_set_x_orig.shape[1]

print ("Number of training examples: m_train = " + str(m_train))
print ("Number of testing examples: m_test = " + str(m_test))
print ("Height/Width of each image: num_px = " + str(num_px))
print ("Each image is of size: (" + str(num_px) + ", " + str(num_px) + ", 3)")
print ("train_set_x shape: " + str(train_set_x_orig.shape))
print ("train_set_y shape: " + str(train_set_y.shape))
print ("test_set_x shape: " + str(test_set_x_orig.shape))
print ("test_set_y shape: " + str(test_set_y.shape))
Number of training examples: m_train = 209
Number of testing examples: m_test = 50
Height/Width of each image: num_px = 64
Each image is of size: (64, 64, 3)
train_set_x shape: (209, 64, 64, 3)
train_set_y shape: (1, 209)
test_set_x shape: (50, 64, 64, 3)
test_set_y shape: (1, 50)

Exercise:2

Standardize and Reshape the training and test data sets so that images of size (num_px, num_px, 3) are flattened into single vectors of shape (num_px ∗ num_px ∗ 3, 1).In [66]:

train_set_x_flatten =train_set_x_orig.reshape(train_set_x_orig.shape[1]*train_set_x_orig.shape[1]*3,train_set_x_orig.shape[0])

In [67]:

train_set_x_flatten .shape

Out[67]:

(12288, 209)

In [54]:

test_set_x_flatten =test_set_x_orig.reshape(test_set_x_orig.shape[1]*test_set_x_orig.shape[1]*3,test_set_x_orig.shape[0])

In [56]:

test_set_x_flatten.shape

Out[56]:

(12288, 50)

To represent color images, the red, green and blue channels (RGB) must be specified for each pixel, and so the pixel value is actually a vector of three numbers ranging from 0 to 255. Let’s standardize our dataset.In [69]:

train_set_x = train_set_x_flatten/255.
test_set_x = test_set_x_flatten/255.

General Architecture of the learning algorithm

  • Initialize the parameters of the model
  • Learn the parameters for the model by minimizing the cost
  • Use the learned parameters to make predictions (on the test set)
  • Analyse the results and conclude

Building the parts of our algorithm

The main steps for building a Neural Network are: Define the model structure (such as number of input features) Initialize the model’s parameters Loop: Calculate current loss (forward propagation) Calculate current gradient (backward propagation) Update parameters (gradient descent) You often build 1-3 separately and integrate them into one function we call model().

Exercise:3

Implement sigmoid() function .In [188]:

def sigmoid(z):
    return 1/(1+np.exp(-z))

In [189]:

print ("sigmoid([0, 2]) = " + str(sigmoid(np.array([0,2]))))
sigmoid([0, 2]) = [0.5        0.88079708]

Initializing parameters

Exercise:4

Implement parameter initialization. You have to initialize w as a vector of zeros. If you don’t know what numpy function to use, look up np.zeros() in the Numpy library’s documentation.In [192]:

def initialize_with_zeros(dim):
    w=np.zeros((dim,1))
    b=0
    return w,b

In [193]:

dim = 5
w, b = initialize_with_zeros(dim)
print ("w = " + str(w))
print ("b = " + str(b))
w = [[0.]
 [0.]
 [0.]
 [0.]
 [0.]]
b = 0

Forward and Backward propagation

Exercise:5

Implement a function propagate() that computes the cost function and its gradient.In [194]:

def propagate(w, b, X, Y):
    m=X.shape[1]
    A=sigmoid((np.dot(w.T,X) + b))
    yloga=np.multiply(Y,np.log(A))
    ylogaa=np.multiply((1-Y),np.log(1-A))
    cost=(-1/m)*(np.sum(yloga+ylogaa))
    dw=(1/m)*(np.dot(X,(A-Y).T))
    db=(1/m)*(np.sum(A-Y))
    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())
    grads = {"dw": dw,
             "db": db}
    return grads, cost

In [195]:

w, b, X, Y = np.array([[1.],[2.]]), 2., np.array([[1.,2.,-1.],[3.,4.,-3.2]]), np.array([[1,0,1]])
grads, cost = propagate(w, b, X, Y)
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
print ("cost = " + str(cost))
dw = [[0.99845601]
 [2.39507239]]
db = 0.001455578136784208
cost = 5.801545319394553

Optimization

You have initialized your parameters. You are also able to compute a cost function and its gradient. Now, you want to update the parameters using gradient descent.

Exercise: 6

Write down the optimization function. The goal is to learn w and b by minimizing the cost function J . For a parameter θ ,the update rule is θ=θ−α*dθ ,where α is the learning rate.In [196]:

def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
    costs=[]
    for i in range(num_iterations):
        grads, cost=propagate(w, b, X, Y)
        
        dw=grads['dw']
        db=grads['db']
        
        w=w-np.multiply(learning_rate,dw)
        b=b-np.multiply(learning_rate,db)
        
        costs.append(cost)
        
        # Record the costs
        if i % 100 == 0:
            costs.append(cost)
            
        # Print the cost every 100 training iterations
        if print_cost and i % 100 == 0:
            print ("Cost after iteration %i: %f" %(i, cost))
    
    params = {"w": w,
              "b": b}
    grads = {"dw": dw,
             "db": db} 
    return params, grads, costs

In [197]:

params, grads, costs = optimize(w, b, X, Y, num_iterations= 100, learning_rate = 0.009, print_cost = False)

print ("w = " + str(params["w"]))
print ("b = " + str(params["b"]))
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
w = [[0.19033591]
 [0.12259159]]
b = 1.9253598300845747
dw = [[0.67752042]
 [1.41625495]]
db = 0.21919450454067652

Exercise: 7

We are able to learned w and b for a dataset X. Now Implement the predict() function.In [147]:

def predict(w, b, X):
    m = X.shape[1]
    Y_prediction = np.zeros((1,m))
    A=sigmoid(np.dot(w.T,X)+b)
    for i in range(A.shape[1]):
            if A[0,i] > 0.5:
                Y_prediction[0][i] = 1
            else:
                Y_prediction[0][i] = 0
    assert(Y_prediction.shape == (1, m))
    return Y_prediction 

In [148]:

w = np.array([[0.1124579],[0.23106775]])
b = -0.3
X = np.array([[1.,-1.1,-3.2],[1.2,2.,0.1]])
print ("predictions = " + str(predict(w, b, X)))
predictions = [[1. 1. 0.]]

Merge all functions into a model

You will now see how the overall model is structured by putting together all the building blocks (functions implemented in the previous parts) together, in the right order.

Exercise: Implement the model function. Use the following notation:

  • Y_prediction_test for your predictions on the test set
  • Y_prediction_train for your predictions on the train set
  • w, costs, grads for the outputs of optimize()

In [198]:

def model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.5, print_cost = False):
    dim=X_train.shape[0]
    w,b=initialize_with_zeros(dim)
    params, grads, costs = optimize(w, b, X_train, Y_train, num_iterations= num_iterations, learning_rate= learning_rate, print_cost= True)
    
    Y_prediction_train=predict(params["w"], params["b"], X_train)
    Y_prediction_test=predict(params["w"],params["b"], X_test)
    
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))
    
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test, 
         "Y_prediction_train" : Y_prediction_train, 
         "w" : params["w"], 
         "b" : params["b"],
         "learning_rate" : learning_rate,
         "num_iterations": num_iterations}
    
    return d

In [203]:

d=model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 2000, learning_rate = 0.005, print_cost = True)
Cost after iteration 0: 0.693147
Cost after iteration 100: 0.709726
Cost after iteration 200: 0.657712
Cost after iteration 300: 0.614611
Cost after iteration 400: 0.578001
Cost after iteration 500: 0.546372
Cost after iteration 6: 0.518331
Cost after iteration 700: 0.492852
Cost after iteration 800: 0.469259
Cost after iteration 900: 0.447139
Cost after iteration 1000: 0.426262
Cost after iteration 1100: 0.406617
Cost after iteration 1200: 0.388723
Cost after iteration 1300: 0.374678
Cost after iteration 1400: 0.365826
Cost after iteration 1500: 0.358532
Cost after iteration 1600: 0.351612
Cost after iteration 1700: 0.345012
Cost after iteration 1800: 0.338704
Cost after iteration 1900: 0.332664
train accuracy: 91.38755980861244 %
test accuracy: 34.0 %

In [162]:

# Plot learning curve (with costs)
costs = np.squeeze(d['costs'])
plt.plot(costs)
plt.ylabel('cost')
plt.xlabel('iterations (per hundreds)')
plt.title("Learning rate =" + str(d["learning_rate"]))
plt.show()

In [165]:

classes

Out[165]:

array([b'non-cat', b'cat'], dtype='|S7')

In [175]:

# Example of a picture that was wrongly classified.
index = 2
plt.imshow(test_set_x[:,index].reshape((num_px, num_px, 3)))
# print ("y = " + str(test_set_y[0,index]) + ", you predicted that it is a \"" + classes[d["Y_prediction_test"][0,index]].decode("utf-8") +  "\" picture.")

Out[175]:

<matplotlib.image.AxesImage at 0x22c6a2d2f60>

In [ ]:

learning_rates = [0.01, 0.001, 0.0001]
models = {}
for i in learning_rates:
    print ("learning rate is: " + str(i))
    models[str(i)] = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 1500, learning_rate = i, print_cost = False)
    print ('\n' + "-------------------------------------------------------" + '\n')

for i in learning_rates:
    plt.plot(np.squeeze(models[str(i)]["costs"]), label= str(models[str(i)]["learning_rate"]))

plt.ylabel('cost')
plt.xlabel('iterations (hundreds)')

legend = plt.legend(loc='upper center', shadow=True)
frame = legend.get_frame()
frame.set_facecolor('0.90')
plt.show()
learning rate is: 0.01
Cost after iteration 0: 0.693147
Cost after iteration 100: 2.321788
Cost after iteration 200: 3.011239
Cost after iteration 300: 0.483519
Cost after iteration 400: 1.297533
Cost after iteration 500: 1.215430
Cost after iteration 600: 1.135770
Cost after iteration 700: 0.901737
Cost after iteration 800: 0.821976
Cost after iteration 900: 0.791033
Cost after iteration 1000: 0.762400
Cost after iteration 1100: 0.736228
Cost after iteration 1200: 0.711983
Cost after iteration 1300: 0.689076
Cost after iteration 1400: 0.667013
train accuracy: 71.29186602870814 %
test accuracy: 64.0 %

-------------------------------------------------------

learning rate is: 0.001
Cost after iteration 0: 0.693147
Cost after iteration 100: 0.605784
Cost after iteration 200: 0.589938
Cost after iteration 300: 0.577890
Cost after iteration 400: 0.567791
Cost after iteration 500: 0.559013
Cost after iteration 600: 0.551207
Cost after iteration 700: 0.544146
Cost after iteration 800: 0.537671
Cost after iteration 900: 0.531668
Cost after iteration 1000: 0.526054
Cost after iteration 1100: 0.520764
Cost after iteration 1200: 0.515752
Cost after iteration 1300: 0.510979
Cost after iteration 1400: 0.506416
train accuracy: 74.16267942583733 %
test accuracy: 34.0 %

-------------------------------------------------------

learning rate is: 0.0001
Cost after iteration 0: 0.693147
Cost after iteration 100: 0.636292
Cost after iteration 200: 0.630322
Cost after iteration 300: 0.625487
Cost after iteration 400: 0.621470
Cost after iteration 500: 0.618051
Cost after iteration 600: 0.615075
Cost after iteration 700: 0.612432
Cost after iteration 800: 0.610042
Cost after iteration 900: 0.607850
Cost after iteration 1000: 0.605814
Cost after iteration 1100: 0.603904
Cost after iteration 1200: 0.602098
Cost after iteration 1300: 0.600377
Cost after iteration 1400: 0.598731
lr_utils.py code
import numpy as np
import h5py

def load_dataset():
 train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")
 train_set_x_orig = np.array(train_dataset["train_set_x"][:]) # your train set features
 train_set_y_orig = np.array(train_dataset["train_set_y"][:]) # your train set labels
 test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")
 test_set_x_orig = np.array(test_dataset["test_set_x"][:]) # your test set features
 test_set_y_orig = np.array(test_dataset["test_set_y"][:]) # your test set labels

 classes = np.array(test_dataset["list_classes"][:]) # the list of classes

 train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
 test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))

 return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes

REFERENCES:

coursera.org
deep learning specialization

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s