Coursera Andrew Ng Deep Learning Specialization programming assignment (Course 1, Week 2)

NumPy basics

NumPy is the fundamental package for scientific computing in Python, maintained by an open-source community. Below we build a few functions that will be used when putting neural networks together.

The sigmoid function and np.exp()

Recall: $sigmoid(x)=\frac {1}{1+e^{-x}}$

If $ x = (x_1, x_2, \dots, x_n)$ is a row vector, then $np.exp(x)$ applies the exponential function to every element of x, giving $np.exp(x) = (e^{x_1}, e^{x_2}, \dots, e^{x_n})$

import numpy as np

# Example of np.exp
x = np.array([1, 2, 3])  # x is a row vector
print(np.exp(x))  # result is (exp(1), exp(2), exp(3))

>> [ 2.71828183 7.3890561 20.08553692]

# More elementwise vector operations
x = np.array([1, 2, 3])
print (x + 3)
print(1/x)

>>[4 5 6]

>>[ 1. 0.5 0.33333333]

import numpy as np  # so we can write np.exp() instead of numpy.exp()

def sigmoid(x):
    """
    Compute the sigmoid of x

    Arguments:
    x -- A scalar or numpy array of any size

    Return:
    s -- sigmoid(x)
    """

    s = 1 / (1 + np.exp(-x))

    return s
x = np.array([1, 2, 3])
sigmoid(x)

>>array([ 0.73105858, 0.88079708, 0.95257413])

Derivative of the sigmoid function (dZ)

Note: in the earlier notes, the lecture videos compute $dZ=A-Y$, whereas the course slides and this programming assignment ask for the derivative of the sigmoid itself, $\sigma'(x)=\sigma(x)(1-\sigma(x))$.
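The two are consistent: $\sigma'(z)=\sigma(z)(1-\sigma(z))$ is the derivative of the activation alone, while $dZ = A - Y$ is what it becomes once chained with the cross-entropy loss. A one-line chain-rule check (my own note, not part of the assignment), with $a = \sigma(z)$ and $\mathcal{L}(a,y) = -[y\log a + (1-y)\log(1-a)]$:

$\frac{\partial \mathcal{L}}{\partial z} = \frac{\partial \mathcal{L}}{\partial a}\cdot\sigma'(z) = \left(-\frac{y}{a}+\frac{1-y}{1-a}\right)a(1-a) = a-y$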

def sigmoid_derivative(x):
    """
    Compute the gradient (also called the slope or derivative) of the sigmoid function with respect to its input x.
    You can store the output of the sigmoid function into variables and then use it to calculate the gradient.

    Arguments:
    x -- A scalar or numpy array

    Return:
    ds -- Your computed gradient.
    """

    s = sigmoid(x)
    ds = s * (1 - s)

    return ds
x = np.array([1, 2, 3])
print ("sigmoid_derivative(x) = " + str(sigmoid_derivative(x)))

>>sigmoid_derivative(x) = [ 0.19661193 0.10499359 0.04517666]

Reshaping arrays with reshape

  • X.shape returns the dimensions (shape) of a matrix or vector X
  • X.reshape reshapes X into a given shape; for example, an image of shape (length, height, depth=3) is reshaped into an input vector of shape (length∗height∗3, 1) (a tiny illustration follows this list)
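A quick illustration of these two calls on a toy array (my own addition, not part of the assignment):

import numpy as np

a = np.arange(6).reshape((2, 3))   # toy array [[0 1 2], [3 4 5]]
print(a.shape)                     # (2, 3)
print(a.reshape((6, 1)).shape)     # (6, 1): same 6 elements, read and written in row-major order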

The following code turns an image into a vector:

def image2vector(image):
    """
    Argument:
    image -- a numpy array of shape (length, height, depth)

    Returns:
    v -- a vector of shape (length*height*depth, 1)
    """

    v = image.reshape((image.shape[0]*image.shape[1]*image.shape[2], 1))
    # image.shape[0] is the size of the array's first dimension

    return v
# This is a 3 by 3 by 2 array, typically images will be (num_px_x, num_px_y,3) where 3 represents the RGB values
image = np.array([[[ 0.67826139, 0.29380381],
[ 0.90714982, 0.52835647],
[ 0.4215251 , 0.45017551]],

[[ 0.92814219, 0.96677647],
[ 0.85304703, 0.52351845],
[ 0.19981397, 0.27417313]],

[[ 0.60659855, 0.00533165],
[ 0.10820313, 0.49978937],
[ 0.34144279, 0.94630077]]])

print ("image2vector(image) = " + str(image2vector(image)))
image2vector(image) = [[ 0.67826139]
[ 0.29380381]
[ 0.90714982]
[ 0.52835647]
[ 0.4215251 ]
[ 0.45017551]
[ 0.92814219]
[ 0.96677647]
[ 0.85304703]
[ 0.52351845]
[ 0.19981397]
[ 0.27417313]
[ 0.60659855]
[ 0.00533165]
[ 0.10820313]
[ 0.49978937]
[ 0.34144279]
[ 0.94630077]]

Normalizing the rows of the input matrix

Normalizing the data usually makes the model perform better, because gradient descent converges faster after normalization. We normalize by dividing each row vector of the input matrix x by that row's norm, i.e. computing $ \frac{x}{\|x\|} $

For example, if $x = \begin{bmatrix} 0 & 3 & 4 \\ 2 & 6 & 4 \\\end{bmatrix}$, then $\|x\| = np.linalg.norm(x, axis = 1, keepdims = True) =\begin{bmatrix} 5 \\ \sqrt{56} \\\end{bmatrix} $ and $ x\_normalized = \frac{x}{\|x\|} = \begin{bmatrix} 0 & \frac{3}{5} & \frac{4}{5} \\ \frac{2}{\sqrt{56}} & \frac{6}{\sqrt{56}} & \frac{4}{\sqrt{56}} \\\end{bmatrix}$

def normalizeRows(x):
    """
    Implement a function that normalizes each row of the matrix x (to have unit length).

    Argument:
    x -- A numpy matrix of shape (n, m)

    Returns:
    x -- The normalized (by row) numpy matrix. You are allowed to modify x.
    """

    x_norm = np.linalg.norm(x, ord = 2, axis = 1, keepdims = True)
    # Compute the norm of each row: ord = 2 is the 2-norm (square root of the sum of squares),
    # axis = 1 works row by row, keepdims = True keeps the result two-dimensional
    x = x / x_norm  # divide x by the row norms; broadcasting expands x_norm automatically

    return x
x = np.array([
[0, 3, 4],
[1, 6, 4]])
print("normalizeRows(x) = " + str(normalizeRows(x)))
>> normalizeRows(x) = [[ 0.          0.6         0.8       ]
[ 0.13736056 0.82416338 0.54944226]]

Broadcasting and the softmax function

You can think of softmax as a normalizing function used when the algorithm needs to classify among two or more classes.

  • $ \text{for } \ x \in \mathbb{R}^{1\times n} \text{, } softmax(x) = softmax(\begin{bmatrix} x_1 && x_2 && … && x_n \end{bmatrix}) = \begin{bmatrix} \frac{e^{x_1}}{\sum_{j}e^{x_j}} && \frac{e^{x_2}}{\sum_{j}e^{x_j}} && … && \frac{e^{x_n}}{\sum_{j}e^{x_j}} \end{bmatrix} $

  • $\text{for a matrix } x \in \mathbb{R}^{m \times n} \text{, where } x_{ij} \text{ is the element in the } i^{th} \text{ row and } j^{th} \text{ column of } x \text{, the row-vector formula above is applied to every row: } softmax(x)_{ij} = \frac{e^{x_{ij}}}{\sum_{j}e^{x_{ij}}}$

The implementation is as follows:

def softmax(x):
    """Calculates the softmax for each row of the input x.

    Your code should work for a row vector and also for matrices of shape (n, m).

    Argument:
    x -- A numpy matrix of shape (n,m)

    Returns:
    s -- A numpy matrix equal to the softmax of x, of shape (n,m)
    """

    # Apply exp() to every element of x
    x_exp = np.exp(x)

    # Sum each row of x_exp, using np.sum(..., axis = 1, keepdims = True)
    x_sum = np.sum(x_exp, axis = 1, keepdims = True)

    # Broadcasting divides each row of x_exp by its row sum, giving the softmax
    s = x_exp / x_sum

    return s
x = np.array([
[9, 2, 5, 0, 0],
[7, 5, 0, 0 ,0]])
print("softmax(x) = " + str(softmax(x)))
>> softmax(x) = [[  9.80897665e-01   8.94462891e-04   1.79657674e-02   1.21052389e-04
1.21052389e-04]
[ 8.78679856e-01 1.18916387e-01 8.01252314e-04 8.01252314e-04
8.01252314e-04]]

Two loss functions: L1 and L2

  1. The L1 loss is defined as $L_1(\hat{y}, y) = \sum_{i}\left|y^{(i)} - \hat{y}^{(i)}\right|$ and is implemented as follows:

    def L1(yhat, y):
        """
        Arguments:
        yhat -- vector of size m (predicted labels)
        y -- vector of size m (true labels)

        Returns:
        loss -- the value of the L1 loss function defined above
        """

        loss = np.sum(np.abs(y - yhat))

        return loss
    yhat = np.array([.9, 0.2, 0.1, .4, .9])
    y = np.array([1, 0, 0, 1, 1])
    print("L1 = " + str(L1(yhat,y)))

    >> L1 = 1.1

  2. The L2 loss is defined as $L_2(\hat{y}, y) = \sum_{i}\left(y^{(i)} - \hat{y}^{(i)}\right)^2$ and is implemented as follows:

    def L2(yhat, y):
        """
        Arguments:
        yhat -- vector of size m (predicted labels)
        y -- vector of size m (true labels)

        Returns:
        loss -- the value of the L2 loss function defined above
        """

        loss = np.dot(y - yhat, y - yhat)  # written directly as the dot product of (y - yhat) with itself

        return loss
    yhat = np.array([.9, 0.2, 0.1, .4, .9])
    y = np.array([1, 0, 0, 1, 1])
    print("L2 = " + str(L2(yhat,y)))

    >> L2 = 0.43

Recognizing cat pictures with logistic regression

Implementation steps:

  • Build the general structure of a learning algorithm, including:
    • Initializing the parameters
    • Computing the cost function and its gradients with respect to the parameters
    • Updating the parameters with gradient descent
  • Assemble these three functions, in the right order, into a main model function

Packages

  • numpy: the fundamental package for scientific computing in Python
  • h5py: a common package for interacting with datasets stored in H5 files
  • matplotlib: a well-known Python plotting library
  • PIL and scipy: used at the end to test the model on your own images
# Import packages
import numpy as np  # short alias for convenience
import matplotlib.pyplot as plt
import h5py
import scipy
from PIL import Image
from scipy import ndimage
from lr_utils import load_dataset

%matplotlib inline

Dataset preprocessing

We build a simple algorithm that recognizes whether a picture is a cat. The dataset consists of:

  • m_train training examples: the image set train_set_x_orig and the label set train_set_y
  • m_test test examples: the image set test_set_x_orig and the label set test_set_y
  • Each image is square (height = num_px, width = num_px) with three color channels, so each image array has shape (num_px, num_px, 3)
  • The image sets will be preprocessed, which is why the raw data carries the _orig suffix; the label sets need no preprocessing

Loading the raw dataset

# Load the dataset
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
# Quick check
index = 1
plt.imshow(train_set_x_orig[index])  # show the image
print ("y = " + str(train_set_y[:, index]) + ", it's a '" + classes[np.squeeze(train_set_y[:, index])].decode("utf-8") + "' picture.")
# np.squeeze() removes the size-1 dimensions from an array's shape, turning train_set_y[:, index] into a scalar
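A minimal illustration of what np.squeeze() does, on a toy array rather than the dataset (assuming numpy is imported as np, as above):

a = np.array([[7]])           # shape (1, 1)
print(np.squeeze(a).shape)    # (): all size-1 dimensions removed
print(np.squeeze(a))          # 7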

Checking image dimensions and counts to avoid shape errors

  • m_train: the number of training examples
  • m_test: the number of test examples
  • num_px: the side length, in pixels, of each (square) image
# Determine the dimensions and counts
# train_set_x_orig has shape (m_train, num_px, num_px, 3)
m_train = train_set_x_orig.shape[0]
m_test = test_set_x_orig.shape[0]
num_px = train_set_x_orig.shape[1]

print ("Number of training examples: m_train = " + str(m_train))
print ("Number of testing examples: m_test = " + str(m_test))
print ("Height/Width of each image: num_px = " + str(num_px))
print ("Each image is of size: (" + str(num_px) + ", " + str(num_px) + ", 3)")
print ("train_set_x shape: " + str(train_set_x_orig.shape))
print ("train_set_y shape: " + str(train_set_y.shape))
print ("test_set_x shape: " + str(test_set_x_orig.shape))
print ("test_set_y shape: " + str(test_set_y.shape))
>>Number of training examples: m_train = 209
Number of testing examples: m_test = 50
Height/Width of each image: num_px = 64
Each image is of size: (64, 64, 3)
train_set_x shape: (209, 64, 64, 3)
train_set_y shape: (1, 209)
test_set_x shape: (50, 64, 64, 3)
test_set_y shape: (1, 50)

Reshaping the image arrays into a standard input matrix

Each image of shape (num_px, num_px, 3) is turned into a vector of shape (num_px ∗ num_px ∗ 3, 1).

A handy trick for flattening a matrix X of shape (a, b, c, d) into a matrix of shape (b∗c∗d, a): x_flatten = X.reshape(X.shape[0], -1).T

In fact, reshape() reads elements in row-major (C) order and writes them back in row-major order; a small check follows.
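A sanity check of this trick on a toy array (my own example; the shapes are made up for illustration):

X = np.arange(24).reshape((2, 3, 4, 1))   # pretend a = 2 examples of shape (3, 4, 1)
X_flatten = X.reshape(X.shape[0], -1).T   # shape (3*4*1, 2) = (12, 2)
print(X_flatten.shape)                    # (12, 2)
print(X_flatten[:, 0])                    # first example flattened in row-major order: 0 .. 11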

# Flatten the image arrays
train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0],-1).T
test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0],-1).T

print ("train_set_x_flatten shape: " + str(train_set_x_flatten.shape))
print ("train_set_y shape: " + str(train_set_y.shape))
print ("test_set_x_flatten shape: " + str(test_set_x_flatten.shape))
print ("test_set_y shape: " + str(test_set_y.shape))
print ("sanity check after reshaping: " + str(train_set_x_flatten[0:5,0]))#重构后完整性检查
>>train_set_x_flatten shape: (12288, 209)
train_set_y shape: (1, 209)
test_set_x_flatten shape: (12288, 50)
test_set_y shape: (1, 50)
sanity check after reshaping: [17 31 56 22 33]

Data standardization

To keep the data on a sensible scale we standardize it (general standardization was covered in the previous post). For image data, every pixel's RGB value lies between 0 and 255, so we can standardize simply by dividing every feature by 255.

# Standardize the data
train_set_x = train_set_x_flatten/255.
test_set_x = test_set_x_flatten/255.

General architecture of the algorithm

  • Initialize the model parameters
  • Learn the parameters by minimizing the cost function
    • Compute the current loss (forward propagation)
    • Compute the current gradients (backward propagation)
    • Update the parameters (gradient descent)
  • Use the learned parameters to predict on the test set
  • Analyze the results and draw conclusions

Building the parts of the algorithm

Helper functions

Implement the sigmoid function:

# The sigmoid function
def sigmoid(z):
    """
    Compute the sigmoid of z

    Arguments:
    z -- A scalar or numpy array of any size.

    Return:
    s -- sigmoid(z)
    """

    s = 1 / (1 + np.exp(-z))

    return s
print ("sigmoid([0, 2]) = " + str(sigmoid(np.array([0,2]))))

>> sigmoid([0, 2]) = [ 0.5 0.88079708]

Initializing the parameters

We initialize our parameters w and b with zeros.

# Initialize the parameters
def initialize_with_zeros(dim):
    """
    This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.

    Argument:
    dim -- size of the w vector we want (or number of parameters in this case)

    Returns:
    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (corresponds to the bias)
    """

    w = np.zeros((dim, 1))
    b = 0

    assert(w.shape == (dim, 1))  # make sure w has the right shape
    assert(isinstance(b, float) or isinstance(b, int))  # make sure b is a float or an int

    return w, b
dim = 2
w, b = initialize_with_zeros(dim)
print ("w = " + str(w))
print ("b = " + str(b))

>> w = [[0] [0]] b = 0

Forward and backward propagation

Steps:

  • Take the input X
  • Compute the predictions $A = \sigma(w^T X + b)$
  • Compute the cost function $J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(a^{(i)})+(1-y^{(i)})\log(1-a^{(i)})\right]$
  • Compute $ dw = \frac{\partial J}{\partial w} = \frac{1}{m}X(A-Y)^T$
  • Compute $ db=\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^m (a^{(i)}-y^{(i)})$ (a short derivation of these two gradients follows this list)
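The two vectorized gradients follow from the per-example gradient $dz^{(i)} = a^{(i)} - y^{(i)}$ noted earlier, with the m examples stacked as columns of X (this summary is my own note, not part of the assignment):

$dw = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}\,dz^{(i)} = \frac{1}{m}X(A-Y)^T, \qquad db = \frac{1}{m}\sum_{i=1}^{m} dz^{(i)} = \frac{1}{m}\sum_{i=1}^m (a^{(i)}-y^{(i)})$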
# Forward and backward propagation
def propagate(w, b, X, Y):
    """
    Implement the cost function and its gradient for the propagation explained above

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)

    Return:
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, thus same shape as w
    db -- gradient of the loss with respect to b, thus same shape as b

    Tips:
    - Write your code step by step for the propagation. np.log(), np.dot()
    """

    m = X.shape[1]

    # Forward propagation (from X to the cost)
    A = sigmoid(np.dot(w.T, X) + b)  # compute the predictions
    cost = (-1/m) * np.sum(Y*np.log(A) + (1-Y)*np.log(1-A))  # compute the cost

    # Backward propagation (compute the gradients)
    dw = (1/m) * np.dot(X, (A-Y).T)
    db = (1/m) * np.sum(A-Y)

    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())

    grads = {"dw": dw,  # return the gradients as a dictionary
             "db": db}

    return grads, cost
w, b, X, Y = np.array([[1.],[2.]]), 2., np.array([[1.,2.,-1.],[3.,4.,-3.2]]), np.array([[1,0,1]])
grads, cost = propagate(w, b, X, Y)
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
print ("cost = " + str(cost))
>>dw = [[ 0.99845601]
[ 2.39507239]]
db = 0.00145557813678
cost = 5.80154531939

Optimizing the parameters with gradient descent

Parameter update rule: $ \theta = \theta - \alpha \text{ } d\theta$, where $\alpha$ is the learning rate.

# Gradient descent
def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
    """
    This function optimizes w and b by running a gradient descent algorithm

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat), of shape (1, number of examples)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- True to print the loss every 100 steps

    Returns:
    params -- dictionary containing the weights w and bias b
    grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
    costs -- list of all the costs computed during the optimization, this will be used to plot the learning curve.

    Tips:
    You basically need to write down two steps and iterate through them:
        1) Calculate the cost and the gradient for the current parameters. Use propagate().
        2) Update the parameters using gradient descent rule for w and b.
    """

    costs = []

    for i in range(num_iterations):

        # Cost and gradient computation
        grads, cost = propagate(w, b, X, Y)

        dw = grads["dw"]
        db = grads["db"]

        # Update the parameters
        w = w - learning_rate * dw
        b = b - learning_rate * db

        # Record the cost every 100 iterations
        if i % 100 == 0:
            costs.append(cost)

        # Print the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print ("Cost after iteration %i: %f" %(i, cost))

    params = {"w": w,
              "b": b}

    grads = {"dw": dw,
             "db": db}

    return params, grads, costs
params, grads, costs = optimize(w, b, X, Y, num_iterations= 100, learning_rate = 0.009, print_cost = False)

print ("w = " + str(params["w"]))
print ("b = " + str(params["b"]))
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
>> w = [[ 0.19033591]
[ 0.12259159]]
b = 1.92535983008
dw = [[ 0.67752042]
[ 1.41625495]]
db = 0.219194504541

Predicting dataset labels with the learned parameters

  • First compute the predictions $\hat{Y} = A = \sigma(w^T X + b)$
  • If $\hat Y > 0.5$, the predicted label is 1
  • If $\hat Y \le 0.5$, the predicted label is 0
# Make predictions
def predict(w, b, X):
    '''
    Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b)

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)

    Returns:
    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    '''

    m = X.shape[1]
    Y_prediction = np.zeros((1, m))  # initialize the predictions
    w = w.reshape(X.shape[0], 1)  # make sure w has the right shape

    # Compute the probability that each picture contains a cat
    A = sigmoid(np.dot(w.T, X) + b)

    for i in range(A.shape[1]):

        # Convert the probabilities into label values
        if A[0, i] > 0.5:
            Y_prediction[0, i] = 1
        else:
            Y_prediction[0, i] = 0

    assert(Y_prediction.shape == (1, m))

    return Y_prediction
w = np.array([[0.1124579],[0.23106775]])
b = -0.3
X = np.array([[1.,-1.1,-3.2],[1.2,2.,0.1]])
print ("predictions = " + str(predict(w, b, X)))

>> predictions = [[ 1. 1. 0.]]
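As a side note, the element-by-element loop in predict() can be replaced by a single vectorized comparison. A minimal equivalent sketch (my own variant, not the form asked for in the assignment):

Y_prediction = (A > 0.5).astype(float)  # boolean mask converted to 0./1. values, shape (1, m)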

Assembling all functions into the main model function

# Main model function
def model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.5, print_cost = False):
    """
    Builds the logistic regression model by calling the function you've implemented previously

    Arguments:
    X_train -- training set represented by a numpy array of shape (num_px * num_px * 3, m_train)
    Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
    X_test -- test set represented by a numpy array of shape (num_px * num_px * 3, m_test)
    Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
    num_iterations -- hyperparameter representing the number of iterations to optimize the parameters
    learning_rate -- hyperparameter representing the learning rate used in the update rule of optimize()
    print_cost -- Set to true to print the cost every 100 iterations

    Returns:
    d -- dictionary containing information about the model.
    """

    # Initialize the parameters with zeros
    w, b = initialize_with_zeros(X_train.shape[0])

    # Gradient descent (pass print_cost through so the caller's setting takes effect)
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)

    # Retrieve the parameters from the dictionary
    w = parameters["w"]
    b = parameters["b"]

    # Predict the labels of the training/test sets
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)

    # Print the train/test accuracy
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))


    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train" : Y_prediction_train,
         "w" : w,
         "b" : b,
         "learning_rate" : learning_rate,
         "num_iterations": num_iterations}

    return d
# Train the model
d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 2000, learning_rate = 0.005, print_cost = True)

Output:

train accuracy: 99.04306220095694 %  # prediction accuracy on the training set
test accuracy: 70.0 %  # prediction accuracy on the test set

Plot a misclassified example:

# Example of a picture that was wrongly classified.
index = 1
plt.imshow(test_set_x[:,index].reshape((num_px, num_px, 3)))
print ("y = " + str(test_set_y[0,index]) + ", you predicted that it is a \"" + classes[d["Y_prediction_test"][0,index]].decode("utf-8") + "\" picture.")

Plot the learning curve (how the cost decreases):

# Plot learning curve (with costs)
costs = np.squeeze(d['costs'])  # np.squeeze() makes sure costs is a one-dimensional array
plt.plot(costs)
plt.ylabel('cost')
plt.xlabel('iterations (per hundreds)')
plt.title("Learning rate =" + str(d["learning_rate"]))
plt.show()

Further analysis

Choice of learning rate

For gradient descent to work well, the learning rate must be chosen carefully. The learning rate $\alpha$ controls how quickly the parameters are updated: if it is too large we may overshoot the optimum, and if it is too small we need many iterations to reach it. Below we compare several learning rates.

learning_rates = [0.01, 0.001, 0.0001]
models = {}
for i in learning_rates:
    print ("learning rate is: " + str(i))
    models[str(i)] = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 1500, learning_rate = i, print_cost = False)
    print ('\n' + "-------------------------------------------------------" + '\n')

for i in learning_rates:
    plt.plot(np.squeeze(models[str(i)]["costs"]), label= str(models[str(i)]["learning_rate"]))

plt.ylabel('cost')
plt.xlabel('iterations (hundreds)')

legend = plt.legend(loc='upper center', shadow=True)
frame = legend.get_frame()
frame.set_facecolor('0.90')
plt.show()
learning rate is: 0.01
train accuracy: 99.52153110047847 %
test accuracy: 68.0 %

-------------------------------------------------------

learning rate is: 0.001
train accuracy: 88.99521531100478 %
test accuracy: 64.0 %

-------------------------------------------------------

learning rate is: 0.0001
train accuracy: 68.42105263157895 %
test accuracy: 36.0 %

-------------------------------------------------------

Conclusions:

  • Different learning rates give different costs and different prediction results
  • If the learning rate is too large (0.01), the cost may oscillate up and down or even diverge (although in this example 0.01 happens to converge nicely in the end)
  • A smaller learning rate does not automatically mean a better model; watch out for overfitting, which typically shows up when the training accuracy is much higher than the test accuracy
  • In deep learning, it is usually recommended to:
    • Choose a learning rate that better minimizes the cost function
    • If the model overfits, use other techniques to reduce overfitting

Assignment takeaways

  • Preprocessing the dataset is important
  • Build the functions separately first: initialize(), propagate(), optimize(), and only then assemble them into model()
  • Tuning the learning rate (one example of a hyperparameter) makes a big difference to the algorithm