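Building the CNN helper functions

This week's programming assignment implements a convolutional layer and a pooling layer in numpy, including both forward propagation and backward propagation.

Importing packages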

import numpy as np
import h5py
import matplotlib.pyplot as plt

%matplotlib inline
plt.rcParams['figure.figsize'] = (5.0, 4.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

%load_ext autoreload
%autoreload 2

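Assignment outline

Convolution functions, including:
- Zero padding
- Convolution window
- Convolution forward propagation
- Convolution backward propagation
Pooling functions, including:
- Pooling forward propagation
- Create mask
- Distribute value
- Pooling backward propagation

Note: every forward-propagation step stores some values in a cache so that backward propagation can use them later.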

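Convolutional neural network

First, build two helper functions.

Zero-Padding

This helper function adds zeros around the border of each image (illustrated by a figure in the original post):

- It keeps the image from shrinking as it passes through a convolutional layer, which matters most in deep networks.
- It lets us retain the information at the edges of the image.

The code is as follows: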

def zero_pad(X, pad):
    """
    Pad the height and width of every image with `pad` pixels of zeros
    
    Argument:
    X -- array of shape (m, n_H, n_W, n_C) representing m images
    pad -- integer, amount of padding applied on each side, vertically and horizontally
    
    Returns:
    X_pad -- padded image of shape (m, n_H + 2*pad, n_W + 2*pad, n_C)
    """
    
    # np.pad(array, ((before_axis0, after_axis0), (.., ..), (.., ..), (.., ..)), mode, constant_values = (before_value, after_value))
    X_pad = np.pad(X, ((0,0),(pad,pad),(pad,pad),(0,0)), 'constant', constant_values = (0,0) )

    return X_pad
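
As a quick sanity check of the padding behaviour, here is a minimal sketch that restates `zero_pad` so the snippet runs on its own. The input sizes are arbitrary values chosen for illustration.

```python
import numpy as np

def zero_pad(X, pad):
    # Pad only the height and width axes; the batch and channel axes are untouched.
    return np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)),
                  mode="constant", constant_values=0)

x = np.ones((4, 3, 3, 2))        # 4 images, 3x3 pixels, 2 channels (illustrative sizes)
x_pad = zero_pad(x, 2)

print(x.shape, "->", x_pad.shape)   # (4, 3, 3, 2) -> (4, 7, 7, 2)
print(x_pad[0, :, :, 0])            # a 3x3 block of ones surrounded by two rings of zeros
```

The sum of all pixels is unchanged by the padding, which is an easy property to assert in a test.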

Single-step convolution

- Take an input volume
- Apply the filter at every position of the input
- Output another volume

The code is as follows:

def conv_single_step(a_slice_prev, W, b):
    """
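    Apply one filter to a single slice of the input
    
    Arguments:
    a_slice_prev -- slice of the previous layer's output, shape (f, f, n_C_prev)
    W -- weights of one filter, shape (f, f, n_C_prev), the same shape as the slice
    b -- bias, shape (1, 1, 1)
    
    Returns:
    Z -- scalar, the result of convolving the filter with the slice
    """

    # Element-wise product first
    s = np.multiply(a_slice_prev, W)
    # Then sum over all entries
    Z = np.sum(s)
    # Add the bias b
    Z = Z + float(np.squeeze(b)) # b has shape (1, 1, 1); squeeze it to a 0-d array before converting to a scalar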

    return Z
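
A small worked example may make the single step concrete. The values below are made up for illustration: the slice holds 0 to 7, the filter is all ones, and the bias is 0.5, so the result is simply the sum of the slice plus the bias.

```python
import numpy as np

a_slice_prev = np.arange(8, dtype=float).reshape(2, 2, 2)  # values 0..7, shape (f, f, n_C_prev)
W = np.ones((2, 2, 2))                                      # one filter, same shape as the slice
b = np.array([[[0.5]]])                                     # bias of shape (1, 1, 1)

# Element-wise product, sum, then add the scalar bias.
Z = np.sum(a_slice_prev * W) + b.item()
print(Z)  # 28.5
```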

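Convolutional neural network: forward propagation

Slicing method (the original post illustrates it with a figure):

The code is as follows: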

def conv_forward(A_prev, W, b, hparameters):
    """
    Forward propagation for a convolution function
    
    Arguments:
    A_prev -- activations of the previous layer, shape (m, n_H_prev, n_W_prev, n_C_prev)
    W -- weights, shape (f, f, n_C_prev, n_C)
    b -- biases, shape (1, 1, 1, n_C)
    hparameters -- dictionary containing the stride "stride" and the padding "pad"
        
    Returns:
    Z -- conv output, shape (m, n_H, n_W, n_C)
    cache -- values cached for the backward pass
    """

    # Retrieve dimensions from A_prev's shape
    (m, n_H_prev, n_W_prev, n_C_prev) = np.shape(A_prev)
    
    # Retrieve dimensions from W's shape
    (f, f, n_C_prev, n_C) = np.shape(W)
    
    # Retrieve the stride and the padding
    stride = hparameters['stride']
    pad = hparameters['pad']
    
    # Compute the output dimensions; int() truncates, which floors these non-negative values
    n_H = int((n_H_prev+2*pad-f)/stride) + 1
    n_W = int((n_W_prev+2*pad-f)/stride) + 1
    
    # Initialize the output volume with zeros
    Z = np.zeros((m, n_H, n_W, n_C))
    
    # Pad the input
    A_prev_pad = zero_pad(A_prev, pad)
    
    for i in range(m):                               # loop over the m training examples
        a_prev_pad = A_prev_pad[i,:,:,:]             # select the ith padded example
        for h in range(n_H):                         # loop over the vertical axis of the output volume
            for w in range(n_W):                     # loop over the horizontal axis of the output volume
                for c in range(n_C):                 # loop over the output channels (number of filters)
                    
                    # Find the corners of the slice that corresponds to (i, h, w, c)
                    vert_start = h*stride
                    vert_end = vert_start + f
                    horiz_start = w*stride 
                    horiz_end = horiz_start + f
                    
                    # Extract the slice
                    a_slice_prev = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]
                    # Convolve the slice with filter c to get the value at (i, h, w, c)
                    Z[i, h, w, c] = conv_single_step(a_slice_prev, W[:,:,:,c], b[:,:,:,c])
                                          
    # Make sure the output shape is correct
    assert(Z.shape == (m, n_H, n_W, n_C))
    
    # Save information in the cache for the backward pass
    cache = (A_prev, W, b, hparameters)
    
    return Z, cache
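
The output-size formula used in `conv_forward` can be tried out in isolation. The sketch below uses a hypothetical helper name, `conv_output_size`, and integer floor division in place of `int(... / stride)`; the two agree for the non-negative sizes that occur here.

```python
def conv_output_size(n_prev, f, pad, stride):
    # floor((n_prev + 2*pad - f) / stride) + 1
    return (n_prev + 2 * pad - f) // stride + 1

print(conv_output_size(64, 4, 0, 1))  # 61: no padding, the output shrinks by f - 1
print(conv_output_size(7, 3, 1, 1))   # 7: pad = (f - 1) / 2 with stride 1 preserves the size
print(conv_output_size(5, 3, 1, 2))   # 3: stride 2 roughly halves the size
```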

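Pooling layer

The pooling layer reduces the height and width of its input. This cuts the amount of computation and makes the feature detectors less sensitive to where a feature appears in the input.

- Max pooling: each output value is the maximum of its window
- Average pooling: each output value is the mean of its window

The pooling layer has no parameters to learn. Only its hyperparameters, such as the window size and the stride, need to be chosen.

The code is as follows: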

def pool_forward(A_prev, hparameters, mode = "max"):
    """
    Forward propagation for the pooling layer
    
    Arguments:
    A_prev -- Input data, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    hparameters -- python dictionary containing "f" and "stride"
    mode -- the pooling mode you would like to use, defined as a string ("max" or "average")
    
    Returns:
    A -- output of the pool layer, a numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache used in the backward pass of the pooling layer, contains the input and hparameters 
    """
    
    # Retrieve dimensions from the input shape
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
    
    # Retrieve the hyperparameters of the window
    f = hparameters["f"]
    stride = hparameters["stride"]
    
    # Define the dimensions of the output
    n_H = int(1 + (n_H_prev - f) / stride)
    n_W = int(1 + (n_W_prev - f) / stride)
    n_C = n_C_prev
    
    # Initialize the output volume
    A = np.zeros((m, n_H, n_W, n_C))              
    
    for i in range(m):                           # loop over the training examples
        for h in range(n_H):                     # loop over the vertical axis
            for w in range(n_W):                 # loop over the horizontal axis
                for c in range (n_C):            # loop over the output channels
                    
                    # Find the corners of the current slice
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    
                    # Slice the input
                    a_prev_slice = A_prev[i, vert_start:vert_end, horiz_start:horiz_end, c]
                    
                    # Apply the pooling operation
                    if mode == "max":
                        A[i, h, w, c] = np.max(a_prev_slice)
                    elif mode == "average":
                        A[i, h, w, c] = np.average(a_prev_slice)

    # Cache the input and hyperparameters for the backward pass
    cache = (A_prev, hparameters)
    
    # Make sure the output shape is correct
    assert(A.shape == (m, n_H, n_W, n_C))
    
    return A, cache
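
To see the two pooling modes side by side, here is a minimal single-channel sketch. The helper `pool2d` is a hypothetical simplification of `pool_forward` (one image, one channel), and the 4x4 input is made up for illustration.

```python
import numpy as np

def pool2d(a, f, stride, mode="max"):
    # Pool a single 2-D channel with an f x f window.
    n_H = (a.shape[0] - f) // stride + 1
    n_W = (a.shape[1] - f) // stride + 1
    out = np.zeros((n_H, n_W))
    reduce_fn = np.max if mode == "max" else np.mean
    for h in range(n_H):
        for w in range(n_W):
            window = a[h * stride:h * stride + f, w * stride:w * stride + f]
            out[h, w] = reduce_fn(window)
    return out

a = np.arange(16, dtype=float).reshape(4, 4)   # rows: 0..3, 4..7, 8..11, 12..15
print(pool2d(a, 2, 2, "max"))       # [[ 5.  7.] [13. 15.]]
print(pool2d(a, 2, 2, "average"))   # [[ 2.5  4.5] [10.5 12.5]]
```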

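Backward propagation in a CNN

Convolutional layer

This part is optional, and the course gives no detailed derivation or explanation for it. For the details, see the following sites:

From reference 1 we learn that $dA^{[l-1]}$ is the result of convolving $dZ^{[l]}$ with the filter rotated by 180 degrees, as shown in the figure in the original post: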

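This is exactly what the expression below says, where += accumulates the four overlapping contributions. What puzzled me is that W in the expression is not rotated by 180 degrees. The likely explanation is that the rotation is implicit: the code scatters W × dZ[h, w] back into each input window, which is algebraically the same as sliding the rotated filter over a zero-padded dZ.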

$dA^{[l-1]} \mathrel{+}= \sum_{h=0}^{n_H} \sum_{w=0}^{n_W} W_c \times dZ^{[l]}_{hw}$

da_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :] += W[:,:,:,c] * dZ[i, h, w, c]

$dW_c \mathrel{+}= \sum_{h=0}^{n_H} \sum_{w=0}^{n_W} a_{slice} \times dZ_{hw}$

dW[:,:,:,c] += a_slice * dZ[i, h, w, c]

$db = \sum_{h} \sum_{w} dZ_{hw}$

db[:,:,:,c] += dZ[i, h, w, c]
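
One way to reconcile the unrotated W in the loop with the "rotate the filter by 180 degrees" statement is to check numerically that the two forms agree. The sketch below is a simplified setting chosen for illustration (a single channel, stride 1, no padding, random values): form (1) scatters `W * dZ[h, w]` into each window as in the code line above, and form (2) slides the rotated filter over a zero-padded dZ.

```python
import numpy as np

rng = np.random.default_rng(0)
f, n = 3, 5                          # filter size and input size (illustrative)
n_out = n - f + 1                    # output size for stride 1, no padding
W = rng.standard_normal((f, f))
dZ = rng.standard_normal((n_out, n_out))

# (1) Scatter form: each dZ[h, w] adds W * dZ[h, w] to the window it came from.
dA_scatter = np.zeros((n, n))
for h in range(n_out):
    for w in range(n_out):
        dA_scatter[h:h + f, w:w + f] += W * dZ[h, w]

# (2) Flipped-filter form: slide W rotated by 180 degrees over dZ padded with f - 1 zeros.
W_rot = W[::-1, ::-1]
dZ_pad = np.pad(dZ, f - 1)
dA_flip = np.zeros((n, n))
for p in range(n):
    for q in range(n):
        dA_flip[p, q] = np.sum(dZ_pad[p:p + f, q:q + f] * W_rot)

print(np.allclose(dA_scatter, dA_flip))  # True
```

The scatter form is the natural one to write inside the four nested loops, since it reuses the same window indices as the forward pass.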

def conv_backward(dZ, cache):
    """
    Implement the backward propagation for a convolution function
    
    Arguments:
    dZ -- gradient of the cost with respect to the output of the conv layer (Z), numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache of values needed for the conv_backward(), output of conv_forward()
    
    Returns:
    dA_prev -- gradient of the cost with respect to the input of the conv layer (A_prev),
               numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    dW -- gradient of the cost with respect to the weights of the conv layer (W)
          numpy array of shape (f, f, n_C_prev, n_C)
    db -- gradient of the cost with respect to the biases of the conv layer (b)
          numpy array of shape (1, 1, 1, n_C)
    """
    
    ### START CODE HERE ###
    # Retrieve information from "cache"
    (A_prev, W, b, hparameters) = cache
    
    # Retrieve dimensions from A_prev's shape
    (m, n_H_prev, n_W_prev, n_C_prev) = np.shape(A_prev)
    
    # Retrieve dimensions from W's shape
    (f, f, n_C_prev, n_C) = np.shape(W)
    
    # Retrieve information from "hparameters"
    stride = hparameters['stride']
    pad = hparameters['pad']
    
    # Retrieve dimensions from dZ's shape
    (m, n_H, n_W, n_C) = np.shape(dZ)
    
    # Initialize dA_prev, dW, db with the correct shapes
    dA_prev = np.zeros(np.shape(A_prev))                           
    dW = np.zeros(np.shape(W))   
    db = np.zeros(np.shape(b))   

    # Pad A_prev and dA_prev
    A_prev_pad = zero_pad(A_prev, pad)
    dA_prev_pad = zero_pad(dA_prev, pad)
    
    for i in range(m):                       # loop over the training examples
        
        # select ith training example from A_prev_pad and dA_prev_pad
        a_prev_pad = A_prev_pad[i,:,:,:]
        da_prev_pad = dA_prev_pad[i,:,:,:]
        
        for h in range(n_H):                   # loop over vertical axis of the output volume
            for w in range(n_W):               # loop over horizontal axis of the output volume
                for c in range(n_C):           # loop over the channels of the output volume
                    
                    # Find the corners of the current "slice"
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    
                    # Use the corners to define the slice from a_prev_pad
                    a_slice = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]

                    # Update gradients for the window and the filter's parameters using the code formulas given above
                    da_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :] += W[:,:,:,c] * dZ[i, h, w, c]
                    dW[:,:,:,c] += a_slice * dZ[i, h, w, c]
                    db[:,:,:,c] += dZ[i, h, w, c]
                    
        # Set the ith training example's dA_prev to the unpadded da_prev_pad.
        # Explicit end indices are used because X[pad:-pad, pad:-pad, :] is empty when pad == 0.
        dA_prev[i, :, :, :] = da_prev_pad[pad:pad + n_H_prev, pad:pad + n_W_prev, :]
    ### END CODE HERE ###
    
    # Making sure your output shape is correct
    assert(dA_prev.shape == (m, n_H_prev, n_W_prev, n_C_prev))
    
    return dA_prev, dW, db
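
A note on removing the padding: the slice `X[pad:-pad, pad:-pad, :]` is empty when `pad` is 0, because `-0` is just `0`. The sketch below shows the pitfall and an alternative with explicit end indices; `unpad` is a hypothetical helper name, not part of the assignment.

```python
import numpy as np

def unpad(x_pad, pad):
    # Recover the (n_H, n_W, n_C) interior of a padded (n_H + 2*pad, n_W + 2*pad, n_C) array.
    n_H = x_pad.shape[0] - 2 * pad
    n_W = x_pad.shape[1] - 2 * pad
    return x_pad[pad:pad + n_H, pad:pad + n_W, :]

x = np.ones((4, 4, 3))
print(x[0:-0, 0:-0, :].shape)    # (0, 0, 3): the negative-index form breaks for pad = 0
print(unpad(x, 0).shape)         # (4, 4, 3)

x_pad = np.pad(x, ((1, 1), (1, 1), (0, 0)))
print(unpad(x_pad, 1).shape)     # (4, 4, 3)
```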

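Pooling layer

Although the pooling layer has no parameters, the gradient still has to be propagated through it to the previous layer so that backward propagation can continue.

max pooling

For max pooling, only the original maximum of each window affects the final cost, so we only need the gradient of the cost with respect to that maximum and set the rest to zero. First, create a mask function: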

$X = \left[ \begin{array} { l l } { 1 } & { 3 } \\ { 4 } & { 2 } \end{array} \right] \quad \rightarrow \quad M = \left[ \begin{array} { l l } { 0 } & { 0 } \\ { 1 } & { 0 } \end{array} \right]$

def create_mask_from_window(x):
    """
    Creates a mask from an input matrix x, to identify the max entry of x.
    
    Arguments:
    x -- Array of shape (f, f)
    
    Returns:
    mask -- Array of the same shape as window, contains a True at the position corresponding to the max entry of x.
    """

    mask = (x == np.max(x)) # True wherever x equals np.max(x), False elsewhere
   
    return mask
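
Applied to the matrix from the formula above, the mask looks as follows. The second input, `tie`, is an illustrative case added here to show a property of this comparison-based mask: if the maximum occurs more than once, every occurrence is marked.

```python
import numpy as np

x = np.array([[1, 3],
              [4, 2]])
mask = (x == np.max(x))
print(mask)
# [[False False]
#  [ True False]]

tie = np.array([[4, 4],
                [1, 2]])
mask_tie = (tie == np.max(tie))
print(mask_tie.sum())  # 2: both maxima are marked, so both would receive the gradient
```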

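average pooling

In average pooling every value in the window affects the result, and each contributes equally to the final cost. The gradient of each input value is therefore an equal share of the gradient from the next layer, so we spread one gradient value into several equal parts: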

$d Z = 1 \quad \rightarrow \quad d Z = \left[ \begin{array} { l l } { 1 / 4 } & { 1 / 4 } \\ { 1 / 4 } & { 1 / 4 } \end{array} \right]$

def distribute_value(dz, shape):
    """
    Distributes the input value in the matrix of dimension shape
    
    Arguments:
    dz -- input scalar
    shape -- the shape (n_H, n_W) of the output matrix for which we want to distribute the value of dz
    
    Returns:
    a -- Array of size (n_H, n_W) for which we distributed the value of dz
    """

    # Retrieve dimensions from shape (≈1 line)
    (n_H, n_W) = shape
    
    # Compute the value to distribute on the matrix (≈1 line)
    average = n_H * n_W
    
    # Create a matrix where every entry is the "average" value (≈1 line)
    a = dz * np.ones((n_H, n_W)) / average
  
    return a
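
A short numeric check of the idea, using made-up values: spreading `dz = 2` over a 2x2 window gives four entries of 0.5, and the entries add back up to `dz`.

```python
import numpy as np

dz = 2.0
n_H, n_W = 2, 2

# Every cell receives an equal share of the incoming gradient.
a = np.full((n_H, n_W), dz / (n_H * n_W))
print(a)
# [[0.5 0.5]
#  [0.5 0.5]]
print(a.sum())  # 2.0
```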

Putting it together in one function

def pool_backward(dA, cache, mode = "max"):
    """
    Backward propagation for the pooling layer
    
    Arguments:
    dA -- gradient of cost with respect to the output of the pooling layer, same shape as A
    cache -- cache output from the forward pass of the pooling layer, contains the layer's input and hparameters 
    mode -- the pooling mode you would like to use, defined as a string ("max" or "average")
    
    Returns:
    dA_prev -- gradient of cost with respect to the input of the pooling layer, same shape as A_prev
    """
    

    # Retrieve values from the cache
    (A_prev, hparameters) = cache
    
    # Retrieve the hyperparameters
    stride = hparameters['stride']
    f = hparameters['f']
    
    # Retrieve dimensions from the shapes of A_prev and dA
    m, n_H_prev, n_W_prev, n_C_prev = np.shape(A_prev)
    m, n_H, n_W, n_C = np.shape(dA)
    
    # Initialize dA_prev with zeros
    dA_prev = np.zeros(np.shape(A_prev))
    
    for i in range(m):                        # loop over the training examples
        
        # Select the ith training example
        a_prev = A_prev[i, :,:,:]
        
        for h in range(n_H):                   # loop on the vertical axis
            for w in range(n_W):               # loop on the horizontal axis
                for c in range(n_C):           # loop over the channels (depth)
                    
                    # Find the corners of the current slice
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride 
                    horiz_end = horiz_start + f
                    
                    # Compute the backward pass in one of the two modes
                    if mode == "max":
                        
                        # Slice the input
                        a_prev_slice = a_prev[vert_start:vert_end, horiz_start:horiz_end, c]
                        # Build the mask of the slice: 1 at the maximum, 0 elsewhere
                        mask = create_mask_from_window(a_prev_slice)
                        # Multiply dA[i,h,w,c] by the mask to get this window's contribution;
                        # by the chain rule the contributions of all windows are summed, hence +=
                        dA_prev[i, vert_start: vert_end, horiz_start: horiz_end, c] += mask * dA[i,h,w,c]
                        
                    elif mode == "average":
                        
                        # Get the gradient value dA[i,h,w,c] at this position
                        da = dA[i,h,w,c]
                        # Shape of the pooling window
                        shape = (f,f)
                        # Spread dA[i,h,w,c] over the window; by the chain rule the contributions of all windows are summed
                        dA_prev[i, vert_start: vert_end, horiz_start: horiz_end, c] += distribute_value(da, shape)
                        

    # Making sure your output shape is correct
    assert(dA_prev.shape == A_prev.shape)
    
    return dA_prev

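Applying the CNN

This part builds a classifier with TensorFlow. The code uses the TensorFlow 1.x API (`tf.placeholder`, `tf.contrib`, `tf.Session`).

Importing packages and the dataset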

import math
import numpy as np
import h5py
import matplotlib.pyplot as plt
import scipy
from PIL import Image
from scipy import ndimage
import tensorflow as tf
from tensorflow.python.framework import ops
from cnn_utils import *
import os

%matplotlib inline
np.random.seed(1)

# Load the dataset
X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()

This is again the hand-sign dataset:

# Example of a picture
index = 6
plt.imshow(X_train_orig[index])
print ("y = " + str(np.squeeze(Y_train_orig[:, index])))

# Normalize the images, one-hot encode the labels, and print the dataset information
X_train = X_train_orig/255.
X_test = X_test_orig/255.
Y_train = convert_to_one_hot(Y_train_orig, 6).T
Y_test = convert_to_one_hot(Y_test_orig, 6).T
print ("number of training examples = " + str(X_train.shape[0]))
print ("number of test examples = " + str(X_test.shape[0]))
print ("X_train shape: " + str(X_train.shape))
print ("Y_train shape: " + str(Y_train.shape))
print ("X_test shape: " + str(X_test.shape))
print ("Y_test shape: " + str(Y_test.shape))
conv_layers = {}
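
`convert_to_one_hot` comes from `cnn_utils`, whose source is not shown here. As a rough idea of what one-hot encoding does, the following is a minimal sketch with a hypothetical helper `one_hot`; unlike the call above, it returns shape (m, num_classes) directly, so no transpose is needed. The label values are made up.

```python
import numpy as np

def one_hot(labels, num_classes):
    # Row k of the identity matrix is the one-hot vector for class k.
    labels = np.asarray(labels).reshape(-1)
    return np.eye(num_classes)[labels]          # shape (m, num_classes)

Y_orig = np.array([[1, 5, 0]])                  # shape (1, m), like Y_train_orig
Y = one_hot(Y_orig, 6)
print(Y.shape)   # (3, 6)
print(Y)
# [[0. 1. 0. 0. 0. 0.]
#  [0. 0. 0. 0. 0. 1.]
#  [1. 0. 0. 0. 0. 0.]]
```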

Creating placeholders

First create placeholders for the input data, so that data can be fed in when the session runs.

# Create placeholders
def create_placeholders(n_H0, n_W0, n_C0, n_y):
    """
    Creates the placeholders for the tensorflow session.
    
    Arguments:
    n_H0 -- scalar, height of an input image
    n_W0 -- scalar, width of an input image
    n_C0 -- scalar, number of channels of the input
    n_y -- scalar, number of classes
        
    Returns:
    X -- placeholder for the data input, of shape [None, n_H0, n_W0, n_C0] and dtype "float"
    Y -- placeholder for the input labels, of shape [None, n_y] and dtype "float"
    """

    X = tf.placeholder(tf.float32, shape = [None,n_H0, n_W0, n_C0 ]) # first argument: dtype; second: placeholder shape (None = any batch size)
    Y = tf.placeholder(tf.float32, shape = [None, n_y ])
  
    return X, Y

Initializing parameters

Variables are created with W = tf.get_variable("W", [1,2,3,4], initializer = …)
The initializer is tf.contrib.layers.xavier_initializer(seed = 0)

# Initialize parameters
def initialize_parameters():
    """
    Initializes weight parameters to build a neural network with tensorflow. The shapes are:
                        W1 : [4, 4, 3, 8]
                        W2 : [2, 2, 8, 16]
    Returns:
    parameters -- a dictionary of tensors containing W1, W2
    """
    
    tf.set_random_seed(1)                              # only here so that the output matches the course notebook

    W1 = tf.get_variable("W1", [4,4,3,8], initializer = tf.contrib.layers.xavier_initializer(seed = 0)) 
    W2 = tf.get_variable("W2", [2,2,8,16], initializer = tf.contrib.layers.xavier_initializer(seed = 0))
   
    parameters = {"W1": W1,
                  "W2": W2}
    
    return parameters

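Forward propagation

CONV2D (stride 1, 'SAME' padding) -> RELU -> MAXPOOL (8×8 window, stride 8, 'SAME' padding) -> CONV2D (stride 1, 'SAME' padding) -> RELU -> MAXPOOL (4×4 window, stride 4, 'SAME' padding) -> FLATTEN -> FULLYCONNECTED (6 output units; softmax is not applied here, because TensorFlow folds softmax and the cost into a single function)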
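
The shapes along this pipeline can be traced by hand. With 'SAME' padding, TensorFlow produces an output of size ceil(n / stride) along each spatial axis. The sketch below assumes the 64×64×3 input stated in the `model` docstring and the filter counts 8 and 16 from `initialize_parameters`; `same_out` is a hypothetical helper name.

```python
import math

def same_out(n, stride):
    # Spatial output size of a 'SAME'-padded conv or pool layer.
    return math.ceil(n / stride)

shapes = []
h = w = 64
c = 3
h, w, c = same_out(h, 1), same_out(w, 1), 8      # CONV2D with W1, stride 1
shapes.append((h, w, c))
h, w = same_out(h, 8), same_out(w, 8)            # MAXPOOL 8x8, stride 8
shapes.append((h, w, c))
h, w, c = same_out(h, 1), same_out(w, 1), 16     # CONV2D with W2, stride 1
shapes.append((h, w, c))
h, w = same_out(h, 4), same_out(w, 4)            # MAXPOOL 4x4, stride 4
shapes.append((h, w, c))
flat = h * w * c                                 # FLATTEN

print(shapes)  # [(64, 64, 8), (8, 8, 8), (8, 8, 16), (2, 2, 16)]
print(flat)    # 64 features feed the 6-unit fully connected layer
```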

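The functions used are:

Convolution layer: tf.nn.conv2d(X,W1, strides = [1,s,s,1], padding = 'SAME')
- X is the input, W1 the filters; strides must have the form [1,s,s,1], where s is the stride; the padding type is 'SAME'
Max-pooling layer: tf.nn.max_pool(A, ksize = [1,f,f,1], strides = [1,s,s,1], padding = 'SAME')
- f is the window size
ReLU: tf.nn.relu(Z1)
- Z1 can have any shape
Flatten: tf.contrib.layers.flatten(P)
- Returns a tensor of shape [batch_size, k], i.e. the example dimension is preserved
Fully connected layer: tf.contrib.layers.fully_connected(F, num_outputs)
- num_outputs is the number of units in the output layer
- Note: TensorFlow initializes the weights of the fully connected layer and trains them along with the rest of the model, so they do not need to be initialized by hand

# Forward propagation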
def forward_propagation(X, parameters):
    """
    Implements the forward propagation for the model:
    CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED
    
    Arguments:
    X -- input dataset placeholder, of shape (number of examples, n_H0, n_W0, n_C0)
    parameters -- python dictionary containing your parameters "W1", "W2"
                  the shapes are given in initialize_parameters

    Returns:
    Z3 -- the output of the last LINEAR unit
    """
    
    # Retrieve the parameters from the dictionary "parameters" 
    W1 = parameters['W1']
    W2 = parameters['W2']
    
    # CONV2D: stride of 1, padding 'SAME'
    Z1 = tf.nn.conv2d(X, W1, strides = [1,1,1,1], padding = 'SAME')
    # RELU
    A1 = tf.nn.relu(Z1)
    # MAXPOOL: window 8x8, stride 8, padding 'SAME'
    P1 = tf.nn.max_pool(A1, ksize = [1,8,8,1], strides = [1,8,8,1], padding = 'SAME')
    # CONV2D: filters W2, stride 1, padding 'SAME'
    Z2 = tf.nn.conv2d(P1, W2, strides = [1,1,1,1], padding = 'SAME')
    # RELU
    A2 = tf.nn.relu(Z2)
    # MAXPOOL: window 4x4, stride 4, padding 'SAME'
    P2 = tf.nn.max_pool(A2, ksize = [1,4,4,1], strides = [1,4,4,1], padding = 'SAME')
    # FLATTEN
    P2 = tf.contrib.layers.flatten(P2)
    # FULLY-CONNECTED without non-linear activation function (do not call softmax).
    # 6 neurons in output layer. Hint: one of the arguments should be "activation_fn=None" 
    Z3 = tf.contrib.layers.fully_connected(P2, 6, activation_fn = None)

    return Z3

Computing the cost

Loss for every example: tf.nn.softmax_cross_entropy_with_logits(logits = Z3, labels = Y)
- Returns a vector with one loss value per example
Mean of the losses (the cost): tf.reduce_mean()
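
What these two calls compute can be sketched in plain numpy. This is an illustration of the math (softmax followed by cross-entropy, then a mean over examples), not TensorFlow's implementation; the function name and the inputs are made up. With all-zero logits over 6 classes, every class gets probability 1/6, so each loss is ln 6.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    # Subtract the row maximum for numerical stability before exponentiating.
    z = logits - logits.max(axis=1, keepdims=True)
    log_softmax = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -(labels * log_softmax).sum(axis=1)      # one loss per example

logits = np.zeros((2, 6))            # 2 examples, 6 classes, uniform predictions
labels = np.eye(6)[[0, 3]]           # one-hot labels for classes 0 and 3
losses = softmax_cross_entropy(logits, labels)
cost = losses.mean()

print(losses.shape)  # (2,)
print(cost)          # ln 6, approximately 1.7918
```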

# Compute the cost
def compute_cost(Z3, Y):
    """
    Computes the cost
    
    Arguments:
    Z3 -- output of forward propagation (output of the last LINEAR unit), of shape (number of examples, 6)
    Y -- "true" labels vector placeholder, same shape as Z3
    
    Returns:
    cost - Tensor of the cost function
    """

    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = Z3, labels = Y))
   
    return cost

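The full model

- Create placeholders
- Initialize parameters (not needed for the fully connected layer)
- Forward propagation
- Compute the cost
- Create the optimizer
- Mini-batch gradient descent

# The full model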
def model(X_train, Y_train, X_test, Y_test, learning_rate = 0.009,
          num_epochs = 100, minibatch_size = 64, print_cost = True):
    """
    Implements a three-layer ConvNet in Tensorflow:
    CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED
    
    Arguments:
    X_train -- training set, of shape (None, 64, 64, 3)
    Y_train -- training labels, of shape (None, n_y = 6)
    X_test -- test set, of shape (None, 64, 64, 3)
    Y_test -- test labels, of shape (None, n_y = 6)
    learning_rate -- learning rate of the optimization
    num_epochs -- number of epochs of the optimization loop
    minibatch_size -- size of a minibatch
    print_cost -- True to print the cost every 5 epochs
    
    Returns:
    train_accuracy -- real number, accuracy on the train set (X_train)
    test_accuracy -- real number, testing accuracy on the test set (X_test)
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    
    ops.reset_default_graph()	# reset the default computation graph
    tf.set_random_seed(1)       # keeps results reproducible (tensorflow seed)
    seed = 3                    # keeps results reproducible (numpy seed)
    (m, n_H0, n_W0, n_C0) = X_train.shape             
    n_y = Y_train.shape[1]                            
    costs = []                  # keeps track of the cost
    
    # Create placeholders matching the shape of the input data
    X, Y = create_placeholders(n_H0, n_W0, n_C0, n_y)

    # Initialize parameters
    parameters = initialize_parameters()
    
    # Forward propagation: build the forward pass in the tensorflow graph
    Z3 = forward_propagation(X, parameters)
    
    # Cost function: add the cost to the graph
    cost = compute_cost(Z3, Y)
    
    # Backpropagation: define the optimizer. Use an AdamOptimizer that minimizes the cost.
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)

    # Initialize all the variables globally
    init = tf.global_variables_initializer()
     
    # Start the session to evaluate the graph
    with tf.Session() as sess:
        
        # Run the initialization
        sess.run(init)
        
        # Do the training loop
        for epoch in range(num_epochs):

            minibatch_cost = 0.
            num_minibatches = int(m / minibatch_size) # number of full-size minibatches
            seed = seed + 1
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)	# split into minibatches

            for minibatch in minibatches:

                # Select a minibatch
                (minibatch_X, minibatch_Y) = minibatch
                # IMPORTANT: The line that runs the graph on a minibatch.
                # Run the session to execute the optimizer and the cost; the feed_dict should contain a minibatch for (X,Y).
                _ , temp_cost= sess.run([optimizer,cost], feed_dict={X: minibatch_X, Y: minibatch_Y})               
                
                minibatch_cost += temp_cost / num_minibatches
                

            # Print the cost every 5 epochs
            if print_cost == True and epoch % 5 == 0:
                print ("Cost after epoch %i: %f" % (epoch, minibatch_cost))
            if print_cost == True and epoch % 1 == 0:
                costs.append(minibatch_cost)
        
        
        # plot the cost
        plt.plot(np.squeeze(costs))
        plt.ylabel('cost')
        plt.xlabel('epochs')   # one cost value is recorded per epoch
        plt.title("Learning rate =" + str(learning_rate))
        plt.show()

        # Calculate the correct predictions
        predict_op = tf.argmax(Z3, 1)	# index of the largest value in each row of Z3 (axis 1)
        correct_prediction = tf.equal(predict_op, tf.argmax(Y, 1))	# True where the prediction matches the label, False otherwise
        
        # Calculate accuracy on the train and test sets
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
        print(accuracy)
        train_accuracy = accuracy.eval({X: X_train, Y: Y_train})	# accuracy.eval(feed) is equivalent to sess.run(accuracy, feed_dict=feed)
        test_accuracy = accuracy.eval({X: X_test, Y: Y_test})
        print("Train Accuracy:", train_accuracy)
        print("Test Accuracy:", test_accuracy)                              
                
        return train_accuracy, test_accuracy, parameters
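
`random_mini_batches` is imported from `cnn_utils` and its source is not shown here. The following is a hypothetical sketch of what such a helper typically does (shuffle, then cut into consecutive chunks); the real implementation may differ, for example in how it seeds the shuffle. The array sizes are made up.

```python
import numpy as np

def random_mini_batches_sketch(X, Y, batch_size=64, seed=0):
    # Shuffle X and Y with the same permutation, then split into consecutive chunks.
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    perm = rng.permutation(m)
    X_shuf, Y_shuf = X[perm], Y[perm]
    return [(X_shuf[k:k + batch_size], Y_shuf[k:k + batch_size])
            for k in range(0, m, batch_size)]

X = np.zeros((150, 8, 8, 3))         # 150 small dummy images
Y = np.zeros((150, 6))
batches = random_mini_batches_sketch(X, Y, batch_size=64, seed=3)
print([bx.shape[0] for bx, _ in batches])   # [64, 64, 22]
```

If the helper also returns the smaller final batch, as this sketch does, then `int(m / minibatch_size)` undercounts the batches whenever m is not a multiple of the batch size, and the reported epoch cost is slightly overstated.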

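Thoughts on using TensorFlow

In TensorFlow, no value in the computation graph is actually computed until you call sess.run(tensor, feed_dict=...) or tensor.eval(feed_dict=...). When evaluating, trace backward from the value you want, find every placeholder that this path depends on, and supply each of them in feed_dict.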