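I read lonlon ago's post on this question and found it very enlightening. Many thanks.
I liked lonlon ago's post because it is the closest to the TensorFlow example that recognizes MNIST with RNN & LSTM. To help others study further, I have attached my annotated code below for reference.
The code runs on TensorFlow 1.8. It is hard to read here, so I suggest copying it into a code editor.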
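Question 1: In a CNN, a batch of (?*28*28) images (? is 100 in the code) is fed into the model all at once to train (W, b). In an RNN (LSTMCell), however, the ?*28*28 batch is split into 28 slices of (?*28 pixels) that "flow" into the cell one after another, 28 times (the 28 time steps). At each step the ?*28 pixels of all images in the batch are computed together, and the last step produces the final output. Note that this output is not one-dimensional: it keeps the ? (batch) dimension, with shape [?, 128] in the code, and it is the input used to train (W, b) in the readout layer. I think this is very important for a "big-picture" understanding of how an RNN (with LSTMCell) works. Unfortunately few people mention it, and it took me some time to understand. I am still not completely sure, though, and hope to keep thinking about it and verify it.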
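To make the picture in Question 1 concrete, here is a minimal NumPy sketch (not TensorFlow) of what `dynamic_rnn` does with one batch. It assumes the gate layout of BasicLSTMCell (input gate, candidate, forget gate, output gate, with `forget_bias` added to the forget gate) and uses random, untrained weights; the names `lstm_step`, `flat_x`, `W` and `b` are hypothetical. At each of the 28 steps, row t of every image in the batch ([100, 28]) goes through the cell together, and the last step's output is a [100, 128] matrix.

```python
import numpy as np

batch_size, n_steps, n_input, n_hidden = 100, 28, 28, 128
rng = np.random.default_rng(0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h, c, W, b, forget_bias=1.0):
    """One LSTM time step for the whole batch at once (hypothetical helper).

    x_t: [batch, n_input]; h, c: [batch, n_hidden]
    W: [n_input + n_hidden, 4 * n_hidden]; b: [4 * n_hidden]
    """
    z = np.concatenate([x_t, h], axis=1) @ W + b   # [batch, 4 * n_hidden]
    i, j, f, o = np.split(z, 4, axis=1)            # input gate, candidate, forget gate, output gate
    c = c * sigmoid(f + forget_bias) + sigmoid(i) * np.tanh(j)
    h = np.tanh(c) * sigmoid(o)
    return h, c


# Fake batch with the shape mnist.train.next_batch returns: [100, 784]
flat_x = rng.random((batch_size, n_steps * n_input))
# Same reshape as the script: row t of each image becomes time step t
batch_x = flat_x.reshape((batch_size, n_steps, n_input))   # [100, 28, 28]

# Random, untrained weights (stand-ins for the cell's variables)
W = rng.normal(scale=0.1, size=(n_input + n_hidden, 4 * n_hidden))
b = np.zeros(4 * n_hidden)
h = np.zeros((batch_size, n_hidden))
c = np.zeros((batch_size, n_hidden))

outputs = []
for t in range(n_steps):          # 28 time steps
    x_t = batch_x[:, t, :]        # [100, 28]: row t of all 100 images, processed together
    h, c = lstm_step(x_t, h, c, W, b)
    outputs.append(h)
outputs = np.stack(outputs, axis=1)   # [100, 28, 128], same layout as dynamic_rnn's outputs

last_output = outputs[:, -1, :]       # [100, 128]: what the readout layer receives
print(outputs.shape, last_output.shape)   # (100, 28, 128) (100, 128)
print(np.allclose(last_output, h))        # the last output equals the final hidden state h

# Run image 0 on its own: its final output matches row 0 of the batched result
h0 = np.zeros((1, n_hidden))
c0 = np.zeros((1, n_hidden))
for t in range(n_steps):
    h0, c0 = lstm_step(batch_x[0:1, t, :], h0, c0, W, b)
print(np.allclose(h0[0], last_output[0]))
```

The last check suggests that batching only runs the images side by side: each image's result does not depend on the other images in the batch.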
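Question 2: For the same input, the output of LSTMCell is not the same matrix every time, yet this does not affect the final "convergence". I have not yet figured out the mechanism behind this.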
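One way to narrow down Question 2 is to check whether the cell itself behaves randomly. The sketch below is a small experiment under stated assumptions, not a full answer: a plain tanh RNN (`rnn_forward`, a hypothetical helper) stands in for LSTMCell, and a small random nudge to the weights stands in for one Adam update. With fixed weights and no dropout, the forward pass is deterministic, so differing outputs for the same input point to changed weights (training steps, or a different random initialization in a new run).

```python
import numpy as np

rng = np.random.default_rng(42)
n_steps, n_input, n_hidden = 28, 28, 8


def rnn_forward(x, W_x, W_h, b):
    """Plain tanh RNN (stand-in for LSTMCell): returns the last hidden state."""
    h = np.zeros((x.shape[0], W_h.shape[0]))
    for t in range(x.shape[1]):
        h = np.tanh(x[:, t, :] @ W_x + h @ W_h + b)
    return h


x = rng.random((4, n_steps, n_input))   # the same input batch every time
W_x = rng.normal(scale=0.1, size=(n_input, n_hidden))
W_h = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)

out1 = rnn_forward(x, W_x, W_h, b)
out2 = rnn_forward(x, W_x, W_h, b)
print(np.array_equal(out1, out2))   # same input, same weights: identical output

# Stand-in for one optimizer step: a small random change to the weights
W_x_updated = W_x - 0.001 * rng.normal(size=W_x.shape)
out3 = rnn_forward(x, W_x_updated, W_h, b)
print(np.array_equal(out1, out3))   # same input, updated weights: different output
```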
# -*- coding: utf-8 -*-
"""
Created on Mon Jul 2 11:19:32 2018
@author: Administrator
Sequence Classification with LSTM on MNIST
"""
#matplotlib inline
import warnings
warnings.filterwarnings('ignore')
import numpy as np
#import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist=input_data.read_data_sets(".",one_hot=True)
trainimgs=mnist.train.images
trainlabels=mnist.train.labels
testimgs=mnist.test.images
testlabels=mnist.test.labels
ntrain=trainimgs.shape[0]
ntest=testimgs.shape[0]
dim=trainimgs.shape[1]
nclasses=trainlabels.shape[1]
print("Train Images:",trainimgs.shape)
print("Train Labels:",trainlabels.shape)
print() # print one blank line
print("Test Images:",testimgs.shape)
print("Test Labels:",testlabels.shape)
# =============================================================================
# RNN
# =============================================================================
n_input = 28 # MNIST data input (img shape: 28*28)
n_steps = 28 # timesteps
n_hidden = 128 # hidden layer num of features
n_classes = 10 # MNIST total classes (0-9 digits)
learning_rate = 0.001
training_iters = 100000
batch_size = 100
display_step = 10
#Construct a Recurrent Neural Network
# x is [?,28,28]
x = tf.placeholder(dtype="float", shape=[None, n_steps, n_input], name="x") # Current data input shape: (batch_size, n_steps, n_input) [100x28x28]
# y is [?,10]
y = tf.placeholder(dtype="float", shape=[None, n_classes], name="y")
#create the weights and biases for the readout layer
#weights is dict of [128,10]
weights = {
'out': tf.Variable(tf.random_normal([n_hidden, n_classes]))
}
#biases is dict of [10]
biases = {
'out': tf.Variable(tf.random_normal([n_classes])) #a comment can go on any physical line of a multi-line statement
}
#define a lstm cell with tensorflow
#n_hidden = 128 (num_units: int, the number of units in the LSTM cell)
lstm_cell = tf.contrib.rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
"""#dynamic_rnn creates a recurrent neural network specified from lstm_cell:
'outputs' is a tensor of shape [batch_size, max_time, cell.output_size], here [?, 28, 128] (confirmed)
'states' is an LSTMStateTuple (c, h); c and h each have shape [batch_size, n_hidden], here [?, 128]
so:
    cell.output_size = n_hidden = 128
    max_time = n_steps = 28
x is [batch_size, 28, 28]: the last axis holds the 28 pixels of one image row, and the
second-to-last axis is the time index that selects which row is fed next. So unlike a CNN,
each image is fed row by row as a sequence of 28 time steps (n_steps, time_steps).
"""
outputs, states = tf.nn.dynamic_rnn(cell=lstm_cell, inputs=x, dtype=tf.float32)
#outputs is [100, 28, 128]; keeping only the last time step gives output as [100, 128] (batch_size = 100)
output = outputs[:, -1, :]
#pred = [?,128] x [128,10] + [10] = [?,10]; the [10] bias is broadcast to every row. Equivalently, each
#image's row of pred is computed independently, even though the batch goes through one matmul. weights shape=(128,10), biases shape=(10,)
pred = tf.matmul(output, weights['out']) + biases['out']
#define the cost function and optimizer:
#labels and logits must both be tensors of shape [100 x 10]:
#y is [?,10],pred is [?,10]
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=pred ))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
#define the accuracy and evaluation methods to be used in the learning process:
correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
#recall that each MNIST image in R^(28x28) is treated as a sequence of 28 vectors x in R^28.
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    step = 1
    # Keep training until the max number of iterations is reached
    while step * batch_size < training_iters:  # batch_size 100, training_iters 100,000
        # Read a batch of 100 images [100 x 784] as batch_x
        # batch_y is a matrix of [100 x 10]
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Treat each row of an image as one time step:
        # reshape to 28 steps of 28 pixels, so batch_x is [100 x 28 x 28]
        batch_x = batch_x.reshape((batch_size, n_steps, n_input))  # 100, 28, 28
        # Run the optimization op (backprop)
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
        if step % display_step == 0:
            # Calculate batch accuracy
            acc = sess.run(accuracy, feed_dict={x: batch_x, y: batch_y})
            # Calculate batch loss
            loss = sess.run(cost, feed_dict={x: batch_x, y: batch_y})
            print("Iter " + str(step*batch_size) + ", Minibatch Loss= " + \
                  "{:.6f}".format(loss) + ", Training Accuracy= " + \
                  "{:.5f}".format(acc))
        step += 1
    print("Optimization Finished!")
    # Calculate accuracy for 128 MNIST test images
    test_len = 128
    test_data = mnist.test.images[:test_len].reshape((-1, n_steps, n_input))
    test_label = mnist.test.labels[:test_len]
    print("Testing Accuracy:", \
          sess.run(accuracy, feed_dict={x: test_data, y: test_label}))
#sess.close()
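The readout, cost and accuracy lines of the script can be mirrored in NumPy to check the shapes given in the comments. This is a sketch with random stand-in data (`output`, `W_out`, `b_out` and `labels` are hypothetical names); it computes the softmax cross-entropy formula directly and is not TensorFlow's own implementation of `softmax_cross_entropy_with_logits`.

```python
import numpy as np

rng = np.random.default_rng(1)
batch_size, n_hidden, n_classes = 100, 128, 10

output = rng.normal(size=(batch_size, n_hidden))             # stands in for the last LSTM output, [?, 128]
W_out = rng.normal(scale=0.1, size=(n_hidden, n_classes))    # like weights['out'], [128, 10]
b_out = rng.normal(size=n_classes)                           # like biases['out'], [10]
labels = np.eye(n_classes)[rng.integers(0, n_classes, size=batch_size)]  # one-hot y, [?, 10]

# pred = output x W + b: the [10] bias is broadcast to every row
logits = output @ W_out + b_out                              # [100, 10]

# Softmax cross-entropy per example, then the mean over the batch (the script's `cost`)
shifted = logits - logits.max(axis=1, keepdims=True)         # subtract the row max for numerical stability
log_softmax = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
cost = -(labels * log_softmax).sum(axis=1).mean()

# Accuracy: fraction of rows whose argmax matches the one-hot label's argmax
accuracy = (logits.argmax(axis=1) == labels.argmax(axis=1)).mean()
print(logits.shape, cost, accuracy)
```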