What do the inputs and outputs of an LSTM neural network actually look like?

天造人设 · 2018-9-29 22:54:38

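A short, direct answer:
1) What the input actually looks like:
(1) Each neuron in a neural network holds a scalar, so all the neurons of one layer, taken in order, form a vector. For the input layer, that vector is the input feature vector. The so-called matrix computation is just multiplying these scalars by the weights w and summing the products.
(2) An RNN processes a sequence one element at a time. Each step is called a time step, which really just means an iteration count. A sentence of ten characters is processed in ten steps, in order, and the input at each step is the feature vector of that character. (Characters are discrete, so they need an Embedding into a fixed-length vector. In many non-text tasks the input is already a vector.)
(3) Each step needs the result of the previous step, so the steps cannot run in parallel; they have to be computed serially, in order.
(4) The structure of the RNN model has nothing to do with the sequence length. Handling the sequence length is like a for loop that sits outside the RNN model structure.
The answer: you only need to think about the input of a single step. It is really no different from an MLP, except that there is one extra input, the result vector from the previous step.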
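
Points (2) to (4) can be sketched in a few lines of NumPy. This is only an illustration with a plain tanh RNN step, not an LSTM; the sizes and the names `rnn_step`, `run_rnn`, `W_x` and `W_h` are made up for the example. The parameters are created once, and the same step function handles a sequence of ten elements or of three.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 3  # illustrative sizes, not from the post

# One set of parameters, created once, independent of any sequence length.
W_x = rng.normal(size=(input_dim, hidden_dim))
W_h = rng.normal(size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

def rnn_step(x_t, h_prev):
    """One time step: an MLP-like layer plus the previous result h_prev."""
    return np.tanh(x_t @ W_x + h_prev @ W_h + b)

def run_rnn(sequence):
    """The 'for loop' that sits outside the model: one step per element."""
    h = np.zeros(hidden_dim)
    outputs = []
    for x_t in sequence:        # strictly sequential: step t needs h from step t-1
        h = rnn_step(x_t, h)
        outputs.append(h)
    return np.stack(outputs)    # one output vector per time step

ten_chars = rng.normal(size=(10, input_dim))   # stand-in for 10 embedded characters
three_chars = rng.normal(size=(3, input_dim))  # a shorter sequence, same model
out10 = run_rnn(ten_chars)
out3 = run_rnn(three_chars)
print(out10.shape, out3.shape)
```

The number of rows in each result follows the sequence length, while `W_x`, `W_h` and `b` never change shape.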
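2) What the output actually looks like:
Same form as an MLP. Processing a sequence yields as many outputs as there are inputs, because every step is an MLP and so it produces an output. How you use them is up to you: you can use the result from every time step, or only the result from the last one, and that choice has nothing to do with the RNN itself.
That is my answer; I hope it is clear.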
------------------------------------
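Aside:
Many sources mention the concept of "weight sharing" in RNNs. I strongly advise against understanding it that way, because it confuses beginners. Unlike a CNN, which shares weights across different positions, an RNN does not share its weights with anything. Picture it as a card reader (it could be an MLP with a left and a right hand, or an LSTM with four hands and a bellyful of intricate internals). The punched tape keeps passing underneath, the reader keeps reading and emitting what it read, and at the same time it passes what is in its right hand (something remembered from the previous time step) over to its left hand, again and again. Take the hands away and what remains is an MLP with no memory. So it is what it is, just that one MLP, with nothing to share; that is the principle. (Weight sharing that arises for engineering reasons, such as unrolling a static computation graph over time steps, batching, or multi-GPU training, is a separate matter.)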
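
The card-reader picture can be sketched as one object that is called once per time step. This is a minimal NumPy sketch using the standard LSTM gate equations; the class name `LSTMReader`, the sizes and the random "tape" are hypothetical, and reading the "four hands" as h and c coming in plus h and c going out is an interpretation, not something the post states.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class LSTMReader:
    """A single 'card reader': one set of weights, reused at every time step."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        # The four gate blocks (input, forget, candidate, output) stacked side by side.
        self.W = rng.normal(scale=0.1, size=(input_dim + hidden_dim, 4 * hidden_dim))
        self.b = np.zeros(4 * hidden_dim)

    def step(self, x_t, h_prev, c_prev):
        """Read one card: takes h and c in, hands the new h and c back out."""
        z = np.concatenate([x_t, h_prev]) @ self.W + self.b
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)  # update the memory cell
        h = sigmoid(o) * np.tanh(c)                        # what the reader emits
        return h, c

reader = LSTMReader(input_dim=4, hidden_dim=3)
tape = np.random.default_rng(1).normal(size=(6, 4))  # six "cards" on the tape
W_before = reader.W.copy()

h, c = np.zeros(3), np.zeros(3)
readings = []
for x_t in tape:
    h, c = reader.step(x_t, h, c)   # right hand passes (h, c) to the left hand
    readings.append(h)
readings = np.stack(readings)
print(readings.shape, np.array_equal(W_before, reader.W))
```

During the forward pass the weights are never copied or handed to anything else; the same `reader` simply runs once per card.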

魏洪旭 · 2018-9-29 22:54:39

An introduction to recurrent neural networks (RNN, Recurrent Neural Networks)
The best introduction to the topic I have seen, so I am reposting it here.

阳阳 · 2018-9-29 22:54:40

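Both the input and the output are vectors, or matrices if you prefer. When an LSTM is used for classification, it is usually followed by a softmax layer.
My own limited understanding, taking action recognition as an example: each frame of the action is fed into the LSTM, and the weights of each LSTM unit are still trained according to the task. So the number of LSTM units is independent of both the input and the output, and you can even stack several LSTM layers. For classification, the last unit is usually connected to a softmax layer.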
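
A minimal sketch of the "last unit plus softmax" readout, in NumPy. The LSTM outputs here are random stand-ins, and the sizes and the names `W_out` and `b_out` are assumptions for illustration; only the readout step is shown.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
batch, n_frames, n_units, n_actions = 2, 16, 8, 5   # illustrative sizes

# Stand-in for what an LSTM would emit: one n_units vector per frame per clip.
lstm_outputs = rng.normal(size=(batch, n_frames, n_units))

# Readout layer: its shape depends on n_units and n_actions, not on n_frames.
W_out = rng.normal(size=(n_units, n_actions))
b_out = np.zeros(n_actions)

last = lstm_outputs[:, -1, :]            # keep only the final time step
probs = softmax(last @ W_out + b_out)    # class probabilities per clip
print(probs.shape, probs.sum(axis=1))
```

Changing `n_frames` changes nothing in `W_out` or `b_out`, which matches the point that the LSTM size is decoupled from the sequence.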

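LSTM extends the traditional RNN structure and largely mitigates the vanishing-gradient problem of the traditional RNN (exploding gradients are usually handled separately, for example with gradient clipping), which makes networks that are deep in time easier to train. Seen from this angle, it may be much easier to understand. ResNet similarly makes the weights of a traditional CNN easier to train.
It looks like deep learning going ever deeper is the trend. For anything unclear, read the papers; I am typing on a phone, so I will not pile on formulas. Or implement an LSTM yourself in Python; there are plenty of tutorials online.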

Expect · 2018-9-29 22:54:42

The input at each time step is a vector; for text, an embedding is usually applied first. The output is usually a vector too, but in TensorFlow, for example in the following

encoder_outputs, encoder_state = self._build_encoder(hparams)

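the shape of encoder_outputs is [time_steps, batch_size, num_units]. Strictly speaking this is not the output; it is the hidden-state output at every time step. To get the final output, you still need to add a fully connected layer. encoder_state is the hidden state at the last time step.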
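
The shapes can be checked with a small NumPy sketch. The arrays are random stand-ins rather than real TensorFlow results, and `vocab_size`, `W_proj` and `b_proj` are hypothetical names for the extra fully connected layer. Taking the last slice as the final hidden state assumes a single-layer, unidirectional RNN with no padding.

```python
import numpy as np

rng = np.random.default_rng(0)
time_steps, batch_size, num_units, vocab_size = 7, 3, 16, 50  # illustrative sizes

# Stand-in for encoder_outputs in time-major layout: hidden state at every step.
encoder_outputs = rng.normal(size=(time_steps, batch_size, num_units))

# Under the assumptions above, the final hidden state is just the last slice.
final_h = encoder_outputs[-1]                      # [batch_size, num_units]

# The extra fully connected layer that turns hidden states into actual outputs.
W_proj = rng.normal(size=(num_units, vocab_size))
b_proj = np.zeros(vocab_size)
logits = encoder_outputs @ W_proj + b_proj         # applied at every time step

print(final_h.shape, logits.shape)
```

The projection is applied independently at each time step, so `logits` keeps the leading [time_steps, batch_size] layout.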

石田梅岩 · 2018-9-29 22:54:43

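I used to try to make sense of some impenetrable formulas, and quickly found that starting there was a mistake.
Strongly recommended: "理解LSTM网络" (a Chinese translation of "Understanding LSTM Networks").
With a little CNN background and half an hour, you can understand the basic principles of LSTM from that article.
To answer your question: it has nothing to do with the number of neurons, though I am not sure how you understand the concept of a "neuron". Just make sure the tensor dimensions at the input and output layers match your inputs and outputs.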

ZHANG Hua · 2018-9-29 22:54:44

Take a look at this C++ code:
https://github.com/dmlc/MXNet.cpp/blob/master/example/charRNN.cpp
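For training there is one key point, the so-called LSTM unroll: the RNN is unrolled into a static "parallel" network with "lateral connections" inside, which implements the long short-term memory (the state is "remembered" in the LSTM Cell). How the memory works is explained clearly in Chinese here: http://www.jianshu.com/p/9dc9f41f0b29 (search Baidu for LSTM; it is the first result).
For prediction there is also one key point: take the Cell's h and C out as the current state (the so-called "memory") and feed them back in as the init parameters. The network, now carrying the current memory state, predicts what becomes the next input; that is the "recurrent" part.
That code also includes an implementation that uses cudnn (the built-in RNN operator), a high-performance version that can do real work.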
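
The prediction point above (carry the state forward and feed each prediction back in as the next input) can be sketched as a loop. This is a toy NumPy cell with random, untrained weights and is not taken from the linked C++ code; `step`, `generate`, `E`, `W_h` and `W_o` are hypothetical names, and a single vector stands in for the LSTM's (h, C) pair.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_dim = 5, 8   # illustrative sizes

E = rng.normal(scale=0.5, size=(vocab_size, hidden_dim))    # token embeddings
W_h = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))  # recurrent weights
W_o = rng.normal(scale=0.5, size=(hidden_dim, vocab_size))  # readout weights

def step(token, state):
    """Toy recurrent cell: returns next-token logits and the updated state."""
    h = np.tanh(E[token] + state @ W_h)
    return h @ W_o, h

def generate(first_token, length):
    state = np.zeros(hidden_dim)          # init state (an LSTM would pass h and C)
    token, tokens = first_token, [first_token]
    for _ in range(length):
        logits, state = step(token, state)  # the memory is carried to the next call
        token = int(np.argmax(logits))      # greedy choice becomes the next input
        tokens.append(token)
    return tokens

sample = generate(first_token=0, length=6)
print(sample)
```

Because the weights are untrained, the generated tokens are meaningless; the sketch only shows how the state and the prediction are threaded through the loop.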

GloryFrank · 2018-9-29 22:54:46

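I read lonlon ago's post on this question and found it very illuminating; many thanks.
I like lonlon ago's post because it is the closest to the TensorFlow example that recognizes MNIST with RNN & LSTM. To help others study further, I have attached my annotated code below for reference.
The code runs on TensorFlow 1.8. It is hard to read here, so I suggest copying it into a code editor.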

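Question 1: In a CNN, the (? × 28 × 28) images (? is 100 in the code) are fed into the model all at once to train (W, b). In an RNN (LSTMCell), the ? × 28 × 28 input is split into 28 slices of (? × 28 pixels) that "stream" into the cell 28 times (28 times means 28 time steps). In other words, each (? × 28 pixels) slice is computed on at once, and the result is the last output. Note that this output is not one-dimensional: it has ? rows, and it serves as the input for training (W, b). I think this is very important for understanding, at the "macro" level, how an RNN (with LSTMCell) works. Unfortunately few people mention it, and it took me some time to understand. I am still not entirely sure, and I hope to keep thinking about it and verifying it.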
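
One way to check this reading is a small NumPy experiment. It swaps in a plain tanh recurrent cell for LSTMCell and uses random data instead of MNIST, so it only tests the shape and batching argument; the names `last_output`, `W_x`, `W_h`, `W_out` and `b_out` are illustrative. Each of the 28 steps consumes one row from every image in the batch, and the batched result matches processing the images one by one.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n_steps, n_input, n_hidden, n_classes = 100, 28, 28, 128, 10

W_x = rng.normal(scale=0.05, size=(n_input, n_hidden))
W_h = rng.normal(scale=0.05, size=(n_hidden, n_hidden))
W_out = rng.normal(scale=0.05, size=(n_hidden, n_classes))
b_out = np.zeros(n_classes)

def last_output(images):
    """images: [batch, 28, 28] -> hidden state after the last row, [batch, n_hidden]."""
    h = np.zeros((images.shape[0], n_hidden))
    for t in range(n_steps):
        row_t = images[:, t, :]              # [batch, 28]: row t of every image
        h = np.tanh(row_t @ W_x + h @ W_h)   # the whole batch advances one step
    return h

flat = rng.random(size=(batch, n_steps * n_input))   # stand-in for a [100, 784] batch
images = flat.reshape(batch, n_steps, n_input)       # [100, 28, 28]

output = last_output(images)          # [100, 128], one row per image
pred = output @ W_out + b_out         # [100, 10], input to the loss

# Same computation, one image at a time, then stacked back together.
one_by_one = np.concatenate([last_output(images[i:i + 1]) for i in range(batch)])
same = np.allclose(output, one_by_one)
print(output.shape, pred.shape, same)
```

The batch dimension is only a convenience: every image follows its own 28-step path through the same weights, and the rows of `output` are what the readout layer (W, b) sees.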

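Question 2: For the same input, the output of LSTMCell is not the same matrix from one run to the next, yet this does not prevent the final "convergence". I have not yet worked out the mechanism behind this.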

# -*- coding: utf-8 -*-
"""
Created on Mon Jul  2 11:19:32 2018

@author: Administrator
Sequence Classification with LSTM on MNIST
"""
#matplotlib inline
import warnings
warnings.filterwarnings('ignore')
import numpy as np
#import matplotlib.pyplot as plt
import tensorflow as tf

from tensorflow.examples.tutorials.mnist import input_data
mnist=input_data.read_data_sets(".",one_hot=True)

trainimgs=mnist.train.images
trainlabels=mnist.train.labels
testimgs=mnist.test.images
testlabels=mnist.test.labels

ntrain=trainimgs.shape[0]
ntest=testimgs.shape[0]
dim=trainimgs.shape[1]
nclasses=trainlabels.shape[1]
print("Train Images:",trainimgs.shape)
print("Train Labels:",trainlabels.shape)
print() #Insert One blank row
print("Test Images:",testimgs.shape)
print("Test Labels:",testlabels.shape)

# =============================================================================
# RNN
# =============================================================================
n_input = 28 # MNIST data input (img shape: 28*28)
n_steps = 28 # timesteps
n_hidden = 128 # hidden layer num of features
n_classes = 10 # MNIST total classes (0-9 digits)

learning_rate = 0.001
training_iters = 100000
batch_size = 100
display_step = 10

#Construct a Recurrent Neural Network
# x is [?,28,28]
x = tf.placeholder(dtype="float", shape=[None, n_steps, n_input], name="x") # Current data input shape: (batch_size, n_steps, n_input) [100x28x28]
# y is [?,10]
y = tf.placeholder(dtype="float", shape=[None, n_classes], name="y")

#create the weight and biases for the read out layer
#weights is dict of [128,10]
weights = {
'out': tf.Variable(tf.random_normal([n_hidden, n_classes]))
}
#biases is dict of [10]
biases = {
'out': tf.Variable(tf.random_normal([n_classes])) #comment can be added to one virtual code line
}

# Define an LSTM cell with TensorFlow.
# n_hidden = 128 is num_units, the number of units in the LSTM cell.
lstm_cell = tf.contrib.rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)

# dynamic_rnn creates a recurrent neural network specified by lstm_cell:
#   'outputs' is a tensor of shape [batch_size, max_time, cell_state_size], here [?, 28, 128].
#   'states' is the final state; for BasicLSTMCell it is a (c, h) pair,
#   each of shape [batch_size, cell_state_size], here [?, 128].
#   cell_state_size = n_hidden = 128 and max_time = n_steps = 28.
# x is [batch_size, 28, 28]: the last 28 is the 28 pixels of one image row, and the
# middle 28 is the index of the row that is fed in next. So unlike in a CNN, each
# image is fed in as 28 rows over 28 time steps (n_steps).
outputs, states = tf.nn.dynamic_rnn(cell=lstm_cell, inputs=x, dtype=tf.float32)

# outputs is [100, 28, 128]; keep only the last time step, so output is [100, 128] (batch_size = 100)
output = outputs[:, -1, :]
# pred = [?,128] x [128,10] + [10] = [?,10]. Equivalently, each sample (row) is
# computed independently of the others; weights shape=(128,10), biases shape=(10,)
pred = tf.matmul(output, weights['out']) + biases['out']

#define the cost function and optimizer:
##Labels and logits should be tensors of shape [100x10], lets check it out:
#y is [?,10],pred is [?,10]
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=pred ))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

#define the accuracy and evaluation methods to be used in the learning process:
correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Recall that we treat each MNIST image in R^(28x28) as a sequence of 28 row vectors x in R^28.
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    step = 1
    # Keep training until the max number of iterations is reached
    while step * batch_size < training_iters:  # batch_size 100, training_iters 100,000
        # Read a batch of 100 images [100 x 784] as batch_x;
        # batch_y is a matrix of [100 x 10]
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Treat each row of the image as one step of the sequence:
        # reshape to 28 steps of 28 elements, so batch_x is [100 x 28 x 28]
        batch_x = batch_x.reshape((batch_size, n_steps, n_input))

        # Run optimization op (backprop)
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
        if step % display_step == 0:
            # Calculate batch accuracy and batch loss in a single run
            acc, loss = sess.run([accuracy, cost], feed_dict={x: batch_x, y: batch_y})
            print("Iter {}, Minibatch Loss= {:.6f}, Training Accuracy= {:.5f}".format(
                step * batch_size, loss, acc))
        step += 1
    print("Optimization Finished!")

    # Calculate accuracy for 128 MNIST test images.
    # This must stay inside the with block, because the session is closed on exit.
    test_len = 128
    test_data = mnist.test.images[:test_len].reshape((-1, n_steps, n_input))
    test_label = mnist.test.labels[:test_len]
    print("Testing Accuracy:",
          sess.run(accuracy, feed_dict={x: test_data, y: test_label}))

#sess.close()
