Hands-on: Predicting Cryptocurrency Prices with Python in 9 Steps (Video + Code)

Big Data Digest (大数据文摘)   2019-10-27 05:22




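YouTuber Siraj Raval's video series is back! Today's topic is cryptocurrency price prediction. The post comes with plenty of code, plus a video that walks through each step in detail, so you should be able to follow along.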


Click to watch the walkthrough video
Length: 22 minutes
Chinese subtitles available

[iframe]https://v.qq.com/iframe/preview.html?vid=b0643h6sq4i&width=500&height=375&auto=0[/iframe]

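Setting up a cryptocurrency price predictor is fairly straightforward: with Python and Keras, plus a recurrent neural network (a bidirectional LSTM, to be precise), it takes just 9 steps. The same approach works for both Bitcoin and Ethereum prices.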


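The 9 steps are:
  • Data processing
  • Building the model
  • Training the model
  • Testing the model
  • Analyzing price change
  • Analyzing the percentage change in price
  • Comparing predictions with real data
  • Computing model evaluation metrics
  • Putting it all together: visualization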







Data processing


Import the libraries we need: Keras, scikit-learn's metrics, NumPy, pandas, and matplotlib.


## Keras for deep learning
## (layers are imported from keras.layers; the keras.layers.core and
## keras.layers.recurrent paths are not available in newer Keras releases)
from keras.layers import Dense, Activation, Dropout, LSTM, Bidirectional
from keras.models import Sequential
## Scikit learn for mapping metrics
from sklearn.metrics import mean_squared_error
#for logging
import time
##matrix math
import numpy as np
import math
##plotting
import matplotlib.pyplot as plt
##data processing
import pandas as pd

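First, the data has to be normalized. There is a large diagram summarizing the data processing principles; to see the high-resolution version, reply "加密货币" in the chat box of the Big Data Digest (大数据文摘) WeChat official account.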





def load_data(filename, sequence_length):
    """
    Loads the bitcoin data
    Arguments:
    filename -- A string that represents where the .csv file can be located
    sequence_length -- An integer of how many days should be looked at in a row
    Returns (shapes assume sequence_length = 50 on the original data file):
    X_train -- A tensor of shape (2400, 49, 35) that will be fed into the model to train it
    Y_train -- A tensor of shape (2400,) that will be fed into the model to train it
    X_test -- A tensor of shape (267, 49, 35) that will be used to test the model's proficiency
    Y_test -- A tensor of shape (267,) that will be used to check the model's predictions
    Y_daybefore -- A tensor of shape (267,) that represents the price of bitcoin the day before each Y_test value
    unnormalized_bases -- A tensor of shape (267, 1) that will be used to get the true prices from the normalized ones
    window_size -- An integer that represents how many days of X values the model can look at at once
    """
    #Read the data file
    raw_data = pd.read_csv(filename, dtype = float).values
    #Change all zeros to the number before the zero occurs
    #(note: for the first row, x-1 is -1, so the value is taken from the last row)
    for x in range(0, raw_data.shape[0]):
        for y in range(0, raw_data.shape[1]):
            if(raw_data[x][y] == 0):
                raw_data[x][y] = raw_data[x-1][y]
    #Convert the file to a list
    data = raw_data.tolist()
    #Convert the data to a 3D array (a x b x c)
    #Where a is the number of days, b is the window size, and c is the number of features in the data file
    result = []
    for index in range(len(data) - sequence_length):
        result.append(data[index: index + sequence_length])
    #Normalizing data by going through each window
    #Every value in the window is divided by the first value in the window, and then 1 is subtracted
    d0 = np.array(result)
    dr = np.zeros_like(d0)
    dr[:,1:,:] = d0[:,1:,:] / d0[:,0:1,:] - 1
    #Splitting data set into training (first 90% of data points) and testing data (last 10% of data points)
    split_line = int(round(0.9 * dr.shape[0]))
    #Keeping the unnormalized prices for Y_test
    #Useful when graphing bitcoin price over time later
    #Column 20 is the column used as the price target throughout this function
    #(the original hard-coded the start index as 2400, which equals split_line on the original data)
    unnormalized_bases = d0[split_line:, 0:1, 20]
    training_data = dr[:split_line, :]
    #Shuffle the data
    np.random.shuffle(training_data)
    #Training Data: all but the last day of each window is the input, the last day is the target
    X_train = training_data[:, :-1]
    Y_train = training_data[:, -1]
    Y_train = Y_train[:, 20]
    #Testing data
    X_test = dr[split_line:, :-1]
    #Last day of each window (index 49 when sequence_length is 50)
    Y_test = dr[split_line:, -1, :]
    Y_test = Y_test[:, 20]
    #Get the day before Y_test's price (index 48 when sequence_length is 50)
    Y_daybefore = dr[split_line:, -2, :]
    Y_daybefore = Y_daybefore[:, 20]
    #Get window size
    window_size = sequence_length - 1 #because the last value is reserved as the y value
    return X_train, Y_train, X_test, Y_test, Y_daybefore, unnormalized_bases, window_size
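
To make the windowing and normalization step easier to see, here is a minimal sketch on a toy series. The helper name make_normalized_windows and the made-up prices are assumptions for illustration only; they are not part of the original notebook.

```python
import numpy as np

def make_normalized_windows(data, sequence_length):
    """Slice `data` (days x features) into overlapping windows and
    normalize each window relative to its own first row."""
    data = np.asarray(data, dtype=float)
    # One window per start index, as in range(len(data) - sequence_length)
    windows = np.array([data[i:i + sequence_length]
                        for i in range(len(data) - sequence_length)])
    # Divide every row by the window's first row, then subtract 1
    normalized = windows / windows[:, 0:1, :] - 1
    return windows, normalized

# Toy series: 6 days, 1 feature (made-up prices)
prices = [[100.0], [110.0], [121.0], [133.1], [120.0], [90.0]]
windows, normalized = make_normalized_windows(prices, sequence_length=3)
print(windows.shape)        # (3, 3, 1)
print(normalized[0, :, 0])  # first window: roughly 0%, +10%, +21%
```

Because every window starts at 0 after normalization, the model sees relative moves within a window rather than absolute price levels.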

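Building the model


We use a 3-layer RNN with a dropout rate of 20%.


Bidirectional RNNs are based on the idea that the output at time t may depend not only on earlier elements in the sequence but also on later ones. For example, to predict a missing word in a sequence, you need to look at the context on both the left and the right. A bidirectional RNN is two RNNs run over the same sequence in opposite directions, with the output computed from the hidden states of both.


For example, the missing word "gym" in the sentence below can only be recovered by looking at the surrounding context (Digest editor's note: "everyday"?):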


I go to the (  ) everyday to get fit.
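
The idea can be sketched in a few lines of NumPy. This is only an illustration under simplifying assumptions: it uses a plain tanh RNN rather than an LSTM, the weights are random, and the names simple_rnn and bidirectional_rnn are made up for this example.

```python
import numpy as np

def simple_rnn(inputs, W_x, W_h):
    """Run a plain tanh RNN over `inputs` (time x features) and
    return the hidden state at every time step."""
    hidden = np.zeros(W_h.shape[0])
    states = []
    for x_t in inputs:
        hidden = np.tanh(W_x @ x_t + W_h @ hidden)
        states.append(hidden)
    return np.array(states)

def bidirectional_rnn(inputs, fwd_weights, bwd_weights):
    """Concatenate a left-to-right pass with a right-to-left pass."""
    forward = simple_rnn(inputs, *fwd_weights)
    # Process the reversed sequence, then flip back so step t lines up
    backward = simple_rnn(inputs[::-1], *bwd_weights)[::-1]
    return np.concatenate([forward, backward], axis=-1)

rng = np.random.default_rng(0)
n_steps, n_features, n_hidden = 5, 3, 4
inputs = rng.normal(size=(n_steps, n_features))
fwd = (0.5 * rng.normal(size=(n_hidden, n_features)),
       0.5 * rng.normal(size=(n_hidden, n_hidden)))
bwd = (0.5 * rng.normal(size=(n_hidden, n_features)),
       0.5 * rng.normal(size=(n_hidden, n_hidden)))

states = bidirectional_rnn(inputs, fwd, bwd)
print(states.shape)  # (5, 8)
```

Each time step ends up with twice as many features as a single RNN would give, which matches the default concatenating behavior of Keras's Bidirectional wrapper.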

def initialize_model(window_size, dropout_value, activation_function, loss_function, optimizer):
    """
    Initializes and creates the model to be used
    Arguments:
    window_size -- An integer that represents how many days of X_values the model can look at at once
    dropout_value -- A decimal representing how much dropout should be incorporated at each level, in this case 0.2
    activation_function -- A string to define the activation_function, in this case it is linear
    loss_function -- A string to define the loss function to be used, in this case it is mean squared error
    optimizer -- A string to define the optimizer to be used, in this case it is adam
    Returns:
    model -- A 3 layer RNN with 100*dropout_value percent dropout after the first two layers that uses
             activation_function as its activation function, loss_function as its loss function,
             and optimizer as its optimizer
    """
    #Create a Sequential model using Keras
    model = Sequential()
    #First recurrent layer with dropout
    #Note: this reads the number of features from the global X_train returned by load_data,
    #so load_data must have been called before this function
    model.add(Bidirectional(LSTM(window_size, return_sequences=True), input_shape=(window_size, X_train.shape[-1]),))
    model.add(Dropout(dropout_value))
    #Second recurrent layer with dropout
    model.add(Bidirectional(LSTM((window_size*2), return_sequences=True)))
    model.add(Dropout(dropout_value))
    #Third recurrent layer
    model.add(Bidirectional(LSTM(window_size, return_sequences=False)))
    #Output layer (returns the predicted value)
    model.add(Dense(units=1))
    #Set activation function
    model.add(Activation(activation_function))
    #Set loss function and optimizer
    model.compile(loss=loss_function, optimizer=optimizer)
    return model
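
For readers unfamiliar with dropout, the following is a rough NumPy sketch of what a 20% dropout layer does during training. The function name dropout and the array sizes are assumptions for illustration; Keras's Dropout layer handles this internally and applies it only while training.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Inverted dropout: zero out roughly `rate` of the units and scale
    the survivors so the expected activation is unchanged."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(42)
# Hypothetical layer output: 1000 samples, 98 units
activations = np.ones((1000, 98))
dropped = dropout(activations, rate=0.2, rng=rng)

# Fraction of units that were zeroed (close to the requested 0.2)
print(float((dropped == 0).mean()))
# Surviving units are scaled by 1 / (1 - 0.2)
print(np.unique(dropped))
```

Randomly silencing units this way discourages the network from relying on any single feature, which is why it is commonly used to reduce overfitting.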

Training the model


Here we use a batch size of 1024 and train for 100 epochs. The goal is to minimize the mean squared error (MSE).


def fit_model(model, X_train, Y_train, batch_num, num_epoch, val_split):
    """
    Fits the model to the training data
    Arguments:
    model -- The previously initialized 3 layer Recurrent Neural Network
    X_train -- A tensor of shape (2400, 49, 35) that represents the x values of the training data
    Y_train -- A tensor of shape (2400,) that represents the y values of the training data
    batch_num -- An integer representing the batch size to be used, in this case 1024
    num_epoch -- An integer defining the number of epochs to be run, in this case 100
    val_split -- A decimal representing the proportion of training data to be used as validation data
    Returns:
    model -- The 3 layer Recurrent Neural Network that has been fitted to the training data
    training_time -- An integer representing the amount of time (in seconds) that the model was training
    """
    #Record the time the model starts training
    start = time.time()
    #Train the model on X_train and Y_train
    #(the argument is named epochs in Keras 2 and later; nb_epoch was the Keras 1 name)
    model.fit(X_train, Y_train, batch_size=batch_num, epochs=num_epoch, validation_split=val_split)
    #Get the time it took to train the model (in seconds)
    training_time = int(math.floor(time.time() - start))
    return model, training_time
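
As a reminder of what the loss being minimized looks like, here is a small NumPy sketch of mean squared error. The function name mean_squared_error_np and the sample values are made up for illustration; during training Keras computes this loss itself.

```python
import numpy as np

def mean_squared_error_np(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.mean((y_true - y_pred) ** 2))

# Made-up normalized targets and predictions
y_true = [0.10, -0.05, 0.20]
y_pred = [0.00, -0.05, 0.40]
mse = mean_squared_error_np(y_true, y_pred)
print(mse)  # (0.1**2 + 0**2 + 0.2**2) / 3, about 0.0167
```

Squaring the errors means one large miss costs more than several small ones, so the model is pushed hardest on its worst predictions.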

Testing the model

def test_model(model, X_test, Y_test, unnormalized_bases):
    """
    Test the model on the testing data
    Arguments:
    model -- The previously fitted 3 layer Recurrent Neural Network
    X_test -- A tensor of shape (267, 49, 35) that represents the x values of the testing data
    Y_test -- A tensor of shape (267,) that represents the y values of the testing data
    unnormalized_bases -- A tensor of shape (267, 1) that can be used to get unnormalized data points
    Returns:
    y_predict -- A tensor of shape (267, 1) that represents the normalized values that the model predicts based on X_test
    real_y_test -- A tensor of shape (267,) that represents the actual prices of bitcoin throughout the testing period
    real_y_predict -- A tensor of shape (267, 1) that represents the model's predicted prices of bitcoin
    fig -- A figure plotting the predicted prices of bitcoin against the real prices of bitcoin
    """
    #Test the model on X_Test
    y_predict = model.predict(X_test)
    #Create empty arrays to store unnormalized values
    real_y_test = np.zeros_like(Y_test)
    real_y_predict = np.zeros_like(y_predict)
    #Fill the arrays with the real value and the predicted value by reversing the normalization process
    for i in range(Y_test.shape[0]):
        y = Y_test[i]
        predict = y_predict[i]
        real_y_test[i] = (y+1)*unnormalized_bases[i, 0]
        real_y_predict[i] = (predict+1)*unnormalized_bases[i, 0]
    #Plot of the predicted prices versus the real prices
    fig = plt.figure(figsize=(10,5))
    ax = fig.add_subplot(111)
    ax.set_title("Bitcoin Price Over Time")
    plt.plot(real_y_predict, color = 'green', label = 'Predicted Price')
    plt.plot(real_y_test, color = 'red', label = 'Real Price')
    ax.set_ylabel("Price (USD)")
    ax.set_xlabel("Time (Days)")
    ax.legend()
    return y_predict, real_y_test, real_y_predict, fig
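
The loop in test_model reverses the normalization one element at a time. A vectorized version might look like the sketch below; the helper name denormalize and the sample numbers are assumptions for illustration.

```python
import numpy as np

def denormalize(normalized, bases):
    """Invert `normalized = price / base - 1` to recover prices."""
    # Flatten both inputs so shapes such as (n,) and (n, 1) line up
    normalized = np.asarray(normalized, dtype=float).reshape(-1)
    bases = np.asarray(bases, dtype=float).reshape(-1)
    return (normalized + 1.0) * bases

bases = np.array([[8000.0], [8200.0]])   # shape (2, 1), like unnormalized_bases
y_normalized = np.array([0.05, -0.10])    # shape (2,), like Y_test
real_prices = denormalize(y_normalized, bases)
print(real_prices)  # about 8400 and 7380
```

The flattening matters: multiplying an array of shape (n,) by one of shape (n, 1) directly would broadcast to an (n, n) matrix instead of n prices.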

Analyzing price change

def price_change(Y_daybefore, Y_test, y_predict):
    """
    Calculate the percent change between each value and the day before
    Arguments:
    Y_daybefore -- A tensor of shape (267,) that represents the prices of each day before each price in Y_test
    Y_test -- A tensor of shape (267,) that represents the normalized y values of the testing data
    y_predict -- A tensor of shape (267, 1) that represents the normalized y values of the model's predictions
    Returns:
    Y_daybefore -- A tensor of shape (267, 1) that represents the prices of each day before each price in Y_test
    Y_test -- A tensor of shape (267, 1) that represents the normalized y values of the testing data
    delta_predict -- A tensor of shape (267, 1) that represents the relative change between predicted and day before values
    delta_real -- A tensor of shape (267, 1) that represents the relative change between real and day before values
    fig -- A plot representing percent change in bitcoin price per day
    """
    #Reshaping Y_daybefore and Y_test
    Y_daybefore = np.reshape(Y_daybefore, (-1, 1))
    Y_test = np.reshape(Y_test, (-1, 1))
    #The relative change between each predicted value and the value from the day before
    delta_predict = (y_predict - Y_daybefore) / (1+Y_daybefore)
    #The relative change between each true value and the value from the day before
    delta_real = (Y_test - Y_daybefore) / (1+Y_daybefore)
    #Plotting the predicted percent change versus the real percent change
    fig = plt.figure(figsize=(10, 6))
    ax = fig.add_subplot(111)
    ax.set_title("Percent Change in Bitcoin Price Per Day")
    plt.plot(delta_predict, color='green', label = 'Predicted Percent Change')
    plt.plot(delta_real, color='red', label = 'Real Percent Change')
    plt.ylabel("Percent Change")
    plt.xlabel("Time (Days)")
    ax.legend()
    plt.show()
    return Y_daybefore, Y_test, delta_predict, delta_real, fig
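
It may not be obvious why dividing by (1 + Y_daybefore) gives the day-over-day change in the real price. The short check below, using made-up numbers, shows that the formula on normalized values agrees with the change computed on actual prices, as long as both days were normalized against the same window base.

```python
import numpy as np

base = 8000.0                          # first price in a window (made-up)
normalized = np.array([0.02, 0.05])    # day before, target day (normalized)
prices = (normalized + 1.0) * base     # back to real prices

# Change computed directly on real prices
change_from_prices = (prices[1] - prices[0]) / prices[0]
# Same quantity computed on normalized values, as in price_change
change_from_normalized = (normalized[1] - normalized[0]) / (1.0 + normalized[0])

print(np.isclose(change_from_prices, change_from_normalized))  # True
```

The base cancels out of the ratio, so no denormalization is needed. Note that the result is a fraction (0.03 means 3%), even though the plot labels call it a percent change.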

Analyzing the percentage change in price

def binary_price(delta_predict, delta_real):
    """
    Converts percent change to a binary 1 or 0, where 1 is an increase and 0 is a decrease/no change
    Arguments:
    delta_predict -- A tensor of shape (267, 1) that represents the predicted percent change in price
    delta_real -- A tensor of shape (267, 1) that represents the real percent change in price
    Returns:
    delta_predict_1_0 -- A tensor of shape (267, 1) that represents the binary version of delta_predict
    delta_real_1_0 -- A tensor of shape (267, 1) that represents the binary version of delta_real
    """
    #Empty arrays where a 1 represents an increase in price and a 0 represents a decrease in price
    delta_predict_1_0 = np.empty(delta_predict.shape)
    delta_real_1_0 = np.empty(delta_real.shape)
    #If the change in price is greater than zero, store it as a 1
    #If the change in price is zero or less, store it as a 0
    for i in range(delta_predict.shape[0]):
        if delta_predict[i][0] > 0:
            delta_predict_1_0[i][0] = 1
        else:
            delta_predict_1_0[i][0] = 0
    for i in range(delta_real.shape[0]):
        if delta_real[i][0] > 0:
            delta_real_1_0[i][0] = 1
        else:
            delta_real_1_0[i][0] = 0
    return delta_predict_1_0, delta_real_1_0
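
The same thresholding can be written without explicit loops. The sketch below is an equivalent NumPy one-liner; the helper name to_binary and the sample values are assumptions for illustration.

```python
import numpy as np

def to_binary(delta):
    """1.0 where the change is strictly positive, otherwise 0.0."""
    return (np.asarray(delta) > 0).astype(float)

# Made-up changes: an increase, a decrease, and no change
delta = np.array([[0.03], [-0.01], [0.0]])
binary = to_binary(delta)
print(binary.ravel())  # [1. 0. 0.]
```

As in binary_price, a change of exactly zero is counted as 0, so "no change" is grouped with decreases.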

Comparing predictions with real data

def find_positives_negatives(delta_predict_1_0, delta_real_1_0):
    """
    Finding the number of false positives, false negatives, true positives, true negatives
    Arguments:
    delta_predict_1_0 -- A tensor of shape (267, 1) that represents the binary version of delta_predict
    delta_real_1_0 -- A tensor of shape (267, 1) that represents the binary version of delta_real
    Returns:
    true_pos -- An integer that represents the number of true positives achieved by the model
    false_pos -- An integer that represents the number of false positives achieved by the model
    true_neg -- An integer that represents the number of true negatives achieved by the model
    false_neg -- An integer that represents the number of false negatives achieved by the model
    """
    #Finding the number of false positive/negatives and true positives/negatives
    true_pos = 0
    false_pos = 0
    true_neg = 0
    false_neg = 0
    for i in range(delta_real_1_0.shape[0]):
        real = delta_real_1_0[i][0]
        predicted = delta_predict_1_0[i][0]
        if real == 1:
            if predicted == 1:
                true_pos += 1
            else:
                false_neg += 1
        elif real == 0:
            if predicted == 0:
                true_neg += 1
            else:
                false_pos += 1
    return true_pos, false_pos, true_neg, false_neg
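
For comparison, the four counts can also be computed with boolean masks. This is a sketch of an equivalent approach, not the notebook's code; the name confusion_counts and the sample labels are made up.

```python
import numpy as np

def confusion_counts(predicted, real):
    """Return (true_pos, false_pos, true_neg, false_neg) for 0/1 labels."""
    predicted = np.asarray(predicted).ravel() == 1
    real = np.asarray(real).ravel() == 1
    true_pos = int(np.sum(predicted & real))
    false_pos = int(np.sum(predicted & ~real))
    true_neg = int(np.sum(~predicted & ~real))
    false_neg = int(np.sum(~predicted & real))
    return true_pos, false_pos, true_neg, false_neg

# Made-up labels: 1 = price went up, 0 = price went down or stayed flat
predicted = [1, 1, 0, 0, 1]
real      = [1, 0, 0, 1, 1]
counts = confusion_counts(predicted, real)
print(counts)  # (2, 1, 1, 1)
```

Here a "positive" means a predicted price increase, so a false positive is a day the model expected a rise that did not happen.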

Computing model evaluation metrics





def calculate_statistics(true_pos, false_pos, true_neg, false_neg, y_predict, Y_test):
    """
    Calculate various statistics to assess performance
    Arguments:
    true_pos -- An integer that represents the number of true positives achieved by the model
    false_pos -- An integer that represents the number of false positives achieved by the model
    true_neg -- An integer that represents the number of true negatives achieved by the model
    false_neg -- An integer that represents the number of false negatives achieved by the model
    y_predict -- A tensor of shape (267, 1) that represents the normalized y values of the model's predictions
    Y_test -- A tensor of shape (267, 1) that represents the normalized y values of the testing data
    Returns:
    precision -- How often the model gets a true positive compared to how often it returns a positive
    recall -- How often the model gets a true positive compared to how often it should have gotten a positive
    F1 -- The harmonic mean of recall and precision
    MSE -- The average of the squares of the differences between predicted and real values
    """
    precision = float(true_pos) / (true_pos + false_pos)
    recall = float(true_pos) / (true_pos + false_neg)
    F1 = float(2 * precision * recall) / (precision + recall)
    #Get Mean Squared Error (scikit-learn expects the true values first)
    MSE = mean_squared_error(Y_test.flatten(), y_predict.flatten())
    return precision, recall, F1, MSE
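
One thing calculate_statistics does not handle is a denominator of zero, for example when the model never predicts an increase. The sketch below shows one way to guard against that. The function name classification_metrics is an assumption, and the counts are hypothetical: the article does not state the actual confusion counts.

```python
def classification_metrics(true_pos, false_pos, false_neg):
    """Precision, recall and F1, returning 0.0 where a ratio is undefined."""
    predicted_pos = true_pos + false_pos
    actual_pos = true_pos + false_neg
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical counts, chosen only for illustration
precision, recall, f1 = classification_metrics(true_pos=31, false_pos=19, false_neg=25)
print(round(precision, 4), round(recall, 4), round(f1, 4))
```

These particular counts happen to reproduce the precision, recall and F1 reported at the end of the article, but other counts with the same ratios would do so as well.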

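Putting it all together: visualization


Finally, we can look at the results!


First, the predicted price vs the real price:

y_predict, real_y_test, real_y_predict, fig1 = test_model(model, X_test, Y_test, unnormalized_bases)
#Show the plot (plt.show() displays all open figures and does not take a figure argument)
plt.show()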





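Next is the predicted percent change vs the real percent change. Note that the predictions fluctuate more than the real values, which is one area where the model could be improved:


Y_daybefore, Y_test, delta_predict, delta_real, fig2 = price_change(Y_daybefore, Y_test, y_predict)

#Show the plot (price_change already calls plt.show(), so this is only needed if that call is removed)
plt.show()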
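
One simple way to put a number on "the predictions fluctuate more" is to compare the standard deviations of the two series. The sketch below uses made-up daily changes; the function name volatility_ratio is an assumption and this check is not part of the original notebook.

```python
import numpy as np

def volatility_ratio(delta_predict, delta_real):
    """Ratio of the standard deviations of predicted vs real daily changes.
    A value above 1 means the predictions swing more than the real series."""
    return float(np.std(delta_predict) / np.std(delta_real))

# Made-up daily changes, for illustration only
delta_real_demo = np.array([0.01, -0.02, 0.015, -0.005])
delta_predict_demo = np.array([0.03, -0.05, 0.04, -0.02])
ratio = volatility_ratio(delta_predict_demo, delta_real_demo)
print(ratio > 1)  # True
```

On real output, the same function could be called with the delta_predict and delta_real arrays returned by price_change.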




The final model performance looks like this:


Precision: 0.62
Recall: 0.553571428571
F1 score: 0.584905660377
Mean Squared Error: 0.0430756924477


So, now that you have seen it all, are you itching to try it yourself?


Code download:
https://github.com/llSourcell/ethereum_future/blob/master/A%20Deep%20Learning%20Approach%20to%20Predicting%20Cryptocurrency%20Prices.ipynb
Original video:
https://www.youtube.com/watch?v=G5Mx7yYdEhE


Author | Siraj Raval (translated and produced by Big Data Digest with permission)
Translation | 糖竹子, 狗小白, 邓子稷
Subtitle timing | 韩振峰, Barbara, 菜菜Tom
Producer | 龙牧雪











