sklearn.preprocessing + keras

Date: 2024-01-14 17:23:16
Tags: apple, scaler, keras, dataset, preprocessing, prices, import, stock, sklearn

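sklearn's preprocessing module can normalize business data and restore normalized data to its original scale.

It is often used together with other models.

For example: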

https://github.com/influxdata/influxdb-client-python/blob/master/notebooks/stock-predictions.ipynb

 

Example InfluxDB Jupyter notebook.

This example demonstrates how to query data from InfluxDB 2.0 using Flux and predict the stock price. (ML example using Keras)

Prerequisites

  • import the test dataset before running this notebook: python3 ./stock_predictions_import_data.py
  • install the following dependencies
    • pip3 install keras
    • pip3 install matplotlib (provides matplotlib.pyplot)
    • pip3 install tensorflow
    • pip3 install scikit-learn
 
# Import a Client

import os
import sys

# make the influxdb_client package in the parent directory importable
sys.path.insert(0, os.path.abspath('../'))
 
# silence TensorFlow C++ info/warning logs; set before TensorFlow is imported so it takes effect
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'

import math

import matplotlib.pyplot as plt
import numpy as np
from IPython.display import display
from keras.layers import Dense, LSTM
from keras.models import Sequential
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler

from influxdb_client import InfluxDBClient
 
# parameters to be set ("optimum" hyperparameters obtained from grid search):
look_back = 7
epochs = 100
batch_size = 32
 
# fix random seed for reproducibility
np.random.seed(7)

# read all prices using pandas
#prices_dataset =  pd.read_csv('./prices-split-adjusted.csv', header=0)

# read prices from InfluxDB 2.0 
client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org", debug=False)
query='''
from(bucket:"my-bucket")
        |> range(start: 0, stop: now())
        |> filter(fn: (r) => r._measurement == "financial-analysis")
        |> filter(fn: (r) => r.symbol == "AAPL")
        |> filter(fn: (r) => r._field == "close")
        |> drop(columns: ["_start", "result", "_stop", "table", "_field","_measurement"])
        |> rename(columns: {_value: "close"})
'''
prices_dataset = client.query_api().query_data_frame(org="my-org", query=query)
display(prices_dataset.head())

# save Apple's stock values as type of floating point number
apple_stock_prices = prices_dataset.close.values.astype('float32')
 
# reshape to column vector
apple_stock_prices = apple_stock_prices.reshape(len(apple_stock_prices), 1)

# normalize the dataset
scaler = MinMaxScaler(feature_range=(0, 1))
apple_stock_prices = scaler.fit_transform(apple_stock_prices)
 
# split data into training set and test set
train_size = int(len(apple_stock_prices) * 0.67)
test_size = len(apple_stock_prices) - train_size
train, test = apple_stock_prices[0:train_size,:], apple_stock_prices[train_size:len(apple_stock_prices),:]

print('Split data into training set and test set... Number of training samples/ test samples:', len(train), len(test))
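The notebook fits the MinMaxScaler on the whole series before splitting, so the test range also shapes the scaling. A minimal sketch of one alternative, fitting on the training part only and reusing the fitted scaler on the test part. The `prices` array and the name `split_scaler` are assumptions made up for illustration and are not part of the notebook.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# hypothetical prices 1..10, used only for illustration
prices = np.arange(1, 11, dtype='float32').reshape(-1, 1)

# same 67 % / 33 % split ratio as the notebook
train_size = int(len(prices) * 0.67)
train_raw, test_raw = prices[:train_size], prices[train_size:]

# fit on the training part only, then apply the same mapping to the test part
split_scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = split_scaler.fit_transform(train_raw)
test_scaled = split_scaler.transform(test_raw)

print(train_scaled.min(), train_scaled.max())
# test values can fall outside [0, 1] when test prices exceed the training range
print(test_scaled.max())
```

With this variant the scaled test values are not guaranteed to stay inside the feature range, which is the price of keeping test information out of the fit.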
 
# convert an array of values into a time series dataset 
# in form 
#                     X                     Y
# t-look_back+1, t-look_back+2, ..., t     t+1

def create_dataset(dataset, look_back):
	dataX, dataY = [], []
	for i in range(len(dataset)-look_back-1):
		a = dataset[i:(i+look_back), 0]
		dataX.append(a)
		dataY.append(dataset[i + look_back, 0])
	return np.array(dataX), np.array(dataY)

# convert Apple's stock price data into time series dataset
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)

# reshape input of the LSTM to be format [samples, time steps, features]
trainX = np.reshape(trainX, (trainX.shape[0], trainX.shape[1], 1))
testX = np.reshape(testX, (testX.shape[0], testX.shape[1], 1))
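To see what the windowing produces, here is a small sketch that restates create_dataset so it runs on its own and applies it to a toy column vector. The `toy` array and the look_back value of 3 are assumptions chosen only for illustration.

```python
import numpy as np

def create_dataset(dataset, look_back):
    # same windowing logic as the notebook's create_dataset
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - 1):
        dataX.append(dataset[i:(i + look_back), 0])
        dataY.append(dataset[i + look_back, 0])
    return np.array(dataX), np.array(dataY)

# toy column vector with the values 0..5, used only for illustration
toy = np.arange(6, dtype='float32').reshape(-1, 1)
toyX, toyY = create_dataset(toy, look_back=3)

print(toyX)  # windows [0 1 2] and [1 2 3]
print(toyY)  # targets 3 and 4

# LSTM input layout: [samples, time steps, features]
toyX = toyX.reshape(toyX.shape[0], toyX.shape[1], 1)
print(toyX.shape)
```

Because the loop bound is len(dataset) - look_back - 1, the last possible window ([2 3 4] with target 5) is not emitted. The plotting offsets later in the notebook are written with this in mind.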
 
# create and fit the LSTM network
model = Sequential()
model.add(LSTM(4, input_shape=(look_back, 1)))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')
model.fit(trainX, trainY, epochs=epochs, batch_size=batch_size)
 
model.summary()
 
# make predictions
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
 
# invert predictions and targets to unscaled
trainPredict = scaler.inverse_transform(trainPredict)
trainY = scaler.inverse_transform([trainY])
testPredict = scaler.inverse_transform(testPredict)
testY = scaler.inverse_transform([testY])
 
# calculate root mean squared error
trainScore = math.sqrt(mean_squared_error(trainY[0], trainPredict[:,0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY[0], testPredict[:,0]))
print('Test Score: %.2f RMSE' % (testScore))
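The inversion step can be sketched in isolation. The version below reshapes the 1-D targets into a single column before calling inverse_transform, which is an alternative to wrapping them in a list as the notebook does. All values and the names `demo_scaler`, `y_true_scaled`, and `y_pred_scaled` are assumptions made up for illustration.

```python
import math

import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler

# hypothetical prices, used only to fit a scaler for this illustration
raw = np.array([[10.0], [20.0], [30.0], [40.0], [50.0]])
demo_scaler = MinMaxScaler(feature_range=(0, 1)).fit(raw)

y_true_scaled = np.array([0.25, 0.5, 0.75])       # 1-D targets, shaped like trainY
y_pred_scaled = np.array([[0.25], [0.5], [1.0]])  # 2-D output, shaped like model.predict

# reshape the 1-D targets to one column so they match the scaler's single feature
y_true = demo_scaler.inverse_transform(y_true_scaled.reshape(-1, 1))[:, 0]
y_pred = demo_scaler.inverse_transform(y_pred_scaled)[:, 0]

# RMSE in the original price units
rmse = math.sqrt(mean_squared_error(y_true, y_pred))
print(y_true, y_pred, rmse)
```

Computing the error after the inverse transform reports it in price units rather than in the scaled 0 to 1 range.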
 
# shift predictions of training data for plotting
trainPredictPlot = np.empty_like(apple_stock_prices)
trainPredictPlot[:, :] = np.nan
trainPredictPlot[look_back:len(trainPredict)+look_back, :] = trainPredict

# shift predictions of test data for plotting
testPredictPlot = np.empty_like(apple_stock_prices)
testPredictPlot[:, :] = np.nan
testPredictPlot[len(trainPredict)+(look_back*2)+1:len(apple_stock_prices)-1, :] = testPredict
 
# plot baseline and predictions
plt.plot(scaler.inverse_transform(apple_stock_prices))
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()

 

preprocessing

https://scikit-learn.org/stable/modules/preprocessing.html

from sklearn import preprocessing
import numpy as np
X_train = np.array([[ 1., -1.,  2.],
                    [ 2.,  0.,  0.],
                    [ 0.,  1., -1.]])
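# learn the per-feature mean and standard deviation from X_train
scaler = preprocessing.StandardScaler().fit(X_train)
scaler

# per-feature mean
scaler.mean_

# per-feature standard deviation used for scaling
scaler.scale_

# standardize X_train with the learned parameters
X_scaled = scaler.transform(X_train)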
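As a quick check of what the standardization does, the sketch below reuses the same X_train, verifies that each scaled column has roughly zero mean and unit variance, and restores the original values with inverse_transform. The name `std_scaler` is an assumption used only to keep this block separate from the snippet above.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1., -1., 2.],
                    [2., 0., 0.],
                    [0., 1., -1.]])

std_scaler = StandardScaler().fit(X_train)
X_scaled = std_scaler.transform(X_train)

print(X_scaled.mean(axis=0))  # approximately 0 for every column
print(X_scaled.std(axis=0))   # approximately 1 for every column

# inverse_transform maps the standardized values back to the original scale
X_restored = std_scaler.inverse_transform(X_scaled)
print(np.allclose(X_restored, X_train))
```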

 

 

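Some models need their data normalized before training,

and their predictions denormalized afterwards, as in the time series example above.

In that case it is not enough to save the model;

the normalization transformer has to be saved as well.

joblib provides this: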

https://www.codenong.com/41993565/#google_vignette

# sklearn.externals.joblib was removed from recent scikit-learn releases; import joblib directly
import joblib
scaler_filename ="scaler.save"
joblib.dump(scaler, scaler_filename)

# And now to load...

scaler = joblib.load(scaler_filename)
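A minimal round-trip sketch of the idea: fit a scaler, dump it, load it back, and confirm the loaded copy inverts values the same way. The data, the temporary directory, and the names `fitted` and `restored` are assumptions for illustration only.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# hypothetical data, used only for illustration
data = np.array([[1.0], [3.0], [5.0]])
fitted = MinMaxScaler(feature_range=(0, 1)).fit(data)

# write the fitted scaler to a temporary file and read it back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "scaler.save")
    joblib.dump(fitted, path)
    restored = joblib.load(path)

# the restored scaler carries the fitted parameters, so it can
# denormalize predictions in a later session
sample = np.array([[0.5]])
print(fitted.inverse_transform(sample), restored.inverse_transform(sample))
```

Saving the scaler next to the model keeps the prediction pipeline reproducible, since a freshly fitted scaler on different data would map predictions to a different scale.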

 

From: https://www.cnblogs.com/lightsong/p/17963926
