Demonstrated on a 64-bit Windows 10 system
I. Preface
In this installment, we introduce CNN regression.
As before, we use the same dataset:
the public data from a 2015 PLoS One article titled "Comparison of Two Hybrid Models for Forecasting the Incidence of Hemorrhagic Fever with Renal Syndrome in Jiangsu Province, China". The data are the monthly incidence rates of hemorrhagic fever with renal syndrome in Jiangsu Province from January 2004 to December 2012. We use the data from January 2004 through December 2011 to forecast the incidence for the 12 months of 2012.
II. CNN Regression
(1) Principles
Convolutional neural networks (CNNs) were originally designed for image recognition and processing, but they have proven effective for many kinds of sequential data as well, including time series. Some of the principles behind applying CNNs to time-series forecasting:
(a) Local receptive fields:
- A defining feature of a CNN is its local receptive field: each convolution kernel looks at only a small slice of the input.
- For a time series, this means the CNN can capture and learn short-term dependencies or periodicities in the data.
- This is analogous to using a sliding window in classical time-series analysis to capture short-term patterns.
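To make the local-receptive-field idea concrete, here is a minimal sketch (plain NumPy, with toy numbers of my own choosing, not from the article's data) of a single kernel sliding over a series exactly like a moving window:

```python
import numpy as np

# Toy series and a hand-picked "difference" kernel (illustrative values only)
series = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
kernel = np.array([-1.0, 1.0])  # responds to short-term increases

# A 1D convolution is just the kernel dotted with each local window
def conv1d_valid(x, k):
    width = len(k)
    return np.array([np.dot(x[i:i + width], k) for i in range(len(x) - width + 1)])

print(conv1d_valid(series, kernel))  # first differences: [1. 2. 3. 4.]
```

Each output value depends only on the two neighboring inputs under the kernel, which is exactly the short-term pattern the bullet points describe.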
(b) Parameter sharing:
- In a CNN, the weights of a convolution kernel are shared across every position of the input.
- The network can therefore recognize the same pattern wherever it occurs in the series, which improves generalization.
(c) Multi-scale feature capture:
- By stacking multiple convolution and pooling layers, a CNN can capture patterns at different time scales.
- This lets it model both short-term and long-term dependencies in the series.
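How stacking widens the time scale can be checked with simple receptive-field arithmetic. A sketch (my own helper, using the standard receptive-field recurrence; the layer stack mirrors the Conv1D(kernel=2) / MaxPooling1D(pool=2) pattern used later in this post):

```python
# Each layer is (kernel_size, stride); pooling behaves like a strided kernel
def receptive_field(layers):
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # window grows by (k-1) steps at the current spacing
        jump *= stride             # stride multiplies the spacing seen by later layers
    return rf

# conv(k=2,s=1) -> pool(k=2,s=2) -> conv(k=2,s=1) -> pool(k=2,s=2)
print(receptive_field([(2, 1), (2, 2), (2, 1), (2, 2)]))  # -> 7 input steps
```

A single conv layer sees only 2 time steps, but two conv/pool pairs already see 7, which is why deeper stacks capture longer-range dependencies.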
(d) Stacked architecture:
A multi-layer CNN can learn complex, abstract patterns in a time series. For example, the first layer might capture simple trends or periodicity, while deeper layers capture more complex seasonal patterns or other nonlinear relationships.
(e) Automatic feature learning:
- Traditional time-series methods usually require features to be selected and constructed by hand.
- A CNN learns and extracts the relevant features directly from the raw data, which often means better performance with less manual work.
(f) Structure in time-series data:
- Like images, time-series data is structured; for instance, past observations usually influence future ones.
- A CNN exploits this structure, using convolutions to extract local and global temporal patterns.
In short, although CNNs were designed for images, they have shown strong potential on sequential data, and time series in particular, because they learn important features automatically, capture patterns at multiple scales, and adapt to both short-term and long-term dependencies.
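The code in the next sections turns the series into lag features with `shift` and reshapes them into the 3D `[samples, timesteps, features]` array that a Conv1D layer expects. A minimal sketch of that preparation on synthetic numbers (my own stand-in series, not the HFRS data):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the incidence series (illustrative only)
df = pd.DataFrame({'incidence': np.arange(10, dtype=float)})

lag_period = 3
for i in range(lag_period, 0, -1):
    # lag_3 is the most recent observation (shift 1), lag_1 the oldest (shift 3)
    df[f'lag_{i}'] = df['incidence'].shift(lag_period - i + 1)
df = df.dropna().reset_index(drop=True)  # first lag_period rows have NaN lags

X = df[[f'lag_{i}' for i in range(1, lag_period + 1)]].values
X = X.reshape(X.shape[0], X.shape[1], 1)  # [samples, timesteps, features]
print(X.shape)  # (7, 3, 1)
```

With 10 observations and 3 lags, the first 3 rows are dropped, leaving 7 samples of 3 time steps with 1 feature each.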
(2) Single-step rolling forecast
import pandas as pd
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv1D, Flatten, MaxPooling1D
from tensorflow.keras.optimizers import Adam

# Read the data
data = pd.read_csv('data.csv')

# Convert the time column to datetime
data['time'] = pd.to_datetime(data['time'], format='%b-%y')

# Create lag features
lag_period = 6
for i in range(lag_period, 0, -1):
    data[f'lag_{i}'] = data['incidence'].shift(lag_period - i + 1)

# Drop rows containing NaN
data = data.dropna().reset_index(drop=True)

# Split into training and validation sets
train_data = data[(data['time'] >= '2004-01-01') & (data['time'] <= '2011-12-31')]
validation_data = data[(data['time'] >= '2012-01-01') & (data['time'] <= '2012-12-31')]

# Define features and target
X_train = train_data[['lag_1', 'lag_2', 'lag_3', 'lag_4', 'lag_5', 'lag_6']].values
y_train = train_data['incidence'].values
X_validation = validation_data[['lag_1', 'lag_2', 'lag_3', 'lag_4', 'lag_5', 'lag_6']].values
y_validation = validation_data['incidence'].values

# For a CNN, reshape the input to 3D: [samples, timesteps, features]
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_validation = X_validation.reshape(X_validation.shape[0], X_validation.shape[1], 1)

# Build the CNN model
model = Sequential()
model.add(Conv1D(filters=64, kernel_size=2, activation='relu', input_shape=(X_train.shape[1], 1)))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(50, activation='relu'))
model.add(Dense(1))
model.compile(optimizer=Adam(learning_rate=0.001), loss='mse')

# Train the model
history = model.fit(X_train, y_train, epochs=200, batch_size=32,
                    validation_data=(X_validation, y_validation), verbose=0)

# Single-step rolling forecast function
def rolling_forecast(model, initial_features, n_forecasts):
    forecasts = []
    current_features = initial_features.copy()
    for i in range(n_forecasts):
        # Predict with the current feature window
        forecast = model.predict(current_features.reshape(1, len(current_features), 1)).flatten()[0]
        forecasts.append(forecast)
        # Update the window: drop the oldest feature, append the new prediction
        current_features = np.roll(current_features, shift=-1)
        current_features[-1] = forecast
    return np.array(forecasts)

# Use the last 6 training data points as the initial features
initial_features = X_train[-1].flatten()

# Forecast the validation set with single-step rolling prediction
y_validation_pred = rolling_forecast(model, initial_features, len(X_validation))

# MAE, MAPE, MSE and RMSE on the training set
mae_train = mean_absolute_error(y_train, model.predict(X_train).flatten())
mape_train = np.mean(np.abs((y_train - model.predict(X_train).flatten()) / y_train))
mse_train = mean_squared_error(y_train, model.predict(X_train).flatten())
rmse_train = np.sqrt(mse_train)

# MAE, MAPE, MSE and RMSE on the validation set
mae_validation = mean_absolute_error(y_validation, y_validation_pred)
mape_validation = np.mean(np.abs((y_validation - y_validation_pred) / y_validation))
mse_validation = mean_squared_error(y_validation, y_validation_pred)
rmse_validation = np.sqrt(mse_validation)

print("Validation set:", mae_validation, mape_validation, mse_validation, rmse_validation)
print("Training set:", mae_train, mape_train, mse_train, rmse_train)
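The window update inside `rolling_forecast`, `np.roll` followed by overwriting the freed last slot, can be seen in isolation (toy numbers of my own):

```python
import numpy as np

window = np.array([0.3, 0.5, 0.8])  # the last three observations (illustrative)
new_prediction = 0.9

# Shift everything one step left, then place the prediction in the freed last slot
window = np.roll(window, shift=-1)
window[-1] = new_prediction
print(window)  # [0.5 0.8 0.9]
```

Note that `np.roll` alone would wrap the oldest value around to the end; overwriting `window[-1]` is what actually discards it.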
The results:
(3) Multi-step rolling forecast - vol. 1
import pandas as pd
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv1D, Flatten, MaxPooling1D
from tensorflow.keras.optimizers import Adam

# Read the data
data = pd.read_csv('data.csv')
data['time'] = pd.to_datetime(data['time'], format='%b-%y')

n = 6
m = 2

# Create lag features
for i in range(n, 0, -1):
    data[f'lag_{i}'] = data['incidence'].shift(n - i + 1)
data = data.dropna().reset_index(drop=True)

train_data = data[(data['time'] >= '2004-01-01') & (data['time'] <= '2011-12-31')]
validation_data = data[(data['time'] >= '2012-01-01') & (data['time'] <= '2012-12-31')]

# Prepare the training data
X_train = []
y_train = []
for i in range(len(train_data) - n - m + 1):
    X_train.append(train_data.iloc[i+n-1][[f'lag_{j}' for j in range(1, n+1)]].values)
    y_train.append(train_data.iloc[i+n:i+n+m]['incidence'].values)

X_train = np.array(X_train)
y_train = np.array(y_train)
X_train = X_train.astype(np.float32)
y_train = y_train.astype(np.float32)

# Reshape for the CNN
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)

# Build the CNN model
model = Sequential()
model.add(Conv1D(filters=64, kernel_size=2, activation='relu', input_shape=(X_train.shape[1], 1)))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(50, activation='relu'))
model.add(Dense(m))
model.compile(optimizer=Adam(learning_rate=0.001), loss='mse')

# Train the model
model.fit(X_train, y_train, epochs=200, batch_size=32, verbose=0)

def cnn_rolling_forecast(data, model, n, m):
    y_pred = []
    for i in range(len(data) - n):
        input_data = data.iloc[i+n-1][[f'lag_{j}' for j in range(1, n+1)]].values.astype(np.float32).reshape(1, n, 1)
        pred = model.predict(input_data)
        y_pred.extend(pred[0])
    # Handle overlapping predictions by averaging
    for i in range(1, m):
        for j in range(len(y_pred) - i):
            y_pred[j+i] = (y_pred[j+i] + y_pred[j]) / 2
    return np.array(y_pred)

# Predict for train_data and validation_data
y_train_pred_cnn = cnn_rolling_forecast(train_data, model, n, m)[:len(y_train)]
y_validation_pred_cnn = cnn_rolling_forecast(validation_data, model, n, m)[:len(validation_data) - n]

# Performance metrics on train_data
mae_train = mean_absolute_error(train_data['incidence'].values[n:len(y_train_pred_cnn)+n], y_train_pred_cnn)
mape_train = np.mean(np.abs((train_data['incidence'].values[n:len(y_train_pred_cnn)+n] - y_train_pred_cnn) / train_data['incidence'].values[n:len(y_train_pred_cnn)+n]))
mse_train = mean_squared_error(train_data['incidence'].values[n:len(y_train_pred_cnn)+n], y_train_pred_cnn)
rmse_train = np.sqrt(mse_train)

# Performance metrics on validation_data
mae_validation = mean_absolute_error(validation_data['incidence'].values[n:len(y_validation_pred_cnn)+n], y_validation_pred_cnn)
mape_validation = np.mean(np.abs((validation_data['incidence'].values[n:len(y_validation_pred_cnn)+n] - y_validation_pred_cnn) / validation_data['incidence'].values[n:len(y_validation_pred_cnn)+n]))
mse_validation = mean_squared_error(validation_data['incidence'].values[n:len(y_validation_pred_cnn)+n], y_validation_pred_cnn)
rmse_validation = np.sqrt(mse_validation)

print("Training set:", mae_train, mape_train, mse_train, rmse_train)
print("Validation set:", mae_validation, mape_validation, mse_validation, rmse_validation)
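The overlap-averaging step in `cnn_rolling_forecast` is easy to trace on a tiny hand-made list. With m = 2, each forecast position is covered by a step-1 prediction and a step-2 prediction from the previous window, and the loop blends each entry with its (already blended) predecessor:

```python
# Same averaging loop as in cnn_rolling_forecast, on toy values of my own
y_pred = [1.0, 3.0, 5.0]  # overlapping multi-step forecasts (illustrative)
m = 2
for i in range(1, m):
    for j in range(len(y_pred) - i):
        y_pred[j + i] = (y_pred[j + i] + y_pred[j]) / 2
print(y_pred)  # [1.0, 2.0, 3.5]
```

Because the loop reads values it has already updated, the result is a sequential smoothing rather than a simple pairwise mean, which is worth keeping in mind when interpreting the metrics.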
The results:
(4) Multi-step rolling forecast - vol. 2
import pandas as pd
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv1D, Flatten, MaxPooling1D
from tensorflow.keras.optimizers import Adam

# Load and preprocess the data
data = pd.read_csv('data.csv')
data['time'] = pd.to_datetime(data['time'], format='%b-%y')

n = 6  # use the previous 6 data points
m = 2  # predict the next 2 data points

# Create lag features
for i in range(n, 0, -1):
    data[f'lag_{i}'] = data['incidence'].shift(n - i + 1)
data = data.dropna().reset_index(drop=True)

train_data = data[(data['time'] >= '2004-01-01') & (data['time'] <= '2011-12-31')]
validation_data = data[(data['time'] >= '2012-01-01') & (data['time'] <= '2012-12-31')]

# Keep only every other row of X_train, y_train and X_validation
X_train = train_data[[f'lag_{i}' for i in range(1, n+1)]].iloc[::2].reset_index(drop=True).values
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)  # Reshape for CNN

# Create m target variables
y_train_list = [train_data['incidence'].shift(-i) for i in range(m)]
y_train = pd.concat(y_train_list, axis=1)
y_train.columns = [f'target_{i+1}' for i in range(m)]
y_train = y_train.iloc[::2].reset_index(drop=True).dropna().values[:, 0]  # Only take the first column for simplicity

X_validation = validation_data[[f'lag_{i}' for i in range(1, n+1)]].iloc[::2].reset_index(drop=True).values
X_validation = X_validation.reshape(X_validation.shape[0], X_validation.shape[1], 1)  # Reshape for CNN
y_validation = validation_data['incidence'].values

# Build the CNN model
model = Sequential()
model.add(Conv1D(filters=64, kernel_size=2, activation='relu', input_shape=(X_train.shape[1], 1)))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(50, activation='relu'))
model.add(Dense(1))

optimizer = Adam(learning_rate=0.001)
model.compile(optimizer=optimizer, loss='mse')

# Train the model
model.fit(X_train, y_train, epochs=200, batch_size=32, verbose=0)

# Predict on the validation set
y_validation_pred = model.predict(X_validation).flatten()

# Metrics on the validation set
mae_validation = mean_absolute_error(y_validation[:len(y_validation_pred)], y_validation_pred)
mape_validation = np.mean(np.abs((y_validation[:len(y_validation_pred)] - y_validation_pred) / y_validation[:len(y_validation_pred)]))
mse_validation = mean_squared_error(y_validation[:len(y_validation_pred)], y_validation_pred)
rmse_validation = np.sqrt(mse_validation)

# Predict on the training set
y_train_pred = model.predict(X_train).flatten()

# Metrics on the training set
mae_train = mean_absolute_error(y_train, y_train_pred)
mape_train = np.mean(np.abs((y_train - y_train_pred) / y_train))
mse_train = mean_squared_error(y_train, y_train_pred)
rmse_train = np.sqrt(mse_train)

print("Validation set:", mae_validation, mape_validation, mse_validation, rmse_validation)
print("Training set:", mae_train, mape_train, mse_train, rmse_train)
The results:
(5) Multi-step rolling forecast - vol. 3
import pandas as pd
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv1D, Flatten, MaxPooling1D
from tensorflow.keras.optimizers import Adam

# Read and preprocess the data
data = pd.read_csv('data.csv')
data_y = pd.read_csv('data.csv')
data['time'] = pd.to_datetime(data['time'], format='%b-%y')
data_y['time'] = pd.to_datetime(data_y['time'], format='%b-%y')

n = 6
for i in range(n, 0, -1):
    data[f'lag_{i}'] = data['incidence'].shift(n - i + 1)
data = data.dropna().reset_index(drop=True)

train_data = data[(data['time'] >= '2004-01-01') & (data['time'] <= '2011-12-31')]
X_train = train_data[[f'lag_{i}' for i in range(1, n+1)]]

m = 3

# One feature matrix / target vector per forecast step
X_train_list = []
y_train_list = []
for i in range(m):
    X_temp = X_train
    y_temp = data_y['incidence'].iloc[n + i:len(data_y) - m + 1 + i]
    X_train_list.append(X_temp)
    y_train_list.append(y_temp)

for i in range(m):
    X_train_list[i] = X_train_list[i].iloc[:-(m-1)].values
    X_train_list[i] = X_train_list[i].reshape(X_train_list[i].shape[0], X_train_list[i].shape[1], 1)  # Reshape for CNN
    y_train_list[i] = y_train_list[i].iloc[:len(X_train_list[i])].values

# Train one model per forecast step
models = []
for i in range(m):
    # Build the CNN model
    model = Sequential()
    model.add(Conv1D(filters=64, kernel_size=2, activation='relu', input_shape=(X_train_list[i].shape[1], 1)))
    model.add(MaxPooling1D(pool_size=2))
    model.add(Flatten())
    model.add(Dense(50, activation='relu'))
    model.add(Dense(1))
    optimizer = Adam(learning_rate=0.001)
    model.compile(optimizer=optimizer, loss='mse')
    model.fit(X_train_list[i], y_train_list[i], epochs=200, batch_size=32, verbose=0)
    models.append(model)

validation_start_time = train_data['time'].iloc[-1] + pd.DateOffset(months=1)
validation_data = data[data['time'] >= validation_start_time]
X_validation = validation_data[[f'lag_{i}' for i in range(1, n+1)]].values
X_validation = X_validation.reshape(X_validation.shape[0], X_validation.shape[1], 1)  # Reshape for CNN

y_validation_pred_list = [model.predict(X_validation) for model in models]
y_train_pred_list = [model.predict(X_train_list[i]) for i, model in enumerate(models)]

def concatenate_predictions(pred_list):
    # Interleave the m per-step predictions in forecast order
    concatenated = []
    for j in range(len(pred_list[0])):
        for i in range(m):
            concatenated.append(pred_list[i][j])
    return concatenated

y_validation_pred = np.array(concatenate_predictions(y_validation_pred_list))[:len(validation_data['incidence'])]
y_train_pred = np.array(concatenate_predictions(y_train_pred_list))[:len(train_data['incidence']) - m + 1]
y_validation_pred = y_validation_pred.flatten()
y_train_pred = y_train_pred.flatten()

mae_validation = mean_absolute_error(validation_data['incidence'], y_validation_pred)
mape_validation = np.mean(np.abs((validation_data['incidence'] - y_validation_pred) / validation_data['incidence']))
mse_validation = mean_squared_error(validation_data['incidence'], y_validation_pred)
rmse_validation = np.sqrt(mse_validation)

mae_train = mean_absolute_error(train_data['incidence'][:-(m-1)], y_train_pred)
mape_train = np.mean(np.abs((train_data['incidence'][:-(m-1)] - y_train_pred) / train_data['incidence'][:-(m-1)]))
mse_train = mean_squared_error(train_data['incidence'][:-(m-1)], y_train_pred)
rmse_train = np.sqrt(mse_train)

print("Validation set:", mae_validation, mape_validation, mse_validation, rmse_validation)
print("Training set:", mae_train, mape_train, mse_train, rmse_train)
The results:
III. Closing Remarks
In this example we only built a simple CNN. In practice, you can swap in other CNN architectures, or even the various pretrained models introduced earlier, such as VGG19 and the various *Net families; you may be in for a pleasant surprise, or an unpleasant one.
IV. Data
Link: https://pan.baidu.com/s/1EFaWfHoG14h15KCEhn1STg?pwd=q41n
Extraction code: q41n