Selecting a time series forecasting model is just the beginning. Using the chosen model in practice can pose challenges, including data transformations and storing the model parameters on disk.
In this tutorial, you will discover how to finalize a time series forecasting model and use it to make predictions in Python.
After completing this tutorial, you will know:
- How to finalize a model and save it and required data to file.
- How to load a finalized model from file and use it to make a prediction.
- How to update data associated with a finalized model in order to make subsequent predictions.
A. Process for Making a Prediction
Once you can build and tune forecast models for your data, the process of making a prediction involves the following steps:
- Model Selection. This is where you choose a model and gather evidence and support to defend the decision.
- Model Finalization. The chosen model is trained on all available data and saved to file for later use.
- Forecasting. The saved model is loaded and used to make a forecast.
- Model Update. Elements of the model are updated in the presence of new observations.
We will take a look at each of these elements in this tutorial, with a focus on saving and loading the model to and from file and using a loaded model to make predictions.
B. Daily Female Births Dataset
This dataset describes the number of daily female births in California in 1959.
C. Select Time Series Forecast Model
The first step is to select a model. This is where the bulk of the effort goes: preparing the data, performing analysis, and ultimately selecting a model and model hyperparameters that best capture the relationships in the data.
In this case, we can arbitrarily select an autoregression model (AR) with a lag of 6 on the differenced dataset. We can demonstrate this model below. First, the data is transformed by differencing, with each observation transformed as:
value(t) = obs(t) − obs(t − 1)
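As a small illustration of this transform, the sketch below differences a short series and then inverts the operation. The observations are made-up toy values, not taken from the births dataset, and numpy.diff is used here only as a convenient equivalent of the formula above.

```python
import numpy

# Hypothetical toy observations, not taken from the births dataset
obs = numpy.array([35, 32, 30, 31, 44])

# value(t) = obs(t) - obs(t - 1); numpy.diff computes the same quantity
diffed = numpy.diff(obs)
print(diffed)  # [-3 -2  1 13]

# Inverting: add the cumulative differences back onto the first observation
restored = obs[0] + numpy.concatenate(([0], numpy.cumsum(diffed)))
print(restored)  # [35 32 30 31 44]
```

Note that the differenced series is one element shorter than the original, which is why a real observation must be kept around to map a differenced forecast back to the original scale.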
Next, the AR(6) model is trained on 66% of the historical data. The regression coefficients learned by the model are extracted and used to make predictions in a rolling manner across the test dataset.
At each time step in the test dataset, a prediction is made using the coefficients and stored. The actual observation for that time step is then made available and added to the history, where it serves as a lag variable for future predictions.
# fit and evaluate an AR model
from pandas import read_csv
from matplotlib import pyplot
from statsmodels.tsa.ar_model import AR
from sklearn.metrics import mean_squared_error
import numpy
from math import sqrt
# create a difference transform of the dataset
def difference(dataset):
    diff = list()
    for i in range(1, len(dataset)):
        value = dataset[i] - dataset[i - 1]
        diff.append(value)
    return numpy.array(diff)
# make a prediction given regression coefficients and lag obs
# coef[0] is the intercept; coef[i] weights the i-th most recent value
def predict(coef, history):
    yhat = coef[0]
    for i in range(1, len(coef)):
        yhat += coef[i] * history[-i]
    return yhat
series = read_csv('daily-total-female-births.csv', header=0, index_col=0, parse_dates=True, squeeze=True)
# split dataset
X = difference(series.values)
size = int(len(X) * 0.66)
train, test = X[0:size], X[size:]
# train autoregression
model = AR(train)
model_fit = model.fit(maxlag=6, disp=False)
window = model_fit.k_ar
coef = model_fit.params
# walk forward over time steps in test
history = [train[i] for i in range(len(train))]
predictions = list()
for t in range(len(test)):
    yhat = predict(coef, history)
    obs = test[t]
    predictions.append(yhat)
    history.append(obs)
rmse = sqrt(mean_squared_error(test, predictions))
print('Test RMSE: %.3f' % rmse)
# plot
pyplot.plot(test)
pyplot.plot(predictions, color='red')
pyplot.show()
-----Result-----
Test RMSE: 7.259
Line plot of expected values (blue) and AR model predictions (red) on the Daily Female Births dataset
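To make concrete what the regression coefficients represent, here is a minimal NumPy-only sketch that estimates an intercept and lag weights by ordinary least squares. It is not the Statsmodels implementation; the helper name fit_ar_ols and the synthetic AR(2) series are assumptions made purely for illustration. The coefficient order matches the predict() function above: the intercept first, then the weight for the most recent value.

```python
import numpy

def fit_ar_ols(x, p):
    """Estimate [intercept, phi_1, ..., phi_p] by ordinary least squares."""
    x = numpy.asarray(x, dtype=float)
    # The row for time t holds [1, x[t-1], x[t-2], ..., x[t-p]]
    rows = [numpy.concatenate(([1.0], x[t - p:t][::-1])) for t in range(p, len(x))]
    design = numpy.array(rows)
    target = x[p:]
    coef, *_ = numpy.linalg.lstsq(design, target, rcond=None)
    return coef

# Synthetic AR(2) series with known coefficients (illustrative values only)
rng = numpy.random.default_rng(1)
true_coef = [0.5, 0.6, -0.3]
x = [0.0, 0.0]
for _ in range(5000):
    x.append(true_coef[0] + true_coef[1] * x[-1] + true_coef[2] * x[-2] + rng.normal())

coef = fit_ar_ols(x, 2)
print(coef)  # should land close to [0.5, 0.6, -0.3]
```

With enough data the estimates approach the values used to generate the series, which is a quick way to check that the lag ordering in the design matrix is right.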
D. Finalize and Save Time Series Forecast Model
Once the model is selected, we must finalize it. This means saving the salient information learned by the model so that we do not have to re-create it every time a prediction is needed. This involves first training the model on all available data and then saving the model to file.
The Statsmodels implementations of time series models provide a built-in capability to save and load models by calling save() and load() on the fitted ARResults object. For example, the code below trains an AR(6) model on the entire Female Births dataset and saves it using the built-in save() function, which essentially pickles the ARResults object.
The differenced training data must also be saved, both for the lag variables needed to make a prediction, and for knowledge of the number of observations seen, required by the predict() function of the ARResults object.
Finally, we need to be able to transform the differenced dataset back into the original form. To do this, we must keep track of the last actual observation. This is so that the predicted differenced value can be added to it.
# fit an AR model and save the whole model to file
from pandas import read_csv
from statsmodels.tsa.ar_model import AR
import numpy
# create a difference transform of the dataset
def difference(dataset):
    diff = list()
    for i in range(1, len(dataset)):
        value = dataset[i] - dataset[i - 1]
        diff.append(value)
    return numpy.array(diff)
# load dataset
series = read_csv('daily-total-female-births.csv', header=0, index_col=0, parse_dates=True, squeeze=True)
X = difference(series.values)
# fit model
model = AR(X)
model_fit = model.fit(maxlag=6, disp=False)
# save model to file
model_fit.save('ar_model.pkl')
# save the differenced dataset
numpy.save('ar_data.npy', X)
# save the last ob
numpy.save('ar_obs.npy', [series.values[-1]])
This code will create a file ar_model.pkl that you can load later and use to make predictions. The entire differenced training dataset is saved as ar_data.npy and the last observation is saved in the file ar_obs.npy as an array with one item.
The NumPy save() function is used to save the differenced training data and the observation. The load() function can then be used to load these arrays later. The snippet below will load the model, differenced data, and last observation.
# load the AR model from file
from statsmodels.tsa.ar_model import ARResults
import numpy
loaded = ARResults.load('ar_model.pkl')
print(loaded.params)
data = numpy.load('ar_data.npy')
last_ob = numpy.load('ar_obs.npy')
print(last_ob)
-----Result-----
[ 0.12129822 -0.75275857 -0.612367 -0.51097172 -0.4176669 -0.32116469 -0.23412997]
The example below saves just the coefficients from the model, as well as the minimum differenced lag values required to make the next prediction and the last observation needed to transform the next prediction made.
# fit an AR model and manually save coefficients to file
from pandas import read_csv
from statsmodels.tsa.ar_model import AR
import numpy
# create a difference transform of the dataset
def difference(dataset):
    diff = list()
    for i in range(1, len(dataset)):
        value = dataset[i] - dataset[i - 1]
        diff.append(value)
    return numpy.array(diff)
# load dataset
series = read_csv('daily-total-female-births.csv', header=0, index_col=0, parse_dates=True, squeeze=True)
X = difference(series.values)
# fit model
window_size = 6
model = AR(X)
model_fit = model.fit(maxlag=window_size, disp=False)
# save coefficients
coef = model_fit.params
numpy.save('man_model.npy', coef)
# save lag
lag = X[-window_size:]
numpy.save('man_data.npy', lag)
# save the last ob
numpy.save('man_obs.npy', [series.values[-1]])
The coefficients are saved in the local file man_model.npy, the lag history is saved in the file man_data.npy, and the last observation is saved in the file man_obs.npy. These values can then be loaded again as follows:
# load the manually saved model from file
import numpy
coef = numpy.load('man_model.npy')
print(coef)
lag = numpy.load('man_data.npy')
print(lag)
last_ob = numpy.load('man_obs.npy')
print(last_ob)
-----Result-----
[ 0.12129822 -0.75275857 -0.612367 -0.51097172 -0.4176669 -0.32116469
-0.23412997]
[-10 3 15 -4 7 -5]
[50]
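If keeping three separate files in sync feels fragile, one possible alternative is to bundle the arrays into a single archive with numpy.savez. This is a sketch of that option rather than part of the approach above; the file name man_state.npz is hypothetical, and the array values are copied from the printed results.

```python
import os
import tempfile
import numpy

# Values copied from the printed results above
coef = numpy.array([0.12129822, -0.75275857, -0.612367, -0.51097172,
                    -0.4176669, -0.32116469, -0.23412997])
lag = numpy.array([-10, 3, 15, -4, 7, -5])
last_ob = numpy.array([50])

# Hypothetical file name; a temporary directory keeps the example self-contained
path = os.path.join(tempfile.mkdtemp(), 'man_state.npz')
numpy.savez(path, coef=coef, lag=lag, last_ob=last_ob)

# Each array is retrieved by the keyword it was saved under
with numpy.load(path) as state:
    loaded_coef = state['coef']
    loaded_lag = state['lag']
    loaded_last_ob = state['last_ob']
print(loaded_coef, loaded_lag, loaded_last_ob)
```

A single file means the coefficients, lags, and last observation are always written together, at the cost of rewriting the coefficients on every update even though they do not change.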
E. Make a Time Series Forecast
Making a forecast involves loading the saved model and estimating the observation at the next time step. If the ARResults object was serialized, we can use the predict() function to predict the next time period.
# load AR model from file and make a one-step prediction
from statsmodels.tsa.ar_model import ARResults
import numpy
# load model
model = ARResults.load('ar_model.pkl')
data = numpy.load('ar_data.npy')
last_ob = numpy.load('ar_obs.npy')
# make prediction
predictions = model.predict(start=len(data), end=len(data))
# transform prediction
yhat = predictions[0] + last_ob[0]
print('Prediction: %f' % yhat)
-----Result-----
Prediction: 46.755211
We can also use a similar trick to load the raw coefficients and make a manual prediction.
# load coefficients from file and make a manual prediction
import numpy
def predict(coef, history):
    yhat = coef[0]
    for i in range(1, len(coef)):
        yhat += coef[i] * history[-i]
    return yhat
# load model
coef = numpy.load('man_model.npy')
lag = numpy.load('man_data.npy')
last_ob = numpy.load('man_obs.npy')
# make prediction
prediction = predict(coef, lag)
# transform prediction
yhat = prediction + last_ob[0]
print('Prediction: %f' % yhat)
-----Result-----
Prediction: 46.755211
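The arithmetic behind this number can be checked by hand. The sketch below hard-codes the coefficients, lag values, and last observation printed earlier, so it needs no saved files, and writes the prediction as an intercept plus a dot product. The variable names are chosen only for this illustration.

```python
import numpy

# Values copied from the printed results earlier in the tutorial
coef = numpy.array([0.12129822, -0.75275857, -0.612367, -0.51097172,
                    -0.4176669, -0.32116469, -0.23412997])
lag = numpy.array([-10, 3, 15, -4, 7, -5])
last_ob = 50

# Reverse the lags so the most recent value lines up with coef[1]
recent_first = lag[::-1]
diff_forecast = coef[0] + numpy.dot(coef[1:], recent_first)

# Undo the differencing by adding the last real observation
yhat = diff_forecast + last_ob
print('Differenced forecast: %.6f' % diff_forecast)
print('Prediction: %.6f' % yhat)
```

The differenced forecast comes out at roughly -3.245, and adding the last observation of 50 gives approximately 46.755, in line with the result reported above.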
F. Update Forecast Model
Once the next real observation is made available, we must update the data associated with the model. Specifically, we must update:
- The differenced training dataset used as inputs to make the subsequent prediction.
- The last observation, providing a context for the predicted differenced value.
In the case of the stored AR model, we can update the ar_data.npy and ar_obs.npy files.
# update the data for the AR model with a new obs
import numpy
# get real observation
observation = 48
# load the saved data
data = numpy.load('ar_data.npy')
last_ob = numpy.load('ar_obs.npy')
# update and save differenced observation
diffed = observation - last_ob[0]
data = numpy.append(data, [diffed], axis=0)
numpy.save('ar_data.npy', data)
# update and save real observation
last_ob[0] = observation
numpy.save('ar_obs.npy', last_ob)
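We can make the same changes to the data files for the manual case. Here the lag array is kept at a fixed length by dropping the oldest value before appending the new one: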
# update the data for the manual model with a new obs
import numpy
# get real observation
observation = 48
# update and save differenced observation
lag = numpy.load('man_data.npy')
last_ob = numpy.load('man_obs.npy')
diffed = observation - last_ob[0]
lag = numpy.append(lag[1:], [diffed], axis=0)
numpy.save('man_data.npy', lag)
# update and save real observation
last_ob[0] = observation
numpy.save('man_obs.npy', last_ob)
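Putting the manual pieces together, the forecast and update steps can be wrapped in two small functions and called in turn as observations arrive. This is a sketch under stated assumptions: forecast_next and update_state are hypothetical helper names, the files are written to a temporary folder so the example is self-contained, and the starting values are the ones printed earlier in the tutorial.

```python
import os
import tempfile
import numpy

def predict(coef, history):
    yhat = coef[0]
    for i in range(1, len(coef)):
        yhat += coef[i] * history[-i]
    return yhat

def forecast_next(folder):
    """Load the saved state and return the next forecast in original units."""
    coef = numpy.load(os.path.join(folder, 'man_model.npy'))
    lag = numpy.load(os.path.join(folder, 'man_data.npy'))
    last_ob = numpy.load(os.path.join(folder, 'man_obs.npy'))
    return predict(coef, lag) + last_ob[0]

def update_state(folder, observation):
    """Roll the lag window forward and store the new last observation."""
    lag = numpy.load(os.path.join(folder, 'man_data.npy'))
    last_ob = numpy.load(os.path.join(folder, 'man_obs.npy'))
    diffed = observation - last_ob[0]
    lag = numpy.append(lag[1:], [diffed], axis=0)
    last_ob[0] = observation
    numpy.save(os.path.join(folder, 'man_data.npy'), lag)
    numpy.save(os.path.join(folder, 'man_obs.npy'), last_ob)

# Seed a temporary folder with the values printed earlier in the tutorial
folder = tempfile.mkdtemp()
numpy.save(os.path.join(folder, 'man_model.npy'),
           numpy.array([0.12129822, -0.75275857, -0.612367, -0.51097172,
                        -0.4176669, -0.32116469, -0.23412997]))
numpy.save(os.path.join(folder, 'man_data.npy'), numpy.array([-10, 3, 15, -4, 7, -5]))
numpy.save(os.path.join(folder, 'man_obs.npy'), numpy.array([50]))

first = forecast_next(folder)
update_state(folder, 48)  # the new real observation used in the update example
second = forecast_next(folder)
print('Before update: %.3f, after update: %.3f' % (first, second))
```

Only the lag window and the last observation change between calls; the coefficients stay fixed until the model is refit.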