The residual errors from forecasts on a time series provide another source of information that we can model. Residual errors themselves form a time series that can have temporal structure.
A simple autoregression model of this structure can be used to predict the forecast error, which in turn can be used to correct forecasts. This type of model is called a moving average model, the same name but very different from moving average smoothing.
In this tutorial, you will discover how to model a residual error time series and use it to correct predictions with Python.
After completing this tutorial, you will know:
- About how to model residual error time series using an autoregressive model.
- How to develop and evaluate a model of residual error time series.
- How to use a model of residual error to correct predictions and improve forecast skill.
A. Model of Residual Errors
The difference between what was expected and what was predicted is called the residual error.
It is calculated as:
residual error = expected − predicted
Just like the input observations themselves, the residual errors from a time series can have temporal structure like trends, bias, and seasonality. Any temporal structure in the time series of residual forecast errors is useful.
A simple and effective model of residual error is an autoregression.
An autoregression of the residual error time series is called a Moving Average (MA) model.
B. Daily Female Births Dataset
This dataset describes the number of daily female births in California in 1959.
C. Persistence Forecast Model
The simplest forecast that we can make is to forecast that what happened in the previous time step will be the same as what will happen in the next time step. This is called the naive forecast or the persistence forecast model.
# calculate residual errors for a persistence forecast model
from pandas import read_csv
from pandas import DataFrame
from pandas import concat
from sklearn.metrics import mean_squared_error
from math import sqrt
# load data
series = read_csv('daily-total-female-births.csv', header=0, index_col=0, parse_dates=True, squeeze=True)
# create lagged dataset
values = DataFrame(series.values)
dataframe = concat([values.shift(1), values], axis=1)
dataframe.columns = ['t', 't+1']
# split into train and test sets
X = dataframe.values
train_size = int(len(X) * 0.66)
train, test = X[1:train_size], X[train_size:]
train_X, train_y = train[:,0], train[:,1]
test_X, test_y = test[:,0], test[:,1]
# persistence model
predictions = [x for x in test_X]
# skill of persistence model
rmse = sqrt(mean_squared_error(test_y, predictions))
print('Test RMSE: %.3f' % rmse)
# calculate residuals
residuals = [test_y[i]-predictions[i] for i in range(len(predictions))]
residuals = DataFrame(residuals)
print(residuals.head())
-----Result-----
Test RMSE: 9.151
0
0 9.0
1 -10.0
2 3.0
3 -6.0
4 30.0
0
0 9.0
1 -10.0
2 3.0
3 -6.0
4 30.0
D. Autoregression of Residual Error
We can model the residual error time series using an autoregression model. This is a linear regression model that creates a weighted linear sum of lagged residual error terms. For example:
error(t + 1) = b0 + (b1 * error(t)) + (b2 * error(t − 1)) ... + (bn * error(t − n))
# autoregressive model of residual errors
from pandas import read_csv
from pandas import DataFrame
from pandas import concat
from statsmodels.tsa.ar_model import AR
series = read_csv('daily-total-female-births.csv', header=0, index_col=0, parse_dates=True, squeeze=True)
# create lagged dataset
values = DataFrame(series.values)
dataframe = concat([values.shift(1), values], axis=1)
dataframe.columns = ['t', 't+1']
# split into train and test sets
X = dataframe.values
train_size = int(len(X) * 0.66)
train, test = X[1:train_size], X[train_size:]
train_X, train_y = train[:,0], train[:,1]
test_X, test_y = test[:,0], test[:,1]
# persistence model on training set
train_pred = [x for x in train_X]
# calculate residuals
train_resid = [train_y[i]-train_pred[i] for i in range(len(train_pred))]
# model the training set residuals
model = AR(train_resid)
model_fit = model.fit()
window = model_fit.k_ar
coef = model_fit.params
print('Lag=%d, Coef=%s' % (window, coef))
-----Result-----
Lag=15, Coef=[ 0.10120699 -0.84940615 -0.77783609 -0.73345006 -0.68902061 -0.59270551
-0.5376728 -0.42553356 -0.24861246 -0.19972102 -0.15954013 -0.11045476
-0.14045572 -0.13299964 -0.12515801 -0.03615774]
-0.5376728 -0.42553356 -0.24861246 -0.19972102 -0.15954013 -0.11045476
-0.14045572 -0.13299964 -0.12515801 -0.03615774]
No comments:
Post a Comment