Machine learning: Forecast Models - Part 3 - Moving Average Models for Forecasting

The residual errors from forecasts on a time series provide another source of information that we can model. Residual errors themselves form a time series that can have temporal structure.

A simple autoregression model of this structure can be used to predict the forecast error, which in turn can be used to correct forecasts. This type of model is called a moving average model. Despite the shared name, it is very different from moving average smoothing.

In this tutorial, you will discover how to model a residual error time series and use it to correct predictions with Python.

After completing this tutorial, you will know:

How to model a residual error time series using an autoregressive model.
How to develop and evaluate a model of residual error time series.
How to use a model of residual error to correct predictions and improve forecast skill.

A. Model of Residual Errors

The difference between what was expected and what was predicted is called the residual error.

It is calculated as:

residual error = expected − predicted
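
Here is a worked instance with made-up numbers. A positive residual means the forecast was too low, and a negative residual means it was too high.

```python
# Residual error = expected - predicted, computed element-wise.
# These values are made up for illustration; they are not from the dataset.
expected = [35, 32, 30, 31, 44]
predicted = [26, 42, 27, 37, 14]

residuals = [e - p for e, p in zip(expected, predicted)]
print(residuals)  # [9, -10, 3, -6, 30]
```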

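Just like the input observations themselves, the residual errors from a time series can have temporal structure such as trends, bias, and seasonality. Any temporal structure in the time series of residual forecast errors is useful as a diagnostic. It suggests information that the forecast model has not captured and that could be incorporated into it.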
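
One quick way to check for such structure is to look at two quantities. A non-zero mean of the residuals indicates bias. A large autocorrelation at lag 1 means each error carries information about the next one. The sketch below uses pandas' `Series.autocorr` on synthetic residuals: one series is pure noise and one has AR(1) structure. Neither series comes from the dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
noise = rng.normal(size=500)

# Hypothetical residuals with structure: each error keeps 60% of the
# previous error plus fresh noise (an AR(1) process).
structured = np.zeros(500)
for t in range(1, 500):
    structured[t] = 0.6 * structured[t - 1] + noise[t]

for name, resid in [('white noise', pd.Series(noise)),
                    ('structured', pd.Series(structured))]:
    # mean far from 0 -> bias; large lag-1 autocorrelation -> temporal structure
    print('%-11s mean=%.3f lag1 autocorr=%.3f'
          % (name, resid.mean(), resid.autocorr(lag=1)))
```

For white noise, the lag-1 autocorrelation should sit near zero. For the structured series, it should land near 0.6, and that kind of dependence is what a model of the residuals can exploit.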

A simple and effective model of residual error is an autoregression.

An autoregression of the residual error time series is called a Moving Average (MA) model.

B. Daily Female Births Dataset

This dataset describes the number of daily female births in California in 1959. The units are a count, and there is one observation per day, giving 365 observations.
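
The code in this tutorial loads the file with `read_csv(..., squeeze=True)`. That argument was removed in pandas 2.0. The sketch below shows a replacement that still yields a date-indexed Series: call `.squeeze('columns')` on the result. It uses a small inline stand-in for `daily-total-female-births.csv` in an assumed `Date,Births` layout, and the numbers are illustrative only.

```python
from io import StringIO

import pandas as pd

# Stand-in for daily-total-female-births.csv (assumed Date,Births layout;
# the counts here are illustrative, not copied from the real file).
csv_text = """Date,Births
1959-01-01,35
1959-01-02,32
1959-01-03,30
"""

# squeeze=True no longer exists in pandas 2.0; squeeze('columns') turns the
# one-column DataFrame into a Series indexed by date.
series = pd.read_csv(StringIO(csv_text), header=0, index_col=0,
                     parse_dates=True).squeeze('columns')
print(type(series).__name__, len(series))  # Series 3
```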

C. Persistence Forecast Model

The simplest forecast that we can make is to forecast that what happened in the previous time step will be the same as what will happen in the next time step. This is called the naive forecast or the persistence forecast model.

```python
# calculate residual errors for a persistence forecast model
from pandas import read_csv
from pandas import DataFrame
from pandas import concat
from sklearn.metrics import mean_squared_error
from math import sqrt

# load data (squeeze=True was removed from read_csv in pandas 2.0)
series = read_csv('daily-total-female-births.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
# create lagged dataset: column 't' holds the previous observation
values = DataFrame(series.values)
dataframe = concat([values.shift(1), values], axis=1)
dataframe.columns = ['t', 't+1']
# split into train and test sets (row 0 is skipped because its lag is NaN)
X = dataframe.values
train_size = int(len(X) * 0.66)
train, test = X[1:train_size], X[train_size:]
train_X, train_y = train[:,0], train[:,1]
test_X, test_y = test[:,0], test[:,1]
# persistence model: the forecast for t+1 is the observation at t
predictions = [x for x in test_X]
# skill of persistence model
rmse = sqrt(mean_squared_error(test_y, predictions))
print('Test RMSE: %.3f' % rmse)
# calculate residuals (expected - predicted)
residuals = [test_y[i]-predictions[i] for i in range(len(predictions))]
residuals = DataFrame(residuals)
print(residuals.head())
```

-----Result-----

Test RMSE: 9.151
       0
0    9.0
1    -10.0
2    3.0
3    -6.0
4    30.0
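
To see what the lagged dataset looks like, the sketch below runs the same `shift(1)`/`concat` steps on five made-up counts. It then derives persistence residuals from the result. The values are illustrative and do not come from the dataset.

```python
from pandas import DataFrame, concat

# Five illustrative daily counts (made-up values)
values = DataFrame([35, 32, 30, 31, 44])

# shift(1) moves every value down one row, so each row pairs the previous
# observation ('t') with the current one ('t+1'); row 0 has no lag (NaN).
dataframe = concat([values.shift(1), values], axis=1)
dataframe.columns = ['t', 't+1']
print(dataframe)

# Persistence forecast = the lagged column; residual = expected - predicted
pairs = dataframe.values[1:]          # drop the NaN row, as X[1:train_size] does
residuals = pairs[:, 1] - pairs[:, 0]
print(residuals.tolist())  # [-3.0, -2.0, 1.0, 13.0]
```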

D. Autoregression of Residual Error

We can model the residual error time series using an autoregression model. This is a linear regression model that creates a weighted linear sum of lagged residual error terms. For example:

error(t+1) = b0 + (b1 * error(t)) + (b2 * error(t−1)) + ... + (bn * error(t−n+1))
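
The sketch below evaluates this equation directly. `predict_next_error` is a hypothetical helper, and the coefficients and residual history are chosen for easy arithmetic. The main point is the ordering: the intercept b0 comes first, and b1 multiplies the most recent error.

```python
def predict_next_error(coef, errors):
    """Evaluate error(t+1) = b0 + b1*error(t) + ... + bn*error(t-n+1).

    coef   -- [b0, b1, ..., bn], intercept first
    errors -- past residuals in time order (oldest first), at least n values
    """
    n = len(coef) - 1
    recent = errors[-n:][::-1] if n > 0 else []  # error(t), error(t-1), ...
    return coef[0] + sum(b * e for b, e in zip(coef[1:], recent))


# Hypothetical coefficients and residual history
coef = [1.0, 0.5, 0.25]       # b0, b1, b2
errors = [8.0, 4.0, 2.0]      # ..., error(t-1), error(t)
print(predict_next_error(coef, errors))  # 1.0 + 0.5*2.0 + 0.25*4.0 = 3.0
```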

```python
# autoregressive model of residual errors
from pandas import read_csv
from pandas import DataFrame
from pandas import concat
from statsmodels.tsa.ar_model import AutoReg

# load data (squeeze=True was removed from read_csv in pandas 2.0)
series = read_csv('daily-total-female-births.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
# create lagged dataset
values = DataFrame(series.values)
dataframe = concat([values.shift(1), values], axis=1)
dataframe.columns = ['t', 't+1']
# split into train and test sets
X = dataframe.values
train_size = int(len(X) * 0.66)
train, test = X[1:train_size], X[train_size:]
train_X, train_y = train[:,0], train[:,1]
test_X, test_y = test[:,0], test[:,1]
# persistence model on training set
train_pred = [x for x in train_X]
# calculate residuals
train_resid = [train_y[i]-train_pred[i] for i in range(len(train_pred))]
# model the training set residuals
# (the old AR class was removed in statsmodels 0.13; AutoReg needs the lag
# order set explicitly, so use the 15 lags the old AR class selected)
window = 15
model = AutoReg(train_resid, lags=window)
model_fit = model.fit()
coef = model_fit.params
print('Lag=%d, Coef=%s' % (window, coef))
```

-----Result-----

Lag=15, Coef=[ 0.10120699 -0.84940615 -0.77783609 -0.73345006 -0.68902061 -0.59270551
-0.5376728 -0.42553356 -0.24861246 -0.19972102 -0.15954013 -0.11045476
-0.14045572 -0.13299964 -0.12515801 -0.03615774]
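
The printed array starts with the intercept b0. It is followed by b1 to b15, which multiply error(t), error(t−1), and so on back through the last 15 residuals. To correct forecasts with such a model, add the predicted error to each persistence forecast. Once the true observation arrives, append the actual residual to the history. Below is a minimal numpy-only sketch of that walk-forward loop on synthetic data. `correct_persistence` is a hypothetical helper. Plain least squares stands in for `AutoReg`, which also fits an OLS regression with a constant, so the coefficients should agree up to numerical precision.

```python
import numpy as np


def correct_persistence(series, train_size, n_lags):
    """Walk-forward persistence forecasts corrected by an AR model of residuals.

    Hypothetical helper for illustration. Returns (corrected, persistence,
    actual) arrays for the test steps.
    """
    x = np.asarray(series, dtype=float)
    pred = x[:-1]            # persistence: forecast for each step is the previous value
    actual = x[1:]
    resid = actual - pred    # residual error = expected - predicted

    # Fit e(t) = b0 + b1*e(t-1) + ... + bn*e(t-n) on training residuals by OLS
    train_resid = resid[:train_size]
    design = np.array([np.r_[1.0, train_resid[t - n_lags:t][::-1]]
                       for t in range(n_lags, len(train_resid))])
    coef, *_ = np.linalg.lstsq(design, train_resid[n_lags:], rcond=None)

    history = list(train_resid[-n_lags:])
    corrected = []
    for t in range(train_size, len(actual)):
        recent = np.array(history[-n_lags:])[::-1]   # most recent residual first
        predicted_error = coef[0] + coef[1:] @ recent
        corrected.append(pred[t] + predicted_error)
        history.append(actual[t] - pred[t])          # true residual, now observed
    return np.array(corrected), pred[train_size:], actual[train_size:]


def rmse(a, b):
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))


# Synthetic series whose day-to-day changes are autocorrelated, so the
# persistence residuals have structure an AR(1) model can capture.
rng = np.random.default_rng(1)
noise = rng.normal(size=3000)
changes = np.zeros(3000)
for t in range(1, 3000):
    changes[t] = 0.6 * changes[t - 1] + noise[t]
series = 100 + np.cumsum(changes)

corrected, persistence, actual = correct_persistence(series, train_size=2000, n_lags=1)
print('persistence RMSE: %.3f' % rmse(actual, persistence))
print('corrected RMSE:   %.3f' % rmse(actual, corrected))
```

The history is updated with the observed residual, never the predicted one. As a result, each correction uses only information that would be available at forecast time.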

16/11/2021