The Autoregressive Integrated Moving Average model, or ARIMA for short, is a standard statistical model for time series forecasting and analysis.
In this lesson, you will discover the Box-Jenkins Method and tips for using it on your time series forecasting problem. Specifically, you will learn:
- About the ARIMA model and the 3 steps of the general Box-Jenkins Method.
- How to choose the parameters for an ARIMA model.
- How to use overfitting and residual errors to diagnose a fitted ARIMA model.
A. Autoregressive Integrated Moving Average Model
ARIMA is a class of statistical models for analyzing and forecasting time series data. Each part of the acronym describes a key aspect of the model:
- AR: Autoregression. A model that uses the dependent relationship between an observation and some number of lagged observations.
- I: Integrated. The use of differencing of raw observations (i.e. subtracting the observation at the previous time step from the current observation) in order to make the time series stationary.
- MA: Moving Average. A model that uses the dependency between an observation and residual errors from a moving average model applied to lagged observations.
The parameters (p,d,q) of the ARIMA model are defined as follows:
- p: The number of lag observations included in the model, also called the lag order.
- d: The number of times that the raw observations are differenced, also called the degree of differencing.
- q: The number of lagged residual errors included in the model (the size of the moving average window), also called the order of moving average.
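To make the d parameter concrete, here is a minimal sketch of repeated differencing in plain Python. The `difference` helper and the sample series are illustrative assumptions, not part of any library.

```python
def difference(series, d=1):
    """Apply first-order differencing d times (d is the 'I' order in ARIMA)."""
    result = list(series)
    for _ in range(d):
        # Subtract the previous observation from the current one.
        result = [result[i] - result[i - 1] for i in range(1, len(result))]
    return result


# Hypothetical series with a quadratic trend: t squared for t = 0..5.
series = [0, 1, 4, 9, 16, 25]

print(difference(series, d=1))  # [1, 3, 5, 7, 9]  (a linear trend remains)
print(difference(series, d=2))  # [2, 2, 2, 2]     (the trend is removed)
```

Each round of differencing shortens the series by one observation, which is one reason to use the smallest d that works.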
B. Box-Jenkins Method
The approach starts with the assumption that the process that generated the time series can be approximated using an ARMA model if it is stationary or an ARIMA model if it is non-stationary. It is an iterative approach that consists of the following 3 steps:
- Identification. Use the data and all related information to help select a sub-class of model that may best summarize the data.
- Estimation. Use the data to train the parameters of the model (i.e. the coefficients).
- Diagnostic Checking. Evaluate the fitted model in the context of the available data and check for areas where the model may be improved.
C. Identification
The identification step is further broken down into:
- Assess whether the time series is stationary, and if not, how many differences are required to make it stationary.
- Identify the parameters of an ARMA model for the data.
1. Differencing
Below are some tips for differencing during identification.
- Unit Root Tests. Use unit root statistical tests on the time series to determine whether or not it is stationary. Repeat after each round of differencing.
- Avoid over differencing. Differencing the time series more than is required can result in the addition of extra serial correlation and additional complexity.
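The test-then-difference loop described above can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a plain (non-augmented) Dickey-Fuller regression with a constant and compares the statistic against -2.86, the approximate large-sample 5% critical value for that case. The helper names `dickey_fuller_stat` and `choose_d` are hypothetical. In practice a library routine such as `adfuller` in statsmodels, which also handles lagged differences and p-values, would be the more usual choice.

```python
import math
import random

# Approximate large-sample 5% critical value for a Dickey-Fuller
# regression with a constant and no trend (an assumption of this sketch).
CRITICAL_5PCT = -2.86


def dickey_fuller_stat(series):
    """t-statistic of gamma in: diff(y)[t] = alpha + gamma * y[t-1] + error."""
    x = series[:-1]  # lagged levels y[t-1]
    dy = [series[i] - series[i - 1] for i in range(1, len(series))]
    n = len(dy)
    x_mean = sum(x) / n
    dy_mean = sum(dy) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (di - dy_mean) for xi, di in zip(x, dy))
    gamma = sxy / sxx
    alpha = dy_mean - gamma * x_mean
    ssr = sum((di - alpha - gamma * xi) ** 2 for xi, di in zip(x, dy))
    std_err = math.sqrt(ssr / (n - 2) / sxx)
    return gamma / std_err


def choose_d(series, max_d=2):
    """Difference until the unit root is rejected, re-testing after each round."""
    current = list(series)
    for d in range(max_d + 1):
        if dickey_fuller_stat(current) < CRITICAL_5PCT:
            return d  # stationary enough: stop here to avoid over-differencing
        current = [current[i] - current[i - 1] for i in range(1, len(current))]
    return max_d  # unit root never rejected; capped at max_d


# Hypothetical data: a random walk, which has a unit root by construction.
random.seed(42)
walk, level = [], 0.0
for _ in range(300):
    level += random.gauss(0, 1)
    walk.append(level)

print("suggested d:", choose_d(walk))
```

Stopping at the first d that passes the test is what guards against over-differencing.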
2. Configuring AR and MA
Two diagnostic plots can be used to help choose the p and q parameters of the ARMA or ARIMA. They are:
- Autocorrelation Function (ACF). The plot summarizes the correlation of an observation with lag values. The x-axis shows the lag and the y-axis shows the correlation coefficient between -1 and 1 for negative and positive correlation.
- Partial Autocorrelation Function (PACF). The plot summarizes the correlations of an observation with lag values that are not accounted for by prior lagged observations.
Some useful patterns you may observe on these plots are:
- The model is AR if the ACF trails off after a lag and has a hard cut-off in the PACF after a lag. This lag is taken as the value for p.
- The model is MA if the PACF trails off after a lag and has a hard cut-off in the ACF after the lag. This lag value is taken as the value for q.
- The model is a mix of AR and MA if both the ACF and PACF trail off.
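The patterns above can be checked numerically as well as visually. The sketch below computes the sample ACF directly and the PACF with the Durbin-Levinson recursion, then applies both to a simulated AR(1) series. The function names and the simulated data are illustrative assumptions. The band of 1.96 / sqrt(n) is the usual approximate 95% bound for a white noise series, used here as a rough guide to what counts as a "hard cut-off".

```python
import math
import random


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    c0 = sum(v * v for v in centered)
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


def pacf(series, max_lag):
    """Sample partial autocorrelation via the Durbin-Levinson recursion."""
    rho = acf(series, max_lag)
    result = [1.0]
    phi_prev = []  # coefficients of the AR(k-1) fit from the previous step
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_curr = [
            phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)
        ]
        phi_curr.append(phi_kk)
        result.append(phi_kk)
        phi_prev = phi_curr
    return result


# Hypothetical data: simulate an AR(1) process x[t] = 0.7 * x[t-1] + noise.
random.seed(1)
series, prev = [], 0.0
for _ in range(2000):
    prev = 0.7 * prev + random.gauss(0, 1)
    series.append(prev)

band = 1.96 / math.sqrt(len(series))
for lag, (a, p) in enumerate(zip(acf(series, 5), pacf(series, 5))):
    print(f"lag {lag}: ACF={a:+.2f}  PACF={p:+.2f}  (band +/-{band:.2f})")
```

For a series like this one, the ACF should decay gradually while the PACF drops inside the band after lag 1, which is the AR signature with p = 1.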
D. Estimation
Estimation involves using numerical methods to minimize a loss or error term.
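As a minimal illustration of estimation as loss minimization, the sketch below fits the single coefficient of a zero-mean AR(1) model by minimizing the sum of squared one-step errors with a golden-section search. This is an assumption-laden toy: real ARIMA software typically uses more elaborate, often likelihood-based, estimators, and the function names here are hypothetical.

```python
import math
import random


def sse(phi, series):
    """Sum of squared one-step-ahead errors for x[t] = phi * x[t-1] + error."""
    return sum(
        (series[t] - phi * series[t - 1]) ** 2 for t in range(1, len(series))
    )


def golden_section_minimize(f, lo, hi, tol=1e-6):
    """Minimize a unimodal function f on the interval [lo, hi]."""
    ratio = (math.sqrt(5) - 1) / 2  # about 0.618
    a, b = lo, hi
    while abs(b - a) > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if f(c) < f(d):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
    return (a + b) / 2


# Hypothetical data: AR(1) process with a true coefficient of 0.6.
random.seed(3)
series, prev = [], 0.0
for _ in range(2000):
    prev = 0.6 * prev + random.gauss(0, 1)
    series.append(prev)

phi_hat = golden_section_minimize(lambda phi: sse(phi, series), -1.0, 1.0)

# For this simple loss a closed-form least squares answer exists,
# which serves as a sanity check on the numerical search.
closed_form = sum(
    series[t] * series[t - 1] for t in range(1, len(series))
) / sum(v * v for v in series[:-1])

print(f"numerical estimate: {phi_hat:.4f}")
print(f"closed-form check:  {closed_form:.4f}")
```

Models with MA terms have no such closed form, which is why numerical optimization is the general tool.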
E. Diagnostic Checking
The idea of diagnostic checking is to look for evidence that the model is not a good fit for the data. Two useful areas to investigate diagnostics are:
- Overfitting.
- Residual Errors.
1. Overfitting
The first check is whether the model overfits the data.
2. Residual Errors
Forecast residuals provide a great opportunity for diagnostics.
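One way to use residuals is to check that they look like white noise: a mean near zero and no remaining autocorrelation. The sketch below computes the Ljung-Box Q statistic by hand for two candidate models of a simulated AR(1) series. The data, the two "models", and the helper names are illustrative assumptions. Large Q values suggest structure that the model has not captured.

```python
import random


def autocorr(values, lag):
    """Sample autocorrelation of values at the given lag."""
    n = len(values)
    mean = sum(values) / n
    c0 = sum((v - mean) ** 2 for v in values)
    ck = sum(
        (values[t] - mean) * (values[t - lag] - mean) for t in range(lag, n)
    )
    return ck / c0


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q = n (n + 2) * sum over k of r_k^2 / (n - k)."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorr(residuals, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


# Hypothetical data: AR(1) process with coefficient 0.8.
random.seed(7)
series, prev = [], 0.0
for _ in range(1000):
    prev = 0.8 * prev + random.gauss(0, 1)
    series.append(prev)

mean = sum(series) / len(series)
centered = [v - mean for v in series]

# Model A (too simple): always predict the series mean.
residuals_mean_model = centered

# Model B: AR(1) with its coefficient fitted by least squares.
phi = sum(
    centered[t] * centered[t - 1] for t in range(1, len(centered))
) / sum(c * c for c in centered[:-1])
residuals_ar1 = [
    centered[t] - phi * centered[t - 1] for t in range(1, len(centered))
]

for name, res in [("mean model", residuals_mean_model), ("AR(1)", residuals_ar1)]:
    q = ljung_box_q(res, max_lag=10)
    print(f"{name}: mean residual={sum(res) / len(res):+.3f}, Q(10)={q:.1f}")

# For the AR(1) fit, Q is compared with a chi-square distribution with
# 10 - 1 = 9 degrees of freedom; its 5% critical value is about 16.92.
```

The residuals of the mean model keep the autocorrelation of the original series, so its Q should be far larger than that of the AR(1) fit.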