
13/11/2021

Evaluate Models - Part 1 - Backtest Forecast Models

The goal of time series forecasting is to make accurate predictions about the future. The fast and powerful methods that we rely on in machine learning, such as using train-test splits and k-fold cross-validation, do not work in the case of time series data. This is because they ignore the temporal components inherent in the problem. 

In this tutorial, you will discover how to evaluate machine learning models on time series data with Python. In the field of time series forecasting, this is called backtesting or hindcasting. 

After completing this tutorial, you will know:
  • The limitations of traditional methods of model evaluation from machine learning and why evaluating models on out-of-sample data is required.
  • How to create train-test splits and multiple train-test splits of time series data for model evaluation in Python.
  • How walk-forward validation provides the most realistic evaluation of machine learning models on time series data.

A. Model Evaluation

In applied machine learning, we often split our data into a train and a test set: the training set is used to prepare the model and the test set is used to evaluate it. We may even use k-fold cross-validation, which repeats this process by systematically splitting the data into k groups, each given a chance to serve as the held-out test set.

These methods cannot be directly used with time series data. This is because they assume that there is no relationship between the observations, that each observation is independent.

This is not true of time series data, where the time dimension of observations means that we cannot randomly split them into groups. Instead, we must split data up and respect the temporal order in which values were observed.
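To see why, consider what a standard shuffled split does to an ordered series. A minimal sketch (the ten-value series is made up for illustration; scikit-learn's train_test_split shuffles by default):

# a shuffled train-test split destroys temporal order
from sklearn.model_selection import train_test_split
X = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]  # observations in time order
train, test = train_test_split(X, test_size=0.3, random_state=1)
print(train)  # the training set now mixes early and late observations
print(test)   # the test set contains values from the model's 'past'

A model evaluated this way gets to train on observations that occur after the ones it is tested on, which inflates its apparent skill.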

In time series forecasting, this evaluation of models on historical data is called backtesting. In some time series domains, such as meteorology, it is called hindcasting, as opposed to forecasting. We will look at three different methods that you can use to backtest your machine learning models on time series problems. They are:
  • A train-test split that respects the temporal order of observations.
  • Multiple train-test splits that respect the temporal order of observations.
  • Walk-forward validation, where a model may be updated each time step that new data is received.
First, let’s take a look at a small univariate time series dataset that we will use as context to understand these three backtesting methods: the Monthly Sunspots dataset.


B. Monthly Sunspots Dataset

We will use the Monthly Sunspots dataset as an example. This dataset describes a monthly count of the number of observed sunspots for just over 230 years (1749-1983).
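Before splitting, it is worth loading and summarizing the series. A minimal sketch, assuming the dataset has been downloaded and saved as sunspots.csv in the current working directory:

# load and summarize the Monthly Sunspots dataset
from pandas import read_csv
series = read_csv('sunspots.csv', header=0, index_col=0, parse_dates=True, squeeze=True)
print(series.head())
print('Total observations: %d' % len(series))

Note that the squeeze argument was removed in recent versions of pandas; read_csv(...).squeeze('columns') achieves the same result of returning a Series rather than a DataFrame.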


C. Train-Test Split

You can split your dataset into training and testing subsets. Your model can be prepared on the training dataset and predictions can be made and evaluated for the test dataset. This can be done by selecting an arbitrary split point in the ordered list of observations and creating two new datasets. Depending on how much data you have available and how much is required to fit and evaluate a model, you can use splits such as 50-50, 70-30, or 90-10.

# calculate a train-test split of a time series dataset
from pandas import read_csv
from matplotlib import pyplot
series = read_csv('sunspots.csv', header=0, index_col=0, parse_dates=True, squeeze=True)
X = series.values
# use the first 66% of observations for training, the remainder for testing
train_size = int(len(X) * 0.66)
train, test = X[0:train_size], X[train_size:len(X)]
print('Observations: %d' % (len(X)))
print('Training Observations: %d' % (len(train)))
print('Testing Observations: %d' % (len(test)))
# plot the train set, then the test set offset to the right
pyplot.plot(train)
pyplot.plot([None for i in train] + [x for x in test])
pyplot.show()

-----Result-----

Observations: 2820
Training Observations: 1861
Testing Observations: 959


Line plot of the train (blue) and test (green) split of the Monthly Sunspots dataset


D. Multiple Train-Test Splits

We can repeat the process of splitting the time series into train and test sets multiple times. This will require multiple models to be trained and evaluated, but this additional computational expense will provide a more robust estimate of the expected performance of the chosen method and configuration on unseen data. We could do this manually by repeating the process described in the previous section with different split points.
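For example, a minimal sketch of the manual approach (the split fractions 0.5, 0.66, and 0.8 are arbitrary choices for illustration):

# manually create multiple train-test splits at different split points
from pandas import read_csv
series = read_csv('sunspots.csv', header=0, index_col=0, parse_dates=True, squeeze=True)
X = series.values
for split_fraction in [0.5, 0.66, 0.8]:
    train_size = int(len(X) * split_fraction)
    train, test = X[0:train_size], X[train_size:]
    print('train=%d, test=%d' % (len(train), len(test)))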

Alternately, the scikit-learn library provides this capability for us in the TimeSeriesSplit object. You must specify the number of splits to create, and the TimeSeriesSplit returns the indexes of the train and test observations for each requested split. The train and test sizes at each split iteration (i) are calculated as follows, where the division is integer division:

training size = i × n_samples / (n_splits + 1) + n_samples mod (n_splits + 1)
test size = n_samples / (n_splits + 1)

Assume we have 100 observations and we want to create 2 splits. For the first split, the train size would be calculated as 

train = 1 × 100/(2 + 1) + 100 mod (2 + 1) = 33.3 + 1 = 34.3, or 34

The test set size for the first split would be calculated as:
test = 100/(2 + 1) = 33.3, or 33

Or, the first 34 records are used for training and the next 33 records are used for testing.
The size of the train set on the second split is calculated as follows:
train = 2 × 100/(2 + 1) + 100 mod (2 + 1) = 66.6 + 1 = 67.6, or 67

The test set size on the second split is calculated as follows:
test = 100/(2 + 1) = 33.3, or 33

Or, the first 67 records are used for training and the remaining 33 records are used for testing. You can see that the test size stays consistent. This means that performance statistics calculated on the predictions of each trained model will be consistent and can be combined and compared.
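You can confirm this arithmetic by running TimeSeriesSplit on a synthetic series of 100 values (this check is an aside, not part of the sunspot example):

# verify the train and test sizes for 100 observations and 2 splits
from numpy import arange
from sklearn.model_selection import TimeSeriesSplit
X = arange(100)
for train_index, test_index in TimeSeriesSplit(n_splits=2).split(X):
    print('train=%d, test=%d' % (len(train_index), len(test_index)))

This prints train=34, test=33 for the first split and train=67, test=33 for the second.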

Let’s look at how we can apply the TimeSeriesSplit on our sunspot data. The dataset has 2,820 observations. Let’s create 3 splits for the dataset. Using the same arithmetic above, we would expect the following train and test splits to be created:
  • Split 1: 705 train, 705 test
  • Split 2: 1,410 train, 705 test
  • Split 3: 2,115 train, 705 test

# calculate repeated train-test splits of time series data
from pandas import read_csv
from sklearn.model_selection import TimeSeriesSplit
from matplotlib import pyplot
series = read_csv('sunspots.csv', header=0, index_col=0, parse_dates=True, squeeze=True)
X = series.values
splits = TimeSeriesSplit(n_splits=3)
pyplot.figure(1)
index = 1
for train_index, test_index in splits.split(X):
    train = X[train_index]
    test = X[test_index]
    print('Observations: %d' % (len(train) + len(test)))
    print('Training Observations: %d' % (len(train)))
    print('Testing Observations: %d' % (len(test)))
    # plot each split in its own subplot (3 rows, 1 column)
    pyplot.subplot(310 + index)
    pyplot.plot(train)
    pyplot.plot([None for i in train] + [x for x in test])
    index += 1
pyplot.show()

-----Result-----

Observations: 1410
Training Observations: 705
Testing Observations: 705

Observations: 2115
Training Observations: 1410
Testing Observations: 705

Observations: 2820
Training Observations: 2115
Testing Observations: 705

Line plots of repeated train (blue) and test (green) splits of the Monthly Sunspots dataset


Using multiple train-test splits will result in more models being trained, and in turn, a more accurate estimate of the performance of the models on unseen data.


E. Walk Forward Validation

In practice, we will very likely retrain our model as new data becomes available, giving it the best opportunity to make good forecasts at each time step. We can evaluate a model under this assumption using walk-forward validation: choose a minimum number of observations to train on, then step through the remaining data one time step at a time, training on all observations available up to that point and testing on the single next observation. The code below splits the sunspot data this way, using a minimum of 500 observations for training.

# walk forward evaluation model for time series data
from pandas import read_csv
series = read_csv('sunspots.csv', header=0, index_col=0, parse_dates=True, squeeze=True)
X = series.values
n_train = 500
n_records = len(X)
for i in range(n_train, n_records):
    # train on all observations up to i, test on the single next observation
    train, test = X[0:i], X[i:i+1]
    print('train=%d, test=%d' % (len(train), len(test)))

-----Result-----

train=500, test=1
train=501, test=1
train=502, test=1
...
train=2815, test=1
train=2816, test=1
train=2817, test=1
train=2818, test=1
train=2819, test=1
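
The splits above only show how the data is divided; to actually evaluate a model, you train and forecast inside the loop and collect the errors. The sketch below uses a naive persistence forecast (predict the last observed value) purely as a placeholder, with RMSE as one reasonable choice of error metric; any real model could be substituted at the commented line:

# evaluate a persistence forecast with walk-forward validation
from math import sqrt
from pandas import read_csv
from sklearn.metrics import mean_squared_error
series = read_csv('sunspots.csv', header=0, index_col=0, parse_dates=True, squeeze=True)
X = series.values
n_train = 500
predictions, actuals = list(), list()
for i in range(n_train, len(X)):
    train, test = X[0:i], X[i]
    # persistence forecast: substitute a model fit on train here
    predictions.append(train[-1])
    actuals.append(test)
print('RMSE: %.3f' % sqrt(mean_squared_error(actuals, predictions)))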

