12/11/2021

Temporal Structure - Part 5 - Use and Remove Seasonality

Time series datasets can contain a seasonal component. This is a cycle that repeats over time, such as monthly or yearly. This repeating cycle may obscure the signal that we wish to model when forecasting, and in turn may provide a strong signal to our predictive models. In this tutorial, you will discover how to identify and correct for seasonality in time series data with Python.

After completing this tutorial, you will know:
  • The definition of seasonality in time series and the opportunity it provides for forecasting with machine learning methods.
  • How to use the difference method to create a seasonally adjusted time series of daily temperature data.
  • How to model the seasonal component directly and explicitly subtract it from observations.
A. Seasonality in Time Series

Time series data may contain seasonal variation. Seasonal variation, or seasonality, refers to cycles that repeat regularly over time.

1. Benefits to Machine Learning

Understanding the seasonal component in time series can improve the performance of modeling with machine learning. This can happen in two main ways:
  • Clearer Signal: Identifying and removing the seasonal component from the time series can result in a clearer relationship between input and output variables.
  • More Information: Additional information about the seasonal component of the time series can provide new information to improve model performance.
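As a rough illustration of the second point, information about the seasonal cycle can be exposed to a model as extra input columns. The sketch below uses a synthetic daily index rather than the tutorial's dataset, and the column names are made up for the example.

```python
import numpy as np
import pandas as pd

# synthetic daily index, for illustration only
index = pd.date_range('1981-01-01', periods=730, freq='D')
frame = pd.DataFrame(index=index)

# calendar features that expose the position within the yearly cycle
frame['month'] = index.month
frame['day_of_year'] = index.dayofyear

# cyclical encoding so that 31 December and 1 January end up close together
angle = 2 * np.pi * frame['day_of_year'] / 365.25
frame['doy_sin'] = np.sin(angle)
frame['doy_cos'] = np.cos(angle)

print(frame.head())
```

Whether the raw calendar columns or the sine/cosine pair works better depends on the model, so this is one possible encoding rather than a recommendation.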
2. Types of Seasonality

There are many types of seasonality; for example:
  • Time of Day.
  • Daily.
  • Weekly.
  • Monthly.
  • Yearly.
3. Removing Seasonality

Once seasonality is identified, it can be modeled. The model of seasonality can be removed from the time series. 

This process is called Seasonal Adjustment, or Deseasonalizing. 

A time series where the seasonal component has been removed is called seasonal stationary.
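To make the idea concrete, here is a minimal sketch of "model the seasonality, then remove it". It assumes a synthetic series and uses a deliberately simple seasonal model, the average for each calendar month, which is not one of the methods developed later in this tutorial.

```python
import numpy as np
import pandas as pd

# synthetic series: yearly sine cycle plus noise (illustration only)
rng = np.random.default_rng(0)
index = pd.date_range('1981-01-01', periods=4 * 365, freq='D')
day = index.dayofyear.to_numpy()
cycle = 5.0 * np.sin(2 * np.pi * day / 365.25)
series = pd.Series(11.0 + cycle + rng.normal(0.0, 1.0, len(index)), index=index)

# model of seasonality: the average observation for each calendar month
seasonal = series.groupby(series.index.month).transform('mean')

# seasonal adjustment: subtract the seasonal model from the observations
adjusted = series - seasonal

# spread of the series before and after the adjustment
print(series.std(), adjusted.std())
```

The adjusted series should vary much less than the original, because most of the yearly cycle has been subtracted.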


B. Minimum Daily Temperatures Dataset

This dataset describes the minimum daily temperatures over 10 years (1981-1990) in the city of Melbourne, Australia. The units are degrees Celsius and there are 3,650 observations.


C. Seasonal Adjustment with Differencing

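A simple way to correct for a seasonal component is to use differencing. For a yearly cycle in daily data, we subtract from each observation the value from the same day one year earlier. The first year of data is lost, because it has no earlier year to subtract.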

# deseasonalize a time series using differencing
from pandas import read_csv
from matplotlib import pyplot
# load the data as a Series (the squeeze argument of read_csv was removed in pandas 2.0)
series = read_csv('daily-minimum-temperatures.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
X = series.values
diff = list()
days_in_year = 365
# subtract the observation from 365 rows (one year) earlier
for i in range(days_in_year, len(X)):
    value = X[i] - X[i - days_in_year]
    diff.append(value)
pyplot.plot(diff)
pyplot.show()

-----Result-----

Line plot of the deseasonalized Minimum Daily Temperatures dataset using differencing
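The same lag-365 differencing can be written without an explicit loop using the pandas Series.diff() method. The sketch below assumes a synthetic daily series so that it runs without the CSV file, and it checks the result against a loop of the same form as the listing above.

```python
import numpy as np
import pandas as pd

# synthetic daily series with a 365-day cycle (illustration only)
days = np.arange(3 * 365)
index = pd.date_range('1981-01-01', periods=len(days), freq='D')
series = pd.Series(10.0 + 5.0 * np.sin(2 * np.pi * days / 365), index=index)

# subtract the observation from 365 rows earlier; the first 365 results are NaN
diff = series.diff(periods=365).dropna()

# the explicit loop, for comparison
X = series.values
loop_diff = [X[i] - X[i - 365] for i in range(365, len(X))]

print(len(diff), np.allclose(diff.values, loop_diff))
```

Unlike the plain list built by the loop, the result of diff() keeps the date index, which can be convenient for plotting.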



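The period covered by the dataset includes two leap years (1984 and 1988), and the fixed 365-day offset used above is not leap-day aware. One alternative is to assume that temperatures are roughly stable within a calendar month and to work with monthly averages. We can start by resampling the dataset to a monthly average minimum temperature.

# calculate and plot monthly average
from pandas import read_csv
from matplotlib import pyplot
# load the data as a Series (the squeeze argument of read_csv was removed in pandas 2.0)
series = read_csv('daily-minimum-temperatures.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
# 'M' is the month-end alias; pandas 2.2 and later prefer 'ME'
resample = series.resample('M')
monthly_mean = resample.mean()
print(monthly_mean.head(13))
monthly_mean.plot()
pyplot.show()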

-----Result-----

Date
1981-01-31 17.712903
1981-02-28 17.678571
1981-03-31 13.500000
1981-04-30 12.356667
1981-05-31 9.490323
1981-06-30 7.306667
1981-07-31 7.577419
1981-08-31 7.238710
1981-09-30 10.143333
1981-10-31 10.087097
1981-11-30 11.890000
1981-12-31 13.680645
1982-01-31 16.567742

Line plot of the monthly Minimum Daily Temperatures dataset


# deseasonalize monthly data by differencing
from pandas import read_csv
from matplotlib import pyplot
# load the data as a Series (the squeeze argument of read_csv was removed in pandas 2.0)
series = read_csv('daily-minimum-temperatures.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
# 'M' is the month-end alias; pandas 2.2 and later prefer 'ME'
resample = series.resample('M')
monthly_mean = resample.mean()
diff = list()
months_in_year = 12
for i in range(months_in_year, len(monthly_mean)):
    # positional access with iloc, because the index holds dates rather than integers
    value = monthly_mean.iloc[i] - monthly_mean.iloc[i - months_in_year]
    diff.append(value)
pyplot.plot(diff)
pyplot.show()

-----Result-----

Line plot of the deseasonalized monthly Minimum Daily Temperatures dataset


Next, we can use the monthly average minimum temperatures from the same month in the previous year to adjust the daily minimum temperature dataset.

# deseasonalize a time series using month-based differencing
from pandas import read_csv
from matplotlib import pyplot
# load the data as a Series (the squeeze argument of read_csv was removed in pandas 2.0)
series = read_csv('daily-minimum-temperatures.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
X = series.values
diff = list()
days_in_year = 365
for i in range(days_in_year, len(X)):
    # year-month key such as '1981-01' for the same month one year earlier
    month_str = '%d-%02d' % (series.index[i].year - 1, series.index[i].month)
    # select that month by partial string indexing and take its mean
    month_mean_last_year = series.loc[month_str].mean()
    value = X[i] - month_mean_last_year
    diff.append(value)
pyplot.plot(diff)
pyplot.show()

-----Result-----


Line plot of the deseasonalized Minimum Daily Temperatures dataset using monthly data
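The month-based adjustment recomputes a monthly mean for every single day. A possible vectorized variant is sketched below: it computes each monthly mean once, shifts the means by 12 months, and looks them up for every daily observation. It assumes a synthetic series, and the variable names are made up for the example.

```python
import numpy as np
import pandas as pd

# synthetic daily series (illustration only, not the tutorial's dataset)
rng = np.random.default_rng(1)
index = pd.date_range('1981-01-01', '1983-12-31', freq='D')
day = index.dayofyear.to_numpy()
series = pd.Series(11.0 + 5.0 * np.cos(2 * np.pi * day / 365.25)
                   + rng.normal(0.0, 1.0, len(index)), index=index)

# mean of each calendar month, indexed by monthly period (e.g. 1981-01)
months = series.index.to_period('M')
monthly_mean = series.groupby(months).mean()

# shift by 12 rows so each month holds the mean of the same month one year earlier
prev_year_mean = monthly_mean.shift(12)

# look up the previous-year mean for every daily observation and subtract it
adjusted = (series - prev_year_mean.reindex(months).to_numpy()).dropna()

print(adjusted.head())
```

The shift by 12 rows relies on the monthly means being contiguous, with no months missing from the data.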


D. Seasonal Adjustment with Modeling

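We can also model the seasonal component directly and then subtract it from the observations. The seasonal component in a time series often resembles a sine wave over a fixed period, which can be approximated by fitting a curve. A dataset can be constructed with the time index of the sine wave as an input, or x-axis, and the observation as the output, or y-axis.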

For example:

Time Index, Observation
1, obs1
2, obs2
3, obs3
4, obs4
5, obs5

The NumPy library provides the polyfit() function that can be used to fit a polynomial of a chosen order to a dataset.
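Before applying it to the temperature data, the behavior of polyfit() can be checked on a toy example. The sketch below assumes points that lie exactly on a known quadratic, and it also uses numpy.polyval() to evaluate the fitted polynomial.

```python
import numpy as np

# points that lie exactly on y = 2x^2 - 3x + 1
x = np.arange(-5, 6, dtype=float)
y = 2.0 * x**2 - 3.0 * x + 1.0

# coefficients are returned from the highest power down to the constant
coef = np.polyfit(x, y, 2)

# polyval evaluates the fitted polynomial at the given x values
fitted = np.polyval(coef, x)

print(coef)
```

The coefficient order matters when the curve is rebuilt by hand: coef[0] multiplies the highest power and coef[-1] is the constant term.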

# model seasonality with a polynomial model
from pandas import read_csv
from matplotlib import pyplot
from numpy import polyfit
# load the data as a Series (the squeeze argument of read_csv was removed in pandas 2.0)
series = read_csv('daily-minimum-temperatures.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
# fit polynomial: x^4*b1 + x^3*b2 + x^2*b3 + x*b4 + b5
# input is the position within the year (time index modulo 365)
X = [i%365 for i in range(0, len(series))]
y = series.values
degree = 4
coef = polyfit(X, y, degree)
print('Coefficients: %s' % coef)
# create curve
curve = list()
for i in range(len(X)):
    # start from the constant term, then add each power of x
    value = coef[-1]
    for d in range(degree):
        value += X[i]**(degree-d) * coef[d]
    curve.append(value)
# plot curve over original data
pyplot.plot(series.values)
pyplot.plot(curve, color='red', linewidth=3)
pyplot.show()

-----Result-----

Line plot of the Minimum Daily Temperatures dataset (blue) and a nonlinear model of the seasonality (red)



We can now use this model to create a seasonally adjusted version of the dataset.

# deseasonalize by differencing with a polynomial model
from pandas import read_csv
from matplotlib import pyplot
from numpy import polyfit
# load the data as a Series (the squeeze argument of read_csv was removed in pandas 2.0)
series = read_csv('daily-minimum-temperatures.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
# fit polynomial: x^4*b1 + x^3*b2 + x^2*b3 + x*b4 + b5
X = [i%365 for i in range(0, len(series))]
y = series.values
degree = 4
coef = polyfit(X, y, degree)
# create curve
curve = list()
for i in range(len(X)):
    # start from the constant term, then add each power of x
    value = coef[-1]
    for d in range(degree):
        value += X[i]**(degree-d) * coef[d]
    curve.append(value)
# create seasonally adjusted
values = series.values
diff = list()
for i in range(len(values)):
    # observation minus the fitted seasonal curve
    value = values[i] - curve[i]
    diff.append(value)
pyplot.plot(diff)
pyplot.show()

-----Result-----


Line plot of the deseasonalized Minimum Daily Temperatures dataset using a nonlinear model
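The curve construction and subtraction can be condensed with numpy.polyval(). The sketch below is a compact variant under stated assumptions: it uses synthetic observations instead of the CSV file, while keeping the same modulo-365 input and 4th-order polynomial as the listings above.

```python
import numpy as np

# synthetic observations: four 365-day cycles plus noise (illustration only)
rng = np.random.default_rng(2)
t = np.arange(4 * 365)
y = 11.0 + 5.0 * np.cos(2 * np.pi * t / 365) + rng.normal(0.0, 1.0, len(t))

# input is the position within the year, as in the listings above
X = t % 365

# fit a 4th-order polynomial and evaluate it for every observation
coef = np.polyfit(X, y, 4)
curve = np.polyval(coef, X)

# seasonally adjusted series: observations minus the fitted seasonal curve
adjusted = y - curve

# spread of the observations before and after the adjustment
print(y.std(), adjusted.std())
```

The polynomial order is a modeling choice: a higher order follows the cycle more closely but can start to fit noise.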



