11/11/2021

Temporal Structure - Part 2 - A Gentle Introduction to the Random Walk

How do you know your time series problem is predictable? This is a difficult question with time series forecasting. There is a tool called a random walk that can help you understand the predictability of your time series forecast problem. In this tutorial, you will discover the random walk and its properties in Python. 

After completing this tutorial, you will know:
  • What the random walk is and how to create one from scratch in Python.
  • How to analyze the properties of a random walk and recognize when a time series is and is not a random walk.
  • How to make predictions for a random walk.

A. Random Series

The Python standard library contains the random module, which provides a suite of functions for generating random numbers. The randrange() function generates a random integer between 0 and an upper limit, excluding the limit itself. We can use it to generate a list of 1,000 random integers between 0 and 9.

# create and plot a random series
from random import seed
from random import randrange
from matplotlib import pyplot
seed(1)
series = [randrange(10) for i in range(1000)]
pyplot.plot(series)
pyplot.show()

-----Result-----

Plot of a Random Series

This is not a random walk. It is just a sequence of random numbers, also called white noise. A common beginner's mistake is to think that a random walk is a list of random numbers; this is not the case at all.


B. Random Walk

A random walk is different from a list of random numbers because the next value in the sequence is a modification of the previous value in the sequence.

It is this dependency that gives the process its name as a random walk or a drunkard’s walk. A simple model of a random walk is as follows:
  1. Start with a random number of either -1 or 1.
  2. Randomly select a -1 or 1 and add it to the observation from the previous time step.
  3. Repeat step 2 for as long as you like.
We can describe this process as:
y(t) = B0 + B1 × X(t − 1) + e(t)

Where y(t) is the next value in the series. 
B0 is a coefficient that if set to a value other than zero adds a constant drift to the random walk. 
B1 is a coefficient to weight the previous time step and is set to 1.0. 
X(t-1) is the observation at the previous time step. 
e(t) is the white noise or random fluctuation at that time.

# create and plot a random walk
from random import seed
from random import random
from matplotlib import pyplot
seed(1)
random_walk = list()
random_walk.append(-1 if random() < 0.5 else 1)
for i in range(1, 1000):
    movement = -1 if random() < 0.5 else 1
    value = random_walk[i-1] + movement
    random_walk.append(value)
pyplot.plot(random_walk)
pyplot.show()


-----Result-----

Plot of a Random Walk


We can see that it looks very different from the sequence of random numbers plotted earlier.
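The equation given earlier in this section can be turned into a small simulation to see what the drift coefficient B0 does. The following is a minimal sketch, not part of the original example: the helper name simulate_walk and the drift value of 0.1 are assumptions chosen for illustration, and e(t) is drawn from -1 and 1 as in the steps listed above.

```python
# sketch: random walk with and without drift, following y(t) = B0 + B1 * X(t-1) + e(t)
from random import seed
from random import random

def simulate_walk(n_steps, b0=0.0, b1=1.0):
    # hypothetical helper: b0 is the drift, b1 weights the previous observation
    walk = [-1 if random() < 0.5 else 1]
    for _ in range(1, n_steps):
        noise = -1 if random() < 0.5 else 1
        walk.append(b0 + b1 * walk[-1] + noise)
    return walk

# reuse the same seed so both walks receive the same random steps
seed(1)
no_drift = simulate_walk(1000)
seed(1)
with_drift = simulate_walk(1000, b0=0.1)
# the drift adds 0.1 at every step, so the gap between the walks grows with time
print('Final value without drift: %.1f' % no_drift[-1])
print('Final value with drift: %.1f' % with_drift[-1])
print('Gap between the two walks: %.1f' % (with_drift[-1] - no_drift[-1]))
```

Because both walks share the same random steps, the only difference between them is the accumulated drift: 0.1 added at each of the 999 steps after the first.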


C. Random Walk and Autocorrelation

We can calculate the correlation between each observation and the observations at previous time steps. A plot of these correlations is called an autocorrelation plot or a correlogram.

Given the way the random walk is constructed, we would expect a strong autocorrelation with the previous observation and a linear fall-off from there as the lag increases.

We can use the autocorrelation_plot() function in Pandas to plot the correlogram for the random walk.

# plot the autocorrelation of a random walk
from random import seed
from random import random
from matplotlib import pyplot
from pandas.plotting import autocorrelation_plot
seed(1)
random_walk = list()
random_walk.append(-1 if random() < 0.5 else 1)
for i in range(1, 1000):
movement = -1 if random() < 0.5 else 1
value = random_walk[i-1] + movement
random_walk.append(value)
autocorrelation_plot(random_walk)
pyplot.show()

-----Result-----

Plot of a Random Walk Correlogram


Running the example, we generally see the expected trend, in this case across the first few hundred lag observations.
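The same pattern can be checked numerically at a few individual lags. This is a minimal sketch using only the standard library: the helper lag_correlation is a hypothetical name, and it computes a plain Pearson correlation between the series and a shifted copy of itself, which is a slightly different estimator from the one autocorrelation_plot() draws. A white noise series is included for contrast.

```python
# sketch: correlation between a series and a lagged copy of itself
from math import sqrt
from random import seed
from random import random
from random import randrange

def lag_correlation(series, lag):
    # hypothetical helper: Pearson correlation of series[t] with series[t - lag], lag >= 1
    x = series[lag:]
    y = series[:-lag]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

seed(1)
# white noise: independent random integers
noise = [randrange(10) for _ in range(1000)]
# random walk: each value is the previous value plus a random step
random_walk = [-1 if random() < 0.5 else 1]
for i in range(1, 1000):
    movement = -1 if random() < 0.5 else 1
    random_walk.append(random_walk[i-1] + movement)

for lag in (1, 10, 100):
    print('lag %d: noise=%.3f, walk=%.3f' % (
        lag, lag_correlation(noise, lag), lag_correlation(random_walk, lag)))
```

We would expect the white noise correlations to sit close to zero at every lag, while the random walk stays strongly correlated at short lags.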


D. Random Walk and Stationarity

A stationary time series is one where the values are not a function of time. 

Given the way that the random walk is constructed and the results of reviewing the autocorrelation, we know that the observations in a random walk are dependent on time.

The current observation is a random step from the previous observation.

Therefore we can expect a random walk to be non-stationary.

In fact, all random walk processes are non-stationary.

Additionally, a non-stationary time series does not have a consistent mean and/or variance over time.

A review of the random walk line plot might suggest this to be the case. 

We can confirm this using a statistical significance test, specifically the Augmented Dickey-Fuller test.

We can perform this test using the adfuller() function in the Statsmodels library.

# calculate the stationarity of a random walk
from random import seed
from random import random
from statsmodels.tsa.stattools import adfuller
# generate random walk
seed(1)
random_walk = list()
random_walk.append(-1 if random() < 0.5 else 1)
for i in range(1, 1000):
    movement = -1 if random() < 0.5 else 1
    value = random_walk[i-1] + movement
    random_walk.append(value)
# statistical test
result = adfuller(random_walk)
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')
for key, value in result[4].items():
    print('\t%s: %.3f' % (key, value))

-----Result-----

ADF Statistic: 0.341605
p-value: 0.979175
Critical Values:
5%: -2.864
1%: -3.437
10%: -2.568


The null hypothesis of the test is that the time series is non-stationary. Running the example, we can see that the test statistic value was 0.341605. This is larger than all of the critical values at the 1%, 5%, and 10% significance levels, and the p-value of 0.979175 is far above any of those thresholds.

Therefore, we fail to reject the null hypothesis. The test gives no evidence that the series is stationary, which is consistent with what we expect of a random walk.
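The lack of a consistent variance can also be seen directly, without a formal test. The following is a minimal sketch under stated assumptions: it simulates many independent random walks (500 walks of 400 steps, both arbitrary choices) and measures how spread out the walks are at a few time steps. For steps of -1 or 1, the variance across walks grows roughly in proportion to the number of steps taken.

```python
# sketch: the variance of a random walk grows with time
from random import seed
from random import random
from statistics import variance

seed(1)
n_walks = 500   # assumed number of independent walks
n_steps = 400   # assumed length of each walk
walks = []
for _ in range(n_walks):
    value = 0
    walk = []
    for _ in range(n_steps):
        value += -1 if random() < 0.5 else 1
        walk.append(value)
    walks.append(walk)

# variance across all walks after 10, 100 and 400 steps
variance_at = {}
for t in (10, 100, 400):
    values_at_t = [walk[t-1] for walk in walks]
    variance_at[t] = variance(values_at_t)
    print('after %d steps: variance=%.1f' % (t, variance_at[t]))
```

A stationary series would show roughly the same variance at every time step; here the spread keeps widening the longer the walks run.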


E. Predicting a Random Walk

A random walk is unpredictable; it cannot reasonably be predicted.
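Because each step is random, the most reasonable forecast for the next time step is simply the observation at the previous time step, often called a persistence or naive forecast. The following is a minimal sketch of that idea: the 66% train/test split is an arbitrary assumption, and the error is measured with root mean squared error (RMSE).

```python
# sketch: persistence forecast for a random walk
from math import sqrt
from random import seed
from random import random

# generate the random walk
seed(1)
random_walk = [-1 if random() < 0.5 else 1]
for i in range(1, 1000):
    movement = -1 if random() < 0.5 else 1
    random_walk.append(random_walk[i-1] + movement)

# split into train and test sets (assumed 66% / 34% split)
train_size = int(len(random_walk) * 0.66)
train, test = random_walk[:train_size], random_walk[train_size:]

# predict each test value as the most recent observed value
predictions = []
history = train[-1]
for actual in test:
    predictions.append(history)
    history = actual

# root mean squared error of the forecasts
rmse = sqrt(sum((p - a) ** 2 for p, a in zip(predictions, test)) / len(test))
print('Persistence RMSE: %.3f' % rmse)
```

Every step in this walk is either -1 or 1, so each persistence forecast is off by exactly 1 and the RMSE is 1.000. That error is the size of the random step itself, which is the part that cannot be predicted.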


F. Is Your Time Series a Random Walk?

Your time series may be a random walk.
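One practical check, offered here as a sketch rather than a definitive test, is to difference the series (subtract each observation from the next) and see whether any structure remains. For a random walk, the differences are just the random steps, so they should show little correlation with their own past. The helper name lag1_correlation is hypothetical, and the generated walk stands in for your own data.

```python
# sketch: difference a series and check whether the changes look like white noise
from math import sqrt
from random import seed
from random import random

def lag1_correlation(series):
    # hypothetical helper: Pearson correlation between series[t] and series[t-1]
    x = series[1:]
    y = series[:-1]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# stand-in data: replace random_walk with your own series
seed(1)
random_walk = [-1 if random() < 0.5 else 1]
for i in range(1, 1000):
    movement = -1 if random() < 0.5 else 1
    random_walk.append(random_walk[i-1] + movement)

# first difference: the change from one time step to the next
diff = [random_walk[i] - random_walk[i-1] for i in range(1, len(random_walk))]

print('lag-1 correlation of the series: %.3f' % lag1_correlation(random_walk))
print('lag-1 correlation of the differences: %.3f' % lag1_correlation(diff))
```

A series that is strongly correlated with its previous value but whose differences show almost no correlation is behaving like a random walk. If the differences still show clear structure, there may be something left to model.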
