We will work through a time series forecasting project from end-to-end, from downloading the dataset and defining the problem to training a final model and making predictions. This project is not exhaustive, but shows how you can get good results quickly by working through a time series forecasting problem systematically.
The steps of this project that we will work through are as follows.
- Problem Description.
- Test Harness.
- Persistence.
- Data Analysis.
- ARIMA Models.
- Model Validation.
A. Problem Description
The problem is to predict the number of monthly sales of champagne for the Perrin Freres label. The dataset provides the number of monthly sales of champagne from January 1964 to September 1972, or just under 10 years of data. The values are a count of millions of sales and there are 105 observations.
"Month", "Sales"
"1964-01",2815
"1964-02",2672
"1964-03",2755
"1964-04",2721
"1964-05",2946
B. Test Harness
We must develop a test harness to investigate the data and evaluate candidate models. This involves two steps:
- Defining a Validation Dataset.
- Developing a Method for Model Evaluation.
C. Validation Dataset
The final 12 months of data will be held back and used to validate the final model. The code below loads the dataset as a Pandas Series and splits it into two files: one for model development (dataset.csv) and one for validation (validation.csv).
# separate out a validation dataset
from pandas import read_csv
series = read_csv('champagne.csv', header=0, index_col=0, parse_dates=True, squeeze=True)
split_point = len(series) - 12
dataset, validation = series[0:split_point], series[split_point:]
print('Dataset %d, Validation %d' % (len(dataset), len(validation)))
dataset.to_csv('dataset.csv', header=False)
validation.to_csv('validation.csv', header=False)
-----Result-----
Dataset 93, Validation 12
- dataset.csv: Observations from January 1964 to September 1971 (93 observations).
- validation.csv: Observations from October 1971 to September 1972 (12 observations).
D. Model Evaluation
The RMSE performance measure and walk-forward validation will be used for model evaluation.
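Concretely, walk-forward validation steps through the test set one time step at a time: a forecast is made from all of the history seen so far, and the true observation is then added to the history before the next step. Below is a minimal sketch of this loop; the make_forecast argument is a hypothetical stand-in for whatever model is being evaluated, not part of the tutorial's own code.
# sketch of walk-forward validation scored with RMSE (make_forecast is hypothetical)
from sklearn.metrics import mean_squared_error
from math import sqrt
def walk_forward_rmse(train, test, make_forecast):
    # seed the history with the training data
    history = [x for x in train]
    predictions = list()
    for i in range(len(test)):
        # one-step forecast from all data seen so far
        predictions.append(make_forecast(history))
        # add the true observation before the next step
        history.append(test[i])
    return sqrt(mean_squared_error(test, predictions))
The persistence and ARIMA examples below each follow this same pattern inline.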
E. Persistence
The first step before getting bogged down in data analysis and modeling is to establish a baseline of performance.
This will provide both a template for evaluating models using the proposed test harness and a performance measure by which all more elaborate predictive models can be compared.
The baseline prediction for time series forecasting is called the naive forecast, or persistence.
# evaluate persistence model on time series
from pandas import read_csv
from sklearn.metrics import mean_squared_error
from math import sqrt
# load data
series = read_csv('dataset.csv', header=None, index_col=0, parse_dates=True, squeeze=True)
# prepare data
X = series.values
X = X.astype('float32')
train_size = int(len(X) * 0.50)
train, test = X[0:train_size], X[train_size:]
# walk-forward validation
history = [x for x in train]
predictions = list()
for i in range(len(test)):
    # predict
    yhat = history[-1]
    predictions.append(yhat)
    # observation
    obs = test[i]
    history.append(obs)
    print('>Predicted=%.3f, Expected=%.3f' % (yhat, obs))
# report performance
rmse = sqrt(mean_squared_error(test, predictions))
print('RMSE: %.3f' % rmse)
-----Result-----
...
>Predicted=4676.000, Expected=5010
>Predicted=5010.000, Expected=4874
>Predicted=4874.000, Expected=4633
>Predicted=4633.000, Expected=1659
>Predicted=1659.000, Expected=5951
RMSE: 3186.501
F. Data Analysis
We can use summary statistics and plots of the data to quickly learn more about the structure of the prediction problem.
- Summary Statistics.
- Line Plot.
- Seasonal Line Plots.
- Density Plots.
- Box and Whisker Plots.
1. Summary Statistics
Summary statistics provide a quick look at the limits of observed values. They can help to give a quick idea of what we are working with.
# summary statistics of time series
from pandas import read_csv
series = read_csv('dataset.csv', header=None, index_col=0, parse_dates=True, squeeze=True)
print(series.describe())
-----Result-----
count 93.000000
mean 4641.118280
std 2486.403841
min 1573.000000
25% 3036.000000
50% 4016.000000
75% 5048.000000
max 13916.000000
2. Line Plot
A line plot of a time series can provide a lot of insight into the problem.
# line plot of time series
from pandas import read_csv
from matplotlib import pyplot
series = read_csv('dataset.csv', header=None, index_col=0, parse_dates=True, squeeze=True)
series.plot()
pyplot.show()
-----Result-----
Line plot of the training set for the Champagne Sales dataset
Run the example and review the plot.
- There may be an increasing trend of sales over time.
- There appears to be systematic seasonality to the sales for each year.
- The seasonal signal appears to be growing over time, suggesting a multiplicative relationship (increasing change).
- There do not appear to be any obvious outliers.
- The seasonality suggests that the series is almost certainly non-stationary.
There may be benefit in explicitly modeling the seasonal component and removing it. The increasing trend or growth in the seasonal component may suggest the use of a log or other power transform.
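As a rough illustration of that idea (an aside, not used in the models below), a log transform can be applied before modeling and inverted on any forecast afterwards:
# sketch: log transform to stabilize the growing seasonal component (illustrative only)
from pandas import read_csv
import numpy
series = read_csv('dataset.csv', header=None, index_col=0, parse_dates=True, squeeze=True)
X = series.values.astype('float32')
# all sales counts are positive, so the log is well defined
transformed = numpy.log(X)
# ... fit a model on the transformed series ...
# any forecast must then be inverted back to the original scale:
# restored = numpy.exp(forecast)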
3. Seasonal Line Plots
We can confirm the assumption that the seasonality is a yearly cycle by eyeballing line plots of the dataset by year. The example below takes the 7 full years of data as separate groups and creates one line plot for each.
# multiple line plots of time series
from pandas import read_csv
from pandas import DataFrame
from pandas import Grouper
from matplotlib import pyplot
series = read_csv('dataset.csv', header=None, index_col=0, parse_dates=True, squeeze=True)
groups = series['1964':'1970'].groupby(Grouper(freq='A'))
years = DataFrame()
pyplot.figure()
i = 1
n_groups = len(groups)
for name, group in groups:
    pyplot.subplot((n_groups*100) + 10 + i)
    i += 1
    pyplot.plot(group)
pyplot.show()
-----Result-----
Multiple yearly line plots of the training set for the Champagne Sales dataset
4. Density Plot
The example below creates a histogram and density plot of the observations without any temporal structure.
# density plots of time series
from pandas import read_csv
from matplotlib import pyplot
series = read_csv('dataset.csv', header=None, index_col=0, parse_dates=True, squeeze=True)
pyplot.figure(1)
pyplot.subplot(211)
series.hist()
pyplot.subplot(212)
series.plot(kind='kde')
pyplot.show()
-----Result-----
Density plots of the training set for the Champagne Sales dataset
Run the example and review the plots. Some observations from the plots include:
- The distribution is not Gaussian.
- The shape has a long right tail and may suggest an exponential distribution.
5. Box and Whisker Plots
We can group the monthly data by year and get an idea of the spread of observations for each year and how this may be changing.
# boxplots of time series
from pandas import read_csv
from pandas import DataFrame
from pandas import Grouper
from matplotlib import pyplot
series = read_csv('dataset.csv', header=None, index_col=0, parse_dates=True, squeeze=True)
groups = series['1964':'1970'].groupby(Grouper(freq='A'))
years = DataFrame()
for name, group in groups:
    years[name.year] = group.values
years.boxplot()
pyplot.show()
-----Result-----
Box and whisker plots of the training set for the Champagne Sales dataset
Running the example creates 7 box and whisker plots side-by-side, one for each of the 7 years of selected data.
- The median values for each year (red line) may show an increasing trend.
- The spread or middle 50% of the data (blue boxes) does appear reasonably stable.
- There are outliers each year (black crosses); these may be the tops or bottoms of the seasonal cycle.
- The last year, 1970, does look different from the trend in prior years.
G. ARIMA Models
We will develop Autoregressive Integrated Moving Average, or ARIMA, models for the problem. We will approach modeling by both manual and automatic configuration of the ARIMA model.
- Manually Configure the ARIMA.
- Automatically Configure the ARIMA.
- Review Residual Errors.
1. Manually Configured ARIMA
The ARIMA(p,d,q) model requires three parameters and is traditionally configured manually. Analysis of the time series data assumes that we are working with a stationary time series. The time series is almost certainly non-stationary. We can make it stationary by first differencing the series and using a statistical test to confirm that the result is stationary.
The seasonality in the series is seemingly year-to-year. Seasonal data can be differenced by subtracting the observation from the same time in the previous cycle, in this case the same month in the previous year. This does mean that we will lose the first year of observations as there is no prior year to difference with. The example below creates a deseasonalized version of the series and saves it to file stationary.csv.
# create and summarize stationary version of time series
from pandas import read_csv
from pandas import Series
from statsmodels.tsa.stattools import adfuller
from matplotlib import pyplot
# create a differenced series
def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return Series(diff)
series = read_csv('dataset.csv', header=None, index_col=0, parse_dates=True, squeeze=True)
X = series.values
X = X.astype('float32')
# difference data
months_in_year = 12
stationary = difference(X, months_in_year)
stationary.index = series.index[months_in_year:]
# check if stationary
result = adfuller(stationary)
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')
for key, value in result[4].items():
    print('\t%s: %.3f' % (key, value))
# save
stationary.to_csv('stationary.csv', header=False)
# plot
stationary.plot()
pyplot.show()
-----Result-----
ADF Statistic: -7.134898
p-value: 0.000000
Critical Values:
5%: -2.898
1%: -3.515
10%: -2.586
Running the example outputs the result of a statistical significance test of whether the differenced series is stationary, specifically the augmented Dickey-Fuller test. The results show that the test statistic value of -7.134898 is smaller than the critical value at 1% of -3.515. This suggests that we can reject the null hypothesis with a significance level of less than 1% (i.e. a low probability that the result is a statistical fluke). Rejecting the null hypothesis means that the process has no unit root, and in turn that the time series is stationary or does not have time-dependent structure.
For reference, the seasonal difference operation can be inverted by adding the observation for the same month the year before. This is needed in the case that predictions are made by a model fit on seasonally differenced data. The function to invert the seasonal difference operation is listed below for completeness.
# invert differenced value
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]
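As a quick sanity check (an illustrative aside, assuming the difference() function and the X array from the example above), reconstructing the series from the first year of observations plus the differenced values should reproduce the original data exactly:
# sketch: verify that inverse_difference() undoes the seasonal difference
months_in_year = 12
diff = difference(X, months_in_year)
# start from the first year, then rebuild each value from its difference
reconstructed = [x for x in X[:months_in_year]]
for value in diff:
    reconstructed.append(inverse_difference(reconstructed, value, months_in_year))
# reconstructed now matches X element for element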
A plot of the differenced dataset is also created. The plot does not show any obvious seasonality or trend, suggesting the seasonally differenced dataset is a good starting point for modeling. We will use this dataset as an input to the ARIMA model. It also suggests that no further differencing may be required, and that the d parameter may be set to 0.
Line plot of the differenced Champagne Sales dataset
The next step is to select the lag values for the Autoregression (AR) and Moving Average (MA) parameters, p and q respectively. We can do this by reviewing Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots. Note, we are now using the seasonally differenced stationary.csv as our dataset. This is because the manual seasonal differencing performed is different from the lag=1 differencing performed by the ARIMA model with the d parameter. The example below creates ACF and PACF plots for the series.
# ACF and PACF plots of time series
from pandas import read_csv
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.graphics.tsaplots import plot_pacf
from matplotlib import pyplot
series = read_csv('stationary.csv', header=None, index_col=0, parse_dates=True, squeeze=True)
pyplot.figure()
pyplot.subplot(211)
plot_acf(series, lags=25, ax=pyplot.gca())
pyplot.subplot(212)
plot_pacf(series, lags=25, ax=pyplot.gca())
pyplot.show()
-----Result-----
ACF and PACF plots of the differenced Champagne Sales dataset
Run the example and review the plots for insights into how to set the p and q variables for the ARIMA model. Below are some observations from the plots.
- The ACF shows a significant lag for 1 month.
- The PACF shows a significant lag for 1 month, with perhaps some significant lag at 12 and 13 months.
- Both the ACF and PACF show a drop-off at the same point, perhaps suggesting a mix of AR and MA.
A good starting point for both the p and q values is 1. The PACF plot also suggests that there is still some seasonality present in the differenced data. We may consider a better model of seasonality, such as modeling it directly and explicitly removing it from the data rather than relying on seasonal differencing; a sketch of this alternative follows below.
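For reference, one way to model the seasonality directly would be a classical decomposition, for example with seasonal_decompose() from statsmodels. This is a hedged sketch of the idea only and is not used in the models that follow; note that the period argument was named freq in older statsmodels versions.
# sketch: explicitly estimate and remove the seasonal component (not used below)
from pandas import read_csv
from statsmodels.tsa.seasonal import seasonal_decompose
series = read_csv('dataset.csv', header=None, index_col=0, parse_dates=True, squeeze=True)
# multiplicative decomposition, since the seasonal swings grow with the level
result = seasonal_decompose(series, model='multiplicative', period=12)
# dividing out the seasonal component leaves trend plus residual
deseasonalized = series / result.seasonal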
This quick analysis suggests an ARIMA(1,0,1) on the stationary data may be a good starting point.
Further experimentation showed that adding one level of differencing to the stationary data made the model more stable. The model can be extended to ARIMA(1,1,1).
The example below demonstrates the performance of this ARIMA model on the test harness.
# evaluate manually configured ARIMA model
from pandas import read_csv
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.arima_model import ARIMA
from math import sqrt
# create a differenced series
def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return diff
# invert differenced value
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]
# load data
series = read_csv('dataset.csv', header=None, index_col=0, parse_dates=True, squeeze=True)
# prepare data
X = series.values
X = X.astype('float32')
train_size = int(len(X) * 0.50)
train, test = X[0:train_size], X[train_size:]
# walk-forward validation
history = [x for x in train]
predictions = list()
for i in range(len(test)):
    # difference data
    months_in_year = 12
    diff = difference(history, months_in_year)
    # predict
    model = ARIMA(diff, order=(1,1,1))
    model_fit = model.fit(trend='nc', disp=0)
    yhat = model_fit.forecast()[0]
    yhat = inverse_difference(history, yhat, months_in_year)
    predictions.append(yhat)
    # observation
    obs = test[i]
    history.append(obs)
    print('>Predicted=%.3f, Expected=%.3f' % (yhat, obs))
# report performance
rmse = sqrt(mean_squared_error(test, predictions))
print('RMSE: %.3f' % rmse)
-----Result-----
...
>Predicted=3157.018, Expected=5010
>Predicted=4615.082, Expected=4874
>Predicted=4624.998, Expected=4633
>Predicted=2044.097, Expected=1659
>Predicted=5404.428, Expected=5951
RMSE: 956.942
2. Grid Search ARIMA Hyperparameters
The ACF and PACF plots suggest that an ARIMA(1,0,1) or similar may be the best that we can do. To confirm this analysis, we can grid search a suite of ARIMA hyperparameters and check that no models result in better out-of-sample RMSE performance.
In this section, we will search values of p, d, and q for combinations (skipping those that fail to converge), and find the combination that results in the best performance on the test set.
# grid search ARIMA parameters for time series
import warnings
from pandas import read_csv
from statsmodels.tsa.arima_model import ARIMA
from sklearn.metrics import mean_squared_error
from math import sqrt
import numpy
# create a differenced series
def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return numpy.array(diff)
# invert differenced value
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]
# evaluate an ARIMA model for a given order (p,d,q) and return RMSE
def evaluate_arima_model(X, arima_order):
    # prepare training dataset
    X = X.astype('float32')
    train_size = int(len(X) * 0.50)
    train, test = X[0:train_size], X[train_size:]
    history = [x for x in train]
    # make predictions
    predictions = list()
    for t in range(len(test)):
        # difference data
        months_in_year = 12
        diff = difference(history, months_in_year)
        model = ARIMA(diff, order=arima_order)
        model_fit = model.fit(trend='nc', disp=0)
        yhat = model_fit.forecast()[0]
        yhat = inverse_difference(history, yhat, months_in_year)
        predictions.append(yhat)
        history.append(test[t])
    # calculate out of sample error
    rmse = sqrt(mean_squared_error(test, predictions))
    return rmse
# evaluate combinations of p, d and q values for an ARIMA model
def evaluate_models(dataset, p_values, d_values, q_values):
    dataset = dataset.astype('float32')
    best_score, best_cfg = float("inf"), None
    for p in p_values:
        for d in d_values:
            for q in q_values:
                order = (p,d,q)
                try:
                    rmse = evaluate_arima_model(dataset, order)
                    if rmse < best_score:
                        best_score, best_cfg = rmse, order
                    print('ARIMA%s RMSE=%.3f' % (order,rmse))
                except:
                    continue
    print('Best ARIMA%s RMSE=%.3f' % (best_cfg, best_score))
# load dataset
series = read_csv('dataset.csv', header=None, index_col=0, parse_dates=True, squeeze=True)
# evaluate parameters
p_values = range(0, 7)
d_values = range(0, 3)
q_values = range(0, 7)
warnings.filterwarnings("ignore")
evaluate_models(series.values, p_values, d_values, q_values)
-----Result-----
...
ARIMA(5, 1, 2) RMSE=1003.200
ARIMA(5, 2, 1) RMSE=1053.728
ARIMA(6, 0, 0) RMSE=996.466
ARIMA(6, 1, 0) RMSE=1018.211
ARIMA(6, 1, 1) RMSE=1023.762
Best ARIMA(0, 0, 1) RMSE=939.464
We will select this ARIMA(0,0,1) model going forward.
3. Review Residual Errors
A good final check of a model is to review residual forecast errors. Ideally, the distribution of residual errors should be Gaussian with a zero mean. We can check this by using summary statistics and plots to investigate the residual errors from the ARIMA(0,0,1) model.
# summarize ARIMA forecast residuals
from pandas import read_csv
from pandas import DataFrame
from statsmodels.tsa.arima_model import ARIMA
from matplotlib import pyplot
# create a differenced series
def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return diff
# invert differenced value
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]
# load data
series = read_csv('dataset.csv', header=None, index_col=0, parse_dates=True, squeeze=True)
# prepare data
X = series.values
X = X.astype('float32')
train_size = int(len(X) * 0.50)
train, test = X[0:train_size], X[train_size:]
# walk-forward validation
history = [x for x in train]
predictions = list()
for i in range(len(test)):
    # difference data
    months_in_year = 12
    diff = difference(history, months_in_year)
    # predict
    model = ARIMA(diff, order=(0,0,1))
    model_fit = model.fit(trend='nc', disp=0)
    yhat = model_fit.forecast()[0]
    yhat = inverse_difference(history, yhat, months_in_year)
    predictions.append(yhat)
    # observation
    obs = test[i]
    history.append(obs)
# errors
residuals = [test[i]-predictions[i] for i in range(len(test))]
residuals = DataFrame(residuals)
print(residuals.describe())
# plot
pyplot.figure()
pyplot.subplot(211)
residuals.hist(ax=pyplot.gca())
pyplot.subplot(212)
residuals.plot(kind='kde', ax=pyplot.gca())
pyplot.show()
-----Result-----
count 47.000000
mean 165.904728
std 934.696199
min -2164.247449
25% -289.651596
50% 191.759548
75% 732.992187
max 2367.304748
The distribution of residual errors is also plotted. The graphs suggest a Gaussian-like distribution with a bumpy left tail, providing further evidence that perhaps a power transform might be worth exploring.
Density plots of residual errors on the Champagne Sales dataset
We could use this information to bias-correct predictions by adding the mean residual error of 165.904728 to each forecast made.
bias = 165.904728
yhat = bias + inverse_difference(history, yhat, months_in_year)
The performance of the predictions is improved very slightly from 939.464 to 924.699, which may or may not be significant.
Finally, density plots of the residual error do show a small shift towards zero. It is debatable whether this bias correction is worth it, but we will use it for now.
Density plots of residual errors of a bias-corrected model on the Champagne Sales dataset
H. Model Validation
After models have been developed and a final model selected, it must be validated and finalized. Validation is an optional part of the process, but one that provides a last check to ensure we have not fooled or misled ourselves. This section includes the following steps:
- Finalize Model: Train and save the final model.
- Make Prediction: Load the finalized model and make a prediction.
- Validate Model: Load and validate the final model.
1. Finalize Model
Finalizing the model involves fitting an ARIMA model on the entire dataset, in this case on a transformed version of the entire dataset. Once fit, the model can be saved to file for later use. The example below trains an ARIMA(0,0,1) model on the dataset and saves the whole fit object and the bias to file.
# save finalized model
from pandas import read_csv
from statsmodels.tsa.arima_model import ARIMA
import numpy
# monkey patch around bug in ARIMA class
def __getnewargs__(self):
    return ((self.endog),(self.k_lags, self.k_diff, self.k_ma))
ARIMA.__getnewargs__ = __getnewargs__
# create a differenced series
def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return diff
# load data
series = read_csv('dataset.csv', header=None, index_col=0, parse_dates=True, squeeze=True)
# prepare data
X = series.values
X = X.astype('float32')
# difference data
months_in_year = 12
diff = difference(X, months_in_year)
# fit model
model = ARIMA(diff, order=(0,0,1))
model_fit = model.fit(trend='nc', disp=0)
# bias constant, could be calculated from in-sample mean residual
bias = 165.904728
# save model
model_fit.save('model.pkl')
numpy.save('model_bias.npy', [bias])
-----Result-----
Running the example creates two local files:
- model.pkl: This is the ARIMAResults object from the call to ARIMA.fit(). It includes the coefficients and all other internal data returned when fitting the model.
- model_bias.npy: This is the bias value stored as a one-row, one-column NumPy array.
2. Make Prediction
A natural case may be to load the model and make a single forecast. This is relatively straightforward and involves restoring the saved model and the bias and calling the forecast() function. To invert the seasonal differencing, the historical data must also be loaded. The example below loads the model, makes a prediction for the next time step, and prints the prediction.
# load finalized model and make a prediction
from pandas import read_csv
from statsmodels.tsa.arima_model import ARIMAResults
import numpy
# invert differenced value
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]
series = read_csv('dataset.csv', header=None, index_col=0, parse_dates=True, squeeze=True)
months_in_year = 12
model_fit = ARIMAResults.load('model.pkl')
bias = numpy.load('model_bias.npy')
yhat = float(model_fit.forecast()[0])
yhat = bias + inverse_difference(series.values, yhat, months_in_year)
print('Predicted: %.3f' % yhat)
-----Result-----
Predicted: 6794.773
3. Validate Model
In the test harness section, we saved the final 12 months of the original dataset in a separate file to validate the final model. We can load this validation.csv file now and use it to see how well our model really performs on unseen data.
There are two ways we might proceed:
- Load the model and use it to forecast the next 12 months. The forecast beyond the first one or two months will quickly start to degrade in skill.
- Load the model and use it in a rolling-forecast manner, updating the transform and model for each time step. This is the preferred method, as it is how the model would be used in practice and it would achieve the best performance.
As with model evaluation in previous sections, we will make predictions in a rolling-forecast manner. This means that we will step over lead times in the validation dataset and take the observations as an update to the history.
# load and evaluate the finalized model on the validation dataset
from pandas import read_csv
from matplotlib import pyplot
from statsmodels.tsa.arima_model import ARIMA
from statsmodels.tsa.arima_model import ARIMAResults
from sklearn.metrics import mean_squared_error
from math import sqrt
import numpy
# create a differenced series
def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return diff
# invert differenced value
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]
# load and prepare datasets
dataset = read_csv('dataset.csv', header=None, index_col=0, parse_dates=True, squeeze=True)
X = dataset.values.astype('float32')
history = [x for x in X]
months_in_year = 12
validation = read_csv('validation.csv', header=None, index_col=0, parse_dates=True, squeeze=True)
y = validation.values.astype('float32')
# load model
model_fit = ARIMAResults.load('model.pkl')
bias = numpy.load('model_bias.npy')
# make first prediction
predictions = list()
yhat = float(model_fit.forecast()[0])
yhat = bias + inverse_difference(history, yhat, months_in_year)
predictions.append(yhat)
history.append(y[0])
print('>Predicted=%.3f, Expected=%.3f' % (yhat, y[0]))
# rolling forecasts
for i in range(1, len(y)):
    # difference data
    months_in_year = 12
    diff = difference(history, months_in_year)
    # predict
    model = ARIMA(diff, order=(0,0,1))
    model_fit = model.fit(trend='nc', disp=0)
    yhat = model_fit.forecast()[0]
    yhat = bias + inverse_difference(history, yhat, months_in_year)
    predictions.append(yhat)
    # observation
    obs = y[i]
    history.append(obs)
    print('>Predicted=%.3f, Expected=%.3f' % (yhat, obs))
# report performance
rmse = sqrt(mean_squared_error(y, predictions))
print('RMSE: %.3f' % rmse)
pyplot.plot(y)
pyplot.plot(predictions, color='red')
pyplot.show()
-----Result-----
>Predicted=6794.773, Expected=6981
>Predicted=10101.763, Expected=9851
>Predicted=13219.067, Expected=12670
>Predicted=3996.535, Expected=4348
>Predicted=3465.934, Expected=3564
>Predicted=4522.683, Expected=4577
>Predicted=4901.336, Expected=4788
>Predicted=5190.094, Expected=4618
>Predicted=4930.190, Expected=5312
>Predicted=4944.785, Expected=4298
>Predicted=1699.409, Expected=1413
>Predicted=6085.324, Expected=5877
RMSE: 361.110
Line plot of the expected values (blue) and predictions (red) for the validation dataset