# Leveraging Advanced Forecasting Techniques with StatsForecast: A Case Study

- Von Torben Windler

#### Share post:

## INTRODUCTION

*StatsForecast*-package developed by Nixtla. It provides a set of robust forecasting models specifically designed to handle irregular data. Find out what the StatsForecast package can do and which evaluation methods should actually be used for irregular data in this blog post!

## The challenge

### Problem statement

Our client provided us with a dataset spanning several years, detailing past events across different customers and parts. These events were rare occurrences, making the dataset highly intermittent. Our goal was to predict when these events might happen in the future, grouped by customer and part, to enable better resource allocation and planning. Although the future is mathematically independent from the past, forecasts from this dataset could give us a decent estimation of what could happen.### Description of the data

**Intermittent nature**: Events occurred very rarely, leading to sparse data points.

**Multiple customers and parts**: The predictions needed to be made for different customers and parts, requiring a segmented approach.

### Example:

On the right, you can see an example of what the data looks like when it is processed and ready to forecast:

`ds` is a weekly timestamp (we want weekly forecasts), `unique_id` represents the group (concatenated from part and customer), and `y` denotes the number of events in that specific week.

```
| ds | unique_id | y |
| --- | --- | --- |
| 2023-07-24 | part_a, customer_a | 1 |
| 2022-02-28 | part_b, customer_b | 2 |
| 2024-04-22 | part_b, customer_c | 1 |
| 2024-03-18 | part_b, customer_d | 1 |
| 2024-03-25 | part_b, customer_d | 0 |
| … | … | … |
| 2017-04-24 | part_y, customer_a | 0 |
| 2017-05-01 | part_y, customer_a | 1 |
| 2017-05-08 | part_y, customer_a | 0 |
| 2017-05-15 | part_y, customer_a | 1 |
| 2021-04-26 | part_b, customer_z | 2 |
```

## Solution approach

To address this challenge, I researched several approaches before settling on the StatsForecast package. Below are some of the techniques I evaluated:

**1. Classical Decomposition**

This technique breaks down time series data into trend, seasonal, and residual components. While it was useful for understanding patterns, it did not handle intermittency well.

**2. Croston’s Optimized Model**

The Croston Optimized model is an advanced forecasting method for intermittent demand data, combining exponential smoothing to capture trends and seasonality with separate estimations for non-zero demand occurrences and sizes. This approach helps balance over- and under-forecasting and provides more accurate predictions for sporadic demand patterns.

**3. IMAPA (Intermittent Multiple Aggregation Prediction Algorithm)**

The Intermittent Multiple Aggregation Prediction Algorithm (IMAPA) is an algorithm that forecasts future values of intermittent time series by aggregating the time series values at regular intervals and then using any forecast model, such as optimized Simple Exponential Smoothing (SES), to predict these aggregated values. IMAPA is robust to missing data, computationally efficient, and easy to implement, making it effective for various intermittent time series forecasting tasks.

**4. ADIDA (Aggregate-Disaggregate Intermittent Demand Approach)**

The ADIDA model uses Simple Exponential Smoothing (SES) on temporally aggregated data to forecast intermittent demand, where demand is aggregated into buckets of mean inter-demand interval size. Forecasts are then disaggregated to the original time periods.

**5. TSB (Teunter-Syntetos-Babai)**

The TSB model is an advanced method used in inventory management and demand forecasting for products with intermittent demand, proposed as an extension to Croston’s model. It updates demand probability every period rather than only when a demand occurs, making it more suitable for managing risk of obsolescence in data with many zeros.

## THE STATSFORECAST PACKAGE

```
import polars as pl
from statsforecast import StatsForecast
from statsforecast.models import ADIDA, CrostonClassic, CrostonOptimized, CrostonSBA, IMAPA, TSB
# Create the models (can be extended with hyperparameter tuning)
models = [
ADIDA(),
CrostonClassic(),
CrostonOptimized(),
CrostonSBA(),
IMAPA(),
TSB(alpha_d=0.2, alpha_p=0.2)
]
# Store the names of the models to use them later in polars colum selection
model_names = [m.alias for m in models]
# We want weekly forecasts
sf = StatsForecast(models=models, freq='1w', n_jobs=-1, verbose=True)
sf.fit(train)
FORECAST_HORIZON = 52 # weeks
forecasts_df = sf.predict(h=FORECAST_HORIZON)
# join the forecasts with the target values ('y') from the test data for evaluation
forecasts_df = (
forecasts_df
.join(test, on=["unique_id", "ds"], how="left")
.fill_null(0)
)
```

This concise code snippet was the basis for me to quickly set up and run multiple models on the dataset. In fact, we tuned the hyperparameters of every model with proven methods.

## Handling Uniform Distributions

The forecasts produced by these models were originally discrete uniform distributions over the forecast horizon. To fulfill the requirement of having specific estimated times for predicted events, I have chosen the following approach:### Cumulative Probabilities Modulo 1

To solve this problem, I cumulated all probabilities over time and calculated the cumulative value modulo 1. This allowed me to compare the original probability values with these new values. If the original probability is greater than or equal to the modulo value, this indicates an event at that point in time.### Implementation steps

1. cumulate probabilities: Addition of the probabilities for each forecast period.### Code example

```
forecasts_df = (
forecasts_df
# Calculate the value with modulo 1, to obtain some kind of 'virtual' probability
.with_columns(pl.col(model_names).cum_sum().mod(1).over("unique_id").name.suffix("_cum"))
# If the modulo values are smaller than the original, this indicates
# an event with the demand from the original column rounded to the next integer.
# Otherwise we do not have an event and set the value to None.
.with_columns([
pl.when(pl.col(m) >= pl.col(f"{m}_cum"))
.then(pl.col(m).ceil())
.otherwise(None)
for m in model_names]
)
# Here we do not need the cumulated values anymore
.drop(pl.col("^*_cum$"))
)
```

### Validity of the Approach

This approach effectively converts a uniform distribution into discrete event predictions based on cumulative probabilities. By comparing each forecasted probability to its corresponding cumulative value modulo 1, we can identify specific points in time where events are likely to occur.

## Evaluation of the Performance

### Cumulative Forecast Error (CFE)

##### Implementation

```
def cfe(df: pl.DataFrame, model_names: list[str]) -> pl.DataFrame:
"""Calculate the Cumulative Forecast Error (CFE) for multiple models in a Polars DataFrame.
This function calculates the Cumulative Forecast Error (CFE) for each forecast model in a DataFrame. CFE is defined as the cumulative sum of the difference between actual values (`y`) and forecast values (`
````). The result includes the minimum, maximum, and last CFE value for each unique identifier.
The function takes a Polars DataFrame with the following structure:
- `unique_id`: A unique identifier for each data point
- `y`: The actual target value
- `model_names` (list of strings): Column names representing different model predictions
Parameters:
df (pl.DataFrame): Input DataFrame containing the data. It must include a column named `y` representing the actual values, and columns named `` for each model in the `model_names` list.
model_names (list[str]): A list of strings representing the names of the forecast models.
Returns:
pl.DataFrame: Output DataFrame with CFE results. It contains columns for `unique_id`, `model`, and statistics such as `cfe_min` (minimum CFE), `cfe_max` (maximum CFE), and `cfe_last` (last CFE) for each unique identifier and model.
"""
df = (
df
# Calculate Cumulative Forecast Error for each model per unique_id
.with_columns((pl.col(model_names) - pl.col("y")).cum_sum().over("unique_id"))
# Unpivot the DataFrame to have a single column for CFE values and corresponding model names
.unpivot(model_names, index="unique_id", variable_name="model", value_name="cfe")
# Group by unique_id and model, then aggregate to get min, max, and last CFE value
.group_by("unique_id", "model")
.agg(
pl.min("cfe").alias("cfe_min"),
pl.max("cfe").alias("cfe_max"),
pl.last("cfe").abs().alias("cfe_last")
)
)
return df
cfe_df = forecasts_df.pipe(cfe, model_names=model_names)

### Stock-keeping-oriented Prediction Error Costs(SPEC)

The *Stock-keeping-oriented Prediction Error Costs* (SPEC) measures the prediction accuracy by comparing actual events and forecast in the form of virtually incurred costs over the forecast horizon. If the forecasts predicts events before the target event occures, we create **stock keeping costs**. The other way round we get **opportunity costs**. The relationship between both errors are weighted with $\alpha \in [0, 1]$. Higher alphas weigh the opportunity costs more, while lower alphas give more weight to the stock-keeping costs. In this scenario, both of them are important, hence we set $\alpha = 0.5$.

In addition, we slightly changed the *implementation of the authors* to a custom metric, which performes much faster and which is providing us not only the sum of stock-keeping and opportunity costs, but also both individual costs:

##### Implementation

```
def spec(df: pl.DataFrame, model_names: list[str], alpha: float = 0.5):
"""Stock-keeping-oriented Prediction Error Costs (SPEC)
Read more in the :ref:`https://arxiv.org/abs/2004.10537`.
The function takes a Polars DataFrame with the following structure:
- `unique_id`: A unique identifier for each data point
- `y`: The actual target value
- `model_names` (list of strings): Column names representing different model predictions
Parameters:
df (pl.DataFrame): Input DataFrame containing the data. It must include a column named `y` representing the actual values, and columns named `
```` for each model in the `model_names` list.
model_names (list[str]): A list of strings representing the names of the forecast models.
alpha (float): Provides the weight of the opportunity costs. The weight of stock-keeping costs is taken as (1 - alpha), hence alpha should be in the interval [0, 1].
Returns
-------
pl.DataFrame: Output DataFrame with SPEC results. It contains columns for `unique_id`, `model`, and the SPEC error for each unique identifier and model. The SPEC error is also provided as opportunity costs, stock-keeping costs, and the sum of both.
"""
df = (
df
.sort("unique_id", "ds")
# Calculate cum sum for every prediction and the target
.with_columns(pl.col(["y"] + model_names).cum_sum().over("unique_id").name.suffix("_cum_sum"))
# Compute differences in both directions from the predictions cumsum and the target cumsum
.with_columns(
*[(-pl.col(f"{m}_cum_sum") + pl.col("y_cum_sum")).alias(f"{m}_diff_y_f") for m in model_names],
*[(pl.col(f"{m}_cum_sum") - pl.col("y_cum_sum")).alias(f"{m}_diff_f_y") for m in model_names]
)
# Multiply the first difference with alpha (opportunity costs)
# and the second difference with (1 - alpha) (stock-keeping costs)
.with_columns(
*[(pl.col(f"{m}_diff_y_f") * alpha).clip(lower_bound=0).alias(f"{m}_o") for m in model_names],
*[(pl.col(f"{m}_diff_f_y") * (1 - alpha)).clip(lower_bound=0).alias(f"{m}_s") for m in model_names]
)
# Add the opportunity and stock-keeping costs
.with_columns([(pl.sum_horizontal(f"{m}_o", f"{m}_s")).alias(f"{m}_total") for m in model_names])
# Group by unique_id and model, then aggregate to get average SPEC values
.group_by("unique_id")
.agg(*[pl.col(f"{m}_total", f"{m}_o", f"{m}_s").mean() for m in model_names])
# Unpivot the DataFrame to have separate columns for SPEC values and corresponding model names
.unpivot([x for xs in [[f"{m}_total", f"{m}_o", f"{m}_s"] for m in model_names] for x in xs], index="unique_id", variable_name="model", value_name="spec")
# extract the different spec value indicators (total, o, s) to a separate column
.with_columns(pl.col("model").str.split("_").list.to_struct(fields=["model", "error"]))
.unnest("model")
# pivot by the different splits (total, o, s) in the error column
.pivot("error", index=["unique_id", "model"], values="spec")
# rename accordingly
.rename({"total": "spec_total", "o": "spec_o", "s": "spec_s"})
)
return df
spec_df = forecasts_df.pipe(spec, model_names=model_names)

### CFE and SPEC together

**Trend analysis**: CFE helps identify trends in the error over time, allowing you to see if the model consistently underpredicts or overpredicts.**Magnitude and direction**: The minimum and maximum values of CFE can provide insights into the overall performance and direction of the errors.

**Time-based Deviation**: SPEC calculates prediction errors based on their effect on future inventory levels, penalizing errors that could lead to high holding costs or stockouts.**Weighted costs**: It encourages businesses to focus on minimizing both overstock and understock, leading to improved operational efficiency.

## CONCLUSION

## NOTE

As you can see, I use *Polars* instead of Pandas, Numpy, etc. for all computations. At m2hycon we have been using Polars since early 2023 where we saw the potential and impact over other libraries like Pandas, Dask, Ray and even Apache Spark. A blog post about how we made the switch and its impact on our productivity and results is coming soon!

#### Torben Windler

Lead AI Engineer