
Leveraging Advanced Forecasting Techniques with StatsForecast: A Case Study


INTRODUCTION

We recently solved an interesting problem for a customer: rare events (the failure of certain components) were to be predicted on the basis of irregular historical data. The predictions were to be grouped by customer and component. When I was working on the task, I used the StatsForecast package developed by Nixtla. It provides a set of robust forecasting models specifically designed to handle irregular data. Find out what the StatsForecast package can do and which evaluation methods should actually be used for irregular data in this blog post!

The challenge

Problem statement

Our client provided us with a dataset spanning several years, detailing past events across different customers and parts. These events were rare occurrences, making the dataset highly intermittent. Our goal was to predict when these events might happen in the future, grouped by customer and part, to enable better resource allocation and planning. Although the future is never fully determined by the past, forecasts from this dataset could give us a decent estimate of what might happen.

Description of the data

• Intermittent nature: events occurred very rarely, leading to sparse data points.
• Multiple customers and parts: predictions needed to be made for different customers and parts, requiring a segmented approach.

Example:

Below you can see an example of what the data looks like once it is processed and ready to forecast:

`ds` is a weekly timestamp (we want weekly forecasts), `unique_id` represents the group (concatenated from part and customer), and `y` denotes the number of events in that specific week.

| ds         | unique_id           | y   |
| ---        | ---                 | --- |
| 2023-07-24 | part_a, customer_a  | 1   |
| 2022-02-28 | part_b, customer_b  | 2   |
| 2024-04-22 | part_b, customer_c  | 1   |
| 2024-03-18 | part_b, customer_d  | 1   |
| 2024-03-25 | part_b, customer_d  | 0   |
| …          | …                   | …   |
| 2017-04-24 | part_y, customer_a  | 0   |
| 2017-05-01 | part_y, customer_a  | 1   |
| 2017-05-08 | part_y, customer_a  | 0   |
| 2017-05-15 | part_y, customer_a  | 1   |
| 2021-04-26 | part_b, customer_z  | 2   |
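
The data did not arrive in this shape, of course. Below is a minimal sketch of how such a weekly frame could be built from a raw event log with Polars; the raw schema and the inline example rows are illustrative assumptions, not our client’s data:

import polars as pl

# Hypothetical raw event log: one row per observed failure
raw = pl.DataFrame({
    "part":       ["part_y", "part_y", "part_b"],
    "customer":   ["customer_a", "customer_a", "customer_b"],
    "event_time": ["2017-05-03", "2017-05-17", "2022-03-01"],
}).with_columns(pl.col("event_time").str.to_date())

weekly = (
    raw
    # Build the group key and snap each event to the Monday of its week
    .with_columns(
        pl.concat_str(["part", "customer"], separator=", ").alias("unique_id"),
        pl.col("event_time").dt.truncate("1w").alias("ds"),
    )
    # Count the events per group and week
    .group_by("unique_id", "ds")
    .agg(pl.len().alias("y"))
    # Insert the zero-event weeks between each group's first and last event
    .sort("unique_id", "ds")
    .upsample(time_column="ds", every="1w", group_by="unique_id")
    .with_columns(pl.col("unique_id").forward_fill(), pl.col("y").fill_null(0))
)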

Solution approach

To address this challenge, I researched several approaches before settling on the StatsForecast package. Below are some of the techniques I evaluated:

1. Classical Decomposition
This technique breaks down time series data into trend, seasonal, and residual components. While it was useful for understanding patterns, it did not handle intermittency well.
 
2. Croston’s Optimized Model
The Croston Optimized model is an advanced forecasting method for intermittent demand data that applies exponential smoothing separately to the non-zero demand sizes and to the intervals between demand occurrences, with the smoothing parameters chosen automatically. This approach helps balance over- and under-forecasting and provides more accurate predictions for sporadic demand patterns.
 
3. IMAPA (Intermittent Multiple Aggregation Prediction Algorithm)
The Intermittent Multiple Aggregation Prediction Algorithm (IMAPA) forecasts future values of intermittent time series by aggregating the series at multiple temporal levels and then using a forecast model, such as optimized Simple Exponential Smoothing (SES), to predict these aggregated values. IMAPA is robust to missing data, computationally efficient, and easy to implement, making it effective for various intermittent time series forecasting tasks.
 
4. ADIDA (Aggregate-Disaggregate Intermittent Demand Approach)
The ADIDA model uses Simple Exponential Smoothing (SES) on temporally aggregated data to forecast intermittent demand, where demand is aggregated into buckets of mean inter-demand interval size. Forecasts are then disaggregated back to the original time periods.
 
5. TSB (Teunter-Syntetos-Babai)
The TSB model is an advanced method used in inventory management and demand forecasting for products with intermittent demand, proposed as an extension to Croston’s model. It updates the demand probability every period rather than only when a demand occurs, making it more suitable for managing the risk of obsolescence in data with many zeros; the update equations are sketched right after this list.
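
To give a feeling for how these models work internally, here is the TSB update rule in its usual textbook form (a sketch of the general method, not code from our project). With smoothing parameters $\alpha_d$ (demand size) and $\alpha_p$ (demand probability), the estimates $z_t$ and $p_t$ are updated every period:

$$z_{t+1} = z_t + \alpha_d (y_t - z_t), \qquad p_{t+1} = p_t + \alpha_p (1 - p_t) \qquad \text{if } y_t > 0,$$

$$z_{t+1} = z_t, \qquad p_{t+1} = (1 - \alpha_p) \, p_t \qquad \text{if } y_t = 0,$$

and the forecast is $\hat{y}_{t+1} = p_{t+1} \cdot z_{t+1}$. Because $p_t$ decays in every zero-demand period, the forecast fades out for items that stop occurring, which is exactly the obsolescence behaviour mentioned above.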

THE STATSFORECAST PACKAGE

After evaluating these models, I stumbled across the StatsForecast package, which offers implementations of all the methods above in just a few lines of code. Here is a very simple example of how one could implement it:
import polars as pl
from statsforecast import StatsForecast
from statsforecast.models import ADIDA, CrostonClassic, CrostonOptimized, CrostonSBA, IMAPA, TSB

# Create the models (can be extended with hyperparameter tuning)
models = [
    ADIDA(),
    CrostonClassic(),
    CrostonOptimized(),
    CrostonSBA(),
    IMAPA(),
    TSB(alpha_d=0.2, alpha_p=0.2)
]

# Store the names of the models to use them later in polars column selection
model_names = [m.alias for m in models]

# We want weekly forecasts
sf = StatsForecast(models=models, freq='1w', n_jobs=-1, verbose=True)

sf.fit(train)
FORECAST_HORIZON = 52   # weeks

forecasts_df = sf.predict(h=FORECAST_HORIZON)

# join the forecasts with the target values ('y') from the test data for evaluation
forecasts_df = (
    forecasts_df
    .join(test, on=["unique_id", "ds"], how="left")
    .fill_null(0)
)
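
The snippet above assumes pre-built `train` and `test` frames. Here is a minimal sketch of a temporal split, assuming the weekly frame from earlier (called `weekly` in the aggregation sketch above) and an illustrative holdout of the last 52 weeks to match the forecast horizon:

from datetime import timedelta

# Hold out the last 52 weeks for evaluation (illustrative choice)
cutoff = weekly.select(pl.col("ds").max()).item() - timedelta(weeks=52)
train = weekly.filter(pl.col("ds") <= cutoff)
test = weekly.filter(pl.col("ds") > cutoff)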

This concise code snippet was the basis for quickly setting up and running multiple models on the dataset. In fact, we tuned the hyperparameters of every model with proven methods.
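
For the tuning itself, one option is StatsForecast’s built-in rolling-origin cross-validation; a small sketch with illustrative window settings:

# Evaluate every model on three rolling 52-week windows of the training data
cv_df = sf.cross_validation(
    df=train,
    h=FORECAST_HORIZON,
    step_size=FORECAST_HORIZON,
    n_windows=3,
)

The resulting frame contains one forecast per model and cutoff window, which can then be scored with the metrics described further below.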

Handling Uniform Distributions

The forecasts produced by these models were originally discrete uniform distributions over the forecast horizon. To fulfill the requirement of having specific estimated times for predicted events, I chose the following approach:

Cumulative Probabilities Modulo 1

To solve this problem, I cumulated all probabilities over time and calculated the cumulative value modulo 1. This allowed me to compare the original probability values with these new values. If the original probability is greater than or equal to the modulo value, this indicates an event at that point in time.

Implementation steps

1. Cumulate probabilities: add up the probabilities for each forecast period.
2. Calculate modulo 1: compute the cumulative value modulo 1.
3. Predict events: if the original probability is greater than or equal to the modulo value, predict an event at that time.
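
To make this concrete: suppose a model forecasts a constant weekly probability of 0.3. The cumulative sums are 0.3, 0.6, 0.9, 1.2, 1.5, 1.8, 2.1, …; modulo 1 these become 0.3, 0.6, 0.9, 0.2, 0.5, 0.8, 0.1, …. The original value 0.3 is greater than or equal to the modulo value in weeks 1, 4, and 7, so one event is predicted roughly every $1/0.3 \approx 3.3$ weeks, which matches the forecasted rate.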

Code example

Here’s a Python code snippet to illustrate this process:
forecasts_df = (
    forecasts_df
    # Calculate the value with modulo 1, to obtain some kind of 'virtual' probability
    .with_columns(pl.col(model_names).cum_sum().mod(1).over("unique_id").name.suffix("_cum"))
    # If the modulo value is less than or equal to the original value, this
    # indicates an event, with the demand from the original column rounded
    # up to the next integer.
    # Otherwise we do not have an event and set the value to None.
    .with_columns([
        pl.when(pl.col(m) >= pl.col(f"{m}_cum"))
        .then(pl.col(m).ceil())
        .otherwise(None)
        for m in model_names]
    )
    # Here we do not need the cumulated values anymore
    .drop([f"{m}_cum" for m in model_names])
)

Validity of the Approach

This approach effectively converts a uniform distribution into discrete event predictions based on cumulative probabilities. By comparing each forecasted probability to its corresponding cumulative value modulo 1, we can identify specific points in time where events are likely to occur.

Evaluation of the Performance

To evaluate the prediction results, I used two metrics that are essential for irregular, sporadic predictions: the Cumulative Forecast Error (CFE) and the Stock-keeping-oriented Prediction Error Costs (SPEC). Standard metrics such as MAPE or RMSE are not really suitable for these forecasts, as they do not take sufficient account of time shifts or cost-related aspects.

Cumulative Forecast Error (CFE)

The Cumulative Forecast Error (CFE) measures the cumulative sum of the difference between predicted values (`<Model>`) and actual values (`y`). It provides insights into how well a model’s forecasts align with the actual outcomes over time.
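
With actuals $y_i$ and forecasts $\hat{y}_i$, and following the sign convention of the implementation below, the running error after period $t$ is

$$\mathrm{CFE}_t = \sum_{i=1}^{t} (\hat{y}_i - y_i),$$

so positive values indicate over-forecasting and negative values indicate under-forecasting.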
Implementation
def cfe(df: pl.DataFrame, model_names: list[str]) -> pl.DataFrame:
    """Calculate the Cumulative Forecast Error (CFE) for multiple models in a Polars DataFrame.
    
    This function calculates the Cumulative Forecast Error (CFE) for each forecast model in a DataFrame. CFE is defined here as the cumulative sum of the difference between forecast values and actual values (`y`). The result includes the minimum, maximum, and last CFE value for each unique identifier.

    The function takes a Polars DataFrame with the following structure:
    - `unique_id`: A unique identifier for each data point
    - `ds`: The timestamp of each observation
    - `y`: The actual target value
    - `model_names` (list of strings): Column names representing different model predictions

    Parameters:
        df (pl.DataFrame): Input DataFrame containing the data. It must include a column named `y` representing the actual values, and one column per model in the `model_names` list.
        
        model_names (list[str]): A list of strings representing the names of the forecast models.

    Returns:
        pl.DataFrame: Output DataFrame with CFE results. It contains columns for `unique_id`, `model`, and statistics such as `cfe_min` (minimum CFE), `cfe_max` (maximum CFE), and `cfe_last` (absolute value of the last CFE) for each unique identifier and model.
    """
    df = (
        df
        # Ensure chronological order within each group before cumulating
        .sort("unique_id", "ds")
        # Calculate Cumulative Forecast Error for each model per unique_id
        .with_columns((pl.col(model_names) - pl.col("y")).cum_sum().over("unique_id"))
        # Unpivot the DataFrame to have a single column for CFE values and corresponding model names
        .unpivot(model_names, index="unique_id", variable_name="model", value_name="cfe")
        # Group by unique_id and model, then aggregate to get min, max, and last CFE value
        .group_by("unique_id", "model")
        .agg(
            pl.min("cfe").alias("cfe_min"),
            pl.max("cfe").alias("cfe_max"),
            pl.last("cfe").abs().alias("cfe_last")
        )
    )
    return df

cfe_df = forecasts_df.pipe(cfe, model_names=model_names)

Stock-keeping-oriented Prediction Error Costs (SPEC)

The Stock-keeping-oriented Prediction Error Costs (SPEC) metric measures prediction accuracy by comparing actual events and forecasts in the form of virtually incurred costs over the forecast horizon. If the forecast predicts events before the target event occurs, we incur stock-keeping costs. The other way round, we incur opportunity costs. The relationship between both errors is weighted with $\alpha \in [0, 1]$: higher alphas weigh the opportunity costs more, while lower alphas give more weight to the stock-keeping costs. In this scenario, both of them are important, hence we set $\alpha = 0.5$.

In addition, we slightly changed the authors’ implementation into a custom metric, which performs much faster and provides not only the sum of stock-keeping and opportunity costs, but also both individual costs:
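
Expressed in terms of the cumulative actuals $Y_t = \sum_{i \le t} y_i$ and the cumulative forecasts $F_t = \sum_{i \le t} f_i$, the per-period costs computed below are

$$o_t = \alpha \cdot \max(0, Y_t - F_t), \qquad s_t = (1 - \alpha) \cdot \max(0, F_t - Y_t),$$

and `spec_o`, `spec_s`, and `spec_total` are the means of $o_t$, $s_t$, and $o_t + s_t$ over the horizon. This is a simplification of the exact formula from the paper, but it preserves the trade-off between the two cost types.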

Imple­men­ta­tion
def spec(df: pl.DataFrame, model_names: list[str], alpha: float = 0.5):
    """Stock-keeping-oriented Prediction Error Costs (SPEC)
    Read more in the paper: https://arxiv.org/abs/2004.10537

    The function takes a Polars DataFrame with the following structure:
    - `unique_id`: A unique identifier for each data point
    - `ds`: The timestamp of each observation
    - `y`: The actual target value
    - `model_names` (list of strings): Column names representing different model predictions

    Parameters:
        df (pl.DataFrame): Input DataFrame containing the data. It must include a column named `y` representing the actual values, and one column per model in the `model_names` list.
        model_names (list[str]): A list of strings representing the names of the forecast models.
        alpha (float): Provides the weight of the opportunity costs. The weight of stock-keeping costs is taken as (1 - alpha), hence alpha should be in the interval [0, 1].

    Returns:
        pl.DataFrame: Output DataFrame with SPEC results. It contains columns for `unique_id`, `model`, and the SPEC error for each unique identifier and model. The SPEC error is provided as opportunity costs (`spec_o`), stock-keeping costs (`spec_s`), and the sum of both (`spec_total`).

    """
    df = (
        df
        .sort("unique_id", "ds")
        # Calculate cum sum for every prediction and the target
        .with_columns(pl.col(["y"] + model_names).cum_sum().over("unique_id").name.suffix("_cum_sum"))
        # Compute differences in both directions from the predictions cumsum and the target cumsum
        .with_columns(
            *[(-pl.col(f"{m}_cum_sum") + pl.col("y_cum_sum")).alias(f"{m}_diff_y_f") for m in model_names],
            *[(pl.col(f"{m}_cum_sum") - pl.col("y_cum_sum")).alias(f"{m}_diff_f_y") for m in model_names]
        )
        # Multiply the first difference with alpha (opportunity costs)
        # and the second difference with (1 - alpha) (stock-keeping costs)
        .with_columns(
            *[(pl.col(f"{m}_diff_y_f") * alpha).clip(lower_bound=0).alias(f"{m}_o") for m in model_names],
            *[(pl.col(f"{m}_diff_f_y") * (1 - alpha)).clip(lower_bound=0).alias(f"{m}_s") for m in model_names]
        )
        # Add the opportunity and stock-keeping costs
        .with_columns([(pl.sum_horizontal(f"{m}_o", f"{m}_s")).alias(f"{m}_total") for m in model_names])
        # Group by unique_id and model, then aggregate to get average SPEC values
        .group_by("unique_id")
        .agg(*[pl.col(f"{m}_total", f"{m}_o", f"{m}_s").mean() for m in model_names])
        # Unpivot the DataFrame into long format, with one column for the SPEC values and one for the model/error labels
        .unpivot([x for xs in [[f"{m}_total", f"{m}_o", f"{m}_s"] for m in model_names] for x in xs], index="unique_id", variable_name="model", value_name="spec")
        # extract the different spec value indicators (total, o, s) to a separate column
        .with_columns(pl.col("model").str.split("_").list.to_struct(fields=["model", "error"]))
        .unnest("model")
        # pivot by the different splits (total, o, s) in the error column
        .pivot("error", index=["unique_id", "model"], values="spec")
        # rename accordingly
        .rename({"total": "spec_total", "o": "spec_o", "s": "spec_s"})
    )
    return df

spec_df = forecasts_df.pipe(spec, model_names=model_names)

CFE and SPEC together

Using both CFE and SPEC provides a comprehensive evaluation of forecast models. Here’s why:
1. CFE (Cumulative Forecast Error):
  • Trend analysis: CFE helps identify trends in the error over time, allowing you to see if the model consistently underpredicts or overpredicts.
  • Magnitude and direction: the minimum and maximum values of CFE provide insights into the overall performance and direction of the errors.
2. SPEC (Stock-keeping-oriented Prediction Error Costs):
  • Time-based deviation: SPEC calculates prediction errors based on their effect on future inventory levels, penalizing errors that could lead to high holding costs or stockouts.
  • Weighted costs: it encourages businesses to focus on minimizing both overstock and understock, leading to improved operational efficiency.

By combining both metrics, you can make a more informed decision about which model to use based on your specific needs. If understanding trends over time is crucial, CFE could provide additional insights. If the differences between prediction and target times are of interest, SPEC could be of higher value.
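
As a final step, one could join both result frames and rank the models by their average errors across all groups; a minimal sketch (the sort order is an illustrative choice, not a fixed rule):

model_ranking = (
    cfe_df
    .join(spec_df, on=["unique_id", "model"])
    # Average each metric over all unique_ids per model
    .group_by("model")
    .agg(pl.col("cfe_last", "spec_total", "spec_o", "spec_s").mean())
    # Rank primarily by total SPEC costs, break ties with the final absolute CFE
    .sort("spec_total", "cfe_last")
)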

CONCLUSION

Our approach using the StatsForecast package and our handling of uniform distributions has provided a robust solution for predicting rare events. The StatsForecast package has made a significant contribution to our ability to use various forecasting models easily and efficiently. The evaluation methods show that you need to know exactly which methods to use and when, rather than blindly choosing one. If you have any questions about predictions or other mathematical issues, feel free to contact us!

NOTE

As you can see, I use Polars instead of Pandas, NumPy, etc. for all computations. At m2hycon we have been using Polars since early 2023, when we saw its potential and impact compared to other libraries like Pandas, Dask, Ray, and even Apache Spark. A blog post about how we made the switch and its impact on our productivity and results is coming soon!


Torben Windler

Lead AI Engineer
