
Effortless Time Series Feature Generation: A Practical Approach for Simple Models
- By Mark Willoughby
Among all the steps involved in time series analysis and forecasting, feature generation is one of the most important. Time series features capture trend, seasonality, lags, and rolling statistics that help a model comprehend temporal dynamics.
In this post, we will demonstrate some practical and simple techniques for time series feature generation. We will avoid deep learning complications and focus on effective, simple approaches to create features that empower simpler models to deliver robust, explainable predictions.
How to generate features: feature engineering vs. deep learning
While deep learning models can automatically learn complex representations from raw time series data and can be trained directly with limited or even no feature transformation, more classical models such as linear regression and decision trees rely heavily on well-engineered features. These models perform best when they are fed with meaningful features extracted from the raw data; feature engineering is then an essential ingredient for good model performance and explainability.
The good news is that you don’t need sophisticated AI to generate meaningful time series features. Instead, tools and libraries focused on deterministic functional transformations make it possible to extract them with very little effort: these efficient, interpretable methods are ideal for simple machine learning models.
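As a minimal illustration of what is meant by deterministic transformations, here is a sketch in plain Python (with hypothetical sensor readings) of two of the simplest time series features: a lag feature and a rolling mean.

```python
def lag(series, k):
    """Shift the series by k steps; the first k positions have no lagged value."""
    return [None] * k + series[:-k]

def rolling_mean(series, window):
    """Mean over a trailing window; positions before a full window are None."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(series[i + 1 - window : i + 1]) / window)
    return out

readings = [2.0, 4.0, 6.0, 8.0, 10.0]  # hypothetical sensor values
print(lag(readings, 1))           # [None, 2.0, 4.0, 6.0, 8.0]
print(rolling_mean(readings, 2))  # [None, 3.0, 5.0, 7.0, 9.0]
```

Both transformations are fully determined by the input and a small parameter (the lag order, the window size), which is what makes the resulting features cheap to compute and easy to interpret.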
Python libraries for generating time series features
A few libraries can automate the process of extracting features from time series data. Among the best known are Catch22, which provides 22 carefully curated time series features, and tsfresh, a comprehensive library for extracting a wide range of time series features. While these tools are extremely powerful, they can be very computationally intensive and are therefore not as well suited to very large data sets or the real-time applications common in industry.
In this post, we will focus on functime, a lightweight and efficient open-source library for fast feature generation, built on Rust and seamlessly integrated with Polars. functime offers an excellent balance between computational speed and flexibility, making it ideal for scenarios where simplicity and performance are crucial. It is optimized for effortless calculation of statistical features, including lags, over a desired periodicity. With functime, features for simpler models can be generated quickly, without the complexity and overhead of more elaborate tools.
Use case: Predicting the shutdown of a water pump
For this use case, we aim to predict potential shutdowns of a water pump using time series data. The data set consists of raw sensor readings collected at regular intervals from 52 sensors, together with a timestamp and a status label (machine_status). Each sensor records a specific aspect of the pump’s operation, such as pressure, temperature, or flow rate.
The dataset is structured as follows:
timestamp # timestamp of each observation
sensor_00 # readings from sensor 00
...
sensor_51 # readings from sensor 51
machine_status # operating status of the pump (NORMAL, BROKEN, or RECOVERING)
The data for our example comes from the use case of the same name on Kaggle.
import os
from pathlib import Path

import polars as pl

ts_sensor_path = Path(os.path.abspath("")).parents[1] / "data" / "ts_data" / "ts_sensor_data.csv"
ts_sensor_data = pl.read_csv(source=ts_sensor_path)
ts_sensor_data.head()

Create features with functime
With functime, we take on the central challenge of modeling: generating time series features. The tool computes statistical features over specified time windows and helps transform the raw data into a set of intuitive characteristics. For this demonstration, we calculate statistical features such as the absolute maximum and the root mean square over six-hour windows for each sensor. 312 features were computed from the 52 sensors in just 73 milliseconds, showcasing the efficiency of the feature generation process. This rapid computation makes it feasible to handle real-time or high-frequency sensor data streams without significant computational effort. The computed features can be used directly as input for classical models such as linear regression, random forests, or gradient-boosted trees.
def generate_features_for_timeseries(column_name: str) -> dict:
    ts = pl.col(column_name).ts
    return {
        f"mean_n_absolute_max_{column_name}": ts.mean_n_absolute_max(n_maxima=3),
        f"range_over_mean_{column_name}": ts.range_over_mean(),
        f"root_mean_square_{column_name}": ts.root_mean_square(),
        f"first_location_of_maximum_{column_name}": ts.first_location_of_maximum(),
        f"last_location_of_maximum_{column_name}": ts.last_location_of_maximum(),
        f"absolute_maximum_{column_name}": ts.absolute_maximum(),
    }
sensor_columns = [col for col in ts_sensor_data.columns if col not in ["timestamp", "machine_status"]]
new_features = {
    feature_name: calculation
    for sensor_column in sensor_columns
    for feature_name, calculation in generate_features_for_timeseries(sensor_column).items()
}
timeseries_features = (
    ts_sensor_data.group_by_dynamic(
        index_column="timestamp",
        every="6h",
        group_by="machine_status",
        start_by="window",
    )
    .agg(**new_features)
)
timeseries_features.head()

The following graph shows how well our features correlate with our targets. These time series features provide valuable insights into the behavior of the sensors and their relationship to the pump status (NORMAL, BROKEN, or RECOVERING). For example, we can see that the pump is broken when the root mean square of sensor 48 is close to 0. We also expect that higher absolute maximum values for sensors 3, 4, and 11 increase the probability that the pump is in a NORMAL state.
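To make the sensor-48 observation concrete: the root mean square is simply the square root of the mean of the squared readings in a window. A sketch in plain Python, using hypothetical windows:

```python
import math

def root_mean_square(values):
    """RMS of a window of sensor readings: sqrt(mean of the squared values)."""
    return math.sqrt(sum(v * v for v in values) / len(values))

# A healthy sensor oscillates around some level; a dead one flatlines near 0.
print(root_mean_square([3.0, 4.0]))       # sqrt((9 + 16) / 2) ≈ 3.536
print(root_mean_square([0.0, 0.0, 0.0]))  # 0.0 — consistent with a BROKEN pump
```

An RMS near zero over a whole window means the sensor produced essentially no signal, which is why this single feature separates the BROKEN state so cleanly.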

Model predictions with our time series features
Using SelectKBest, we select the 30 most important features based on ANOVA F-statistics. As the base model, we chose a HistGradientBoostingClassifier, which is robust to unbalanced classes and provides good predictions out of the box.
This streamlined approach, powered by lightweight feature generation, shows how classical models can provide high-quality predictions when combined with well-engineered time series features.
- Feature generation leverages the Rust-based functime and polars data processing libraries, which make it possible to work with large data sets even on a simple notebook.
- The model handles class imbalances effectively, achieving high metrics across all categories. This demonstrates the strength of HistGradientBoostingClassifier combined with well-crafted time series features.
- Minor performance dips for the RECOVERING class indicate possible improvements, such as fine-tuning the model or including additional features tailored to transition states.
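For intuition on how class_weight="balanced" counteracts the imbalance: scikit-learn weights each class inversely to its frequency, as n_samples / (n_classes * count(class)). A minimal sketch of that heuristic, using class counts that mirror our test split:

```python
from collections import Counter

def balanced_class_weights(labels):
    """scikit-learn's 'balanced' heuristic: n_samples / (n_classes * count[c])."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

# Rare classes receive much larger weights, so their errors count more in training.
y = ["NORMAL"] * 346 + ["RECOVERING"] * 26 + ["BROKEN"] * 2
weights = balanced_class_weights(y)
print({c: round(w, 2) for c, w in weights.items()})
# NORMAL ≈ 0.36, RECOVERING ≈ 4.79, BROKEN ≈ 62.33
```

The two BROKEN examples thus carry roughly 170 times the weight of a NORMAL example, which is what lets the classifier take the rare class seriously despite its tiny support.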
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

X = timeseries_features[timeseries_features.columns[2:]]
y = timeseries_features["machine_status"]
selector = SelectKBest(score_func=f_classif, k=30).set_output(transform="pandas")
X_selected = selector.fit_transform(X, y)
X_train, X_test, y_train, y_test = train_test_split(X_selected, y, test_size=0.3, random_state=42, stratify=y)
model = HistGradientBoostingClassifier(random_state=42, class_weight="balanced")
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
report = classification_report(y_test, y_pred)
print(report)
| | precision | recall | f1-score | support |
|---|---|---|---|---|
| BROKEN | 1.00 | 1.00 | 1.00 | 2 |
| NORMAL | 0.99 | 1.00 | 1.00 | 346 |
| RECOVERING | 0.96 | 0.92 | 0.94 | 26 |
| accuracy | | | 0.99 | 374 |
| macro avg | 0.98 | 0.97 | 0.98 | 374 |
| weighted avg | 0.99 | 0.99 | 0.99 | 374 |
With streamlined feature generation, the model demonstrates exceptionally promising performance across all classes.
- BROKEN: The model makes perfect predictions, with precision, recall, and F1 score of 1.00, but this may not be very reliable as there are only two examples (support = 2).
- NORMAL: The model is almost perfect, with 99% precision and 100% recall, showing that nearly all normal examples were correctly identified.
- RECOVERING: There was a slight drop in performance (F1 score = 0.94) due to some false negatives, suggesting that improvements are possible through feature engineering or hyperparameter tuning.
Conclusion: the functime package as a real help for time series features
The Python package functime can really make life easier, creating time series features in a matter of seconds. For our model predicting the functionality of water pumps, performance was already promising without any time-consuming fine-tuning. Another advantage of automated feature creation is, of course, that no feature is accidentally forgotten and the procedure can easily be repeated with new or extended data.

Mark Willoughby
Data Scientist