Predicting IFRS 17 Net Income Ranges: A Journey from Feature Selection to Explainable AI
Introduction
For IFRS 17, companies often rely on a deterministic calculation mechanism to derive net income and balance sheet figures from actuarial cashflows. The pressing question now is whether incorporating machine learning (ML) can add any value to this existing approach. Although a deterministic pipeline delivers accurate and rule-based results, ML can introduce supplementary advantages that may not be readily visible.
For scenario modeling and fast projections, ML models can quickly estimate results without recalculating the entire pipeline. This yields net income estimates in minutes rather than the hours a full pipeline rerun can take, making ML ideal for "what-if" simulations across numerous economic or portfolio scenarios.
ML also excels in pattern recognition and detecting nonlinear interactions. While deterministic pipelines follow predefined rules, ML can uncover hidden patterns and interactions between features. This is particularly useful when actual results deviate from expectations due to subtle dynamics not explicitly modeled.
In addition, ML is excellent for capturing and explaining uncertainty around results using prediction intervals or probabilistic models. This can be combined with SHAP (SHapley Additive exPlanations) to explain variations by contract group or portfolio segment.
The objective of this article is to predict IFRS 17 net income ranges based on discounted cashflow amounts for premiums, claims, commissions, expenses, and risk adjustments, and to provide transparency on the underlying drivers. For the net income, distinguishing between onerous (loss-making) and non-onerous contracts is crucial, as this distinction affects the results in a non-linear manner, with onerous contracts potentially causing significant shifts in the net income.
Determine Key Features
Our dataset (financial cashflows) includes several numerical and categorical features that contribute to net income regression. However, many of these features are correlated or overlapping, thus requiring careful handling.
Primary Features & Their Challenges
Feature | Description | Potential Issue |
---|---|---|
Premium, claims, commissions, expense, risk adjustment amounts | Expected cashflow amounts over the lifetime of the contract | Premiums, claims, and commissions are partly redundant, since high premiums usually go hand in hand with high claims and commissions. |
Cashflow release patterns for premiums, claims, commissions, expenses, and risk adjustments | Expected release pattern (starting from 100%, reflecting the total volume, down to 0% - fully settled) for the cashflows | Release patterns can be represented as a curve or an array of values. This array needs to be functionally transformed into scalar features to be utilized effectively by machine learning prediction techniques. |
Contract Year | Year of contract issuance | Time dependency could affect performance |
Region | Geographic segmentation | Potentially correlates with Line of Business as products vary by region |
Line of business (LOB) | Type of insurance (e.g., Life, P&C), insurance product | Insurance products may be restricted to certain regions |
Cashflow release patterns
A cashflow pattern shows how money is gradually released over time, starting from the full amount (100%) and decreasing to zero by the end. The curve is composed of multiple data points and can be represented as an array that describes the cashflow pattern over time (see graph below). However, scalar values are typically preferred as model features. Therefore, we extract the key characteristics of the curve, such as its average slope, the distribution center (e.g., the first index i where 50% or more of the cashflows have been released), as well as statistical measures like the mean or standard deviation.
The chart above illustrates a sample cashflow release pattern that demonstrates a balanced release, with 50% of the value being released within the first third of the 45-year span. In our calculations, we define a balanced release as having 50% of the release occur between one quarter and three quarters of the contract span.
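The scalar extraction described above can be sketched as follows; the helper name and the exact feature set are illustrative, assuming the pattern is stored as an array that starts at 1.0 (100% outstanding) and falls to 0.0:

```python
import numpy as np

def pattern_features(pattern):
    """Reduce a cashflow release pattern to scalar model features.

    `pattern` starts at 1.0 (100% outstanding) and decreases to 0.0
    (fully settled) over the contract span.
    """
    p = np.asarray(pattern, dtype=float)
    released = 1.0 - p                           # cumulative share released
    avg_slope = (p[-1] - p[0]) / (len(p) - 1)    # average change per period
    # distribution center: first index where 50% or more has been released
    dist_center = int(np.argmax(released >= 0.5))
    return {
        "avg_slope": avg_slope,
        "distribution_center": dist_center,
        "mean": float(p.mean()),
        "std": float(p.std()),
    }

# hypothetical linear release over 5 periods
feats = pattern_features([1.0, 0.75, 0.5, 0.25, 0.0])
```

For the linear pattern above, half of the value is released at the midpoint, so the distribution center lands at index 2 and the average slope is -0.25 per period.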
Multicollinearity
As already indicated in the features table above, multicollinearity (a strong linear relationship among predictor variables) in a dataset can be a challenge for our ML model. For linear regression, multicollinearity can inflate the standard errors of the coefficients, making them unstable and difficult to interpret. This can lead to unreliable estimates and reduced model interpretability. Algorithms like Decision Trees, Random Forests, and Gradient Boosting Machines are generally robust to multicollinearity. However, multicollinearity can affect the reliability of feature importance in Random Forest (RF) models: when predictor variables are highly correlated, it becomes challenging for the model to distinguish the individual importance of each variable, because correlated variables can share the same underlying effect, making it difficult to determine which variable is driving the outcome.
To evaluate multicollinearity, we therefore iterate through all pairs of columns to fill the correlation matrix. For numerical-numerical pairs, we calculate the Pearson correlation. For categorical-categorical pairs, we calculate Cramér's V and for numerical-categorical pairs, we use mutual information to measure the dependency.
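Two of the three pairwise measures can be sketched as below; the helper names are illustrative, Cramér's V is built on `scipy.stats.chi2_contingency`, and the mutual information estimate is a hand-rolled plug-in version (for numerical-categorical pairs the numerical variable would first be binned, and a production setup might use a library implementation instead):

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(x, y):
    """Cramér's V association measure for two categorical series (0..1)."""
    table = pd.crosstab(x, y)
    chi2 = chi2_contingency(table, correction=False)[0]
    n = table.values.sum()
    r, k = table.shape
    return float(np.sqrt(chi2 / (n * (min(r, k) - 1))))

def mutual_information(x, y):
    """Plug-in mutual information estimate from a contingency table.

    For numerical-categorical pairs, bin the numerical column first.
    """
    pxy = pd.crosstab(x, y).values.astype(float)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)      # marginal of x
    py = pxy.sum(axis=0, keepdims=True)      # marginal of y
    nz = pxy > 0                             # skip empty cells
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

# two perfectly associated categorical variables
x = pd.Series(["a", "a", "b", "b"] * 10)
y = pd.Series(["u", "u", "v", "v"] * 10)
v = cramers_v(x, y)          # perfect association gives V = 1
mi = mutual_information(x, y)
```

For a perfect association between two equiprobable categories, Cramér's V is 1 and the mutual information equals log 2.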
There is a strong intercorrelation between variables such as claims volume, risk adjustment (RA) volume, sum assured (SA) volume, and indirect claims handling volume, indicating that they scale together and likely reflect the portfolio or policy size. Additionally, strong correlations are observed among the release slopes, particularly for claims release slope, risk adjustment release slope, sum assured release slope, and indirect claims handling expense release slope, which likely reflect timing profiles for matching cashflows. However, premiums release slope behaves differently and is less correlated with the others, suggesting a different amortization timing logic for revenue versus expenses. Furthermore, there are extremely strong positive correlations among risk adjustment distribution, sum assured distribution, and premiums distribution, indicating that they move in lockstep. Moderate positive correlations are also observed between volume and distribution variables, suggesting that larger volumes might lead to smoother or broader distributions.
Feature Selection
Given the strong correlations among cashflow volumes, cashflow release slopes, and cashflow release distributions, we select only one representative feature from each category. Additionally, we incorporate the cost ratio, calculated by dividing the total expenses by the premium. The label iot (initial onerousness testing) feature indicates whether a contract is onerous at inception.
The refined correlation matrix below indicates no strong correlations, with the highest being approximately 72% between claims release slope and claims distribution. As expected, the diagonal values are all 1. This feature set, with no strong correlations, provides a solid foundation for applying the models discussed in the next section: Random Forest Regressor and GBT Regressor.
Prediction using Tree-Based Models
Tree-based models, such as Random Forests and Gradient Boosting Machines (both ensemble learning methods), naturally handle non-linear relationships by splitting the data into regions and fitting simple models within each region, effectively capturing complex patterns. Additionally, tree-based models can automatically detect and model interactions between features, which is particularly useful in IFRS 17, where interactions between various financial metrics and cashflows can be complex and non-linear. Unlike linear models, tree-based models are less affected by multicollinearity, allowing them to handle correlated features without significant issues, which is beneficial when dealing with financial data that often exhibits high intercorrelations.
Let's take a closer look at the GBT Regressor and Random Forest Regressor. Both models require preprocessing steps for the input datasets, specifically string indexing and one-hot encoding to handle categorical features. Fortunately, scaling is not necessary for tree-based models, and we also do not have correlating features to worry about due to our preparation steps. The process further involves building train and test sets, training the model on the training set, and then applying it to the test set. Below are the results of these models.
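The string indexing and one-hot encoding steps suggest a Spark ML setup; the same flow can be sketched in scikit-learn on synthetic data. All feature names, the toy target, and the data-generating choices below are illustrative, not the article's actual dataset:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
n = 500
# synthetic stand-in for the selected features
df = pd.DataFrame({
    "premiums_volume": rng.lognormal(10, 0.5, n),
    "cost_ratio": rng.uniform(0.05, 0.4, n),
    "claims_release_slope": rng.uniform(-0.05, 0.0, n),
    "lob": rng.choice(["Life", "P&C"], n),
})
# toy target: net income grows with premiums and shrinks with the cost ratio
y = df["premiums_volume"] * (1 - df["cost_ratio"]) * 0.1

# one-hot encode the categorical feature; no scaling needed for tree models
pre = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), ["lob"])],
    remainder="passthrough",
)
model = Pipeline([("prep", pre), ("gbt", GradientBoostingRegressor(random_state=0))])

X_train, X_test, y_train, y_test = train_test_split(df, y, random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_test)

rmse = float(np.sqrt(np.mean((pred - y_test) ** 2)))
nrmse = rmse / float(y_test.mean())   # normalized RMSE, as used below
```

Swapping `GradientBoostingRegressor` for `RandomForestRegressor` in the pipeline reproduces the Random Forest variant with no other changes.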
Random Forest Model
The feature importance analysis from the Random Forest model reveals that Premiums volume is the most influential variable with an importance of 48%, indicating its critical role in predicting the target variable. Cost ratio follows with an importance of 26%, highlighting the significant impact of premiums collected. Claims release slope, with an importance of 15%, contributes by indicating the timing and rate of claims payouts. Although the encoded categorical features, such as IFRS line of business and Claims distribution, contribute less to the model's importance (11% in total), they still provide valuable insights by distinguishing between different business lines and claims distribution patterns.
\[
\text{Normalized RMSE} = \frac{\text{RMSE}}{\text{mean(target)}}
\]
Gradient Boosted Trees (GBT) Model
The normalized root mean square error (RMSE) of the final model is only 5%, which significantly outperforms the Random Forest model. For a net income of $100,000 on an insurance contract, we would expect a prediction error of around $5,000.
Potential reasons why GBT outperforms Random Forest in our model
Gradient Boosting Trees (GBTs) excel in sequential error correction by learning residuals step-by-step, allowing them to adaptively correct previous mistakes. In contrast, Random Forests (RFs) train each tree independently on a bootstrapped sample and only aggregate results at the end. GBTs focus on hard cases and fine-tuning, often leading to better accuracy on regression tasks and lower bias, as they model training data more closely. RFs may underfit complex patterns due to the independent learning of their trees. In a nutshell, GBT is generally better when modeling complex patterns or when high accuracy and low bias (at the cost of higher variance) are priorities. On the other hand, Random Forest is often more stable on noisy or imbalanced data and benefits from faster parallel training.
Feature Attribution via SHAP
As a next step, we explore SHAP (SHapley Additive exPlanations), a method grounded in cooperative game theory that attributes model predictions to individual features in a fair and consistent way. SHAP can be applied to tree-based models like Gradient Boosted Trees (GBT) or Random Forests (RF) to generate intuitive visualizations, offering deep insights into how each feature influences both global model behavior and individual predictions.
Findings based on selected Shapley graphs
Shapley values originate from game theory, where the goal is to fairly distribute a total reward among a group of players based on their individual contributions. In machine learning, the total reward is the model's prediction, and the players are the features in the dataset. SHAP calculates how much each feature contributes to shifting the prediction away from the baseline (expected) value.
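The game-theoretic idea can be made concrete with a tiny exact computation. The value function below is a made-up two-feature "prediction game" (baseline, contributions, and interaction are all invented for illustration), not our actual model; TreeSHAP computes the same quantity efficiently for tree ensembles:

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal
    contribution over all orderings in which it can join."""
    perms = list(permutations(players))
    contrib = {p: 0.0 for p in players}
    for order in perms:
        coalition = set()
        for p in order:
            before = value(coalition)
            coalition.add(p)
            contrib[p] += value(coalition) - before
    return {p: c / len(perms) for p, c in contrib.items()}

def value(coalition):
    """Hypothetical prediction game: baseline 100, premiums adds 50,
    a high cost ratio subtracts 30, and together they interact (-10)."""
    v = 100.0
    if "premiums" in coalition:
        v += 50.0
    if "cost_ratio" in coalition:
        v -= 30.0
    if {"premiums", "cost_ratio"} <= coalition:
        v -= 10.0
    return v

phi = shapley_values(["premiums", "cost_ratio"], value)
```

The attributions sum to the full prediction minus the baseline (110 - 100 = 10), which is exactly the "additive" property SHAP relies on.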
The bar plot shows that the feature Premiums is the most influential - this dominance suggests that it drives the bulk of prediction variation in the model. The sample portfolio in our case is overweight in non-onerous contracts, which inflates the importance of this feature. The feature cost ratio is also impactful, though significantly less so due to the imbalanced setup - in a balanced setup (by oversampling onerous contracts), cost ratio would become the most important feature. The feature Claims release pattern has moderate influence, suggesting that the shape of claims development plays a secondary but meaningful role in the model's predictions.
The beeswarm plot visualizes how the top features in our dataset impact the model’s output. In our case, the feature Premiums is the most impactful, with higher values (red) strongly pushing predictions up, while lower values (blue) decrease the prediction. This feature shows a wide spread, indicating a large variation in effect. Similarly, cost ratio is negatively correlated with the model output, where low values consistently drive predictions upward.
The feature claims cashflow pattern slope is more nuanced, as both high and low values can push predictions up or down, suggesting a potentially nonlinear or context-dependent effect. On the other hand, cost ratio exhibits an inverse relationship, where high values (red) slightly lower the prediction, and low values (blue) slightly increase it.
The dependency plot shows that there is a strong negative relationship between cost ratio and its SHAP value, meaning that as cost ratio decreases, its SHAP value increases, pushing the prediction higher. Additionally, the color gradient reveals that higher values of Premiums amounts (represented by red dots) generally align with lower cost ratio SHAP values. For high-value contracts (high premiums), the model is much less forgiving of inefficiency (i.e., high cost ratios).
Conclusion
This article set out to explore how machine learning can support IFRS 17 reporting and along the way, we uncovered a few key lessons worth sharing.
First, we saw that the model performs best when it focuses on features closely tied to the economics of insurance contracts: premium volumes, claims development patterns, and cost ratios. Not all features added value, and that’s a helpful reminder that careful selection often beats sheer quantity.
Applying the Gradient Boosted Trees model demonstrated exceptional performance, effectively managing interactions and non-linearities with minimal data preparation. This robust approach allowed for a seamless integration of complex patterns, ensuring accurate and reliable predictions.
More importantly, we found that a model’s usefulness goes beyond accuracy. Using SHAP, we were able to explain not just what the model predicts - but why. This turned complex relationships into understandable patterns, like how lower cost ratio values consistently increase the predicted outcome, especially when paired with high premium volumes.
The broader takeaway is this: explainability is a bridge between data science and business. It builds trust, improves communication, and helps teams make decisions with confidence. While there's more to explore - like time-based effects or model uncertainty - we now have a strong, interpretable foundation that can grow with the needs of our stakeholders.
Alberto Desiderio is deeply passionate about data analytics, particularly in the contexts of financial investment, sports, and geospatial data. He thrives on projects that blend these domains, uncovering insights that drive smarter financial decisions, optimise athletic performance, or reveal geographic trends.