New machine learning prediction model delivers 37% more accuracy than any traditional approach whilst maintaining explainability.
Our new prediction model predicts an income or expenditure amount to the penny 8.7x more frequently than our previous approach. This new model enables you to use highly accurate predictions in automated risk and affordability decisions, and evidence every decision.
Accurate predictions - whether for income, essential expenditure or risky behaviours - are central to using Open Banking data in credit decisions.
Previously our predictions used a statistical method: outliers were removed and a weighted average, in which the most recent months had the most bearing, was applied to the remaining payments. This approach had the highest overall accuracy rate of all statistical methods. Other providers use something similar, or a regular (non-weighted) average of the last three, six or twelve months.
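To make the previous approach concrete, here is a minimal sketch of an outlier-filtered, recency-weighted average. The outlier rule (drop anything more than 50% away from the median) and the linear weights are assumptions chosen for illustration; the original method's exact rule and weights are not described here.

```python
from statistics import median

def weighted_recent_average(amounts, max_deviation=0.5):
    """Predict the next amount from past amounts, ordered oldest to newest.

    Assumed outlier rule: drop amounts more than `max_deviation` (as a
    fraction of the median) away from the median.
    Assumed weighting: linear, so the newest kept month weighs the most.
    """
    mid = median(amounts)
    kept = [a for a in amounts if abs(a - mid) <= max_deviation * mid]
    weights = range(1, len(kept) + 1)  # 1 for the oldest, len(kept) for the newest
    return sum(w * a for w, a in zip(weights, kept)) / sum(weights)

# Hypothetical utility payments; the 400 is a one-off spike.
payments = [100, 100, 400, 110, 120]
prediction = weighted_recent_average(payments)
print(prediction)  # 111.0
```

The spike is discarded and the two most recent payments pull the prediction above the older 100 level.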
These methods do not predict an amount as well as a human can: for example, an underwriter is able to see and account for seasonality and changes in circumstance such as a pay rise or house move.
Harnessing machine learning and five years of transactional data, we’ve completely upgraded our prediction methods. Our new model is more accurate than any traditional statistical approach such as an average of previous months. Measured on millions of real historical transactions, it’s 37% more accurate than any kind of historical average. It predicts an income amount to the penny 8.7x more frequently than our previous approach.
How does our machine learning model work?
Our new method uses a proprietary machine learning model trained on tens of millions of transactions. The model makes each prediction individually by considering many different aspects of that particular inflow or payment. For each type of income and expenditure, for example utilities, the model has learnt particular patterns of seasonality, outliers, trends and circumstance changes. This closely replicates the ways an underwriter will synthesise many pieces of data to make a prediction.
Crucially for defensible decisioning, the model maintains the explainability of traditional approaches. It does this by harnessing machine learning to select the most appropriate statistical method for each prediction. This ensures every prediction is explainable in terms of a statistical approach, but is much more accurate than using the same statistical approach for every prediction.
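The idea of choosing a statistical method per prediction can be sketched as follows. This is not the proprietary model: the selector below is a simple backtest stand-in (it picks whichever method would have best predicted the latest month), and the method names and the `predict` function are hypothetical. What it illustrates is the shape of the output, where every prediction carries the name of the statistical method that produced it.

```python
from statistics import mean, median

def recency_weighted(history):
    """Linearly weighted average; the newest month weighs the most."""
    weights = range(1, len(history) + 1)
    return sum(w * a for w, a in zip(weights, history)) / sum(weights)

# Candidate statistical methods (illustrative names, not the real model's set).
METHODS = {
    "six_month_mean": lambda h: mean(h[-6:]),
    "six_month_median": lambda h: median(h[-6:]),
    "last_amount": lambda h: h[-1],
    "recency_weighted": recency_weighted,
}

def select_method(history):
    """Stand-in for the learned selector: backtest each method on the
    latest month and return the name of the one with the smallest error."""
    past, actual = history[:-1], history[-1]
    return min(METHODS, key=lambda name: abs(METHODS[name](past) - actual))

def predict(history):
    """Return the predicted amount together with the method that explains it."""
    name = select_method(history)
    return {"method": name, "amount": METHODS[name](history)}

# Hypothetical salary history with a pay rise four months ago.
result = predict([2000, 2000, 2200, 2200, 2200, 2200])
print(result["method"], result["amount"])
```

Because the output names the method used, a decision can be evidenced in the same terms as a traditional statistical prediction.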
What accuracy increase is achieved with the new model?
Measured on a huge corpus of real historical data, the new machine learning model is:
- 48.6% more accurate than using a six-month mean
- 38.9% more accurate than using a six-month median
- 43.5% more accurate than using a three-month mean
- 38.8% more accurate than using a three-month median
The new model performs particularly well on essential expenses:
- It’s 46% better at predicting a housing payment than any of the traditional approaches
- It’s 51% better at predicting a monthly loan repayment amount
- It’s 41% better at predicting council tax payments
- It’s 44% better at predicting a transport and fuel cost
On recurring income sources, the model predicts a monthly amount to the penny far more often than our previous method. It gets 9.5x as many exact matches for salary payments and 5x as many exact matches for pension payments. It fares similarly well with incoming bank transfers, with 9.8x as many exact predictions.
Learning complex patterns
For example, if Samantha Pull used to earn £2000 per month and has earned £2200 per month for the last four months, an average of the last six months would be £2133 - an underestimate given the change in circumstances. Our prediction model, which has ‘seen’ and learnt from thousands of similar scenarios, will recognise this is a pay rise and choose to predict £2200 for next month’s income.
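The arithmetic behind the pay rise example can be checked in a few lines. The six monthly figures are taken from the scenario above (two months at £2000 followed by four at £2200); the variable names are illustrative.

```python
from statistics import mean

# Last six months of salary, oldest to newest.
income = [2000, 2000, 2200, 2200, 2200, 2200]

six_month_mean = mean(income)            # 12800 / 6
shortfall = income[-1] - six_month_mean  # gap to the current salary

print(round(six_month_mean, 2))  # 2133.33
print(round(shortfall, 2))       # 66.67
```

The plain average sits about £67 below Samantha's current monthly salary, and would keep underestimating until the old salary drops out of the six-month window.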
Samantha may splash out on groceries in December, spending twice as much as usual. If she applies for credit in January or February, using a mean to predict her next spend on groceries would overestimate her normal monthly spend. Our prediction model, on the other hand, knows from historical data that a spike in December won’t mean a high spend in January and February, and would instead choose to take the median monthly spend of the last six months.
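A short sketch shows why the median is the more robust choice here. The grocery amounts are hypothetical, chosen so that December is roughly double a normal month.

```python
from statistics import mean, median

# Hypothetical grocery spend for July to December, in pounds.
groceries = [300, 310, 290, 300, 305, 600]

mean_spend = mean(groceries)      # pulled upwards by the December spike
median_spend = median(groceries)  # unaffected by a single extreme month

print(round(mean_spend, 2))  # 350.83
print(median_spend)          # 302.5
```

The mean overstates a typical month by almost £50, while the median stays close to what Samantha normally spends.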
Samantha’s commuting costs, on the other hand, have been evolving throughout the pandemic. The model, trained on historical data to understand the real impact of these changes on future spend, will use an exponentially-weighted average to accurately predict next month’s cost.
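An exponentially-weighted average gives each older month a geometrically smaller influence, so it follows a gradual change more closely than a plain mean. The sketch below is a generic implementation; the smoothing factor `alpha=0.5` and the commuting figures are assumptions for illustration, not values used by the model.

```python
def exponentially_weighted_average(amounts, alpha=0.5):
    """Exponentially weighted average of amounts ordered oldest to newest.

    `alpha` (assumed value) controls how quickly old months fade:
    closer to 1 means the latest month dominates.
    """
    estimate = amounts[0]
    for amount in amounts[1:]:
        estimate = alpha * amount + (1 - alpha) * estimate
    return estimate

# Hypothetical commuting spend: a drop, then a gradual return to travel.
commuting = [160, 40, 40, 80, 120, 120]

ewa = exponentially_weighted_average(commuting)
plain_mean = sum(commuting) / len(commuting)

print(ewa)                   # 108.75
print(round(plain_mean, 2))  # 93.33
```

With spend recovering towards £120, the exponentially-weighted figure is closer to the recent level than the plain mean.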
If you'd like to hear more about this new release or about Credit Kudos you can get in touch with the team.