
Ensuring Fairness, Interpretability, Frugality, and Stability in Companies’ AI Models

Since the start of the AI boom, this technology has been revolutionizing industries by enabling both sophisticated decision-making systems at scale and remarkably accurate forecasting. But does it safeguard companies from discriminatory practices and legal exposure? Does AI accuracy integrate the challenges of transparency and fairness? Does it build trust? These are some of the questions we have been exploring for the past four years by monitoring how banks and other companies use AI in their daily business transactions. Our conclusions feed into both HEC’s teaching and partner companies’ practices.

[Illustration: double exposure of a brain drawing over a dollar-bill background. Photo credits: peshkova on 123rf]

Key findings & research contributions:

  • Development of statistical tools to assess fairness, interpretability, frugality, and stability in AI models.
  • Methodologies to identify biases and their causes, and improve fairness without compromising predictive power.
  • Alignment with legal frameworks like the Equal Credit Opportunity Act (ECOA) in the US and the AI Act in Europe.
     

No one doubts the remarkable progress AI is bringing to every field of the economy and business. In credit markets, for instance, its algorithms process vast amounts of data to distinguish between creditworthy and high-risk borrowers, reducing defaults and enhancing lender profitability.

In auditing, AI red-flags suspicious transactions and potentially fraudulent activities that require human scrutiny.

In the insurance industry, automated claim management systems speed up reimbursements without resorting to costly human expertise.

On e-commerce platforms, AI dynamically adjusts prices and makes suggestions to customers, maximizing sellers’ profits.

In hiring, AI is employed to screen a large number of CVs, detecting promising applicants and matching their skills to a given job. 

In marketing, churn models are employed to quickly detect customers who are likely to stop doing business with a company, giving it enough time to act preemptively to retain them.

All these examples show how multiple metrics can be used to check whether AI delivers on its promises. These metrics focus on the accuracy of the predictions and can be measured either in statistical terms (average error) or in dollar terms (profit and loss).
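To make this concrete, here is a minimal sketch of these two kinds of accuracy metric for a credit model: a statistical one (average error of the predicted default probabilities) and a monetary one (profit and loss). All numbers, including the margin and loss per loan, are illustrative assumptions.

```python
# A minimal sketch of a statistical metric (average error) and a
# monetary metric (profit and loss) for a credit scoring model.
# The margin and loss figures are illustrative assumptions.
import numpy as np

y_true = np.array([0, 0, 1, 0, 1, 0, 0, 1])                  # 1 = borrower defaulted
y_prob = np.array([0.1, 0.2, 0.7, 0.3, 0.4, 0.2, 0.1, 0.8])  # model default scores
y_pred = (y_prob >= 0.5).astype(int)                          # 1 = predicted default

# Statistical metric: mean absolute error of the predicted probabilities
avg_error = np.mean(np.abs(y_true - y_prob))

# Monetary metric: P&L on loans granted to predicted non-defaulters,
# assuming +100 margin per repaid loan and -1000 loss per default
granted = y_pred == 0
pnl = np.sum(np.where(y_true[granted] == 0, 100, -1000))

print(f"average error: {avg_error:.2f}, P&L: {pnl}")
```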

Going Beyond Accuracy: a Must

However, our research suggests that in many applications we need to go beyond accuracy. For instance, when automatically assessing the creditworthiness of loan applicants, credit scoring models can place groups of individuals sharing a protected attribute, such as gender, age, or racial origin, at a systematic disadvantage in terms of access to credit. 

One of the best-known illustrations occurred when software developer David Heinemeier Hansson revealed that Apple’s credit card offered him much better terms than his wife, despite identical wealth and income: “The @AppleCard is such a fucking sexist program,” he wrote on Twitter. “Apple’s black box algorithm thinks I deserve 20x the credit limit she does. No appeals worked.”

Clearly, testing the fairness of an AI system is both a societal and a business imperative. Beyond the ethical stakes, it protects companies from the reputational damage of deploying AI that proves discriminatory, and from the legal consequences that could follow.
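As a minimal illustration of what such testing involves (a generic sketch, not the methodology of the paper discussed below), two standard group-fairness measures can be computed directly from a model’s decisions. All variable names and numbers here are hypothetical.

```python
# A minimal sketch of two standard group-fairness measures computed
# from binary model decisions; data below is illustrative.
import numpy as np

def demographic_parity_gap(y_pred: np.ndarray, group: np.ndarray) -> float:
    """Absolute difference in approval rates between two groups."""
    rate_a = y_pred[group == 0].mean()
    rate_b = y_pred[group == 1].mean()
    return abs(rate_a - rate_b)

def disparate_impact_ratio(y_pred: np.ndarray, group: np.ndarray) -> float:
    """Ratio of approval rates; values below 0.8 often flag concern
    (the informal 'four-fifths rule' used in US employment contexts)."""
    rate_a = y_pred[group == 0].mean()
    rate_b = y_pred[group == 1].mean()
    return min(rate_a, rate_b) / max(rate_a, rate_b)

# Example: 1 = loan approved; group encodes a protected attribute
y_pred = np.array([1, 1, 1, 0, 1, 1, 0, 1, 0, 1])
group  = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
print(demographic_parity_gap(y_pred, group))   # 0.2
print(disparate_impact_ratio(y_pred, group))   # 0.75
```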

In other applications, the extra dimension that companies need to heed is the interpretability of the model. Complex machine learning models often act as “black boxes,” making it difficult to explain their decisions. Business leaders need models that are interpretable, able to provide clear and actionable insights into how and why decisions are made. This transparency also builds trust among stakeholders and helps comply with regulations.
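One common model-agnostic way to peek inside such a black box (a generic technique, not the one developed in our research) is permutation importance: shuffle one feature at a time and measure how much the model’s performance degrades. A minimal sketch on a synthetic dataset:

```python
# A minimal interpretability sketch using permutation importance:
# shuffle one feature at a time and record the drop in test accuracy.
# Dataset and model here are placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# n_repeats controls how many shuffles are averaged per feature
result = permutation_importance(model, X_test, y_test,
                                n_repeats=10, random_state=0)
for i, imp in enumerate(result.importances_mean):
    print(f"feature {i}: mean importance drop = {imp:.3f}")
```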

A third dimension is the frugality of AI: efficient models that achieve high performance with minimal computational resources are particularly appealing to businesses. Frugal models reduce costs, enhance scalability, and minimize environmental impact, making them a practical choice for companies aiming to balance innovation with resource constraints. DeepSeek’s breakthrough in January 2025 is a dramatic illustration of this.

Finally, recent research at HEC, "Towards Stable Machine Learning Model Retraining via Slowly Varying Sequences" (by Vassilis Digalakis and coauthors from Harvard and MIT) underlines how stability plays an important role. AI models should produce consistent outputs even as new data become available or hyperparameters are adjusted. Instability can lead to unpredictable performance, undermining trust and acceptability of the systems. 

In medical applications, stability is particularly critical when physicians rely on AI for disease detection or treatment recommendations. The internal mechanisms of the model must be both understandable and stable to ensure reliability. Otherwise, medical professionals will be reluctant to integrate AI-generated insights into their clinical decision-making.
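One simple way to quantify this kind of stability (an illustrative measure, not the method of the paper cited above) is “prediction churn”: the share of cases whose decision flips when the model is retrained on slightly different data. A sketch on synthetic data:

```python
# A minimal stability sketch: "prediction churn" between two models
# retrained on slightly different (bootstrap) samples of the same data.
# Names and setup are illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def churn(model_a, model_b, X_eval) -> float:
    """Fraction of evaluation points on which the two models disagree."""
    return float(np.mean(model_a.predict(X_eval) != model_b.predict(X_eval)))

# Retrain on two bootstrap samples of the same training set
rng = np.random.default_rng(0)
idx_a = rng.choice(len(X_train), len(X_train), replace=True)
idx_b = rng.choice(len(X_train), len(X_train), replace=True)
model_a = RandomForestClassifier(random_state=0).fit(X_train[idx_a], y_train[idx_a])
model_b = RandomForestClassifier(random_state=0).fit(X_train[idx_b], y_train[idx_b])

print(f"prediction churn: {churn(model_a, model_b, X_test):.1%}")
```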

Shooting for a Fair AI Model

My team and I have made it an ongoing research priority to provide companies with the tools to monitor these extra dimensions. We do so by developing statistical tests to assess how a particular model scores on one or several of these dimensions. Alternatively, we design AI tools that natively exhibit the properties a company requires, such as fairness and transparency.

In a recent paper, “The Fairness of Credit Scoring Models,” published in 2024 in Management Science, we answer three questions. How can we know whether a credit scoring model is unfair to groups of individuals that society would like to protect? If the model is shown to be unfair, how can we find the causes? And how can we boost the fairness of the model while maintaining a high level of predictive performance?

A further, non-negligible part of our work is that we propose a methodology that fits harmoniously within the current legal frameworks ensuring fairness in lending on both sides of the Atlantic. These include the Equal Credit Opportunity Act (ECOA) in the US and the AI Act in Europe.
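To give a flavor of the first question, a very simplified stand-in for a formal fairness test (not the statistic developed in the paper) is a two-proportion z-test on approval rates across two groups. The counts below are hypothetical.

```python
# A simplified stand-in for a formal fairness test: a two-proportion
# z-test on approval rates across groups (hypothetical counts).
# H0: both groups have the same probability of approval.
from statsmodels.stats.proportion import proportions_ztest

# (approved, total applicants) per group
approved = [620, 480]
applicants = [1000, 1000]

stat, p_value = proportions_ztest(count=approved, nobs=applicants)
print(f"z = {stat:.2f}, p-value = {p_value:.4f}")
# A small p-value rejects equal approval rates, flagging the model
# for a closer look at which inputs drive the gap.
```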

 

We developed statistical tests to assess how a particular AI model scores on fairness, interpretability, frugality, or stability.

 

In another article, “Measuring the Driving Forces of Predictive Performance: Application to Credit Scoring,” published in January 2025, we show how to identify the driving forces behind the performance of any black-box machine learning model. This is of primary importance in credit scoring: banking supervisors need to understand why a given model works or fails, and for which borrowers. We have successfully applied our methodology to a novel dataset of auto loans provided by an international bank, demonstrating the usefulness of the method.
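As a toy illustration of the “for which borrowers” question (not the decomposition method of the paper itself), one can compute a performance metric such as AUC separately by borrower segment. The segments and scores below are simulated.

```python
# A toy illustration of checking "for which borrowers" a scoring model
# works: compute AUC separately by segment. Data is simulated so that
# the model is informative for one segment and noisy for the other.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5000
segment = rng.choice(["new_car", "used_car"], size=n)   # hypothetical loan type
default = rng.binomial(1, 0.1, size=n)                  # 1 = borrower defaulted
score = np.where(segment == "new_car",
                 default * 0.5 + rng.normal(0.3, 0.2, n),  # informative
                 rng.normal(0.35, 0.25, n))                 # pure noise

for seg in ["new_car", "used_car"]:
    mask = segment == seg
    auc = roc_auc_score(default[mask], score[mask])
    print(f"{seg}: AUC = {auc:.2f} on {mask.sum()} loans")
```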

From Research Lab to Classroom… to Big Business!

Research is about pushing the boundaries of knowledge, and this production of knowledge allows HEC to fuel its course programs with original, state-of-the-art content. For over four years, I have been teaching a course entitled Fairness and Interpretability in the Master in Data Science and AI for Business.

This joint program between HEC Paris and École Polytechnique integrates many of the tools and conclusions from our research. The technically challenging course introduces students to the latest techniques in fairness and interpretability. It combines advanced research methods with a business focus that is uncommon in courses typically offered by business schools. And it corresponds well to HEC’s “Teach, Think, Act” DNA.

A number of the statistical tools we have developed to test the fairness of AI models are currently being used by several large French banks. They help these institutions comply with the EU’s AI Act, which entered into force on August 1, 2024.

The new European regulation classifies algorithmic lending as a high-risk AI application. This academic-industry partnership goes beyond simple proofs of concept: some of our methods are now used in production at scale by the banks. And the collaboration runs both ways, as experts from the banks challenge the researchers and provide feedback, ideas, and access to real data.

Such exchanges are then relayed to our HEC students, creating a virtuous triangle. We are thus enjoying a win-win collaboration that is set to grow.

References: “The Fairness of Credit Scoring Models,” by Christophe Pérignon of HEC Paris and Christophe Hurlin and Sébastien Saurin of the University of Orléans, published in Management Science in November 2024; and “Measuring the Driving Forces of Predictive Performance: Application to Credit Scoring,” posted on arXiv in January 2025. These papers are part of a research agenda on identifying and addressing fairness, interpretability, frugality, and stability in AI models.