Features Stability

The features stability tests calculate the Population Stability Index (PSI).

PSI is a statistical measure commonly used to assess the stability of the population distribution across different datasets or time periods.

It quantifies the degree of change in the distribution of a variable between a reference population (often the training dataset) and a comparison population (e.g., validation dataset or production data).

PSI compares the expected (reference) and observed (comparison) proportions of observations within predefined bins or categories and combines the per-bin differences into a single index that reflects the magnitude of distributional change.

Why are Features Stability tests important?

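PSI analysis is important in bias testing because it provides a systematic and quantitative way to detect when the distribution of a feature differs between populations. Such shifts can point to population segments that the model was not trained on, which helps when assessing model fairness, detecting biases, and working toward equitable outcomes across different population segments.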

How does FAIRLY perform Features Stability tests?

The project is assessed through a collaborative effort. Evaluation involves a combination of qualitative questionnaire-based assessment and quantitative model testing.

Supporting evidence can be attached to each of the controls to substantiate the provided answers.

The formula for calculating PSI involves the following steps:

Divide the variable’s range into several bins or intervals.
Calculate the proportion of observations falling into each bin for both the reference and comparison populations.
For each bin, calculate the difference between the reference proportion and the comparison proportion.
For each bin, calculate the natural logarithm of the ratio of the reference proportion to the comparison proportion.
Multiply the difference by the logarithm of the ratio for each bin.
Sum up the contributions from all bins to obtain the PSI value.

Here’s the formula for PSI calculation:

\[ PSI = \sum_{i=1}^{n} \left(p_i^R - p_i^C\right) \cdot \ln\left(\frac{p_i^R}{p_i^C}\right) \]

Where:

\(n\) is the number of bins or intervals.
\(p_i^R\) is the proportion of observations in the \(i^{th}\) bin for the reference population.
\(p_i^C\) is the proportion of observations in the \(i^{th}\) bin for the comparison population.
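
To make the formula concrete, here is a small worked example in Python. The three-bin proportions are made-up illustrative values, not results from any real dataset, and the variable names are assumptions of this sketch.

```python
import math

# Hypothetical per-bin proportions for a three-bin variable (each list sums to 1).
p_ref = [0.2, 0.5, 0.3]  # reference population
p_cmp = [0.3, 0.4, 0.3]  # comparison population

# Per-bin contribution: (p_i^R - p_i^C) * ln(p_i^R / p_i^C)
contributions = [(r - c) * math.log(r / c) for r, c in zip(p_ref, p_cmp)]
psi_value = sum(contributions)

print([round(x, 4) for x in contributions])  # [0.0405, 0.0223, 0.0]
print(round(psi_value, 4))                   # 0.0629
```

Each contribution is non-negative because the difference and the logarithm always share the same sign, so a bin whose proportion is unchanged (the third bin here) adds nothing to the total.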

In practice, the number of bins and their boundaries can vary depending on the data and the analyst’s judgment. However, it’s common to use quantiles (e.g., deciles) or equally spaced intervals to divide the variable’s range.
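
The following is a minimal sketch of the whole procedure on raw samples, using only the Python standard library. It assumes quantile bins derived from the reference population (deciles by default). The function names `bin_proportions` and `psi`, and the small `eps` floor that keeps the logarithm defined when a bin is empty, are choices made for this illustration and are not taken from FAIRLY's implementation.

```python
import math
from bisect import bisect_right
from statistics import quantiles


def bin_proportions(values, edges, eps=1e-6):
    """Share of `values` in each bin defined by the interior cut points `edges`."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        # bisect_right returns the index of the bin that v falls into.
        counts[bisect_right(edges, v)] += 1
    total = len(values)
    # Floor each share at eps so empty bins do not cause log(0) or division by zero
    # (an assumption of this sketch; other smoothing schemes are possible).
    return [max(c / total, eps) for c in counts]


def psi(reference, comparison, n_bins=10):
    """PSI between two samples, with quantile bins taken from the reference sample."""
    edges = quantiles(reference, n=n_bins)  # n_bins - 1 interior cut points
    p_ref = bin_proportions(reference, edges)
    p_cmp = bin_proportions(comparison, edges)
    return sum((r - c) * math.log(r / c) for r, c in zip(p_ref, p_cmp))


# Usage with synthetic data: identical samples versus a shifted copy.
reference = [float(x) for x in range(100)]
shifted = [x + 20.0 for x in reference]
print(psi(reference, reference))  # 0.0
print(psi(reference, shifted))    # a positive value, reflecting the shift
```

Deriving the bin edges from the reference sample alone keeps the bins fixed, so the same feature can be compared against several comparison datasets on a consistent basis.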

How do I interpret the Features Stability results?

A high PSI suggests a significant shift in the population, which may indicate issues such as model degradation or changes in the underlying data generating process, prompting further investigation and potential model recalibration.

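Conversely, a low PSI indicates a relatively stable population. This suggests, but does not guarantee, that the model is still being applied to data similar to what it was trained on and that its performance is likely to have remained consistent over time.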
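
What counts as "high" or "low" depends on the use case. A widely quoted rule of thumb treats values below 0.1 as stable, values between 0.1 and 0.25 as a moderate shift, and values of 0.25 or more as a significant shift. The sketch below encodes that convention. The cutoffs are an assumption of this example, not thresholds documented for FAIRLY, and `interpret_psi` is a hypothetical helper name.

```python
def interpret_psi(psi_value, moderate=0.1, significant=0.25):
    """Map a PSI value to a coarse stability label using rule-of-thumb cutoffs."""
    if psi_value < 0:
        # Every per-bin term of PSI is non-negative, so a negative value signals a bug.
        raise ValueError("PSI is non-negative by construction")
    if psi_value < moderate:
        return "stable"
    if psi_value < significant:
        return "moderate shift"
    return "significant shift"


for value in (0.03, 0.18, 0.40):
    print(value, interpret_psi(value))
```

Keeping the cutoffs as parameters makes it easy to tighten or relax them for a particular feature or application.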

Limitations

Keep in mind that while the PSI provides a useful measure of population stability, it is not an absolute indicator of bias or fairness.

It should be used in conjunction with other fairness metrics and domain knowledge to evaluate the performance of machine learning models, especially in sensitive applications like lending.