Performance Testing
Fairly performs performance testing using a combination of techniques and methodologies that depend on the model type.
How does Fairly perform performance testing?
In addition to well-known industry-standard performance metrics such as accuracy, true negatives, true positives, false negatives, and false positives, the Fairly Platform uses a combination of industry-standard, best-in-class techniques and testing methodologies to perform performance testing:
For Supervised Binary Classification Models
Objective: Assess model performance and fairness according to ISO/IEC TR 24027 standards.
Procedure: Apply ISO/IEC TR 24027 metrics to evaluate the model’s performance across each protected group individually. [We will assess Gender and Race.] Ensure that all relevant performance metrics, such as accuracy, precision, recall, F1 score, and fairness metrics like Demographic Parity and Equalized Odds, are calculated and reported for each group.
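As a rough illustration of this per-group evaluation, the sketch below computes accuracy, precision, recall, and F1 for each protected group, along with the selection rate and error rates that underlie Demographic Parity and Equalized Odds comparisons. The function names are illustrative, not part of the Fairly Platform API, and NumPy is assumed:

```python
import numpy as np

def group_metrics(y_true, y_pred, groups):
    """Per-group performance metrics for a binary classifier.

    Illustrative sketch only: group keys come from the `groups` array
    (e.g. gender or race categories)."""
    results = {}
    for g in np.unique(groups):
        mask = groups == g
        yt, yp = y_true[mask], y_pred[mask]
        tp = np.sum((yt == 1) & (yp == 1))
        tn = np.sum((yt == 0) & (yp == 0))
        fp = np.sum((yt == 0) & (yp == 1))
        fn = np.sum((yt == 1) & (yp == 0))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        results[g] = {
            "accuracy": (tp + tn) / len(yt),
            "precision": precision,
            "recall": recall,
            "f1": (2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0),
            # Selection rate feeds the Demographic Parity comparison.
            "selection_rate": (tp + fp) / len(yt),
            # TPR and FPR feed the Equalized Odds comparison.
            "tpr": recall,
            "fpr": fp / (fp + tn) if fp + tn else 0.0,
        }
    return results

def demographic_parity_diff(metrics):
    """Largest gap in selection rate across groups (0 = parity)."""
    rates = [m["selection_rate"] for m in metrics.values()]
    return max(rates) - min(rates)
```

An analogous gap over `tpr` and `fpr` gives the Equalized Odds difference; reporting both per-group values and the gaps makes disparities between groups directly visible.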
For Supervised Multi-Class Classification Models
Objective: Adapt and evaluate multi-class models under the ISO/IEC TR 24027 standards.
Procedure: Multi-Class Adaptation: Convert the multi-class outputs into binary problems for each protected group. [We will assess Gender and Race.] This can be done by using a “one-vs-all” strategy, where each class is treated as the positive class and all others as negative. Apply ISO/IEC TR 24027 metrics for each binary problem to evaluate performance and fairness for each protected group. Aggregate or analyze the results across all binary adaptations to provide a comprehensive view of model fairness and performance.
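The "one-vs-all" adaptation described above could be sketched as follows. The helper names and the simple per-group accuracy evaluator are illustrative assumptions, not the Fairly Platform's actual implementation; any binary per-group metric could be substituted:

```python
import numpy as np

def binarize(y, positive_class):
    """One-vs-all: treat `positive_class` as 1 and every other class as 0."""
    return (np.asarray(y) == positive_class).astype(int)

def per_group_accuracy(y_true, y_pred, groups):
    """Accuracy of the binarized problem within each protected group."""
    groups = np.asarray(groups)
    return {g: float(np.mean(y_pred[groups == g] == y_true[groups == g]))
            for g in np.unique(groups)}

def one_vs_all_report(y_true, y_pred, groups, classes):
    """For each class, evaluate the binary one-vs-all problem per group;
    the nested dict can then be aggregated across classes."""
    return {c: per_group_accuracy(binarize(y_true, c),
                                  binarize(y_pred, c),
                                  groups)
            for c in classes}
```

Aggregating the per-class results (for example, worst-case or average gap across classes) then provides the comprehensive view of fairness and performance described above.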
For Unsupervised Models
Objective: Adapt and evaluate unsupervised models under the ISO/IEC TR 24027 standards.
Procedure: Unsupervised Adaptation: Convert the unsupervised model’s outputs (for example, cluster assignments) into binary problems for each protected group. [We will assess Gender and Race.] This can be done by using a “one-vs-all” strategy, where membership in each cluster is treated as the positive outcome and all others as negative. Apply ISO/IEC TR 24027 metrics for each binary problem to evaluate performance and fairness for each protected group. Aggregate or analyze the results across all binary adaptations to provide a comprehensive view of model fairness and performance.
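For unsupervised models there is no ground truth, so one natural one-vs-all check is whether each cluster absorbs protected groups at comparable rates. The sketch below, with illustrative names and NumPy assumed, computes per-group membership rates for each cluster and the largest gap between groups:

```python
import numpy as np

def cluster_membership_rates(cluster_labels, groups):
    """For each cluster (one-vs-all), the share of each protected group
    assigned to it. Large gaps between groups flag potential disparity."""
    cluster_labels = np.asarray(cluster_labels)
    groups = np.asarray(groups)
    rates = {}
    for c in np.unique(cluster_labels):
        # One-vs-all binarization: in this cluster (1) vs. any other (0).
        in_cluster = (cluster_labels == c).astype(int)
        rates[c] = {g: float(np.mean(in_cluster[groups == g]))
                    for g in np.unique(groups)}
    return rates

def max_rate_gap(rates_for_cluster):
    """Largest between-group gap in membership rate for one cluster."""
    vals = list(rates_for_cluster.values())
    return max(vals) - min(vals)
```

Aggregating `max_rate_gap` across clusters (e.g. taking the worst case) gives a single disparity summary for the unsupervised model.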