AI/ML Model Validation Framework

It’s more than a simple MRM issue

Model Risk Management (MRM) is standard practice for any financial institution to assess model risk. However, the analytics space is undergoing a paradigm shift from earlier mainstream models and methodologies to cutting-edge Artificial Intelligence/Machine Learning (AI/ML) techniques. In line with these advances, MRM policies and frameworks also need to be upgraded to ensure that any incremental risk due to AI/ML methodologies is well captured, highlighted and mitigated.

This article discusses a robust framework for validating AI/ML models. Before we dive deep into the framework, it is imperative to discuss two key aspects:

A) Risk Tiering of AI/ML Solutions

To make AI/ML trustworthy, risk tiering of the related products and solutions is vital. In particular, it helps to quantify, at least roughly, the potential harm from a solution. The European Commission has also recently proposed a set of guidelines on risk tiering. Based on the standards currently defined in the industry, the risks associated with AI/ML solutions are categorized as follows:

[Figure: risk tiers for AI/ML solutions. Image from Author]

B) Bias & Fairness

In deep learning, bias & fairness are important considerations from both the data-profiling and the modelling-methodology perspectives, and problems in either could have dire repercussions. From a data perspective, bias can exist because of empirical skewness. For example, a Google search for “nurses” predominantly shows one specific gender. Similarly, credit scores for people of a certain race or skin colour could be biased. From a modelling perspective, deep learning models are prone to overfitting the data, which can reinforce such bias.

Unfortunately, to the dismay of validators, there are no standard definitions of bias and fairness. When a decision derived from the model is likely to impact individuals or firms, bias & fairness come into question. To address these concerns:

a) Validators should evaluate the variables with caution and call out any issues that might arise with respect to them. Sensitive variables (such as age, gender, religion and profession) should be excluded from the data or the model unless strictly necessary.

b) Validators should review bias & fairness in proportion to the risk tiering described above. High-risk models may have greater exposure to bias & fairness issues.

c) Validators need to define standard techniques or practices to identify and address this issue. For example, DataRobot provides an option to identify bias automatically.
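As one illustration of a "standard technique" a validator might adopt, the sketch below compares favourable-decision rates across groups and reports the ratio of the lowest rate to the highest (often called the disparate impact ratio). The group labels, the decisions and the function names are hypothetical, and this is only one of several possible fairness measures, not a definition endorsed by the article.

```python
from collections import defaultdict

def selection_rates(groups, decisions):
    """Return the share of favourable decisions (1) for each group."""
    totals = defaultdict(int)
    favourable = defaultdict(int)
    for g, d in zip(groups, decisions):
        totals[g] += 1
        favourable[g] += d
    return {g: favourable[g] / totals[g] for g in totals}

def disparate_impact_ratio(rates):
    """Lowest group selection rate divided by the highest."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody selected in any group: no disparity by this measure
    return min(rates.values()) / highest

# Hypothetical model decisions (1 = approved) for two illustrative groups.
groups = ["A"] * 5 + ["B"] * 5
decisions = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]

rates = selection_rates(groups, decisions)
ratio = disparate_impact_ratio(rates)
print(rates, round(ratio, 2))
```

A ratio close to 1 suggests similar treatment across groups on this measure. Any cut-off (a value such as 0.8 is sometimes quoted as a rule of thumb) is an assumption that should be agreed as part of the validation policy and tied to the risk tier of the model.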

Given this context, let’s dive deep into the AI/ML model validation framework. The framework can be divided into the following dimensions:

[Figure: dimensions of the AI/ML model validation framework. Image from Author]

1) Data Appropriateness

Training AI/ML models usually requires a very large amount of data, which may also be largely unstructured. This warrants ensuring:

a. Protection of PII or any personal data. Additionally, the process of collection and treatment of data should be taken into consideration.

b. Testing of the integrity and appropriateness of the data, so that it is used for the right purpose and in the right way.

c. Pre-processing, if any (such as transformations, normalization or missing-value imputation), is applied consistently to both train and test data.

d. Evaluation of the completeness of the data by reviewing time periods, sources and distributions, and assessment of the labelling definitions if required.

e. There are no bias & fairness issues as mentioned above
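Point (c) can be checked in code. The sketch below shows one common reading of "applied to both train and test data": the imputation and scaling parameters are learned from the training data only and then reused unchanged on the test data. The function names and the sample values are hypothetical.

```python
from statistics import mean, pstdev

def fit_scaler(values):
    """Learn imputation and scaling parameters from the training data only."""
    observed = [v for v in values if v is not None]
    mu = mean(observed)
    sigma = pstdev(observed) or 1.0  # guard against a constant column
    return {"fill": mu, "mean": mu, "std": sigma}

def apply_scaler(values, params):
    """Impute missing values, then standardize, using the fitted parameters."""
    filled = [params["fill"] if v is None else v for v in values]
    return [(v - params["mean"]) / params["std"] for v in filled]

# Hypothetical feature column with missing values marked as None.
train = [10.0, 12.0, None, 14.0]
test = [11.0, None, 20.0]

params = fit_scaler(train)                 # fitted on train only
train_scaled = apply_scaler(train, params)
test_scaled = apply_scaler(test, params)   # same parameters reused on test
print(params)
print(test_scaled)
```

A validator reviewing the development code can look for exactly this separation: if the parameters were re-estimated on the test data, or on train and test combined, the reported test performance may be optimistic.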

2) Methodology and Model testing

Unlike traditional models, AI/ML models are usually black boxes. Testing the model parameters, the output and the sensitivity to inputs therefore becomes challenging. To ensure that the model provides the expected results and is stable over time, validators need to:

a. Understand the objective of the methodologies along with business need

b. Review the hyperparameters and design choices used to tune the model:

i. Vectorization techniques (Word2Vec, GloVe, FastText, one-hot encoding)

ii. Optimization algorithms (gradient descent, SGD, mini-batch gradient descent, Adam)

iii. Activation functions (Sigmoid, Tanh, ReLU)

iv. Loss functions (MSE, Cross Entropy Loss, Hinge Loss)

v. Number of layers

vi. Batch size

vii. Drop-out rate

viii. Pooling methods (max, average)

c. Ensure the hyperparameters are aligned with the model purpose and usage.

d. Assess whether performance metrics such as false positive rate, precision and recall are properly defined as per the business need.

e. Evaluate the model accuracy and stability through more computationally intensive methods, like re-executing the model using different subsets of data, k-fold cross validation, Leave-one-out Cross-Validation (LOOCV), Nested Cross-Validation.

f. Ensure that sensitivity analysis has been performed thoroughly, so that the impact of each feature can be measured. More advanced methods for global interpretability are also available, such as partial dependence plots (PDP), which reveal trends by visualizing the average partial relationship between the predicted response and one or more features.

g. Once the sensitivity is captured, assess the likelihood and impact of each scenario. Scenario analysis needs to be performed to ensure that the model is tolerant of extreme scenarios and noise.

h. Evaluate the benchmark or challenger models and compare them with the final model.

i. Ensure that a plan for adaptive or continuous learning, if any, is in place so that the model is capable of learning from new data.

j. Assess the use of any pre-trained model (such as GloVe, FastText or ResNet) cautiously against the data and problem at hand.
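The resampling checks in point (e) can be sketched in a few lines. The example below is a minimal k-fold cross-validation loop that reports the score of each held-out fold, so that both the average accuracy and its spread across folds can be reviewed. The threshold classifier, the data and all function names are hypothetical stand-ins for the model under validation.

```python
import random
from statistics import mean, pstdev

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices once and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(xs, ys, k, fit, score, seed=0):
    """Train on k-1 folds, score on the held-out fold, repeat k times."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model,
                            [xs[i] for i in held_out],
                            [ys[i] for i in held_out]))
    return scores

# Hypothetical stand-in model: a one-feature threshold classifier.
# It assumes both classes are present in every training split.
def fit_threshold(xs, ys):
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (mean(pos) + mean(neg)) / 2

def accuracy(threshold, xs, ys):
    return mean(1.0 if (x > threshold) == (y == 1) else 0.0
                for x, y in zip(xs, ys))

xs = [0.1 * i for i in range(20)]
ys = [0 if i < 10 else 1 for i in range(20)]

scores = cross_validate(xs, ys, k=5, fit=fit_threshold, score=accuracy)
print("fold accuracies:", scores)
print("mean:", mean(scores), "spread:", pstdev(scores))
```

Setting k equal to the number of samples turns the same loop into leave-one-out cross-validation. A large spread across folds is a signal that the model's performance depends heavily on which data it happened to see.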

3) Conceptual Soundness and Interpretability

Compared with traditional techniques, AI/ML techniques are still not widely accepted by regulators or practitioners. This is primarily due to their black-box nature, which makes it difficult to establish explainability and fit with respect to the modelling or business context at hand.

To measure the transparency, explainability and feature importance of a black-box model, techniques such as SHAP and LIME are widely used; both can be applied in a model-agnostic way. Explainable Boosting Machines (EBM) take a different route: they are interpretable models in their own right and can also expose interaction terms. Validators need to ensure that this type of analysis has been performed and that its conclusions are aligned with the business problem.
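To make the idea behind such attributions concrete, the sketch below computes exact Shapley values, the game-theoretic quantity that SHAP approximates, for a tiny hypothetical scoring function. It is not the SHAP library's API: it simply enumerates every coalition of features, replacing features outside the coalition with a baseline value, which is only feasible for a handful of features. The scoring function, instance and baseline are all illustrative assumptions.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, instance, baseline):
    """Exact Shapley attribution of predict(instance) - predict(baseline).

    Features outside a coalition are set to their baseline value.
    The cost grows as 2**n_features, so this only suits tiny examples.
    """
    n = len(instance)

    def value(coalition):
        mixed = [instance[i] if i in coalition else baseline[i]
                 for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Hypothetical scoring function with an interaction between features 0 and 1.
def predict(x):
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[1] - 3.0 * x[2]

instance = [1.0, 2.0, 1.0]
baseline = [0.0, 0.0, 0.0]

phi = shapley_values(predict, instance, baseline)
print(phi)
print(sum(phi), predict(instance) - predict(baseline))
```

The attributions sum to the difference between the prediction for the instance and the prediction for the baseline, and the interaction effect is shared equally between the two features involved. A validator can use these properties as a sanity check on attribution output reported by the developers.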

4) Model Implementation and Model Security

Once the model is developed, a key step is to implement it in production, either on a server or on a cloud platform such as Azure or GCP. In this step, validators need to assess the readiness and design of the model implementation plan meticulously. Validators also need to evaluate whether the applications, including libraries, modules and configurations, are appropriate for the implementation, while considering the potential impact of future releases. Container tools such as Docker and Kubernetes make it easier to create, deploy and run applications.

Apart from implementation, a viewpoint on model security (adversarial attacks, model theft and so on) should be built into the validation framework. To this end, risk tiering (mentioned earlier) should be an important consideration when defining the transparency of the solution.

5) Model Documentation and Version Control

The documentation should be self-explanatory and extensive enough so that it allows validators to replicate the model.

Following proper model documentation guidelines, the documentation should describe the extraction and pre-processing of the development data, the model theory and design, the development approach, and the model performance, including that of challenger models. It should state the assumptions, weaknesses and limitations, provide an estimate of their impact, and document the mitigants for the associated model risks. Finally, the code should be sufficiently commented, include a brief explanation of each function, and be kept under version control.

6) Ongoing Monitoring and Governance

Validators should assess the monitoring plan to ensure that components such as scope, objectives, stakeholders, and roles and responsibilities are well covered. The frequency and timing of scheduled reviews or recalibrations should also be evaluated. Effective oversight from management bodies will ensure that management is aware of all associated model risks.
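A monitoring plan usually names at least one quantitative trigger for a review. As an illustration only, the sketch below computes the Population Stability Index (PSI), a drift measure commonly used in model risk practice, between development-time scores and recent scores. The article does not prescribe PSI; the bin edges, sample scores and function name are assumptions made for the example.

```python
from math import log

def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index between two samples over shared bins.

    `edges` are the interior bin boundaries; `eps` avoids log(0) when a
    bin is empty in one of the samples.
    """
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            # The bin index is the number of edges strictly below the value.
            counts[sum(v > e for e in edges)] += 1
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * log(ai / ei) for ei, ai in zip(e, a))

# Hypothetical model scores at development time and in a recent period.
development = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
recent = [0.3, 0.6, 0.7, 0.7, 0.8, 0.8, 0.9, 0.9]
edges = [0.25, 0.5, 0.75]

print("no drift:", psi(development, development, edges))
print("drift:   ", psi(development, recent, edges))
```

The index is zero when the two distributions match and grows as they diverge. The level that triggers a revisit or recalibration (values such as 0.1 and 0.25 are often quoted as rules of thumb) is a policy choice that the monitoring plan should state explicitly.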


As industries continue to adopt AI/ML techniques, the MRM framework needs to become more robust and comprehensive than before. Validators need to evaluate models across all the key dimensions highlighted in this article.


