I have been totally tied up with an MLOps project for the past couple of weeks, so I could not write any testing blogs. The experience was rich in terms of the possibilities in tooling, and it also gave me great insights into how to test Machine Learning models. I will share a few points in this blog so that testing practitioners can explore further, specialize in one or more of these areas, and probably find their niche too!
MLOps is Machine Learning in production. It is not sufficient to build a model in a notebook like Jupyter and run it locally; the model should be out there in production, serving customer requests. There are a few approaches to making the model available, but the most important part of the exercise is to serve the most current, relevant, and eligible model, because Machine Learning is all about the data with which the model is built. Data is dynamic and changes often, so the model has to keep up with it, otherwise the results won't be right. When the model becomes outdated because the data has moved on, this is called 'drift', and by updating the model at the right intervals, the drift can be significantly reduced. Finding the right interval is a challenge, and it comes with the experience of the data analyst and the builder of the model.
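To make drift a little more concrete, here is a minimal sketch of one common way to detect it: comparing the distribution of a feature in recent production data against the training data with a two-sample Kolmogorov-Smirnov test. The synthetic data and the significance level are my own assumptions for illustration, not a prescription.

```python
# A minimal drift-detection sketch. Assumptions: SciPy and NumPy are
# available, and we have 1-D arrays of the same feature from training
# time and from recent production traffic.
import numpy as np
from scipy import stats

def feature_has_drifted(training, production, alpha=0.05):
    """Flag drift when the two samples are unlikely to share a distribution."""
    result = stats.ks_2samp(training, production)
    return result.pvalue < alpha

rng = np.random.default_rng(42)
train_feature = rng.normal(loc=0.0, scale=1.0, size=1000)
prod_feature = rng.normal(loc=0.5, scale=1.0, size=1000)  # shifted mean

print(feature_has_drifted(train_feature, prod_feature))  # likely True
```

A check like this, run per feature on a schedule, gives an early signal that the model may need retraining before the predictions visibly degrade.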
Coming to the testing part, I found a few key areas interesting.
Testing the underlying libraries
The foundation of machine learning lies in its libraries. There are various types of libraries, the important ones being the math, statistics, and scientific computing libraries. We all assume that these libraries are doing the right thing, and most often they are, but as testers know, there will always be surprises. When we build on libraries that we usually trust, we take responsibility for the quality of the product that rests on them. Testing these libraries is neither easy nor simple; we need the required background and knowledge to figure out whether they are doing the right thing, but with constant effort we will get there. That effort is worth it, because everything built on top depends on these libraries being correct.
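One low-cost way to start is to pin the libraries down with sanity checks against results we can verify by hand. Here is a minimal sketch, assuming NumPy as the library under test and pytest as the runner; the specific values and tolerances are my own choices for illustration.

```python
# Sanity checks that a numerical library reproduces known results.
# Assumptions: NumPy and pytest are installed; values are hand-verifiable.
import numpy as np

def test_mean_of_known_values():
    assert np.mean([1, 2, 3, 4]) == 2.5

def test_std_matches_hand_calculation():
    # Population std of [2, 4, 4, 4, 5, 5, 7, 9] is exactly 2.
    assert np.isclose(np.std([2, 4, 4, 4, 5, 5, 7, 9]), 2.0)

def test_linalg_solve_roundtrip():
    # Solving Ax = b and multiplying back should reproduce b.
    a = np.array([[3.0, 1.0], [1.0, 2.0]])
    b = np.array([9.0, 8.0])
    x = np.linalg.solve(a, b)
    assert np.allclose(a @ x, b)
```

Small checks like these won't prove a library correct, but they catch version regressions and surprises in defaults (such as population versus sample standard deviation) cheaply.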
Testing the model
We need to test whether the model gives the right outputs for the data, as per expectations. We need to check for 'overfitting', where the model fits the data it was trained on so closely that it gives poor results on data it has not seen. We need to check for 'bias', where the data itself is skewed towards a particular pattern. We need to be constantly testing for 'drift', to check whether the model has become outdated. And we need to test with 'outliers', to check whether the model behaves as expected for data points that lie outside the desired range.
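To make the overfitting check concrete, here is a minimal sketch using scikit-learn: compare accuracy on the training data against accuracy on held-out data and flag a suspiciously large gap. The dataset, the model, and the 0.15 gap threshold are assumptions for illustration.

```python
# Overfitting smoke test: a large train/test accuracy gap is a warning sign.
# Assumptions: scikit-learn is installed; the 0.15 threshold is illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)

assert train_acc - test_acc < 0.15, (
    f"Possible overfitting: train={train_acc:.2f}, test={test_acc:.2f}"
)
```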
Testing with experiments and picking the right model
As discussed above while talking about 'drift', it is important to run a series of experiments to retrain the model at regular intervals, to make sure the model works well with the current data. These experiments are effectively tests with different Machine Learning algorithms and their hyperparameters. Cross-validation and feature engineering may also need to be redone to make sure the resulting model is appropriate: feature engineering done on one set of data might not be appropriate for a different set, and likewise with cross-validation. After running a series of experiments with these various combinations, the right model is selected and promoted to production.
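Here is a minimal sketch of such an experiment loop, using scikit-learn's GridSearchCV to combine the hyperparameter sweep and cross-validation in one step. The estimator, parameter grid, and dataset below are my assumptions for illustration.

```python
# Hyperparameter experiments with cross-validation; the best model wins.
# Assumptions: scikit-learn is installed; estimator and grid are illustrative.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

# The winning combination would be the candidate promoted to production.
print(search.best_params_, search.best_score_)
```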
Testing the Model Performance
Another method of assessing model performance is to run tests based on thresholds. Since it is difficult to monitor models manually, automated checks can flag when a model's metrics fall below a certain threshold, indicating that it is underperforming. It is important to run these checks, too, at pre-determined intervals.
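A minimal sketch of such a threshold check follows. The fetch_recent_predictions helper is hypothetical (something that returns labelled recent production traffic), and the 0.90 accuracy floor is a value I picked for illustration.

```python
# Scheduled performance check: alert when accuracy drops below a floor.
# Assumptions: fetch_recent_predictions is a hypothetical callable returning
# (y_true, y_pred) for recent traffic; the 0.90 floor is illustrative.
from sklearn.metrics import accuracy_score

ACCURACY_FLOOR = 0.90

def check_model_health(fetch_recent_predictions):
    y_true, y_pred = fetch_recent_predictions()
    accuracy = accuracy_score(y_true, y_pred)
    if accuracy < ACCURACY_FLOOR:
        # In a real pipeline this would page someone or trigger retraining.
        raise RuntimeError(f"Model underperforming: accuracy={accuracy:.3f}")
    return accuracy
```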
Acceptance Tests
As with any software, we need to be randomly and constantly checking the performance of the Machine Learning model at various intervals. I would suggest spreading out the test data randomly so that we cover many scenarios, rather than sticking to a fixed set of tests with predetermined test data. Play around with the APIs, negative scenarios, unexpected inputs, and so on. Be creative.
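Here is a minimal sketch of that style of acceptance test against a model-serving API. The /predict endpoint, the URL, the payloads, and the expected status codes are all my assumptions; adapt them to whatever your service actually exposes.

```python
# Acceptance-style checks against a prediction API, including negative cases.
# Assumptions: the service exposes POST /predict at this (hypothetical) URL
# and returns 200 for valid input, 4xx for malformed input.
import requests

BASE_URL = "http://localhost:8000"  # hypothetical endpoint

def test_valid_request_returns_prediction():
    response = requests.post(
        f"{BASE_URL}/predict", json={"features": [5.1, 3.5, 1.4, 0.2]}
    )
    assert response.status_code == 200
    assert "prediction" in response.json()

def test_malformed_payload_is_rejected():
    response = requests.post(f"{BASE_URL}/predict", json={"features": "not-a-list"})
    assert 400 <= response.status_code < 500

def test_empty_body_is_rejected():
    response = requests.post(f"{BASE_URL}/predict")
    assert 400 <= response.status_code < 500
```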
As you can see, there’s a whole lot of testing that can be done with Machine Learning. Pick the areas that excite you and excel in those. With time, you can become an expert in certain areas of testing Machine Learning models.
For testing Machine Learning projects and consulting for your organisation, contact me.