MLOps vs DevOps: Why is MLOps different? (MLOps-2)
ML operations aim to accelerate, scale and sustain model development, deployment and maintenance.
MLOps resembles a typical DevOps process, that is, the sequence of actions that takes a project from development to production. However, MLOps differs in nature from DevOps for the reasons listed below.
- ML projects are experimental by nature. They require lots of trial and error, in addition to testing, validation, integration and deployment, before the project is ready for production. Each of these steps can affect the data or the model in a way that degrades the predictions.
- ML projects require frequent tuning and retraining because data changes rapidly, both during development and at runtime.
- Many ML issues cannot be detected until the model is tested at runtime, which makes the standard quality engineering process insufficient to ensure that the model is working. The ML quality process requires continuous evaluation, continuous training and continuous deployment. This means going through the whole process, all the way to deployment, every time before we can be sure the model is ready to be used.
- ML projects have to be validated ethically before release to ensure that they do not reproduce discriminatory bias or repeat unethical behaviours captured in historical data.
- ML models are prone to many runtime issues, such as anomalies, data drift, concept drift, segment underrepresentation, etc.
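Runtime issues like these are usually caught by monitoring. As one illustration of data-drift detection, here is a minimal Python sketch of the Population Stability Index (PSI), a common drift metric that this article does not itself name. The function name, the equal-width binning taken from the training sample, and the small epsilon are assumptions made for this sketch.

```python
import math
from typing import List, Sequence


def population_stability_index(expected: Sequence[float],
                               actual: Sequence[float],
                               bins: int = 10) -> float:
    """Compare a feature's training-time sample (expected) with its runtime sample (actual).

    Bin edges are equal-width over the range of the expected sample (an assumption
    of this sketch); a small epsilon keeps empty bins from producing log(0).
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Clamp runtime values outside the training range into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        eps = 1e-6
        return [max(c / len(values), eps) for c in counts]

    p, q = proportions(expected), proportions(actual)
    # PSI = sum over bins of (actual% - expected%) * ln(actual% / expected%)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))
```

A value near 0 means the runtime distribution matches the training one. A commonly quoted rule of thumb treats a PSI above roughly 0.25 as significant drift, but thresholds should be tuned per feature.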
ML projects are iterative and rely on continuously changing data. This requires a highly fluid and responsive operational environment.
The model's journey from development to production passes through source control, virtualization, building, testing, issue tracking and deployment. Each of these steps requires operational tasks that are usually performed by a DevOps engineer. This heavy dependency on DevOps engineers overloads them and creates a bottleneck in the flow. The sequence diagram below illustrates the journey of the ML model from development to production (figure 1). It reveals three problems: (1) the DevOps engineers are overloaded with tasks that keep repeating in an endless loop, (2) the time between a code check-in and the issue going back to the scientist to be fixed is too long, and (3) the runtime monitoring loop depends on the development loop, because tuning or retraining the model at runtime requires human intervention.
DevOps enables CI/CD (Continuous Integration/Continuous Deployment) during development time.
MLOps enables CI/CE/CT/CD (Continuous Integration, Evaluation, Training and Deployment) during both development time and runtime.
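To make the runtime part of CE/CT/CD more concrete, here is a minimal Python sketch of one runtime iteration: evaluate the deployed model on fresh labelled data, retrain if it falls below a quality floor, and deploy the new model only if it does better. The toy `ThresholdModel`, accuracy as the metric, the 0.9 floor and the midpoint retraining rule are all hypothetical choices for illustration, not part of any specific MLOps platform.

```python
from dataclasses import dataclass
from typing import List, Tuple

Example = Tuple[float, int]  # (feature, label): hypothetical toy data shape


@dataclass
class ThresholdModel:
    """Hypothetical toy model: predicts 1 when the feature exceeds a threshold."""
    threshold: float
    version: int = 1

    def predict(self, x: float) -> int:
        return int(x > self.threshold)


def evaluate(model: ThresholdModel, batch: List[Example]) -> float:
    """Continuous evaluation: accuracy on the latest labelled runtime batch."""
    return sum(model.predict(x) == y for x, y in batch) / len(batch)


def train(batch: List[Example], version: int) -> ThresholdModel:
    """Continuous training: place the threshold midway between the class means.

    Assumes the batch contains both classes.
    """
    pos = [x for x, y in batch if y == 1]
    neg = [x for x, y in batch if y == 0]
    midpoint = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return ThresholdModel(threshold=midpoint, version=version)


def ce_ct_cd_step(deployed: ThresholdModel, batch: List[Example],
                  min_accuracy: float = 0.9) -> ThresholdModel:
    """One runtime loop iteration; returns the model that should be serving."""
    current = evaluate(deployed, batch)
    if current >= min_accuracy:
        return deployed  # still healthy: keep serving it
    candidate = train(batch, version=deployed.version + 1)
    # For brevity the candidate is scored on the batch it was fitted to;
    # a real pipeline would score it on held-out data.
    return candidate if evaluate(candidate, batch) > current else deployed
```

In a running system this step would be triggered on a schedule or by a monitoring alert, with no human in the loop, which is exactly what removes problem (3) from the DevOps-driven flow.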
In a perfect world, the MLOps life cycle would look like the sequence diagram illustrated below (Figure 2). Once the ML scientist submits the code, it is automatically checked in, virtualized (Dockerized), compiled, built, and deployed to the hosting environment. The reviewer is then notified to check the code's performance. If the reviewer (a team leader or quality engineer) finds an issue, they add it to the issue tracker, which notifies the scientist to fix it. Once the reviewer approves the code, it is deployed to production. At that point the model exits the development cycle and enters the runtime cycle, where it is continuously evaluated, continuously trained, and continuously deployed.
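The automated development cycle just described can be sketched as a small state machine. This is a minimal Python sketch under stated assumptions: the `Stage` names are hypothetical, and the in-memory list stands in for a real issue tracker. An actual platform would drive these stages through its CI system and tool integrations.

```python
from enum import Enum, auto
from typing import List, Tuple


class Stage(Enum):
    CHECKED_IN = auto()
    DOCKERIZED = auto()
    BUILT = auto()                # compiled and built
    DEPLOYED_TO_HOSTING = auto()
    UNDER_REVIEW = auto()         # reviewer has been notified
    ISSUE_RAISED = auto()         # issue logged; scientist notified
    PRODUCTION = auto()


# Stages that run automatically after every submission, with no DevOps hand-off.
AUTOMATED_STAGES = [
    Stage.CHECKED_IN,
    Stage.DOCKERIZED,
    Stage.BUILT,
    Stage.DEPLOYED_TO_HOSTING,
    Stage.UNDER_REVIEW,
]


def run_submission(issues: List[str], tracker: List[str]) -> List[Stage]:
    """Run one submission through the automated stages, then apply the review.

    `issues` are what the reviewer found (empty means approved). Appending them
    to `tracker` stands in for an issue tracker notifying the scientist.
    """
    trail = list(AUTOMATED_STAGES)
    if issues:
        tracker.extend(issues)
        trail.append(Stage.ISSUE_RAISED)
    else:
        trail.append(Stage.PRODUCTION)
    return trail


def development_cycle(review_rounds: List[List[str]]) -> Tuple[int, List[str], List[Stage]]:
    """Resubmit until a review round approves the model.

    Returns (number of submissions, logged issues, stages of the final submission).
    """
    tracker: List[str] = []
    for attempt, issues in enumerate(review_rounds, start=1):
        trail = run_submission(issues, tracker)
        if trail[-1] is Stage.PRODUCTION:
            return attempt, tracker, trail
    raise RuntimeError("the model was never approved for production")
```

For example, `development_cycle([["accuracy below baseline"], []])` models one rejected review followed by an approval: the model reaches production on the second submission, and the only human steps are the reviewer's decision and the scientist's fix.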
To enable MLOps in any organization, we need a platform that supports a web-based IDE, source control, issue tracking, model validation, virtualization, scaling, process monitoring, authentication, model monitoring, champion/challenger testing, and CT/CD. Ideally, the platform would also provide a unified data interface and be cloud-agnostic. The figure below (fig 3) illustrates the features required for DevOps and MLOps.
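Of the features in this list, champion/challenger testing is the one that embodies a decision rule, so here is a minimal Python sketch of it: the current production model (the champion) is replaced only when a challenger does measurably better on the same holdout data. Mean absolute error as the metric, the `min_gain` margin, and all names here are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, float]  # (input, target)


@dataclass
class Model:
    name: str
    predict: Callable[[float], float]


def mean_absolute_error(model: Model, data: Sequence[Example]) -> float:
    return sum(abs(model.predict(x) - y) for x, y in data) / len(data)


def select_champion(champion: Model, challengers: List[Model],
                    holdout: Sequence[Example], min_gain: float = 0.0) -> Model:
    """Return the model that should serve production traffic.

    A challenger takes over only if its error on the shared holdout set is lower
    than the best error so far by more than `min_gain`, so small, noisy
    improvements do not trigger a redeployment.
    """
    best, best_err = champion, mean_absolute_error(champion, holdout)
    for challenger in challengers:
        err = mean_absolute_error(challenger, holdout)
        if err < best_err - min_gain:
            best, best_err = challenger, err
    return best
```

Scoring every model on the same holdout set keeps the comparison fair; the margin is a design choice that trades responsiveness for stability in production.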
The implementation of MLOps varies according to the underlying architecture, which can be on-premise, cloud, or third-party multi-cloud. In the next article, we will describe how to build an MLOps platform using on-premise open-access tools. To learn more about what MLOps is and why it is important, check this article.
Hany is an AI/ML enthusiast, academic researcher, and lead scientist @ Catch.com.au Australia. He likes to make sense of data and help businesses become data-driven.