Machine Learning Architecture – Designing Scalable Solutions

A large part of my job involves designing, building and deploying machine learning solutions for the company I work for. In this post I'm going to quickly highlight some of my key thoughts on this so that if you are new to ML architecture, you can have some good pointers to get from from "blank sheet of paper" to "deployed and running" faster.

The Main ML Architectures

The way I see it, there are only a few ML solution architecture types (they may have fancy names somewhere but I won't know them). I reckon you could assign most, if not all, machine learning implementations into one of these categories (probably because they are so vague/wide-reaching):

The ML Microservice
The Streaming process,
The Batch process,
The Integrated approach

I'll go through each of these and list some examples to get you acquainted with what I mean.

The ML Microservice

Imagine you have a map based product, and the user can interact with this product by clicking on one of the entities on the map, for example a truck, and then they can click a button that says "Predict ETA at destination". If I was building an ML based solution for providing that prediction then it would most likely be behind a REST API to some application that read in a request for a prediction and quickly returned that prediction. One way of doing this would be to deploy a lightweight web application (like a Flask app) to a cloud based hosting service. You could design the Flask app to read in your pre-trained machine learning model from a model repository (cloud based storage) and serve predictions via a simple HTTP request-response interaction.

The Streaming Process

In a 'streaming' approach, you are going to work directly with the data as it comes in. This is a type of 'event driven' architecture, whereby an action in one part of the system initiates the next and so on. It is particularly useful for processes which can be a bit ad-hoc in terms of frequency requirements. For example, you may want a streaming process to analyse click data from your web site as it comes in since the production of results works on the same frequency as the ingestion of the data. When there are more clicks, there will be more analysis, when there are no clicks, no analysis. This also means that you do not have to wait for some threshold amount of data to be ingested before performing your action, like you might in a 'batch' approach (see below).

You can do this using tools like Apache Kafka or Spark Streaming.

The Batch Process (or 'ETML')

If the solution has the aim of producing a relatively standardised dataset with some kind of machine learning result on a schedule or very consistent basis, then it may make sense to deploy this as a batch processing service. In this case, you build a pipeline that extracts data, performs the relevant transformations to prepare for modelling, you then perform the model based calculations and surface the results. These steps are why I often call this 'ETML' ('ETL' being 'Extract, Transform, Load', 'ETML' is 'Extract, Transform, Machine Learning').

In my experience, this sort of architecture is often the easiest to develop and deploy. You are building something that is very stable and you usually do not have to worry about successful triggering of the pipeline. In streaming or dynamic services for example, you have to ensure that the REST API request or the event-driven trigger you employ works appropriately in a variety of circumstances through rigorous testing. For a scheduled batch process this is usually much simpler. You just set up your schedule via a cron job or other process, ensure that it works the first few times and then you're off.

The Integrated Approach

This final option is really a kind of exceptional case. I can totally imagine a scenario where you want to embed the machine learning capability directly within some general solution, rather than externalised as a separate service or process. Some good examples where this may apply are:

'Edge' deployments: When you have machine learning running on an IoT device, potentially one with limited connectivity, it might make sense to directly put the machine learning functionality in your main piece of software. However, I would recommend actually following the 'ML Microservice' approach here using something like Docker containers.
Integration with another microservice: In this case, you may already be developing a microservice that performs some simple logic and the machine learning is really just an extension or variant of that. For example, if you have already built a Flask web app that returns some information or result, the machine learning could just be another end-point.
Out of the box product: If your team is in charge of adding functionality to a piece of software that you are selling onto a client, you may have to package up the machine learning inside the solution somehow. I'm less sure how this would work and I do think that you'd be better using a microservice approach in general.

In Summary

There's a few different ways to package up your machine learning solutions, and it's important to understand the pros and cons of each.

Machine Learning