An idea that has been around since the 1990s, AutoML – Automated Machine Learning – has been described as a “quiet revolution in AI” that is poised to change the landscape of Data Science by automating much of the Machine Learning (ML) process at scale while maintaining model quality. AutoML should also reduce the time and effort needed to get ML models ready for industrialization. Academic researchers, start-ups and technology giants have begun to develop AutoML methods and tools ranging from simple open source prototypes to industry-wide software products.
There are two main reasons for the birth of AutoML. First, Deep Learning (DL) has been applied in various fields and used to solve many challenging AI tasks, yet these models have largely been designed manually by experts through trial and error. Second, traditional ML development is resource intensive, requiring significant domain knowledge and time to produce and compare dozens of models.
How does AutoML work?
The AutoML pipeline consists of several processes: data preparation, feature engineering, model generation and model evaluation. Let’s take a look at each of these processes in more detail.
1. Data Preparation
The first step has three aspects: data collection, to create a new dataset or extend an existing one; data cleaning, to filter the data so that downstream model training is not compromised; and data augmentation, to improve the robustness and performance of the model.
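As a rough illustration of the cleaning and augmentation aspects, here is a minimal Python sketch using only the standard library. The helper names `clean` and `augment`, the toy rows and the Gaussian jitter strategy are assumptions chosen for illustration; real AutoML tools apply their own, more elaborate rules.

```python
import random
import statistics

def clean(rows):
    """Data cleaning: drop rows with missing values and exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        if any(value is None for value in row):
            continue  # missing value
        if row in seen:
            continue  # exact duplicate
        seen.add(row)
        cleaned.append(row)
    return cleaned

def augment(rows, copies=2, noise=0.05, seed=0):
    """Data augmentation: add jittered copies of each row (numeric features only)."""
    rng = random.Random(seed)
    n_features = len(rows[0]) - 1  # assumption: the last column is the label
    # Scale the jitter by each feature's standard deviation.
    spreads = [statistics.pstdev([r[i] for r in rows]) for i in range(n_features)]
    augmented = list(rows)
    for row in rows:
        for _ in range(copies):
            jittered = tuple(
                row[i] + rng.gauss(0.0, noise * spreads[i]) for i in range(n_features)
            )
            augmented.append(jittered + (row[-1],))
    return augmented

# Hypothetical toy dataset: two numeric features and a label.
raw = [
    (5.1, 3.5, "a"),
    (5.1, 3.5, "a"),   # duplicate
    (6.2, None, "b"),  # missing value
    (6.7, 3.1, "b"),
    (4.9, 3.0, "a"),
]
cleaned = clean(raw)
prepared = augment(cleaned)
print(len(raw), len(cleaned), len(prepared))
```

Jittering numeric features is only one possible augmentation; images, text and time series each call for different transformations.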
2. Feature Engineering
Feature engineering aims to extract as much useful signal as possible from raw data for use by algorithms and models. It includes three processes: feature selection, extraction and construction. Selection reduces redundancy by keeping only the important features. Extraction and construction are variants of feature transformation, whereby a new set of features is created. In most cases, extraction aims to reduce dimensionality by applying specific mapping functions, while construction is used to extend the original feature space.
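The three processes can be sketched in a few lines of Python. This is a simplified illustration, not how any particular AutoML tool works: the variance threshold, the pairwise products and the fixed linear weights are all assumptions, and in practice the extraction mapping would be learned from the data rather than set by hand.

```python
import statistics
from itertools import combinations

def select(columns, min_variance=1e-3):
    """Selection: keep only features whose variance exceeds a threshold."""
    return {name: values for name, values in columns.items()
            if statistics.pvariance(values) > min_variance}

def construct(columns):
    """Construction: extend the feature space with pairwise products."""
    extended = dict(columns)
    for a, b in combinations(sorted(columns), 2):
        extended[f"{a}*{b}"] = [x * y for x, y in zip(columns[a], columns[b])]
    return extended

def extract(columns, weights):
    """Extraction: map several features to a single one (fewer dimensions)."""
    names = sorted(columns)
    n_rows = len(columns[names[0]])
    return [sum(weights[name] * columns[name][i] for name in names)
            for i in range(n_rows)]

# Hypothetical toy features, stored column by column.
columns = {
    "height": [1.0, 2.0, 3.0, 4.0],
    "width": [2.0, 2.0, 2.0, 2.0],  # constant, so it carries no information
    "depth": [0.5, 1.5, 0.5, 1.5],
}
selected = select(columns)          # drops the constant "width" feature
constructed = construct(selected)   # adds the "depth*height" feature
extracted = extract(selected, {"height": 0.5, "depth": 0.5})  # hand-set weights
print(sorted(selected), sorted(constructed), extracted)
```

Selection and extraction both shrink the feature space, while construction enlarges it; an AutoML system typically searches over combinations of such steps.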
3. Model Generation
Model generation is divided into two parts: the search space and the optimization methods. The search space defines the model structures that can, in principle, be designed and optimized. The types of models can be broadly divided into two categories: traditional ML models (such as the support vector machine and the k-nearest neighbour algorithm) and DL networks. The optimization methods tune two types of parameters: hyperparameters used for training, such as the learning rate, and those used for model design, such as the filter size and the number of layers in a DL model.
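To make the split between search space and optimization method concrete, here is a minimal sketch that runs a random search over a small k-nearest neighbour search space. Random search is used only because it is the simplest optimization method to show; the `SEARCH_SPACE` contents, the helper names and the toy data are assumptions made for illustration.

```python
import random

# Hypothetical search space: each hyperparameter with its candidate values.
SEARCH_SPACE = {"k": [1, 3, 5], "metric": ["manhattan", "euclidean"]}

def distance(a, b, metric):
    if metric == "manhattan":
        return sum(abs(x - y) for x, y in zip(a, b))
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def knn_predict(train, point, k, metric):
    """Majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda row: distance(row[0], point, metric))[:k]
    labels = [label for _, label in nearest]
    return max(sorted(set(labels)), key=labels.count)

def accuracy(train, valid, config):
    hits = sum(knn_predict(train, x, **config) == y for x, y in valid)
    return hits / len(valid)

def random_search(train, valid, n_trials=5, seed=0):
    """Optimization method: sample configurations and keep the best one."""
    rng = random.Random(seed)
    best_config, best_score = None, -1.0
    for _ in range(n_trials):
        config = {name: rng.choice(options)
                  for name, options in SEARCH_SPACE.items()}
        score = accuracy(train, valid, config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

# Toy data: two well-separated clusters, as (features, label) pairs.
train = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0),
         ((5, 5), 1), ((5, 6), 1), ((6, 5), 1)]
valid = [((0.5, 0.5), 0), ((5.5, 5.5), 1)]

best_config, best_score = random_search(train, valid)
print(best_config, best_score)
```

More sophisticated optimization methods, such as Bayesian optimization or evolutionary search, replace the sampling loop while the search space definition stays the same.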
4. Model Evaluation
Once a model is generated, its performance must be evaluated. The simplest approach is to train the model to convergence on the training set and then estimate its performance on the validation set; however, this method is time-consuming and resource intensive. More advanced methods, such as low-fidelity evaluation, weight sharing and early stopping, can speed up the evaluation process, but at the cost of some accuracy.
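Of the acceleration methods mentioned, early stopping is the easiest to sketch. The example below trains a one-variable linear model by gradient descent and stops once the validation loss has not improved for a few epochs. The function names, the `patience` and `min_delta` settings and the noise-free toy data are assumptions for illustration only.

```python
def mse(data, w, b):
    """Mean squared error of the line y = w * x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train_with_early_stopping(train, valid, lr=0.5, max_epochs=500,
                              patience=5, min_delta=1e-4):
    """Stop once validation loss has not improved by min_delta for `patience` epochs."""
    w, b = 0.0, 0.0
    best_loss, best_epoch, stale = float("inf"), 0, 0
    n = len(train)
    for epoch in range(1, max_epochs + 1):
        # One full-batch gradient step on the training loss.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in train) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in train) / n
        w -= lr * grad_w
        b -= lr * grad_b
        # Evaluate on the validation set after every epoch.
        loss = mse(valid, w, b)
        if loss < best_loss - min_delta:
            best_loss, best_epoch, stale = loss, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                break  # no meaningful progress: stop early
    return {"epochs_run": epoch, "best_epoch": best_epoch,
            "best_val_loss": best_loss}

# Toy data generated from y = 2x + 1, without noise.
train = [(i / 10, 2 * (i / 10) + 1) for i in range(10)]
valid = [(0.05 + i / 10, 2 * (0.05 + i / 10) + 1) for i in range(10)]

result = train_with_early_stopping(train, valid)
print(result)
```

The run ends well before `max_epochs`, which saves time, but the returned loss is slightly worse than what training to full convergence would reach. This is the accuracy trade-off described above.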
What are the advantages of AutoML?
- Efficiency: it speeds up and simplifies the machine learning process and reduces the training time of machine learning models;
- Performance: models produced by AutoML also tend to perform better than hand-coded models;
- Cost savings: having a faster, more efficient ML process means that a company can save money by spending less of its budget on maintaining that process;
- Democratization of ML: you don’t need advanced knowledge of data science or AI to use AutoML. However, the use of AutoML must be adapted to the context. Some situations require human intervention to choose the most appropriate approach.
What are the drawbacks of AutoML?
- Black box: the main drawback of AutoML is the black box effect. The user has very little information about how the pipeline works and the different choices made in the different pipeline components;
- Not generic: this is a relatively new field and some of the most popular tools are not yet fully developed. In addition, AutoML solutions are not well suited to very complex business problems;
- Overrated: one of the main challenges of AutoML is the temptation to consider it as a substitute for human knowledge. A human should still be involved in evaluating and supervising the model, but no longer needs to participate in the step-by-step ML process.
What are the different AutoML technologies?
Research in AutoML is very diverse and has brought up packages and methods targeted at both researchers and end users. Here are some of the top technologies to consider in both of those areas:
DataRobot
DataRobot is probably the best-known commercial solution for AutoML and one of the unicorns in the AI space. It provides the main functionalities required in an AutoML solution.
H2O.ai
H2O.ai defines its platform as “the open source leader in AI and Machine Learning with a mission to democratize AI for everyone.” It offers an AutoML package as part of its open-source platform.
Dataiku
This AutoML technology, integrated with Dataiku’s Data Science Studio, offers a code visualization system that helps users understand the chosen architecture and interpret the results obtained.
Google Cloud AutoML
Google has developed a suite of ML products that allows even developers with little knowledge of ML to train high-quality models tailored to their specific needs.
Azure Automated ML
Azure Automated ML combines ideas from collaborative filtering and Bayesian optimization to intelligently and efficiently search a vast space of possible ML pipelines.
AWS AutoML
AWS offers a range of AutoML solutions for all levels of expertise: an open source solution for ML practitioners, and a fully managed service for data scientists who prefer to have models created automatically based on the use case. Developers or business users with no ML experience can take advantage of out-of-the-box solutions for specific use cases.
Take part in the Devoteam community
This article is part of a larger series centred around the technologies and themes found within the first edition of the Devoteam TechRadar. To see what our community of tech leaders said about the current position of AutoML in the market, take a look at the most recent edition of the Devoteam TechRadar.