What is Automated Machine Learning (AutoML)?
AutoML is a subfield of machine learning concerned with automating the repetitive tasks of the machine learning process. It offers pre-designed data analysis tools that help businesses obtain well-performing machine learning models for accurate, low-cost, and quick predictions. Wikipedia defines AutoML as "the process of automating the end-to-end process of applying machine learning to real-world problems."
Which machine learning processes can we automate?
AutoML solutions aim to automate some or all steps of the machine learning process, which includes:
- Data pre-processing: Real-world data are often incomplete and likely to contain errors, so this step transforms raw data into an understandable format. It includes techniques such as data cleaning, data integration, data transformation, and data reduction.
- Feature engineering: This step uses domain knowledge of the data to construct features that help machine learning algorithms perform well.
- Feature extraction: This process combines or reduces variables in the raw data to obtain useful features and reduce the amount of data to be processed.
- Feature selection: Raw data might contain many features that are irrelevant to the problem. This step selects only the useful features for analysis.
- Algorithm selection & hyperparameter optimization: A hyperparameter is a parameter whose value is used to control the learning process. AutoML tools can choose a set of optimal hyperparameters for a learning algorithm, and even select the algorithm that works best with the given conditions.
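To make the last steps concrete, here is a minimal, dependency-free Python sketch of what an AutoML tool does under the hood, assuming a tiny hypothetical dataset and search space. It imputes missing values and standardizes the data (pre-processing), then tries every combination of algorithm (k-nearest neighbours or nearest centroid) and hyperparameter (`k`) and keeps the one with the best cross-validated accuracy. All names (`make_data`, `auto_ml`, `SEARCH_SPACE`, and so on) are illustrative, not any vendor's API; real AutoML tools search far larger spaces with smarter strategies than this exhaustive grid.

```python
import random
import statistics
from itertools import product


def make_data(n=60, seed=0):
    """Hypothetical toy dataset: feature 0 decides the label, feature 1 is noise
    and is occasionally missing (None), mimicking incomplete real-world data."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        x0, x1 = rng.uniform(0, 10), rng.uniform(0, 10)
        rows.append([x0, None if rng.random() < 0.1 else x1])
        labels.append(1 if x0 > 5 else 0)
    return rows, labels


def fit_preprocessor(rows):
    """Data pre-processing: learn per-column mean and std, ignoring missing values."""
    stats = []
    for column in zip(*rows):
        present = [v for v in column if v is not None]
        stats.append((statistics.fmean(present), statistics.pstdev(present) or 1.0))
    return stats


def transform(rows, stats):
    """Impute missing values with the column mean, then standardize each column."""
    return [
        [((mean if v is None else v) - mean) / std for v, (mean, std) in zip(row, stats)]
        for row in rows
    ]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_knn(X, y, k):
    """k-nearest neighbours; k is the hyperparameter the search tunes (odd k avoids ties)."""
    def predict(x):
        nearest = sorted(range(len(X)), key=lambda i: sq_dist(X[i], x))[:k]
        return 1 if 2 * sum(y[i] for i in nearest) > k else 0
    return predict


def fit_centroid(X, y):
    """Nearest-centroid classifier: an alternative algorithm with no hyperparameters."""
    centroids = {
        label: [statistics.fmean(col) for col in zip(*[x for x, t in zip(X, y) if t == label])]
        for label in set(y)
    }
    return lambda x: min(centroids, key=lambda label: sq_dist(centroids[label], x))


# Search space: algorithm name -> (training function, hyperparameter grid).
SEARCH_SPACE = {
    "knn": (fit_knn, {"k": [1, 3, 5, 7]}),
    "nearest_centroid": (fit_centroid, {}),
}


def cross_val_accuracy(fit, params, rows, labels, folds=5):
    """Mean accuracy over k folds; the preprocessor is fitted on training folds only."""
    scores = []
    for f in range(folds):
        test_idx = sorted(range(f, len(rows), folds))
        train = [i for i in range(len(rows)) if i not in set(test_idx)]
        stats = fit_preprocessor([rows[i] for i in train])
        model = fit(transform([rows[i] for i in train], stats), [labels[i] for i in train], **params)
        X_test = transform([rows[i] for i in test_idx], stats)
        y_test = [labels[i] for i in test_idx]
        scores.append(sum(model(x) == t for x, t in zip(X_test, y_test)) / len(y_test))
    return statistics.fmean(scores)


def auto_ml(rows, labels):
    """Try every (algorithm, hyperparameter) combination and keep the best CV score."""
    best = None
    for name, (fit, grid) in SEARCH_SPACE.items():
        for values in product(*grid.values()):
            params = dict(zip(grid.keys(), values))
            score = cross_val_accuracy(fit, params, rows, labels)
            if best is None or score > best[0]:
                best = (score, name, params)
    return best


if __name__ == "__main__":
    rows, labels = make_data()
    score, name, params = auto_ml(rows, labels)
    print(f"best: {name} {params}, CV accuracy {score:.2f}")
```

Fitting the preprocessor inside each fold, rather than once on the full dataset, keeps statistics from the validation rows from leaking into training, which is the kind of bookkeeping AutoML tools handle automatically.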
Why is AutoML important?
In a world where people generate ever-increasing amounts of data, businesses need a wide range of data science techniques to conduct accurate analyses and make careful decisions. Without these methods, organizations may fail to understand their customers, miss sales trends, and take actions that result in huge losses. Yet even as data science becomes more critical for businesses, data science talent is scarce and projects take significant time. AutoML aims to solve both problems through automation and is, therefore, being adopted by global enterprises.
Human error and bias can undermine the consistency of an organization's models and lead to less accurate predictions. AutoML allows companies to adopt machine learning solutions quickly and to focus data scientists' expertise on cognitive tasks that cannot easily be automated. This increases the return on investment in data science projects and shortens the time it takes to go live and generate business benefits.
What are the benefits of AutoML software?
AutoML solutions help companies provide more efficient services. The main benefits can be summarized as follows:
- Cost Reductions: AutoML solutions save a significant amount of time by eliminating manual parts of the analysis and enabling faster deployment, which increases the productivity of machine learning processes. AutoML also reduces the demand for data scientists by democratizing machine learning.
- Improved Accuracy: As companies grow, the amount of data expands and industry trends evolve. AutoML leads to better models by combining human expertise with machine precision on automatable tasks. As a result, many manual errors are avoided, and continuously updated models can keep accuracy high. With this advantage, businesses can achieve a high degree of accuracy in their forecasts and increase their revenue and customer satisfaction through more accurate insights.
What are typical use cases?
Businesses can automate their machine learning processes in a wide range of use cases. Most companies want to boost the efficiency of their machine learning methods and gain automated insights for better data-driven decisions and forecasts. Typical use cases include:
- Fraud Detection
- Sales Management
What are potential pitfalls with AutoML?
Although we expect AutoML solutions to grow stronger, there are still limitations that prevent AutoML from reaching its full potential. Here are the primary pitfalls:
- Still under development: AutoML is still a growing technology that hasn't reached its potential yet. It mostly focuses on supervised learning, and human experts can still outperform models generated by AutoML solutions.
- Requires high computational power: Running machine learning processes automatically imposes high computing and storage requirements. Businesses that cannot meet these requirements might prefer more straightforward solutions.
- Lack of explainability: Businesses look for models that are transparent and understandable, so they tend to avoid complex models. However, AutoML models can be more complicated than manually configured models, as automated tools tend to add complexity to improve results. There is significant effort in this field to ensure that AutoML models do not bring additional complexity.
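A back-of-the-envelope calculation shows where the computational cost comes from. The Python sketch below assumes a hypothetical search space of three algorithms with a few candidate values per hyperparameter (the algorithm and hyperparameter names are illustrative only). An exhaustive grid search trains one model per configuration per cross-validation fold, so the count multiplies with every hyperparameter added.

```python
from math import prod

# Hypothetical search space: algorithm -> {hyperparameter: number of candidate values}.
GRID_SIZES = {
    "knn": {"k": 4, "weights": 2},
    "random_forest": {"n_estimators": 3, "max_depth": 4, "min_samples_leaf": 3},
    "gradient_boosting": {"learning_rate": 4, "n_estimators": 3, "max_depth": 3},
}


def grid_search_fits(space, cv_folds):
    """Model trainings needed for an exhaustive grid search with k-fold cross-validation."""
    configurations = sum(prod(grid.values()) for grid in space.values())
    return configurations * cv_folds


print(grid_search_fits(GRID_SIZES, cv_folds=5))  # (8 + 36 + 36) configurations x 5 folds = 400

# Adding one more 5-valued hyperparameter to random_forest multiplies its 36 configurations by 5.
bigger = {**GRID_SIZES, "random_forest": {**GRID_SIZES["random_forest"], "max_features": 5}}
print(grid_search_fits(bigger, cv_folds=5))  # (8 + 180 + 36) x 5 = 1120
```

Smarter strategies such as random search or Bayesian optimization reduce the number of configurations tried, but each candidate still has to be trained and validated, which is why AutoML remains compute-hungry.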
What are AutoML solution providers?
You can find AutoML solution providers listed above; they fall into three main categories:
- Open Source: Even secretive tech giants like Apple have released their research findings on AutoML. However, open-source tools require a user to write at least a few lines of code in Python or R to initiate processes.
- Startups: Many startups aim to provide AutoML tools that can be operated by a non-technical user. Many of these solutions also offer a visualization for greater transparency of the resulting models.
- Tech Giants: Tech giants like Google have started to offer AutoML solutions for businesses. Google Cloud AutoML is one of the first AutoML tools introduced by a tech giant, and IBM's SPSS, one of the most common analytics software packages, offers numerous tools such as auto-classifiers.
How will AutoML evolve in the future?
Data scientists predict that AutoML will keep improving and allow data-driven industries to handle their core processes efficiently. Whatever area you do business in, AutoML is likely to become a powerful solution for managing the manual parts of your machine learning processes. According to an ODSC West 2018 talk by Randal S. Olson, Ph.D., within the next five years AutoML solutions will:
- handle most of the data cleaning processes.
- improve the performance of deep learning algorithms.
- be more scalable, meaning that large datasets will be handled more efficiently.
- become human competitive.
- be a step towards a broader meta-learning movement.
What are some best practices for AutoML software?
Several best practices can help AutoML projects succeed. According to DataRobot, one of the leading vendors, best practices for AutoML include the following:
- Start by collecting data: Businesses should describe the tangible result they intend to forecast, such as revenue or customer turnover. They also need to understand that paper-based data is challenging to obtain, so they may have to invest in digitalization.
- Focus on low-risk endeavors that can be completed in less than six months: Colin Priest, the vice president of DataRobot, states that any project that takes more than a year is "almost certainly doomed for failure," and those lasting longer than six months are also at high risk because projects tend to drag on. Thus, companies should seek ideas that can be delivered to market in a shorter time.
- Beware of team silos: One primary reason projects are abandoned is that IT teams aren't informed early enough in the project's life cycle. Companies should involve IT early to ensure that their existing services can support the new project.
- Debunking the ‘replacement’ myth: The best types of problems to address are those that involve bringing in more customers, developing your product, boosting customer satisfaction, and optimizing production lines. By contrast, AutoML projects that are about reducing expenses or replacing staff tend to fail.