Microsoft Azure ML
Microsoft Azure ML became generally available in February 2015, so like Amazon ML it is a relatively young offering, but it is feature-rich. There is something for everyone, from beginners to advanced users: those just starting out get a workflow that helps them get going quickly, and intermediate and advanced users get support for R and IPython notebooks.
Here are a few things you should know about Azure ML:
Azure ML has a workflow and a visual editor that beginners can easily follow to build their first ML project.
Azure ML supports the following data sources, among others: CSV files, SQL Database tables, and RData. The full list is here: https://azure.microsoft.com/en-us/documentation/articles/machine-learning-data-science-import-data/ and you can automate the import using Azure Data Factory, another service in the Azure cloud offering.
Azure ML has built-in tasks for common data cleaning and transformation, or you can build the data pipeline yourself using R code.
Azure ML supports the following problems:
Binary & Multiclass classification
For each problem, Azure ML gives you the option to try multiple algorithms. You can also bring in other algorithms available in R or Anaconda Python.
Azure ML also helps you tune the parameters of each algorithm. It has a “sweep parameter” task that iterates over multiple input values for each algorithm parameter and identifies the best parameter setting for your problem.
Azure ML also makes it easy to compare the performance of different algorithms and helps you select the best one for the problem at hand.
It also supports R and Anaconda Python notebooks, so you can port your existing R/Python code and use the Azure platform to operationalize your machine learning project.
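To make the idea of a parameter sweep concrete, here is a minimal sketch in plain Python. It is not Azure ML's implementation; it only illustrates the general technique of trying every candidate value and keeping the best-scoring one. The function names, the toy validation data, and the one-parameter threshold classifier are all assumptions made up for this example.

```python
from itertools import product


def sweep_parameters(evaluate, grid):
    """Try every combination of values in `grid`; return (best_params, best_score)."""
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:  # keep the first setting that reaches the top score
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical validation set of (feature, label) pairs, for illustration only.
VALIDATION = [(0.1, 0), (0.3, 0), (0.45, 0), (0.55, 1), (0.7, 1), (0.9, 1)]


def threshold_accuracy(params):
    """Accuracy of a toy classifier that predicts 1 when feature >= threshold."""
    correct = sum(int(x >= params["threshold"]) == y for x, y in VALIDATION)
    return correct / len(VALIDATION)


best_params, best_score = sweep_parameters(
    threshold_accuracy, {"threshold": [0.2, 0.4, 0.5, 0.6, 0.8]}
)
print(best_params, best_score)
```

With more than one parameter in the grid, the same loop covers every combination, which is why exhaustive sweeps get expensive quickly as the grid grows.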
Amazon ML
Amazon ML was announced in April 2015. It is a relatively young offering, so it is understandable that its capabilities and choice of algorithms are limited. It seems that Amazon launched a version 1 of its ML product to help existing AWS customers get started, and if customer demand grows, I think the service will evolve over time. Here are a few more things you should know about Amazon ML:
Amazon ML has a wizard that walks you through each step, so developers without ML know-how can get started.
Amazon ML supports data sources available on the AWS platform, such as Redshift and S3. You will have to move your data to AWS before you can use Amazon ML, which is great if you are already an AWS customer.
Amazon ML supports basic data cleaning and transformation tasks, but for intermediate to complex needs you will have to do the heavy lifting of cleaning and transforming data somewhere else.
Amazon ML currently supports the following ML problems:
Binary and Multi-class classification
Amazon ML does not let the developer select the algorithm for the problem at hand. For instance, if you have a binary classification problem, it automatically uses logistic regression. It does not let you switch to something like a two-class SVM or a two-class decision forest.
For each algorithm, you can set only some training and evaluation parameters, which is limiting for advanced users.
Amazon ML does give you common performance metrics to evaluate your model. For example, if you are building a binary classification model, it gives you the AUC (area under the ROC curve).
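For readers who have not met AUC before, the sketch below shows one standard way to compute it by hand: the fraction of (positive, negative) example pairs in which the positive example gets the higher score, with ties counting as half. This is an illustration of the metric itself, not of how Amazon ML computes it internally; the function name and sample data are made up.

```python
def binary_auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties = 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and model scores, for illustration only.
labels = [1, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.2]
auc = binary_auc(labels, scores)
print(auc)
```

An AUC of 0.5 means the model ranks no better than chance, and 1.0 means every positive example is scored above every negative one.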
With that, let’s use our framework to evaluate Amazon ML:
Machine learning as a service
ML-as-a-service platforms cover most infrastructure issues, including data preprocessing, model training, and model evaluation, with prediction then performed in the cloud. Prediction results can be bridged with your internal IT infrastructure through REST APIs. Amazon Machine Learning, Azure Machine Learning, and Google Prediction API are three leading cloud services that allow for fast model training and deployment with little to no data science expertise. These should be considered first if you are assembling a homegrown data science team out of available software engineers. Have a look at our data science team structures story to get a better idea of how roles are distributed.
This post does not intend to provide exhaustive instructions on when and how to use these platforms, but rather to point out what to look for before you start reading through their documentation.
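As a rough picture of what bridging predictions through a REST API looks like from the client side, here is a sketch that builds a JSON prediction request and parses a JSON reply. Everything specific in it is an assumption: the endpoint URL, the bearer-token header, and the `record`/`label` payload fields are hypothetical and do not match any particular vendor's API, so check the platform's documentation for the real request format.

```python
import json
import urllib.request


def build_prediction_request(endpoint_url, api_key, record):
    """Build (but do not send) a POST request carrying one record as JSON."""
    body = json.dumps({"record": record}).encode("utf-8")  # hypothetical schema
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,  # hypothetical auth scheme
        },
        method="POST",
    )


def parse_prediction_response(raw_bytes):
    """Decode a JSON response body into a Python dictionary."""
    return json.loads(raw_bytes.decode("utf-8"))


# Hypothetical endpoint and key; nothing is sent over the network here.
request = build_prediction_request(
    "https://ml.example.com/predict", "my-secret-key", {"age": 42, "plan": "basic"}
)
print(request.get_method(), request.full_url)
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` and handing the response body to `parse_prediction_response`.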
Amazon Machine Learning
Amazon Machine Learning is one of the most automated solutions on the market and the best fit for deadline-sensitive operations. The service can load data from multiple sources, including Amazon RDS, Amazon Redshift, CSV files, etc. All data preprocessing operations are performed automatically: The service identifies which fields are categorical and which are numerical, and it doesn’t ask a user to choose the methods of further data preprocessing (dimensionality reduction and whitening).
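To give a feel for what automatic field-type detection involves, here is a deliberately simple heuristic: a column is treated as numerical if every non-empty value parses as a number, and as categorical otherwise. Amazon's actual rules are not described here, so treat this purely as an illustrative assumption; the function names and sample CSV are made up.

```python
import csv
import io


def _is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def infer_column_types(csv_text):
    """Label each CSV column 'numerical' or 'categorical' from its values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    types = {}
    for name in reader.fieldnames or []:
        # Ignore missing values when deciding the column type.
        values = [row[name] for row in rows if row[name] not in ("", None)]
        is_numeric = bool(values) and all(_is_number(v) for v in values)
        types[name] = "numerical" if is_numeric else "categorical"
    return types


# Hypothetical sample data, for illustration only.
SAMPLE = "age,plan,monthly_spend\n34,basic,19.99\n51,premium,49.5\n27,basic,\n"
column_types = infer_column_types(SAMPLE)
print(column_types)
```

A heuristic this simple will misjudge some columns, for example numeric ID codes that are really categories, which is one reason fully automatic preprocessing is a trade-off.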
Prediction capacities of Amazon ML are limited to three options: binary classification, multiclass classification, and regression. Amazon doesn’t support any unsupervised learning methods, so a user must select the target variable that is labeled in the training set. Also, a user isn’t required to know any machine learning methods because Amazon chooses them automatically after looking at the provided data.
This high automation level is both an advantage and a disadvantage of Amazon ML. If you need a fully automated yet limited solution, the service can match your expectations. However, it doesn’t contribute much to understanding machine learning specifics and can’t be used as a launch pad to train in-house developers in data science.
Microsoft Azure Machine Learning
Unlike the Amazon ML product, Azure Machine Learning aims to provide a powerful playground for both newcomers and experienced data scientists. Almost all operations in Azure ML must be completed manually. This includes data exploration, preprocessing, choosing methods, and validating modeling results.
Approaching machine learning with Azure entails quite a steep learning curve, but it eventually leads to a deeper understanding of all major techniques in the field. On the other hand, Azure ML provides a graphical interface that visualizes each step within the workflow. Perhaps the main benefit of using Azure is the variety of algorithms available to play with. The Studio supports around 100 methods that address classification (binary and multiclass), anomaly detection, regression, recommendation, and text analysis. It’s worth mentioning that the platform has one clustering algorithm (K-means).
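Since K-means is the one clustering method mentioned, here is a bare-bones sketch of the algorithm on one-dimensional data: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. This is a teaching illustration, not the Azure implementation; the function name, starting centroids, and sample points are assumptions.

```python
def kmeans_1d(points, initial_centroids, max_iterations=10):
    """Cluster 1-D points with K-means; return (centroids, clusters)."""
    centroids = list(initial_centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments are stable
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data and starting centroids, for illustration only.
centroids, clusters = kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [0.0, 5.0])
print(centroids, clusters)
```

Note that K-means needs no target variable, which is what makes it an unsupervised method.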
Happy machine learning!