MLOps at AWS re:Invent 2020
Updated: Jul 8
Swami Sivasubramanian's Machine Learning keynote was full of exciting announcements of new tools and features. While many of them are noteworthy, we chose just three to show you how they empower data scientists and reduce the time to market of ML solutions. These announcements share a common idea, MLOps: avoid spending money and valuable data scientists' time on infrastructure details so that teams can focus on the actual data analysis and model development.
The first announcement - really two in one - is the addition of model creation capabilities to two data services: Redshift ML and Athena ML. These features let you generate quick models, trained and tuned using Autopilot, directly from your data sets. This of course is not intended to replace your data science efforts, but to complement them by enabling validation and experimentation with models much earlier in the Machine Learning Development Lifecycle (MLDL). You can test your first hypotheses as soon as you have gathered your data, from the same console, instead of deferring them to later steps, which reduces time and costs. On top of that, Redshift ML allows you to create, train, and deploy ML models using familiar SQL commands - you can even use the models as SQL functions in your normal queries!
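To make the "models as SQL" idea concrete, here is a minimal Python sketch that only assembles the two kinds of statements involved: a Redshift ML CREATE MODEL statement and a query that calls the resulting function. The table, column, function, role, and bucket names are hypothetical placeholders, and the helpers are ours, not part of any AWS SDK. The sketch does not connect to Redshift, and it interpolates identifiers directly, so treat it as an illustration rather than production code.

```python
def build_create_model_sql(model_name, source_query, target_column,
                           function_name, iam_role_arn, s3_bucket):
    """Assemble a Redshift ML CREATE MODEL statement as a string.

    All arguments are illustrative placeholders supplied by the caller.
    """
    return (
        f"CREATE MODEL {model_name}\n"
        f"FROM ({source_query})\n"
        f"TARGET {target_column}\n"
        f"FUNCTION {function_name}\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"SETTINGS (S3_BUCKET '{s3_bucket}');"
    )


def build_prediction_sql(function_name, feature_columns, table):
    """Assemble a query that calls the trained model as a SQL function."""
    cols = ", ".join(feature_columns)
    return f"SELECT {cols}, {function_name}({cols}) AS prediction FROM {table};"


if __name__ == "__main__":
    # Hypothetical churn example; none of these names come from the post.
    print(build_create_model_sql(
        model_name="customer_churn",
        source_query="SELECT age, plan, monthly_spend, churned FROM customers",
        target_column="churned",
        function_name="predict_churn",
        iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftMLRole",
        s3_bucket="example-redshift-ml-bucket",
    ))
    print(build_prediction_sql(
        "predict_churn", ["age", "plan", "monthly_spend"], "new_customers"))
```

You would run the generated statements from your usual Redshift SQL client; the second query shows how a trained model appears as an ordinary function next to regular columns.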
The second big one is SageMaker Clarify, a tool that helps you detect biases in your training data and in your running models. Bias is a burning-hot topic in the ML field because it can impact a business globally (think of racial discrimination in job selection, for example). SageMaker Clarify automatically evaluates your input data, offering reports about the biases in each dataset before you even train your models. It also evaluates a running model and produces a different set of reports that help you understand potential bias issues. Moreover, through its integration with SageMaker Model Monitor, Clarify can continuously evaluate your models and raise alarms when a model's bias indicators start drifting.
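To give a feel for what a pre-training bias report measures, here is a small plain-Python sketch of two metrics of the kind Clarify reports: class imbalance (CI) and the difference in proportions of positive labels (DPL) between two groups. This is not Clarify's API; the function names, the toy data, and the "a"/"d" group labels are assumptions made for illustration.

```python
def class_imbalance(facet_values, facet_d):
    """CI = (n_a - n_d) / (n_a + n_d), where n_d counts rows in group d
    and n_a counts every other row."""
    n_d = sum(1 for v in facet_values if v == facet_d)
    n_a = len(facet_values) - n_d
    if n_a + n_d == 0:
        raise ValueError("no rows to evaluate")
    return (n_a - n_d) / (n_a + n_d)


def difference_in_positive_proportions(facet_values, labels, facet_d, positive=1):
    """DPL = q_a - q_d, the gap between the share of positive labels
    in group a and in group d."""
    n_a = n_d = pos_a = pos_d = 0
    for facet, label in zip(facet_values, labels):
        if facet == facet_d:
            n_d += 1
            pos_d += label == positive
        else:
            n_a += 1
            pos_a += label == positive
    if n_a == 0 or n_d == 0:
        raise ValueError("both groups need at least one row")
    return pos_a / n_a - pos_d / n_d


if __name__ == "__main__":
    # Toy data: three rows in group "a", one row in group "d".
    facets = ["a", "a", "a", "d"]
    labels = [1, 1, 0, 0]
    print(class_imbalance(facets, "d"))
    print(difference_in_positive_proportions(facets, labels, "d"))
```

Values near zero suggest the two groups are balanced on that measure; values near 1 or -1 suggest that one group is under-represented or receives positive labels far less often.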
Last but not least, there is Amazon HealthLake. The official description calls it a “HIPAA-eligible service that enables healthcare providers, health insurance companies, and pharmaceutical companies to store, transform, query, and analyze health data at petabyte scale.” This is a huge improvement for health-related organizations seeking to capitalize on their data while staying compliant and protecting their patients' information. You can combine the knowledge extraction capabilities of Comprehend Medical with the analytics and data lake processing capabilities of HealthLake to get insights that were out of reach until now.
MLOps is an umbrella term for a set of ideals, goals, tools, and best practices that grease the gears of the machine learning development process. In the MLOps world, data science teams focus only on the actual data science, relying on tools that automate data cleansing, compute provisioning, and model deployment and testing.
In the next post we will use some of the new tools to train and evaluate ML models, and use the power of MLOps to do it quicker and cheaper than ever before.
If you need assistance improving your ML workloads, or if you are just starting your ML journey, contact us at Teracloud. We are AWS Select Consulting Partners.