End-to-End ML Platform Features and Amazon SageMaker
ML Platforms are Complex Tools for Managing a Large Variety of Development Steps, from Data Pre-Processing to Training a Model and Running Inference
End-to-end Machine Learning platforms provide a unified ecosystem for the entire ML workflow, from data preparation and model development to deployment and monitoring. Their aim is to simplify and streamline the process of building ML-enabled applications and to make it more accessible to a broader range of users, including those without extensive programming or data science backgrounds.
These platforms offer a rich set of tools and functionalities, which include:
Model Management tools, such as:
Model building algorithms and pre-built models of all types, from supervised and unsupervised learning to reinforcement learning
Access to compute infrastructure and management thereof
Model hosting and storage
Deployment endpoint management with up- and down-scaling tools
Model monitoring and bias detection
Data Management features, such as:
Data labeling and annotation
Synthetic data generation
Data cleaning and preparation
Feature stores
Training experiment tracker
Hyperparameter tuning
Operations tools, such as:
Automated workflow management
Data storage and versioning
IDEs, which range from fully automated to customizable workflows for model development
Amazon, Google and Microsoft all provide end-to-end ML platforms integrated with compute infrastructure, foundation models, and data and operations tools. Below is an overview of Amazon's platform, SageMaker.
Amazon SageMaker - An Overview
Amazon SageMaker offers a variety of features, which can be used together through the integrated IDEs or separately as part of a customized Machine Learning workflow.
For model building, SageMaker provides various algorithms, including over 15 that are built-in and optimized for the platform, and over 150 pre-built models from popular model hubs. It works with a variety of model building tools, including Amazon SageMaker Studio Notebooks and RStudio, where you can run ML models at a small scale, view reports on their performance, and iterate toward high-quality working prototypes.
For model training, SageMaker provides:
Access to AWS compute infrastructure
Distributed training libraries for faster training through data or model parallelism
Automated up- and down-scaling
Training Compiler for hardware-accelerated training
Real-time monitoring of training metrics and system resource usage
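The training options above come together in a training job. Below is a minimal sketch of one expressed through the low-level boto3 API (assumed installed); the role ARN, S3 paths, and container image URI are placeholders, and the AWS call itself is wrapped in a function so the sketch runs without credentials:

```python
# Sketch of a SageMaker training job request via the low-level boto3 API.
# The role ARN, S3 URIs, and training image below are placeholders.
training_job_request = {
    "TrainingJobName": "demo-xgboost-job",
    "AlgorithmSpecification": {
        # Placeholder: region-specific URI of a built-in algorithm image
        "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/xgboost:latest",
        "TrainingInputMode": "File",
    },
    "RoleArn": "arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    "InputDataConfig": [{
        "ChannelName": "train",
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://my-bucket/train/",  # placeholder
        }},
    }],
    "OutputDataConfig": {"S3OutputPath": "s3://my-bucket/output/"},
    "ResourceConfig": {
        "InstanceType": "ml.m5.xlarge",
        "InstanceCount": 1,  # setting >1 enables distributed training
        "VolumeSizeInGB": 50,
    },
    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
}

def submit_training_job(request):
    """Submit the job. Requires AWS credentials, so boto3 is imported lazily."""
    import boto3
    client = boto3.client("sagemaker", region_name="us-east-1")
    return client.create_training_job(**request)
```

In practice, most users build this request through the higher-level SageMaker Python SDK, which fills in sensible defaults.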
For model deployment, SageMaker offers a variety of inference options based on latency and traffic needs. They include Real-Time, Serverless and Asynchronous inference and single, multi-model and multi-container choices. It integrates with MLOps tools, so you can scale your model deployment, reduce inference costs, manage models more effectively in production, and reduce operational burden. To evaluate new model performance against production ones, SageMaker provides a shadow testing tool which helps spot potential configuration errors and performance issues before they impact end users.
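The choice between inference options largely comes down to the endpoint configuration. As a hedged sketch using boto3 (assumed installed; model and config names are placeholders), a serverless variant scales to zero between requests, while a real-time variant pins provisioned instances:

```python
# Sketch: selecting an inference option via endpoint configuration.
# "demo-model" is a placeholder for a previously created SageMaker model.
serverless_endpoint_config = {
    "EndpointConfigName": "demo-serverless-config",
    "ProductionVariants": [{
        "VariantName": "AllTraffic",
        "ModelName": "demo-model",
        "ServerlessConfig": {       # scales to zero when idle
            "MemorySizeInMB": 2048,
            "MaxConcurrency": 5,
        },
    }],
}

realtime_endpoint_config = {
    "EndpointConfigName": "demo-realtime-config",
    "ProductionVariants": [{
        "VariantName": "AllTraffic",
        "ModelName": "demo-model",
        "InstanceType": "ml.m5.large",   # always-on provisioned capacity
        "InitialInstanceCount": 1,
    }],
}

def create_endpoint(config):
    """Create the config and endpoint. Requires AWS credentials,
    so boto3 is imported lazily."""
    import boto3
    client = boto3.client("sagemaker", region_name="us-east-1")
    client.create_endpoint_config(**config)
    return client.create_endpoint(
        EndpointName=config["EndpointConfigName"] + "-endpoint",
        EndpointConfigName=config["EndpointConfigName"],
    )
```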
SageMaker Ground Truth enables you to label data such as images, text files, and videos, and to generate labeled synthetic data, to create high-quality datasets for model training. It offers two options: a self-serve service, where you create and manage your own data labeling workflows, and a fully managed service, where an expert workforce creates and manages them on your behalf.
For data preparation, including pre-processing, selection, cleaning, exploration and visualization, SageMaker Data Wrangler offers a single visual interface to monitor data quality, transform data and push it forward through the model development process. Data Wrangler contains over 300 built-in data transformations, so you can quickly transform data without writing any code.
SageMaker’s Feature Store is a fully managed, purpose-built repository to store, share, and manage features for ML models. You can provide data from a variety of sources, such as application and service logs, clickstreams, sensors, and tabular data from Amazon Simple Storage Service (Amazon S3), Amazon Redshift, AWS Lake Formation, Snowflake, and Databricks Delta Lake. Using feature processing, you can specify your batch data source and feature transformation function and the tool will transform it into ML features. SageMaker Feature Store also provides lineage tracking, offline storage and point-in-time queries.
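A feature group is the Feature Store's core object: it names a record identifier, an event time, and the feature columns themselves. Below is a minimal sketch of one via boto3 (assumed installed); the group name, feature names, role ARN and S3 bucket are placeholders:

```python
# Sketch of defining a feature group in SageMaker Feature Store.
# All names, the role ARN, and the S3 URI are placeholders.
feature_group_request = {
    "FeatureGroupName": "demo-customer-features",
    "RecordIdentifierFeatureName": "customer_id",  # keys each record
    "EventTimeFeatureName": "event_time",          # enables point-in-time queries
    "FeatureDefinitions": [
        {"FeatureName": "customer_id", "FeatureType": "String"},
        {"FeatureName": "event_time", "FeatureType": "String"},
        {"FeatureName": "total_purchases_30d", "FeatureType": "Integral"},
        {"FeatureName": "avg_basket_value", "FeatureType": "Fractional"},
    ],
    # Online store serves low-latency lookups; offline store backs
    # historical, point-in-time queries
    "OnlineStoreConfig": {"EnableOnlineStore": True},
    "OfflineStoreConfig": {
        "S3StorageConfig": {"S3Uri": "s3://my-bucket/feature-store/"},
    },
    "RoleArn": "arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
}

def create_feature_group(request):
    """Requires AWS credentials; boto3 is imported lazily for that reason."""
    import boto3
    client = boto3.client("sagemaker", region_name="us-east-1")
    return client.create_feature_group(**request)
```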
SageMaker Experiments allows you to analyze and compare ML training iterations, and to keep track of parameters, metrics and artifacts to troubleshoot and reproduce models. You can manage the experiments' metadata, evaluate them through visualizations such as scatter plots, bar charts and histograms, and share them through SageMaker Studio. The free tier provides you with 100,000 metric records ingested per month, 1 million metric records retrieved (via APIs) per month, and 100,000 metric records stored per month, and is available for the first 6 months.
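Structurally, an experiment groups trials, and each trial corresponds to one training iteration. A minimal sketch via the low-level boto3 API (assumed installed; `trial_name_for` is a hypothetical naming helper added for illustration):

```python
def register_run(experiment_name, trial_name):
    """Sketch: create an experiment and a trial under it via the low-level
    API. Requires AWS credentials, so boto3 is imported lazily."""
    import boto3
    client = boto3.client("sagemaker", region_name="us-east-1")
    client.create_experiment(ExperimentName=experiment_name)
    client.create_trial(TrialName=trial_name, ExperimentName=experiment_name)

def trial_name_for(params):
    """Hypothetical helper: derive a readable trial name from a run's
    hyperparameters, so related trials sort together in the UI."""
    return "trial-" + "-".join(f"{k}-{v}" for k, v in sorted(params.items()))
```

For example, `trial_name_for({"lr": 0.1, "max_depth": 5})` yields `"trial-lr-0.1-max_depth-5"`.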
For hyperparameter tuning, SageMaker Automatic Model Tuning finds the best version of your model by running multiple training jobs on your dataset using your specified algorithm and hyperparameter ranges. It then chooses the hyperparameter values that result in the best performing model, as determined by your chosen metric.
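A tuning job is configured with a search strategy, an objective metric, resource limits, and the hyperparameter ranges to explore. Below is a hedged sketch of that configuration via the low-level boto3 request shape; the job name, metric, and ranges are illustrative placeholders, and the accompanying training job definition is elided:

```python
# Sketch of a hyperparameter tuning job configuration: Bayesian search
# over two hyperparameter ranges, maximizing a validation metric.
# The full request would also carry a training job definition (elided).
tuning_job_request = {
    "HyperParameterTuningJobName": "demo-tuning-job",
    "HyperParameterTuningJobConfig": {
        "Strategy": "Bayesian",
        "HyperParameterTuningJobObjective": {
            "Type": "Maximize",
            "MetricName": "validation:auc",  # placeholder objective metric
        },
        "ResourceLimits": {
            "MaxNumberOfTrainingJobs": 20,  # total trials to run
            "MaxParallelTrainingJobs": 4,   # trials running at once
        },
        "ParameterRanges": {
            "ContinuousParameterRanges": [
                {"Name": "eta", "MinValue": "0.01", "MaxValue": "0.3"},
            ],
            "IntegerParameterRanges": [
                {"Name": "max_depth", "MinValue": "3", "MaxValue": "10"},
            ],
        },
    },
}
```

The tuner then launches up to 20 training jobs within these ranges and reports the hyperparameter values that maximized `validation:auc`.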
To manage ML workflows at scale, SageMaker Pipelines automates the development steps, including data loading, data transformation, training and tuning, and deployment. You can create ML workflows with an easy-to-use Python SDK, and then visualize and manage them using Amazon SageMaker Studio. Pipelines logs every step of your workflow, creating an audit trail of model components such as training data, platform configurations, model parameters, and learning gradients. It also has a model registry, so you can track the various workflow versions in a central repository and choose the one you need each time.
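Under the hood, a pipeline is an ordered set of steps expressed as a JSON definition and registered under a name. A minimal sketch via boto3 (assumed installed); the step contents are elided, where a real definition carries a full processing, training, or registration spec per step:

```python
import json

# Sketch of a pipeline definition: three dependent steps with their
# arguments elided. Step names and the pipeline name are placeholders.
pipeline_definition = {
    "Version": "2020-12-01",
    "Steps": [
        {"Name": "PrepareData", "Type": "Processing", "Arguments": {}},
        {"Name": "TrainModel", "Type": "Training", "Arguments": {},
         "DependsOn": ["PrepareData"]},
        {"Name": "RegisterModel", "Type": "RegisterModel", "Arguments": {},
         "DependsOn": ["TrainModel"]},
    ],
}

def register_pipeline(name, definition, role_arn):
    """Register the pipeline. Requires AWS credentials, so boto3
    is imported lazily."""
    import boto3
    client = boto3.client("sagemaker", region_name="us-east-1")
    return client.create_pipeline(
        PipelineName=name,
        PipelineDefinition=json.dumps(definition),
        RoleArn=role_arn,
    )
```

In practice the Python SDK's pipeline classes generate this JSON for you; the sketch just shows what gets registered and audited.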
For detecting bias in ML data and models, the Clarify tool can detect potential imbalances and bias during data preparation, after model training, and in the deployed model and send you alerts and reports. Its feature importance scoring tool helps you explain how your model makes predictions and produces explainability reports in bulk or real time.
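A Clarify analysis is driven by a configuration that names the label column, the facet (sensitive attribute) to check, and the bias metrics to compute. A hedged sketch of that configuration, with column names and thresholds as illustrative placeholders:

```python
# Sketch of a Clarify bias-analysis configuration. The "approved" label,
# "age" facet, and threshold of 40 are placeholders for illustration.
clarify_analysis_config = {
    "dataset_type": "text/csv",
    "label": "approved",                 # column the model predicts
    "facet": [{
        "name_or_index": "age",          # sensitive attribute to audit
        "value_or_threshold": [40],      # splits the facet into groups
    }],
    "methods": {
        # Metrics computed on the data before training (e.g. class
        # imbalance) and on model outputs after training
        "pre_training_bias": {"methods": ["CI", "DPL"]},
        "post_training_bias": {"methods": ["DPPL", "DI"]},
    },
}
```

Clarify runs this analysis as a processing job over the dataset and model endpoint, then emits the bias report the paragraph above describes.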
SageMaker offers a variety of IDEs and companion tools to manage and customize your model development process. They include:
Studio - fully integrated development environment to perform all steps, from preparing raw data to deploying and monitoring ML models, with access to the most comprehensive set of tools in a single web-based visual interface.
Studio Lab - No-setup environment for learning and experimenting with ML, pre-configured with GitHub integration and the most popular tools, frameworks and libraries
Canvas - No Code, visual interface with ready-to-use models to generate accurate ML predictions for classification, regression, forecasting, natural language processing (NLP), and computer vision (CV).
Autopilot - Automatically build, train, and tune ML models based on your data, while maintaining full control and visibility
Jumpstart - ML hub with foundation models, built-in algorithms, and prebuilt solutions that you can deploy with just a few clicks
Edge Manager - optimizer for deployment on a wide variety of edge devices, from edge servers to smart cameras and IoT sensors
Machine Learning platforms are complex tools that manage a large variety of development steps, from data pre-processing to training the model and running inference. However, given the rising demand for ML-enabled applications among enterprises and consumers alike, this toolset will continue to evolve and mature, producing even more sophisticated platforms and associated services.
Whether you are building your own Machine Learning platform or customizing one from a 3rd-party, such as Amazon SageMaker, I hope it takes you on an interesting journey and produces great results and AI-based experiences!