ML Model Monitoring and Observability Tools
The Model Monitoring Segment Is Fragmented and Minimally Differentiated, With Ample Opportunity for Specialization and Diversification of Observability Tools
Machine learning monitoring tools track and report on whether models are functioning and performing properly for the use cases and applications they were built for.
Their features include:
Data Quality Monitoring: They can detect anomalies, missing values, and data drift that might affect model performance.
Model Performance Monitoring: They can track metrics such as accuracy, precision, recall, F1-score, and more in real time. If a model's performance degrades, it can trigger alerts for further investigation.
Anomaly Detection: They can identify unusual behavior or patterns in the input data or model predictions. Anomalies can be signs of data issues or model problems.
Bias and Fairness Monitoring: They can assess whether the model is making fair predictions or assessments.
Concept Drift Detection: In dynamic environments, the relationships between variables can change over time. ML monitoring tools detect concept drift to adapt models accordingly.
Model Explainability: They can provide insights into why a model made a specific prediction and facilitate understanding of the model's decisions, especially in regulated industries.
Compliance Monitoring: They can help ensure that models comply with legal and industry-specific regulations.
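As a minimal illustration of the performance-monitoring idea above, the sketch below computes precision, recall and F1 on a labeled batch of binary predictions and raises an alert when F1 falls below a threshold. This is a generic, illustrative example, not any vendor's API; the function names and the 0.8 threshold are assumptions for the sketch.

```python
# Illustrative sketch of model performance monitoring (not any vendor's
# API): compute precision, recall and F1 on a labeled batch of binary
# predictions and flag degradation against a fixed threshold.

def precision_recall_f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def check_degradation(y_true, y_pred, f1_threshold=0.8):
    """Return (metrics, alerts); alerts is non-empty if F1 fell below threshold."""
    p, r, f1 = precision_recall_f1(y_true, y_pred)
    alerts = []
    if f1 < f1_threshold:
        alerts.append(f"F1 {f1:.2f} below threshold {f1_threshold}")
    return {"precision": p, "recall": r, "f1": f1}, alerts
```

In a real deployment this check would run on each scored batch once ground-truth labels arrive, with the alert routed to an on-call channel rather than returned as a list.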
The ML model monitoring and observability segment is relatively fragmented and undifferentiated, with a number of early-stage and growth start-ups offering tools and guidance on tracking and reporting data quality and model performance, in order to maximize the utility of the AI applications these models power and to mitigate legal, financial and compliance risk.
Given that it is an emerging segment that is minimally differentiated, there is a lot of potential for specialization and diversification of the product offerings within it. The tools featured here are: Arize AI, Evidently AI, Fiddler, Censius, Arthur, Truera, Aporia and WhyLabs.
Arize AI provides analytics and workflows to catch model issues, troubleshoot the root cause and continuously improve performance. Its features enable ML engineers to:
Monitor embedding drift for NLP, CV, LLM and generative models alongside tabular data
Visualize models in interactive 2D and 3D to isolate problematic clusters for fine-tuning
Automatically monitor for model input and output drift
Trace which features contribute the most prediction drift impact to the model’s performance
Pinpoint clusters of problems in prompt/response pairs, find similar examples and resolve issues
Create workflows for fine-tuning and prompt engineering
Surface worst-performing slices of predictions with heatmaps
A/B compare model versions, environments and time periods
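Embedding drift of the kind described above can be approximated with a very simple heuristic: compare a window of production embeddings against a reference set by measuring the distance between their centroids. The sketch below is a generic illustration under that assumption, not Arize's implementation or API; real tools use more robust distributional distances.

```python
import math

# Illustrative sketch of embedding drift detection (not Arize's API):
# compare a window of production embeddings against a reference set by
# measuring the Euclidean distance between their centroids.

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]

def embedding_drift(reference, production):
    """Euclidean distance between the mean reference and production embeddings."""
    ref_c = centroid(reference)
    prod_c = centroid(production)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ref_c, prod_c)))
```

A monitoring job would compute this score on a rolling window and alert when it exceeds a baseline established during validation.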
Arize’s tool integrates with ML products across the value chain, including data storage tools, feature stores, vector databases, LLM frameworks, model serving tools and foundation model providers.
Based in Berkeley, CA, Arize last raised a US$38M Series B in September 2022 led by TCV, with participation from Battery Ventures, Foundation Capital and Swift Ventures. Their previous rounds were a US$19M Series A in September 2021 and a US$4M Seed in February 2020.
Fiddler AI describes itself as the full-stack AI observability platform. With support for both generative and predictive models, it offers tools to deliver high-performance AI, reduce costs, increase ROI and ensure responsible governance. Its model monitoring features include performance and accuracy monitoring, data drift identification, prediction drift impact, class imbalance detection, observability of feature quality and updating of ground truth labels.
Based in Palo Alto, CA, Fiddler last raised a US$32M Series B in June 2021 from investors including Insight Partners, Lightspeed Venture Partners, Lux Capital, Bloomberg Beta and Haystack.
Evidently AI is “The open-source ML observability platform” to evaluate, test and monitor ML models from validation to production. It provides tools to:
Run exploratory analysis and profiling of data
Spot and solve nulls, duplicates and range violations in production pipelines
Track model features and ensure compliance with data quality KPIs
Catch shifts in predictions and input data distributions
Monitor the quality of model responses and data inputs
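The kind of pipeline check described above, spotting nulls, duplicates and range violations, can be sketched in a few lines. The example below is a generic illustration, not Evidently's API; the record format and the report structure are assumptions for the sketch.

```python
# Illustrative data-quality check (a sketch, not Evidently's API): scan
# a batch of records for null values, duplicate rows and values outside
# expected per-column ranges.

def data_quality_report(rows, ranges):
    """rows: list of dicts; ranges: {column: (min, max)} expected bounds."""
    issues = {"nulls": 0, "duplicates": 0, "range_violations": 0}
    seen = set()
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            issues["duplicates"] += 1
        seen.add(key)
        for col, value in row.items():
            if value is None:
                issues["nulls"] += 1
            elif col in ranges:
                lo, hi = ranges[col]
                if not (lo <= value <= hi):
                    issues["range_violations"] += 1
    return issues
```

In production this report would be emitted per batch and compared against data quality KPIs, with non-zero counts triggering an alert or blocking the pipeline.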
With HQ in San Francisco, CA, Evidently last raised a US$1.1M Seed round in July 2022.
Censius is “The AI Observability Platform for Enterprise ML Teams” which enables end-to-end visibility of structured and unstructured production models and a proactive approach toward model management to continuously deliver reliable ML.
Based in Austin, Texas, Censius offers automatic AI monitoring for model regressions, prediction explainability and dashboards to track performance metrics and issue alerts.
Aporia’s mission is to enable Responsible AI by tackling AI hallucinations with observability and to help companies certify that every AI product is transparent, compliant and aligned with business goals. Its features include ML dashboards to visualize and share model performance, live alerts to detect drift, bias and data integrity issues, and production incident response (IR) and explainability tools.
Based in San Jose, CA, Aporia last raised a US$25M Series A round from investors including Tiger Global Management, Samsung Next, Tal Capital, Vertex Ventures and TLV Partners.
Arthur provides an AI performance solution for LLMs, computer vision, tabular data and NLP. It allows ML engineers to catch issues with their models before they reach production, observe the impact of and improve resiliency to model and system changes, and mitigate risk, ensure compliance and create safe, responsible and trustworthy AI. Besides its observability tools, the company offers Arthur Shield, a firewall for LLMs designed to protect organizations against the risks of deployed LLMs, including PII or sensitive data leaks; toxic, offensive or problematic language generation; and malicious prompts and prompt injection.
With HQ in New York, NY, Arthur last raised a US$42M Series B in September 2022 from VCs including Greycroft and Acrew Capital. Previous rounds include a US$15M Series A in December 2020.
TruEra is a full lifecycle AI observability platform with comprehensive, continuous monitoring, reporting and alerting of model performance, fast and accurate debugging, automated model testing and explainability features.
Based in Redwood City, CA, TruEra last raised a US$25M Series B funding round from investors including Greylock, Menlo Ventures, Wing Venture Capital, Harpoon Ventures, Conversion Capital and Forgepoint Capital.
WhyLabs’ tools enable observability to detect data and ML issues faster, deliver continuous improvements, and avoid costly incidents. They provide secure integration with the rest of the ML value chain, features to monitor model and data health and enterprise-grade security.
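One common heuristic behind the data-health monitoring described above is the Population Stability Index (PSI), which scores how far a feature's production distribution has shifted from a reference distribution. The sketch below is a generic illustration of PSI, not WhyLabs' implementation; the bin count, epsilon and the "PSI above ~0.2 means significant drift" rule of thumb are conventional assumptions.

```python
import math

# Illustrative sketch of input-distribution drift scoring via the
# Population Stability Index (PSI); not WhyLabs' implementation.
# Bins are derived from the reference distribution only, so production
# values outside the reference range are clamped into the edge bins.

def psi(reference, production, bins=10):
    lo = min(reference)
    hi = max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference

    def histogram(values):
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)
            counts[max(idx, 0)] += 1
        total = len(values)
        # Replace empty-bin counts with 0.5 to avoid log(0) / division by zero.
        return [(c or 0.5) / total for c in counts]

    ref_p = histogram(reference)
    prod_p = histogram(production)
    return sum((rp - pp) * math.log(rp / pp) for rp, pp in zip(ref_p, prod_p))
```

A monitoring service would compute this per feature on each batch; identical distributions score 0, and a large shift pushes the score well past the conventional 0.2 alert threshold.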
With HQ in Seattle, WA, WhyLabs last raised a US$10M Series A in November 2021 from VCs including Defy Partners and AI Fund.
Are you using or building an AI observability tool? Let me know so I can add it to my collection.