Amazon SageMaker HyperPod now supports data capture for inference workloads
Amazon SageMaker HyperPod now supports data capture for inference workloads, a new capability that records inference request and response payloads from production endpoints to Amazon S3. Customers deploying generative AI models on HyperPod need visibility into model inputs and outputs to detect drift, troubleshoot production issues, build evaluation datasets, and continuously improve their deployed models. Previously, they had to build custom logging pipelines outside the service to get this visibility.
With data capture, customers can train speculative decoding draft models on their real production traffic for better performance than generic draft models, build evaluation pipelines from production data, feed fine-tuning jobs with real-world inputs, and maintain audit trails for compliance. For each endpoint, customers choose where to capture inference traffic: at the SageMaker endpoint, the load balancer, or the model pod. Captured data is delivered asynchronously to the customer's Amazon S3 bucket without blocking inference, and capture supports configurable sampling and encryption with customer-managed AWS KMS keys. You can enable data capture when deploying models through the HyperPod Inference Operator, and use the captured data with Amazon SageMaker Model Monitor and your existing evaluation, fine-tuning, and draft-model training workflows.
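To illustrate what "sampled" and "asynchronous, without blocking inference" mean in practice, here is a minimal conceptual sketch in plain Python. It is not the HyperPod implementation and not the Inference Operator's configuration API. The class `SampledCapture`, the `sampling_percentage` parameter, and the `sink` callable are hypothetical names. The `sink` stands in for an upload to Amazon S3, where KMS encryption would be applied. The serving path only flips a weighted coin and drops a JSON line onto a bounded queue. A background thread performs the slow write, and if the queue is full the record is dropped rather than delaying the response.

```python
import json
import queue
import random
import threading
from typing import Callable, Optional


class SampledCapture:
    """Hypothetical sketch: sample request/response pairs and hand them to a
    background thread so the inference path never waits on storage."""

    def __init__(self, sampling_percentage: float,
                 sink: Callable[[str], None],
                 max_queue: int = 1000,
                 rng: Optional[random.Random] = None) -> None:
        if not 0.0 <= sampling_percentage <= 100.0:
            raise ValueError("sampling_percentage must be between 0 and 100")
        self.sampling_percentage = sampling_percentage
        self.sink = sink  # hypothetical stand-in for an S3 upload
        self.rng = rng or random.Random()
        self.dropped = 0  # records shed because the queue was full
        self._queue: "queue.Queue[Optional[str]]" = queue.Queue(maxsize=max_queue)
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def record(self, request: dict, response: dict) -> bool:
        """Called on the inference path; returns True if the pair was enqueued."""
        # Keep roughly sampling_percentage percent of requests.
        if self.rng.random() * 100.0 >= self.sampling_percentage:
            return False
        line = json.dumps({"request": request, "response": response})
        try:
            self._queue.put_nowait(line)  # never blocks the caller
            return True
        except queue.Full:
            self.dropped += 1  # shed capture load instead of delaying inference
            return False

    def _drain(self) -> None:
        # Background worker: deliver queued records until the stop marker arrives.
        while True:
            item = self._queue.get()
            if item is None:
                break
            self.sink(item)

    def close(self) -> None:
        """Flush pending records and stop the worker."""
        self._queue.put(None)
        self._worker.join()
```

For example, with `sampling_percentage=100.0` and `sink=captured.append`, one call to `record` followed by `close()` leaves one JSON line in `captured`. Dropping records on a full queue reflects the announcement's priority that capture must never slow down inference. A production pipeline would also batch records into S3 objects rather than writing them one at a time.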
This feature is available for SageMaker HyperPod clusters using the EKS orchestrator in all AWS Regions where Amazon SageMaker HyperPod is supported. To learn more, see Data capture for inference on HyperPod.