Abstract

Learn how to use Redis as high-performance data storage for Triton and Merlin inference pipelines. Data movement can significantly impact the latency of machine learning inference. As deployments shift from offline batch inference to online inference in real-time systems, fast and reliable data access becomes crucial. We’ll show you how to reduce data-access latency for real-time inference with Redis and NVIDIA SDKs.

Specifically, we’ll cover these three areas:

  • How to deploy and scale Merlin recommendation system model storage;
  • How to benefit from intelligent inference response caching with NVIDIA Triton; and
  • How to construct real-time AI inference systems using an online feature store with Triton (sketched in the example below).
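
To make the last bullet concrete, here is a minimal sketch of the feature-store pattern, assuming Redis serves as the online store and a model is already deployed on Triton: a feature vector is fetched from Redis at request time and passed to Triton for inference. The key layout ("user_features:{user_id}"), the tensor names ("INPUT__0", "OUTPUT__0"), and the model name ("recsys_model") are illustrative assumptions, not details from the talk.

    import numpy as np
    import redis
    import tritonclient.http as httpclient

    # Clients for the online feature store (Redis) and the Triton server.
    store = redis.Redis(host="localhost", port=6379)
    triton = httpclient.InferenceServerClient(url="localhost:8000")

    def predict(user_id: str) -> np.ndarray:
        # Look up the precomputed feature vector for this user in Redis.
        # Assumes the value holds raw float32 bytes written at feature time.
        raw = store.get(f"user_features:{user_id}")
        features = np.frombuffer(raw, dtype=np.float32).reshape(1, -1)

        # Wrap the features in a Triton input tensor and run inference.
        infer_input = httpclient.InferInput("INPUT__0", list(features.shape), "FP32")
        infer_input.set_data_from_numpy(features)
        response = triton.infer("recsys_model", inputs=[infer_input])
        return response.as_numpy("OUTPUT__0")

Keeping the feature lookup in an in-memory store like Redis means the per-request overhead is a single low-latency read rather than a query against a batch-oriented database.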

Slides

You can download the slides for the talk here.