Abstract

Learn how to use Redis as high-performance data storage for Triton and Merlin inference pipelines. Data movement can significantly impact the latency of machine learning inference. As deployments shift from offline batch inference to online inference in real-time systems, fast and reliable data access becomes crucial. We’ll show you how to reduce data-access latency for real-time inference with Redis and NVIDIA SDKs.

Specifically, we’ll cover these three areas:

  • How to deploy and scale Merlin recommendation system model storage;
  • How to benefit from intelligent inference response caching with NVIDIA Triton; and
  • How to construct real-time AI inference systems using an online feature store with Triton (sketched in the example below).
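
To make the last bullet concrete, here is a minimal sketch of the feature-store pattern, assuming Redis serves as the online store and a model is already deployed on Triton: a feature vector is fetched from Redis at request time and passed to Triton for inference. The key layout ("user_features:{user_id}"), the tensor names ("INPUT__0", "OUTPUT__0"), and the model name ("recsys_model") are illustrative assumptions, not details from the talk.

    import numpy as np
    import redis
    import tritonclient.http as httpclient

    # Clients for the online feature store (Redis) and the Triton server.
    store = redis.Redis(host="localhost", port=6379)
    triton = httpclient.InferenceServerClient(url="localhost:8000")

    def predict(user_id: str) -> np.ndarray:
        # Look up the precomputed feature vector for this user in Redis.
        # Assumes the value holds raw float32 bytes written at feature time.
        raw = store.get(f"user_features:{user_id}")
        features = np.frombuffer(raw, dtype=np.float32).reshape(1, -1)

        # Wrap the features in a Triton input tensor and run inference.
        infer_input = httpclient.InferInput("INPUT__0", list(features.shape), "FP32")
        infer_input.set_data_from_numpy(features)
        response = triton.infer("recsys_model", inputs=[infer_input])
        return response.as_numpy("OUTPUT__0")

Keeping the feature lookup in an in-memory store like Redis means the per-request overhead is a single low-latency read rather than a query against a batch-oriented database.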

Slides

You can download the slides for the talk here.