How to Build a Distributed Inference Cache with NVIDIA Triton and Redis

by Sam Partee on September 1, 2023 | Technical Rating: 8

Steve Lorello (Redis), Ryan McCormick (NVIDIA), and I recently wrote an article about using Redis as a cache for inference responses in NVIDIA Triton. This post links to that article and the accompanying code.

You can read the article here.

Code

The Redis cache implementation for Triton is located here.
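
To give a feel for the idea before you click through, here is a minimal Python sketch of inference response caching. It is not the Triton Redis cache implementation linked above. A plain dict stands in for a Redis server so the example runs anywhere, and every name in it (`ResponseCache`, `make_key`, `infer`, `double`) is hypothetical. Keying on the model name, version, and inputs is an assumption made for illustration.

```python
import hashlib
import json
from typing import Callable, Dict, List


class ResponseCache:
    """Toy response cache. A dict stands in for Redis (assumption for this sketch)."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def make_key(model: str, version: str, inputs: List[float]) -> str:
        # Hash everything that determines the response, so that identical
        # requests map to the same key and different requests do not.
        payload = json.dumps(
            {"model": model, "version": version, "inputs": inputs},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def infer(
        self,
        model: str,
        version: str,
        inputs: List[float],
        run_model: Callable[[List[float]], List[float]],
    ) -> List[float]:
        key = self.make_key(model, version, inputs)
        cached = self._store.get(key)
        if cached is not None:
            # Cache hit: return the stored response without running the model.
            self.hits += 1
            return json.loads(cached)
        # Cache miss: run the model, then store the serialized response.
        self.misses += 1
        outputs = run_model(inputs)
        self._store[key] = json.dumps(outputs)
        return outputs


def double(inputs: List[float]) -> List[float]:
    """Hypothetical stand-in for an expensive model call."""
    return [2.0 * x for x in inputs]


if __name__ == "__main__":
    cache = ResponseCache()
    cache.infer("demo_model", "1", [1.0, 2.0], double)  # miss, runs the model
    cache.infer("demo_model", "1", [1.0, 2.0], double)  # hit, served from cache
    print(cache.hits, cache.misses)
```

Swapping the dict for a shared Redis instance is what would make a cache like this distributed: several inference servers could then reuse each other's cached responses.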

About Sam Partee

I am the author of this blog and a Principal AI Engineer at Redis. Previously, I worked at Cray/HPE on AI applications in high-performance computing.

San Francisco, California | partee.io
