Steve Lorello (Redis), Ryan McCormick (NVIDIA), and I recently wrote an article about how to use Redis as a cache for inference responses in NVIDIA Triton. This post links to that article and the accompanying code.


You can read the article here.

Code

The Redis cache implementation for Triton is located here.