Steve Lorello (Redis), Ryan Mccormick (NVIDIA), and I recently wrote an article about how to use Redis as a cache for inference responses in NVIDIA Triton. This post links to that article and its accompanying code.

You can read the article here.


The Redis cache implementation for Triton is located here.