Reducing LLM Calls with Vector Search Patterns - Raphael De Lio (Redis)

Large context windows aren't the answer. Learn three vector search patterns to slash your LLM costs and latency.

#1 · about 3 minutes

The hidden costs of large LLM context windows

Large context windows in models like GPT-5 seem to eliminate the need for RAG, but paying for all of those tokens on every request makes the approach expensive, slow, and hard to scale.

#2 · about 3 minutes

A brief introduction to vectors and vector search

Text is converted into numerical vector embeddings that capture its semantic meaning, allowing computers to efficiently calculate the similarity between different phrases or documents.
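As a toy sketch (not code from the talk), the usual way to score that similarity is cosine similarity between two embedding vectors. The three-dimensional vectors below are made up for illustration; real embedding models produce hundreds or thousands of dimensions:

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|): 1.0 means same direction, near 0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hand-made "embeddings" for illustration only.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
car = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten))  # high: semantically close
print(cosine_similarity(cat, car))     # low: unrelated
```

Vector databases run this comparison at scale, returning the nearest stored vectors to a query vector without scanning every entry.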

#3 · about 9 minutes

How to classify text using a vector database

Instead of using a costly LLM for every classification task, you can use a vector database to match new text against pre-embedded reference examples for a specific label.
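A minimal in-memory sketch of that idea, assuming the reference embeddings were computed offline (the labels, vectors, and nearest-reference rule here are illustrative; in production the references would live in a vector database such as Redis):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Hypothetical pre-embedded reference examples per label.
REFERENCES = {
    "billing":   [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0]],
    "technical": [[0.1, 0.1, 0.9], [0.0, 0.2, 0.8]],
}

def classify(embedding):
    # Nearest-reference classification: the label owning the single closest
    # reference example wins (k-NN with k=1 over labeled references).
    scores = {
        label: max(cosine(embedding, ref) for ref in refs)
        for label, refs in REFERENCES.items()
    }
    return max(scores, key=scores.get)

print(classify([0.85, 0.15, 0.05]))  # prints "billing"
```

No LLM call is needed per classification; only the one-time embedding of the input text.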

#4 · about 5 minutes

Using semantic routing for efficient tool calling

By matching user prompts against pre-defined reference phrases for each tool, you can directly trigger the correct function without an initial, expensive LLM call.
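A hedged sketch of such a router (the tool names, reference vectors, and threshold are invented for illustration): the prompt embedding is compared against each tool's reference-phrase embeddings, and if nothing clears the threshold, the caller falls back to the LLM.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Hypothetical routes: each tool has reference-phrase embeddings computed offline.
ROUTES = {
    "get_weather": [[0.9, 0.1, 0.0]],
    "book_flight": [[0.0, 0.1, 0.9]],
}

def route(prompt_embedding, threshold=0.8):
    # Return the best-matching tool, or None when no tool clears the
    # threshold (in which case the caller falls back to the LLM).
    best_tool, best_score = None, threshold
    for tool, refs in ROUTES.items():
        score = max(cosine(prompt_embedding, ref) for ref in refs)
        if score >= best_score:
            best_tool, best_score = tool, score
    return best_tool
```

The threshold is the key tuning knob: too low and unrelated prompts trigger tools, too high and the router punts to the LLM more often than necessary.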

#5 · about 5 minutes

Reducing latency and cost with semantic caching

Semantic caching stores LLM responses and serves them for new, semantically similar prompts, which avoids re-computation and significantly reduces both cost and latency.
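A toy in-memory version of the pattern, assuming a similarity threshold of 0.9 (the class name and threshold are illustrative; in practice the cache would be a vector store such as Redis):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

class SemanticCache:
    """In-memory stand-in for a vector-store-backed semantic cache."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (prompt embedding, cached response)

    def get(self, embedding):
        # Return a cached response whose prompt is semantically close enough.
        best = max(self.entries, key=lambda e: cosine(embedding, e[0]), default=None)
        if best and cosine(embedding, best[0]) >= self.threshold:
            return best[1]
        return None  # cache miss: the caller invokes the LLM and then put()s

    def put(self, embedding, response):
        self.entries.append((embedding, response))

cache = SemanticCache()
cache.put([1.0, 0.0], "Paris is the capital of France.")
print(cache.get([0.98, 0.05]))  # close paraphrase: cache hit
print(cache.get([0.0, 1.0]))    # unrelated prompt: miss, call the LLM
```

Unlike an exact-match cache, a paraphrased prompt still hits, which is what drives the cost and latency savings.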

#6 · about 7 minutes

Strategies for optimizing vector search accuracy

Improve the accuracy of vector search patterns through techniques like self-improvement, a hybrid approach that falls back to an LLM, and chunking complex prompts into smaller clauses.
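The hybrid fallback and the self-improvement loop can be combined, as in this sketch (the function names and the 0.85 threshold are assumptions, not the talk's code): trust the vector index only above a confidence threshold, otherwise ask the LLM and store its verdict as a new reference example.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def classify_hybrid(embedding, references, llm_classify, threshold=0.85):
    # Hybrid approach: trust the vector index only above a confidence threshold.
    best_label, best_score = None, -1.0
    for label, refs in references.items():
        for ref in refs:
            score = cosine(embedding, ref)
            if score > best_score:
                best_label, best_score = label, score
    if best_score >= threshold:
        return best_label
    # Fallback: ask the (expensive) LLM, then store its verdict as a new
    # reference example so the index self-improves for similar future prompts.
    label = llm_classify(embedding)
    references.setdefault(label, []).append(embedding)
    return label
```

Each fallback makes the index a little better, so the share of requests that need the LLM shrinks over time.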

#7 · about 3 minutes

Addressing advanced challenges in semantic caching

Mitigate common caching pitfalls, such as a negated prompt embedding too close to its positive form, by using embedding models tuned to distinguish such cases, and combine semantic routing with caching so that certain types of queries are never cached at all.
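One way to sketch the routing-plus-caching combination (the intent names, the injected callables, and the simplified cache are all illustrative): prompts that route to a time-sensitive intent bypass the cache in both directions, since yesterday's answer would be wrong today.

```python
# Hypothetical set of intents whose answers change too fast to cache.
VOLATILE_INTENTS = {"get_stock_price"}

class ExactCache:
    """Simplified stand-in for a semantic cache, keyed on the embedding."""

    def __init__(self):
        self.store = {}

    def get(self, embedding):
        return self.store.get(tuple(embedding))

    def put(self, embedding, response):
        self.store[tuple(embedding)] = response

def answer(embedding, route, cache, call_llm):
    intent = route(embedding)  # semantic router, injected as a callable
    if intent in VOLATILE_INTENTS:
        # Volatile intents skip the cache entirely: no lookup,
        # and the fresh answer is never stored.
        return call_llm(embedding)
    cached = cache.get(embedding)
    if cached is not None:
        return cached
    response = call_llm(embedding)
    cache.put(embedding, response)
    return response
```

Routing first means the cache never has the chance to serve a stale answer for a real-time query.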
