Agents for RAG
The lessons:
- Perhaps this is the end, for now, of function/tool calling: agents perform better simply by writing code (with you executing it in a safe environment).
- Embeddings are probably overrated now that agents can structure their own keyword/BM25 queries (and it is more deterministic for them to find what they want).
- Document retrieval with agents works best if you give them the tools humans would use.
Cheers, Ronan
Implementing Document Retrieval with SmolAgents
This video walks through implementing document retrieval with SmolAgents, built around two key tools: a search function and a section reader. The approach moves away from traditional function calling in favor of having the agent write code directly.
Key Design Choices
The implementation uses BM25 search rather than embeddings for several reasons:
- More deterministic results
- Better suited for precise keyword matching
- Allows agents to craft specific search queries
- Simpler implementation without sacrificing performance
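To make the determinism point concrete, here is a minimal BM25 scoring sketch (not the video's code; the function name and toy corpus are illustrative). The same query over the same corpus always yields the same ranking, unlike approximate nearest-neighbour search over embeddings:

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency of each query term across the corpus.
    df = {t: sum(1 for d in docs if t in d) for t in set(query)}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        s = 0.0
        for t in query:
            if df[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(s)
    return scores

docs = [
    "deepseek v2 pre training corpus".split(),
    "mixture of experts architecture".split(),
]
scores = bm25_scores("pre training".split(), docs)
```

Here the first document matches both query terms and the second matches none, so the scores come out strictly ordered with no randomness involved.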
The system provides two core tools to agents:
1. BM25-based document search that returns relevant snippets
2. Section reader that retrieves complete document sections
Technical Implementation
The system processes documents through these steps:
1. Converts PDFs to Markdown format using PyMuPDF
2. Generates table of contents by parsing Markdown headers
3. Chunks documents while preserving section context
4. Computes BM25 statistics (term frequencies and document lengths)
5. Saves data structures to pickle files for quick loading
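Steps 2-5 above can be sketched as follows. Step 1 (PDF-to-Markdown conversion with PyMuPDF) is replaced here by an inline Markdown string so the sketch stays self-contained; the variable names and chunking-by-line granularity are assumptions, not the video's exact code:

```python
import pickle
import re
from collections import Counter

markdown = """# DeepSeek-V2
## Pre-Training
Trained on a large multilingual corpus.
## Alignment
Supervised fine-tuning followed by RL.
"""

# Step 2: table of contents from the Markdown headers (level, title).
toc = [(len(h), title) for h, title in
       re.findall(r"^(#{1,6})\s+(.*)$", markdown, flags=re.MULTILINE)]

# Step 3: chunk the body while remembering which section each chunk came from.
chunks, section = [], None
for line in markdown.splitlines():
    m = re.match(r"^#{1,6}\s+(.*)$", line)
    if m:
        section = m.group(1)
    elif line.strip():
        chunks.append({"section": section, "text": line.strip()})

# Step 4: BM25 statistics: per-chunk term frequencies and chunk lengths.
stats = [{"tf": Counter(c["text"].lower().split()),
          "len": len(c["text"].split())} for c in chunks]

# Step 5: persist everything for quick loading at query time.
with open("index.pkl", "wb") as f:
    pickle.dump({"toc": toc, "chunks": chunks, "stats": stats}, f)
```

Keeping the section name on every chunk is what later lets the search tool report headings alongside snippets.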
The search tool allows agents to:
- Specify the number of snippets to retrieve (capped at 5)
- Get document titles and section headings
- See relevance scores for results
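A minimal sketch of such a search tool, using a toy in-memory index and plain term overlap in place of the real BM25 scorer; the function name, index layout, and default values are assumptions based on the description above. In SmolAgents this function would be exposed to the agent via the `@tool` decorator:

```python
from typing import Dict, List

# Hypothetical in-memory index of pre-chunked snippets.
INDEX: List[Dict] = [
    {"doc": "DeepSeek-V2", "section": "Pre-Training",
     "text": "trained on 8.1 trillion tokens"},
    {"doc": "DeepSeek-V2", "section": "Architecture",
     "text": "multi-head latent attention"},
]

def search_documents(query: str, n_snippets: int = 3) -> List[Dict]:
    """Return up to 5 snippets for a query, each carrying the document
    title, section heading, and a relevance score (term overlap here;
    BM25 in the real system)."""
    n_snippets = min(n_snippets, 5)  # hard cap from the tool spec
    terms = set(query.lower().split())
    hits = [{**e, "score": len(terms & set(e["text"].lower().split()))}
            for e in INDEX]
    hits = [h for h in hits if h["score"] > 0]
    hits.sort(key=lambda h: h["score"], reverse=True)
    return hits[:n_snippets]
```

Returning the title and section heading with each snippet is what lets the agent decide which full section to request next.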
The section reader enables:
- Lookup of complete sections by document and section name
- Access to full table of contents
- Retrieval of surrounding context
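The three capabilities above can be sketched like this, with an illustrative per-document store of ordered sections; the names `DOCS`, `table_of_contents`, `read_section`, and the `context` parameter are assumptions, not the video's exact API:

```python
# Hypothetical store: ordered (section name, text) pairs per document.
DOCS = {
    "DeepSeek-V2": [
        ("Introduction", "We present DeepSeek-V2..."),
        ("Pre-Training", "The corpus contains 8.1T tokens..."),
        ("Alignment", "SFT and RL stages..."),
    ],
}

def table_of_contents(doc: str):
    """List section names so the agent can see what is available."""
    return [name for name, _ in DOCS[doc]]

def read_section(doc: str, section: str, context: int = 0):
    """Return the named section, optionally with `context` neighbouring
    sections on either side for surrounding context."""
    names = [name for name, _ in DOCS[doc]]
    i = names.index(section)
    lo, hi = max(0, i - context), i + context + 1
    return "\n\n".join(f"## {n}\n{t}" for n, t in DOCS[doc][lo:hi])
```

Because sections are stored in document order, neighbouring-section retrieval is just a slice around the matched index.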
Performance Considerations
Testing with different language models revealed:
- Claude performed most consistently for instruction following
- Gemini Flash required additional prompting to avoid function call syntax
- Context length grows with section reading, requiring management
- Input tokens averaged 20-25k for typical queries
Usage Example
A sample query "Summarize DeepSeek v2's pre-training" demonstrates the workflow:
1. Agent searches broadly using BM25
2. Reviews snippet results and table of contents
3. Requests specific pre-training section
4. Synthesizes comprehensive answer from full section text
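The four steps can be simulated directly without the SmolAgents runtime; in the real system the agent itself would emit code like this inside a CodeAgent run, and the corpus, function names, and section text below are all illustrative:

```python
# Toy corpus standing in for the real index.
SNIPPETS = [
    {"doc": "DeepSeek-V2", "section": "Pre-Training", "text": "8.1T-token corpus"},
    {"doc": "DeepSeek-V2", "section": "Alignment", "text": "SFT then RL"},
]
SECTIONS = {("DeepSeek-V2", "Pre-Training"):
            "DeepSeek-V2 is pre-trained on an 8.1T-token multilingual corpus..."}

def search(query, k=5):
    terms = set(query.lower().split())
    return [s for s in SNIPPETS
            if terms & set(s["text"].lower().split())][:k]

def read_section(doc, section):
    return SECTIONS[(doc, section)]

# 1. Search broadly.
hits = search("pre-training corpus")
# 2. Review the snippets and pick the most relevant section.
target = hits[0]
# 3. Request the full pre-training section.
full_text = read_section(target["doc"], target["section"])
# 4. Synthesize the answer (here just surfaced; the agent would summarize).
answer = f"{target['doc']} {target['section']}: {full_text}"
```

Each step's output feeds the next, which is exactly the loop the agent drives by writing and executing code rather than emitting structured function calls.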
The implementation emphasizes giving agents human-like information access patterns while maintaining precise control over document retrieval.