Agents for RAG
The lessons:
- Perhaps this is the end, for now, of function/tool calling: agents perform better simply by writing code (with you executing it in a safe environment).
- Embeddings are probably overrated now that agents can structure their own keyword/BM25 queries (and it is more deterministic for them to find what they want).
- Document retrieval with agents works best if you give them the tools humans would use.
Cheers, Ronan
Implementing Document Retrieval with SmolAgents
This video walks through implementing document retrieval with SmolAgents, built around two key tools: a search function and a section reader. The approach moves away from traditional function calling in favor of having the agent write code directly.
Key Design Choices
The implementation uses BM25 search rather than embeddings for several reasons:
- More deterministic results
- Better suited for precise keyword matching
- Allows agents to craft specific search queries
- Simpler implementation without sacrificing performance
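To make the determinism point concrete, here is a minimal BM25 scoring sketch (not the video's code; the function name and toy corpus are illustrative). The same query over the same corpus always yields the same ranking, unlike approximate nearest-neighbour search over embeddings:

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency of each query term across the corpus.
    df = {t: sum(1 for d in docs if t in d) for t in set(query)}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        s = 0.0
        for t in query:
            if df[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(s)
    return scores

docs = [
    "deepseek v2 pre training corpus".split(),
    "mixture of experts architecture".split(),
]
scores = bm25_scores("pre training".split(), docs)
```

Here the first document matches both query terms and the second matches none, so the scores come out strictly ordered with no randomness involved.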
The system provides two core tools to agents:
1. BM25-based document search that returns relevant snippets
2. Section reader that retrieves complete document sections
Technical Implementation
The system processes documents through these steps:
1. Converts PDFs to Markdown format using PyMuPDF
2. Generates table of contents by parsing Markdown headers
3. Chunks documents while preserving section context
4. Computes BM25 statistics (term frequencies and document lengths)
5. Saves data structures to pickle files for quick loading
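Steps 2-5 above can be sketched as follows. Step 1 (PDF-to-Markdown conversion with PyMuPDF) is replaced here by an inline Markdown string so the sketch stays self-contained; the variable names and chunking-by-line granularity are assumptions, not the video's exact code:

```python
import pickle
import re
from collections import Counter

markdown = """# DeepSeek-V2
## Pre-Training
Trained on a large multilingual corpus.
## Alignment
Supervised fine-tuning followed by RL.
"""

# Step 2: table of contents from the Markdown headers (level, title).
toc = [(len(h), title) for h, title in
       re.findall(r"^(#{1,6})\s+(.*)$", markdown, flags=re.MULTILINE)]

# Step 3: chunk the body while remembering which section each chunk came from.
chunks, section = [], None
for line in markdown.splitlines():
    m = re.match(r"^#{1,6}\s+(.*)$", line)
    if m:
        section = m.group(1)
    elif line.strip():
        chunks.append({"section": section, "text": line.strip()})

# Step 4: BM25 statistics: per-chunk term frequencies and chunk lengths.
stats = [{"tf": Counter(c["text"].lower().split()),
          "len": len(c["text"].split())} for c in chunks]

# Step 5: persist everything for quick loading at query time.
with open("index.pkl", "wb") as f:
    pickle.dump({"toc": toc, "chunks": chunks, "stats": stats}, f)
```

Keeping the section name on every chunk is what later lets the search tool report headings alongside snippets.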
The search tool allows agents to:
- Specify the number of snippets to retrieve (capped at 5)
- Get document titles and section headings
- See relevance scores for results
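A minimal sketch of such a search tool, using a toy in-memory index and plain term overlap in place of the real BM25 scorer; the function name, index layout, and default values are assumptions based on the description above. In SmolAgents this function would be exposed to the agent via the `@tool` decorator:

```python
from typing import Dict, List

# Hypothetical in-memory index of pre-chunked snippets.
INDEX: List[Dict] = [
    {"doc": "DeepSeek-V2", "section": "Pre-Training",
     "text": "trained on 8.1 trillion tokens"},
    {"doc": "DeepSeek-V2", "section": "Architecture",
     "text": "multi-head latent attention"},
]

def search_documents(query: str, n_snippets: int = 3) -> List[Dict]:
    """Return up to 5 snippets for a query, each carrying the document
    title, section heading, and a relevance score (term overlap here;
    BM25 in the real system)."""
    n_snippets = min(n_snippets, 5)  # hard cap from the tool spec
    terms = set(query.lower().split())
    hits = [{**e, "score": len(terms & set(e["text"].lower().split()))}
            for e in INDEX]
    hits = [h for h in hits if h["score"] > 0]
    hits.sort(key=lambda h: h["score"], reverse=True)
    return hits[:n_snippets]
```

Returning the title and section heading with each snippet is what lets the agent decide which full section to request next.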
The section reader enables:
- Lookup of complete sections by document and section name
- Access to full table of contents
- Retrieval of surrounding context
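The three capabilities above can be sketched like this, with an illustrative per-document store of ordered sections; the names `DOCS`, `table_of_contents`, `read_section`, and the `context` parameter are assumptions, not the video's exact API:

```python
# Hypothetical store: ordered (section name, text) pairs per document.
DOCS = {
    "DeepSeek-V2": [
        ("Introduction", "We present DeepSeek-V2..."),
        ("Pre-Training", "The corpus contains 8.1T tokens..."),
        ("Alignment", "SFT and RL stages..."),
    ],
}

def table_of_contents(doc: str):
    """List section names so the agent can see what is available."""
    return [name for name, _ in DOCS[doc]]

def read_section(doc: str, section: str, context: int = 0):
    """Return the named section, optionally with `context` neighbouring
    sections on either side for surrounding context."""
    names = [name for name, _ in DOCS[doc]]
    i = names.index(section)
    lo, hi = max(0, i - context), i + context + 1
    return "\n\n".join(f"## {n}\n{t}" for n, t in DOCS[doc][lo:hi])
```

Because sections are stored in document order, neighbouring-section retrieval is just a slice around the matched index.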
Performance Considerations
Testing with different language models revealed:
- Claude performed most consistently for instruction following
- Gemini Flash required additional prompting to avoid function call syntax
- Context length grows with section reading, requiring management
- Input tokens averaged 20-25k for typical queries
Usage Example
A sample query "Summarize DeepSeek v2's pre-training" demonstrates the workflow:
1. Agent searches broadly using BM25
2. Reviews snippet results and table of contents
3. Requests specific pre-training section
4. Synthesizes comprehensive answer from full section text
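The four steps can be simulated directly without the SmolAgents runtime; in the real system the agent itself would emit code like this inside a CodeAgent run, and the corpus, function names, and section text below are all illustrative:

```python
# Toy corpus standing in for the real index.
SNIPPETS = [
    {"doc": "DeepSeek-V2", "section": "Pre-Training", "text": "8.1T-token corpus"},
    {"doc": "DeepSeek-V2", "section": "Alignment", "text": "SFT then RL"},
]
SECTIONS = {("DeepSeek-V2", "Pre-Training"):
            "DeepSeek-V2 is pre-trained on an 8.1T-token multilingual corpus..."}

def search(query, k=5):
    terms = set(query.lower().split())
    return [s for s in SNIPPETS
            if terms & set(s["text"].lower().split())][:k]

def read_section(doc, section):
    return SECTIONS[(doc, section)]

# 1. Search broadly.
hits = search("pre-training corpus")
# 2. Review the snippets and pick the most relevant section.
target = hits[0]
# 3. Request the full pre-training section.
full_text = read_section(target["doc"], target["section"])
# 4. Synthesize the answer (here just surfaced; the agent would summarize).
answer = f"{target['doc']} {target['section']}: {full_text}"
```

Each step's output feeds the next, which is exactly the loop the agent drives by writing and executing code rather than emitting structured function calls.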
The implementation emphasizes giving agents human-like information access patterns while maintaining precise control over document retrieval.