Read-only Memory
Adding memory to LLMs and agents lets you do a few interesting things:
Create agents with infinite memory for past conversations.
Read documents in chunks.
Have the assistant remember user preferences.
Inspired by the MemGPT paper from just over a year ago, I built an LLM system with read-only local memory, allocated within the context window.
This is the first in a two-part series. In the next part, I'll look at how to add read/write memory to store user preferences and allow documents to be read in chunks (much as Cursor does when reading long files).
Enjoyed the video or have Qs? Drop a comment below or reply to this email with a 🤙.
Cheers, Ronan
🛠 Explore Fine-tuning, Inference, Vision, Audio, and Evaluation Tools
💡 Consulting (Technical Assistance OR Market Insights)
Adding Read-Only Memory to LLMs and LLM Agents
Large language models (LLMs) can be enhanced with memory systems that allow them to access information beyond their context window. This video examines how to implement a read-only memory system that enables an LLM to retrieve and reference past conversations.
Core Memory Architecture
The system consists of two main components:
Local memory: The LLM's context window, limited to a fixed token count (e.g., 2,500 tokens)
Disk memory: A database storing the complete conversation history
The local memory is further divided into:
System message (500 tokens)
Read-only retrieval block (500 tokens)
Recent chat history (remaining tokens)
Memory Management System
The implementation uses a First-In-First-Out (FIFO) approach:
Recent conversations stay in local memory until pushed out by newer ones
All conversations are saved to the disk database
The LLM can query past conversations through search commands
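As a rough sketch of that FIFO behaviour (the function name and the 1,500-token budget below are illustrative, derived from the context figures later in this post, not taken from the original code):

```python
LOCAL_HISTORY_BUDGET = 1500  # tokens left for recent chat after the system and read-only blocks

def add_turn(recent_turns, turn):
    """Append a new conversation turn and evict the oldest turns (FIFO) when over budget.

    The same turn is also persisted to the disk database (not shown here),
    so turns evicted from local memory are never lost."""
    recent_turns.append(turn)  # turn is a dict with a precomputed "tokens" field
    while sum(t["tokens"] for t in recent_turns) > LOCAL_HISTORY_BUDGET:
        recent_turns.pop(0)  # the oldest turn falls out of the context first
    return recent_turns
```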
Search and Retrieval
The system implements a paginated search mechanism:
LLM issues search commands using XML-style tags:
<fetch_memory>query</fetch_memory>
Results are returned in pages of 3 conversation turns
LLM can request subsequent pages using
<fetch_memory_page>2</fetch_memory_page>
Search uses keyword matching (can be upgraded to BM25 or embeddings)
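One way to pull those commands out of the model's reply is a small regex helper like the following (the tag names match the ones above; the helper itself is an illustrative sketch):

```python
import re

def extract_memory_command(reply: str):
    """Return ("search", query) or ("page", n) if the reply contains a memory tag, else None."""
    match = re.search(r"<fetch_memory>(.*?)</fetch_memory>", reply, re.DOTALL)
    if match:
        return ("search", match.group(1).strip())
    match = re.search(r"<fetch_memory_page>(\d+)</fetch_memory_page>", reply)
    if match:
        return ("page", int(match.group(1)))
    return None

# extract_memory_command("<fetch_memory>postgres migration</fetch_memory>")
# -> ("search", "postgres migration")
```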
Technical Implementation Details
Context Management:
Total context = 2,500 tokens
System message = 500 tokens
Read-only memory = 500 tokens
User/Assistant messages = 250 tokens each
Remaining space allocated to recent chat history
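As a quick sanity check on that budget (a sketch; 250 tokens is treated here as a per-message cap):

```python
TOTAL_CONTEXT = 2500
SYSTEM_MESSAGE = 500
READ_ONLY_BLOCK = 500
PER_MESSAGE_CAP = 250

recent_history_budget = TOTAL_CONTEXT - SYSTEM_MESSAGE - READ_ONLY_BLOCK  # 1500 tokens
max_recent_messages = recent_history_budget // PER_MESSAGE_CAP            # about 6 messages (3 turns)
```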
Database Structure:
Conversations stored in JSON format
Each entry includes:
User message
Assistant response
Timestamp
Token count
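The exact schema isn't shown above, so the field names in this sketch are assumptions, but an entry along these lines captures the four listed fields:

```python
import json
import time

entry = {
    "user": "What did we decide about the search upgrade?",   # user message
    "assistant": "We agreed to try BM25 before embeddings.",  # assistant response
    "timestamp": time.time(),                                  # when the turn was saved
    "tokens": 42,                                               # token count for the turn
}

# One option: append each turn as a JSON line to the conversation database file.
with open("conversations.jsonl", "a") as f:
    f.write(json.dumps(entry) + "\n")
```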
Search Implementation:
Case-insensitive keyword matching
Results grouped into conversation turns
Pagination with 3 turns per page
Search state maintained for multi-page retrieval
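A minimal version of that search might look like this (case-insensitive substring matching with three turns per page; `entries` follows the schema sketched above, and the caller keeps the last query around so a page request can re-run it):

```python
PAGE_SIZE = 3  # conversation turns per page

def search_memory(entries, query, page=1):
    """Case-insensitive keyword search over stored turns, returned one page at a time."""
    q = query.lower()
    hits = [e for e in entries
            if q in e["user"].lower() or q in e["assistant"].lower()]
    start = (page - 1) * PAGE_SIZE
    page_hits = hits[start:start + PAGE_SIZE]
    has_more = start + PAGE_SIZE < len(hits)
    return page_hits, has_more
```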
Command Processing
The system processes LLM commands through:
Regular expression extraction of search queries
Token counting for context management
Result formatting and injection into read-only memory
Confirmation messages back to the LLM
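Putting those steps together, a command-processing pass could look roughly like the sketch below. It reuses the illustrative helpers from earlier sections and assumes a `count_tokens` function is available; none of the names are from the original code.

```python
READ_ONLY_BUDGET = 500  # tokens reserved for the read-only retrieval block

def process_reply(reply, entries, state, count_tokens):
    """Handle a memory command in a model reply; build the read-only block and a confirmation."""
    command = extract_memory_command(reply)
    if command is None:
        return None  # no memory request in this reply

    kind, value = command
    if kind == "search":
        state["query"], state["page"] = value, 1  # remember the query for later page requests
    else:
        state["page"] = value

    hits, has_more = search_memory(entries, state["query"], state["page"])

    # Format the results and trim them to fit the read-only token budget.
    lines = [f"User: {h['user']}\nAssistant: {h['assistant']}" for h in hits]
    block = "\n---\n".join(lines)
    while lines and count_tokens(block) > READ_ONLY_BUDGET:
        lines.pop()
        block = "\n---\n".join(lines)

    more = " (more pages available)" if has_more else ""
    confirmation = f"Retrieved page {state['page']} for '{state['query']}'{more}."
    return block, confirmation
```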
Future Enhancements
The system can be improved through:
Postgres implementation with BM25 search
Vector embeddings for semantic search
Date-based retrieval
Rate limiting and throttling
Multi-user support
Authentication
Technical Requirements
The implementation requires:
Python environment
Anthropic's Claude API
JSON for storage (or Postgres for production)
Regular expressions for command parsing
Token counting utilities
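For reference, here is one minimal way to assemble those pieces into a request with the official `anthropic` Python SDK; the model name and the tag wrapping the retrieved memories are placeholders, not part of the original implementation.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def ask(system_message, read_only_block, recent_turns, user_message):
    """Send the system message, retrieved memories, and recent history in a single request."""
    messages = []
    for t in recent_turns:
        messages.append({"role": "user", "content": t["user"]})
        messages.append({"role": "assistant", "content": t["assistant"]})
    messages.append({"role": "user", "content": user_message})

    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder model name
        max_tokens=512,
        system=f"{system_message}\n\n<read_only_memory>\n{read_only_block}\n</read_only_memory>",
        messages=messages,
    )
    return response.content[0].text
```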
This memory system lets LLMs maintain conversational context beyond their standard context window, within a clean, modular architecture that can be extended to more complex use cases.
You're tackling an interesting topic that I run into every day too, especially whenever I read the next “RAG is dead” post.
The main problem we have right now when working with knowledge (especially in the enterprise) is that you don't want to send the document to the LLM every time just to answer one question.
That's just horribly inefficient.
There must be a way to keep the document close to the LLM (like your disk idea), so I can send several questions along with chat history.
You mentioned how Cursor does it. Do you have any interesting links?