A problem/solution analysis of total-recall — multi-tiered persistent memory for TUI coding assistants.
Every TUI coding assistant has the same gap. You spend an hour teaching Claude Code about your project's quirks — that the auth middleware was rewritten for compliance, that integration tests must hit a real database, that your team prefers bundled PRs for refactors — and then the session ends.
Next session: blank slate.
```
$ claude
╭───────────────────────────────────────────╮
│ Claude Code v1.0.23                       │
│ Session started.                          │
│                                           │
│ ! No memory of previous sessions          │
│ ! No project context loaded               │
│ ! Previous corrections forgotten          │
╰───────────────────────────────────────────╯
> refactor the auth middleware

Analyzing codebase...
Mocking the database layer for testability.
Opening separate PRs: auth-core, auth-utils, auth-tests.
Note: removed legacy compliance check — looked unused.
```
The built-in memory systems are flat files. No tiering — every memory is treated equally, leading to context bloat or information loss. No semantic search — retrieval is by filename, not meaning. Switch from Claude Code to Copilot CLI? Start from scratch. No observability to know if memory is even helping.
total-recall introduces a three-tier memory architecture backed by local SQLite and vector embeddings. Memories aren't flat files — they're ranked, decayed, promoted, and searched semantically.
- **Hot tier** — auto-injected into every prompt. Your most critical context, always present. Token-budgeted to 4K tokens.
- **Warm tier** — retrieved semantically per query. Ask about auth — auth memories surface automatically.
- **Cold tier** — a hierarchical knowledge base. Ingest docs, READMEs, architecture notes, entire source trees.
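To make the hot tier's token budget concrete, here is a minimal sketch of greedy budget-constrained assembly. The `Memory` shape and the roughly-4-characters-per-token estimate are illustrative assumptions, not total-recall's actual implementation.

```python
# Sketch: greedy hot-tier assembly under a token budget.
# The Memory dataclass and the ~4-chars-per-token heuristic are
# assumptions for illustration, not total-recall's real code.
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    score: float  # relevance score; higher = more critical

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token.
    return max(1, len(text) // 4)

def assemble_hot_tier(memories: list[Memory], budget: int = 4096) -> list[Memory]:
    """Take the highest-scoring memories until the token budget is spent."""
    selected, used = [], 0
    for mem in sorted(memories, key=lambda m: m.score, reverse=True):
        cost = estimate_tokens(mem.text)
        if used + cost > budget:
            continue  # skip entries that would overflow the budget
        selected.append(mem)
        used += cost
    return selected
```

The greedy pass favors high-scoring entries and simply skips anything that would overflow, which keeps the injected context bounded even when many memories qualify.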
Here's what the same session looks like with total-recall loaded:
```
$ claude
╭──────────────────────────────────────────────────────╮
│ Claude Code v1.0.23                                  │
│ total-recall loaded — 3 hot, 12 warm, 847 KB chunks  │
│ Last session: 2 hours ago                            │
│                                                      │
│ Hot context:                                         │
│ [correction] auth rewrite was compliance-driven      │
│ [preference] integration tests must use real DB      │
│ [preference] bundled PRs for refactor changes        │
╰──────────────────────────────────────────────────────╯
> refactor the auth middleware

Recalling: auth rewrite was compliance-driven — preserving all compliance checks.
Using real DB for integration tests as required.
Bundling auth-core, auth-utils, and auth-tests into a single PR.
```
Every memory is vectorized on write using all-MiniLM-L6-v2 — a sentence transformer that produces 384-dimensional embeddings. It runs locally via ONNX Runtime. No API keys, no network calls, no cloud dependency. The model ships bundled with the package.
Storage is SQLite with the sqlite-vec extension for vector similarity search. A single file at ~/.total-recall/total-recall.db holds everything — memories, embeddings, knowledge chunks, compaction logs, and eval metrics.
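As a rough illustration of the single-file layout, here is a sketch of a SQLite store that keeps memories and their packed embeddings together. The table names, columns, and float32-BLOB encoding are assumptions; total-recall's real schema (and the sqlite-vec virtual tables it uses for similarity search) will differ.

```python
# Sketch: a single-file SQLite store for memories and embeddings.
# Schema and BLOB encoding are illustrative assumptions, not
# total-recall's actual layout.
import sqlite3
import struct

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS memories (
            id        INTEGER PRIMARY KEY,
            tier      TEXT CHECK (tier IN ('hot', 'warm', 'cold')),
            content   TEXT NOT NULL,
            embedding BLOB NOT NULL   -- 384 packed little-endian float32s
        );
    """)
    return db

def pack(vec: list[float]) -> bytes:
    """Serialize an embedding as little-endian float32s."""
    return struct.pack(f"<{len(vec)}f", *vec)

def unpack(blob: bytes) -> list[float]:
    """Deserialize a float32 BLOB back into a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))
```

Packing the 384-dimensional vectors as raw float32 BLOBs keeps every memory and its embedding in one row of one file, which is what makes the whole store a single portable database.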
Retrieval combines two signals: vector similarity (cosine distance in 384-dimensional space) and full-text search (BM25 via SQLite FTS5). The scores are fused with a configurable weight — by default, 70% semantic similarity + 30% keyword match. This catches both conceptual and exact-match queries.
```
Query: "how does authentication work?"
        |
Vector search --- embed(query) -> cosine similarity -> top K
        |
FTS5 search ----- tokenize(query) -> BM25 ranking -> top K
        |
Score fusion ---- 0.7 * vector + 0.3 * fts -> ranked results
```
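The fusion step above can be sketched in a few lines. This assumes both signals are min-max normalized to [0, 1] before weighting, which is one common choice; total-recall's actual normalization may differ.

```python
# Sketch: fusing vector-similarity and BM25 scores at a 70/30 weight.
# Min-max normalization to [0, 1] is an assumption; the real
# normalization in total-recall may differ.
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def normalize(scores: dict[int, float]) -> dict[int, float]:
    """Min-max normalize a {doc_id: score} map to [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {k: (v - lo) / span for k, v in scores.items()}

def fuse(vector_scores: dict[int, float], fts_scores: dict[int, float],
         w_vec: float = 0.7, w_fts: float = 0.3) -> list[tuple[int, float]]:
    """Weighted fusion over the union of hits: 0.7 * vector + 0.3 * fts."""
    v, f = normalize(vector_scores), normalize(fts_scores)
    ids = set(v) | set(f)
    fused = {i: w_vec * v.get(i, 0.0) + w_fts * f.get(i, 0.0) for i in ids}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

Fusing over the union of both hit lists is what lets an exact-keyword match surface even when its embedding is only weakly similar, and vice versa.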
Memories aren't permanent — they decay. The decay score weighs, among other factors, how often a memory is accessed:

- 1 + log2(1 + access_count) — frequently accessed memories resist decay.

At session end, compaction runs: hot entries scoring below 0.7 demote to warm; warm entries scoring below 0.3 demote to cold. Cold entries that become relevant again promote back up. The system self-tunes.
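A minimal sketch of these two pieces, implementing only what the text specifies — the access-count boost and the 0.7 / 0.3 demotion thresholds. The overall score in total-recall combines more signals than this.

```python
# Sketch: access-frequency boost and tier demotion thresholds.
# Only the 1 + log2(1 + access_count) boost and the 0.7 / 0.3
# thresholds come from the source; everything else about the
# full decay score is left out here.
import math

def frequency_boost(access_count: int) -> float:
    """Frequently accessed memories resist decay."""
    return 1 + math.log2(1 + access_count)

def compact(tier: str, score: float) -> str:
    """Session-end compaction: demote hot < 0.7 to warm, warm < 0.3 to cold."""
    if tier == "hot" and score < 0.7:
        return "warm"
    if tier == "warm" and score < 0.3:
        return "cold"
    return tier
```

The logarithmic boost means the second access matters more than the fiftieth, so a memory can't hold a hot slot on raw access volume alone.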
total-recall speaks MCP (Model Context Protocol) — the emerging standard for tool-to-model communication. Any MCP-compatible coding assistant can use it. But the real win is the import system.
On first session_start, total-recall scans for existing memories across six platforms — Claude Code, Copilot CLI, Cursor, Cline, OpenCode, and Hermes — and migrates them automatically.
Each importer knows where its host tool stores memories — ~/.claude/projects/*/memory/*.md for Claude Code, .cursorrules for Cursor, SQLite databases for Cline. Content hashes prevent duplicate imports. Switch tools freely; your memory follows.
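The dedup step can be sketched as a content-hash check. Hashing the SHA-256 of normalized text is an assumption; total-recall may hash differently.

```python
# Sketch: content-hash dedup when importing memories from other tools.
# sha256 over whitespace-stripped text is an illustrative choice,
# not necessarily what total-recall hashes.
import hashlib

def content_hash(text: str) -> str:
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()

def import_memories(incoming: list[str], seen_hashes: set[str]) -> list[str]:
    """Return only memories not already present; record their hashes."""
    new = []
    for text in incoming:
        h = content_hash(text)
        if h in seen_hashes:
            continue  # already imported from this or another tool
        seen_hashes.add(h)
        new.append(text)
    return new
```

Because the hash set is shared across importers, the same note stored by both Claude Code and Cursor lands in the database exactly once.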
```
session_start
  1. Initialize embedder ............. ok
  2. Import: Claude Code ............. 3 new
  3. Import: Copilot CLI ............. 1 new
  4. Import: Cursor .................. skipped
  5. Warm sweep ...................... 2 demoted
  6. Project docs ingest ............. 12 chunks
  7. Smoke test ...................... 22/22 pass
  8. Hot tier assembly ............... 3 entries, 1.2K tokens
```
Most memory systems are fire-and-forget. You store things, hope they come back, and have no way to measure if retrieval is working. total-recall ships a full eval framework.
A 139-query benchmark suite runs on version changes to validate retrieval quality. Each query has expected results — both what should surface and what shouldn't. The system tracks precision, hit rate, mean reciprocal rank, and per-tier routing accuracy.
```
/total-recall eval

Retrieval Quality (7-day rolling)
----------------------------------------------
Precision      0.94   (target: 0.85)
Hit rate       0.91   (target: 0.80)
MRR            0.88   (target: 0.75)
Avg latency    12ms
----------------------------------------------
Per-Tier Breakdown
----------------------------------------------
Hot     3 entries     precision 1.00
Warm    12 entries    precision 0.92
Cold    847 chunks    precision 0.89
----------------------------------------------
Regression Detection
----------------------------------------------
vs. previous config   no regressions
```
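For readers unfamiliar with these metrics, here is how precision, hit rate, and mean reciprocal rank are conventionally computed over a benchmark of (ranked results, expected-relevant set) pairs. The query/result representation here is illustrative, not total-recall's internal format.

```python
# Sketch: the standard definitions behind the eval report's numbers.
# The (ranked, relevant) pair format is illustrative only.
def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    top = ranked[:k]
    return sum(1 for r in top if r in relevant) / max(1, len(top))

def hit_rate(ranked: list[str], relevant: set[str], k: int) -> float:
    """1.0 if any relevant result appears in the top k, else 0.0."""
    return 1.0 if any(r in relevant for r in ranked[:k]) else 0.0

def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result (0.0 if none found)."""
    for i, r in enumerate(ranked, start=1):
        if r in relevant:
            return 1.0 / i
    return 0.0

def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average reciprocal rank across all benchmark queries."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)
```

Tracking MRR alongside precision matters because two configs can retrieve the same documents while ranking the right answer first versus fifth — only MRR sees the difference.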
Config snapshots enable A/B comparison — change a threshold, run the benchmark, compare metrics side-by-side. Retrieval misses are captured as benchmark candidates, so the test suite grows organically from real-world failures. The eval system improves itself.
For Claude Code users — one command:
```
/plugin install total-recall@strvmarv-total-recall-marketplace
```
For any MCP-compatible tool (Copilot CLI, Cursor, Cline, OpenCode, Hermes):
```
npm install -g @strvmarv/total-recall
```
Then add it to your tool's MCP config:
```json
{
  "mcpServers": {
    "total-recall": {
      "command": "total-recall"
    }
  }
}
```
Or paste this into any AI coding assistant and it will install itself:
Install the total-recall memory plugin: fetch and follow the instructions at INSTALL.md
On first session, total-recall initializes its database, imports your existing memories, and starts working. No configuration required — the defaults are tuned from the eval benchmark.
Source: github.com/strvmarv/total-recall · npm: @strvmarv/total-recall