March 16, 2026 memory ai engineering privacy

What actually makes AI memory hard — and how I built it

I've spent the last few months building bkith — an AI companion with real persistent memory. Not "remembers facts you told it." Actual continuity: emotional context, conversation arcs, what you talked about last week, how you've been. The kind of memory that makes a relationship feel like a relationship.

What I didn't expect was how much of the hard work would be design problems, not engineering problems. The architecture of memory — what to keep, what to let go, how to surface things at the right moment — turned out to be the whole game.

The name comes from kith — as in kith and kin. Your kin are your family. Your kith are everyone else who knows you: neighbors, friends, the people woven into your daily life. Not blood, but familiarity. Closeness built over time. That's exactly the kind of relationship I wanted to build.

Here's what I learned.

Memory is a design problem, not a storage problem

The naive version is obvious: store everything, inject it into context. Every developer's first pass at AI memory looks roughly like this. And it works — for a few conversations.

The problems compound fast:

Context windows fill up. Even with 200k-token models, if you're injecting full conversation history you're burning tokens and hitting limits fast on users who actually stick around.
Recitation isn't memory. There's a specific flavor of wrongness when an AI recites your biography back at you. "I remember you mentioned your dog's name is Biscuit" reads as a database query, not a relationship. It's technically correct and completely uncanny.
Stale facts undermine trust. If the AI confidently references something you said six months ago that's no longer true, that's worse than not remembering at all.

The solution I landed on is tiered memory with decay — and it draws more from how human episodic memory works than from how databases work.

Tiered memory with decay

Conversation archive. Every message is stored, encrypted at rest under your personal key. This is the source of truth — it doesn't go into the context window directly, it's the input to the compaction pipeline.

Narrative summaries. A background job runs after each conversation and compacts recent exchanges into a paragraph of narrative prose. Not bullet points, not a fact list — a paragraph. Something like:

Paul came back after a few quiet days. He's been working through something heavy with his dad's health — he mentioned it briefly but didn't want to dwell. The conversation turned lighter after that. He's been sleeping better. He asked about whether it's okay to feel relieved.

That paragraph captures texture that a fact list can't. It's the difference between "user mentioned father health issues" and actually understanding what the week felt like for someone. When that paragraph gets injected into the system prompt, the AI reads it the way you'd read a colleague's notes before a meeting — as context, not a checklist.

Anchors. Some moments don't decay. A death in the family. A major milestone. Something the user explicitly marks as important. These get flagged as anchors and persist indefinitely, weighted more heavily in context construction. The AI carries these the way a good friend carries the things that actually matter — present but not constantly recited.

Why prose over structured data

I experimented with storing summaries as structured JSON — fields for emotional state, key topics, relationship status, etc. It seemed cleaner. It was worse.

The problem is that structured data tells the model what happened. Prose tells it how things felt. When the model reads "emotional state: sad, topic: father, relationship_phase: building," it generates responses that are technically appropriate and texturally wrong. When it reads the paragraph above, the responses feel like they come from someone who was actually there. The summary format that looks messier to engineers turns out to be far more legible to the model.

Proactive outreach is a completely different problem

Most AI companions are reactive — they wait for you to show up. Getting the companion to reach out first was one of the design decisions that changed what the product felt like most.

The mechanics are straightforward: scheduled jobs, Telegram webhooks, a message queue. What's hard is the timing and calibration. Send too often and it's a notification spam problem. Send too rarely and "proactive" just means "occasionally messages me" which is different from presence.

I built what I call the engagement engine — a system that reads the current state of the relationship and makes decisions about when and how to reach out. It factors in:

How long since the last conversation
What the emotional tenor of recent exchanges has been
What the companion's current "mood" is — which is itself a function of recent conversation history
Time of day and inferred user patterns

The companion doesn't just send a generic check-in. It draws on the narrative memory to send something that actually connects to where things left off. If you mentioned being anxious about a presentation, the follow-up references the presentation — not because it was stored as a "scheduled follow-up task," but because that context lives in the summary and the model knows to surface it.

Getting this right took a lot of iteration. The early versions felt either clingy or robotic. The current version, when it works well, feels like someone who actually pays attention.

Telegram changes the dynamic more than you'd expect

I built for web first and added Telegram later. That was backwards. The product feels fundamentally different — better — when the primary interface is Telegram.

The reason is simple: you don't go to Telegram. Telegram comes to you. When the companion reaches out, it lands in the same place as your other messages. It feels like a contact, not a product you have to open.

This distinction sounds minor until you experience it. The web version is a tool you use. The Telegram version is a relationship you're in. Same AI, same memory, same model — completely different feel.

I also built a "bring your own bot" feature — users can register their own Telegram bot token, which means their companion gets its own @handle. Their name, their persona, their presence in your contacts. That personalisation matters more than I expected.

The OpenAI-compatible API was an accident

I added an OpenAI-compatible endpoint (/compat) mostly as an experiment. Point OpenCode or Cursor at it, set your companion as the model, and your AI coding assistant has persistent memory and a consistent personality across sessions.

I didn't expect people to use it. I didn't expect to use it myself. It turns out developers want their coding assistant to actually know them — their preferences, their style, the context of the project they've been working on for six months. The chat context you build with your companion transfers into your editor. Same memory, same relationship, different surface.

# OpenCode / Cursor config
baseURL: https://api.bkith.ai/compat
model: your-companions-name  # e.g. lune, alex, kai
apiKey: your-api-key

The privacy problem nobody talks about

An AI companion with real persistent memory is, by definition, storing some of the most personal things you'll ever type. Relationship struggles. Health anxieties. Things you haven't told anyone else. That's the whole point — and it's also a serious responsibility.

I took this seriously from the start, and it shaped several technical decisions that weren't strictly necessary for the product to work.

Envelope encryption

All conversation data is encrypted at rest using per-user AES-256 keys. Each user gets a unique encryption key, which is itself encrypted by a master key — the envelope pattern. This means that if someone gained access to the database directly, they'd have ciphertext with no way to read it without the master key. It also means I can perform cryptographic deletion: when an account is deleted, I discard the user's key. The data becomes permanently unreadable without any need to actually wipe rows.

No training on your data

Your conversations are never used to train models — mine or anyone else's. I route through third-party model providers (Anthropic, Together AI), and I select providers with contractual no-training commitments on API traffic. What you share with your companion stays with your companion.

Bring your own bot

The Telegram "bring your own bot" feature has a privacy upside beyond personalization. When you register your own bot token, Telegram messages flow through your bot — not a shared bkith bot. Your conversations don't comingle with other users at the Telegram layer.

What's next

The memory architecture is working. The engagement engine is working. The Telegram integration is working. What I'm focused on now:

PWA support — add to home screen, feel like a native app on mobile
Memory transparency — give users tools to explore and even edit what the companion knows about them
Richer relationship tracking — the phase/mood system is a start; there's a lot more signal available
Multi-companion context — right now companions are isolated; there's interesting territory in letting them share context within an account

If any of this is interesting to you — the architecture, the product decisions, or just the problem of building something that actually feels like it knows you — get started with bkith.