# LLMemory Architecture

## System Overview

LLMemory is a three-layer system:

1. **CLI Layer** - User/agent interface (Commander.js)
2. **Search Layer** - Query processing and ranking (LIKE → FTS5 → Fuzzy)
3. **Storage Layer** - Persistent data (SQLite)

```
┌─────────────────────────────────────┐
│         CLI Layer                   │
│  (memory store/search/list/prune)   │
└──────────────┬──────────────────────┘
               │
┌──────────────▼──────────────────────┐
│         Search Layer                │
│  Phase 1: LIKE search               │
│  Phase 2: FTS5 + BM25 ranking       │
│  Phase 3: + Trigram fuzzy matching  │
└──────────────┬──────────────────────┘
               │
┌──────────────▼──────────────────────┐
│         Storage Layer               │
│  SQLite Database                    │
│  - memories (content, metadata)     │
│  - tags (normalized)                │
│  - memory_tags (many-to-many)       │
│  - memories_fts (FTS5 virtual)      │
│  - trigrams (fuzzy index)           │
└─────────────────────────────────────┘
```
## Data Model

### Phase 1 Schema (MVP)

```
┌─────────────────┐
│    memories     │
├─────────────────┤
│ id              │ PK
│ content         │ TEXT (max 10KB)
│ created_at      │ INTEGER (Unix timestamp)
│ entered_by      │ TEXT (agent name)
│ expires_at      │ INTEGER (nullable)
└─────────────────┘
         │
         │ 1:N
         ▼
┌─────────────────┐       ┌─────────────────┐
│  memory_tags    │ N:M   │      tags       │
├─────────────────┤       ├─────────────────┤
│ memory_id       │ FK ───│ id              │ PK
│ tag_id          │ FK    │ name            │ TEXT (unique, NOCASE)
└─────────────────┘       │ created_at      │ INTEGER
                          └─────────────────┘
```
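
A minimal DDL sketch of this schema, as it might run through better-sqlite3. Column names come from the diagram; the constraint details are assumptions:

```javascript
// Hypothetical Phase 1 migration; constraint details are assumptions.
const Database = require('better-sqlite3');
const db = new Database('memories.db');

db.exec(`
  CREATE TABLE IF NOT EXISTS memories (
    id         INTEGER PRIMARY KEY,
    content    TEXT NOT NULL CHECK (length(content) <= 10000),
    created_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now')),
    entered_by TEXT,
    expires_at INTEGER
  );

  CREATE TABLE IF NOT EXISTS tags (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE COLLATE NOCASE,
    created_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now'))
  );

  CREATE TABLE IF NOT EXISTS memory_tags (
    memory_id INTEGER NOT NULL REFERENCES memories(id) ON DELETE CASCADE,
    tag_id    INTEGER NOT NULL REFERENCES tags(id) ON DELETE CASCADE,
    PRIMARY KEY (memory_id, tag_id)
  );
`);
```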

### Phase 2 Schema (+ FTS5)

Adds a virtual table for full-text search:

```
┌─────────────────────┐
│   memories_fts      │ Virtual Table (FTS5)
├─────────────────────┤
│ rowid → memories.id │
│ content (indexed)   │
└─────────────────────┘
         │
         │ Synced via triggers
         ▼
┌─────────────────┐
│    memories     │
└─────────────────┘
```

**Triggers** (see the sketch after this list):
- `memories_ai`: INSERT into memories → INSERT into memories_fts
- `memories_au`: UPDATE memories → UPDATE memories_fts
- `memories_ad`: DELETE memories → DELETE from memories_fts
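
A sketch of the three triggers, assuming `memories_fts` is declared as an external-content table (`fts5(content, content='memories', content_rowid='id')`). For external-content FTS5 tables, updates and deletes are expressed as special `'delete'` inserts, which is SQLite's documented pattern:

```javascript
// Hypothetical sync triggers for an external-content FTS5 table.
db.exec(`
  CREATE TRIGGER memories_ai AFTER INSERT ON memories BEGIN
    INSERT INTO memories_fts(rowid, content) VALUES (new.id, new.content);
  END;

  CREATE TRIGGER memories_ad AFTER DELETE ON memories BEGIN
    INSERT INTO memories_fts(memories_fts, rowid, content)
    VALUES ('delete', old.id, old.content);
  END;

  CREATE TRIGGER memories_au AFTER UPDATE OF content ON memories BEGIN
    INSERT INTO memories_fts(memories_fts, rowid, content)
    VALUES ('delete', old.id, old.content);
    INSERT INTO memories_fts(rowid, content) VALUES (new.id, new.content);
  END;
`);
```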

### Phase 3 Schema (+ Trigrams)

Adds a trigram index for fuzzy matching:

```
┌─────────────────┐
│    trigrams     │
├─────────────────┤
│ trigram         │ TEXT (3 chars)
│ memory_id       │ FK → memories.id
│ position        │ INTEGER (for proximity)
└─────────────────┘
         │
         │ Generated on insert/update
         ▼
┌─────────────────┐
│    memories     │
└─────────────────┘
```
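
Trigram rows are regenerated whenever content changes. A sketch of the application-side step (helper names are hypothetical; `extractTrigrams` is sketched later under Trigram Similarity):

```javascript
// Hypothetical re-indexing step run after a memory is inserted or updated.
const deleteTrigrams = db.prepare('DELETE FROM trigrams WHERE memory_id = ?');
const insertTrigram = db.prepare(
  'INSERT INTO trigrams (trigram, memory_id, position) VALUES (?, ?, ?)'
);

const reindexTrigrams = db.transaction((memoryId, content) => {
  deleteTrigrams.run(memoryId);
  // Position here is the trigram's ordinal in the extracted list; a real
  // implementation might store the character offset instead.
  Array.from(extractTrigrams(content)).forEach((trigram, position) => {
    insertTrigram.run(trigram, memoryId, position);
  });
});
```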

## Search Algorithm Evolution

### Phase 1: LIKE Search

**Algorithm:**
```python
def search_like(query, filters):
  # Case-insensitive wildcard matching
  sql = "SELECT * FROM memories WHERE LOWER(content) LIKE LOWER('%' || ? || '%')"

  # Apply filters
  if filters.tags:
    sql += " AND id IN (SELECT memory_id FROM memory_tags WHERE tag_id IN (...))"

  if filters.after:
    sql += " AND created_at >= ?"

  # Exclude expired (SQLite has no now(); compare against the Unix epoch)
  sql += " AND (expires_at IS NULL OR expires_at > strftime('%s', 'now'))"

  # Order by recency
  sql += " ORDER BY created_at DESC LIMIT ?"

  return execute(sql, params)
```
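
In the Node layer this might look like the following sketch. Note that `%` and `_` in user input should be escaped so they match literally; the escaping helper is an assumption, not part of the original design:

```javascript
// Hypothetical Phase 1 search, assuming better-sqlite3.
// LIKE treats % and _ as wildcards, so user input is escaped first.
const escapeLike = (s) => s.replace(/[\\%_]/g, (c) => '\\' + c);

const searchLike = db.prepare(`
  SELECT * FROM memories
  WHERE LOWER(content) LIKE '%' || LOWER(?) || '%' ESCAPE '\\'
    AND (expires_at IS NULL OR expires_at > strftime('%s', 'now'))
  ORDER BY created_at DESC
  LIMIT ?
`);

const results = searchLike.all(escapeLike('docker networking'), 10);
```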

**Strengths:**
- Simple, fast for small datasets
- No dependencies
- Predictable behavior

**Weaknesses:**
- No relevance ranking
- Slow for large datasets (full table scan)
- No fuzzy matching
- No phrase queries or boolean logic

**Performance:** O(n) where n = number of memories  
**Target:** <50ms for <500 memories

---

### Phase 2: FTS5 Search

**Algorithm:**
```python
def search_fts5(query, filters):
  # Build FTS5 query
  fts_query = build_fts5_query(query)  # Handles AND/OR/NOT, quotes, prefixes

  # FTS5 MATCH with BM25 ranking
  sql = """
    SELECT m.*, mf.rank AS relevance
    FROM memories_fts mf
    JOIN memories m ON m.id = mf.rowid
    WHERE memories_fts MATCH ?
      AND (m.expires_at IS NULL OR m.expires_at > strftime('%s', 'now'))
  """

  # Apply filters (same as Phase 1)
  # ...

  # Order by FTS5 rank; rank is a negative BM25 score, so ascending
  # order puts the best matches first
  sql += " ORDER BY mf.rank LIMIT ?"

  return execute(sql, params)

def build_fts5_query(query):
  # Transform grep-like to FTS5
  # "docker compose" → "docker AND compose"
  # "docker OR podman" → "docker OR podman" (unchanged)
  # '"exact phrase"' → '"exact phrase"' (unchanged)
  # "docker*" → "docker*" (unchanged)

  if has_operators(query):
    return query

  # Implicit AND
  terms = query.split()
  return " AND ".join(terms)
```

**FTS5 Tokenization:**
- **Tokenizer:** `porter unicode61 remove_diacritics 2`
- **Porter:** Stemming (running → run, databases → database)
- **unicode61:** Unicode support
- **remove_diacritics:** Normalize accented characters (café → cafe)

**BM25 Ranking:**
```
score = Σ(IDF(term) * (f(term) * (k1 + 1)) / (f(term) + k1 * (1 - b + b * |D| / avgdl)))

Where:
- IDF(term) = Inverse Document Frequency (rarer terms score higher)
- f(term) = Term frequency in document
- |D| = Document length
- avgdl = Average document length
- k1 = 1.2 (term frequency saturation)
- b = 0.75 (length normalization)
```
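
A toy calculation with assumed inputs, to make the formula concrete:

```javascript
// Worked example with assumed values (not taken from a real index).
function bm25Term(idf, tf, docLen, avgdl, k1 = 1.2, b = 0.75) {
  return idf * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * (docLen / avgdl)));
}

// A term with IDF 1.2 appearing 3 times in a 120-token document,
// with an average document length of 100 tokens:
// denominator = 3 + 1.2 * (0.25 + 0.9) = 4.38
// score ≈ 1.2 * 6.6 / 4.38 ≈ 1.81
console.log(bm25Term(1.2, 3, 120, 100).toFixed(2)); // "1.81"
```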

**Strengths:**
- Fast search with inverted index
- Relevance ranking (BM25)
- Boolean operators, phrase queries, prefix matching
- Scales to 100K+ documents

**Weaknesses:**
- No fuzzy matching (typo tolerance)
- FTS5 index overhead (~30% storage)
- More complex setup (triggers needed)

**Performance:** O(log n) for index lookup  
**Target:** <100ms for 10K memories

---

### Phase 3: Fuzzy Search

**Algorithm:**
```python
# Pseudocode: helpers such as execute(), unique(), merge_dedup() and now()
# are assumed to exist.
def search_fuzzy(query, filters):
  # Step 1: Try FTS5 exact match
  results = search_fts5(query, filters)

  # Step 2: If too few results, try fuzzy
  if len(results) < 5 and filters.fuzzy is not False:
    fuzzy_results = search_trigram(query, filters)
    results = merge_dedup(results, fuzzy_results)

  # Step 3: Re-rank by combined score
  for result in results:
    result.score = calculate_combined_score(result, query)

  results.sort(key=lambda r: r.score, reverse=True)
  return results[:filters.limit]

def search_trigram(query, filters, threshold=0.7, limit=10):
  # Extract query trigrams
  query_trigrams = extract_trigrams(query)  # ["doc", "ock", "cke", "ker"]

  # Find candidates by trigram overlap
  sql = """
    SELECT m.id, m.content, COUNT(DISTINCT tr.trigram) AS matches
    FROM memories m
    JOIN trigrams tr ON tr.memory_id = m.id
    WHERE tr.trigram IN (?, ?, ?, ...)
      AND (m.expires_at IS NULL OR m.expires_at > strftime('%s', 'now'))
    GROUP BY m.id
    HAVING matches >= ?
    ORDER BY matches DESC
    LIMIT ?
  """

  min_matches = ceil(len(query_trigrams) * threshold)
  candidates = execute(sql, query_trigrams, min_matches, limit * 2)

  # Calculate edit distance and combined score
  scored = []
  for candidate in candidates:
    edit_dist = levenshtein(query, candidate.content[:len(query) * 3])
    trigram_sim = candidate.matches / len(query_trigrams)
    normalized_edit = 1 - (edit_dist / max(len(query), len(candidate.content)))

    score = 0.6 * trigram_sim + 0.4 * normalized_edit

    if score >= threshold:
      scored.append((candidate, score))

  scored.sort(key=lambda x: x[1], reverse=True)
  return [c for c, s in scored[:limit]]

def extract_trigrams(text):
  # Normalize: lowercase, strip punctuation, collapse whitespace
  normalized = re.sub(r'\s+', ' ', re.sub(r'[^\w\s]', ' ', text.lower())).strip()

  # Sliding window of 3 characters; trigrams spanning a word boundary
  # (i.e. containing whitespace) are skipped
  trigrams = []
  for i in range(len(normalized) - 2):
    trigram = normalized[i:i + 3]
    if ' ' not in trigram:
      trigrams.append(trigram)

  return unique(trigrams)

def levenshtein(a, b):
  # Wagner-Fischer algorithm with two-row optimization
  if len(a) == 0: return len(b)
  if len(b) == 0: return len(a)

  prev_row = list(range(len(b) + 1))

  for i in range(len(a)):
    cur_row = [i + 1]
    for j in range(len(b)):
      cost = 0 if a[i] == b[j] else 1
      cur_row.append(min(
        cur_row[j] + 1,        # deletion
        prev_row[j + 1] + 1,   # insertion
        prev_row[j] + cost     # substitution
      ))
    prev_row = cur_row

  return prev_row[len(b)]

def calculate_combined_score(result, query):
  # BM25 from FTS5 (if available); SQLite's rank is a negative BM25
  # score, so negate it (a real implementation would also normalize it
  # to the [0, 1] range of the other signals)
  bm25_score = -result.fts_rank if result.has_fts_rank else 0

  # Trigram similarity
  trigram_score = result.trigram_matches / len(extract_trigrams(query))

  # Edit distance (normalized)
  edit_dist = levenshtein(query, result.content[:len(query) * 3])
  edit_score = 1 - (edit_dist / max(len(query), len(result.content)))

  # Recency boost (linear decay to zero over 90 days)
  days_ago = (now() - result.created_at) / 86400
  recency_score = max(0, 1 - (days_ago / 90))

  # Weighted combination
  score = (0.4 * bm25_score +
           0.3 * trigram_score +
           0.2 * edit_score +
           0.1 * recency_score)

  return score
```

**Trigram Similarity (query-containment, not true Jaccard):**
```
similarity = |trigrams(query) ∩ trigrams(document)| / |trigrams(query)|

(A true Jaccard index divides by the union; dividing by the query's
trigram count instead gives similarity = 1.0 whenever the query is
fully contained in the document.)

Example:
query = "docker"     → trigrams: ["doc", "ock", "cke", "ker"]
document = "dcoker"  → trigrams: ["dco", "cok", "oke", "ker"]

intersection = ["ker"] → count = 1
similarity = 1 / 4 = 0.25 (below threshold, but edit distance is only 2)

Better approach: edit distance normalized by length
edit_distance("docker", "dcoker") = 2
normalized = 1 - (2 / 6) = 0.67 (above threshold 0.6)
```
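
The same walk-through as a runnable JavaScript sketch; `extractTrigrams` mirrors the pseudocode above:

```javascript
// Minimal trigram extraction matching the pseudocode: lowercase,
// strip punctuation, collapse whitespace, keep intra-word trigrams.
function extractTrigrams(text) {
  const normalized = text.toLowerCase()
    .replace(/[^\w\s]/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
  const trigrams = new Set();
  for (let i = 0; i <= normalized.length - 3; i++) {
    const t = normalized.slice(i, i + 3);
    if (!t.includes(' ')) trigrams.add(t);
  }
  return trigrams;
}

function trigramSimilarity(query, doc) {
  const q = extractTrigrams(query);
  const d = extractTrigrams(doc);
  let overlap = 0;
  for (const t of q) if (d.has(t)) overlap++;
  return overlap / q.size;
}

console.log(trigramSimilarity('docker', 'dcoker')); // 0.25
```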

**Strengths:**
- Handles typos (edit distance ≤2)
- Partial matches ("docker" finds "dockerization")
- Cascading strategy (fast exact, fallback to fuzzy)
- Configurable threshold

**Weaknesses:**
- Trigram table is large (~3x content size)
- Slower than FTS5 alone
- Tuning the threshold requires experimentation

**Performance:** O(log n) + O(m) where m = trigram candidates  
**Target:** <200ms for 10K memories with fuzzy

---

## Memory Lifecycle

```
┌──────────┐
│  Store   │
└────┬─────┘
     │
     ▼
┌────────────────────┐
│  Validate:         │
│  - Length (<10KB)  │
│  - Tags (parse)    │
│  - Expiration      │
└────┬───────────────┘
     │
     ▼
┌────────────────────┐     ┌─────────────────┐
│  Insert:           │────▶│  Trigger:       │
│  - memories table  │     │  - Insert FTS5  │
│  - Link tags       │     │  - Gen trigrams │
└────┬───────────────┘     └─────────────────┘
     │
     ▼
┌────────────────────┐
│  Searchable        │
└────────────────────┘
     │
     │ (time passes)
     ▼
┌────────────────────┐
│  Expired?          │───No──▶ Continue
└────┬───────────────┘
     │ Yes
     ▼
┌────────────────────┐
│  Prune Command     │
│  (manual/auto)     │
└────┬───────────────┘
     │
     ▼
┌────────────────────┐     ┌─────────────────┐
│  Delete:           │────▶│  Trigger:       │
│  - memories table  │     │  - Delete FTS5  │
│  - CASCADE tags    │     │  - Delete tris  │
└────────────────────┘     └─────────────────┘
```

## Query Processing Flow

### Phase 1 (LIKE)
```
User Query: "docker networking"
     │
     ▼
Parse Query: Extract terms, filters
     │
     ▼
Build SQL: LIKE '%docker%' AND LIKE '%networking%'
     │
     ▼
Apply Filters: Tags, dates, agent
     │
     ▼
Execute: Sequential scan through memories
     │
     ▼
Order: By created_at DESC
     │
     ▼
Limit: Take top N results
     │
     ▼
Format: Plain text / JSON / Markdown
```

### Phase 2 (FTS5)
```
User Query: "docker AND networking"
     │
     ▼
Parse Query: Identify operators, quotes, prefixes
     │
     ▼
Build FTS5 Query: "docker AND networking" (already valid)
     │
     ▼
FTS5 MATCH: Inverted index lookup
     │
     ▼
BM25 Ranking: Calculate relevance scores
     │
     ▼
Apply Filters: Tags, dates, agent (on results)
     │
     ▼
Order: By rank (BM25 score)
     │
     ▼
Limit: Take top N results
     │
     ▼
Format: With relevance scores
```

### Phase 3 (Fuzzy)
```
User Query: "dokcer networking"
     │
     ▼
Try FTS5: "dokcer AND networking"
     │
     ▼
Results: 0 (no exact match)
     │
     ▼
Trigger Fuzzy: Extract trigrams
     │
     ├─▶ "dokcer" → ["dok", "okc", "kce", "cer"]
     └─▶ "networking" → ["net", "etw", "two", ...]
     │
     ▼
Find Candidates: Query trigrams table
     │
     ▼
Calculate Similarity: Trigram overlap + edit distance
     │
     ├─▶ "docker" → similarity = 0.85 (good match)
     └─▶ "networking" → similarity = 1.0 (exact)
     │
     ▼
Filter: Threshold ≥ 0.7
     │
     ▼
Re-rank: Combined score (trigram + edit + recency)
     │
     ▼
Merge: With FTS5 results (dedup by ID)
     │
     ▼
Limit: Take top N results
     │
     ▼
Format: With relevance scores
```

## Indexing Strategy

### Phase 1 Indexes
```sql
-- Recency queries (ORDER BY created_at DESC)
CREATE INDEX idx_memories_created ON memories(created_at DESC);

-- Expiration filtering (WHERE expires_at > strftime('%s', 'now'))
CREATE INDEX idx_memories_expires ON memories(expires_at)
  WHERE expires_at IS NOT NULL;

-- Tag lookups (JOIN on tag_id)
CREATE INDEX idx_tags_name ON tags(name);

-- Tag filtering (JOIN memory_tags on memory_id)
CREATE INDEX idx_memory_tags_tag ON memory_tags(tag_id);
```

**Query plans:**
```sql
-- Search queries use the indexes:
EXPLAIN QUERY PLAN
SELECT * FROM memories WHERE created_at > ? ORDER BY created_at DESC;
-- Result: SEARCH memories USING INDEX idx_memories_created

EXPLAIN QUERY PLAN
SELECT * FROM memories WHERE expires_at > strftime('%s', 'now');
-- Result: SEARCH memories USING INDEX idx_memories_expires
```

### Phase 2 Indexes (+ FTS5)
```sql
-- FTS5 creates its inverted index automatically
CREATE VIRTUAL TABLE memories_fts USING fts5(content, ...);
-- Generates internal tables: memories_fts_data, memories_fts_idx, memories_fts_config
```

**FTS5 Index Structure:**
```
Term → Document Postings List

"docker" → [1, 5, 12, 34, 56, ...]
"compose" → [5, 12, 89, ...]
"networking" → [5, 34, 67, ...]

Query "docker AND compose" → intersection([1,5,12,34,56], [5,12,89]) = [5, 12]
```

### Phase 3 Indexes (+ Trigrams)
```sql
-- Trigram lookups (WHERE trigram IN (...))
CREATE INDEX idx_trigrams_trigram ON trigrams(trigram);

-- Cleanup on memory deletion (CASCADE via memory_id)
CREATE INDEX idx_trigrams_memory ON trigrams(memory_id);
```
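
The `trigrams` table itself is not spelled out above; a plausible DDL sketch consistent with the Phase 3 schema diagram, against which the two indexes are created:

```javascript
// Hypothetical DDL for the trigrams table from the Phase 3 diagram.
db.exec(`
  CREATE TABLE IF NOT EXISTS trigrams (
    trigram   TEXT NOT NULL CHECK (length(trigram) = 3),
    memory_id INTEGER NOT NULL REFERENCES memories(id) ON DELETE CASCADE,
    position  INTEGER NOT NULL
  );
`);
```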

**Trigram Index Structure:**
```
Trigram → Memory IDs

"doc" → [1, 5, 12, 34, ...]  (all memories with "doc")
"ock" → [1, 5, 12, 34, ...]  (all memories with "ock")
"cke" → [1, 5, 12, ...]      (all memories with "cke")

Query "docker" trigrams ["doc", "ock", "cke", "ker"]
→ Find intersection: memories with all 4 trigrams (or ≥ threshold)
```

## Performance Optimization

### Database Configuration
```sql
-- WAL mode for better concurrency
PRAGMA journal_mode = WAL;

-- Memory-mapped I/O for faster reads
PRAGMA mmap_size = 268435456;  -- 256MB

-- Larger cache for better performance
PRAGMA cache_size = -64000;  -- 64MB (negative = size in KiB)

-- Synchronous writes (balance between speed and durability)
PRAGMA synchronous = NORMAL;  -- Not FULL (too slow), not OFF (unsafe)

-- Auto-vacuum to prevent bloat
PRAGMA auto_vacuum = INCREMENTAL;
```
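
better-sqlite3 exposes these via `db.pragma()`; a sketch of applying them at startup:

```javascript
// Apply the pragmas at connection time (better-sqlite3 is synchronous).
function configureDatabase(db) {
  db.pragma('journal_mode = WAL');
  db.pragma('mmap_size = 268435456');   // 256MB memory-mapped I/O
  db.pragma('cache_size = -64000');     // 64MB page cache
  db.pragma('synchronous = NORMAL');
  // auto_vacuum only takes effect on a fresh database (or after VACUUM)
  db.pragma('auto_vacuum = INCREMENTAL');
}
```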

### Query Optimization
```javascript
// Use prepared statements (compiled once, executed many times)
const searchStmt = db.prepare(`
  SELECT * FROM memories
  WHERE LOWER(content) LIKE LOWER(?)
  ORDER BY created_at DESC
  LIMIT ?
`);

// Wrap bulk inserts in a single transaction
const insertMany = db.transaction((memories) => {
  for (const memory of memories) {
    insertStmt.run(memory);  // insertStmt: a prepared INSERT statement
  }
});
```

### Trigram Pruning
```javascript
// Prune common trigrams (low information value)
// E.g., "the", "and", "ing" appear in most memories
// Count distinct memories per trigram: a trigram can occur at several
// positions within one memory, so COUNT(*) would overcount
const pruneCommonTrigrams = db.prepare(`
  DELETE FROM trigrams
  WHERE trigram IN (
    SELECT trigram FROM trigrams
    GROUP BY trigram
    HAVING COUNT(DISTINCT memory_id) > (SELECT COUNT(*) * 0.5 FROM memories)
  )
`);

// Run after bulk imports
pruneCommonTrigrams.run();
```

### Result Caching
```javascript
// LRU cache for frequent queries
// (recent versions of lru-cache export the class as LRUCache)
const { LRUCache } = require('lru-cache');
const queryCache = new LRUCache({
  max: 100,           // Cache 100 queries
  ttl: 1000 * 60 * 5  // 5 minute TTL
});

function search(query, filters) {
  const cacheKey = JSON.stringify({ query, filters });

  if (queryCache.has(cacheKey)) {
    return queryCache.get(cacheKey);
  }

  const results = executeSearch(query, filters);
  queryCache.set(cacheKey, results);
  return results;
}
```

## Error Handling

### Database Errors
```javascript
async function runWithRetry(operation, maxRetries = 3) {
  try {
    // operation is a synchronous better-sqlite3 call,
    // e.g. () => db.prepare(sql).run(params)
    return operation();
  } catch (error) {
    if (error.code === 'SQLITE_BUSY' && maxRetries > 0) {
      // Retry after backoff
      await sleep(100);
      return runWithRetry(operation, maxRetries - 1);
    }

    if (error.code === 'SQLITE_CONSTRAINT') {
      // Validation error (content too long, duplicate tag, etc.)
      throw new ValidationError(error.message);
    }

    if (error.code === 'SQLITE_CORRUPT') {
      // Database corruption - suggest recovery
      throw new DatabaseCorruptError('Database corrupted, run: memory recover');
    }

    // Unknown error
    throw error;
  }
}
```

### Migration Errors
```javascript
async function migrate(targetVersion) {
  const currentVersion = getCurrentSchemaVersion();

  // Backup before migration
  await backupDatabase(`backup-v${currentVersion}.db`);

  try {
    db.exec('BEGIN TRANSACTION');

    // Run migrations
    for (let v = currentVersion + 1; v <= targetVersion; v++) {
      await runMigration(v);
    }

    db.exec('COMMIT');
    console.log(`Migrated to version ${targetVersion}`);
  } catch (error) {
    db.exec('ROLLBACK');
    console.error('Migration failed, rolling back');

    // Restore backup
    await restoreDatabase(`backup-v${currentVersion}.db`);
    throw error;
  }
}
```

## Security Considerations

### Input Validation
```javascript
// Prevent SQL injection (prepared statements)
const stmt = db.prepare('SELECT * FROM memories WHERE content LIKE ?');
stmt.all(`%${userInput}%`);  // Safe: userInput is parameterized

// Validate content length
if (content.length > 10000) {
  throw new ValidationError('Content exceeds 10KB limit');
}

// Sanitize tags (only alphanumeric, hyphens, underscores)
const sanitizeTag = (tag) => tag.replace(/[^a-z0-9\-_]/gi, '');
```

### Sensitive Data Protection
```javascript
// Warn if sensitive patterns detected
const sensitivePatterns = [
  /password\s*[:=]\s*\S+/i,
  /api[_-]?key\s*[:=]\s*\S+/i,
  /token\s*[:=]\s*\S+/i,
  /secret\s*[:=]\s*\S+/i
];

function checkSensitiveData(content) {
  for (const pattern of sensitivePatterns) {
    if (pattern.test(content)) {
      console.warn('⚠️  Warning: Potential sensitive data detected');
      console.warn('Consider storing credentials in a secure vault instead');
      return true;
    }
  }
  return false;
}
```

### File Permissions
```bash
# Database file should be user-readable only
chmod 600 ~/.config/opencode/memories.db

# Backup files should have the same permissions
chmod 600 ~/.config/opencode/memories-backup-*.db
```
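
The CLI can enforce this at creation time rather than relying on a manual chmod; a sketch:

```javascript
// Create (or open) the database, then restrict it to the owning user.
const fs = require('fs');
const Database = require('better-sqlite3');

function openDatabase(dbPath) {
  const db = new Database(dbPath);
  fs.chmodSync(dbPath, 0o600);  // rw-------; matches the chmod above
  return db;
}
```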

## Scalability Limits

### Phase 1 (LIKE)
- **Max memories:** ~500 (performance degrades beyond this)
- **Query latency:** O(n) - linear scan
- **Storage:** ~250KB for 500 memories

### Phase 2 (FTS5)
- **Max memories:** ~50K (comfortable), 100K+ (possible)
- **Query latency:** O(log n) - index lookup
- **Storage:** +30% for the FTS5 index (~325KB for 500 memories)

### Phase 3 (Fuzzy)
- **Max memories:** 100K+ (with trigram pruning)
- **Query latency:** O(log n) + O(m) where m = fuzzy candidates
- **Storage:** +200% for trigrams (~750KB for 500 memories)
  - Mitigated by pruning common trigrams

### Migration Triggers

**Phase 1 → Phase 2** (see the detection sketch after this list):
- Dataset > 500 memories
- Query latency > 500ms
- Manual user request

**Phase 2 → Phase 3:**
- User reports needing fuzzy search
- High typo rates in queries
- Manual user request
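
A sketch of how the CLI might detect the Phase 1 → Phase 2 thresholds; the wiring and helper names are assumptions:

```javascript
// Hypothetical post-search check against the Phase 1 → Phase 2 thresholds.
function shouldSuggestFts5(db, lastQueryMs) {
  const { n } = db.prepare('SELECT COUNT(*) AS n FROM memories').get();
  return n > 500 || lastQueryMs > 500;
}

const start = Date.now();
const rows = searchStmt.all('%docker%', 10); // searchStmt from "Query Optimization"
if (shouldSuggestFts5(db, Date.now() - start)) {
  console.warn('Thresholds crossed - consider migrating to the Phase 2 (FTS5) schema');
}
```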

## Future Enhancements

### Vector Embeddings (Phase 4?)
- Semantic search ("docker" → "containerization")
- Requires embedding model (~100MB)
- SQLite-VSS extension
- Hybrid: BM25 (lexical) + Cosine similarity (semantic)

### Automatic Summarization
- LLM-generated summaries for long memories
- Reduces token usage in search results
- Trade-off: API dependency

### Memory Versioning
- Track edits to memories
- Show history
- Revert to previous version

### Conflict Detection
- Identify contradictory memories
- Suggest consolidation
- Flag for review

### Collaborative Features
- Share memories between agents
- Team-wide memory pool
- Privacy controls

---

**Document Version:** 1.0  
**Last Updated:** 2025-10-29  
**Status:** Planning Complete, Implementation Pending
 |