# LLMemory Architecture

## System Overview

LLMemory is a three-layer system:

1. **CLI Layer** - User/agent interface (Commander.js)
2. **Search Layer** - Query processing and ranking (LIKE → FTS5 → Fuzzy)
3. **Storage Layer** - Persistent data (SQLite)

```
┌─────────────────────────────────────┐
│              CLI Layer              │
│  (memory store/search/list/prune)   │
└──────────────┬──────────────────────┘
               │
┌──────────────▼──────────────────────┐
│            Search Layer             │
│  Phase 1: LIKE search               │
│  Phase 2: FTS5 + BM25 ranking       │
│  Phase 3: + Trigram fuzzy matching  │
└──────────────┬──────────────────────┘
               │
┌──────────────▼──────────────────────┐
│            Storage Layer            │
│           SQLite Database           │
│  - memories (content, metadata)     │
│  - tags (normalized)                │
│  - memory_tags (many-to-many)       │
│  - memories_fts (FTS5 virtual)      │
│  - trigrams (fuzzy index)           │
└─────────────────────────────────────┘
```

## Data Model

### Phase 1 Schema (MVP)

```
┌─────────────────┐
│    memories     │
├─────────────────┤
│ id              │ PK
│ content         │ TEXT (max 10KB)
│ created_at      │ INTEGER (Unix timestamp)
│ entered_by      │ TEXT (agent name)
│ expires_at      │ INTEGER (nullable)
└─────────────────┘
         │
         │ 1:N
         ▼
┌─────────────────┐       ┌─────────────────┐
│   memory_tags   │  N:M  │      tags       │
├─────────────────┤       ├─────────────────┤
│ memory_id       │ FK ───│ id              │ PK
│ tag_id          │ FK    │ name            │ TEXT (unique, NOCASE)
└─────────────────┘       │ created_at      │ INTEGER
                          └─────────────────┘
```

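This diagram translates into a small amount of DDL. A minimal sketch, assuming `better-sqlite3` as the driver; the columns come from the diagram above, while the `CHECK` constraint and foreign-key details are illustrative assumptions:

```javascript
const Database = require('better-sqlite3');
const db = new Database('memories.db');

// Phase 1 tables; columns mirror the diagram above. The CHECK constraint
// and ON DELETE CASCADE clauses are assumptions, not confirmed choices.
db.exec(`
  CREATE TABLE IF NOT EXISTS memories (
    id         INTEGER PRIMARY KEY,
    content    TEXT NOT NULL CHECK (length(content) <= 10000),
    created_at INTEGER NOT NULL,
    entered_by TEXT,
    expires_at INTEGER
  );

  CREATE TABLE IF NOT EXISTS tags (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE COLLATE NOCASE,
    created_at INTEGER NOT NULL
  );

  CREATE TABLE IF NOT EXISTS memory_tags (
    memory_id INTEGER NOT NULL REFERENCES memories(id) ON DELETE CASCADE,
    tag_id    INTEGER NOT NULL REFERENCES tags(id) ON DELETE CASCADE,
    PRIMARY KEY (memory_id, tag_id)
  );
`);
```
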
### Phase 2 Schema (+ FTS5)

Adds a virtual table for full-text search:

```
┌─────────────────────┐
│    memories_fts     │  Virtual Table (FTS5)
├─────────────────────┤
│ rowid → memories.id │
│ content (indexed)   │
└─────────────────────┘
          │
          │ Synced via triggers
          ▼
┌─────────────────┐
│    memories     │
└─────────────────┘
```

**Triggers:**

- `memories_ai`: INSERT into memories → INSERT into memories_fts
- `memories_au`: UPDATE memories → UPDATE memories_fts
- `memories_ad`: DELETE memories → DELETE from memories_fts

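A sketch of these triggers, assuming `memories_fts` is declared as an FTS5 external-content table (`content='memories'`, `content_rowid='id'`); a plain or contentless FTS5 table would need slightly different trigger bodies:

```javascript
// Keep memories_fts in sync with memories, following the documented
// FTS5 external-content idiom (the 'delete' command removes old rows).
db.exec(`
  CREATE TRIGGER IF NOT EXISTS memories_ai AFTER INSERT ON memories BEGIN
    INSERT INTO memories_fts(rowid, content) VALUES (new.id, new.content);
  END;

  CREATE TRIGGER IF NOT EXISTS memories_au AFTER UPDATE ON memories BEGIN
    INSERT INTO memories_fts(memories_fts, rowid, content)
      VALUES ('delete', old.id, old.content);
    INSERT INTO memories_fts(rowid, content) VALUES (new.id, new.content);
  END;

  CREATE TRIGGER IF NOT EXISTS memories_ad AFTER DELETE ON memories BEGIN
    INSERT INTO memories_fts(memories_fts, rowid, content)
      VALUES ('delete', old.id, old.content);
  END;
`);
```
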
### Phase 3 Schema (+ Trigrams)

Adds a trigram index for fuzzy matching:

```
┌─────────────────┐
│    trigrams     │
├─────────────────┤
│ trigram         │ TEXT (3 chars)
│ memory_id       │ FK → memories.id
│ position        │ INTEGER (for proximity)
└─────────────────┘
         │
         │ Generated on insert/update
         ▼
┌─────────────────┐
│    memories     │
└─────────────────┘
```

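In DDL form, a minimal sketch; the columns come from the diagram, while the composite primary key and `WITHOUT ROWID` are space-saving assumptions:

```javascript
// Trigram posting table; one row per (trigram, memory, position).
db.exec(`
  CREATE TABLE IF NOT EXISTS trigrams (
    trigram   TEXT NOT NULL,
    memory_id INTEGER NOT NULL REFERENCES memories(id) ON DELETE CASCADE,
    position  INTEGER NOT NULL,
    PRIMARY KEY (trigram, memory_id, position)
  ) WITHOUT ROWID;
`);
```
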
## Search Algorithm Evolution

### Phase 1: LIKE Search

**Algorithm:**

```python
function search_like(query, filters):
    # Case-insensitive wildcard matching
    sql = "SELECT * FROM memories WHERE LOWER(content) LIKE LOWER('%' || ? || '%')"

    # Apply filters
    if filters.tags:
        sql += " AND id IN (SELECT memory_id FROM memory_tags WHERE tag_id IN (...))"

    if filters.after:
        sql += " AND created_at >= ?"

    # Exclude expired
    sql += " AND (expires_at IS NULL OR expires_at > now())"

    # Order by recency
    sql += " ORDER BY created_at DESC LIMIT ?"

    return execute(sql, params)
```

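For reference, a concrete sketch of this phase in JavaScript, assuming `better-sqlite3`; escaping LIKE wildcards in user input is an extra precaution not spelled out in the pseudocode:

```javascript
// Phase 1 LIKE search, minimal version without tag/date filters.
function searchLike(db, query, limit = 10) {
  // Escape %, _ and \ so user input is matched literally.
  const escaped = query.replace(/[%_\\]/g, (c) => '\\' + c);
  return db.prepare(`
    SELECT * FROM memories
    WHERE LOWER(content) LIKE LOWER('%' || ? || '%') ESCAPE '\\'
      AND (expires_at IS NULL OR expires_at > strftime('%s', 'now'))
    ORDER BY created_at DESC
    LIMIT ?
  `).all(escaped, limit);
}
```
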
**Strengths:**

- Simple, fast for small datasets
- No dependencies
- Predictable behavior

**Weaknesses:**

- No relevance ranking
- Slow for large datasets (full table scan)
- No fuzzy matching
- No phrase queries or boolean logic

**Performance:** O(n) where n = number of memories

**Target:** <50ms for <500 memories

---

### Phase 2: FTS5 Search

**Algorithm:**

```python
function search_fts5(query, filters):
    # Build FTS5 query
    fts_query = build_fts5_query(query)  # Handles AND/OR/NOT, quotes, prefixes

    # FTS5 MATCH with BM25 ranking
    sql = """
        SELECT m.*, mf.rank AS relevance
        FROM memories_fts mf
        JOIN memories m ON m.id = mf.rowid
        WHERE memories_fts MATCH ?
          AND (m.expires_at IS NULL OR m.expires_at > now())
    """

    # Apply filters (same as Phase 1)
    # ...

    # Order by FTS5 rank (BM25; rank is negative, so ascending puts best first)
    sql += " ORDER BY mf.rank LIMIT ?"

    return execute(sql, params)


function build_fts5_query(query):
    # Transform grep-like input to FTS5 syntax:
    #   "docker compose"   → "docker AND compose"
    #   "docker OR podman" → "docker OR podman"  (unchanged)
    #   '"exact phrase"'   → '"exact phrase"'    (unchanged)
    #   "docker*"          → "docker*"           (unchanged)

    if has_operators(query):
        return query

    # Implicit AND
    terms = query.split()
    return " AND ".join(terms)
```

**FTS5 Tokenization:**

- **Tokenizer:** `porter unicode61 remove_diacritics 2`
- **porter:** Stemming (running → run, databases → database)
- **unicode61:** Unicode-aware tokenization
- **remove_diacritics 2:** Normalize accented characters (café → cafe)

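A sketch of how this tokenizer is declared; the option string matches the spec above, and combining it with the external-content options from Phase 2 is an assumption:

```javascript
// FTS5 table with porter stemming + unicode61 + diacritic removal.
db.exec(`
  CREATE VIRTUAL TABLE IF NOT EXISTS memories_fts USING fts5(
    content,
    content='memories',
    content_rowid='id',
    tokenize='porter unicode61 remove_diacritics 2'
  );
`);
```
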
**BM25 Ranking:**

```
score = Σ(IDF(term) * (f(term) * (k1 + 1)) / (f(term) + k1 * (1 - b + b * |D| / avgdl)))

Where:
- IDF(term) = Inverse Document Frequency (rarer terms score higher)
- f(term)   = Term frequency in document
- |D|       = Document length
- avgdl     = Average document length
- k1        = 1.2 (term frequency saturation)
- b         = 0.75 (length normalization)
```

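To make the formula concrete, a toy single-term calculation (all numbers invented for illustration):

```javascript
// BM25 contribution of one term, using the constants above.
function bm25Term(idf, tf, docLen, avgDocLen, k1 = 1.2, b = 0.75) {
  return idf * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * (docLen / avgDocLen)));
}

// Example: IDF = 2.0, term appears twice in a document of average length:
// 2.0 * (2 * 2.2) / (2 + 1.2 * 1) = 8.8 / 3.2 = 2.75
console.log(bm25Term(2.0, 2, 100, 100)); // → 2.75
```
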
**Strengths:**

- Fast search with inverted index
- Relevance ranking (BM25)
- Boolean operators, phrase queries, prefix matching
- Scales to 100K+ documents

**Weaknesses:**

- No fuzzy matching (typo tolerance)
- FTS5 index overhead (~30% storage)
- More complex setup (triggers needed)

**Performance:** O(log n) for index lookup

**Target:** <100ms for 10K memories

---

### Phase 3: Fuzzy Search

**Algorithm:**

```python
function search_fuzzy(query, filters):
    # Step 1: Try FTS5 exact match
    results = search_fts5(query, filters)

    # Step 2: If too few results, try fuzzy (unless explicitly disabled)
    if len(results) < 5 and filters.fuzzy != False:
        fuzzy_results = search_trigram(query, filters)
        results = merge_dedup(results, fuzzy_results)

    # Step 3: Re-rank by combined score
    for result in results:
        result.score = calculate_combined_score(result, query)

    results.sort(key=lambda r: r.score, reverse=True)
    return results[:filters.limit]


function search_trigram(query, threshold=0.7, limit=10):
    # Extract query trigrams
    query_trigrams = extract_trigrams(query)  # ["doc", "ock", "cke", "ker"]

    # Find candidates by trigram overlap
    sql = """
        SELECT m.id, m.content, COUNT(DISTINCT tr.trigram) AS matches
        FROM memories m
        JOIN trigrams tr ON tr.memory_id = m.id
        WHERE tr.trigram IN (?, ?, ?, ...)
          AND (m.expires_at IS NULL OR m.expires_at > now())
        GROUP BY m.id
        HAVING matches >= ?
        ORDER BY matches DESC
        LIMIT ?
    """

    min_matches = ceil(len(query_trigrams) * threshold)
    candidates = execute(sql, query_trigrams, min_matches, limit * 2)

    # Calculate edit distance and combined score
    scored = []
    for candidate in candidates:
        edit_dist = levenshtein(query, candidate.content[:len(query) * 3])
        trigram_sim = candidate.matches / len(query_trigrams)
        normalized_edit = 1 - (edit_dist / max(len(query), len(candidate.content)))

        score = 0.6 * trigram_sim + 0.4 * normalized_edit

        if score >= threshold:
            scored.append((candidate, score))

    scored.sort(key=lambda x: x[1], reverse=True)
    return [c for c, s in scored[:limit]]


function extract_trigrams(text):
    # Normalize: lowercase, strip punctuation, collapse whitespace
    normalized = re.sub(r'\s+', ' ', re.sub(r'[^\w\s]', ' ', text.lower())).strip()

    # Add padding for boundary matching
    padded = " " + normalized + " "

    # Sliding window of 3 characters
    trigrams = []
    for i in range(len(padded) - 2):
        trigram = padded[i:i+3]
        if len(trigram.strip()) == 3:  # Skip trigrams containing whitespace
            trigrams.append(trigram)

    return unique(trigrams)


function levenshtein(a, b):
    # Wagner-Fischer algorithm with single-row optimization
    if len(a) == 0: return len(b)
    if len(b) == 0: return len(a)

    prev_row = list(range(len(b) + 1))

    for i in range(len(a)):
        cur_row = [i + 1]
        for j in range(len(b)):
            cost = 0 if a[i] == b[j] else 1
            cur_row.append(min(
                cur_row[j] + 1,       # deletion
                prev_row[j + 1] + 1,  # insertion
                prev_row[j] + cost    # substitution
            ))
        prev_row = cur_row

    return prev_row[len(b)]


function calculate_combined_score(result, query):
    # BM25 from FTS5 (if available). Note: FTS5's rank is negative
    # (more negative = better), so it should be normalized to a
    # positive 0..1 score before mixing with the other signals.
    bm25_score = result.fts_rank if result.has_fts_rank else 0

    # Trigram similarity
    trigram_score = result.trigram_matches / len(extract_trigrams(query))

    # Edit distance (normalized)
    edit_dist = levenshtein(query, result.content[:len(query) * 3])
    edit_score = 1 - (edit_dist / max(len(query), len(result.content)))

    # Recency boost (linear decay to zero over 90 days)
    days_ago = (now() - result.created_at) / 86400
    recency_score = max(0, 1 - (days_ago / 90))

    # Weighted combination
    score = (0.4 * bm25_score +
             0.3 * trigram_score +
             0.2 * edit_score +
             0.1 * recency_score)

    return score
```

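`merge_dedup` is referenced but not spelled out above. One plausible reading, keeping FTS5 hits ahead of fuzzy-only hits (a sketch, in JavaScript like the rest of the implementation snippets):

```javascript
// Merge FTS5 and fuzzy result lists, dropping duplicates by memory id.
// Exact (FTS5) hits are listed first so they survive deduplication.
function mergeDedup(ftsResults, fuzzyResults) {
  const seen = new Set();
  const merged = [];
  for (const result of [...ftsResults, ...fuzzyResults]) {
    if (!seen.has(result.id)) {
      seen.add(result.id);
      merged.push(result);
    }
  }
  return merged;
}
```
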
**Trigram Similarity:**

The overlap is normalized by the query's trigram count (a containment measure, not a true Jaccard index, which would divide by the union of both sets):

```
similarity = |trigrams(query) ∩ trigrams(document)| / |trigrams(query)|

Example:
query    = "docker" → trigrams: ["doc", "ock", "cke", "ker"]
document = "dcoker" → trigrams: ["dco", "cok", "oke", "ker"]

intersection = ["ker"] → count = 1
similarity = 1 / 4 = 0.25 (below threshold, even though the edit distance is only 2)

Trigram overlap alone punishes transpositions, so the score also blends in
edit distance normalized by length:
edit_distance("docker", "dcoker") = 2
normalized = 1 - (2 / 6) ≈ 0.67 (clears a 0.6 threshold that raw trigram overlap missed)
```

**Strengths:**

- Handles typos (edit distance ≤2)
- Partial matches ("docker" finds "dockerization")
- Cascading strategy (fast exact, fallback to fuzzy)
- Configurable threshold

**Weaknesses:**

- Trigram table is large (~3x content size)
- Slower than FTS5 alone
- Tuning the threshold requires experimentation

**Performance:** O(log n) + O(m) where m = trigram candidates

**Target:** <200ms for 10K memories with fuzzy

---

## Memory Lifecycle

```
┌──────────┐
│  Store   │
└────┬─────┘
     │
     ▼
┌────────────────────┐
│ Validate:          │
│ - Length (<10KB)   │
│ - Tags (parse)     │
│ - Expiration       │
└────┬───────────────┘
     │
     ▼
┌────────────────────┐     ┌─────────────────┐
│ Insert:            │────▶│ Trigger:        │
│ - memories table   │     │ - Insert FTS5   │
│ - Link tags        │     │ - Gen trigrams  │
└────┬───────────────┘     └─────────────────┘
     │
     ▼
┌────────────────────┐
│ Searchable         │
└────────────────────┘
     │
     │ (time passes)
     ▼
┌────────────────────┐
│ Expired?           │───No──▶ Continue
└────┬───────────────┘
     │ Yes
     ▼
┌────────────────────┐
│ Prune Command      │
│ (manual/auto)      │
└────┬───────────────┘
     │
     ▼
┌────────────────────┐     ┌─────────────────┐
│ Delete:            │────▶│ Trigger:        │
│ - memories table   │     │ - Delete FTS5   │
│ - CASCADE tags     │     │ - Delete tris   │
└────────────────────┘     └─────────────────┘
```

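The validation step at the top of the lifecycle might look like this (a sketch; the error messages and tag format rule are assumptions consistent with the Security Considerations section below):

```javascript
// Validate a memory before insert: length cap, tag format, expiration.
function validateMemory({ content, tags = [], expiresAt = null }) {
  if (!content || content.length > 10000) {
    throw new Error('Content must be 1-10000 characters');
  }
  for (const tag of tags) {
    if (!/^[a-z0-9_-]+$/i.test(tag)) {
      throw new Error(`Invalid tag: ${tag}`);
    }
  }
  if (expiresAt !== null && expiresAt <= Math.floor(Date.now() / 1000)) {
    throw new Error('Expiration must be in the future');
  }
}
```
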
## Query Processing Flow

### Phase 1 (LIKE)

```
User Query: "docker networking"
      │
      ▼
Parse Query: Extract terms, filters
      │
      ▼
Build SQL: LIKE '%docker%' AND LIKE '%networking%'
      │
      ▼
Apply Filters: Tags, dates, agent
      │
      ▼
Execute: Sequential scan through memories
      │
      ▼
Order: By created_at DESC
      │
      ▼
Limit: Take top N results
      │
      ▼
Format: Plain text / JSON / Markdown
```

### Phase 2 (FTS5)

```
User Query: "docker AND networking"
      │
      ▼
Parse Query: Identify operators, quotes, prefixes
      │
      ▼
Build FTS5 Query: "docker AND networking" (already valid)
      │
      ▼
FTS5 MATCH: Inverted index lookup
      │
      ▼
BM25 Ranking: Calculate relevance scores
      │
      ▼
Apply Filters: Tags, dates, agent (on results)
      │
      ▼
Order: By rank (BM25 score)
      │
      ▼
Limit: Take top N results
      │
      ▼
Format: With relevance scores
```

### Phase 3 (Fuzzy)

```
User Query: "dokcer networking"
      │
      ▼
Try FTS5: "dokcer AND networking"
      │
      ▼
Results: 0 (no exact match)
      │
      ▼
Trigger Fuzzy: Extract trigrams
      │
      ├─▶ "dokcer"     → ["dok", "okc", "kce", "cer"]
      └─▶ "networking" → ["net", "etw", "two", ...]
      │
      ▼
Find Candidates: Query trigrams table
      │
      ▼
Calculate Similarity: Trigram overlap + edit distance
      │
      ├─▶ "docker"     → similarity = 0.85 (good match)
      └─▶ "networking" → similarity = 1.0 (exact)
      │
      ▼
Filter: Threshold ≥ 0.7
      │
      ▼
Re-rank: Combined score (trigram + edit + recency)
      │
      ▼
Merge: With FTS5 results (dedup by ID)
      │
      ▼
Limit: Take top N results
      │
      ▼
Format: With relevance scores
```

## Indexing Strategy

### Phase 1 Indexes

```sql
-- Recency queries (ORDER BY created_at DESC)
CREATE INDEX idx_memories_created ON memories(created_at DESC);

-- Expiration filtering (WHERE expires_at > now())
CREATE INDEX idx_memories_expires ON memories(expires_at)
  WHERE expires_at IS NOT NULL;

-- Tag name lookups (WHERE name = ?)
CREATE INDEX idx_tags_name ON tags(name);

-- Tag filtering (JOIN memory_tags ON tag_id)
CREATE INDEX idx_memory_tags_tag ON memory_tags(tag_id);
```

**Query plans:**

```sql
-- Search queries use the indexes:
EXPLAIN QUERY PLAN
SELECT * FROM memories WHERE created_at > ? ORDER BY created_at DESC;
-- Result: SEARCH memories USING INDEX idx_memories_created

EXPLAIN QUERY PLAN
SELECT * FROM memories WHERE expires_at > strftime('%s', 'now');
-- Result: SEARCH memories USING INDEX idx_memories_expires
```

### Phase 2 Indexes (+ FTS5)

```sql
-- FTS5 creates its inverted index automatically
CREATE VIRTUAL TABLE memories_fts USING fts5(content, ...);
-- Generates internal tables: memories_fts_data, memories_fts_idx, memories_fts_config
```

**FTS5 Index Structure:**

```
Term          → Document Postings List

"docker"      → [1, 5, 12, 34, 56, ...]
"compose"     → [5, 12, 89, ...]
"networking"  → [5, 34, 67, ...]

Query "docker AND compose" → intersection([1,5,12,34,56], [5,12,89]) = [5, 12]
```

### Phase 3 Indexes (+ Trigrams)

```sql
-- Trigram lookups (WHERE trigram IN (...))
CREATE INDEX idx_trigrams_trigram ON trigrams(trigram);

-- Cleanup on memory deletion (CASCADE via memory_id)
CREATE INDEX idx_trigrams_memory ON trigrams(memory_id);
```

**Trigram Index Structure:**

```
Trigram → Memory IDs

"doc" → [1, 5, 12, 34, ...]  (all memories with "doc")
"ock" → [1, 5, 12, 34, ...]  (all memories with "ock")
"cke" → [1, 5, 12, ...]      (all memories with "cke")

Query "docker" → trigrams ["doc", "ock", "cke", "ker"]
→ Find intersection: memories with all 4 trigrams (or ≥ threshold)
```

## Performance Optimization

### Database Configuration

```sql
-- WAL mode for better concurrency
PRAGMA journal_mode = WAL;

-- Memory-mapped I/O for faster reads
PRAGMA mmap_size = 268435456;  -- 256MB

-- Larger cache for better performance
PRAGMA cache_size = -64000;  -- 64MB (negative = KB)

-- Synchronous writes (balance between speed and durability)
PRAGMA synchronous = NORMAL;  -- Not FULL (too slow), not OFF (unsafe)

-- Auto-vacuum to prevent bloat
PRAGMA auto_vacuum = INCREMENTAL;
```

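Applied at startup, this might look like the following (assuming better-sqlite3, whose `pragma()` method executes PRAGMA statements):

```javascript
// Apply the tuning pragmas once, right after opening the database.
function openTunedDatabase(path) {
  const db = new (require('better-sqlite3'))(path);
  db.pragma('journal_mode = WAL');
  db.pragma('mmap_size = 268435456');
  db.pragma('cache_size = -64000');
  db.pragma('synchronous = NORMAL');
  db.pragma('auto_vacuum = INCREMENTAL');
  return db;
}
```
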
### Query Optimization

```javascript
// Use prepared statements (compiled once, executed many times)
const searchStmt = db.prepare(`
  SELECT * FROM memories
  WHERE LOWER(content) LIKE LOWER(?)
  ORDER BY created_at DESC
  LIMIT ?
`);

// Transaction for bulk inserts
const insertMany = db.transaction((memories) => {
  for (const memory of memories) {
    insertStmt.run(memory);
  }
});
```

### Trigram Pruning

```javascript
// Prune common trigrams (low information value);
// e.g., "the", "and", "ing" appear in most memories.
// COUNT(DISTINCT memory_id) counts memories, not trigram rows, so
// repeated occurrences within one memory don't inflate the count.
const pruneCommonTrigrams = db.prepare(`
  DELETE FROM trigrams
  WHERE trigram IN (
    SELECT trigram FROM trigrams
    GROUP BY trigram
    HAVING COUNT(DISTINCT memory_id) > (SELECT COUNT(*) * 0.5 FROM memories)
  )
`);

// Run after bulk imports
pruneCommonTrigrams.run();
```

### Result Caching

```javascript
// LRU cache for frequent queries
const LRU = require('lru-cache');
const queryCache = new LRU({
  max: 100,            // Cache 100 queries
  ttl: 1000 * 60 * 5   // 5 minute TTL
});

function search(query, filters) {
  const cacheKey = JSON.stringify({ query, filters });

  if (queryCache.has(cacheKey)) {
    return queryCache.get(cacheKey);
  }

  const results = executeSearch(query, filters);
  queryCache.set(cacheKey, results);
  return results;
}
```

## Error Handling

### Database Errors

```javascript
try {
  db.prepare(sql).run(params);
} catch (error) {
  if (error.code === 'SQLITE_BUSY') {
    // Retry after backoff
    await sleep(100);
    return retry(operation, maxRetries - 1);
  }

  if (error.code === 'SQLITE_CONSTRAINT') {
    // Validation error (content too long, duplicate tag, etc.)
    throw new ValidationError(error.message);
  }

  if (error.code === 'SQLITE_CORRUPT') {
    // Database corruption - suggest recovery
    throw new DatabaseCorruptError('Database corrupted, run: memory recover');
  }

  // Unknown error
  throw error;
}
```

### Migration Errors

```javascript
async function migrate(targetVersion) {
  const currentVersion = getCurrentSchemaVersion();

  // Backup before migration
  await backupDatabase(`backup-v${currentVersion}.db`);

  try {
    db.exec('BEGIN TRANSACTION');

    // Run migrations
    for (let v = currentVersion + 1; v <= targetVersion; v++) {
      await runMigration(v);
    }

    db.exec('COMMIT');
    console.log(`Migrated to version ${targetVersion}`);
  } catch (error) {
    db.exec('ROLLBACK');
    console.error('Migration failed, rolling back');

    // Restore backup
    await restoreDatabase(`backup-v${currentVersion}.db`);
    throw error;
  }
}
```

## Security Considerations

### Input Validation

```javascript
// Prevent SQL injection (prepared statements)
const stmt = db.prepare('SELECT * FROM memories WHERE content LIKE ?');
stmt.all(`%${userInput}%`); // Safe: userInput is parameterized

// Validate content length
if (content.length > 10000) {
  throw new ValidationError('Content exceeds 10KB limit');
}

// Sanitize tags (only alphanumeric, hyphens, underscores)
const sanitizeTag = (tag) => tag.replace(/[^a-z0-9\-_]/gi, '');
```

### Sensitive Data Protection

```javascript
// Warn if sensitive patterns detected
const sensitivePatterns = [
  /password\s*[:=]\s*\S+/i,
  /api[_-]?key\s*[:=]\s*\S+/i,
  /token\s*[:=]\s*\S+/i,
  /secret\s*[:=]\s*\S+/i
];

function checkSensitiveData(content) {
  for (const pattern of sensitivePatterns) {
    if (pattern.test(content)) {
      console.warn('⚠️  Warning: Potential sensitive data detected');
      console.warn('Consider storing credentials in a secure vault instead');
      return true;
    }
  }
  return false;
}
```

### File Permissions

```bash
# Database file should be user-readable only
chmod 600 ~/.config/opencode/memories.db

# Backup files should have the same permissions
chmod 600 ~/.config/opencode/memories-backup-*.db
```

## Scalability Limits

### Phase 1 (LIKE)

- **Max memories:** ~500 (performance degrades beyond this)
- **Query latency:** O(n) - linear scan
- **Storage:** ~250KB for 500 memories

### Phase 2 (FTS5)

- **Max memories:** ~50K (comfortable), 100K+ (possible)
- **Query latency:** O(log n) - index lookup
- **Storage:** +30% for the FTS5 index (~325KB for 500 memories)

### Phase 3 (Fuzzy)

- **Max memories:** 100K+ (with trigram pruning)
- **Query latency:** O(log n) + O(m) where m = fuzzy candidates
- **Storage:** +200% for trigrams (~750KB for 500 memories)
  - Mitigated by pruning common trigrams

### Migration Triggers

**Phase 1 → Phase 2:**

- Dataset > 500 memories
- Query latency > 500ms
- Manual user request

**Phase 2 → Phase 3:**

- User reports needing fuzzy search
- High typo rates in queries
- Manual user request

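A sketch of how the first set of triggers could be checked automatically (thresholds from the list above; the timing approach and the probe query are illustrative assumptions):

```javascript
// Suggest migrating to Phase 2 when the dataset size or latency
// thresholds listed above are exceeded. Hypothetical helper.
function shouldMigrateToFts5(db) {
  const { n } = db.prepare('SELECT COUNT(*) AS n FROM memories').get();

  const start = process.hrtime.bigint();
  db.prepare("SELECT * FROM memories WHERE content LIKE '%docker%' LIMIT 10").all();
  const latencyMs = Number(process.hrtime.bigint() - start) / 1e6;

  return n > 500 || latencyMs > 500;
}
```
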
## Future Enhancements

### Vector Embeddings (Phase 4?)

- Semantic search ("docker" → "containerization")
- Requires an embedding model (~100MB)
- SQLite-VSS extension
- Hybrid: BM25 (lexical) + cosine similarity (semantic)

### Automatic Summarization

- LLM-generated summaries for long memories
- Reduces token usage in search results
- Trade-off: API dependency

### Memory Versioning

- Track edits to memories
- Show history
- Revert to previous versions

### Conflict Detection

- Identify contradictory memories
- Suggest consolidation
- Flag for review

### Collaborative Features

- Share memories between agents
- Team-wide memory pool
- Privacy controls

---

**Document Version:** 1.0

**Last Updated:** 2025-10-29

**Status:** Planning Complete, Implementation Pending
|