
LLMemory Implementation Plan

Current Status: Phase 0 - Planning Complete

This document tracks implementation progress and provides step-by-step guidance for building LLMemory.

Phase 1: MVP

Goal: Working CLI tool with basic search in 2-3 days
Status: Not Started
Trigger to Complete: All checkpoints passed, can store/search memories

Step 1.1: Project Setup

Effort: 30 minutes
Status: Not Started

cd llmemory
npm init -y
npm install better-sqlite3 commander chalk date-fns
npm install -D vitest typescript @types/node @types/better-sqlite3

Deliverables:

  • package.json configured with dependencies
  • TypeScript configured (optional but recommended)
  • Git initialized with .gitignore
  • bin/memory executable created

Checkpoint: Run npm list - all dependencies installed


Step 1.2: Database Layer - Schema & Connection

Effort: 2 hours
Status: Not Started

Files to create:

  • src/db/connection.js - Database connection and initialization
  • src/db/schema.js - Phase 1 schema (memories, tags, memory_tags)
  • src/db/queries.js - Prepared statements

Schema (Phase 1):

CREATE TABLE memories (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  content TEXT NOT NULL CHECK(length(content) <= 10000),
  created_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now')),
  entered_by TEXT,
  expires_at INTEGER,
  CHECK(expires_at IS NULL OR expires_at > created_at)
);

CREATE TABLE tags (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  name TEXT NOT NULL UNIQUE COLLATE NOCASE,
  created_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now'))
);

CREATE TABLE memory_tags (
  memory_id INTEGER NOT NULL,
  tag_id INTEGER NOT NULL,
  PRIMARY KEY (memory_id, tag_id),
  FOREIGN KEY (memory_id) REFERENCES memories(id) ON DELETE CASCADE,
  FOREIGN KEY (tag_id) REFERENCES tags(id) ON DELETE CASCADE
);

CREATE TABLE metadata (
  key TEXT PRIMARY KEY,
  value TEXT NOT NULL
);

CREATE INDEX idx_memories_created ON memories(created_at DESC);
CREATE INDEX idx_memories_expires ON memories(expires_at) WHERE expires_at IS NOT NULL;
CREATE INDEX idx_tags_name ON tags(name);
CREATE INDEX idx_memory_tags_tag ON memory_tags(tag_id);

Implementation checklist:

  • Database connection with WAL mode enabled
  • Schema creation on first run
  • Metadata table initialized (schema_version: 1)
  • Prepared statements for common operations
  • Transaction helpers
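
A minimal sketch of the connection module satisfying this checklist; function names are illustrative, and SCHEMA_SQL is assumed to be exported from src/db/schema.js (the DDL above, with IF NOT EXISTS added so initialization is idempotent):

// src/db/connection.js — sketch
import Database from 'better-sqlite3';
import { SCHEMA_SQL } from './schema.js'; // assumed export of the DDL above

export function openDatabase(path = ':memory:') {
  const db = new Database(path);
  db.pragma('journal_mode = WAL'); // checklist: WAL mode
  db.pragma('foreign_keys = ON');  // required for ON DELETE CASCADE to fire
  initSchema(db);
  return db;
}

export function initSchema(db) {
  db.exec(SCHEMA_SQL);
  // First run only: record schema_version = 1
  db.prepare(
    "INSERT OR IGNORE INTO metadata (key, value) VALUES ('schema_version', '1')"
  ).run();
}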

Checkpoint: Run test insertion and query - works without errors


Step 1.3: Core Command - Store

Effort: 2 hours
Status: Not Started

TDD Workflow:

  1. Write test first (see test structure below)
  2. Run test - watch it fail
  3. Implement feature - make test pass
  4. Refine - improve based on test output

Files to create:

  • test/integration.test.js (TEST FIRST)
  • src/commands/store.js
  • src/utils/validation.js
  • src/utils/tags.js

Test First (write this before implementation):

// test/integration.test.js
import { describe, test, expect, beforeEach } from 'vitest';
import Database from 'better-sqlite3';
import { storeMemory } from '../src/commands/store.js';
import { initSchema } from '../src/db/connection.js'; // from Step 1.2

describe('Store Command', () => {
  let db;
  
  beforeEach(() => {
    db = new Database(':memory:');
    // Init schema
    initSchema(db);
  });
  
  test('stores memory with tags', () => {
    const result = storeMemory(db, {
      content: 'Docker uses bridge networks by default',
      tags: 'docker,networking',
      entered_by: 'test'
    });
    
    expect(result.id).toBeDefined();
    
    // Verify in database
    const memory = db.prepare('SELECT * FROM memories WHERE id = ?').get(result.id);
    expect(memory.content).toBe('Docker uses bridge networks by default');
    
    // Verify tags (ORDER BY makes the assertion deterministic)
    const tags = db.prepare(`
      SELECT t.name FROM tags t
      JOIN memory_tags mt ON t.id = mt.tag_id
      WHERE mt.memory_id = ?
      ORDER BY t.name
    `).all(result.id);
    
    expect(tags.map(t => t.name)).toEqual(['docker', 'networking']);
  });
  
  test('rejects content over 10KB', () => {
    const longContent = 'x'.repeat(10001);
    
    expect(() => {
      storeMemory(db, { content: longContent });
    }).toThrow('Content exceeds 10KB limit');
  });
  
  test('normalizes tags to lowercase', () => {
    storeMemory(db, { content: 'test', tags: 'Docker,NETWORKING' });
    
    const tags = db.prepare('SELECT name FROM tags ORDER BY name').all();
    expect(tags).toEqual([
      { name: 'docker' },
      { name: 'networking' }
    ]);
  });
});

Then implement (after test fails):

// src/commands/store.js
export function storeMemory(db, { content, tags, expires, entered_by }) {
  // Implementation goes here
  // Make the test pass!
}
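
One possible shape once the tests are failing — a sketch that inlines the validation and tag helpers the plan assigns to src/utils/:

// src/commands/store.js — sketch; error messages match the tests above
export function storeMemory(db, { content, tags = '', expires = null, entered_by = null }) {
  if (!content || !content.trim()) throw new Error('Content must not be empty');
  if (content.length > 10000) throw new Error('Content exceeds 10KB limit');

  // Parse ISO 8601 expiration into epoch seconds (the schema stores seconds)
  const expiresAt = expires ? Math.floor(new Date(expires).getTime() / 1000) : null;
  if (expiresAt !== null && Number.isNaN(expiresAt)) {
    throw new Error(`Invalid expiration date: ${expires}`);
  }

  // Normalize tags: trim, lowercase, drop empties and duplicates
  const tagNames = [...new Set(tags.split(',').map(t => t.trim().toLowerCase()).filter(Boolean))];

  // Atomic: memory row, tag rows, and join rows succeed or fail together
  const insertAll = db.transaction(() => {
    const { lastInsertRowid } = db.prepare(
      'INSERT INTO memories (content, entered_by, expires_at) VALUES (?, ?, ?)'
    ).run(content, entered_by, expiresAt);

    for (const name of tagNames) {
      db.prepare('INSERT OR IGNORE INTO tags (name) VALUES (?)').run(name);
      const { id: tagId } = db.prepare('SELECT id FROM tags WHERE name = ?').get(name);
      db.prepare('INSERT INTO memory_tags (memory_id, tag_id) VALUES (?, ?)').run(lastInsertRowid, tagId);
    }
    return { id: Number(lastInsertRowid) };
  });

  return insertAll();
}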

Features checklist:

  • Test written first and failing
  • Content validation (length, non-empty)
  • Tag parsing and normalization (lowercase)
  • Expiration date parsing (ISO 8601)
  • Atomic transaction (memory + tags)
  • Test passes

Checkpoint: npm test passes for store command


Step 1.4: Core Command - Search (LIKE)

Effort: 3 hours
Status: Not Started

TDD Workflow:

  1. Write integration test first with realistic data
  2. Run and watch it fail
  3. Implement search - make test pass
  4. Verify manually with CLI

Files to create:

  • Add tests to test/integration.test.js (TEST FIRST)
  • src/commands/search.js
  • src/search/like.js
  • src/utils/formatting.js

Test First:

// test/integration.test.js (add to existing file)
import { searchMemories } from '../src/commands/search.js';

describe('Search Command', () => {
  let db;
  
  beforeEach(() => {
    db = new Database(':memory:');
    initSchema(db);
    
    // Seed with realistic data
    storeMemory(db, { content: 'Docker uses bridge networks by default', tags: 'docker,networking' });
    storeMemory(db, { content: 'Kubernetes pods share network namespace', tags: 'kubernetes,networking' });
    storeMemory(db, { content: 'PostgreSQL requires explicit vacuum', tags: 'postgresql,database' });
  });
  
  test('finds memories by content', () => {
    const results = searchMemories(db, 'docker');
    
    expect(results).toHaveLength(1);
    expect(results[0].content).toContain('Docker');
  });
  
  test('filters by tags (AND logic)', () => {
    const results = searchMemories(db, 'network', { tags: ['networking'] });
    
    expect(results).toHaveLength(2);
    expect(results.map(r => r.content)).toContain('Docker uses bridge networks by default');
    expect(results.map(r => r.content)).toContain('Kubernetes pods share network namespace');
  });
  
  test('excludes expired memories automatically', () => {
    // The CHECK constraint requires expires_at > created_at, so an
    // already-expired row must be inserted directly with a backdated
    // created_at (both columns are epoch seconds)
    const now = Math.floor(Date.now() / 1000);
    db.prepare(
      'INSERT INTO memories (content, created_at, expires_at) VALUES (?, ?, ?)'
    ).run('Expired memory', now - 172800, now - 86400);

    const results = searchMemories(db, 'expired');

    expect(results).toHaveLength(0);
  });
  
  test('respects limit option', () => {
    // Add 20 memories
    for (let i = 0; i < 20; i++) {
      storeMemory(db, { content: `Memory ${i}`, tags: 'test' });
    }
    
    const results = searchMemories(db, 'Memory', { limit: 5 });
    
    expect(results).toHaveLength(5);
  });
});

Then implement to make tests pass.
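
A sketch of the LIKE search covering the cases the tests exercise; the date (after/before) and agent filters would extend the WHERE clause the same way:

// src/search/like.js — sketch; option names mirror the tests above
export function searchMemories(db, query, { tags = [], limit = 10 } = {}) {
  const params = [`%${query}%`];
  // SQLite LIKE is case-insensitive for ASCII by default
  let sql = `
    SELECT m.* FROM memories m
    WHERE m.content LIKE ?
      AND (m.expires_at IS NULL OR m.expires_at > strftime('%s', 'now'))`;

  // AND logic: the memory must carry every requested tag
  for (const tag of tags) {
    sql += `
      AND EXISTS (
        SELECT 1 FROM memory_tags mt
        JOIN tags t ON t.id = mt.tag_id
        WHERE mt.memory_id = m.id AND t.name = ?)`;
    params.push(tag.toLowerCase());
  }

  sql += ' ORDER BY m.created_at DESC LIMIT ?';
  params.push(Number(limit));

  return db.prepare(sql).all(...params);
}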

Features checklist:

  • Tests written and failing
  • Case-insensitive LIKE search
  • Tag filtering (AND logic)
  • Date filtering (after/before)
  • Agent filtering (entered_by)
  • Automatic expiration filtering
  • Result limit
  • Tests pass

Checkpoint: npm test passes for search, manual CLI test works


Step 1.5: Core Command - List

Effort: 1 hour
Status: Not Started

Files to create:

  • src/commands/list.js

Implementation:

// Pseudo-code
export async function listCommand(options) {
  // 1. Query memories with filters
  // 2. Order by created_at DESC (or custom sort)
  // 3. Apply limit/offset
  // 4. Format and display
}

Features:

  • Sort options (created, expires, content)
  • Order direction (asc/desc)
  • Tag filtering
  • Pagination (limit/offset)
  • Display with tags
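
A sketch of the query behind these features; the sort column is whitelisted because SQL identifiers cannot be bound as parameters:

// src/commands/list.js — sketch
const SORT_COLUMNS = { created: 'created_at', expires: 'expires_at', content: 'content' };

export function listMemories(db, { sort = 'created', order = 'desc', limit = 10, offset = 0 } = {}) {
  const column = SORT_COLUMNS[sort] ?? 'created_at';
  const direction = order.toLowerCase() === 'asc' ? 'ASC' : 'DESC';
  return db.prepare(`
    SELECT m.*, GROUP_CONCAT(t.name) AS tags
    FROM memories m
    LEFT JOIN memory_tags mt ON mt.memory_id = m.id
    LEFT JOIN tags t ON t.id = mt.tag_id
    WHERE m.expires_at IS NULL OR m.expires_at > strftime('%s', 'now')
    GROUP BY m.id
    ORDER BY m.${column} ${direction}
    LIMIT ? OFFSET ?
  `).all(Number(limit), Number(offset));
}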

Checkpoint:

memory list --limit 5
# Should show 5 most recent memories

Step 1.6: Core Command - Prune

Effort: 1.5 hours
Status: Not Started

Files to create:

  • src/commands/prune.js

Implementation:

// Pseudo-code
export async function pruneCommand(options) {
  // 1. Find expired memories
  // 2. If --dry-run, show what would be deleted
  // 3. Else, prompt for confirmation (unless --force)
  // 4. Delete expired memories
  // 5. Show count of deleted memories
}

Features:

  • Find expired memories (expires_at <= now)
  • --dry-run flag (show without deleting)
  • --force flag (skip confirmation)
  • Confirmation prompt
  • Report deleted count
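
A sketch of the deletion logic, leaving the confirmation prompt to the CLI layer:

// src/commands/prune.js — sketch
export function pruneExpired(db, { dryRun = false } = {}) {
  const expired = db.prepare(
    "SELECT id, content FROM memories WHERE expires_at IS NOT NULL AND expires_at <= strftime('%s', 'now')"
  ).all();

  if (dryRun) return { deleted: 0, candidates: expired };

  const { changes } = db.prepare(
    "DELETE FROM memories WHERE expires_at IS NOT NULL AND expires_at <= strftime('%s', 'now')"
  ).run();
  return { deleted: changes, candidates: expired };
}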

Checkpoint:

memory store "Temp" --expires "2020-01-01"
memory prune --dry-run
# Should show the expired memory
memory prune --force
# Should delete it

Step 1.7: CLI Integration

Effort: 2 hours
Status: Not Started

Files to create:

  • src/cli.js
  • bin/memory

Implementation:

// src/cli.js
import { Command } from 'commander';

const program = new Command();

program
  .name('memory')
  .description('AI Agent Memory System')
  .version('1.0.0');

program
  .command('store <content>')
  .description('Store a new memory')
  .option('-t, --tags <tags>', 'Comma-separated tags')
  .option('-e, --expires <date>', 'Expiration date')
  .option('--by <agent>', 'Agent name')
  .action(storeCommand);

program
  .command('search <query>')
  .description('Search memories')
  .option('-t, --tags <tags>', 'Filter by tags')
  .option('--after <date>', 'Created after')
  .option('--before <date>', 'Created before')
  .option('--entered-by <agent>', 'Filter by agent')
  .option('-l, --limit <n>', 'Max results', '10')
  .action(searchCommand);

// ... other commands

program.parse();

Features:

  • All commands registered
  • Global options (--db, --verbose, --json)
  • Help text for all commands
  • Error handling
  • Exit codes (0=success, 1=error)
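
One way to get uniform error handling and exit codes — a hypothetical wrapper around each command action (commander handles parse errors itself):

// src/cli.js — sketch
function wrap(handler) {
  return async (...args) => {
    try {
      await handler(...args);
    } catch (err) {
      console.error(`Error: ${err.message}`);
      process.exitCode = 1; // 0 = success, 1 = error
    }
  };
}

// Usage: .action(wrap(storeCommand))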

Checkpoint:

memory --help
# Should show all commands
memory store --help
# Should show store options

Step 1.8: Testing & Polish

Effort: 2 hours
Status: Not Started

Note: Integration tests written first for each feature (TDD approach).
This step is for final polish and comprehensive scenarios.

Files to enhance:

  • test/integration.test.js (should already have tests from Steps 1.3-1.6)
  • test/helpers/seed.js - Realistic data generation
  • test/fixtures/realistic-memories.js - Memory templates

Comprehensive test scenarios:

  • Full workflow: store → search → list → prune
  • Performance: 100 memories, search <50ms
  • Edge cases: empty query, no results, expired memories
  • Data validation: content length, invalid dates, malformed tags
  • Tag normalization: uppercase → lowercase, duplicates
  • Expiration: auto-filter in search, prune removes correctly

Checkpoint: All tests pass with npm test, >80% coverage (mostly integration)


Phase 1 Completion Criteria

  • All checkpoints passed
  • Can store memories with tags and expiration
  • Can search with basic LIKE matching
  • Can list recent memories
  • Can prune expired memories
  • Help text comprehensive
  • Tests passing (>80% coverage)
  • Database file created at ~/.config/opencode/memories.db

Validation test:

# Full workflow test
memory store "Docker Compose uses bridge networks by default" --tags docker,networking
memory store "Kubernetes pods share network namespace" --tags kubernetes,networking
memory search "networking" --tags docker
# Should return only Docker memory
memory list --limit 10
# Should show both memories
memory stats
# Should show 2 memories, 3 unique tags (docker, networking, kubernetes)

Phase 2: FTS5 Migration

Goal: Production-grade search with FTS5
Status: Not Started
Trigger to Start: Dataset > 500 memories OR query latency > 500ms OR manual request

Step 2.1: Migration Script

Effort: 2 hours
Status: Not Started

Files to create:

  • src/db/migrations.js
  • src/db/migrations/002_fts5.js

Implementation:

export async function migrateToFTS5(db) {
  console.log('Migrating to FTS5...');
  
  // 1. Check if already migrated
  const version = db.prepare('SELECT value FROM metadata WHERE key = ?').get('schema_version');
  if (Number(version?.value ?? 1) >= 2) { // value is stored as TEXT, compare numerically
    console.log('Already on FTS5');
    return;
  }
  
  // 2. Create FTS5 table
  db.exec(`CREATE VIRTUAL TABLE memories_fts USING fts5(...)`);
  
  // 3. Populate from existing memories
  db.exec(`INSERT INTO memories_fts(rowid, content) SELECT id, content FROM memories`);
  
  // 4. Create triggers
  db.exec(`CREATE TRIGGER memories_ai AFTER INSERT...`);
  db.exec(`CREATE TRIGGER memories_ad AFTER DELETE...`);
  db.exec(`CREATE TRIGGER memories_au AFTER UPDATE...`);
  
  // 5. Update schema version
  db.prepare('UPDATE metadata SET value = ? WHERE key = ?').run('2', 'schema_version');
  
  console.log('Migration complete!');
}
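
Filled-in DDL for steps 2 and 4, as a sketch — the standard SQLite external-content FTS5 pattern (indexing only the content column is an assumption):

CREATE VIRTUAL TABLE memories_fts USING fts5(
  content,
  content='memories',   -- external content: rows live in the memories table
  content_rowid='id'
);

CREATE TRIGGER memories_ai AFTER INSERT ON memories BEGIN
  INSERT INTO memories_fts(rowid, content) VALUES (new.id, new.content);
END;

CREATE TRIGGER memories_ad AFTER DELETE ON memories BEGIN
  INSERT INTO memories_fts(memories_fts, rowid, content) VALUES ('delete', old.id, old.content);
END;

CREATE TRIGGER memories_au AFTER UPDATE ON memories BEGIN
  INSERT INTO memories_fts(memories_fts, rowid, content) VALUES ('delete', old.id, old.content);
  INSERT INTO memories_fts(rowid, content) VALUES (new.id, new.content);
END;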

Checkpoint: Run migration on test DB, verify FTS5 table exists and is populated


Step 2.2: FTS5 Search Implementation

Effort: 3 hours
Status: Not Started

Files to create:

  • src/search/fts.js

Features:

  • FTS5 MATCH query builder
  • Support boolean operators (AND/OR/NOT)
  • Phrase queries ("exact phrase")
  • Prefix matching (docker*)
  • BM25 relevance ranking
  • Combined with metadata filters
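
A sketch of the query; bm25() scores are lower-is-better, and boolean operators, phrases, and prefixes come free with MATCH:

// src/search/fts.js — sketch; assumes the external-content table above
export function searchFTS(db, query, { limit = 10 } = {}) {
  return db.prepare(`
    SELECT m.*, bm25(memories_fts) AS score
    FROM memories_fts
    JOIN memories m ON m.id = memories_fts.rowid
    WHERE memories_fts MATCH ?
      AND (m.expires_at IS NULL OR m.expires_at > strftime('%s', 'now'))
    ORDER BY score
    LIMIT ?
  `).all(query, limit);
}

// e.g. searchFTS(db, 'docker AND network*') or searchFTS(db, '"bridge networks"')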

Checkpoint: FTS5 search returns results ranked by relevance


Step 2.3: CLI Command - Migrate

Effort: 1 hour
Status: Not Started

Files to create:

  • src/commands/migrate.js

Implementation:

memory migrate fts5
# Prompts for confirmation, runs migration

Checkpoint: Command successfully migrates Phase 1 DB to Phase 2


Phase 3: Fuzzy Layer

Goal: Handle typos and inexact matches
Status: Not Started
Trigger to Start: Manual request or need for fuzzy matching

Step 3.1: Trigram Infrastructure

Effort: 3 hours
Status: Not Started

Files to create:

  • src/db/migrations/003_trigrams.js
  • src/search/fuzzy.js

Features:

  • Trigram table creation
  • Trigram extraction function
  • Populate trigrams from existing memories
  • Trigger to maintain trigrams on insert/update

Step 3.2: Fuzzy Search Implementation

Effort: 4 hours
Status: Not Started

Features:

  • Trigram similarity calculation
  • Levenshtein distance implementation
  • Combined relevance scoring
  • Cascade logic (exact → fuzzy)
  • Configurable threshold
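
A sketch of the trigram side, using Jaccard similarity over padded trigram sets (the padding and scoring choices are assumptions):

// src/search/fuzzy.js — sketch
export function trigrams(text) {
  const padded = `  ${text.toLowerCase()}  `; // pad so word edges form trigrams
  const grams = new Set();
  for (let i = 0; i <= padded.length - 3; i++) grams.add(padded.slice(i, i + 3));
  return grams;
}

// Jaccard similarity over trigram sets: |A ∩ B| / |A ∪ B|
export function trigramSimilarity(a, b) {
  const ta = trigrams(a), tb = trigrams(b);
  let shared = 0;
  for (const g of ta) if (tb.has(g)) shared++;
  return shared / (ta.size + tb.size - shared);
}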

Step 3.3: CLI Integration

Effort: 2 hours
Status: Not Started

Features:

  • --fuzzy flag for search command
  • --threshold option
  • Auto-fuzzy when <5 results

Additional Features (Post-MVP)

Stats Command

Effort: 2 hours
Status: Not Started

memory stats
# Total memories: 1,234
# Total tags: 56
# Database size: 2.3 MB
# Most used tags: docker (123), kubernetes (89), nodejs (67)

memory stats --tags
# docker: 123
# kubernetes: 89
# nodejs: 67
# ...

memory stats --agents
# investigate-agent: 456
# optimize-agent: 234
# manual: 544
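
A sketch of the queries behind this output (formatting omitted; function name is illustrative):

// src/commands/stats.js — sketch
export function getStats(db) {
  const memories = db.prepare('SELECT COUNT(*) AS n FROM memories').get().n;
  const tags = db.prepare('SELECT COUNT(*) AS n FROM tags').get().n;
  const topTags = db.prepare(`
    SELECT t.name, COUNT(*) AS uses
    FROM tags t JOIN memory_tags mt ON mt.tag_id = t.id
    GROUP BY t.id ORDER BY uses DESC LIMIT 5
  `).all();
  return { memories, tags, topTags };
}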

Export/Import Commands

Effort: 3 hours
Status: Not Started

memory export memories.json
# Exported 1,234 memories to memories.json

memory import memories.json
# Imported 1,234 memories
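
A sketch of the export half; import would replay the rows through storeMemory inside a single transaction:

// src/commands/export.js — sketch
import { writeFileSync } from 'fs';

export function exportMemories(db, path) {
  const rows = db.prepare(`
    SELECT m.id, m.content, m.created_at, m.entered_by, m.expires_at,
           GROUP_CONCAT(t.name) AS tags
    FROM memories m
    LEFT JOIN memory_tags mt ON mt.memory_id = m.id
    LEFT JOIN tags t ON t.id = mt.tag_id
    GROUP BY m.id
  `).all();
  writeFileSync(path, JSON.stringify(rows, null, 2));
  return rows.length;
}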

Agent Context Documentation

Effort: 3 hours
Status: Not Started

Files to create:

  • docs/AGENT_GUIDE.md
  • src/commands/agent-context.js

memory --agent-context
# Displays comprehensive guide for AI agents

Auto-Extraction (Remember Pattern)

Effort: 4 hours
Status: Not Started

Files to create:

  • src/extractors/remember.js

Features:

  • Regex pattern to detect *Remember*: [fact]
  • Auto-extract tags from content
  • Auto-detect expiration dates
  • Store extracted memories
  • Report extraction results
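
A sketch of the pattern detection (the exact regex is an assumption):

// src/extractors/remember.js — sketch
const REMEMBER_RE = /\*Remember\*:\s*(.+)/g;

export function extractRemembers(text) {
  // One candidate memory per "*Remember*: fact" occurrence
  return [...text.matchAll(REMEMBER_RE)].map(m => m[1].trim());
}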

OpenCode Plugin Integration

Effort: 3 hours
Status: Not Started

Files to create:

  • plugin.js (root level for OpenCode)

Features:

  • Plugin registration
  • API exposure (store, search, extractRemember)
  • Lifecycle hooks (onInstall, onUninstall)
  • Command registration

Testing Strategy

TDD Philosophy: Integration-First Approach

Core Principles:

  1. Integration tests are primary - Test real workflows end-to-end
  2. Unit tests are rare - Only for complex algorithms (fuzzy matching, trigrams, Levenshtein)
  3. Test with real data - Use SQLite :memory: or temp files with realistic scenarios
  4. Watch-driven development - Run tests in watch mode, see failures, implement, see success

Testing Workflow:

# 1. Write integration test first (it will fail)
npm run test:watch

# 2. Run program manually to see behavior
node src/cli.js store "test"

# 3. Implement feature

# 4. Watch tests pass

# 5. Refine based on output

Integration Tests (Primary)

Coverage target: All major workflows

Test approach:

  • Use real SQLite database (:memory: for speed, temp file for persistence tests)
  • Simulate realistic data (10-100 memories per test)
  • Test actual CLI commands via Node API
  • Verify end-to-end behavior, not internal implementation

Test scenarios:

// test/integration.test.js
describe('Memory System Integration', () => {
  test('store and retrieve workflow', async () => {
    // Store memory
    await cli(['store', 'Docker uses bridge networks', '--tags', 'docker,networking']);
    
    // Search for it
    const results = await cli(['search', 'docker']);
    
    // Verify output
    expect(results).toContain('Docker uses bridge networks');
    expect(results).toContain('docker');
    expect(results).toContain('networking');
  });
  
  test('realistic dataset search performance', async () => {
    // Insert 100 realistic memories (seedDatabase from test/helpers/seed.js)
    seedDatabase(db, 100);
    
    // Search should be fast
    const start = Date.now();
    await cli(['search', 'docker']);
    const duration = Date.now() - start;
    
    expect(duration).toBeLessThan(50); // Phase 1 target
  });
});

Test data generation:

// test/fixtures/realistic-memories.js
const randomChoice = (arr) => arr[Math.floor(Math.random() * arr.length)];

export function generateRealisticMemory() {
  const templates = [
    { content: 'Docker Compose requires explicit subnet config when using multiple networks', tags: ['docker', 'networking'] },
    { content: 'PostgreSQL VACUUM FULL locks tables, use ANALYZE instead', tags: ['postgresql', 'performance'] },
    { content: 'Git worktree allows parallel branches without stashing', tags: ['git', 'workflow'] },
    // ... 50+ realistic templates
  ];
  return randomChoice(templates);
}

Unit Tests (Rare - Only When Necessary)

When to write unit tests:

  • Complex algorithms with edge cases (Levenshtein distance, trigram extraction)
  • Pure functions with clear inputs/outputs
  • Critical validation logic

When NOT to write unit tests:

  • Database queries (covered by integration tests)
  • CLI parsing (covered by integration tests)
  • Simple utilities (tag parsing, date formatting)

Example unit test (justified):

// test/unit/fuzzy.test.js - Complex algorithm worth unit testing
describe('Levenshtein distance', () => {
  test('calculates edit distance correctly', () => {
    expect(levenshtein('docker', 'dcoker')).toBe(2);
    expect(levenshtein('kubernetes', 'kuberntes')).toBe(1); // one deleted 'e'
    expect(levenshtein('same', 'same')).toBe(0);
  });
  
  test('handles edge cases', () => {
    expect(levenshtein('', 'hello')).toBe(5);
    expect(levenshtein('a', '')).toBe(1);
  });
});
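
A reference implementation the test above could target — standard dynamic-programming edit distance:

// src/search/levenshtein.js — sketch
export function levenshtein(a, b) {
  // dp[i][j] = edits to turn a[0..i) into b[0..j)
  const dp = Array.from({ length: a.length + 1 }, (_, i) => [i, ...Array(b.length).fill(0)]);
  for (let j = 0; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                   // deletion
        dp[i][j - 1] + 1,                                   // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution
      );
    }
  }
  return dp[a.length][b.length];
}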

Test Data Management

For integration tests:

// Use :memory: database for fast, isolated tests
beforeEach(() => {
  db = new Database(':memory:');
  initSchema(db);
});

// Or use a temp file for persistence testing
import { mkdtempSync, rmSync } from 'fs';
import { join } from 'path';
import { tmpdir } from 'os';

let tempDir, dbPath;

beforeEach(() => {
  tempDir = mkdtempSync(join(tmpdir(), 'llmemory-test-'));
  dbPath = join(tempDir, 'test.db');
  db = new Database(dbPath);
  initSchema(db);
});

afterEach(() => {
  db.close();
  rmSync(tempDir, { recursive: true, force: true }); // cleanup temp files
});

Realistic data seeding:

// test/helpers/seed.js
export function seedDatabase(db, count = 50) {
  const memories = [];
  
  for (let i = 0; i < count; i++) {
    const template = generateRealisticMemory(); // returns { content, tags }
    memories.push({
      content: template.content,
      tags: template.tags.join(','),
      entered_by: randomChoice(['investigate-agent', 'optimize-agent', 'manual']),
      created_at: Math.floor(Date.now() / 1000) - randomInt(0, 90 * 86400) // epoch seconds, within 90 days
    });
  }
  
  // Bulk insert in one transaction (better-sqlite3 transactions are synchronous)
  const insert = db.transaction((rows) => {
    for (const row of rows) {
      storeMemory(db, row);
    }
  });
  
  insert(memories);
  return memories;
}

Performance Tests

Run after each phase:

// Benchmark search latency (seedDatabase from test/helpers/seed.js)
test('Phase 1 search <50ms for 500 memories', () => {
  seedDatabase(db, 500);
  const start = Date.now();
  searchMemories(db, 'docker');
  const duration = Date.now() - start;
  expect(duration).toBeLessThan(50);
});

test('Phase 2 search <100ms for 10K memories', () => {
  seedDatabase(db, 10000);
  const start = Date.now();
  searchFTS(db, 'docker'); // FTS5 search from Step 2.2
  const duration = Date.now() - start;
  expect(duration).toBeLessThan(100);
});

Documentation Roadmap

Phase 1 Docs

  • README.md - Quick start, installation, basic usage
  • CLI_REFERENCE.md - All commands and options
  • ARCHITECTURE.md - System design, schema, algorithms

Phase 2 Docs

  • AGENT_GUIDE.md - Comprehensive guide for AI agents
  • MIGRATION_GUIDE.md - Phase 1 → 2 → 3 instructions
  • QUERY_SYNTAX.md - FTS5 query patterns

Phase 3 Docs

  • API.md - Programmatic API for plugins
  • CONTRIBUTING.md - Development setup, testing
  • TROUBLESHOOTING.md - Common issues and solutions

Success Metrics

Phase 1 (MVP)

  • Can store/retrieve memories
  • Search works for exact matches
  • Performance: <50ms for 500 memories
  • Test coverage: >80%
  • No critical bugs

Phase 2 (FTS5)

  • Migration completes without data loss
  • Search quality improved (relevance ranking)
  • Performance: <100ms for 10K memories
  • Boolean operators work correctly

Phase 3 (Fuzzy)

  • Typos correctly matched (edit distance ≤2)
  • Fuzzy cascade improves result count
  • Performance: <200ms for 10K memories
  • No false positives (threshold tuned)

Overall

  • Agents use system regularly in workflows
  • Search results are high-quality (relevant)
  • Token-efficient (limited, ranked results)
  • No performance complaints
  • Documentation comprehensive

Development Workflow

Daily Checklist

  1. Pull latest changes
  2. Run tests: npm test
  3. Work on current step
  4. Write/update tests
  5. Update this document (mark checkboxes)
  6. Commit with clear message
  7. Update CHANGELOG.md

Before Phase Completion

  1. All checkpoints passed
  2. Tests passing (>80% coverage)
  3. Documentation updated
  4. Performance benchmarks run
  5. Manual testing completed
  6. Changelog updated

Commit Message Format

<type>(<scope>): <subject>

Examples:
feat(search): implement FTS5 search with BM25 ranking
fix(store): validate content length before insertion
docs(readme): add installation instructions
test(search): add integration tests for filters
refactor(db): extract connection logic to separate file

Troubleshooting

Common Issues

Issue: SQLite FTS5 not available
Solution: Ensure SQLite version ≥ 3.35; check that the better-sqlite3 build includes FTS5

Issue: Database locked errors
Solution: Enable WAL mode: PRAGMA journal_mode = WAL

Issue: Slow searches with a large dataset
Solution: Check that indexes exist, run ANALYZE, consider migrating to the next phase

Issue: Tag filtering not working
Solution: Verify tag normalization (lowercase), check the many-to-many joins


Next Session Continuation

For the next developer/AI agent:

  1. Check Current Phase: Review checkboxes in this file to see progress
  2. Run Tests: npm test to verify current state
  3. Check Database: sqlite3 ~/.config/opencode/memories.db .schema to see current schema version
  4. Review SPECIFICATION.md: Understand overall architecture
  5. Pick Next Step: Find first unchecked item in current phase
  6. Update This File: Mark completed checkboxes as you go

Quick Start Commands:

cd llmemory
npm install              # Install dependencies
npm test                 # Run test suite
npm run start -- --help  # Test CLI

Current Status: Phase 0 complete (planning/documentation), ready to begin Phase 1 implementation.

Estimated Time to MVP: 12-15 hours of focused development.


Resources


Changelog

2025-10-29 - Phase 0 Complete

  • Project structure defined
  • Comprehensive specification written
  • Implementation plan created
  • Agent investigation reports integrated
  • Ready for Phase 1 development