# LLMemory Testing Guide

## Testing Philosophy: Integration-First TDD

This project uses **integration-first TDD**: we write integration tests that verify real workflows, not unit tests that verify implementation details.

## Core Principles

### 1. Integration Tests Are Primary

**Why:**
- Tests real behavior users/agents will experience
- Less brittle (survives refactoring)
- Higher confidence in system working correctly
- Catches integration issues early

**Example:**
```javascript
// GOOD: Integration test
test('store and search workflow', () => {
  // Test the actual workflow
  storeMemory(db, { content: 'Docker uses bridge networks', tags: 'docker' });
  const results = searchMemories(db, 'docker');
  expect(results[0].content).toContain('Docker');
});

// AVOID: Over-testing implementation details
test('parseContent returns trimmed string', () => {
  expect(parseContent(' test ')).toBe('test');
});
// ^ This is probably already tested by integration tests
```

### 2. Unit Tests Are Rare

**Only write unit tests for:**
- Complex algorithms (Levenshtein distance, trigram extraction)
- Pure functions with many edge cases
- Critical validation logic

**Don't write unit tests for:**
- Database queries (test via integration)
- CLI argument parsing (test via integration)
- Simple utilities (tag parsing, date formatting)
- Anything already covered by integration tests

**Rule of thumb:** Think twice before writing a unit test. Ask: "Is this already tested by my integration tests?"

### 3. Test With Realistic Data

**Use real SQLite databases:**
```javascript
beforeEach(() => {
  db = new Database(':memory:'); // Fast, isolated
  initSchema(db);

  // Seed with realistic data
  seedDatabase(db, 50); // 50 realistic memories
});
```

**Generate realistic test data:**
```javascript
// test/helpers/seed.js
export function generateRealisticMemory() {
  const templates = [
    { content: 'Docker Compose requires explicit subnet config', tags: ['docker', 'networking'] },
    { content: 'PostgreSQL VACUUM FULL locks tables', tags: ['postgresql', 'performance'] },
    { content: 'Git worktree allows parallel branches', tags: ['git', 'workflow'] },
    // 50+ realistic templates
  ];
  return randomChoice(templates);
}
```

**Why:** Tests should reflect real usage, not artificial toy data.

### 4. Watch-Driven Development

**Workflow:**
```bash
# Terminal 1: Watch mode (always running)
npm run test:watch

# Terminal 2: Manual testing
node src/cli.js store "test memory"
```

**Steps:**
1. Write integration test (red/failing)
2. Watch test fail
3. Implement feature
4. Watch test pass (green)
5. Verify manually with CLI
6. Refine based on output

## TDD Workflow Example

### Example: Implementing Store Command

**Step 1: Write Test First**
```javascript
// test/integration.test.js
describe('Store Command', () => {
  let db;

  beforeEach(() => {
    db = new Database(':memory:');
    initSchema(db);
  });

  test('stores memory with tags', () => {
    const result = storeMemory(db, {
      content: 'Docker uses bridge networks',
      tags: 'docker,networking'
    });

    expect(result.id).toBeDefined();

    // Verify in database
    const memory = db.prepare('SELECT * FROM memories WHERE id = ?').get(result.id);
    expect(memory.content).toBe('Docker uses bridge networks');

    // Verify tags linked correctly (ORDER BY keeps the assertion deterministic)
    const tags = db.prepare(`
      SELECT t.name FROM tags t
      JOIN memory_tags mt ON t.id = mt.tag_id
      WHERE mt.memory_id = ?
      ORDER BY t.name
    `).all(result.id);

    expect(tags.map(t => t.name)).toEqual(['docker', 'networking']);
  });

  test('rejects content over 10KB', () => {
    expect(() => {
      storeMemory(db, { content: 'x'.repeat(10001) });
    }).toThrow('Content exceeds 10KB limit');
  });
});
```

**Step 2: Run Test (Watch It Fail)**
```bash
$ npm run test:watch

FAIL  test/integration.test.js
  Store Command
    ✕ stores memory with tags (2 ms)
    ✕ rejects content over 10KB (1 ms)

  ● Store Command › stores memory with tags

    ReferenceError: storeMemory is not defined
```

**Step 3: Implement Feature**
```javascript
// src/commands/store.js
// Optional fields default to null: better-sqlite3 refuses to bind `undefined`.
export function storeMemory(db, { content, tags, expires = null, entered_by = null }) {
  // Validate content
  if (content.length > 10000) {
    throw new Error('Content exceeds 10KB limit');
  }

  // Insert memory
  const result = db.prepare(`
    INSERT INTO memories (content, entered_by, expires_at)
    VALUES (?, ?, ?)
  `).run(content, entered_by, expires);

  // Handle tags
  if (tags) {
    const tagList = tags.split(',').map(t => t.trim().toLowerCase());
    linkTags(db, result.lastInsertRowid, tagList);
  }

  return { id: result.lastInsertRowid };
}
```

**Step 4: Watch Test Pass**
```bash
PASS  test/integration.test.js
  Store Command
    ✓ stores memory with tags (15 ms)
    ✓ rejects content over 10KB (3 ms)

Tests: 2 passed, 2 total
```

**Step 5: Verify Manually**
```bash
$ node src/cli.js store "Docker uses bridge networks" --tags docker,networking
Memory #1 stored successfully

$ node src/cli.js search "docker"
[2025-10-29 12:45] docker, networking
Docker uses bridge networks
```

**Step 6: Refine**
```javascript
// Add more test cases based on manual testing
test('normalizes tags to lowercase', () => {
  storeMemory(db, { content: 'test', tags: 'Docker,NETWORKING' });

  const tags = db.prepare('SELECT name FROM tags ORDER BY name').all();
  expect(tags).toEqual([
    { name: 'docker' },
    { name: 'networking' }
  ]);
});
```

## Test Organization

### Directory Structure

```
test/
├── integration.test.js       # PRIMARY - All main workflows
├── unit/
│   ├── fuzzy.test.js         # RARE - Only complex algorithms
│   └── levenshtein.test.js   # RARE - Only complex algorithms
├── helpers/
│   ├── seed.js               # Realistic data generation
│   └── db.js                 # Database setup helpers
└── fixtures/
    └── realistic-memories.js # Memory templates
```

### Integration Test Structure

```javascript
// test/integration.test.js
import { describe, test, expect, beforeEach, afterEach } from 'vitest';
import Database from 'better-sqlite3';
import { storeMemory, searchMemories } from '../src/commands/index.js';
import { initSchema } from '../src/db/schema.js';
import { seedDatabase } from './helpers/seed.js';

describe('Memory System Integration', () => {
  let db;

  beforeEach(() => {
    db = new Database(':memory:');
    initSchema(db);
  });

  afterEach(() => {
    db.close();
  });

  describe('Store and Retrieve', () => {
    test('stores and finds memory', () => {
      storeMemory(db, { content: 'test', tags: 'demo' });
      const results = searchMemories(db, 'test');
      expect(results).toHaveLength(1);
    });
  });

  describe('Search with Filters', () => {
    beforeEach(() => {
      seedDatabase(db, 50); // Realistic data
    });

    test('filters by tags', () => {
      const results = searchMemories(db, 'docker', { tags: ['networking'] });
      results.forEach(r => {
        expect(r.tags).toContain('networking');
      });
    });
  });

  describe('Performance', () => {
    test('searches 100 memories in <50ms', () => {
      seedDatabase(db, 100);

      const start = Date.now();
      searchMemories(db, 'test');
      const duration = Date.now() - start;

      expect(duration).toBeLessThan(50);
    });
  });
});
```

## Unit Test Structure (Rare)

**Only for complex algorithms:**

```javascript
// test/unit/levenshtein.test.js
import { describe, test, expect } from 'vitest';
import { levenshtein } from '../../src/search/fuzzy.js';

describe('Levenshtein Distance', () => {
  test('calculates edit distance correctly', () => {
    // Swapped letters cost two substitutions in plain Levenshtein
    expect(levenshtein('docker', 'dcoker')).toBe(2);
    // One missing letter is a single deletion
    expect(levenshtein('kubernetes', 'kuberntes')).toBe(1);
    expect(levenshtein('same', 'same')).toBe(0);
  });

  test('handles edge cases', () => {
    expect(levenshtein('', 'hello')).toBe(5);
    expect(levenshtein('a', '')).toBe(1);
    expect(levenshtein('', '')).toBe(0);
  });

  test('handles unicode correctly', () => {
    expect(levenshtein('café', 'cafe')).toBe(1);
  });
});
```

## Test Data Helpers

### Realistic Memory Generation

```javascript
// test/helpers/seed.js
const REALISTIC_MEMORIES = [
  {
    content: 'Docker Compose uses bridge networks by default. Custom networks require explicit subnet config.',
    tags: ['docker', 'networking']
  },
  {
    content: 'PostgreSQL VACUUM FULL locks tables and requires 2x disk space. Use VACUUM ANALYZE for production.',
    tags: ['postgresql', 'performance']
  },
  {
    content: 'Git worktree allows working on multiple branches without stashing. Use: git worktree add ../branch branch-name',
    tags: ['git', 'workflow']
  },
  {
    content: 'NixOS flake.lock must be committed to git for reproducible builds across machines',
    tags: ['nixos', 'build-system']
  },
  {
    content: 'TypeScript 5.0+ const type parameters preserve literal types: function id<const T>(x: T): T',
    tags: ['typescript', 'types']
  },
  // ... 50+ more realistic examples
];

export function generateRealisticMemory() {
  return { ...randomChoice(REALISTIC_MEMORIES) };
}

export function seedDatabase(db, count = 50) {
  const insert = db.prepare(`
    INSERT INTO memories (content, entered_by, created_at)
    VALUES (?, ?, ?)
  `);

  const insertMany = db.transaction((memories) => {
    for (const memory of memories) {
      const result = insert.run(
        memory.content,
        randomChoice(['investigate-agent', 'optimize-agent', 'manual']),
        Date.now() - randomInt(0, 90 * 86400000) // Random within 90 days
      );

      // Link tags
      if (memory.tags) {
        linkTags(db, result.lastInsertRowid, memory.tags);
      }
    }
  });

  const memories = Array.from({ length: count }, () => generateRealisticMemory());
  insertMany(memories);
}

function randomChoice(arr) {
  return arr[Math.floor(Math.random() * arr.length)];
}

function randomInt(min, max) {
  return Math.floor(Math.random() * (max - min + 1)) + min;
}
```

## Running Tests

```bash
# Watch mode (primary workflow)
npm run test:watch

# Run once
npm test

# With coverage
npm run test:coverage

# Specific test file
npm test integration.test.js

# Run in CI (no watch)
npm test -- --run
```

## Coverage Guidelines

**Target: >80% coverage, but favor integration over unit**

**What to measure:**
- Are all major workflows tested? (store, search, list, prune)
- Are edge cases covered? (empty data, expired memories, invalid input)
- Are performance targets met? (<50ms search for Phase 1)

**What NOT to obsess over:**
- 100% line coverage (diminishing returns)
- Testing every internal function (if covered by integration tests)
- Testing framework code (CLI parsing, DB driver)

**Check coverage:**
```bash
npm run test:coverage

# View HTML report
open coverage/index.html
```

## Examples of Good vs Bad Tests

### ✅ Good: Integration Test

```javascript
test('full workflow: store, search, list, prune', () => {
  // Store memories; the second one is already expired
  // (`expires` matches the storeMemory parameter name)
  storeMemory(db, { content: 'Memory 1', tags: 'test' });
  storeMemory(db, { content: 'Memory 2', tags: 'test', expires: Date.now() - 1000 });

  // Before pruning, search still returns the expired memory
  const results = searchMemories(db, 'Memory');
  expect(results).toHaveLength(2); // Both found initially

  // List shows both
  const all = listMemories(db);
  expect(all).toHaveLength(2);

  // Prune removes expired
  const pruned = pruneMemories(db);
  expect(pruned.count).toBe(1);

  // Search now finds only active
  const afterPrune = searchMemories(db, 'Memory');
  expect(afterPrune).toHaveLength(1);
});
```

### ❌ Bad: Over-Testing Implementation

```javascript
// AVOID: Testing internal implementation details
test('parseTagString splits on comma', () => {
  expect(parseTagString('a,b,c')).toEqual(['a', 'b', 'c']);
});

test('normalizeTag converts to lowercase', () => {
  expect(normalizeTag('Docker')).toBe('docker');
});

// These are implementation details already covered by integration tests!
```

### ✅ Good: Unit Test (Justified)

```javascript
// Complex algorithm worth isolated testing
test('levenshtein distance edge cases', () => {
  // Empty strings
  expect(levenshtein('', '')).toBe(0);
  expect(levenshtein('abc', '')).toBe(3);

  // Unicode
  expect(levenshtein('café', 'cafe')).toBe(1);

  // Long strings
  const long1 = 'a'.repeat(1000);
  const long2 = 'a'.repeat(999) + 'b';
  expect(levenshtein(long1, long2)).toBe(1);
});
```

## Debugging Failed Tests

### 1. Use `.only` to Focus

```javascript
test.only('this specific test', () => {
  // Only runs this test
});
```

### 2. Inspect Database State

```javascript
test('debug search', () => {
  storeMemory(db, { content: 'test' });

  // Inspect what's in DB
  const all = db.prepare('SELECT * FROM memories').all();
  console.log('Database contents:', all);

  const results = searchMemories(db, 'test');
  console.log('Search results:', results);

  expect(results).toHaveLength(1);
});
```

### 3. Use Temp File for Manual Inspection

```javascript
test('debug with file', () => {
  const db = new Database('/tmp/debug.db');
  initSchema(db);

  storeMemory(db, { content: 'test' });

  // Now inspect with: sqlite3 /tmp/debug.db
});
```

## Summary

**DO:**
- ✅ Write integration tests for all workflows
- ✅ Use realistic data (50-100 memories)
- ✅ Test with `:memory:` database
- ✅ Run in watch mode (`npm run test:watch`)
- ✅ Verify manually with CLI after tests pass
- ✅ Think twice before writing unit tests

**DON'T:**
- ❌ Test implementation details
- ❌ Write unit tests for simple functions
- ❌ Use toy data (1-2 memories)
- ❌ Mock database or CLI (test the real thing)
- ❌ Aim for 100% coverage at expense of test quality

**Remember:** Integration tests that verify real workflows are worth more than 100 unit tests that verify implementation details.

---

**Testing Philosophy:** Integration-first TDD with realistic data

**Coverage Target:** >80% (mostly integration tests)

**Unit Tests:** Rare, only for complex algorithms

**Workflow:** Write test (fail) → Implement (pass) → Verify (manual) → Refine
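
## Appendix: Levenshtein Sketch

The unit tests in this guide import `levenshtein` from `src/search/fuzzy.js`, but the guide never shows the function. The block below is a minimal sketch of what such a function could look like, assuming plain Levenshtein distance (insert, delete, substitute) computed over Unicode code points. The implementation details are an assumption for illustration; the project's real code may differ.

```javascript
// Hypothetical sketch: classic dynamic-programming Levenshtein distance,
// keeping only the previous row of the table to save memory.
function levenshtein(a, b) {
  // Array.from splits by Unicode code point rather than UTF-16 code unit.
  const s = Array.from(a);
  const t = Array.from(b);
  if (s.length === 0) return t.length;
  if (t.length === 0) return s.length;

  // Row 0: turning an empty prefix of s into the first j characters of t
  // takes j insertions.
  let prev = Array.from({ length: t.length + 1 }, (_, j) => j);

  for (let i = 1; i <= s.length; i++) {
    // Column 0: turning the first i characters of s into '' takes i deletions.
    const curr = [i];
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution, or free when characters match
      );
    }
    prev = curr;
  }
  return prev[t.length];
}
```

Under this definition a swapped pair such as `docker` → `dcoker` costs 2 (two substitutions), while a single dropped letter such as `kubernetes` → `kuberntes` costs 1. A Damerau-Levenshtein variant would count the swap as 1, so the expected values in the unit tests depend on which variant the project picks. To use the sketch with the tests, it would need to be exported from `src/search/fuzzy.js`.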