nixos/shared/linked-dotfiles/opencode/llmemory/docs/TESTING.md
2025-10-29 18:46:16 -06:00

# LLMemory Testing Guide
## Testing Philosophy: Integration-First TDD
This project uses **integration-first TDD** - we write integration tests that verify real workflows, not unit tests that verify implementation details.
## Core Principles
### 1. Integration Tests Are Primary
**Why:**
- Tests real behavior users/agents will experience
- Less brittle (survives refactoring)
- Higher confidence in system working correctly
- Catches integration issues early

**Example:**
```javascript
// GOOD: Integration test
test('store and search workflow', () => {
  // Test the actual workflow
  storeMemory(db, { content: 'Docker uses bridge networks', tags: 'docker' });
  const results = searchMemories(db, 'docker');
  expect(results[0].content).toContain('Docker');
});

// AVOID: Over-testing implementation details
test('parseContent returns trimmed string', () => {
  expect(parseContent(' test ')).toBe('test');
});
// ^ This is probably already tested by integration tests
```
### 2. Unit Tests Are Rare
**Only write unit tests for:**
- Complex algorithms (Levenshtein distance, trigram extraction)
- Pure functions with many edge cases
- Critical validation logic

**Don't write unit tests for:**
- Database queries (test via integration)
- CLI argument parsing (test via integration)
- Simple utilities (tag parsing, date formatting)
- Anything already covered by integration tests

**Rule of thumb:** Think twice before writing a unit test. Ask: "Is this already tested by my integration tests?"
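As an illustration of the kind of algorithm that may justify an isolated unit test, here is a minimal sketch of trigram extraction. The `extractTrigrams` and `trigramSimilarity` names, the lowercasing, and the use of Jaccard similarity are assumptions made for illustration, not the project's actual implementation.

```javascript
// Hypothetical helper: collect the set of 3-character substrings of a string.
function extractTrigrams(text) {
  const chars = Array.from(text.toLowerCase()); // split by code point, not UTF-16 unit
  const trigrams = new Set();
  for (let i = 0; i + 3 <= chars.length; i++) {
    trigrams.add(chars.slice(i, i + 3).join(''));
  }
  return trigrams;
}

// Hypothetical helper: Jaccard similarity between two trigram sets (0 to 1).
function trigramSimilarity(a, b) {
  const setA = extractTrigrams(a);
  const setB = extractTrigrams(b);
  // Strings shorter than 3 characters have no trigrams to compare.
  if (setA.size === 0 || setB.size === 0) return 0;
  let shared = 0;
  for (const t of setA) {
    if (setB.has(t)) shared++;
  }
  return shared / (setA.size + setB.size - shared);
}

console.log([...extractTrigrams('Docker')]); // [ 'doc', 'ock', 'cke', 'ker' ]
```

Short inputs, repeated substrings, and mixed case are the sort of edge cases that make a pure function like this a reasonable candidate for a focused unit test.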
### 3. Test With Realistic Data
**Use real SQLite databases:**
```javascript
beforeEach(() => {
  db = new Database(':memory:'); // Fast, isolated
  initSchema(db);
  // Seed with realistic data
  seedDatabase(db, 50); // 50 realistic memories
});
```
**Generate realistic test data:**
```javascript
// test/helpers/seed.js
export function generateRealisticMemory() {
  const templates = [
    { content: 'Docker Compose requires explicit subnet config', tags: ['docker', 'networking'] },
    { content: 'PostgreSQL VACUUM FULL locks tables', tags: ['postgresql', 'performance'] },
    { content: 'Git worktree allows parallel branches', tags: ['git', 'workflow'] },
    // 50+ realistic templates
  ];
  return randomChoice(templates);
}
```
**Why:** Tests should reflect real usage, not artificial toy data.
### 4. Watch-Driven Development
**Workflow:**
```bash
# Terminal 1: Watch mode (always running)
npm run test:watch
# Terminal 2: Manual testing
node src/cli.js store "test memory"
```
**Steps:**
1. Write integration test (red/failing)
2. Watch test fail
3. Implement feature
4. Watch test pass (green)
5. Verify manually with CLI
6. Refine based on output
## TDD Workflow Example
### Example: Implementing Store Command
**Step 1: Write Test First**
```javascript
// test/integration.test.js
describe('Store Command', () => {
  let db;

  beforeEach(() => {
    db = new Database(':memory:');
    initSchema(db);
  });

  test('stores memory with tags', () => {
    const result = storeMemory(db, {
      content: 'Docker uses bridge networks',
      tags: 'docker,networking'
    });
    expect(result.id).toBeDefined();

    // Verify in database
    const memory = db.prepare('SELECT * FROM memories WHERE id = ?').get(result.id);
    expect(memory.content).toBe('Docker uses bridge networks');

    // Verify tags linked correctly (ORDER BY keeps the assertion deterministic)
    const tags = db.prepare(`
      SELECT t.name FROM tags t
      JOIN memory_tags mt ON t.id = mt.tag_id
      WHERE mt.memory_id = ?
      ORDER BY t.name
    `).all(result.id);
    expect(tags.map(t => t.name)).toEqual(['docker', 'networking']);
  });

  test('rejects content over 10KB', () => {
    expect(() => {
      storeMemory(db, { content: 'x'.repeat(10001) });
    }).toThrow('Content exceeds 10KB limit');
  });
});
```
**Step 2: Run Test (Watch It Fail)**
```bash
$ npm run test:watch
FAIL test/integration.test.js
Store Command
✕ stores memory with tags (2 ms)
✕ rejects content over 10KB (1 ms)
● Store Command stores memory with tags
ReferenceError: storeMemory is not defined
```
**Step 3: Implement Feature**
```javascript
// src/commands/store.js
export function storeMemory(db, { content, tags, expires, entered_by }) {
  // Validate content
  if (content.length > 10000) {
    throw new Error('Content exceeds 10KB limit');
  }

  // Insert memory. Omitted optional fields are bound as NULL,
  // because better-sqlite3 does not accept undefined as a bind value.
  const result = db.prepare(`
    INSERT INTO memories (content, entered_by, expires_at)
    VALUES (?, ?, ?)
  `).run(content, entered_by ?? null, expires ?? null);

  // Handle tags
  if (tags) {
    const tagList = tags.split(',').map(t => t.trim().toLowerCase());
    linkTags(db, result.lastInsertRowid, tagList);
  }

  return { id: result.lastInsertRowid };
}
```
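Note that `content.length` counts UTF-16 code units, not bytes, so non-ASCII content can exceed 10,000 bytes and still pass the check above. If the limit is meant in bytes, one option is to measure the UTF-8 size instead. The sketch below assumes a hypothetical `assertContentSize` helper and a 10,000-byte limit chosen to match the test; neither is confirmed by the project.

```javascript
// Assumed limit, matching the 10001-character rejection in the test above.
const MAX_CONTENT_BYTES = 10000;

// Hypothetical helper: throw if the UTF-8 encoding of content is too large.
function assertContentSize(content) {
  const bytes = Buffer.byteLength(content, 'utf8'); // Buffer is a Node.js global
  if (bytes > MAX_CONTENT_BYTES) {
    throw new Error('Content exceeds 10KB limit');
  }
  return bytes;
}

console.log('\u00e9'.repeat(6000).length);           // 6000 code units
console.log(assertContentSize('x'.repeat(10000)));   // 10000 bytes, accepted

try {
  assertContentSize('\u00e9'.repeat(6000)); // 12000 bytes in UTF-8
} catch (err) {
  console.log(err.message); // Content exceeds 10KB limit
}
```

Either definition of the limit works as long as the test and the implementation agree on it.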
**Step 4: Watch Test Pass**
```bash
PASS test/integration.test.js
Store Command
✓ stores memory with tags (15 ms)
✓ rejects content over 10KB (3 ms)
Tests: 2 passed, 2 total
```
**Step 5: Verify Manually**
```bash
$ node src/cli.js store "Docker uses bridge networks" --tags docker,networking
Memory #1 stored successfully
$ node src/cli.js search "docker"
[2025-10-29 12:45] docker, networking
Docker uses bridge networks
```
**Step 6: Refine**
```javascript
// Add more test cases based on manual testing
test('normalizes tags to lowercase', () => {
  storeMemory(db, { content: 'test', tags: 'Docker,NETWORKING' });
  // ORDER BY keeps the assertion independent of row order
  const tags = db.prepare('SELECT name FROM tags ORDER BY name').all();
  expect(tags).toEqual([
    { name: 'docker' },
    { name: 'networking' }
  ]);
});
```
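Manual testing often surfaces inputs the first test missed, such as trailing commas, stray whitespace, or duplicate tags. A minimal sketch of a more defensive normalization step follows. `normalizeTags` is a hypothetical helper, and dropping empty and duplicate tags is an assumed policy rather than documented behavior. In keeping with this guide, it would still be exercised through `storeMemory` in an integration test rather than tested in isolation.

```javascript
// Hypothetical helper: turn 'Docker, NETWORKING,,docker' into ['docker', 'networking'].
function normalizeTags(tags) {
  if (!tags) return [];
  // Accept either a comma-separated string or an array of strings.
  const raw = Array.isArray(tags) ? tags : tags.split(',');
  const cleaned = raw
    .map(t => t.trim().toLowerCase())
    .filter(t => t.length > 0); // drop empty entries from ',,' or trailing commas
  return [...new Set(cleaned)]; // remove duplicates, keeping first-seen order
}

console.log(normalizeTags('Docker, NETWORKING,,docker')); // [ 'docker', 'networking' ]
```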
## Test Organization
### Directory Structure
```
test/
├── integration.test.js        # PRIMARY - All main workflows
├── unit/
│   ├── fuzzy.test.js          # RARE - Only complex algorithms
│   └── levenshtein.test.js    # RARE - Only complex algorithms
├── helpers/
│   ├── seed.js                # Realistic data generation
│   └── db.js                  # Database setup helpers
└── fixtures/
    └── realistic-memories.js  # Memory templates
```
### Integration Test Structure
```javascript
// test/integration.test.js
import { describe, test, expect, beforeEach, afterEach } from 'vitest';
import Database from 'better-sqlite3';
import { storeMemory, searchMemories } from '../src/commands/index.js';
import { initSchema } from '../src/db/schema.js';
import { seedDatabase } from './helpers/seed.js';

describe('Memory System Integration', () => {
  let db;

  beforeEach(() => {
    db = new Database(':memory:');
    initSchema(db);
  });

  afterEach(() => {
    db.close();
  });

  describe('Store and Retrieve', () => {
    test('stores and finds memory', () => {
      storeMemory(db, { content: 'test', tags: 'demo' });
      const results = searchMemories(db, 'test');
      expect(results).toHaveLength(1);
    });
  });

  describe('Search with Filters', () => {
    beforeEach(() => {
      seedDatabase(db, 50); // Realistic data
    });

    test('filters by tags', () => {
      const results = searchMemories(db, 'docker', { tags: ['networking'] });
      results.forEach(r => {
        expect(r.tags).toContain('networking');
      });
    });
  });

  describe('Performance', () => {
    test('searches 100 memories in <50ms', () => {
      seedDatabase(db, 100);
      // performance.now() is monotonic and has sub-millisecond resolution
      const start = performance.now();
      searchMemories(db, 'test');
      const duration = performance.now() - start;
      expect(duration).toBeLessThan(50);
    });
  });
});
```
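A single wall-clock measurement like the one in the performance test can be noisy on a busy machine or CI runner. One way to reduce flakiness is to time several runs and assert on the median. This is a sketch only: `medianDurationMs` is a hypothetical helper, and the default of 5 runs is an arbitrary assumption.

```javascript
// Hypothetical helper: run fn several times and return the median duration in ms.
function medianDurationMs(fn, runs = 5) {
  if (!Number.isInteger(runs) || runs < 1) {
    throw new RangeError('runs must be a positive integer');
  }
  const durations = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now(); // global in Node 16+
    fn();
    durations.push(performance.now() - start);
  }
  durations.sort((a, b) => a - b);
  const mid = Math.floor(durations.length / 2);
  return durations.length % 2 === 1
    ? durations[mid]
    : (durations[mid - 1] + durations[mid]) / 2;
}

// Stand-in workload; in a test this would be () => searchMemories(db, 'test').
const medianMs = medianDurationMs(() => {
  let sum = 0;
  for (let i = 0; i < 1e5; i++) sum += i;
  return sum;
});
console.log(`median: ${medianMs.toFixed(3)} ms`);
```

In the performance test, the assertion would then become `expect(medianDurationMs(() => searchMemories(db, 'test'))).toBeLessThan(50)`.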
## Unit Test Structure (Rare)
**Only for complex algorithms:**
```javascript
// test/unit/levenshtein.test.js
import { describe, test, expect } from 'vitest';
import { levenshtein } from '../../src/search/fuzzy.js';

describe('Levenshtein Distance', () => {
  test('calculates edit distance correctly', () => {
    // A transposition counts as two substitutions
    expect(levenshtein('docker', 'dcoker')).toBe(2);
    // A single missing character is one deletion
    expect(levenshtein('kubernetes', 'kuberntes')).toBe(1);
    expect(levenshtein('same', 'same')).toBe(0);
  });

  test('handles edge cases', () => {
    expect(levenshtein('', 'hello')).toBe(5);
    expect(levenshtein('a', '')).toBe(1);
    expect(levenshtein('', '')).toBe(0);
  });

  test('handles unicode correctly', () => {
    expect(levenshtein('café', 'cafe')).toBe(1);
  });
});
```
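For reference, the classic dynamic-programming edit distance counts insertions, deletions, and substitutions, so a transposition such as `docker` to `dcoker` costs 2. The sketch below is one possible implementation, not necessarily the one in `src/search/fuzzy.js`. Comparing by code point via `Array.from` is an assumption, and no Unicode normalization is applied.

```javascript
// One possible implementation: two-row dynamic programming over code points.
function levenshtein(a, b) {
  const s = Array.from(a);
  const t = Array.from(b);
  if (s.length === 0) return t.length;
  if (t.length === 0) return s.length;

  // prev[j] = distance between the first i-1 chars of s and the first j chars of t
  let prev = Array.from({ length: t.length + 1 }, (_, j) => j);
  for (let i = 1; i <= s.length; i++) {
    const curr = [i];
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[t.length];
}

console.log(levenshtein('docker', 'dcoker')); // 2
console.log(levenshtein('', 'hello'));        // 5
```

Inputs are compared as-is: precomposed and decomposed forms of the same character count as different unless both strings are normalized first, for example with `normalize('NFC')`.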
## Test Data Helpers
### Realistic Memory Generation
```javascript
// test/helpers/seed.js
// NOTE: linkTags is not defined in this file. It is assumed to be imported
// from the module that implements tag linking for storeMemory.

const REALISTIC_MEMORIES = [
  { content: 'Docker Compose uses bridge networks by default. Custom networks require explicit subnet config.', tags: ['docker', 'networking'] },
  { content: 'PostgreSQL VACUUM FULL locks tables and requires 2x disk space. Use VACUUM ANALYZE for production.', tags: ['postgresql', 'performance'] },
  { content: 'Git worktree allows working on multiple branches without stashing. Use: git worktree add ../branch branch-name', tags: ['git', 'workflow'] },
  { content: 'NixOS flake.lock must be committed to git for reproducible builds across machines', tags: ['nixos', 'build-system'] },
  { content: 'TypeScript 5.0+ const type parameters preserve literal types: function id<const T>(x: T): T', tags: ['typescript', 'types'] },
  // ... 50+ more realistic examples
];

export function generateRealisticMemory() {
  return { ...randomChoice(REALISTIC_MEMORIES) };
}

export function seedDatabase(db, count = 50) {
  const insert = db.prepare(`
    INSERT INTO memories (content, entered_by, created_at)
    VALUES (?, ?, ?)
  `);

  // Wrap all inserts in one transaction so seeding stays fast
  const insertMany = db.transaction((memories) => {
    for (const memory of memories) {
      const result = insert.run(
        memory.content,
        randomChoice(['investigate-agent', 'optimize-agent', 'manual']),
        Date.now() - randomInt(0, 90 * 86400000) // Random within 90 days
      );
      // Link tags
      if (memory.tags) {
        linkTags(db, result.lastInsertRowid, memory.tags);
      }
    }
  });

  const memories = Array.from({ length: count }, () => generateRealisticMemory());
  insertMany(memories);
}

function randomChoice(arr) {
  return arr[Math.floor(Math.random() * arr.length)];
}

function randomInt(min, max) {
  return Math.floor(Math.random() * (max - min + 1)) + min;
}
```
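Because `randomChoice` and `randomInt` draw from `Math.random()`, each run seeds a different data set, which can make a failure hard to reproduce. If that becomes a problem, one option is a small seeded generator. The sketch below uses the well-known mulberry32 generator. `createRandom` and its method names are hypothetical and not part of the project's helpers.

```javascript
// mulberry32: a tiny deterministic PRNG returning floats in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function next() {
    a = (a + 0x6D2B79F5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical factory: seeded equivalents of randomChoice and randomInt.
function createRandom(seed) {
  const next = mulberry32(seed);
  return {
    choice: arr => arr[Math.floor(next() * arr.length)],
    int: (min, max) => Math.floor(next() * (max - min + 1)) + min,
  };
}

const first = createRandom(42);
const second = createRandom(42);
const sampleTags = ['docker', 'git', 'nixos'];
console.log(first.choice(sampleTags) === second.choice(sampleTags)); // true: same seed, same sequence
console.log(first.int(0, 90) === second.int(0, 90));                 // true
```

Logging the seed when a test fails would then be enough to regenerate the same data set.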
## Running Tests
```bash
# Watch mode (primary workflow)
npm run test:watch
# Run once
npm test
# With coverage
npm run test:coverage
# Specific test file
npm test integration.test.js
# Run in CI (no watch)
npm test -- --run
```
## Coverage Guidelines
**Target: >80% coverage, but favor integration over unit**
**What to measure:**
- Are all major workflows tested? (store, search, list, prune)
- Are edge cases covered? (empty data, expired memories, invalid input)
- Are performance targets met? (<50ms search for Phase 1)

**What NOT to obsess over:**
- 100% line coverage (diminishing returns)
- Testing every internal function (if covered by integration tests)
- Testing framework code (CLI parsing, DB driver)

**Check coverage:**
```bash
npm run test:coverage
# View HTML report (macOS; use xdg-open on Linux)
open coverage/index.html
```
## Examples of Good vs Bad Tests
### ✅ Good: Integration Test
```javascript
test('full workflow: store, search, list, prune', () => {
  // Store memories (the second one has already expired)
  storeMemory(db, { content: 'Memory 1', tags: 'test' });
  storeMemory(db, { content: 'Memory 2', tags: 'test', expires: Date.now() - 1000 });

  // Before pruning, search returns both memories
  const results = searchMemories(db, 'Memory');
  expect(results).toHaveLength(2);

  // List shows both
  const all = listMemories(db);
  expect(all).toHaveLength(2);

  // Prune removes expired
  const pruned = pruneMemories(db);
  expect(pruned.count).toBe(1);

  // Search now finds only active
  const afterPrune = searchMemories(db, 'Memory');
  expect(afterPrune).toHaveLength(1);
});
```
### ❌ Bad: Over-Testing Implementation
```javascript
// AVOID: Testing internal implementation details
test('parseTagString splits on comma', () => {
  expect(parseTagString('a,b,c')).toEqual(['a', 'b', 'c']);
});

test('normalizeTag converts to lowercase', () => {
  expect(normalizeTag('Docker')).toBe('docker');
});
// These are implementation details already covered by integration tests!
```
### ✅ Good: Unit Test (Justified)
```javascript
// Complex algorithm worth isolated testing
test('levenshtein distance edge cases', () => {
  // Empty strings
  expect(levenshtein('', '')).toBe(0);
  expect(levenshtein('abc', '')).toBe(3);

  // Unicode
  expect(levenshtein('café', 'cafe')).toBe(1);

  // Long strings
  const long1 = 'a'.repeat(1000);
  const long2 = 'a'.repeat(999) + 'b';
  expect(levenshtein(long1, long2)).toBe(1);
});
```
## Debugging Failed Tests
### 1. Use `.only` to Focus
```javascript
test.only('this specific test', () => {
  // Only runs this test
});
```
### 2. Inspect Database State
```javascript
test('debug search', () => {
  storeMemory(db, { content: 'test' });

  // Inspect what's in DB
  const all = db.prepare('SELECT * FROM memories').all();
  console.log('Database contents:', all);

  const results = searchMemories(db, 'test');
  console.log('Search results:', results);
  expect(results).toHaveLength(1);
});
```
### 3. Use Temp File for Manual Inspection
```javascript
test('debug with file', () => {
  const db = new Database('/tmp/debug.db');
  initSchema(db);
  storeMemory(db, { content: 'test' });
  // Now inspect with: sqlite3 /tmp/debug.db
});
```
## Summary
**DO:**
- Write integration tests for all workflows
- Use realistic data (50-100 memories)
- Test with `:memory:` database
- Run in watch mode (`npm run test:watch`)
- Verify manually with CLI after tests pass
- Think twice before writing unit tests

**DON'T:**
- Test implementation details
- Write unit tests for simple functions
- Use toy data (1-2 memories)
- Mock database or CLI (test the real thing)
- Aim for 100% coverage at expense of test quality

**Remember:** Integration tests that verify real workflows are worth more than 100 unit tests that verify implementation details.

---
**Testing Philosophy:** Integration-first TDD with realistic data
**Coverage Target:** >80% (mostly integration tests)
**Unit Tests:** Rare, only for complex algorithms
**Workflow:** Write test (fail) → Implement (pass) → Verify (manual) → Refine