# LLMemory Testing Guide

## Testing Philosophy: Integration-First TDD

This project uses **integration-first TDD** - we write integration tests that verify real workflows, not unit tests that verify implementation details.

## Core Principles

### 1. Integration Tests Are Primary

**Why:**
- Tests real behavior users/agents will experience
- Less brittle (survives refactoring)
- Higher confidence in the system working correctly
- Catches integration issues early

**Example:**
```javascript
// GOOD: Integration test
test('store and search workflow', () => {
  // Test the actual workflow
  storeMemory(db, { content: 'Docker uses bridge networks', tags: 'docker' });
  const results = searchMemories(db, 'docker');
  expect(results[0].content).toContain('Docker');
});

// AVOID: Over-testing implementation details
test('parseContent returns trimmed string', () => {
  expect(parseContent(' test ')).toBe('test');
});
// ^ This is probably already tested by integration tests
```

### 2. Unit Tests Are Rare

**Only write unit tests for:**
- Complex algorithms (Levenshtein distance, trigram extraction)
- Pure functions with many edge cases
- Critical validation logic

**Don't write unit tests for:**
- Database queries (test via integration)
- CLI argument parsing (test via integration)
- Simple utilities (tag parsing, date formatting)
- Anything already covered by integration tests

**Rule of thumb:** Think twice before writing a unit test. Ask: "Is this already tested by my integration tests?"
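
For example, trigram extraction is a pure function with enough edge cases to earn a focused unit test. A minimal sketch, assuming a hypothetical `extractTrigrams` helper exported from `src/search/fuzzy.js` (name and behavior are illustrative, not the project's actual API):

```javascript
// test/unit/trigrams.test.js -- sketch; extractTrigrams is a hypothetical helper
import { describe, test, expect } from 'vitest';
import { extractTrigrams } from '../../src/search/fuzzy.js'; // assumed export

describe('Trigram Extraction', () => {
  test('extracts overlapping trigrams', () => {
    // Assumes the helper returns every 3-character window, in order
    expect(extractTrigrams('docker')).toEqual(['doc', 'ock', 'cke', 'ker']);
  });

  test('handles short and empty input', () => {
    expect(extractTrigrams('ab')).toEqual([]);
    expect(extractTrigrams('')).toEqual([]);
  });
});
```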

### 3. Test With Realistic Data

**Use real SQLite databases:**
```javascript
beforeEach(() => {
  db = new Database(':memory:'); // Fast, isolated
  initSchema(db);

  // Seed with realistic data
  seedDatabase(db, 50); // 50 realistic memories
});
```
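
These examples assume an `initSchema(db)` helper that creates the tables the tests query. A minimal sketch, inferred from the columns used throughout this guide (the project's real schema may define more):

```javascript
// src/db/schema.js -- illustrative sketch only; the actual schema may differ
export function initSchema(db) {
  db.exec(`
    CREATE TABLE IF NOT EXISTS memories (
      id INTEGER PRIMARY KEY AUTOINCREMENT,
      content TEXT NOT NULL,
      entered_by TEXT,
      created_at INTEGER DEFAULT (strftime('%s','now') * 1000),
      expires_at INTEGER
    );

    CREATE TABLE IF NOT EXISTS tags (
      id INTEGER PRIMARY KEY AUTOINCREMENT,
      name TEXT NOT NULL UNIQUE
    );

    CREATE TABLE IF NOT EXISTS memory_tags (
      memory_id INTEGER NOT NULL REFERENCES memories(id) ON DELETE CASCADE,
      tag_id INTEGER NOT NULL REFERENCES tags(id) ON DELETE CASCADE,
      PRIMARY KEY (memory_id, tag_id)
    );
  `);
}
```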

**Generate realistic test data:**
```javascript
// test/helpers/seed.js
export function generateRealisticMemory() {
  const templates = [
    { content: 'Docker Compose requires explicit subnet config', tags: ['docker', 'networking'] },
    { content: 'PostgreSQL VACUUM FULL locks tables', tags: ['postgresql', 'performance'] },
    { content: 'Git worktree allows parallel branches', tags: ['git', 'workflow'] },
    // 50+ realistic templates
  ];
  return randomChoice(templates);
}
```

**Why:** Tests should reflect real usage, not artificial toy data.

### 4. Watch-Driven Development

**Workflow:**
```bash
# Terminal 1: Watch mode (always running)
npm run test:watch

# Terminal 2: Manual testing
node src/cli.js store "test memory"
```

**Steps:**
1. Write integration test (red/failing)
2. Watch test fail
3. Implement feature
4. Watch test pass (green)
5. Verify manually with CLI
6. Refine based on output

## TDD Workflow Example

### Example: Implementing Store Command

**Step 1: Write Test First**
```javascript
// test/integration.test.js
describe('Store Command', () => {
  let db;

  beforeEach(() => {
    db = new Database(':memory:');
    initSchema(db);
  });

  test('stores memory with tags', () => {
    const result = storeMemory(db, {
      content: 'Docker uses bridge networks',
      tags: 'docker,networking'
    });

    expect(result.id).toBeDefined();

    // Verify in database
    const memory = db.prepare('SELECT * FROM memories WHERE id = ?').get(result.id);
    expect(memory.content).toBe('Docker uses bridge networks');

    // Verify tags linked correctly
    const tags = db.prepare(`
      SELECT t.name FROM tags t
      JOIN memory_tags mt ON t.id = mt.tag_id
      WHERE mt.memory_id = ?
    `).all(result.id);

    expect(tags.map(t => t.name)).toEqual(['docker', 'networking']);
  });

  test('rejects content over 10KB', () => {
    expect(() => {
      storeMemory(db, { content: 'x'.repeat(10001) });
    }).toThrow('Content exceeds 10KB limit');
  });
});
```

**Step 2: Run Test (Watch It Fail)**
```bash
$ npm run test:watch

FAIL test/integration.test.js
  Store Command
    ✕ stores memory with tags (2 ms)
    ✕ rejects content over 10KB (1 ms)

  ● Store Command › stores memory with tags
    ReferenceError: storeMemory is not defined
```

**Step 3: Implement Feature**
```javascript
// src/commands/store.js
export function storeMemory(db, { content, tags, expires, entered_by }) {
  // Validate content
  if (content.length > 10000) {
    throw new Error('Content exceeds 10KB limit');
  }

  // Insert memory (bind null, not undefined, for optional fields)
  const result = db.prepare(`
    INSERT INTO memories (content, entered_by, expires_at)
    VALUES (?, ?, ?)
  `).run(content, entered_by ?? null, expires ?? null);

  // Handle tags
  if (tags) {
    const tagList = tags.split(',').map(t => t.trim().toLowerCase());
    linkTags(db, result.lastInsertRowid, tagList);
  }

  return { id: result.lastInsertRowid };
}
```
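
The implementation above calls a `linkTags` helper that this guide doesn't show. A minimal sketch, assuming the `tags`/`memory_tags` layout that the Step 1 test queries (the project's real helper may differ):

```javascript
// src/db/tags.js -- sketch of the linkTags helper used above; actual code may differ
export function linkTags(db, memoryId, tagList) {
  const insertTag = db.prepare('INSERT OR IGNORE INTO tags (name) VALUES (?)');
  const findTag = db.prepare('SELECT id FROM tags WHERE name = ?');
  const linkTag = db.prepare(
    'INSERT OR IGNORE INTO memory_tags (memory_id, tag_id) VALUES (?, ?)'
  );

  for (const name of tagList) {
    insertTag.run(name);              // Create the tag if it doesn't exist yet
    const { id } = findTag.get(name); // Look up its id
    linkTag.run(memoryId, id);        // Link memory and tag
  }
}
```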

**Step 4: Watch Test Pass**
```bash
PASS test/integration.test.js
  Store Command
    ✓ stores memory with tags (15 ms)
    ✓ rejects content over 10KB (3 ms)

Tests: 2 passed, 2 total
```

**Step 5: Verify Manually**
```bash
$ node src/cli.js store "Docker uses bridge networks" --tags docker,networking
Memory #1 stored successfully

$ node src/cli.js search "docker"
[2025-10-29 12:45] docker, networking
Docker uses bridge networks
```

**Step 6: Refine**
```javascript
// Add more test cases based on manual testing
test('normalizes tags to lowercase', () => {
  storeMemory(db, { content: 'test', tags: 'Docker,NETWORKING' });

  const tags = db.prepare('SELECT name FROM tags').all();
  expect(tags).toEqual([
    { name: 'docker' },
    { name: 'networking' }
  ]);
});
```

## Test Organization

### Directory Structure
```
test/
├── integration.test.js       # PRIMARY - All main workflows
├── unit/
│   ├── fuzzy.test.js         # RARE - Only complex algorithms
│   └── levenshtein.test.js   # RARE - Only complex algorithms
├── helpers/
│   ├── seed.js               # Realistic data generation
│   └── db.js                 # Database setup helpers
└── fixtures/
    └── realistic-memories.js # Memory templates
```

### Integration Test Structure

```javascript
// test/integration.test.js
import { describe, test, expect, beforeEach, afterEach } from 'vitest';
import Database from 'better-sqlite3';
import { storeMemory, searchMemories } from '../src/commands/index.js';
import { initSchema } from '../src/db/schema.js';
import { seedDatabase } from './helpers/seed.js';

describe('Memory System Integration', () => {
  let db;

  beforeEach(() => {
    db = new Database(':memory:');
    initSchema(db);
  });

  afterEach(() => {
    db.close();
  });

  describe('Store and Retrieve', () => {
    test('stores and finds memory', () => {
      storeMemory(db, { content: 'test', tags: 'demo' });
      const results = searchMemories(db, 'test');
      expect(results).toHaveLength(1);
    });
  });

  describe('Search with Filters', () => {
    beforeEach(() => {
      seedDatabase(db, 50); // Realistic data
    });

    test('filters by tags', () => {
      const results = searchMemories(db, 'docker', { tags: ['networking'] });
      results.forEach(r => {
        expect(r.tags).toContain('networking');
      });
    });
  });

  describe('Performance', () => {
    test('searches 100 memories in <50ms', () => {
      seedDatabase(db, 100);

      const start = Date.now();
      searchMemories(db, 'test');
      const duration = Date.now() - start;

      expect(duration).toBeLessThan(50);
    });
  });
});
```

## Unit Test Structure (Rare)

**Only for complex algorithms:**

```javascript
// test/unit/levenshtein.test.js
import { describe, test, expect } from 'vitest';
import { levenshtein } from '../../src/search/fuzzy.js';

describe('Levenshtein Distance', () => {
  test('calculates edit distance correctly', () => {
    expect(levenshtein('docker', 'dcoker')).toBe(2);        // transposition = 2 single-character edits
    expect(levenshtein('kubernetes', 'kuberntes')).toBe(1); // one deletion
    expect(levenshtein('same', 'same')).toBe(0);
  });

  test('handles edge cases', () => {
    expect(levenshtein('', 'hello')).toBe(5);
    expect(levenshtein('a', '')).toBe(1);
    expect(levenshtein('', '')).toBe(0);
  });

  test('handles unicode correctly', () => {
    expect(levenshtein('café', 'cafe')).toBe(1);
  });
});
```
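
For reference, this is the kind of implementation such tests pin down: a standard dynamic-programming sketch that operates on code points, so 'café' vs 'cafe' counts as a single edit. The project's actual implementation may differ:

```javascript
// src/search/fuzzy.js -- one possible implementation; shown for context only
export function levenshtein(a, b) {
  // Split into code points so multi-byte characters count as single edits
  const s = Array.from(a);
  const t = Array.from(b);

  // prev[j] = edit distance between s[0..i-1] and t[0..j-1]
  let prev = Array.from({ length: t.length + 1 }, (_, j) => j);

  for (let i = 1; i <= s.length; i++) {
    const curr = [i];
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution
      );
    }
    prev = curr;
  }

  return prev[t.length];
}
```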

## Test Data Helpers

### Realistic Memory Generation

```javascript
// test/helpers/seed.js
import { linkTags } from '../../src/db/tags.js'; // adjust path to wherever linkTags lives

const REALISTIC_MEMORIES = [
  { content: 'Docker Compose uses bridge networks by default. Custom networks require explicit subnet config.', tags: ['docker', 'networking'] },
  { content: 'PostgreSQL VACUUM FULL locks tables and requires 2x disk space. Use VACUUM ANALYZE for production.', tags: ['postgresql', 'performance'] },
  { content: 'Git worktree allows working on multiple branches without stashing. Use: git worktree add ../branch branch-name', tags: ['git', 'workflow'] },
  { content: 'NixOS flake.lock must be committed to git for reproducible builds across machines', tags: ['nixos', 'build-system'] },
  { content: 'TypeScript 5.0+ const type parameters preserve literal types: function id<const T>(x: T): T', tags: ['typescript', 'types'] },
  // ... 50+ more realistic examples
];

export function generateRealisticMemory() {
  return { ...randomChoice(REALISTIC_MEMORIES) };
}

export function seedDatabase(db, count = 50) {
  const insert = db.prepare(`
    INSERT INTO memories (content, entered_by, created_at)
    VALUES (?, ?, ?)
  `);

  const insertMany = db.transaction((memories) => {
    for (const memory of memories) {
      const result = insert.run(
        memory.content,
        randomChoice(['investigate-agent', 'optimize-agent', 'manual']),
        Date.now() - randomInt(0, 90 * 86400000) // Random timestamp within the last 90 days
      );

      // Link tags
      if (memory.tags) {
        linkTags(db, result.lastInsertRowid, memory.tags);
      }
    }
  });

  const memories = Array.from({ length: count }, () => generateRealisticMemory());
  insertMany(memories);
}

function randomChoice(arr) {
  return arr[Math.floor(Math.random() * arr.length)];
}

function randomInt(min, max) {
  return Math.floor(Math.random() * (max - min + 1)) + min;
}
```

## Running Tests

```bash
# Watch mode (primary workflow)
npm run test:watch

# Run once
npm test

# With coverage
npm run test:coverage

# Specific test file
npm test integration.test.js

# Run in CI (no watch)
npm test -- --run
```
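
These scripts are assumed to wrap Vitest. One plausible `package.json` mapping, shown for orientation only (check the project's actual scripts; `--coverage` additionally needs a coverage provider such as `@vitest/coverage-v8` installed):

```json
{
  "scripts": {
    "test": "vitest run",
    "test:watch": "vitest",
    "test:coverage": "vitest run --coverage"
  }
}
```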

## Coverage Guidelines

**Target: >80% coverage, but favor integration over unit**

**What to measure:**
- Are all major workflows tested? (store, search, list, prune)
- Are edge cases covered? (empty data, expired memories, invalid input)
- Are performance targets met? (<50ms search for Phase 1)

**What NOT to obsess over:**
- 100% line coverage (diminishing returns)
- Testing every internal function (if covered by integration tests)
- Testing framework code (CLI parsing, DB driver)

**Check coverage:**
```bash
npm run test:coverage

# View HTML report
open coverage/index.html
```
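
If you want the >80% target enforced rather than just reported, Vitest can fail the run when thresholds aren't met. A sketch of a possible `vitest.config.js`, assuming the `@vitest/coverage-v8` provider (the project may configure this differently):

```javascript
// vitest.config.js -- possible setup; adjust to the project's actual config
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    coverage: {
      provider: 'v8',            // requires @vitest/coverage-v8
      reporter: ['text', 'html'],
      thresholds: {
        lines: 80,
        functions: 80,
        branches: 80,
        statements: 80
      }
    }
  }
});
```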

## Examples of Good vs Bad Tests

### ✅ Good: Integration Test
```javascript
test('full workflow: store, search, list, prune', () => {
  // Store one active and one already-expired memory
  storeMemory(db, { content: 'Memory 1', tags: 'test' });
  storeMemory(db, { content: 'Memory 2', tags: 'test', expires: Date.now() - 1000 });

  // Search finds both before pruning
  const results = searchMemories(db, 'Memory');
  expect(results).toHaveLength(2);

  // List shows both
  const all = listMemories(db);
  expect(all).toHaveLength(2);

  // Prune removes the expired one
  const pruned = pruneMemories(db);
  expect(pruned.count).toBe(1);

  // Search now finds only the active memory
  const afterPrune = searchMemories(db, 'Memory');
  expect(afterPrune).toHaveLength(1);
});
```

### ❌ Bad: Over-Testing Implementation
```javascript
// AVOID: Testing internal implementation details
test('parseTagString splits on comma', () => {
  expect(parseTagString('a,b,c')).toEqual(['a', 'b', 'c']);
});

test('normalizeTag converts to lowercase', () => {
  expect(normalizeTag('Docker')).toBe('docker');
});

// These are implementation details already covered by integration tests!
```

### ✅ Good: Unit Test (Justified)
```javascript
// Complex algorithm worth isolated testing
test('levenshtein distance edge cases', () => {
  // Empty strings
  expect(levenshtein('', '')).toBe(0);
  expect(levenshtein('abc', '')).toBe(3);

  // Unicode
  expect(levenshtein('café', 'cafe')).toBe(1);

  // Long strings
  const long1 = 'a'.repeat(1000);
  const long2 = 'a'.repeat(999) + 'b';
  expect(levenshtein(long1, long2)).toBe(1);
});
```

## Debugging Failed Tests

### 1. Use `.only` to Focus
```javascript
test.only('this specific test', () => {
  // Only runs this test
});
```

### 2. Inspect Database State
```javascript
test('debug search', () => {
  storeMemory(db, { content: 'test' });

  // Inspect what's in the DB
  const all = db.prepare('SELECT * FROM memories').all();
  console.log('Database contents:', all);

  const results = searchMemories(db, 'test');
  console.log('Search results:', results);

  expect(results).toHaveLength(1);
});
```

### 3. Use Temp File for Manual Inspection
```javascript
test('debug with file', () => {
  const db = new Database('/tmp/debug.db');
  initSchema(db);

  storeMemory(db, { content: 'test' });

  // Now inspect with: sqlite3 /tmp/debug.db
});
```

## Summary

**DO:**
- ✅ Write integration tests for all workflows
- ✅ Use realistic data (50-100 memories)
- ✅ Test with a `:memory:` database
- ✅ Run in watch mode (`npm run test:watch`)
- ✅ Verify manually with the CLI after tests pass
- ✅ Think twice before writing unit tests

**DON'T:**
- ❌ Test implementation details
- ❌ Write unit tests for simple functions
- ❌ Use toy data (1-2 memories)
- ❌ Mock the database or CLI (test the real thing)
- ❌ Aim for 100% coverage at the expense of test quality

**Remember:** Integration tests that verify real workflows are worth more than 100 unit tests that verify implementation details.

---

**Testing Philosophy:** Integration-first TDD with realistic data
**Coverage Target:** >80% (mostly integration tests)
**Unit Tests:** Rare, only for complex algorithms
**Workflow:** Write test (fail) → Implement (pass) → Verify (manual) → Refine