SPIKE Investigation Workflow
What is a SPIKE?
A SPIKE ticket is a time-boxed research and investigation task. The goal is to explore a problem space, evaluate solution approaches, and create an actionable plan for implementation.
SPIKE = Investigation only. No code changes.
Key Principles
1. Exploration Over Implementation
- Focus on understanding the problem deeply
- Consider multiple solution approaches (3-5)
- Don't commit to first idea
- Think creatively about alternatives
2. Documentation Over Code
- Document findings thoroughly
- Provide specific code references (file:line)
- Explain trade-offs objectively
- Create actionable implementation plan
3. Developer Approval Required
- Always review findings with developer before creating tickets
- Developer has final say on implementation approach
- Get explicit approval before creating follow-up tickets
- Typically results in just 1 follow-up ticket
4. No Code Changes
- ✅ Read and explore codebase
- ✅ Document findings
- ✅ Create implementation plan
- ❌ Write implementation code
- ❌ Create git worktree
- ❌ Create PR
Investigation Process
Phase 1: Problem Understanding
Understand current state:
- Read ticket description thoroughly
- Explore relevant codebase areas
- Identify constraints and dependencies
- Document current implementation
Ask questions:
- What problem are we solving?
- Who is affected?
- What are the constraints?
- What's the desired outcome?
Phase 2: Approach Exploration
Explore 3-5 different approaches:
For each approach, document:
- Name: Brief descriptive name
- Description: How it works
- Pros: Benefits and advantages
- Cons: Drawbacks and challenges
- Effort: Relative complexity (S/M/L/XL)
- Code locations: Specific file:line references
Think broadly:
- Conventional approaches
- Creative/unconventional approaches
- Simple vs. complex solutions
- Short-term vs. long-term solutions
Phase 3: Trade-off Analysis
Evaluate objectively:
- Implementation complexity
- Performance implications
- Maintenance burden
- Testing requirements
- Migration/rollout complexity
- Team familiarity with approach
- Long-term sustainability
Be honest about cons:
- Every approach has trade-offs
- Document them clearly
- Don't hide problems
Phase 4: Recommendation
Make clear recommendation:
- Which approach is best
- Why it's superior to alternatives
- Key risks and mitigations
- Confidence level (Low/Medium/High)
Justify recommendation:
- Reference specific trade-offs
- Explain why pros outweigh cons
- Consider team context
Phase 5: Implementation Planning
Create actionable plan:
- Typically breaks down into 1 follow-up ticket
- Occasionally 2-3 if clearly independent tasks
- Never many vague tickets
For each ticket, include:
- Clear summary
- Detailed description
- Recommended approach
- Acceptance criteria
- Code references from investigation
- Effort estimate (S/M/L/XL)
Investigation Output Template
## Investigation Findings - PI-XXXXX
### Problem Analysis
[Current state description with file:line references]
[Problem statement]
[Constraints and requirements]
### Approaches Considered
#### 1. [Approach Name]
- **Description**: [How it works]
- **Pros**:
  - [Benefit 1]
  - [Benefit 2]
- **Cons**:
  - [Drawback 1]
  - [Drawback 2]
- **Effort**: [S/M/L/XL]
- **Code**: [file.ext:123, file.ext:456]
#### 2. [Approach Name]
[Repeat structure for each approach]
[Continue for 3-5 approaches]
### Recommendation
**Recommended Approach**: [Approach Name]
**Justification**: [Why this is best, referencing specific trade-offs]
**Risks**:
- [Risk 1]: [Mitigation]
- [Risk 2]: [Mitigation]
**Confidence**: [Low/Medium/High]
### Proposed Implementation
Typically **1 follow-up ticket**:
**Summary**: [Concise task description]
**Description**:
[Detailed implementation plan]
[Step-by-step approach]
[Key considerations]
**Acceptance Criteria**:
- [ ] [Criterion 1]
- [ ] [Criterion 2]
- [ ] [Criterion 3]
**Effort Estimate**: [S/M/L/XL]
**Code References**:
- [file.ext:123 - Description]
- [file.ext:456 - Description]
### References
- [Documentation link]
- [Related ticket]
- [External resource]
Example SPIKE Investigation
Problem
Performance degradation in user search with large datasets (10k+ users)
Approaches Considered
1. Database Query Optimization
- Description: Add indexes, optimize JOIN queries, use query caching
- Pros:
  - Minimal code changes
  - Works with existing architecture
  - Can be implemented incrementally
- Cons:
  - Limited scalability (still hits DB for each search)
  - Query complexity increases with features
  - Cache invalidation complexity
- Effort: M
- Code: user_service.go:245, user_repository.go:89
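To make approach 1 concrete, here is a minimal sketch of a composite index plus one query that replaces several round trips. The index definition, the `accounts` join, and the column names are illustrative assumptions rather than the real schema; the actual predicate (and therefore the right index type) would come from the existing search path in user_service.go.

```go
// Illustrative sketch only: approach 1 as a composite index plus a single
// optimized query. Table and column names, and the accounts join, are
// assumptions, not the project's actual schema.
package user

import (
	"context"
	"database/sql"
)

// Assumed migration (for prefix or fuzzy search a text_pattern_ops or
// trigram index may be needed instead of a plain btree):
//
//   CREATE INDEX idx_users_search ON users (last_name, first_name, email);

// SearchUsers issues one indexed query instead of separate lookups per row.
func SearchUsers(ctx context.Context, db *sql.DB, lastName string, limit int) (*sql.Rows, error) {
	const q = `
		SELECT u.id, u.last_name, u.first_name, u.email
		FROM users u
		JOIN accounts a ON a.user_id = u.id AND a.active
		WHERE u.last_name = $1
		ORDER BY u.last_name, u.first_name
		LIMIT $2`
	return db.QueryContext(ctx, q, lastName, limit)
}
```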
2. Elasticsearch Integration
- Description: Index users in Elasticsearch, use for all search operations
- Pros:
  - Excellent search performance at scale
  - Full-text search capabilities
  - Faceted search support
- Cons:
  - New infrastructure to maintain
  - Data sync complexity
  - Team learning curve
  - Higher operational cost
- Effort: XL
- Code: Would be new service, interfaces at user_service.go:200
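For a sense of what approach 2 would involve, the sketch below indexes and searches user documents over Elasticsearch's REST API using plain net/http. The index name, field names, and cluster address are assumptions; a real implementation would likely use an official client and add authentication, retries, and bulk indexing.

```go
// Illustrative sketch only: mirror users into an Elasticsearch index and
// query it over the REST API. Index name, field names, and the cluster
// address are assumptions for this example.
package search

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
)

const esBase = "http://localhost:9200" // assumed cluster address

// IndexUser upserts one user document (pre-serialized JSON) into "users".
func IndexUser(ctx context.Context, id string, doc []byte) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodPut,
		fmt.Sprintf("%s/users/_doc/%s", esBase, id), bytes.NewReader(doc))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("index user %s: %s", id, resp.Status)
	}
	return nil
}

// SearchUsers runs a multi_match query across the indexed name/email fields.
func SearchUsers(ctx context.Context, term string) (*http.Response, error) {
	body, err := json.Marshal(map[string]any{
		"query": map[string]any{
			"multi_match": map[string]any{
				"query":  term,
				"fields": []string{"first_name", "last_name", "email"},
			},
		},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost,
		esBase+"/users/_search", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return http.DefaultClient.Do(req)
}
```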
3. In-Memory Cache with Background Sync
- Description: Maintain searchable user cache in memory, sync periodically
- Pros:
  - Very fast search performance
  - No additional infrastructure
  - Simple implementation
- Cons:
  - Memory usage on app servers
  - Eventual consistency issues
  - Cache warming on deploy
  - Doesn't scale past single-server memory
- Effort: L
- Code: New cache_service.go, integrate at user_service.go:245
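A minimal sketch of approach 3, assuming a hypothetical `User` type and a `loadAll` callback standing in for the existing repository; the refresh interval is the knob that controls how stale results can get, which is the eventual-consistency trade-off listed above.

```go
// Illustrative sketch only: a searchable in-memory snapshot of users that
// is refreshed on a timer. User and loadAll are assumptions standing in
// for the real repository types.
package search

import (
	"context"
	"strings"
	"sync"
	"time"
)

type User struct {
	ID, FirstName, LastName, Email string
}

type UserCache struct {
	mu    sync.RWMutex
	users []User
}

// Run loads the snapshot immediately, then refreshes it every interval
// until ctx is cancelled.
func (c *UserCache) Run(ctx context.Context, interval time.Duration,
	loadAll func(context.Context) ([]User, error)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if users, err := loadAll(ctx); err == nil {
			c.mu.Lock()
			c.users = users
			c.mu.Unlock()
		}
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

// Search scans the in-memory snapshot; results may lag the database by up
// to one refresh interval.
func (c *UserCache) Search(term string) []User {
	term = strings.ToLower(term)
	c.mu.RLock()
	defer c.mu.RUnlock()
	var out []User
	for _, u := range c.users {
		if strings.Contains(strings.ToLower(u.LastName+" "+u.FirstName+" "+u.Email), term) {
			out = append(out, u)
		}
	}
	return out
}
```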
4. Materialized View with Triggers
- Description: Database materialized view optimized for search, auto-updated via triggers
- Pros:
  - Good performance
  - Consistent data
  - Minimal app code changes
- Cons:
  - Database-specific (PostgreSQL only)
  - Trigger complexity
  - Harder to debug issues
  - Lock contention on high write volume
- Effort: M
- Code: Migration needed, user_repository.go:89
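A rough sketch of the migration approach 4 implies, with assumed object names. Refreshing the whole view from a statement-level trigger is the simplest form and is exactly where the lock-contention concern above comes from; a production version might instead debounce refreshes via NOTIFY and a background worker.

```go
// Illustrative sketch only: a PostgreSQL materialized view kept fresh by a
// statement-level trigger. All object names are assumptions.
package migrations

import (
	"context"
	"database/sql"
)

const userSearchView = `
CREATE MATERIALIZED VIEW user_search_mv AS
SELECT u.id, u.last_name, u.first_name, u.email
FROM users u;

CREATE INDEX idx_user_search_mv_name
    ON user_search_mv (last_name, first_name);

CREATE FUNCTION refresh_user_search_mv() RETURNS trigger AS $$
BEGIN
    REFRESH MATERIALIZED VIEW user_search_mv;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER users_refresh_search_mv
    AFTER INSERT OR UPDATE OR DELETE ON users
    FOR EACH STATEMENT EXECUTE FUNCTION refresh_user_search_mv();
`

// Apply runs the script in one call; this assumes a driver that accepts
// multi-statement SQL (e.g. lib/pq). In practice the project's migration
// tooling would own this script.
func Apply(ctx context.Context, db *sql.DB) error {
	_, err := db.ExecContext(ctx, userSearchView)
	return err
}
```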
Recommendation
Recommended Approach: Database Query Optimization (#1)
Justification: Given our current scale (8k users, growing ~20%/year) and team context:
- Elasticsearch is over-engineering for current needs; at ~20%/year we would not pass ~20k users for roughly five years
- In-memory cache has consistency issues that would affect UX
- Materialized views add database complexity our team hasn't worked with
- Query optimization addresses immediate pain point with minimal risk
- Can revisit Elasticsearch if we hit 20k+ users or need full-text features
Risks:
- May need to revisit in 2-3 years if growth accelerates: Monitor performance metrics, set alert at 15k users
- Won't support advanced search features: Document limitation, plan for future if needed
Confidence: High
Proposed Implementation
1 follow-up ticket:
Summary: Optimize user search queries with indexes and caching
Description:
- Add composite index on (last_name, first_name, email)
- Implement Redis query cache with 5-min TTL (see the sketch below)
- Optimize JOIN query in getUsersForSearch
- Add performance monitoring
Acceptance Criteria:
- Search response time < 200ms for 95th percentile
- Database query count reduced from 3 to 1 per search
- Monitoring dashboard shows performance metrics
- Load testing validates 10k concurrent users
Effort Estimate: M (1-2 days)
Code References:
- user_service.go:245 - Main search function to optimize
- user_repository.go:89 - Database query to modify
- schema.sql:34 - Add index here
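As a reference for the ticket above, here is a minimal sketch of the proposed 5-minute query cache using the go-redis client. The key scheme and the `searchDB` fallback are assumptions, and cached entries expire via TTL rather than explicit invalidation.

```go
// Illustrative sketch only: check Redis first, fall back to the database,
// and store the serialized result with a 5-minute TTL. The key scheme and
// searchDB callback are assumptions, not the project's actual wiring.
package user

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

const searchTTL = 5 * time.Minute

// CachedSearch returns the cached payload for a query if present,
// otherwise runs the database search and caches its result.
func CachedSearch(ctx context.Context, rdb *redis.Client, query string,
	searchDB func(context.Context, string) ([]byte, error)) ([]byte, error) {

	key := "user_search:" + query

	// Cache hit: return the stored payload without touching the database.
	if cached, err := rdb.Get(ctx, key).Result(); err == nil {
		return []byte(cached), nil
	} else if err != redis.Nil {
		return nil, err // a real Redis error, not just a miss
	}

	// Cache miss: query the database, then populate the cache.
	result, err := searchDB(ctx, query)
	if err != nil {
		return nil, err
	}
	if err := rdb.Set(ctx, key, result, searchTTL).Err(); err != nil {
		return nil, err
	}
	return result, nil
}
```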
References
- PostgreSQL index documentation: https://...
- Existing Redis cache pattern: cache_service.go:12
- Related performance ticket: PI-65432
Common Pitfalls
❌ Shallow Investigation
Bad:
- Only considers 1 obvious solution
- Vague references like "the user module"
- No trade-off analysis
Good:
- Explores 3-5 distinct approaches
- Specific file:line references
- Honest pros/cons for each
❌ Analysis Paralysis
Bad:
- Explores 15 different approaches
- Gets lost in theoretical possibilities
- Never makes clear recommendation
Good:
- Focus on 3-5 viable approaches
- Make decision based on team context
- Acknowledge uncertainty but recommend path
❌ Premature Implementation
Bad:
- Starts writing code during SPIKE
- Creates git worktree
- Implements "prototype"
Good:
- Investigation only
- Code reading and references
- Plan for implementation ticket
❌ Automatic Ticket Creation
Bad:
- Creates 5 tickets without developer review
- Breaks work into too many pieces
- Doesn't get approval first
Good:
- Proposes implementation plan
- Waits for developer approval
- Typically creates just 1 ticket
Time-Boxing
SPIKEs should be time-boxed to prevent over-analysis:
- Small SPIKE: 2-4 hours
- Medium SPIKE: 1 day
- Large SPIKE: 2-3 days
If you hit the time limit:
- Document what you've learned so far
- Document what's still unknown
- Recommend one of:
  - Proceeding with current knowledge
  - Extending the SPIKE with specific questions
  - Creating a prototype SPIKE to validate the approach
Success Criteria
A successful SPIKE:
- ✅ Thoroughly explores problem space
- ✅ Considers multiple approaches (3-5)
- ✅ Provides specific code references
- ✅ Makes clear recommendation with justification
- ✅ Creates actionable plan (typically 1 ticket)
- ✅ Gets developer approval before creating tickets
- ✅ Enables confident implementation
A successful SPIKE does NOT:
- ❌ Implement the solution
- ❌ Create code changes
- ❌ Create tickets without approval
- ❌ Leave implementation plan vague
- ❌ Only explore 1 obvious solution