# SPIKE Investigation Workflow

## What is a SPIKE?

A SPIKE ticket is a time-boxed research and investigation task. The goal is to explore a problem space, evaluate solution approaches, and create an actionable plan for implementation.

**SPIKE = Investigation only. No code changes.**

## Key Principles

### 1. Exploration Over Implementation

- Focus on understanding the problem deeply
- Consider multiple solution approaches (3-5)
- Don't commit to the first idea
- Think creatively about alternatives

### 2. Documentation Over Code

- Document findings thoroughly
- Provide specific code references (file:line)
- Explain trade-offs objectively
- Create an actionable implementation plan

### 3. Developer Approval Required

- Always review findings with the developer before creating tickets
- The developer has final say on the implementation approach
- Get explicit approval before creating follow-up tickets
- Typically results in just 1 follow-up ticket

### 4. No Code Changes

- ✅ Read and explore the codebase
- ✅ Document findings
- ✅ Create an implementation plan
- ❌ Write implementation code
- ❌ Create a git worktree
- ❌ Create a PR

## Investigation Process

### Phase 1: Problem Understanding

**Understand the current state:**

- Read the ticket description thoroughly
- Explore relevant codebase areas
- Identify constraints and dependencies
- Document the current implementation

**Ask questions:**

- What problem are we solving?
- Who is affected?
- What are the constraints?
- What's the desired outcome?

### Phase 2: Approach Exploration

**Explore 3-5 different approaches.**

For each approach, document:

- **Name**: Brief descriptive name
- **Description**: How it works
- **Pros**: Benefits and advantages
- **Cons**: Drawbacks and challenges
- **Effort**: Relative complexity (S/M/L/XL)
- **Code locations**: Specific file:line references

**Think broadly:**

- Conventional approaches
- Creative/unconventional approaches
- Simple vs. complex solutions
- Short-term vs. long-term solutions

### Phase 3: Trade-off Analysis

**Evaluate objectively:**

- Implementation complexity
- Performance implications
- Maintenance burden
- Testing requirements
- Migration/rollout complexity
- Team familiarity with the approach
- Long-term sustainability

**Be honest about cons:**

- Every approach has trade-offs
- Document them clearly
- Don't hide problems

### Phase 4: Recommendation

**Make a clear recommendation:**

- Which approach is best
- Why it's superior to the alternatives
- Key risks and mitigations
- Confidence level (Low/Medium/High)

**Justify the recommendation:**

- Reference specific trade-offs
- Explain why the pros outweigh the cons
- Consider team context

### Phase 5: Implementation Planning

**Create an actionable plan:**

- Typically breaks down into **1 follow-up ticket**
- Occasionally 2-3 if the tasks are clearly independent
- Never many vague tickets

**For each ticket, include:**

- Clear summary
- Detailed description
- Recommended approach
- Acceptance criteria
- Code references from the investigation
- Effort estimate (S/M/L/XL)

## Investigation Output Template

```markdown
## Investigation Findings - PI-XXXXX

### Problem Analysis

[Current state description with file:line references]
[Problem statement]
[Constraints and requirements]

### Approaches Considered

#### 1. [Approach Name]

- **Description**: [How it works]
- **Pros**:
  - [Benefit 1]
  - [Benefit 2]
- **Cons**:
  - [Drawback 1]
  - [Drawback 2]
- **Effort**: [S/M/L/XL]
- **Code**: [file.ext:123, file.ext:456]

#### 2. [Approach Name]

[Repeat structure for each approach]
[Continue for 3-5 approaches]

### Recommendation

**Recommended Approach**: [Approach Name]

**Justification**: [Why this is best, referencing specific trade-offs]

**Risks**:

- [Risk 1]: [Mitigation]
- [Risk 2]: [Mitigation]

**Confidence**: [Low/Medium/High]

### Proposed Implementation

Typically **1 follow-up ticket**:

**Summary**: [Concise task description]

**Description**:
[Detailed implementation plan]
[Step-by-step approach]
[Key considerations]

**Acceptance Criteria**:

- [ ] [Criterion 1]
- [ ] [Criterion 2]
- [ ] [Criterion 3]

**Effort Estimate**: [S/M/L/XL]

**Code References**:

- [file.ext:123 - Description]
- [file.ext:456 - Description]

### References

- [Documentation link]
- [Related ticket]
- [External resource]
```

## Example SPIKE Investigation

### Problem

Performance degradation in user search with large datasets (10k+ users).

### Approaches Considered

#### 1. Database Query Optimization

- **Description**: Add indexes, optimize JOIN queries, use query caching
- **Pros**:
  - Minimal code changes
  - Works with existing architecture
  - Can be implemented incrementally
- **Cons**:
  - Limited scalability (still hits DB for each search)
  - Query complexity increases with features
  - Cache invalidation complexity
- **Effort**: M
- **Code**: user_service.go:245, user_repository.go:89

#### 2. Elasticsearch Integration

- **Description**: Index users in Elasticsearch, use it for all search operations
- **Pros**:
  - Excellent search performance at scale
  - Full-text search capabilities
  - Faceted search support
- **Cons**:
  - New infrastructure to maintain
  - Data sync complexity
  - Team learning curve
  - Higher operational cost
- **Effort**: XL
- **Code**: Would be a new service, interfaces at user_service.go:200

#### 3. In-Memory Cache with Background Sync

- **Description**: Maintain a searchable user cache in memory, sync periodically
- **Pros**:
  - Very fast search performance
  - No additional infrastructure
  - Simple implementation
- **Cons**:
  - Memory usage on app servers
  - Eventual consistency issues
  - Cache warming on deploy
  - Doesn't scale past single-server memory
- **Effort**: L
- **Code**: New cache_service.go, integrate at user_service.go:245 (see the sketch below)
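A short sketch in the findings can make an approach concrete without crossing into implementation: the code lives in the investigation notes, not the repository. Below is a minimal, hypothetical Go sketch of approach #3; `User`, `UserLoader`, `UserCache`, and the sync interval are illustrative assumptions, not existing code:

```go
// Hypothetical sketch of approach #3: an in-memory user cache that a
// background goroutine refreshes on an interval. Not existing code.
package cache

import (
	"context"
	"strings"
	"sync"
	"time"
)

// User mirrors only the fields the search endpoint needs.
type User struct {
	ID        int64
	FirstName string
	LastName  string
	Email     string
}

// UserLoader abstracts the existing repository query (user_repository.go:89).
type UserLoader interface {
	LoadAllUsers(ctx context.Context) ([]User, error)
}

// UserCache holds a searchable snapshot of all users in memory.
type UserCache struct {
	mu    sync.RWMutex
	users []User
}

// StartSync refreshes the snapshot on an interval. The refresh lag is the
// "eventual consistency" con listed above.
func (c *UserCache) StartSync(ctx context.Context, loader UserLoader, every time.Duration) {
	go func() {
		ticker := time.NewTicker(every)
		defer ticker.Stop()
		for {
			if users, err := loader.LoadAllUsers(ctx); err == nil {
				c.mu.Lock()
				c.users = users
				c.mu.Unlock()
			}
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
			}
		}
	}()
}

// Search is a simple case-insensitive substring match over the snapshot;
// memory use grows with the user count, the main scaling limit noted above.
func (c *UserCache) Search(q string) []User {
	q = strings.ToLower(q)
	c.mu.RLock()
	defer c.mu.RUnlock()
	var hits []User
	for _, u := range c.users {
		if strings.Contains(strings.ToLower(u.FirstName+" "+u.LastName+" "+u.Email), q) {
			hits = append(hits, u)
		}
	}
	return hits
}
```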
#### 4. Materialized View with Triggers

- **Description**: Database materialized view optimized for search, auto-updated via triggers
- **Pros**:
  - Good performance
  - Consistent data
  - Minimal app code changes
- **Cons**:
  - Database-specific (PostgreSQL only)
  - Trigger complexity
  - Harder to debug issues
  - Lock contention on high write volume
- **Effort**: M
- **Code**: Migration needed, user_repository.go:89

### Recommendation

**Recommended Approach**: Database Query Optimization (#1)

**Justification**: Given our current scale (8k users, growing ~20%/year) and team context:

- Elasticsearch is over-engineering for current needs - at ~20% annual growth we won't pass 20k users for roughly 5 years
- The in-memory cache has consistency issues that would affect UX
- Materialized views add database complexity our team hasn't worked with
- Query optimization addresses the immediate pain point with minimal risk
- Can revisit Elasticsearch if we hit 20k+ users or need full-text features

**Risks**:

- May need to revisit in 2-3 years if growth accelerates: Monitor performance metrics, set an alert at 15k users
- Won't support advanced search features: Document the limitation, plan for the future if needed

**Confidence**: High

### Proposed Implementation

**1 follow-up ticket**:

**Summary**: Optimize user search queries with indexes and caching

**Description**:

1. Add a composite index on (last_name, first_name, email)
2. Implement a Redis query cache with a 5-minute TTL (see the sketch below)
3. Optimize the JOIN query in getUsersForSearch
4. Add performance monitoring

**Acceptance Criteria**:

- [ ] Search response time < 200ms at the 95th percentile
- [ ] Database query count reduced from 3 to 1 per search
- [ ] Monitoring dashboard shows performance metrics
- [ ] Load testing validates search against a 10k+ user dataset

**Effort Estimate**: M (1-2 days)

**Code References**:

- user_service.go:245 - Main search function to optimize
- user_repository.go:89 - Database query to modify
- schema.sql:34 - Add index here
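For illustration, here is a minimal sketch of step 2 of the plan, assuming the github.com/redis/go-redis/v9 client; `SearchUsers`, `queryDB`, and the `user-search:` key scheme are hypothetical names, and the existing pattern at cache_service.go:12 may differ:

```go
// Hypothetical sketch of step 2: a read-through Redis cache for search
// results with the 5-minute TTL from the plan. Not existing code.
package search

import (
	"context"
	"encoding/json"
	"time"

	"github.com/redis/go-redis/v9"
)

// User holds the fields returned by the search query.
type User struct {
	ID        int64  `json:"id"`
	FirstName string `json:"first_name"`
	LastName  string `json:"last_name"`
	Email     string `json:"email"`
}

// queryDB stands in for the optimized single query from step 3
// (getUsersForSearch in user_repository.go:89).
type queryDB func(ctx context.Context, q string) ([]User, error)

// SearchUsers checks Redis first and falls back to the database,
// caching the result for 5 minutes.
func SearchUsers(ctx context.Context, rdb *redis.Client, db queryDB, q string) ([]User, error) {
	key := "user-search:" + q
	if raw, err := rdb.Get(ctx, key).Bytes(); err == nil {
		var users []User
		if json.Unmarshal(raw, &users) == nil {
			return users, nil // cache hit
		}
	}
	users, err := db(ctx, q) // cache miss: run the optimized query
	if err != nil {
		return nil, err
	}
	if raw, err := json.Marshal(users); err == nil {
		// Best-effort write: a failed cache set should not fail the search.
		_ = rdb.Set(ctx, key, raw, 5*time.Minute).Err()
	}
	return users, nil
}
```

The 5-minute TTL trades brief staleness for not having to invalidate explicitly - the cache-invalidation complexity flagged under approach #1.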
### References

- PostgreSQL index documentation: https://...
- Existing Redis cache pattern: cache_service.go:12
- Related performance ticket: PI-65432

## Common Pitfalls

### ❌ Shallow Investigation

**Bad**:

- Only considers 1 obvious solution
- Vague references like "the user module"
- No trade-off analysis

**Good**:

- Explores 3-5 distinct approaches
- Specific file:line references
- Honest pros/cons for each

### ❌ Analysis Paralysis

**Bad**:

- Explores 15 different approaches
- Gets lost in theoretical possibilities
- Never makes a clear recommendation

**Good**:

- Focuses on 3-5 viable approaches
- Makes a decision based on team context
- Acknowledges uncertainty but recommends a path

### ❌ Premature Implementation

**Bad**:

- Starts writing code during the SPIKE
- Creates a git worktree
- Implements a "prototype"

**Good**:

- Investigation only
- Code reading and references
- Plan for an implementation ticket

### ❌ Automatic Ticket Creation

**Bad**:

- Creates 5 tickets without developer review
- Breaks work into too many pieces
- Doesn't get approval first

**Good**:

- Proposes an implementation plan
- Waits for developer approval
- Typically creates just 1 ticket

## Time-Boxing

SPIKEs should be time-boxed to prevent over-analysis:

- **Small SPIKE**: 2-4 hours
- **Medium SPIKE**: 1 day
- **Large SPIKE**: 2-3 days

If you hit the time limit:

1. Document what you've learned so far
2. Document what's still unknown
3. Recommend either:
   - Proceeding with current knowledge
   - Extending the SPIKE with specific questions
   - Creating a prototype SPIKE to validate the approach

## Success Criteria

A successful SPIKE:

- ✅ Thoroughly explores the problem space
- ✅ Considers multiple approaches (3-5)
- ✅ Provides specific code references
- ✅ Makes a clear recommendation with justification
- ✅ Creates an actionable plan (typically 1 ticket)
- ✅ Gets developer approval before creating tickets
- ✅ Enables confident implementation

A successful SPIKE does NOT:

- ❌ Implement the solution
- ❌ Create code changes
- ❌ Create tickets without approval
- ❌ Leave the implementation plan vague
- ❌ Only explore 1 obvious solution