SPIKE Investigation Workflow

What is a SPIKE?

A SPIKE ticket is a time-boxed research and investigation task. The goal is to explore a problem space, evaluate solution approaches, and create an actionable plan for implementation.

SPIKE = Investigation only. No code changes.

Key Principles

1. Exploration Over Implementation

  • Focus on understanding the problem deeply
  • Consider multiple solution approaches (3-5)
  • Don't commit to the first idea
  • Think creatively about alternatives

2. Documentation Over Code

  • Document findings thoroughly
  • Provide specific code references (file:line)
  • Explain trade-offs objectively
  • Create actionable implementation plan

3. Developer Approval Required

  • Always review findings with developer before creating tickets
  • Developer has final say on implementation approach
  • Get explicit approval before creating follow-up tickets
  • Typically results in just 1 follow-up ticket

4. No Code Changes

Do:

  • Read and explore codebase
  • Document findings
  • Create implementation plan

Don't:

  • Write implementation code
  • Create git worktree
  • Create PR

Investigation Process

Phase 1: Problem Understanding

Understand current state:

  • Read ticket description thoroughly
  • Explore relevant codebase areas
  • Identify constraints and dependencies
  • Document current implementation

Ask questions:

  • What problem are we solving?
  • Who is affected?
  • What are the constraints?
  • What's the desired outcome?

Phase 2: Approach Exploration

Explore 3-5 different approaches:

For each approach, document:

  • Name: Brief descriptive name
  • Description: How it works
  • Pros: Benefits and advantages
  • Cons: Drawbacks and challenges
  • Effort: Relative complexity (S/M/L/XL)
  • Code locations: Specific file:line references

Think broadly:

  • Conventional approaches
  • Creative/unconventional approaches
  • Simple vs. complex solutions
  • Short-term vs. long-term solutions

Phase 3: Trade-off Analysis

Evaluate objectively:

  • Implementation complexity
  • Performance implications
  • Maintenance burden
  • Testing requirements
  • Migration/rollout complexity
  • Team familiarity with approach
  • Long-term sustainability

Be honest about cons:

  • Every approach has trade-offs
  • Document them clearly
  • Don't hide problems

Phase 4: Recommendation

Make clear recommendation:

  • Which approach is best
  • Why it's superior to alternatives
  • Key risks and mitigations
  • Confidence level (Low/Medium/High)

Justify recommendation:

  • Reference specific trade-offs
  • Explain why pros outweigh cons
  • Consider team context

Phase 5: Implementation Planning

Create actionable plan:

  • Typically breaks down into 1 follow-up ticket
  • Occasionally 2-3 if clearly independent tasks
  • Never many vague tickets

For each ticket, include:

  • Clear summary
  • Detailed description
  • Recommended approach
  • Acceptance criteria
  • Code references from investigation
  • Effort estimate (S/M/L/XL)

Investigation Output Template

## Investigation Findings - PI-XXXXX

### Problem Analysis
[Current state description with file:line references]
[Problem statement]
[Constraints and requirements]

### Approaches Considered

#### 1. [Approach Name]
- **Description**: [How it works]
- **Pros**:
  - [Benefit 1]
  - [Benefit 2]
- **Cons**:
  - [Drawback 1]
  - [Drawback 2]
- **Effort**: [S/M/L/XL]
- **Code**: [file.ext:123, file.ext:456]

#### 2. [Approach Name]
[Repeat structure for each approach]

[Continue for 3-5 approaches]

### Recommendation

**Recommended Approach**: [Approach Name]

**Justification**: [Why this is best, referencing specific trade-offs]

**Risks**:
- [Risk 1]: [Mitigation]
- [Risk 2]: [Mitigation]

**Confidence**: [Low/Medium/High]

### Proposed Implementation

Typically **1 follow-up ticket**:

**Summary**: [Concise task description]

**Description**:
[Detailed implementation plan]
[Step-by-step approach]
[Key considerations]

**Acceptance Criteria**:
- [ ] [Criterion 1]
- [ ] [Criterion 2]
- [ ] [Criterion 3]

**Effort Estimate**: [S/M/L/XL]

**Code References**:
- [file.ext:123 - Description]
- [file.ext:456 - Description]

### References
- [Documentation link]
- [Related ticket]
- [External resource]

Example SPIKE Investigation

Problem

Performance degradation in user search with large datasets (10k+ users)

Approaches Considered

1. Database Query Optimization

  • Description: Add indexes, optimize JOIN queries, use query caching
  • Pros:
    • Minimal code changes
    • Works with existing architecture
    • Can be implemented incrementally
  • Cons:
    • Limited scalability (still hits DB for each search)
    • Query complexity increases with features
    • Cache invalidation complexity
  • Effort: M
  • Code: user_service.go:245, user_repository.go:89
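
For illustration, a minimal Go sketch of what approach 1 could look like, using only the standard database/sql package. The table, column, and function names are placeholders, not the actual code at user_service.go:245 or user_repository.go:89:

```go
// Hypothetical sketch of approach 1: collapse the search into one parameterized query.
// Which index helps depends on the real predicates: a composite btree on
// (last_name, first_name, email) for equality/sorted access, or a trigram index
// if the search is ILIKE-style matching as sketched here.
package repository

import (
	"context"
	"database/sql"
)

type UserRow struct {
	ID        int64
	FirstName string
	LastName  string
	Email     string
}

// SearchUsers runs a single prefix search against Postgres instead of several round trips.
func SearchUsers(ctx context.Context, db *sql.DB, term string, limit int) ([]UserRow, error) {
	const q = `
		SELECT id, first_name, last_name, email
		FROM users
		WHERE last_name ILIKE $1 || '%'
		   OR first_name ILIKE $1 || '%'
		   OR email ILIKE $1 || '%'
		ORDER BY last_name, first_name
		LIMIT $2`

	rows, err := db.QueryContext(ctx, q, term, limit)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var users []UserRow
	for rows.Next() {
		var u UserRow
		if err := rows.Scan(&u.ID, &u.FirstName, &u.LastName, &u.Email); err != nil {
			return nil, err
		}
		users = append(users, u)
	}
	return users, rows.Err()
}
```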

2. Elasticsearch Integration

  • Description: Index users in Elasticsearch, use for all search operations
  • Pros:
    • Excellent search performance at scale
    • Full-text search capabilities
    • Faceted search support
  • Cons:
    • New infrastructure to maintain
    • Data sync complexity
    • Team learning curve
    • Higher operational cost
  • Effort: XL
  • Code: Would be new service, interfaces at user_service.go:200

3. In-Memory Cache with Background Sync

  • Description: Maintain searchable user cache in memory, sync periodically
  • Pros:
    • Very fast search performance
    • No additional infrastructure
    • Simple implementation
  • Cons:
    • Memory usage on app servers
    • Eventual consistency issues
    • Cache warming on deploy
    • Doesn't scale past single-server memory
  • Effort: L
  • Code: New cache_service.go, integrate at user_service.go:245
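
For illustration, a standard-library-only Go sketch of the pattern approach 3 describes: a periodically refreshed in-memory snapshot guarded by a read/write lock. All names are placeholders for whatever cache_service.go would eventually expose:

```go
// Hypothetical sketch of approach 3: an in-memory user snapshot refreshed in the background.
package cache

import (
	"context"
	"strings"
	"sync"
	"time"
)

type User struct {
	ID    int64
	Name  string
	Email string
}

// Loader fetches the full user set from the source of truth (the database).
type Loader func(ctx context.Context) ([]User, error)

type UserCache struct {
	mu    sync.RWMutex
	users []User
	load  Loader
}

func NewUserCache(load Loader) *UserCache {
	return &UserCache{load: load}
}

// Run refreshes the snapshot on a fixed interval until ctx is cancelled.
// Readers see eventually consistent data, which is one of the cons listed above.
func (c *UserCache) Run(ctx context.Context, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if users, err := c.load(ctx); err == nil {
			c.mu.Lock()
			c.users = users
			c.mu.Unlock()
		}
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

// Search does a case-insensitive substring match over the current snapshot.
func (c *UserCache) Search(term string) []User {
	term = strings.ToLower(term)
	c.mu.RLock()
	defer c.mu.RUnlock()
	var matches []User
	for _, u := range c.users {
		if strings.Contains(strings.ToLower(u.Name), term) ||
			strings.Contains(strings.ToLower(u.Email), term) {
			matches = append(matches, u)
		}
	}
	return matches
}
```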

4. Materialized View with Triggers

  • Description: Database materialized view optimized for search, auto-updated via triggers
  • Pros:
    • Good performance
    • Consistent data
    • Minimal app code changes
  • Cons:
    • Database-specific (PostgreSQL only)
    • Trigger complexity
    • Harder to debug issues
    • Lock contention on high write volume
  • Effort: M
  • Code: Migration needed, user_repository.go:89

Recommendation

Recommended Approach: Database Query Optimization (#1)

Justification: Given our current scale (8k users, growing ~20%/year) and team context:

  • Elasticsearch is over-engineering for current needs - at ~20%/year we won't hit 20k users for about five years, or 50k for roughly a decade
  • In-memory cache has consistency issues that would affect UX
  • Materialized views add database complexity our team hasn't worked with
  • Query optimization addresses immediate pain point with minimal risk
  • Can revisit Elasticsearch if we hit 20k+ users or need full-text features
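
Rough check on those figures: compounding ~20%/year from 8k users gives 8,000 × 1.2^5 ≈ 19,900 after five years and 8,000 × 1.2^10 ≈ 49,500 after ten, so the 15k alert and the 20k revisit threshold are several years out.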

Risks:

  • May need to revisit in 2-3 years if growth accelerates: Monitor performance metrics, set alert at 15k users
  • Won't support advanced search features: Document limitation, plan for future if needed

Confidence: High

Proposed Implementation

1 follow-up ticket:

Summary: Optimize user search queries with indexes and caching

Description:

  1. Add composite index on (last_name, first_name, email)
  2. Implement Redis query cache with 5-min TTL
  3. Optimize JOIN query in getUsersForSearch
  4. Add performance monitoring
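
For illustration, a minimal sketch of how steps 2 and 3 could fit together as a cache-aside wrapper, assuming the github.com/redis/go-redis/v9 client and JSON-encoded results; the existing pattern referenced at cache_service.go:12 may look different:

```go
// Hypothetical sketch of the 5-minute Redis query cache (step 2) wrapped around
// the optimized search query (step 3). Key layout and client choice are assumptions.
package search

import (
	"context"
	"encoding/json"
	"time"

	"github.com/redis/go-redis/v9"
)

type User struct {
	ID    int64  `json:"id"`
	Name  string `json:"name"`
	Email string `json:"email"`
}

type CachedSearch struct {
	rdb   *redis.Client
	query func(ctx context.Context, term string) ([]User, error) // the optimized DB query
}

func NewCachedSearch(rdb *redis.Client, query func(ctx context.Context, term string) ([]User, error)) *CachedSearch {
	return &CachedSearch{rdb: rdb, query: query}
}

const searchTTL = 5 * time.Minute

func (s *CachedSearch) Search(ctx context.Context, term string) ([]User, error) {
	key := "user-search:" + term

	// Cache hit: return the stored result.
	if raw, err := s.rdb.Get(ctx, key).Bytes(); err == nil {
		var users []User
		if json.Unmarshal(raw, &users) == nil {
			return users, nil
		}
	}

	// Cache miss (or decode failure): run the optimized query, then store the result.
	users, err := s.query(ctx, term)
	if err != nil {
		return nil, err
	}
	if raw, err := json.Marshal(users); err == nil {
		// Best-effort write; a failed cache write should not fail the search.
		s.rdb.Set(ctx, key, raw, searchTTL)
	}
	return users, nil
}
```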

Acceptance Criteria:

  • Search response time < 200ms for 95th percentile
  • Database query count reduced from 3 to 1 per search
  • Monitoring dashboard shows performance metrics
  • Load testing validates search performance with a 10k+ user dataset

Effort Estimate: M (1-2 days)

Code References:

  • user_service.go:245 - Main search function to optimize
  • user_repository.go:89 - Database query to modify
  • schema.sql:34 - Add index here

References

  • PostgreSQL index documentation: https://...
  • Existing Redis cache pattern: cache_service.go:12
  • Related performance ticket: PI-65432

Common Pitfalls

Shallow Investigation

Bad:

  • Only considers 1 obvious solution
  • Vague references like "the user module"
  • No trade-off analysis

Good:

  • Explores 3-5 distinct approaches
  • Specific file:line references
  • Honest pros/cons for each

Analysis Paralysis

Bad:

  • Explores 15 different approaches
  • Gets lost in theoretical possibilities
  • Never makes clear recommendation

Good:

  • Focus on 3-5 viable approaches
  • Make decision based on team context
  • Acknowledge uncertainty but recommend path

Premature Implementation

Bad:

  • Starts writing code during SPIKE
  • Creates git worktree
  • Implements "prototype"

Good:

  • Investigation only
  • Code reading and references
  • Plan for implementation ticket

Automatic Ticket Creation

Bad:

  • Creates 5 tickets without developer review
  • Breaks work into too many pieces
  • Doesn't get approval first

Good:

  • Proposes implementation plan
  • Waits for developer approval
  • Typically creates just 1 ticket

Time-Boxing

SPIKEs should be time-boxed to prevent over-analysis:

  • Small SPIKE: 2-4 hours
  • Medium SPIKE: 1 day
  • Large SPIKE: 2-3 days

If you hit the time limit:

  1. Document what you've learned so far
  2. Document what's still unknown
  3. Recommend either:
    • Proceeding with current knowledge
    • Extending SPIKE with specific questions
    • Creating prototype SPIKE to validate approach

Success Criteria

A successful SPIKE:

  • Thoroughly explores problem space
  • Considers multiple approaches (3-5)
  • Provides specific code references
  • Makes clear recommendation with justification
  • Creates actionable plan (typically 1 ticket)
  • Gets developer approval before creating tickets
  • Enables confident implementation

A successful SPIKE does NOT:

  • Implement the solution
  • Create code changes
  • Create tickets without approval
  • Leave implementation plan vague
  • Only explore 1 obvious solution