nixos/shared/linked-dotfiles/opencode/skills/browser-automation/README.md
2025-10-29 18:46:16 -06:00

Browser Automation Skill

Control the Chrome browser via the Chrome DevTools Protocol using the use_browser MCP tool.

Structure

browser-automation/
├── SKILL.md                      # Main skill (324 lines, 1050 words)
└── references/
    ├── examples.md               # Complete workflows (672 lines)
    ├── troubleshooting.md        # Error handling (546 lines)
    └── advanced.md               # Advanced patterns (678 lines)

Quick Start

The skill provides:

  • Core patterns: Navigate, wait, interact, extract
  • Form automation: Multi-step forms, validation, submission
  • Data extraction: Tables, structured data, batch operations
  • Multi-tab workflows: Cross-site data correlation
  • Dynamic content: AJAX waiting, infinite scroll, modals
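
As a rough illustration of the navigate, wait, interact, extract pattern, the sketch below builds an ordered list of use_browser calls as plain Python dictionaries. Only the navigate and click actions appear in this README; the await_element, type, and extract action names, the selectors, and the login_and_read_title helper are assumptions made for illustration and may not match the real tool.

```python
# Hypothetical call sequence for the navigate -> wait -> interact -> extract pattern.
# "await_element", "type", and "extract" are ASSUMED action names; only
# "navigate" and "click" are documented in this README.
def login_and_read_title(url: str, username: str) -> list[dict]:
    return [
        {"action": "navigate", "payload": url},                   # open the page
        {"action": "await_element", "selector": "form#login"},    # wait for the form
        {"action": "type", "selector": "input[name='user']", "payload": username},
        {"action": "click", "selector": "button.submit"},         # interact
        {"action": "extract", "selector": "h1", "payload": "text"},  # read the result
    ]


steps = login_and_read_title("https://example.com/login", "alice")
for step in steps:
    print(step)
```

Each dictionary would be passed to the tool as one call, in order; SKILL.md is the place to check the actual action names.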

Installation

This skill requires the use_browser MCP tool from the superpowers-chrome package.

Option 1: Use superpowers-chrome directly

/plugin marketplace add obra/superpowers-marketplace
/plugin install superpowers-chrome@superpowers-marketplace

Option 2: Install as standalone skill

Copy this skill directory to your OpenCode skills directory:

cp -r browser-automation ~/.opencode/skills/

Then configure the Chrome MCP server in your Claude Desktop config, following the superpowers-chrome installation guide.

Usage

The skill is automatically loaded when OpenCode starts. It will be invoked when you:

  • Request web automation tasks
  • Need to fill forms
  • Want to extract content from websites
  • Mention Chrome or browser control

Example prompts:

  • "Fill out the registration form at example.com"
  • "Extract all product names and prices from this page"
  • "Navigate to my email and find the receipt from yesterday"

Contents

SKILL.md

Main reference with:

  • Quick reference table for all actions
  • Core workflow patterns
  • Common mistakes and solutions
  • Real-world impact metrics

references/examples.md

Complete workflows including:

  • E-commerce booking flows
  • Multi-step registration forms
  • Price comparison across sites
  • Data extraction patterns
  • Multi-tab operations
  • Dynamic content handling
  • Authentication workflows

references/troubleshooting.md

Solutions for:

  • Element not found errors
  • Timeout issues
  • Click failures
  • Form submission problems
  • Tab index errors
  • Extract actions returning empty results

Plus best practices for selectors, waiting, and debugging.

references/advanced.md

Advanced techniques:

  • Network interception
  • JavaScript injection
  • Complex waiting patterns
  • Data manipulation
  • State management
  • Visual testing
  • Performance monitoring
  • Accessibility testing
  • Frame handling

Progressive Disclosure

The skill uses progressive disclosure to minimize context usage:

  1. SKILL.md loads first - quick reference and common patterns
  2. examples.md - loaded when implementing specific workflows
  3. troubleshooting.md - loaded when encountering errors
  4. advanced.md - loaded for complex requirements
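
The loading order above can be pictured as a small lookup, sketched here under the assumption that each kind of need maps to exactly one reference file. The need labels ("workflow", "error", "advanced") and the files_to_load helper are hypothetical; the skill loader itself decides what to read.

```python
# Hypothetical mapping from a kind of need to the reference file that covers it.
REFERENCES = {
    "workflow": "references/examples.md",
    "error": "references/troubleshooting.md",
    "advanced": "references/advanced.md",
}


def files_to_load(needs: list[str]) -> list[str]:
    """Return SKILL.md plus only the reference files the given needs call for."""
    files = ["SKILL.md"]  # always loaded first
    for need in needs:
        path = REFERENCES.get(need)
        if path is not None and path not in files:
            files.append(path)
    return files


print(files_to_load(["error"]))
```

With no extra needs, only SKILL.md is loaded, which is what keeps context usage low.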

Key Features

Single Tool Interface

All operations use one tool with action-based parameters:

{action: "navigate", payload: "https://example.com"}
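
A minimal sketch of how such calls might be assembled programmatically, assuming the four parameter names seen in this README (action, selector, payload, tab_index) are the ones the tool accepts. The build_call helper is hypothetical and not part of the tool.

```python
from typing import Any, Optional


def build_call(
    action: str,
    selector: Optional[str] = None,
    payload: Optional[str] = None,
    tab_index: Optional[int] = None,
) -> dict[str, Any]:
    """Assemble one use_browser call, omitting parameters that were not given."""
    if not action:
        raise ValueError("action is required")
    call: dict[str, Any] = {"action": action}
    for key, value in (("selector", selector), ("payload", payload), ("tab_index", tab_index)):
        if value is not None:
            call[key] = value
    return call


print(build_call("navigate", payload="https://example.com"))
```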

CSS and XPath Support

Both selector types are supported; XPath selectors are auto-detected:

{action: "click", selector: "button.submit"}
{action: "click", selector: "//button[text()='Submit']"}
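
This README does not say how the detection works. One plausible rule, shown here purely as an assumption, is a prefix check: XPath expressions usually begin with "/" or with a parenthesized path, while CSS selectors do not. The selector_kind function is a hypothetical illustration, not the tool's implementation.

```python
def selector_kind(selector: str) -> str:
    """Classify a selector as 'xpath' or 'css' with a simple prefix heuristic."""
    s = selector.strip()
    # XPath usually starts with "/" or "//", or with "(" when the whole
    # expression is grouped, e.g. "(//a)[1]". Everything else is treated as CSS.
    if s.startswith("/") or s.startswith("(/"):
        return "xpath"
    return "css"


print(selector_kind("button.submit"))
print(selector_kind("//button[text()='Submit']"))
```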

Auto-Starting Chrome

The browser launches automatically on first use; no manual setup is required.

Multi-Tab Management

Control multiple tabs with the tab_index parameter:

{action: "click", tab_index: 2, selector: "a.email"}
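
To repeat one action across several tabs, the same call can be copied with a different tab_index each time. The sketch below assumes tab_index is the only field that needs to change; the per_tab helper is hypothetical, and this README does not state whether tab indexes start at 0 or 1.

```python
def per_tab(call: dict, tab_indexes: list[int]) -> list[dict]:
    """Copy one call for each tab index, leaving the original dict untouched."""
    return [{**call, "tab_index": index} for index in tab_indexes]


base = {"action": "click", "selector": "a.email"}
for call in per_tab(base, [1, 2]):
    print(call)
```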

Token Efficiency

  • Main skill: 1050 words, above the <500-word target for frequently loaded skills
  • Total skill: 6092 words across all files
  • Progressive loading ensures only relevant content is loaded
  • Reference files separated by concern
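
Word counts like these can be checked with a short script. The sketch below counts whitespace-separated words in every .md file under a skill directory; the word_counts helper is hypothetical, and its splitting rule may differ slightly from whatever tool produced the figures above.

```python
from pathlib import Path


def word_counts(skill_dir: Path) -> dict[str, int]:
    """Map each Markdown file (path relative to skill_dir) to its word count."""
    counts: dict[str, int] = {}
    for path in sorted(skill_dir.rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        counts[path.relative_to(skill_dir).as_posix()] = len(text.split())
    return counts
```

Calling word_counts(Path("browser-automation")) and summing the values gives a total across all files in the directory, including this README if it lives there.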

Comparison with Playwright MCP

Use this skill when you:

  • Work with existing browser sessions
  • Need authenticated workflows
  • Manage multiple tabs
  • Want minimal overhead

Use Playwright MCP when you:

  • Need fresh, isolated browser instances
  • Generate PDFs or screenshots
  • Prefer higher-level abstractions
  • Run complex automation that benefits from built-in retry logic

Credits

Based on superpowers-chrome by obra (Jesse Vincent).

License

MIT