nixos/shared/linked-dotfiles/opencode/skills/browser-automation/README.md
2025-10-29 18:46:16 -06:00

# Browser Automation Skill
Control the Chrome browser via the Chrome DevTools Protocol using the `use_browser` MCP tool.
## Structure
```
browser-automation/
├── SKILL.md               # Main skill (324 lines, 1050 words)
└── references/
    ├── examples.md        # Complete workflows (672 lines)
    ├── troubleshooting.md # Error handling (546 lines)
    └── advanced.md        # Advanced patterns (678 lines)
```
## Quick Start
The skill provides:
- **Core patterns**: Navigate, wait, interact, extract
- **Form automation**: Multi-step forms, validation, submission
- **Data extraction**: Tables, structured data, batch operations
- **Multi-tab workflows**: Cross-site data correlation
- **Dynamic content**: AJAX waiting, infinite scroll, modals
## Installation
This skill requires the `use_browser` MCP tool from the superpowers-chrome package.
### Option 1: Use superpowers-chrome directly
Run these plugin slash commands in Claude Code (they are not shell commands):
```
/plugin marketplace add obra/superpowers-marketplace
/plugin install superpowers-chrome@superpowers-marketplace
```
### Option 2: Install as standalone skill
Copy this skill directory to your OpenCode skills directory:
```bash
cp -r browser-automation ~/.opencode/skills/
```
Then configure the `chrome` MCP server in your Claude Desktop config per the [superpowers-chrome installation guide](https://github.com/obra/superpowers-chrome#installation).
## Usage
The skill is automatically loaded when OpenCode starts. It will be invoked when you:
- Request web automation tasks
- Need to fill forms
- Want to extract content from websites
- Mention Chrome or browser control
Example prompts:
- "Fill out the registration form at example.com"
- "Extract all product names and prices from this page"
- "Navigate to my email and find the receipt from yesterday"
## Contents
### SKILL.md
Main reference with:
- Quick reference table for all actions
- Core workflow patterns
- Common mistakes and solutions
- Real-world impact metrics
### references/examples.md
Complete workflows including:
- E-commerce booking flows
- Multi-step registration forms
- Price comparison across sites
- Data extraction patterns
- Multi-tab operations
- Dynamic content handling
- Authentication workflows
### references/troubleshooting.md
Solutions for:
- Element not found errors
- Timeout issues
- Click failures
- Form submission problems
- Tab index errors
- Extract returning empty results
Plus best practices for selectors, waiting, and debugging.
### references/advanced.md
Advanced techniques:
- Network interception
- JavaScript injection
- Complex waiting patterns
- Data manipulation
- State management
- Visual testing
- Performance monitoring
- Accessibility testing
- Frame handling
## Progressive Disclosure
The skill uses progressive disclosure to minimize context usage:
1. **SKILL.md** loads first - quick reference and common patterns
2. **examples.md** - loaded when implementing specific workflows
3. **troubleshooting.md** - loaded when encountering errors
4. **advanced.md** - loaded for complex requirements
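The loading order above can be pictured as a lookup from the current need to the one extra file worth reading. This is only a sketch: `files_to_load` and the need labels (`workflow`, `error`, `advanced`) are hypothetical, and the line counts are the ones listed under Structure.

```python
# Line counts as listed in the Structure section of this README.
LINE_COUNTS = {
    "SKILL.md": 324,
    "references/examples.md": 672,
    "references/troubleshooting.md": 546,
    "references/advanced.md": 678,
}

# Hypothetical need labels mapped to the reference file that covers them.
REFERENCE_FOR_NEED = {
    "workflow": "references/examples.md",
    "error": "references/troubleshooting.md",
    "advanced": "references/advanced.md",
}


def files_to_load(needs: list[str]) -> list[str]:
    """SKILL.md always comes first; references are added only when needed."""
    files = ["SKILL.md"]
    for need in needs:
        ref = REFERENCE_FOR_NEED[need]
        if ref not in files:
            files.append(ref)
    return files


loaded = files_to_load(["error"])
print(loaded, sum(LINE_COUNTS[name] for name in loaded))
```

In this example, hitting an error pulls in only `troubleshooting.md` alongside `SKILL.md`: 870 of the 2220 lines in the skill.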
## Key Features
### Single Tool Interface
All operations use one tool with action-based parameters:
```json
{"action": "navigate", "payload": "https://example.com"}
```
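One way to keep calls well-formed is to build them from a small typed structure and serialize to strict JSON. The sketch below is illustrative only: `BrowserCall` and `to_json` are hypothetical names, not part of the tool, and the field set is limited to the parameters shown in this README (`action`, `payload`, `selector`, `tab_index`).

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class BrowserCall:
    """Hypothetical container for one `use_browser` call (not part of the tool)."""

    action: str
    payload: Optional[str] = None
    selector: Optional[str] = None
    tab_index: Optional[int] = None

    def to_json(self) -> str:
        # Drop unset fields so only the parameters in use are serialized.
        fields = {key: value for key, value in asdict(self).items() if value is not None}
        return json.dumps(fields)


print(BrowserCall(action="navigate", payload="https://example.com").to_json())
```

Keeping every operation in one shape means a new action needs no new type, only a different `action` string.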
### CSS and XPath Support
Both selector types are supported; XPath selectors are auto-detected:
```json
{"action": "click", "selector": "button.submit"}
{"action": "click", "selector": "//button[text()='Submit']"}
```
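This README does not say how XPath is auto-detected. A common heuristic, assumed here purely for illustration, treats a selector as XPath when it starts with `/` or `(`, and as CSS otherwise. `selector_kind` is a hypothetical helper, not the tool's actual logic.

```python
def selector_kind(selector: str) -> str:
    """Guess whether a selector is XPath or CSS (assumed heuristic)."""
    stripped = selector.strip()
    # XPath expressions typically begin with "/" or "//", or with "(" when grouped;
    # a valid CSS selector cannot start with either character.
    if stripped.startswith(("/", "(")):
        return "xpath"
    return "css"


print(selector_kind("button.submit"))              # css
print(selector_kind("//button[text()='Submit']"))  # xpath
```

Under this heuristic, a relative XPath such as `./div` would be classified as CSS, so absolute (`//...`) expressions are the safer form if detection works this way.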
### Auto-Starting Chrome
The browser launches automatically on first use; no manual setup is required.
### Multi-Tab Management
Control multiple tabs with the `tab_index` parameter:
```json
{"action": "click", "tab_index": 2, "selector": "a.email"}
```
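Because each call can target a different tab, it can help to validate `tab_index` before sending. The check below is a sketch: `check_tab_index` and `open_tab_count` are hypothetical, and zero-based indexing with a default of tab 0 is an assumption rather than something this README states.

```python
def check_tab_index(call: dict, open_tab_count: int) -> int:
    """Return the tab a call targets, or raise if that tab does not exist.

    Assumes zero-based indexing and a default of tab 0 when `tab_index` is omitted.
    """
    index = call.get("tab_index", 0)
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(index, int) or isinstance(index, bool):
        raise TypeError(f"tab_index must be an integer, got {index!r}")
    if not 0 <= index < open_tab_count:
        raise IndexError(f"tab_index {index} is out of range for {open_tab_count} open tab(s)")
    return index


call = {"action": "click", "tab_index": 2, "selector": "a.email"}
print(check_tab_index(call, open_tab_count=3))  # 2
```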
## Token Efficiency
- Main skill: 1050 words (target: <500 words for frequently used skills)
- Total skill: 6092 words across all files
- Progressive loading ensures only relevant content is loaded
- Reference files separated by concern
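The figures above imply how much of the skill sits in the always-loaded file. A quick calculation, using only the word counts and target given here:

```python
MAIN_SKILL_WORDS = 1050  # SKILL.md, from this README
TOTAL_WORDS = 6092       # all files, from this README
TARGET_WORDS = 500       # stated upper bound for frequently used skills

reference_words = TOTAL_WORDS - MAIN_SKILL_WORDS
main_share = MAIN_SKILL_WORDS / TOTAL_WORDS
over_target = MAIN_SKILL_WORDS / TARGET_WORDS

print(f"reference files: {reference_words} words")
print(f"always-loaded share: {main_share:.1%}")
print(f"main skill vs target: {over_target:.1f}x")
```

So roughly a sixth of the total word count loads up front, and the main skill is currently about twice the stated target.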
## Comparison with Playwright MCP
**Use this skill when you:**
- Work with existing browser sessions
- Need authenticated workflows
- Manage multiple tabs
- Want minimal overhead
**Use Playwright MCP when you:**
- Need fresh, isolated instances
- Generate PDFs or screenshots
- Prefer higher-level abstractions
- Run complex automation that benefits from built-in retry logic
## Credits
Based on [superpowers-chrome](https://github.com/obra/superpowers-chrome) by obra (Jesse Vincent).
## License
MIT