nixos/shared/linked-dotfiles/opencode/skills/browser-automation/SKILL.md
2025-10-29 18:46:16 -06:00


name: browser-automation
description: Use when automating web tasks, filling forms, extracting content, or controlling Chrome - provides Chrome DevTools Protocol automation via use_browser MCP tool for multi-tab workflows, form automation, and content extraction

Browser Automation with Chrome DevTools Protocol

Control Chrome directly through the Chrome DevTools Protocol using the use_browser MCP tool. It is a single, unified interface that starts Chrome automatically.

Core principle: One tool, action-based interface, zero dependencies.

When to Use This Skill

Use when:

  • Automating web forms and interactions
  • Extracting content from web pages (text, tables, links)
  • Managing authenticated browser sessions
  • Multi-tab workflows requiring context switching
  • Testing web applications interactively
  • Scraping dynamic content loaded by JavaScript

Don't use when:

  • You need fresh, isolated browser instances
  • You need PDF or screenshot generation (use Playwright MCP)
  • A simple HTTP request is enough (use curl/fetch)

Quick Reference

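| Task | Action | Key parameters |
|---|---|---|
| Go to URL | navigate | payload: URL |
| Wait for element | await_element | selector, timeout |
| Wait for text | await_text | payload: text to wait for |
| Click element | click | selector |
| Type text | type | selector, payload (append \n to submit) |
| Get content | extract | payload: 'markdown' \| 'text' \| 'html', optional selector |
| Run JavaScript | eval | payload: JS code |
| Get attribute | attr | selector, payload: attribute name |
| Select dropdown | select | selector, payload: option value |
| Take screenshot | screenshot | payload: filename |
| List tabs | list_tabs | none |
| New tab | new_tab | none |
| Close tab | close_tab | tab_index |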

The use_browser Tool

Parameters:

  • action (required): Operation to perform
  • tab_index (optional): Tab to operate on (default: 0)
  • selector (optional): CSS selector or XPath (XPath starts with / or //)
  • payload (optional): Action-specific data
  • timeout (optional): Timeout in ms (default: 5000, max: 60000)

Returns: JSON response with result or error
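A small client-side helper can capture the parameter rules above. This is a minimal sketch and not part of use_browser. The names buildCall and isXPath are hypothetical. Rejecting an out-of-range timeout before sending it, instead of leaving the decision to the tool, is an assumed design choice.

```javascript
// Hypothetical helper (not part of use_browser): builds an argument object
// using the documented defaults (tab_index 0, timeout 5000 ms, max 60000 ms).
const DEFAULT_TIMEOUT_MS = 5000;
const MAX_TIMEOUT_MS = 60000;

function buildCall({ action, tab_index = 0, selector, payload, timeout = DEFAULT_TIMEOUT_MS }) {
  if (typeof action !== "string" || action.length === 0) {
    throw new Error("action is required");
  }
  if (!Number.isInteger(tab_index) || tab_index < 0) {
    throw new Error(`invalid tab_index: ${tab_index}`);
  }
  // Assumption: fail fast locally rather than let the tool handle bad timeouts.
  if (!Number.isFinite(timeout) || timeout <= 0 || timeout > MAX_TIMEOUT_MS) {
    throw new Error(`timeout must be in (0, ${MAX_TIMEOUT_MS}] ms, got ${timeout}`);
  }
  const call = { action, tab_index, timeout };
  if (selector !== undefined) call.selector = selector; // omit unset optionals
  if (payload !== undefined) call.payload = payload;
  return call;
}

// Per the parameter list, a selector starting with "/" (including "//") is XPath.
function isXPath(selector) {
  return selector.startsWith("/");
}
```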

Core Pattern

Every browser workflow follows this structure:

1. Navigate to page
2. Wait for content to load
3. Interact or extract
4. Validate result

Example:

{action: "navigate", payload: "https://example.com"}
{action: "await_element", selector: "h1"}
{action: "extract", payload: "text", selector: "h1"}
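If you drive use_browser from a script instead of issuing calls one at a time, the four-step pattern becomes a loop that stops at the first failed step. This sketch rests on several assumptions. callTool is a hypothetical stand-in for your MCP client. fakeTool exists only so the example runs. Treating a response that has an `error` field as a failure is inferred from the "result or error" description above.

```javascript
// Run steps in order; stop at the first failure so later steps never act on
// an unexpected page state. `callTool` is a hypothetical async MCP client call.
async function runWorkflow(steps, callTool) {
  const results = [];
  for (const [i, step] of steps.entries()) {
    const response = await callTool(step);
    if (response && response.error) { // assumed error shape
      throw new Error(`step ${i} (${step.action}) failed: ${response.error}`);
    }
    results.push(response);
  }
  return results;
}

// Fake tool for illustration only: pretends the page has <h1>Example Domain</h1>.
async function fakeTool(step) {
  if (step.action === "extract") return { result: "Example Domain" };
  if (step.action === "await_element" && step.selector !== "h1") {
    return { error: `timeout waiting for ${step.selector}` };
  }
  return { result: "ok" };
}

const steps = [
  { action: "navigate", payload: "https://example.com" },
  { action: "await_element", selector: "h1" },
  { action: "extract", payload: "text", selector: "h1" },
];

runWorkflow(steps, fakeTool).then((results) => {
  console.log(results[2].result); // prints: Example Domain
});
```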

Common Workflows

Form Filling

{action: "navigate", payload: "https://app.com/login"}
{action: "await_element", selector: "input[name=email]"}
{action: "type", selector: "input[name=email]", payload: "user@example.com"}
{action: "type", selector: "input[name=password]", payload: "pass123\n"}
{action: "await_text", payload: "Welcome"}

Note: a trailing \n in the type payload submits the form, the same as pressing Enter.
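Credentials and site-specific selectors are easier to manage when the sequence is generated instead of hand-written. The sketch below uses a hypothetical helper, loginSteps. Its default selectors and success text are the ones from the example above, not universal values.

```javascript
// Hypothetical builder for the login sequence. Selectors and success text are
// parameters because they differ per site; defaults mirror the example.
function loginSteps(url, email, password, {
  emailSelector = "input[name=email]",
  passwordSelector = "input[name=password]",
  successText = "Welcome",
} = {}) {
  return [
    { action: "navigate", payload: url },
    { action: "await_element", selector: emailSelector },
    { action: "type", selector: emailSelector, payload: email },
    // A trailing "\n" in the type payload submits the form.
    { action: "type", selector: passwordSelector, payload: password + "\n" },
    { action: "await_text", payload: successText },
  ];
}
```

You could pass the password from an environment variable, for example `loginSteps(url, email, process.env.APP_PASSWORD)`. The variable name is hypothetical. This keeps the secret out of the skill file itself.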

Content Extraction

{action: "navigate", payload: "https://example.com"}
{action: "await_element", selector: "body"}
{action: "extract", payload: "markdown"}

Multi-Tab Workflow

{action: "list_tabs"}
{action: "click", tab_index: 2, selector: "a.email"}
{action: "await_element", tab_index: 2, selector: ".content"}
{action: "extract", tab_index: 2, payload: "text", selector: ".amount"}

Dynamic Content

{action: "navigate", payload: "https://app.com"}
{action: "await_element", selector: "input[name=q]"}
{action: "type", selector: "input[name=q]", payload: "query"}
{action: "click", selector: "button.search"}
{action: "await_element", selector: ".results"}
{action: "extract", payload: "text", selector: ".result-title"}

Get Structured Data

{action: "eval", payload: "Array.from(document.querySelectorAll('a')).map(a => ({ text: a.textContent.trim(), href: a.href }))"}
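The raw link list usually needs cleanup before use. This is a minimal post-processing sketch. It assumes the tool returns the eval result as JSON-compatible data (`[{ text, href }, ...]`). If your client returns a string instead, call JSON.parse on it first. cleanLinks is a hypothetical name. Dropping fragments and non-HTTP schemes is one reasonable policy, not a requirement.

```javascript
// Keep only http(s) links, ignore #fragments, and drop duplicate targets.
function cleanLinks(links) {
  const seen = new Set();
  const out = [];
  for (const { text, href } of links) {
    let url;
    try {
      url = new URL(href);
    } catch {
      continue; // empty or malformed href
    }
    if (url.protocol !== "http:" && url.protocol !== "https:") continue; // mailto:, javascript:, ...
    url.hash = ""; // page#a and page#b point at the same page
    if (seen.has(url.href)) continue;
    seen.add(url.href);
    out.push({ text, href: url.href });
  }
  return out;
}
```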

Implementation Steps

1. Verify Page Structure

Before building automation, check selectors:

{action: "navigate", payload: "https://example.com"}
{action: "await_element", selector: "body"}
{action: "extract", payload: "html"}

2. Build Workflow Incrementally

Test each step before adding the next:

// Step 1: Navigate and verify
{action: "navigate", payload: "https://example.com"}
{action: "await_element", selector: "form"}

// Step 2: Fill first field and verify
{action: "type", selector: "input[name=email]", payload: "test@example.com"}
{action: "eval", payload: "document.querySelector('input[name=email]').value"}  // live value; the value attribute may still hold the initial HTML value

// Step 3: Complete form
{action: "type", selector: "input[name=password]", payload: "pass\n"}

3. Add Error Handling

Always wait before interaction:

// BAD - might fail
{action: "navigate", payload: "https://example.com"}
{action: "click", selector: "button"}

// GOOD - wait first
{action: "navigate", payload: "https://example.com"}
{action: "await_element", selector: "button"}
{action: "click", selector: "button"}

4. Validate Results

Check output after critical operations:

{action: "click", selector: "button.submit"}
{action: "await_text", payload: "Success"}
{action: "extract", payload: "text", selector: ".confirmation"}

Selector Strategies

Use specific selectors:

  • button[type=submit] ✅
  • #login-button ✅
  • .modal button.confirm ✅
  • button ❌ (too generic)

XPath for complex queries:

{action: "extract", selector: "//h2 | //h3", payload: "text"}
{action: "click", selector: "//button[contains(text(), 'Submit')]"}

Test selectors first:

{action: "eval", payload: "document.querySelectorAll('button.submit').length"}  // 1 = unique, 0 = wrong selector, >1 = too generic
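Counting matches quickly distinguishes an unambiguous selector from a generic one. Building the eval payload by string concatenation breaks when the selector comes from a variable that contains quotes. The helpers below quote the selector with JSON.stringify and interpret the resulting count. Both helpers are hypothetical.

```javascript
// Build an eval payload that counts matches for a CSS selector.
// JSON.stringify quotes the selector, so embedded quotes cannot break the JS.
function countMatchesPayload(selector) {
  return `document.querySelectorAll(${JSON.stringify(selector)}).length`;
}

// Interpret the count returned by that eval.
function describeMatchCount(count) {
  if (count === 0) return "no match";
  if (count === 1) return "unique";
  return `ambiguous (${count} matches)`;
}

console.log(countMatchesPayload("input[name='q']"));
// prints: document.querySelectorAll("input[name='q']").length
```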

Common Mistakes

Timing Issues

Problem: Clicking before the element loads

{action: "navigate", payload: "https://example.com"}
{action: "click", selector: "button"}  // ❌ Fails if slow

Solution: Always wait

{action: "navigate", payload: "https://example.com"}
{action: "await_element", selector: "button"}  // ✅ Waits
{action: "click", selector: "button"}

Generic Selectors

Problem: Matches the wrong element

{action: "click", selector: "button"}  // ❌ Clicks whichever button comes first

Solution: Be specific

{action: "click", selector: "button.login-button"}  // ✅ Specific

Missing Tab Management

Problem: Tab indices change after closing tabs

{action: "close_tab", tab_index: 1}
{action: "click", tab_index: 2, selector: "a"}  // ❌ Index shifted

Solution: Re-list tabs

{action: "close_tab", tab_index: 1}
{action: "list_tabs"}  // ✅ Get updated indices
{action: "click", tab_index: 1, selector: "a"}  // Now correct
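The shift above follows a simple rule. Closing a tab moves every tab to its right one position to the left. The sketch below models that bookkeeping. It assumes tab_index order stays stable between calls, and re-running list_tabs remains the authoritative check. indexAfterClose is a hypothetical helper.

```javascript
// Where does a tab that was at `oldIndex` end up after closing `closedIndex`?
function indexAfterClose(oldIndex, closedIndex) {
  if (oldIndex === closedIndex) return null; // that tab no longer exists
  return oldIndex > closedIndex ? oldIndex - 1 : oldIndex; // right-hand tabs shift left
}

console.log(indexAfterClose(2, 1)); // prints: 1
console.log(indexAfterClose(0, 1)); // prints: 0
```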

Insufficient Timeout

Problem: The default 5-second timeout is too short for slow-loading content

{action: "await_element", selector: ".slow-content"}  // ❌ Times out

Solution: Increase timeout

{action: "await_element", selector: ".slow-content", timeout: 30000}  // ✅
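When you cannot predict how slow a page will be, you can retry await_element with progressively longer timeouts, capped at the documented 60000 ms maximum. This sketch shows a client-side strategy, not tool behavior. awaitElementPatiently and callTool are hypothetical, and the `{ error }` response shape is assumed.

```javascript
// Retry await_element with growing timeouts (capped at 60000 ms).
// `callTool` is a hypothetical async MCP client call; `{ error }` is assumed.
async function awaitElementPatiently(callTool, selector, timeouts = [5000, 15000, 60000]) {
  let lastError;
  for (const t of timeouts) {
    const timeout = Math.min(t, 60000);
    const response = await callTool({ action: "await_element", selector, timeout });
    if (!response.error) return { ...response, timeout }; // report which timeout worked
    lastError = response.error;
  }
  throw new Error(`${selector} never appeared: ${lastError}`);
}
```

Escalation reports fast pages quickly. The cost is a worst-case wait equal to the sum of all timeouts: 80 s with these defaults. If you already know the content is slow, a single larger timeout is simpler.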

Advanced Patterns

Wait for AJAX Complete

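{action: "eval", payload: `
  new Promise((resolve, reject) => {
    const deadline = Date.now() + 10000;  // give up after 10 s instead of polling forever
    const check = () => {
      if (!document.querySelector('.spinner')) {
        resolve(true);
      } else if (Date.now() > deadline) {
        reject(new Error('.spinner still visible after 10 s'));
      } else {
        setTimeout(check, 100);
      }
    };
    check();
  })
`}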
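The spinner and modal waits follow the same poll-until-gone pattern, so you can generate the eval payload for any selector. This sketch has three assumptions. waitForGonePayload is a hypothetical helper. The 10 s default deadline is arbitrary. The tool is assumed to await a returned Promise, as the examples in this section imply.

```javascript
// Generate an eval payload that resolves once `selector` no longer matches,
// and rejects after `timeoutMs` so a stuck element fails the step.
function waitForGonePayload(selector, timeoutMs = 10000, intervalMs = 100) {
  const sel = JSON.stringify(selector); // safely quoted for the generated JS
  const msg = JSON.stringify(`${selector} still present after ${timeoutMs} ms`);
  return `new Promise((resolve, reject) => {
  const deadline = Date.now() + ${Number(timeoutMs)};
  const check = () => {
    if (!document.querySelector(${sel})) resolve(true);
    else if (Date.now() > deadline) reject(new Error(${msg}));
    else setTimeout(check, ${Number(intervalMs)});
  };
  check();
})`;
}
```

For example, `waitForGonePayload('.modal.visible')` produces a string that you can pass directly as the eval payload.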

Extract Table Data

{action: "eval", payload: "Array.from(document.querySelectorAll('table tr')).map(row => Array.from(row.cells).map(cell => cell.textContent.trim()))"}
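The table eval returns one array per row. The header row is included because `row.cells` covers both th and td cells. The sketch below turns those arrays into objects keyed by the first row, assuming that row holds the headers. rowsToObjects is a hypothetical helper.

```javascript
// Convert [[header...], [cell...], ...] into [{ header: cell }, ...].
function rowsToObjects(rows) {
  if (rows.length === 0) return [];
  const [headers, ...body] = rows;
  return body
    .filter((cells) => cells.length > 0) // skip rows with no cells
    .map((cells) => Object.fromEntries(headers.map((h, i) => [h, cells[i] ?? ""]))); // pad short rows
}

const rows = [["Name", "Price"], ["Widget", "9.99"], ["Gadget", "19.50"]];
console.log(JSON.stringify(rowsToObjects(rows)));
// prints: [{"Name":"Widget","Price":"9.99"},{"Name":"Gadget","Price":"19.50"}]
```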

Handle Modals

{action: "click", selector: "button.open-modal"}
{action: "await_element", selector: ".modal.visible"}
{action: "type", selector: ".modal input[name=username]", payload: "testuser"}
{action: "click", selector: ".modal button.submit"}
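{action: "eval", payload: `
  new Promise((resolve, reject) => {
    const deadline = Date.now() + 10000;  // fail instead of hanging if the modal never closes
    const check = () => {
      if (!document.querySelector('.modal.visible')) resolve(true);
      else if (Date.now() > deadline) reject(new Error('modal still open after 10 s'));
      else setTimeout(check, 100);
    };
    check();
  })
`}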

Access Browser Storage

// Get cookies (HttpOnly cookies are not visible to document.cookie)
{action: "eval", payload: "document.cookie"}

// Get localStorage
{action: "eval", payload: "JSON.stringify(localStorage)"}

// Set localStorage
{action: "eval", payload: "localStorage.setItem('key', 'value')"}
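document.cookie returns a single string such as "a=1; b=two". The sketch below is a minimal parser; parseCookies is a hypothetical name. It splits each pair on the first = only, so values that contain = survive intact. It URI-decodes a value only when the value is valid percent-encoding.

```javascript
// Parse a document.cookie string into { name: value }.
function parseCookies(cookieString) {
  const cookies = {};
  for (const part of cookieString.split(";")) {
    const trimmed = part.trim();
    if (!trimmed) continue;
    const eq = trimmed.indexOf("="); // first "=" separates name from value
    const name = eq === -1 ? trimmed : trimmed.slice(0, eq);
    const raw = eq === -1 ? "" : trimmed.slice(eq + 1);
    let value;
    try {
      value = decodeURIComponent(raw);
    } catch {
      value = raw; // malformed percent-encoding: keep the raw text
    }
    cookies[name] = value;
  }
  return cookies;
}

console.log(JSON.stringify(parseCookies("session=abc%20123; theme=dark; token=a=b")));
// prints: {"session":"abc 123","theme":"dark","token":"a=b"}
```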

Real-World Impact

Before: manual form filling, 5 minutes per submission
After: automated workflow, 30 seconds per submission (10x faster)

Before: copy-pasting from multiple tabs, error-prone
After: multi-tab extraction with validation, zero errors

Before: unreliable scraping with arbitrary delays
After: event-driven waiting, 100% reliability

Additional Resources

See references/examples.md for:

  • Complete e-commerce workflows
  • Multi-step form automation
  • Advanced scraping patterns
  • Infinite scroll handling
  • Cross-site data correlation

Chrome DevTools Protocol docs: https://chromedevtools.github.io/devtools-protocol/