Browser Agent

A DOM-driven browser agent powered by Gemini's native function calling. The agent uses a multi-turn chat with structured tool calls — no free-text JSON parsing.

Architecture: Snapshot → Interpreter → Multi-turn Chat (ReAct reasoning + function call) → Execution → Result fed back to chat

Key design choices:

Native function calling — the LLM returns typed FunctionCall objects, not free-text JSON. Eliminates parsing failures.
Multi-turn conversation — the chat maintains history across steps. The LLM sees its prior reasoning and tool results.
Chain-of-thought — the system prompt requires step-by-step reasoning (Observe → Think → Act) before every tool call.
Few-shot examples — the system instruction includes worked examples of correct reasoning patterns.
Human-in-the-loop — three approval modes (safe/hybrid/auto) gate risky actions.
Two-tiered memory — the agent learns from mistakes across runs and self-optimizes over time.

1) Setup From Scratch

1.1 Requirements

Python 3.11+
Playwright CLI installed
google-genai Python package (v1.66+)
A Gemini API key

1.2 Install Python dependencies

python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements-browser-agent.txt
pip install -e .

1.3 Install Playwright CLI

If playwright-cli is already available globally, skip this.

# Option A: global install (preferred)
# Follow your org's standard for installing the Playwright CLI

# Option B: use npx every time
npx playwright-cli open https://example.com

If you use npx, run the agent with --use-npx (or set it during setup).

1.4 First-time agent config

Run once to create config at ~/.browser_agent/config.yaml:

browser-agent "open example.com" --safe

Prompts:

API key
model (default gemini-1.5-flash)
default mode (safe / hybrid / auto)
default Playwright session name (optional)
default start URL (optional)
whether to use npx by default

2) Modes (Safe / Hybrid / Auto)

--safe: approve every action. Best for first-time flows and logins.
--hybrid: approves only risky actions (navigation, typing, submissions, destructive clicks).
--auto: fully autonomous.

Examples:

browser-agent "find best padel rackets" --safe --start-url https://google.com
browser-agent "find best padel rackets" --hybrid --start-url https://google.com
browser-agent "find best padel rackets" --auto --start-url https://google.com

3) Run the Agent

3.1 Basic run

browser-agent "open youtube.com" --safe

3.2 Run with a start URL

browser-agent "search for best padel rackets" --safe --start-url https://google.com

3.3 Session usage

browser-agent "check inbox" --session gmail --start-url https://mail.google.com

3.4 Persistent profile (recommended for logins)

browser-agent "open gmail" --session gmail --persistent --headed --start-url https://mail.google.com

3.5 Custom profile directory

browser-agent "open gmail" --session gmail --profile ~/.browser-agent-profiles/gmail --headed --start-url https://mail.google.com

4) Login Flows

4.1 Recommended login flow (safe + persistent)

Launch in safe mode with a persistent profile:

browser-agent "login to site" --safe --persistent --headed --start-url https://example.com/login

Approve steps or manually complete login in the browser window.
Save storage state after login:

playwright-cli state-save auth.json

Restore login next time:

playwright-cli state-load auth.json

Re-run the agent with the same session/profile.

4.2 Login flow using storage state only

playwright-cli state-load auth.json
browser-agent "check account" --safe --start-url https://example.com

4.3 Multi-account logins

browser-agent "open account A" --session acct-a --persistent --headed --start-url https://example.com/login
browser-agent "open account B" --session acct-b --persistent --headed --start-url https://example.com/login

5) Do’s and Don’ts

Do

Use --safe for any login or payment flow.
Use --persistent with a named --session for repeat logins.
Save state-save after successful login.
Use --debug when a flow fails to capture trace + video.
Start with smaller tasks and increase autonomy gradually.

Don’t

Run --auto on payment, checkout, or account-modifying tasks.
Rely on storage state across unrelated sessions or profiles.
Leave stale persistent profiles uncleaned for high‑risk apps.
Ignore repeated action loops; stop and review logs.

6) Debug Mode and Observability

6.1 Debug mode (tracing + video)

browser-agent "search for best padel rackets" --debug --start-url https://google.com

Outputs:

Traces: .playwright-cli/traces/
Video: runs/<run_id>/session.webm

6.2 Manual tracing

playwright-cli tracing-start
# run actions
playwright-cli tracing-stop

6.3 Logs

Each run produces:

runs/<run_id>/
  snapshots/
  screenshots/
  actions.jsonl            # Every action executed + approval status + stdout/stderr
  llm_responses.jsonl      # Tool calls + reasoning text from the LLM
  browser_state.jsonl      # URL, title, snapshot paths per step
  interpreter_state.jsonl  # Parsed page state per step
  memory_events.jsonl      # Memory access, recalls, learning, promotions
  run_meta.json            # Task, stop_reason, step count, runtime

6.4 Memory observability

The agent has a two-tiered memory system that learns from failure→recovery patterns across runs. Every memory interaction is logged to memory_events.jsonl:

Event	What it tells you
`tier1_loaded`	Which universal lessons were injected into the system prompt
`error_recall`	A command failed — did memory find relevant tips?
`domain_recall`	Navigated to a new domain — any site-specific advice?
`lesson_recorded`	Post-run learning found a new failure→recovery pattern
`lesson_deduplicated`	An existing lesson was reinforced (use count bumped)
`lesson_promoted`	A lesson graduated from reactive (Tier 2) to always-on (Tier 1)
`lessons_pruned`	Stale lessons were cleaned up on startup

Query examples:

# All memory activity for a run
cat runs/run_*/memory_events.jsonl | python -m json.tool

# Did Trigger A (error recall) fire? What matched?
grep "error_recall" runs/run_*/memory_events.jsonl

# Did Trigger B (domain recall) fire?
grep "domain_recall" runs/run_*/memory_events.jsonl

# Any promotions across all runs?
grep "lesson_promoted" runs/run_*/memory_events.jsonl

The memory store itself is persisted at ~/.browser_agent/memory.json. See docs/memory-system.md for the full architecture.

7) Manual Playwright CLI Commands

playwright-cli open https://example.com
playwright-cli snapshot
playwright-cli click e12
playwright-cli type "search query"
playwright-cli press Enter
playwright-cli screenshot
playwright-cli close

8) Common Issues

`playwright-cli not found`

Install it globally or run with --use-npx.

`429 quota exceeded`

Switch to a lower-cost model (e.g., gemini-1.5-flash) or wait for quota reset.

Too many approvals

Use --hybrid or --auto if you want fewer prompts.

9) Safety Notes

Keep sensitive tasks in --safe.
Store login profiles in a dedicated folder per account.
Use a dedicated browser profile for automation to avoid leaking personal sessions.

10) Setup Validation Checklist

Run these once to confirm Playwright CLI and the agent are working:

playwright-cli open https://example.com
playwright-cli snapshot
playwright-cli close

browser-agent "open example.com" --safe

If playwright-cli is missing, use:

npx playwright-cli open https://example.com

11) Login and MFA Guidance

MFA/OTP

Use --safe for any login flow.
Complete MFA manually in the headed browser.
Avoid automating OTP codes unless explicitly required by policy.

When login fails

Use --debug to capture traces and video.
Try a fresh profile: --persistent --profile ~/.browser-agent-profiles/<site>.
Clear stale session data:

playwright-cli -s=mysession delete-data

12) Profile and Session Lifecycle

Sessions isolate cookies/storage by --session name.
Use persistent profiles for long‑lived logins.
Delete persistent data if it becomes corrupt or risky.

Commands:

playwright-cli list
playwright-cli close-all
playwright-cli kill-all
playwright-cli -s=mysession delete-data

13) Debugging Failed Runs

Check runs/<run_id>/run_meta.json for stop reason.
Check actions.jsonl for execution errors.
Check llm_responses.jsonl for planner errors or malformed actions.
Inspect snapshots/step_XXXX.txt to see the DOM references.
Use --debug for trace/video if the issue is visual or timing‑related.

14) Guardrails and Approvals

SAFE: every action requires approval.
HYBRID: approves only risky actions (navigation, typing, storage changes, destructive clicks).
AUTO: no approvals.

If you see repeated approvals on a low‑risk task, switch to --hybrid.

15) Allowed Commands (Playwright CLI)

The agent enforces a strict whitelist, aligned to the skill docs. It will reject commands that:

are not in the allowed list
have malformed args (e.g., check --url ...)
do not target valid element refs when required

16) Example Flows

Search and open a result

browser-agent "search for best padel rackets and open the first result" --safe --start-url https://google.com

Fill a form

browser-agent "fill the contact form with my name and email" --safe --start-url https://example.com/contact

17) Reset and Clean Up

playwright-cli close-all
playwright-cli kill-all
playwright-cli delete-data

18) Operational Tips

Use --headed during early development for visibility.
Use --start-url to reduce unnecessary navigation steps.
Use a dedicated profile folder per site to isolate login state.

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
browser_agent		browser_agent
docs		docs
skills		skills
tests		tests
.gitignore		.gitignore
README.md		README.md
requirements-browser-agent.txt		requirements-browser-agent.txt
search_results.yml		search_results.yml
setup.py		setup.py

Folders and files

Latest commit

History

Repository files navigation

Browser Agent

1) Setup From Scratch

1.1 Requirements

1.2 Install Python dependencies

1.3 Install Playwright CLI

1.4 First-time agent config

2) Modes (Safe / Hybrid / Auto)

3) Run the Agent

3.1 Basic run

3.2 Run with a start URL

3.3 Session usage

3.4 Persistent profile (recommended for logins)

3.5 Custom profile directory

4) Login Flows

4.1 Recommended login flow (safe + persistent)

4.2 Login flow using storage state only

4.3 Multi-account logins

5) Do’s and Don’ts

Do

Don’t

6) Debug Mode and Observability

6.1 Debug mode (tracing + video)

6.2 Manual tracing

6.3 Logs

6.4 Memory observability

7) Manual Playwright CLI Commands

8) Common Issues

playwright-cli not found

429 quota exceeded

Too many approvals

9) Safety Notes

10) Setup Validation Checklist

11) Login and MFA Guidance

MFA/OTP

When login fails

12) Profile and Session Lifecycle

13) Debugging Failed Runs

14) Guardrails and Approvals

15) Allowed Commands (Playwright CLI)

16) Example Flows

Search and open a result

Fill a form

17) Reset and Clean Up

18) Operational Tips

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`playwright-cli not found`

`429 quota exceeded`

Packages