OpenEnv documentation

BrowserGym Environment

You are viewing the main version, which requires installation from source. If you'd like a regular pip install, check out the latest stable version (v0.4.1).

BrowserGym is a unified framework for web-based agent tasks that provides access to multiple benchmarks under a single Gymnasium-compatible API. This integration brings the complete training-to-evaluation pipeline for web agents into OpenEnv.

Why BrowserGym?

BrowserGym provides a complete pipeline for developing web agents: train on simple tasks, then evaluate on realistic websites.

What are these benchmarks?

  • MiniWoB++ (Training): 100+ synthetic web tasks such as “click this button”, “fill out this form”, and “select from dropdown”. Each task is a simple webpage with a clear objective. Fast resets, randomized variations, dense rewards. Well suited to learning basic web navigation skills. No external setup is needed: tasks run in isolated browser sessions.

  • WebArena (Evaluation): 812 tasks on realistic, self-hosted websites (e-commerce, forums, GitLab, Wikipedia). Tasks such as “find the cheapest laptop and add to cart” or “create a merge request for bug #123” are multi-step, require reasoning, and have sparse rewards. Tests whether your agent can handle realistic websites. Requires running 7 backend services (shopping site, GitLab instance, etc.).

  • VisualWebArena: Similar to WebArena but requires visual understanding: agents need to interpret images, identify UI elements visually, and handle multimodal content.

  • WorkArena: Enterprise software tasks (CRM, project management, business workflows). Tests automation on corporate-style applications.

The training → evaluation pipeline:

  1. Train on MiniWoB (simple, controlled, fast iterations)
  2. Evaluate on WebArena (complex, realistic, measures real-world capability)

Key advantage: You can start training immediately with MiniWoB. No need to set up infrastructure just to test if your code works.

Quick Start - Training (MiniWoB)

No Setup Required! 🎉

from browsergym_env import BrowserGymEnv, BrowserGymAction

# Create environment for MiniWoB training task
env = BrowserGymEnv.from_docker_image(
    "ghcr.io/openenv/browsergym-env:latest",
    environment={
        "BROWSERGYM_BENCHMARK": "miniwob",
        "BROWSERGYM_TASK_NAME": "click-test",  # or "click-button", "click-dialog", etc.
    }
)

# Train your agent!
for episode in range(1000):
    result = env.reset()
    print(f"Goal: {result.observation.goal}")

    done = False
    while not done:
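        # Your agent decides what to do (`agent` is your own policy object,
        # not provided by OpenEnv, mapping observation text to an action string)
        action_str = agent.get_action(result.observation.text)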
        action = BrowserGymAction(action_str=action_str)

        result = env.step(action)
        done = result.done

        print(f"Reward: {result.reward}")

env.close()
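The loop above assumes an `agent` object with a `get_action` method, which you supply yourself. A minimal placeholder, useful only for checking that the plumbing works end to end, might look like the sketch below. The class name `NoopAgent` is hypothetical; it relies on BrowserGym's `noop()` action (also listed among the harness tools) to do nothing on each step.

```python
class NoopAgent:
    """Hypothetical placeholder policy that never interacts with the page.

    Swap get_action for a real policy (an LLM call, an RL policy, ...).
    """

    def get_action(self, observation_text: str) -> str:
        # The observation is ignored; "noop()" asks BrowserGym to do nothing this step.
        return "noop()"


agent = NoopAgent()
print(agent.get_action("[13] button 'Click Me!'"))  # prints: noop()
```

With a policy like this an episode may never end on its own, so while testing, bound the inner `while` loop with a step counter.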

Harness Sessions for TRL

If you want BrowserGym to participate in a tool-driven harness instead of a hand-written env.reset() / env.step() loop, use the BrowserGym session factory:

from browsergym_env import BrowserGymEnv
from browsergym_env.harness import BrowserGymSessionFactory
from openenv.core.harness import (
    HarnessRunLimits,
    MCPHarnessAdapter,
    build_harness_rollout_func,
)

session_factory = BrowserGymSessionFactory(
    client_factory=lambda: BrowserGymEnv(base_url="https://openenv-browsergym-env.hf.space"),
)

rollout_func = build_harness_rollout_func(
    session_factory=session_factory,
    harness_adapter=MCPHarnessAdapter(),
    model_step_builder=...,  # trainer-owned model sampling
    limits=HarnessRunLimits(max_turns=10),
)

BrowserGym exposes click, fill, send_keys, scroll, and noop as MCP-style tools while still translating them back into the underlying BrowserGymAction strings. See examples/browsergym_harness.py for a full TRL-oriented example.
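Conceptually, each tool call has to be turned back into a BrowserGym action string before it reaches `env.step()`. The sketch below illustrates that translation only; it is not the code in `browsergym_env.harness`, and the argument names (`bid`, `value`, `key`, `dx`, `dy`) as well as the choice of `press(bid, key)` for `send_keys` are assumptions.

```python
from typing import Any, Mapping


def tool_call_to_action_str(name: str, args: Mapping[str, Any]) -> str:
    """Hypothetical translation of an MCP-style tool call into a BrowserGym action string.

    repr() yields properly quoted Python literals, which matters because
    BrowserGym action strings are Python-style function calls.
    """
    if name == "click":
        return f"click({str(args['bid'])!r})"
    if name == "fill":
        return f"fill({str(args['bid'])!r}, {str(args['value'])!r})"
    if name == "send_keys":
        # Assumption: keys are sent to a specific element via press(bid, key_comb).
        return f"press({str(args['bid'])!r}, {str(args['key'])!r})"
    if name == "scroll":
        # Scroll offsets in pixels; missing axes default to 0.
        return f"scroll({float(args.get('dx', 0))}, {float(args.get('dy', 0))})"
    if name == "noop":
        return "noop()"
    raise ValueError(f"unknown tool: {name}")


print(tool_call_to_action_str("fill", {"bid": 21, "value": "john@example.com"}))
# prints: fill('21', 'john@example.com')
```

Building strings with `repr()` rather than manual concatenation keeps user text containing quotes from producing an unparseable action.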

Available Tasks by Benchmark

MiniWoB++ Tasks (Training - 100+ tasks)

MiniWoB tasks are organized by difficulty and type. Here are the main categories:

Click Tasks (Basic interaction)

| Task Name | Description | Difficulty |
|---|---|---|
| click-test | Click a single button | ⭐ Easy |
| click-button | Click button with specific text | ⭐ Easy |
| click-button-sequence | Click buttons in order | ⭐⭐ Medium |
| click-checkboxes | Select specific checkboxes | ⭐⭐ Medium |
| click-checkboxes-soft | Select checkboxes (multiple valid) | ⭐⭐ Medium |
| click-checkboxes-large | Many checkboxes to select from | ⭐⭐ Medium |
| click-checkboxes-transfer | Transfer learning variation | ⭐⭐ Medium |
| click-dialog | Click correct button in dialog | ⭐ Easy |
| click-dialog-2 | More complex dialog | ⭐⭐ Medium |
| click-link | Click on a link | ⭐ Easy |
| click-option | Select from dropdown | ⭐⭐ Medium |
| click-pie | Click on pie chart slice | ⭐⭐ Medium |
| click-scroll-list | Click item in scrollable list | ⭐⭐⭐ Hard |
| click-shades | Click on specific color shade | ⭐⭐ Medium |
| click-shape | Click on specific shape | ⭐⭐ Medium |
| click-tab | Switch between tabs | ⭐⭐ Medium |
| click-tab-2 | More complex tab switching | ⭐⭐⭐ Hard |
| click-widget | Click on UI widget | ⭐⭐ Medium |

Text Entry Tasks (Typing and forms)

| Task Name | Description | Difficulty |
|---|---|---|
| enter-text | Type text into input field | ⭐ Easy |
| enter-text-dynamic | Dynamic text entry | ⭐⭐ Medium |
| enter-text-2 | Multiple text fields | ⭐⭐ Medium |
| enter-password | Fill password field | ⭐ Easy |
| enter-date | Enter a date | ⭐⭐ Medium |
| enter-time | Enter a time | ⭐⭐ Medium |
| login-user | Complete login form | ⭐⭐ Medium |
| login-user-popup | Login via popup | ⭐⭐⭐ Hard |

Navigation Tasks (Multi-step interaction)

| Task Name | Description | Difficulty |
|---|---|---|
| navigate-tree | Navigate through tree structure | ⭐⭐⭐ Hard |
| search-engine | Use search interface | ⭐⭐ Medium |
| use-autocomplete | Interact with autocomplete | ⭐⭐⭐ Hard |
| book-flight | Book a flight (complex form) | ⭐⭐⭐⭐ Very Hard |
| choose-date | Pick date from calendar | ⭐⭐⭐ Hard |
| choose-date-easy | Simplified date picker | ⭐⭐ Medium |
| choose-date-medium | Medium difficulty date picker | ⭐⭐⭐ Hard |
| choose-list | Select from long list | ⭐⭐ Medium |

Visual/Spatial Tasks (Requires visual understanding)

| Task Name | Description | Difficulty |
|---|---|---|
| count-sides | Count sides of shape | ⭐⭐ Medium |
| count-shape | Count specific shapes | ⭐⭐ Medium |
| find-word | Find word in text | ⭐⭐ Medium |
| focus-text | Focus on text element | ⭐ Easy |
| focus-text-2 | More complex focus task | ⭐⭐ Medium |
| grid-coordinate | Click grid coordinate | ⭐⭐ Medium |
| guess-number | Guess a number game | ⭐⭐⭐ Hard |
| identify-shape | Identify shape type | ⭐⭐ Medium |
| read-table | Extract info from table | ⭐⭐⭐ Hard |
| read-table-2 | More complex table reading | ⭐⭐⭐ Hard |

Email/Social Tasks (Realistic scenarios)

| Task Name | Description | Difficulty |
|---|---|---|
| email-inbox | Manage email inbox | ⭐⭐⭐⭐ Very Hard |
| email-inbox-forward | Forward emails | ⭐⭐⭐⭐ Very Hard |
| email-inbox-nl | Natural language email task | ⭐⭐⭐⭐ Very Hard |
| email-inbox-star-reply | Star and reply to emails | ⭐⭐⭐⭐ Very Hard |
| social-media | Social media interaction | ⭐⭐⭐⭐ Very Hard |
| social-media-some | Partial social media task | ⭐⭐⭐ Hard |

Total: 100+ tasks across all categories

Usage:

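IMAGE = "ghcr.io/openenv/browsergym-env:latest"

# Easy task for quick testing
env = BrowserGymEnv.from_docker_image(IMAGE, environment={"BROWSERGYM_BENCHMARK": "miniwob", "BROWSERGYM_TASK_NAME": "click-test"})

# Medium difficulty for training
env = BrowserGymEnv.from_docker_image(IMAGE, environment={"BROWSERGYM_BENCHMARK": "miniwob", "BROWSERGYM_TASK_NAME": "click-checkboxes"})

# Hard task for evaluation
env = BrowserGymEnv.from_docker_image(IMAGE, environment={"BROWSERGYM_BENCHMARK": "miniwob", "BROWSERGYM_TASK_NAME": "email-inbox"})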
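The difficulty ratings in the MiniWoB tables can drive a simple curriculum: start with one-star tasks and unlock harder ones as training progresses. The sketch below hard-codes a subset of those ratings; the function name `curriculum_stage` and the idea of staging by star count are illustrative, not part of OpenEnv.

```python
# A subset of the MiniWoB difficulty ratings from the tables above (stars: 1-4).
TASK_DIFFICULTY = {
    "click-test": 1,
    "click-button": 1,
    "click-dialog": 1,
    "enter-text": 1,
    "click-checkboxes": 2,
    "login-user": 2,
    "enter-date": 2,
    "click-scroll-list": 3,
    "use-autocomplete": 3,
    "book-flight": 4,
    "email-inbox": 4,
}


def curriculum_stage(max_stars: int) -> list[str]:
    """Return the tasks rated at most max_stars, easiest first."""
    eligible = [task for task, stars in TASK_DIFFICULTY.items() if stars <= max_stars]
    # Sort by difficulty, then by name, so the order is deterministic.
    return sorted(eligible, key=lambda task: (TASK_DIFFICULTY[task], task))


for stars in (1, 2):
    print(stars, curriculum_stage(stars))
```

Each returned name is a value for `BROWSERGYM_TASK_NAME`; a training script could move to the next stage once the success rate on the current one plateaus.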

WebArena Tasks (Evaluation - 812 tasks)

WebArena tasks are organized by website and difficulty. Tasks are numbered 0-811.

By Website:

| Website | Task Count | Description | Example Tasks |
|---|---|---|---|
| Shopping | ~200 | E-commerce site | Search products, add to cart, checkout |
| Shopping Admin | ~150 | Admin panel | Manage products, orders, customers |
| Reddit | ~150 | Forum/social | Post, comment, search discussions |
| GitLab | ~200 | Code repository | Create issues, merge requests, review code |
| Wikipedia | ~100 | Knowledge base | Search, read, extract information |
| Map | ~12 | Location service | Find places, get directions |

By Difficulty:

| Difficulty | Task Count | Steps Required | Example |
|---|---|---|---|
| Easy | ~200 | 1-5 steps | “Find the price of product X” |
| Medium | ~400 | 5-15 steps | “Add cheapest laptop to cart” |
| Hard | ~212 | 15+ steps | “Create merge request for bug fix” |

Usage:

# Task 0 (usually easy)
env = BrowserGymEnv.from_docker_image(
    "ghcr.io/openenv/browsergym-env:latest",
    environment={
        "BROWSERGYM_BENCHMARK": "webarena",
        "BROWSERGYM_TASK_NAME": "0",
        "SHOPPING": "http://your-server:7770",
        # ... other URLs
    },
)

# Task 156 (GitLab merge request)
env = BrowserGymEnv.from_docker_image(
    "ghcr.io/openenv/browsergym-env:latest",
    environment={
        "BROWSERGYM_BENCHMARK": "webarena",
        "BROWSERGYM_TASK_NAME": "156",
        # ... URLs
    },
)

Note: WebArena tasks require the full backend infrastructure. See the WebArena setup guide.

VisualWebArena Tasks (910 tasks)

Similar to WebArena but requires visual understanding. Tasks involve:

  • Image-based reasoning
  • Visual element identification
  • Multimodal interaction (text + images)

WorkArena Tasks

Enterprise software automation tasks:

  • CRM operations
  • Project management
  • Business workflows

Full task lists:

Evaluation (WebArena)

Prerequisites

WebArena requires setting up backend infrastructure. See the WebArena documentation.

Usage

from browsergym_env import BrowserGymEnv, BrowserGymAction

# Create environment for WebArena evaluation
env = BrowserGymEnv.from_docker_image(
    "ghcr.io/openenv/browsergym-env:latest",
    environment={
        "BROWSERGYM_BENCHMARK": "webarena",
        "BROWSERGYM_TASK_NAME": "0",  # Task ID
        # WebArena backend URLs (required)
        "SHOPPING": "http://your-server:7770",
        "SHOPPING_ADMIN": "http://your-server:7780/admin",
        "REDDIT": "http://your-server:9999",
        "GITLAB": "http://your-server:8023",
        "MAP": "http://your-server:3000",
        "WIKIPEDIA": "http://your-server:8888/wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing",
        "HOMEPAGE": "http://your-server:4399",
    }
)

# Evaluate your trained agent (`agent` is your own policy object)
result = env.reset()
while not result.done:
    action_str = agent.get_action(result.observation.text)
    action = BrowserGymAction(action_str=action_str)
    result = env.step(action)

print(f"Success: {result.reward}")
env.close()
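Because the same seven backend URLs are repeated for every WebArena task, it can help to build the `environment` mapping in one place. The helper below is a hypothetical convenience, not part of `browsergym_env`; it reuses the ports and paths from the example configuration and assumes all services run on a single host.

```python
WEBARENA_NUM_TASKS = 812  # task IDs 0-811

# Ports and paths copied from the example configuration above.
_SERVICE_SUFFIXES = {
    "SHOPPING": ":7770",
    "SHOPPING_ADMIN": ":7780/admin",
    "REDDIT": ":9999",
    "GITLAB": ":8023",
    "MAP": ":3000",
    "WIKIPEDIA": ":8888/wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing",
    "HOMEPAGE": ":4399",
}


def webarena_environment(task_id: int, host: str) -> dict[str, str]:
    """Build the environment mapping for one WebArena task served from `host`."""
    if not 0 <= task_id < WEBARENA_NUM_TASKS:
        raise ValueError(f"WebArena task IDs run from 0 to {WEBARENA_NUM_TASKS - 1}")
    env = {
        "BROWSERGYM_BENCHMARK": "webarena",
        "BROWSERGYM_TASK_NAME": str(task_id),  # task IDs are passed as strings
    }
    for name, suffix in _SERVICE_SUFFIXES.items():
        env[name] = f"http://{host}{suffix}"
    return env


print(webarena_environment(0, "your-server")["SHOPPING_ADMIN"])
# prints: http://your-server:7780/admin
```

The result can be passed as `environment=webarena_environment(0, "your-server")` to `BrowserGymEnv.from_docker_image(...)`.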

Building the Docker Image

Prerequisites

  1. Base Image: Build the OpenEnv base image first:
# From the OpenEnv repository root
docker build -t openenv-base:latest -f src/openenv/core/containers/images/Dockerfile .

Build the BrowserGym Environment

# From the OpenEnv repository root
cd envs/browsergym_env
docker build -t browsergym-env:latest -f server/Dockerfile .

Run the Server

For MiniWoB (Training):

docker run -p 8000:8000 \
  -e BROWSERGYM_BENCHMARK="miniwob" \
  -e BROWSERGYM_TASK_NAME="click-test" \
  browsergym-env:latest

For WebArena (Evaluation):

docker run -p 8000:8000 \
  -e BROWSERGYM_BENCHMARK="webarena" \
  -e BROWSERGYM_TASK_NAME="0" \
  -e SHOPPING="http://your-server:7770" \
  -e SHOPPING_ADMIN="http://your-server:7780/admin" \
  -e REDDIT="http://your-server:9999" \
  -e GITLAB="http://your-server:8023" \
  -e MAP="http://your-server:3000" \
  -e WIKIPEDIA="http://your-server:8888/wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing" \
  -e HOMEPAGE="http://your-server:4399" \
  browsergym-env:latest
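The two `docker run` commands differ only in their `-e` flags. If you launch containers from Python, a small helper can assemble the argument list from a dictionary. The function below is a hypothetical sketch that only builds the list; it assumes, as in the commands above, that the server listens on port 8000 inside the container.

```python
import shlex


def docker_run_args(image: str, env: dict[str, str], host_port: int = 8000) -> list[str]:
    """Build `docker run` arguments equivalent to the commands above.

    host_port changes only the port exposed on the host; the container
    side stays 8000.
    """
    args = ["docker", "run", "-p", f"{host_port}:8000"]
    for key, value in env.items():
        # One -e flag per variable; no shell is involved, so no extra quoting.
        args += ["-e", f"{key}={value}"]
    args.append(image)
    return args


args = docker_run_args(
    "browsergym-env:latest",
    {"BROWSERGYM_BENCHMARK": "miniwob", "BROWSERGYM_TASK_NAME": "click-test"},
)
print(shlex.join(args))
# docker run -p 8000:8000 -e BROWSERGYM_BENCHMARK=miniwob -e BROWSERGYM_TASK_NAME=click-test browsergym-env:latest
```

Passing the list to `subprocess.run(args, check=True)` would start the container without going through a shell.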

Environment Details

Action

Actions in BrowserGym are strings written in BrowserGym's Python-like action syntax. Element arguments are BrowserGym element IDs (bids), which appear in the accessibility-tree and DOM observations; the IDs below are illustrative:

from browsergym_env import BrowserGymAction

# Click actions
action = BrowserGymAction(action_str="click('13')")

# Type actions
action = BrowserGymAction(action_str="fill('21', 'john@example.com')")
action = BrowserGymAction(action_str="fill('25', 'secret123')")

# Navigate actions
action = BrowserGymAction(action_str="goto('https://example.com')")

# Keyboard actions: press(bid, key) sends a key press to an element
action = BrowserGymAction(action_str="press('25', 'Enter')")
action = BrowserGymAction(action_str="press('21', 'Tab')")

# Scroll actions: scroll(delta_x, delta_y) in pixels
action = BrowserGymAction(action_str="scroll(0, 200)")
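Because BrowserGym treats these action strings as Python-style function calls, user-supplied text (for example a form value containing a quote) must be quoted correctly. One way to avoid hand-written escaping is to build the strings with `repr()`. The helper names below are hypothetical and cover only a few of the action forms in this section.

```python
def click(element: str) -> str:
    """Build a click action string for the given element identifier."""
    return f"click({element!r})"


def fill(element: str, value: str) -> str:
    """Build a fill action string; repr() escapes any quotes inside value."""
    return f"fill({element!r}, {value!r})"


def goto(url: str) -> str:
    """Build a navigation action string."""
    return f"goto({url!r})"


print(fill("21", 'He said "hi"'))  # prints: fill('21', 'He said "hi"')
print(fill("21", "it's"))          # prints: fill('21', "it's")
```

The result is then wrapped as usual, e.g. `BrowserGymAction(action_str=fill("21", value))`.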

Observation

Observations contain multiple modalities:

result = env.step(action)
obs = result.observation

# Text observations
print(obs.text)          # Primary text representation (AXTree or DOM)
print(obs.axtree_txt)    # Accessibility tree
print(obs.pruned_html)   # Pruned HTML (interactive elements only)

# Page metadata
print(obs.url)           # Current URL
print(obs.goal)          # Task goal/instruction

# Visual (if enabled)
if obs.screenshot is not None:
    print(obs.screenshot.shape)  # [height, width, channels]

# Error handling
if obs.last_action_error:
    print(f"Action failed: {obs.error}")

# Episode status
print(obs.done)          # True if episode ended
print(obs.reward)        # Reward for the step

# Access full BrowserGym data (includes timestamps, etc.)
print(obs.metadata["browsergym_obs"])  # Full observation dict from BrowserGym
print(obs.metadata["browsergym_info"]) # Full info dict (timestamps, page state, etc.)
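Agents usually pick element IDs out of the accessibility-tree text. Assuming BrowserGym's flattened AXTree format, in which interactive nodes appear as lines like `[13] button 'Submit'`, a small regular expression can list candidate elements. This parser is a hypothetical sketch: check the format against real `obs.axtree_txt` output before relying on it, and note that names containing an apostrophe are not handled.

```python
import re
from typing import NamedTuple

# Assumed line format: optional indentation, "[bid] role 'name'", then optional extras.
_NODE = re.compile(r"^\s*\[(?P<bid>[^\]]+)\]\s+(?P<role>\S+)\s+'(?P<name>[^']*)'")


class Element(NamedTuple):
    bid: str
    role: str
    name: str


def parse_axtree(axtree_txt: str) -> list[Element]:
    """Extract (bid, role, name) triples from a flattened AXTree string."""
    elements = []
    for line in axtree_txt.splitlines():
        match = _NODE.match(line)
        if match:  # lines without a [bid] prefix are skipped
            elements.append(Element(match["bid"], match["role"], match["name"]))
    return elements


sample = """RootWebArea 'Click Test'
\t[13] button 'Click Me!'
\t[14] textbox 'Username', focused
"""
for element in parse_axtree(sample):
    print(element)
```

The extracted `bid` values are what the action strings in the Action section take as element arguments.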

Advanced: Accessing Raw BrowserGym Data

For VisualWebArena or custom training, you may need additional data like timestamps or browser state. The full BrowserGym observation and info dicts are preserved in metadata:

result = env.step(action)

# Access timestamps (if available)
info = result.observation.metadata["browsergym_info"]
if "timestamp" in info:
    print(f"Action timestamp: {info['timestamp']}")

# Access additional observation fields
obs_dict = result.observation.metadata["browsergym_obs"]
if "dom_object" in obs_dict:
    dom = obs_dict["dom_object"]
    # Work with raw DOM object

# Access page performance data
if "performance" in info:
    print(f"Page load time: {info['performance']}")

State

The environment state tracks progress:

state = env.state()

print(f"Benchmark: {state.benchmark}")     # 'miniwob', 'webarena', etc.
print(f"Task: {state.task_name}")          # Task name/ID
print(f"Episode: {state.episode_id}")      # Unique episode ID
print(f"Steps: {state.step_count}")        # Number of steps taken
print(f"Total Reward: {state.cum_reward}") # Cumulative reward
print(f"Goal: {state.goal}")               # Task instruction
print(f"URL: {state.current_url}")         # Current page URL
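When logging many episodes, it can be useful to keep a client-side tally that mirrors `step_count` and `cum_reward`, for example to cross-check `env.state()` or to aggregate results across episodes. The class below is a hypothetical sketch that only needs the `reward` and `done` values returned by `env.step()`; treating a missing reward (`None`) as 0 is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EpisodeTally:
    """Hypothetical client-side counterpart of step_count / cum_reward."""

    step_count: int = 0
    cum_reward: float = 0.0
    finished: bool = False
    rewards: list[float] = field(default_factory=list)

    def record(self, reward: Optional[float], done: bool) -> None:
        """Record one step; call with result.reward and result.done."""
        if self.finished:
            raise RuntimeError("episode already finished; start a new tally")
        r = 0.0 if reward is None else float(reward)
        self.rewards.append(r)
        self.step_count += 1
        self.cum_reward += r
        self.finished = done


tally = EpisodeTally()
for reward, done in [(0.0, False), (None, False), (1.0, True)]:
    tally.record(reward, done)
print(tally.step_count, tally.cum_reward)  # prints: 3 1.0
```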

Configuration

Environment variables:

Common Settings

  • BROWSERGYM_BENCHMARK: Benchmark to use (miniwob, webarena, visualwebarena, workarena)
  • BROWSERGYM_TASK_NAME: Specific task name (optional, will use first available if not set)
  • BROWSERGYM_HEADLESS: Run browser in headless mode (default: true)
  • BROWSERGYM_VIEWPORT_WIDTH: Browser viewport width (default: 1280)
  • BROWSERGYM_VIEWPORT_HEIGHT: Browser viewport height (default: 720)
  • BROWSERGYM_TIMEOUT: Action timeout in milliseconds (default: 10000)

WebArena-Specific (only needed for WebArena benchmark)

  • SHOPPING: Shopping website URL
  • SHOPPING_ADMIN: Shopping admin panel URL
  • REDDIT: Reddit-like forum URL
  • GITLAB: GitLab instance URL
  • MAP: Map service URL
  • WIKIPEDIA: Wikipedia instance URL
  • HOMEPAGE: Homepage URL
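The settings above and their documented defaults can be summarized as a typed configuration. The sketch below is hypothetical (it is not how the server reads its configuration): it parses a mapping such as `os.environ`, applies the listed defaults, and checks that the WebArena URLs are present when that benchmark is selected. Which spellings count as "true" for `BROWSERGYM_HEADLESS` is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

SUPPORTED_BENCHMARKS = ("miniwob", "webarena", "visualwebarena", "workarena")
WEBARENA_URL_VARS = ("SHOPPING", "SHOPPING_ADMIN", "REDDIT", "GITLAB", "MAP", "WIKIPEDIA", "HOMEPAGE")


@dataclass(frozen=True)
class BrowserGymConfig:
    benchmark: str
    task_name: Optional[str]  # None means "first available task"
    headless: bool
    viewport_width: int
    viewport_height: int
    timeout_ms: int


def load_config(env: Mapping[str, str] = os.environ) -> BrowserGymConfig:
    """Parse BrowserGym settings from env, applying the documented defaults."""
    benchmark = env.get("BROWSERGYM_BENCHMARK")
    if benchmark not in SUPPORTED_BENCHMARKS:
        raise ValueError(f"BROWSERGYM_BENCHMARK must be one of {SUPPORTED_BENCHMARKS}")
    if benchmark == "webarena":
        missing = [name for name in WEBARENA_URL_VARS if not env.get(name)]
        if missing:
            raise ValueError(f"missing WebArena URLs: {', '.join(missing)}")
    return BrowserGymConfig(
        benchmark=benchmark,
        task_name=env.get("BROWSERGYM_TASK_NAME") or None,
        headless=env.get("BROWSERGYM_HEADLESS", "true").strip().lower() in ("1", "true", "yes"),
        viewport_width=int(env.get("BROWSERGYM_VIEWPORT_WIDTH", "1280")),
        viewport_height=int(env.get("BROWSERGYM_VIEWPORT_HEIGHT", "720")),
        timeout_ms=int(env.get("BROWSERGYM_TIMEOUT", "10000")),
    )


print(load_config({"BROWSERGYM_BENCHMARK": "miniwob", "BROWSERGYM_HEADLESS": "false"}))
```

Failing fast on missing WebArena URLs surfaces configuration mistakes before a browser is ever launched.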

Supported Benchmarks

1. MiniWoB++ (Training) ✅ Recommended for Training

  • 100+ tasks ranging from simple (click buttons) to complex (form filling, navigation)
  • Fast: Instant resets, quick episodes
  • Randomized: Task variations for generalization
  • No setup: Works out-of-the-box
  • Dense rewards: Immediate feedback for learning

Use Case: Train agents on fundamental web navigation skills

2. WebArena (Evaluation) 📊 Benchmark

  • 812 realistic tasks across 6 websites
  • Complex: Multi-step reasoning, real web interfaces
  • Requires setup: Need to run 7 backend services
  • Sparse rewards: Binary success/failure
  • Evaluation-focused: Test real-world performance

Use Case: Evaluate agents on realistic web tasks

3. VisualWebArena (Evaluation) 👁️ Visual Benchmark

  • 910 tasks requiring visual understanding
  • Multimodal: Both text and visual observations
  • Requires setup: Similar to WebArena
  • Challenging: Requires visual reasoning

Use Case: Test visual web navigation capabilities

4. WorkArena (Evaluation) 💼 Enterprise Benchmark

  • Enterprise tasks: CRM, project management, etc.
  • Realistic workflows: Real enterprise software
  • Requires setup: Enterprise software instances

Use Case: Evaluate on business automation tasks

Typical Training Pipeline

from browsergym_env import BrowserGymEnv, BrowserGymAction

# Stage 1: Train on MiniWoB (simple tasks, fast)
train_env = BrowserGymEnv.from_docker_image(
    "browsergym-env:latest",
    environment={
        "BROWSERGYM_BENCHMARK": "miniwob",
        "BROWSERGYM_TASK_NAME": "click-button",
    }
)

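# Train your agent (RL, imitation learning, etc.).
# `agent` stands for your own training code; train()/evaluate() are hypothetical.
agent.train(train_env, num_episodes=10000)
train_env.close()

# Stage 2: Evaluate on WebArena (complex tasks, realistic).
# Each container is configured with a single BROWSERGYM_TASK_NAME,
# so this sketch starts one environment per task ID (0-811).
successes = 0.0
for task_id in range(812):
    eval_env = BrowserGymEnv.from_docker_image(
        "browsergym-env:latest",
        environment={
            "BROWSERGYM_BENCHMARK": "webarena",
            "BROWSERGYM_TASK_NAME": str(task_id),
            # ... WebArena URLs
        }
    )
    successes += agent.evaluate(eval_env)  # e.g. 1.0 on success, 0.0 otherwise
    eval_env.close()

print(f"WebArena Success Rate: {successes / 812:.2%}")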

Development & Testing

Running Tests

# From the OpenEnv repository root
pytest tests/envs/test_browsergym_env.py

Local Development

# Install in development mode
cd /path/to/OpenEnv
pip install -e .

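# Install BrowserGym and the Chromium browser it drives through Playwright
pip install browsergym browsergym-miniwob browsergym-webarena
playwright install chromium

# MiniWoB tasks are served from a local copy of the MiniWoB++ HTML files
export MINIWOB_URL="file:///path/to/miniwob-plusplus/miniwob/html/miniwob/"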

# Run the server locally
cd envs/browsergym_env/server
export BROWSERGYM_BENCHMARK=miniwob
export BROWSERGYM_TASK_NAME=click-test
python app.py

Project Structure

browsergym_env/
├── __init__.py              # Module exports
├── models.py                # Action, Observation, State dataclasses
├── client.py                # HTTPEnvClient implementation
├── README.md                # This file
└── server/
    ├── __init__.py
    ├── app.py               # FastAPI application
    ├── browsergym_environment.py  # Environment implementation
    ├── Dockerfile           # Container specification
    └── requirements.txt     # Python dependencies
