This tool integrates the Stagehand Python SDK with CrewAI, allowing agents to interact with websites and automate browser tasks using natural language instructions.
Description
The StagehandTool wraps the Stagehand Python SDK to provide CrewAI agents with the ability to control a real web browser and interact with websites using three core primitives:
- Act: Perform actions like clicking, typing, or navigating
- Extract: Extract structured data from web pages
- Observe: Identify and analyze elements on the page
Requirements
Before using this tool, you will need:
- A Browserbase account with API key and project ID
- An API key for an LLM (OpenAI or Anthropic Claude)
- The Stagehand Python SDK installed
Install the dependencies:
pip install stagehand-py crewai crewai-tools
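Before creating the tool, it can help to confirm that the credentials listed above are actually available. The following is a minimal sketch, not part of the Stagehand or CrewAI API: `missing_credentials` is a hypothetical helper, and the environment variable names are the ones read in the examples on this page (adjust them if your setup uses different names).

```python
import os

# Variable names follow the examples on this page; they are an assumption
# about your environment, not something the SDK requires.
REQUIRED_VARS = ("BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID")
LLM_KEY_VARS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")  # any one is enough


def missing_credentials(env=None):
    """Return the names of credentials that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    # Only one LLM provider key is needed, so report them as a single entry.
    if not any(env.get(name) for name in LLM_KEY_VARS):
        missing.append(" or ".join(LLM_KEY_VARS))
    return missing


if __name__ == "__main__":
    problems = missing_credentials()
    if problems:
        print("Missing credentials:", ", ".join(problems))
    else:
        print("All credentials found.")
```

Running this check first turns a missing key into a clear message rather than a failed browser session.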
Usage
Basic Usage
from crewai import Agent, Task, Crew
from crewai_tools import StagehandTool
from stagehand.schemas import AvailableModel

# Initialize the tool with your API keys
stagehand_tool = StagehandTool(
    api_key="your-browserbase-api-key",
    project_id="your-browserbase-project-id",
    model_api_key="your-llm-api-key",  # OpenAI or Anthropic API key
    model_name=AvailableModel.CLAUDE_3_7_SONNET_LATEST,  # Optional: specify which model to use
)

# Create an agent with the tool
researcher = Agent(
    role="Web Researcher",
    goal="Find and summarize information from websites",
    backstory="I'm an expert at finding information online.",
    verbose=True,
    tools=[stagehand_tool],
)

# Create a task that uses the tool
research_task = Task(
    description="Go to https://www.example.com and tell me what you see on the homepage.",
    agent=researcher,
)

# Run the crew
crew = Crew(
    agents=[researcher],
    tasks=[research_task],
    verbose=True,
)
result = crew.kickoff()
print(result)
Command Types
The StagehandTool supports three different command types, each designed for specific web automation tasks:
1. Act - Perform Actions on a Page
The act command type (default) allows the agent to perform actions on a webpage, such as clicking buttons, filling forms, navigating, and more.
When to use: Use act when you need to interact with a webpage by performing actions like clicking, typing, scrolling, or navigating.
Example usage:
# Perform an action (default behavior)
result = stagehand_tool.run(
    instruction="Click the login button",
    url="https://example.com",
    command_type="act",  # Default, so can be omitted
)

# Fill out a form
result = stagehand_tool.run(
    instruction="Fill the contact form with name 'John Doe', email 'john@example.com', and message 'Hello world'",
    url="https://example.com/contact",
)

# Multiple actions in sequence
result = stagehand_tool.run(
    instruction="Search for 'AI tools' in the search box and press Enter",
    url="https://example.com",
)
2. Extract - Extract Data from a Page
The extract command type allows the agent to extract structured data from a webpage, such as product information, article text, or table data.
When to use: Use extract when you need to retrieve specific information from a webpage in a structured format.
Example usage:
# Extract all product information
result = stagehand_tool.run(
    instruction="Extract all product names, prices, and descriptions",
    url="https://example.com/products",
    command_type="extract",
)

# Extract specific information with a selector
result = stagehand_tool.run(
    instruction="Extract the main article title and content",
    url="https://example.com/blog/article",
    command_type="extract",
    selector=".article-container",  # Optional CSS selector to limit extraction scope
)

# Extract tabular data
result = stagehand_tool.run(
    instruction="Extract the data from the pricing table as a structured list of plans with their features and costs",
    url="https://example.com/pricing",
    command_type="extract",
    selector=".pricing-table",
)
3. Observe - Identify Elements on a Page
The observe command type allows the agent to identify and analyze specific elements on a webpage, returning information about their attributes, location, and suggested actions.
When to use: Use observe when you need to identify UI elements, understand page structure, or determine what actions are possible.
Example usage:
# Find interactive elements
result = stagehand_tool.run(
    instruction="Find all interactive elements in the navigation menu",
    url="https://example.com",
    command_type="observe",
)

# Identify form fields
result = stagehand_tool.run(
    instruction="Identify all the input fields in the registration form",
    url="https://example.com/register",
    command_type="observe",
    selector="#registration-form",
)

# Analyze page structure
result = stagehand_tool.run(
    instruction="Find the main content sections of this page",
    url="https://example.com/about",
    command_type="observe",
)
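Across the three command types, every call shares the same arguments: an instruction, a URL, a command type, and an optional selector. The sketch below shows one way to validate those arguments before they reach the tool. `build_run_kwargs` is a hypothetical helper, not part of the StagehandTool API; it assumes only the keyword names used in the examples above.

```python
VALID_COMMAND_TYPES = ("act", "extract", "observe")


def build_run_kwargs(instruction, url, command_type="act", selector=None):
    """Assemble keyword arguments for stagehand_tool.run (hypothetical helper)."""
    if command_type not in VALID_COMMAND_TYPES:
        raise ValueError(
            f"command_type must be one of {VALID_COMMAND_TYPES}, got {command_type!r}"
        )
    if not instruction.strip():
        raise ValueError("instruction must not be empty")
    kwargs = {"instruction": instruction, "url": url, "command_type": command_type}
    # Only pass a selector when one was given, mirroring the examples above.
    if selector is not None:
        kwargs["selector"] = selector
    return kwargs
```

A call would then look like `stagehand_tool.run(**build_run_kwargs("Extract the pricing table", "https://example.com/pricing", "extract", ".pricing-table"))`, so a misspelled command type fails early instead of inside a browser session.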
Advanced Configuration
You can customize the behavior of the StagehandTool by specifying different parameters:
stagehand_tool = StagehandTool(
    api_key="your-browserbase-api-key",
    project_id="your-browserbase-project-id",
    model_api_key="your-llm-api-key",
    model_name=AvailableModel.CLAUDE_3_7_SONNET_LATEST,
    dom_settle_timeout_ms=5000,  # Wait longer for DOM to settle
    headless=True,  # Run browser in headless mode (no visible window)
    self_heal=True,  # Attempt to recover from errors
    wait_for_captcha_solves=True,  # Wait for CAPTCHA solving
    verbose=1,  # Control logging verbosity (0-3)
)
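If several tool instances should share the same settings, the optional parameters can be collected and checked in one place. This is a sketch under stated assumptions: `advanced_options` is a hypothetical helper, its default values mirror the example above rather than the library's own defaults, and the requirement that the timeout be positive is an assumption. Only the 0-3 range for verbose comes from this page.

```python
def advanced_options(dom_settle_timeout_ms=5000, verbose=1, **flags):
    """Validate and bundle optional StagehandTool settings (hypothetical helper)."""
    if dom_settle_timeout_ms <= 0:
        raise ValueError("dom_settle_timeout_ms must be positive")
    if verbose not in range(4):
        raise ValueError("verbose must be between 0 and 3")
    # Boolean switches shown in the configuration example above.
    allowed = {"headless", "self_heal", "wait_for_captcha_solves"}
    unknown = set(flags) - allowed
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    return {"dom_settle_timeout_ms": dom_settle_timeout_ms, "verbose": verbose, **flags}
```

The resulting dictionary can be unpacked into the constructor next to the credentials, for example `StagehandTool(api_key=..., project_id=..., model_api_key=..., **advanced_options(headless=True))`.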
Task Examples for CrewAI Agents
The following example shows a complete task that uses the StagehandTool, from reading credentials to cleaning up the browser session:
from crewai import Agent, Task, Crew
from crewai_tools import StagehandTool
from stagehand.schemas import AvailableModel
import os

# Get API keys from environment
browserbase_api_key = os.environ.get("BROWSERBASE_API_KEY")
browserbase_project_id = os.environ.get("BROWSERBASE_PROJECT_ID")
model_api_key = os.environ.get("OPENAI_API_KEY")  # or ANTHROPIC_API_KEY

# Initialize the tool
stagehand_tool = StagehandTool(
    api_key=browserbase_api_key,
    project_id=browserbase_project_id,
    model_api_key=model_api_key,
    model_name=AvailableModel.GPT_4O,
)

# Create an agent
researcher = Agent(
    role="Web Researcher",
    goal="Gather product information from an e-commerce website",
    backstory="I specialize in extracting and analyzing web data.",
    verbose=True,
    tools=[stagehand_tool],
)

# Form submission task
form_submission_task = Task(
    description="""
    Submit a contact form on example.com:
    1. Go to example.com/contact
    2. Fill out the contact form with:
       - Name: John Doe
       - Email: john@example.com
       - Subject: Information Request
       - Message: I would like to learn more about your services
    3. Submit the form
    4. Confirm the submission was successful
    """,
    agent=researcher,
)

# Run the crew
crew = Crew(
    agents=[researcher],
    tasks=[form_submission_task],
    verbose=True,
)
result = crew.kickoff()
print(result)

# Clean up resources
stagehand_tool.close()
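In the example above, the final close() call is skipped if the crew raises an exception. One way to guarantee cleanup is the standard library's `contextlib.closing`, which calls `close()` on exit even when an error occurs. The sketch below uses a stand-in `FakeTool` class so that it runs without a browser session; `run_and_close` is a hypothetical helper, and the only thing assumed about the real tool is the `close()` method shown above.

```python
from contextlib import closing


class FakeTool:
    """Stand-in for StagehandTool so the sketch runs without a browser session."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def run_and_close(tool, work):
    """Call work(tool) and always call tool.close() afterwards (hypothetical helper)."""
    with closing(tool):
        return work(tool)


def failing_work(tool):
    # Simulates a crew run that fails partway through.
    raise RuntimeError("crew failed")


tool = FakeTool()
try:
    run_and_close(tool, failing_work)
except RuntimeError:
    pass
print(tool.closed)  # True: close() ran despite the error
```

With the real tool, `work` could be a function that builds the crew and returns `crew.kickoff()`.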
Tips for Effective Use
- Be specific in instructions: The more specific your instructions, the better the results. For example, instead of “click the button,” use “click the ‘Submit’ button at the bottom of the contact form.”
- Use the right command type: Choose the appropriate command type based on your task:
  - Use act for interactions and navigation
  - Use extract for gathering information
  - Use observe for understanding page structure
- Leverage selectors: When extracting data or observing elements, use CSS selectors to narrow the scope and improve accuracy.
- Handle multi-step processes: For complex workflows, break them down into multiple tool calls, each handling a specific step.
- Error handling: Implement appropriate error handling in your agent’s logic to deal with potential issues like elements not found or pages not loading.
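The last two tips, splitting a workflow into separate calls and handling errors, can be combined in a small wrapper. The following is a minimal sketch, not the library's own retry mechanism: `run_steps` is a hypothetical helper, and because this page does not say which exceptions the tool raises, it catches `Exception` broadly. Narrow that once you know the actual error types.

```python
import time


def run_steps(run_fn, steps, retries=2, delay_s=0.0):
    """Run each step as its own tool call, retrying a failed step (hypothetical helper).

    run_fn is any callable that accepts a step's keyword arguments,
    for example stagehand_tool.run. Returns the list of per-step results.
    """
    results = []
    for index, step in enumerate(steps, start=1):
        for attempt in range(retries + 1):
            try:
                results.append(run_fn(**step))
                break  # step succeeded, move on to the next one
            except Exception as exc:  # broad on purpose; see the note above
                if attempt == retries:
                    raise RuntimeError(
                        f"step {index} failed after {retries + 1} attempts"
                    ) from exc
                time.sleep(delay_s)
    return results
```

Each step is a dictionary of the same keyword arguments used throughout this page, such as `{"instruction": "Click the 'Submit' button", "url": "https://example.com/contact", "command_type": "act"}`. Stopping at the first step that keeps failing avoids running later steps against a page in an unexpected state.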
Troubleshooting
- Session not starting: Ensure you have valid API keys for both Browserbase and your LLM provider.
- Elements not found: Try increasing the dom_settle_timeout_ms parameter to give the page more time to load.
- Actions not working: Make sure your instructions are clear and specific. You may need to use observe first to identify the correct elements.
- Extract returning incomplete data: Try refining your instruction or providing a more specific selector.
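The "observe first" advice above can be sketched as two consecutive calls. `observe_then_act` is a hypothetical helper that assumes only the `run` keyword arguments shown on this page. The format of the observe result is not specified here, so the sketch simply returns it for inspection or logging, and whether passing the URL again reloads the page is not covered by this page.

```python
def observe_then_act(tool, url, element_description, action):
    """Observe the target elements, then perform the action (hypothetical helper)."""
    observed = tool.run(
        instruction=f"Find {element_description}",
        url=url,
        command_type="observe",
    )
    result = tool.run(instruction=action, url=url, command_type="act")
    return observed, result


class RecordingTool:
    """Stand-in for StagehandTool that records calls instead of driving a browser."""

    def __init__(self):
        self.calls = []

    def run(self, **kwargs):
        self.calls.append(kwargs)
        return f"{kwargs['command_type']} done"


demo_tool = RecordingTool()
observed, acted = observe_then_act(
    demo_tool,
    "https://example.com",
    "the login button in the page header",
    "Click the login button in the page header",
)
```

Reading the observe output before acting makes it easier to tell whether a failure comes from an unclear instruction or from an element that is not on the page.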
Additional Resources
Join the Stagehand Slack community for support and to connect with other users.