Method Signatures

  • TypeScript
// No parameters (raw page content)
await stagehand.extract(): Promise<{ pageText: string }>

// Options only (for example, for targeted extraction)
await stagehand.extract(options: ExtractOptions): Promise<{ pageText: string }>

// String instruction only
await stagehand.extract(instruction: string): Promise<{ extraction: string }>

// With schema
await stagehand.extract<T extends ZodTypeAny>(
  instruction: string,
  schema: T,
  options?: ExtractOptions
): Promise<z.infer<T>>
ExtractOptions Interface:
interface ExtractOptions {
  model?: ModelConfiguration;
  timeout?: number;
  selector?: string;
  page?: PlaywrightPage | PuppeteerPage | PatchrightPage | Page;
}

// ModelConfiguration can be either a string or an object
type ModelConfiguration =
  | string  // Format: "provider/model" (e.g., "openai/gpt-5-mini", "anthropic/claude-sonnet-4-5")
  | {
      modelName: string;  // The model name
      apiKey?: string;    // Optional: API key override
      baseURL?: string;   // Optional: Base URL override
      // Additional provider-specific options
    }
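
The string form packs the provider and the model into a single value. As an illustration only, the sketch below defines two hypothetical helpers, `splitModelString` and `modelNameOf`, that work with both forms of the configuration. Neither is part of Stagehand, and the library's own validation may behave differently.

```typescript
// Local copy of the documented shape, so the sketch is self-contained.
type ModelConfiguration =
  | string
  | { modelName: string; apiKey?: string; baseURL?: string };

// Hypothetical helper (not a Stagehand API): split "provider/model" on the first slash.
function splitModelString(model: string): { provider: string; name: string } {
  const slash = model.indexOf("/");
  // Reject strings with no slash, an empty provider, or an empty model name.
  if (slash <= 0 || slash === model.length - 1) {
    throw new Error(`Expected "provider/model", got "${model}"`);
  }
  return { provider: model.slice(0, slash), name: model.slice(slash + 1) };
}

// Hypothetical helper: read the model name from either form of the configuration.
function modelNameOf(config: ModelConfiguration): string {
  return typeof config === "string" ? config : config.modelName;
}
```

For example, `splitModelString("openai/gpt-5-mini")` yields `{ provider: "openai", name: "gpt-5-mini" }`, while a bare `"gpt-5-mini"` is rejected.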

Parameters

instruction
string
Natural language description of the data to extract. If both the instruction and the schema are omitted, extract() returns the raw page text.
schema
ZodTypeAny
Zod schema defining the structure of data to extract. Ensures type safety and validation. The return type is automatically inferred from the schema.
model
ModelConfiguration
Configure the AI model to use for this action. Can be either:
  • A string in the format "provider/model" (e.g., openai/gpt-5, google/gemini-2.5-flash)
  • An object with detailed configuration
timeout
number
Maximum time in milliseconds to wait for the extraction to complete. Default varies by configuration.
selector
string
Optional selector (XPath, CSS selector, etc.) that limits extraction to a specific part of the page. Narrowing the scope reduces token usage and can improve accuracy.
page
PlaywrightPage | PuppeteerPage | PatchrightPage | Page
Optional: Specify which page to perform the extraction on. Supports multiple browser automation libraries:
  • Playwright: Native Playwright Page objects
  • Puppeteer: Puppeteer Page objects
  • Patchright: Patchright Page objects
  • Stagehand Page: Stagehand’s wrapped Page object
If not specified, defaults to the current “active” page in your Stagehand instance.
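
To show how `selector` and `timeout` combine in the options-only call, here is a minimal sketch. The `TextExtractor` interface and the `extractRegionText` helper are hypothetical and exist only so the example runs without a browser; the 30-second fallback is an assumed value, not a documented default.

```typescript
// Minimal structural types for this sketch; the real ones come from the Stagehand package.
interface ExtractOptions {
  timeout?: number;  // milliseconds
  selector?: string; // XPath or CSS selector
}
interface TextExtractor {
  extract(options: ExtractOptions): Promise<{ pageText: string }>;
}

// Hypothetical helper: read the text of one page region, with a time limit.
async function extractRegionText(
  extractor: TextExtractor,
  selector: string,
  timeoutMs = 30_000 // assumed value, not a documented default
): Promise<string> {
  const { pageText } = await extractor.extract({ selector, timeout: timeoutMs });
  return pageText.trim();
}
```

With a real instance the call might look like `await extractRegionText(stagehand, "#product-details")`, assuming the instance satisfies the options-only signature listed under Method Signatures.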

Built-in Support

Iframe and Shadow DOM interactions are supported out of the box. Stagehand automatically handles iframe traversal and shadow DOM elements without requiring additional configuration or flags.

Response Types

  • With Schema: returns Promise<z.infer<T>>, where T is your schema. The returned object is strictly typed according to your Zod schema definition.
  • String Only: returns Promise<{ extraction: string }>.
  • No Parameters: returns Promise<{ pageText: string }>.
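
The two schema-less calls in the method signatures return differently named fields (`extraction` for an instruction, `pageText` for no parameters or options only). Code that accepts either result can narrow on the field name. The helper `textOf` below is a hypothetical convenience, not a Stagehand function.

```typescript
// Return shapes taken from the method signatures at the top of this page.
type PageTextResult = { pageText: string };
type InstructionResult = { extraction: string };

// Hypothetical helper: pull the text out of either schema-less result shape.
function textOf(result: PageTextResult | InstructionResult): string {
  // The `in` check narrows the union to the matching shape.
  return "extraction" in result ? result.extraction : result.pageText;
}
```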

Code Examples

  • Single Object
import { Stagehand } from "@browserbasehq/stagehand";
import { z } from "zod";

// Initialize with Browserbase (API key and project ID from environment variables)
// Set BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID in your environment
const stagehand = new Stagehand({ env: "BROWSERBASE" });
await stagehand.init();
const page = stagehand.context.pages()[0];

await page.goto("https://example.com/product");

// Schema definition
const ProductSchema = z.object({
  name: z.string(),
  price: z.number(),
  inStock: z.boolean()
});

// Extraction with v3 API
const product = await stagehand.extract(
  "extract product details", 
  ProductSchema
);

Example Response

{
  "name": "Product Name",
  "price": 100,
  "inStock": true
}

Additional Examples

  • Custom Model
import { z } from "zod";

const DataSchema = z.object({
  title: z.string(),
  content: z.string()
});

// Using string format
const data1 = await stagehand.extract(
  "extract article data",
  DataSchema,
  { model: "openai/gpt-5-mini" }
);

// Using object format with custom configuration
const data2 = await stagehand.extract(
  "extract article data",
  DataSchema,
  {
    model: {
      modelName: "claude-3-5-sonnet-20241022",
      apiKey: process.env.ANTHROPIC_API_KEY
    }
  }
);

Error Types

The following errors may be thrown by the extract() method:
  • StagehandError - Base class for all Stagehand-specific errors
  • ZodSchemaValidationError - Extracted data does not match the provided Zod schema
  • StagehandDomProcessError - Error occurred while processing the DOM
  • StagehandEvalError - Error occurred while evaluating JavaScript in the page context
  • StagehandIframeError - Unable to resolve iframe for the target element
  • ContentFrameNotFoundError - Unable to obtain content frame for the selector
  • XPathResolutionError - XPath does not resolve in the current page or frames
  • StagehandShadowRootMissingError - No shadow root present on the resolved host element
  • LLMResponseError - Error in LLM response processing
  • MissingLLMConfigurationError - No LLM API key or client configured
  • UnsupportedModelError - The specified model is not supported for this operation
  • InvalidAISDKModelFormatError - Model string does not follow the required provider/model format
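
Some of these errors may be worth retrying (for instance a malformed LLM response), while others, such as a missing API key, will fail the same way every time. The sketch below shows one possible retry wrapper. `extractWithRetry` and its `isRetryable` predicate are hypothetical names; which errors count as retryable is a policy choice left to the caller, not something Stagehand prescribes.

```typescript
// Hypothetical helper: rerun an extraction while the caller's predicate says the error is retryable.
async function extractWithRetry<T>(
  run: () => Promise<T>,
  isRetryable: (error: unknown) => boolean,
  maxAttempts = 3
): Promise<T> {
  if (maxAttempts < 1) throw new RangeError("maxAttempts must be at least 1");
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await run();
    } catch (error) {
      // Non-retryable errors propagate immediately.
      if (!isRetryable(error)) throw error;
      lastError = error;
    }
  }
  // Every attempt failed with a retryable error; surface the last one.
  throw lastError;
}
```

With Stagehand, `run` might be `() => stagehand.extract("extract product details", ProductSchema)` and `isRetryable` might be `(e) => e instanceof LLMResponseError`, assuming that class can be imported from the package.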