> ## Documentation Index
> Fetch the complete documentation index at: https://docs.stagehand.dev/llms.txt
> Use this file to discover all available pages before exploring further.

# extract()

> Complete API reference for the extract() method

export const V3Banner = () => null;

<V3Banner />

<CardGroup cols={1}>
  <Card title="Extract" icon="ufo-beam" href="/v3/basics/extract">
    See how to use extract() to extract structured data from web pages
  </Card>
</CardGroup>

### Method Signatures

<Tabs>
  <Tab title="TypeScript">
    ```typescript theme={null}
    // No parameters (raw page content)
    await stagehand.extract(): Promise<{ pageText: string }>

    // Options only (for example, for targeted extraction)
    await stagehand.extract(options: ExtractOptions): Promise<{ pageText: string }>

    // String instruction only
    await stagehand.extract(instruction: string): Promise<{ extraction: string }>

    // With schema
    await stagehand.extract<T extends ZodTypeAny>(
      instruction: string,
      schema: T,
      options?: ExtractOptions
    ): Promise<z.infer<T>>
    ```

    **ExtractOptions Interface:**

    ```typescript theme={null}
    interface ExtractOptions {
      model?: ModelConfiguration;
      timeout?: number;
      selector?: string;
      page?: PlaywrightPage | PuppeteerPage | PatchrightPage | Page;
      serverCache?: boolean;
    }

    // ModelConfiguration can be either a string or an object
    type ModelConfiguration =
      | string  // Format: "provider/model" (e.g., "openai/gpt-5-mini", "anthropic/claude-sonnet-4-5")
      | {
          modelName: string;  // The model name
          apiKey?: string;    // Optional: API key override
          baseURL?: string;   // Optional: Base URL override
          // Additional provider-specific options
        }
    ```
  </Tab>
</Tabs>

### Parameters

<ParamField path="instruction" type="string" optional>
  Natural language description of what data to extract. If omitted with no schema, returns raw page text.
</ParamField>

<ParamField path="schema" type="ZodTypeAny" optional>
  Zod schema defining the structure of data to extract. Ensures type safety and validation. The return type is automatically inferred from the schema.
</ParamField>

<ParamField path="model" type="ModelConfiguration" optional>
  Configure the AI model to use for this action. Can be either:

  * A string in the format `"provider/model"` (e.g., `openai/gpt-5`, `google/gemini-2.5-flash`)
  * An object with detailed configuration

  <Expandable title="Model Configuration Object">
    <ParamField path="modelName" type="string" required>
      The model name (e.g., `anthropic/claude-sonnet-4-5`, `google/gemini-2.5-flash`)
    </ParamField>

    <ParamField path="apiKey" type="string" optional>
      API key for the model provider (overrides default)
    </ParamField>

    <ParamField path="baseURL" type="string" optional>
      Base URL for the API endpoint (for custom endpoints or proxies)
    </ParamField>
  </Expandable>
</ParamField>

<ParamField path="timeout" type="number" optional>
  Maximum time in milliseconds to wait for the extraction to complete. Default varies by configuration.
</ParamField>

<ParamField path="selector" type="string" optional>
  Optional selector (XPath, CSS selector, etc.) to limit extraction scope to a specific part of the page. Reduces token usage and improves accuracy.
</ParamField>

<ParamField path="page" type="PlaywrightPage | PuppeteerPage | PatchrightPage | Page" optional>
  Optional: Specify which page to perform the extraction on. Supports multiple browser automation libraries:

  * **Playwright**: Native Playwright Page objects
  * **Puppeteer**: Puppeteer Page objects
  * **Patchright**: Patchright Page objects
  * **Stagehand Page**: Stagehand's wrapped Page object

  If not specified, defaults to the current "active" page in your Stagehand instance.
</ParamField>

<ParamField path="serverCache" type="boolean" optional>
  Override the instance-level `serverCache` setting for this request. When `true`, enables server-side caching. When `false`, disables it.

  <Note>Only applies when `env` is `"BROWSERBASE"`. Has no effect in local environments.</Note>

  Defaults to the value set on the Stagehand constructor (which itself defaults to `true`).
</ParamField>

### Built-in Support

<Note>
  **Iframe and Shadow DOM interactions are supported out of the box.** Stagehand automatically handles iframe traversal and shadow DOM elements without requiring additional configuration or flags.
</Note>

### Response Types

<Tabs>
  <Tab title="With Schema">
    **Returns:** `Promise<z.infer<T> & { cacheStatus?: "HIT" | "MISS" }>` where T is your schema

    The returned object will be strictly typed according to your Zod schema definition. The optional `cacheStatus` field indicates whether the result was served from the server-side cache (`"HIT"`) or computed fresh (`"MISS"`). Only present when running with `env: "BROWSERBASE"` and server-side caching is enabled.
  </Tab>

  <Tab title="String Only">
    **Returns:** `Promise<{ extraction: string; cacheStatus?: "HIT" | "MISS" }>`

    `extraction`: Simple string extraction without schema validation. The optional `cacheStatus` field indicates cache hit or miss when using Browserbase with server-side caching.
  </Tab>

  <Tab title="No Parameters">
    **Returns:** `Promise<{ pageText: string }>`

    `pageText`: Raw accessibility tree representation of page content.
  </Tab>
</Tabs>

### Code Examples

<Tabs>
  <Tab title="Single Object">
    ```typescript theme={null}
    import { Stagehand } from "@browserbasehq/stagehand";
    import { z } from 'zod';

    // Initialize with Browserbase (API key from environment variable)
    // Set BROWSERBASE_API_KEY in your environment
    const stagehand = new Stagehand({ env: "BROWSERBASE" });
    await stagehand.init();
    const page = stagehand.context.pages()[0];

    await page.goto("https://example.com/product");

    // Schema definition
    const ProductSchema = z.object({
      name: z.string(),
      price: z.number(),
      inStock: z.boolean()
    });

    // Extraction with v3 API
    const product = await stagehand.extract(
      "extract product details", 
      ProductSchema
    );
    ```

    #### Example Response

    ```json theme={null}
    {
      "name": "Product Name",
      "price": 100,
      "inStock": true
    }
    ```
  </Tab>

  <Tab title="Arrays">
    ```typescript theme={null}
    import { z } from 'zod';

    // Schema definition
    const ApartmentListingsSchema = z.array(
      z.object({
        address: z.string(),
        price: z.string(),
        bedrooms: z.number()
      })
    );

    // Extraction with v3 API
    const listings = await stagehand.extract(
      "extract all apartment listings",
      ApartmentListingsSchema
    );
    ```

    #### Example Response

    ```json theme={null}
    [
      {
        "address": "123 Main St",
        "price": "$100,000",
        "bedrooms": 3
      },
      {
        "address": "456 Elm St",
        "price": "$150,000",
        "bedrooms": 2
      }
    ]
    ```
  </Tab>

  <Tab title="URLs">
    ```typescript theme={null}
    import { z } from 'zod';

    // Schema definition
    const NavigationSchema = z.object({
      links: z.array(z.object({
        text: z.string(),
        url: z.string().url()  // URL validation
      }))
    });

    // Extraction with v3 API
    const links = await stagehand.extract(
      "extract navigation links",
      NavigationSchema
    );
    ```

    #### Example Response

    ```json theme={null}
    {
      "links": [
        {
          "text": "Home",
          "url": "https://example.com"
        }
      ]
    }
    ```
  </Tab>

  <Tab title="Scoped">
    ```typescript theme={null}
    import { z } from 'zod';

    const ProductSchema = z.object({
      name: z.string(),
      price: z.number(),
      description: z.string()
    });

    // Extract from specific page section with v3 API
    const data = await stagehand.extract(
      "extract product info from this section",
      ProductSchema,
      { selector: "/html/body/div/div" }
    );
    ```

    #### Example Response

    ```json theme={null}
    {
      "name": "Product Name",
      "price": 100,
      "description": "Product description"
    }
    ```
  </Tab>

  <Tab title="Schema-less">
    ```typescript theme={null}
    // String only extraction
    const title = await stagehand.extract("get the page title");
    // Returns: { extraction: "Page Title" }

    // Raw page content
    const content = await stagehand.extract();
    // Returns: { pageText: "Accessibility Tree: ..." }
    ```

    #### Example Response

    ```json theme={null}
    {
      "extraction": "Page Title"
    }
    ```
  </Tab>

  <Tab title="Advanced">
    ```typescript theme={null}
    import { z } from 'zod';

    // Schema with descriptions and validation
    const ProductSchema = z.object({
      price: z.number().describe("Product price in USD"),
      rating: z.number().min(0).max(5).describe("Customer rating out of 5"),
      available: z.boolean().describe("Whether product is in stock"),
      tags: z.array(z.string()).optional()
    });

    // Nested schema
    const EcommerceSchema = z.object({
      product: z.object({
        name: z.string(),
        price: z.object({
          current: z.number(),
          original: z.number().optional()
        })
      }),
      reviews: z.array(z.object({
        rating: z.number(),
        comment: z.string()
      }))
    });
    ```

    #### Example Response

    ```json theme={null}
    {
      "product": {
        "name": "Product Name",
        "price": {
          "current": 100,
          "original": 120
        }
      },
      "reviews": [
        {
          "rating": 4,
          "comment": "Great product!"
        }
      ]
    }
    ```
  </Tab>
</Tabs>

### Additional Examples

<Tabs>
  <Tab title="Custom Model">
    ```typescript theme={null}
    import { z } from 'zod';

    const DataSchema = z.object({
      title: z.string(),
      content: z.string()
    });

    // Using string format
    const data1 = await stagehand.extract(
      "extract article data",
      DataSchema,
      { model: "openai/gpt-5-mini" }
    );

    // Using object format with custom configuration
    const data2 = await stagehand.extract(
      "extract article data",
      DataSchema,
      {
        model: {
          modelName: "claude-sonnet-4-6",
          apiKey: process.env.ANTHROPIC_API_KEY
        }
      }
    );
    ```
  </Tab>

  <Tab title="Multi-Page">
    ```typescript theme={null}
    import { z } from 'zod';

    const page1 = stagehand.context.pages()[0];
    const page2 = await stagehand.context.newPage();

    const Schema = z.object({ title: z.string() });

    const data1 = await stagehand.extract("get title", Schema, { page: page1 });
    const data2 = await stagehand.extract("get title", Schema, { page: page2 });
    ```
  </Tab>
</Tabs>

### Error Types

The following errors may be thrown by the `extract()` method:

* **StagehandError** - Base class for all Stagehand-specific errors
* **ZodSchemaValidationError** - Extracted data does not match the provided Zod schema
* **StagehandDomProcessError** - Error occurred while processing the DOM
* **StagehandEvalError** - Error occurred while evaluating JavaScript in the page context
* **StagehandIframeError** - Unable to resolve iframe for the target element
* **ContentFrameNotFoundError** - Unable to obtain content frame for the selector
* **XPathResolutionError** - XPath does not resolve in the current page or frames
* **StagehandShadowRootMissingError** - No shadow root present on the resolved host element
* **LLMResponseError** - Error in LLM response processing
* **MissingLLMConfigurationError** - No LLM API key or client configured
* **UnsupportedModelError** - The specified model is not supported for this operation
* **InvalidAISDKModelFormatError** - Model string does not follow the required `provider/model` format
