Extract

What is `extract()`?

await stagehand.extract("extract the name of the repository");

extract() grabs structured data from a webpage. You can define your schema with Zod (TypeScript) or JSON. If you don’t want to define a schema, you can also call extract with just a natural language prompt, or call extract with no parameters.

Why use `extract()`?

Structured

Turn messy webpage data into clean objects that follow a schema.

Resilient

Build resilient extractions that don’t break when the website changes

Return value

When you use extract(), Stagehand will return a Promise<ExtractResult> with the following structure:

Basic Schema
Array
Primitive
Instruction Only
No Parameters

When extracting with a schema, the return type is inferred from your Zod schema:

const result = await stagehand.extract(
  "extract product details",
  z.object({
    name: z.string(),
    price: z.number(),
    inStock: z.boolean()
  })
);

Example result:

{
  name: "Wireless Mouse",
  price: 29.99,
  inStock: true
}

Advanced Configuration

You can pass additional options to configure the model, timeout, and selector scope:

const result = await stagehand.extract("extract the repository name", {
  model: "anthropic/claude-sonnet-4-5",
  timeout: 30000,
  selector: "//header" // Focus on specific area
});

Targeted Extract

Pass a selector to extract to target a specific element on the page.

This helps reduce the context passed to the LLM, optimizing token usage/speed and improving accuracy.

const tableData = await stagehand.extract(
  "Extract the values of the third row",
  z.object({
    values: z.array(z.string())
  }),
  {
    // xPath or CSS selector
    selector: "xpath=/html/body/div/table/" 
  }
);

Best practices

Extract with Context

You can provide additional context to your schema to help the model extract the data more accurately.

const apartments = await stagehand.extract(
  "Extract ALL the apartment listings and their details, including address, price, and square feet.",
  z.array(
    z.object({
      address: z.string().describe("the address of the apartment"),
      price: z.string().describe("the price of the apartment"),
      square_feet: z.string().describe("the square footage of the apartment"),
    })
  )
);

Link Extraction

To extract links or URLs, define the relevant field as z.string().url().

Here is how an extract call might look for extracting a link or URL. This also works for image links.

const contactLink = await stagehand.extract(
  "extract the link to the 'contact us' page",
  z.string().url() // note the usage of z.string().url() for URL validation
);

console.log("the link to the contact us page is: ", contactLink);

Inside Stagehand, extracting links works by asking the LLM to select an ID. Stagehand looks up that ID in a mapping of IDs -> URLs. When logging the LLM trace, you should expect to see IDs. The actual URLs will be included in the final ExtractResult.

Troubleshooting

Empty or partial results

Problem: extract() returns empty or incomplete dataSolutions:

Check your instruction clarity: Make sure your instruction is specific and describes exactly what data you want to extract
Verify the data exists: Use stagehand.observe() first to confirm the data is present on the page
Wait for dynamic content: If the page loads content dynamically, use stagehand.act("wait for the content to load") before extracting

Solution: Wait for content before extracting

// Wait for content before extracting
await stagehand.act("wait for the product listings to load");
const products = await stagehand.extract(
  "extract all product names and prices",
  z.array(z.object({
    name: z.string(),
    price: z.string()
  }))
);

Schema validation errors

Problem: Getting schema validation errors or type mismatchesSolutions:

Use optional fields: Make fields optional with z.optional() if the data might not always be present
Use flexible types: Consider using z.string() instead of z.number() for prices that might include currency symbols
Add descriptions: Use .describe() to help the model understand field requirements

Solution: More flexible schema

const schema = z.object({
  price: z.string().describe("price including currency symbol, e.g., '$19.99'"),
  availability: z.string().optional().describe("stock status if available"),
  rating: z.number().optional()
});

Inconsistent results

Problem: Extraction results vary between runsSolutions:

Be more specific in instructions: Instead of “extract prices”, use “extract the numerical price value for each item”
Use context in schema descriptions: Add field descriptions to guide the model
Combine with observe: Use stagehand.observe() to understand the page structure first

Solution: Validate with observe first

// First observe to understand the page structure
const elements = await stagehand.observe("find all product listings");
console.log("Found elements:", elements.map(e => e.description));

// Then extract with specific targeting
const products = await stagehand.extract(
  "extract name and price from each product listing shown on the page",
  z.array(z.object({
    name: z.string().describe("the product title or name"),
    price: z.string().describe("the price as displayed, including currency")
  }))
);

Performance issues

Problem: Extraction is slow or timing outSolutions:

Reduce scope: Extract smaller chunks of data in multiple calls rather than everything at once
Use targeted instructions: Be specific about which part of the page to focus on
Consider pagination: For large datasets, extract one page at a time
Increase timeout: Use timeoutMs parameter for complex extractions

Solution: Break down large extractions

// Instead of extracting everything at once
const allData = [];
const pageNumbers = [1, 2, 3, 4, 5];

for (const pageNum of pageNumbers) {
  await stagehand.act(`navigate to page ${pageNum}`);

  const pageData = await stagehand.extract(
    "extract product data from the current page only",
    z.array(z.object({
      name: z.string(),
      price: z.number()
    })),
    { timeout: 60000 } // 60 second timeout
  );

  allData.push(...pageData);
}

First Steps

The Basics

Configuration

Best Practices

Integrations

Reference

Migration Guides

What is `extract()`?

Why use `extract()`?

Structured

Resilient

Return value

Advanced Configuration

Targeted Extract

Best practices

Extract with Context

Link Extraction

Troubleshooting

Next steps

Act

Observe

First Steps

The Basics

Configuration

Best Practices

Integrations

Reference

Migration Guides

​What is extract()?

​Why use extract()?

Structured

Resilient

​Return value

​Advanced Configuration

​Targeted Extract

​Best practices

​Extract with Context

​Link Extraction

​Troubleshooting

​Next steps

Act

Observe

What is `extract()`?

Why use `extract()`?

Return value

Advanced Configuration

Targeted Extract

Best practices

Extract with Context

Link Extraction

Troubleshooting

Next steps