What is extract()?
extract() grabs structured data from a webpage. You can define your schema with Zod (TypeScript) or JSON. If you do not want to define a schema, you can also call extract() with just a natural-language prompt, or with no parameters.
Why use extract()?
Structured
Turn messy webpage data into clean objects that follow a schema.
Resilient
Build resilient extractions that don’t break when the website changes.
Using extract()
Call extract() in one of three ways: with an instruction and a schema (Zod or JSON) to get typed data back, with an instruction only, or with no parameters at all.
Return value of extract()
When you call extract(), Stagehand returns a Promise<ExtractResult>. Its structure depends on how you call it:
- Basic Schema
- Array
- Primitive
- Instruction Only
- No Parameters
When extracting with a schema, the return type is inferred from your Zod schema.
Advanced Configuration
You can pass additional options to configure the model, timeout, and selector scope.
Targeted Extract
Pass a selector to extract() to target a specific element on the page.
This reduces the context passed to the LLM, which lowers token usage, speeds up extraction, and improves accuracy.
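To see why scoping shrinks the model's input, here is a toy model in plain TypeScript. It is not Stagehand's implementation: the `PageNode` type, the `textOf`, `findById`, and `scopeText` helpers, and the "#id"-only selector matching are hypothetical simplifications. The page is a tree of nodes, and the text handed to the model is either the whole tree or only the subtree the selector matches.

```typescript
// Toy model of a page: each node has an optional id, its own text, and children.
// Hypothetical simplification; this is not how Stagehand represents pages.
interface PageNode {
  id?: string;
  text: string;
  children: PageNode[];
}

// Concatenate the text of a node and all of its descendants, depth-first.
function textOf(node: PageNode): string {
  return [node.text, ...node.children.map(textOf)]
    .filter((part) => part.length > 0)
    .join(" ");
}

// Find the first node matching a "#id" selector (the only selector form this toy supports).
function findById(node: PageNode, selector: string): PageNode | undefined {
  if (selector.startsWith("#") && node.id === selector.slice(1)) return node;
  for (const child of node.children) {
    const hit = findById(child, selector);
    if (hit) return hit;
  }
  return undefined;
}

// Text that would reach the model: the whole page, or only the selected subtree.
function scopeText(root: PageNode, selector?: string): string {
  if (selector === undefined) return textOf(root);
  const target = findById(root, selector);
  return target ? textOf(target) : "";
}

// Made-up sample page with navigation, the data we want, and a footer.
const page: PageNode = {
  text: "",
  children: [
    { id: "nav", text: "Home Products About Careers Contact", children: [] },
    { id: "price-table", text: "Widget $10 Gadget $25", children: [] },
    { id: "footer", text: "Copyright Example Corp. All rights reserved.", children: [] },
  ],
};

console.log(scopeText(page).length, scopeText(page, "#price-table").length);
```

On a real page the navigation, footers, and sidebars dwarf the data you want, so the gap between the two numbers is usually far larger than in this toy.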
Best practices
Extract with Context
You can provide additional context in your schema, such as field descriptions added with .describe(), to help the model extract the data more accurately.
Link Extraction
To extract links or URLs, define the relevant field as z.string().url(). This also works for image links.
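The ID-to-URL lookup explained in the next paragraph can be pictured with a small sketch. It assumes the model's raw answer holds IDs in its link fields and that a table built from the page maps each ID to its URL. `resolveLinks`, the `Extracted` type, and the "0-12"-style IDs are all hypothetical; Stagehand's internal types and ID format may differ.

```typescript
// Hypothetical shape of a raw model answer: every field is a string,
// and link fields contain IDs rather than URLs.
type Extracted = Record<string, string>;

// Replace the ID in each listed field with its URL from the lookup table.
// Throws if the model returned an ID that the table does not contain.
function resolveLinks(
  items: Extracted[],
  urlFields: string[],
  idToUrl: Map<string, string>,
): Extracted[] {
  return items.map((item) => {
    const copy: Extracted = { ...item }; // leave the raw model output untouched
    for (const field of urlFields) {
      const id = item[field];
      const url = idToUrl.get(id);
      if (url === undefined) throw new Error(`Unknown element id: ${id}`);
      copy[field] = url;
    }
    return copy;
  });
}

// Made-up lookup table and raw answer for illustration.
const idToUrl = new Map<string, string>([
  ["0-12", "https://example.com/widget"],
  ["0-15", "https://example.com/gadget"],
]);
const raw = [
  { name: "Widget", link: "0-12" },
  { name: "Gadget", link: "0-15" },
];
const resolved = resolveLinks(raw, ["link"], idToUrl);
console.log(resolved);
```

Having the model pick from IDs that exist on the page, instead of typing out URLs, means a resolved link can only be one that was actually present; an unknown ID surfaces as an error rather than a plausible-looking but wrong URL.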
Inside Stagehand, link extraction works by asking the LLM to select an ID. Stagehand then looks up that ID in a mapping of IDs to URLs. As a result, the LLM trace in your logs shows IDs, while the final ExtractResult contains the actual URLs.
Troubleshooting
Empty or partial results
Problem: extract() returns empty or incomplete data
Solutions:
- Check your instruction clarity: Make sure your instruction is specific and describes exactly what data you want to extract.
- Verify the data exists: Use stagehand.observe() first to confirm the data is present on the page.
- Wait for dynamic content: If the page loads content dynamically, use stagehand.act("wait for the content to load") before extracting.
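If content still arrives late, one further option beyond the steps above is to retry a few times with a short pause, stopping as soon as every required field is filled. This is a generic retry pattern, not a Stagehand feature: `isComplete`, `extractWithRetry`, and their parameters are hypothetical, and the `run` callback stands in for your own extract() call (optionally preceded by an act() that waits for content).

```typescript
// Hypothetical check: true when every required field is present and non-empty.
function isComplete(result: Record<string, unknown>, required: string[]): boolean {
  return required.every((key) => {
    const value = result[key];
    if (value === undefined || value === null) return false;
    if (typeof value === "string" || Array.isArray(value)) return value.length > 0;
    return true;
  });
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: call `run` up to `attempts` times, pausing `delayMs` between
// tries, and return the first complete result (or the last result if none is complete).
async function extractWithRetry(
  run: () => Promise<Record<string, unknown>>,
  required: string[],
  attempts = 3,
  delayMs = 1000,
): Promise<Record<string, unknown>> {
  let last: Record<string, unknown> = {};
  for (let i = 0; i < attempts; i++) {
    last = await run();
    if (isComplete(last, required)) return last;
    if (i < attempts - 1) await sleep(delayMs);
  }
  return last;
}
```

Each retry is another model call, so keep `attempts` small; if retries never help, the data is probably missing from the page or the instruction is too vague.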
Schema validation errors
Problem: Getting schema validation errors or type mismatches
Solutions:
- Use optional fields: Make fields optional with z.optional() if the data might not always be present.
- Use flexible types: Consider using z.string() instead of z.number() for prices that might include currency symbols.
- Add descriptions: Use .describe() to help the model understand field requirements.
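If you extract a price as a string (for example "$1,299.99"), you can convert it to a number in your own code afterwards. `parsePrice` is a hypothetical helper, not part of Stagehand or Zod. It assumes "." is the decimal separator and "," groups thousands, so a format such as "1.299,99" would need different handling.

```typescript
// Hypothetical helper: turn a price string such as "$1,299.99" or "USD 45" into a number.
// Assumes "." is the decimal separator and "," groups thousands.
// Returns undefined when the string contains no number (e.g. "Free").
function parsePrice(raw: string): number | undefined {
  const match = raw.replace(/,/g, "").match(/\d+(\.\d+)?/);
  if (!match) return undefined;
  const value = Number(match[0]);
  return Number.isFinite(value) ? value : undefined;
}

console.log(parsePrice("$1,299.99"), parsePrice("Free"));
```

Keeping the schema field as a string and converting afterwards means an odd value such as "Call for price" still validates, and your code decides how to handle it.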
Inconsistent results
Problem: Extraction results vary between runs
Solutions:
- Be more specific in instructions: Instead of “extract prices”, use “extract the numerical price value for each item”.
- Use context in schema descriptions: Add field descriptions to guide the model.
- Combine with observe: Use stagehand.observe() to understand the page structure first.
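To check whether a more specific instruction or better field descriptions actually reduced the variation, you can run the same extraction a few times and count how many distinct results come back. This is a generic diagnostic rather than a Stagehand feature: `countDistinct` is hypothetical, and it compares results by their JSON text, so two objects with the same fields in a different key order count as different.

```typescript
// Hypothetical diagnostic: run `run` n times in sequence and return the number
// of distinct results, comparing them by their JSON serialization.
async function countDistinct<T>(run: () => Promise<T>, n: number): Promise<number> {
  const seen = new Set<string>();
  for (let i = 0; i < n; i++) {
    seen.add(JSON.stringify(await run()));
  }
  return seen.size;
}
```

A count of 1 means every run agreed. Each run is a separate model call, so a small `n` (three to five) is usually enough to compare two versions of an instruction.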
Performance issues
Problem: Extraction is slow or timing out
Solutions:
- Reduce scope: Extract smaller chunks of data in multiple calls rather than everything at once.
- Use targeted instructions: Be specific about which part of the page to focus on.
- Consider pagination: For large datasets, extract one page at a time.
- Increase timeout: Use the timeoutMs parameter for complex extractions.
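The "one page at a time" advice can be written as a loop that extracts a page, appends its items, and stops when a page comes back empty or a page limit is reached. `extractAllPages` and its `extractPage` callback are hypothetical; in a real script the callback would move to page `n` (for example with act()) and then call extract() for that page's items.

```typescript
// Hypothetical helper: collect items page by page. `extractPage(n)` returns the
// items on page n, or an empty array once there are no more results.
// Stops after `maxPages` pages even if results keep coming.
async function extractAllPages<T>(
  extractPage: (pageNumber: number) => Promise<T[]>,
  maxPages: number,
): Promise<T[]> {
  const all: T[] = [];
  for (let n = 1; n <= maxPages; n++) {
    const items = await extractPage(n);
    if (items.length === 0) break; // no more results
    all.push(...items);
  }
  return all;
}
```

Each call only sees one page's worth of content, which keeps individual extractions small, and the `maxPages` cap bounds the total number of model calls.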

