Extract structured data from the page
extract() grabs structured text from the current page using a schema. Given an instruction and a schema, it returns data in the structure the schema defines.
In TypeScript, extract schemas are defined with zod.
In Python, extract schemas are defined as pydantic models.
Here is how an extract call might look for a single object:
Your output schema will look like:
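The following is a hypothetical illustration; the schema is an assumption and not the original example. A zod schema such as `z.object({ price: z.number() })` describes a single object. The value extract() resolves to is then a plain object of the inferred type `{ price: number }`. The dependency-free sketch below writes that type out by hand. It also adds a small runtime type guard (`isProduct`, a hypothetical helper), which can help when the result is passed to code that cannot see the zod type.

```typescript
// Hypothetical schema, shown for reference only (requires zod):
//   const schema = z.object({ price: z.number() });
//   type Product = z.infer<typeof schema>; // { price: number }

// The same shape written out by hand, so this sketch has no dependencies.
interface Product {
  price: number;
}

// Runtime check that an unknown value has the Product shape.
function isProduct(value: unknown): value is Product {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { price?: unknown }).price === "number"
  );
}

// Hypothetical value extract() could resolve to for this schema.
const extracted: unknown = { price: 42.99 };

if (isProduct(extracted)) {
  console.log(`Price: ${extracted.price}`);
}
```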
Here is how an extract call might look when extracting a link or URL:
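If the schema declares a link as a string, check that the string is a usable URL before following it. With zod this can be expressed in the schema itself, for example `z.object({ link: z.string().url() })` (a hypothetical schema). The sketch below performs a similar check after extraction, using the WHATWG `URL` constructor that is available as a global in modern browsers and Node.js. Two parts are assumptions made for illustration: the helper name `toAbsoluteHttpUrl`, and the choice to resolve relative links against the page URL.

```typescript
// Hypothetical helper: turn an extracted link into an absolute http(s) URL,
// or return null if it cannot be parsed or uses another scheme.
function toAbsoluteHttpUrl(extracted: string, pageUrl: string): string | null {
  let url: URL;
  try {
    // Relative links such as "/docs" are resolved against the page URL.
    url = new URL(extracted, pageUrl);
  } catch {
    return null; // not parseable as a URL
  }
  return url.protocol === "http:" || url.protocol === "https:" ? url.href : null;
}

console.log(toAbsoluteHttpUrl("/docs/extract", "https://example.com/home"));
// → https://example.com/docs/extract
console.log(toAbsoluteHttpUrl("mailto:team@example.com", "https://example.com/"));
// → null
```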
Here is how an extract call might look for a list of objects:
Your output schema will look like:
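The schema type is constrained to a zod object (`z.AnyZodObject` in the `ExtractOptions` signature below). For that reason, a list is expressed as an array field inside an object rather than as a bare top-level array. As a hypothetical illustration, take the schema `z.object({ items: z.array(z.object({ name: z.string(), price: z.number() })) })`. It would resolve to the shape written out by hand below. The field names are assumptions, and the `cheapest` helper only shows how the wrapped list is consumed.

```typescript
// Hand-written equivalent of the inferred type of the hypothetical schema:
//   z.object({ items: z.array(z.object({ name: z.string(), price: z.number() })) })
interface Item {
  name: string;
  price: number;
}

interface ItemList {
  items: Item[]; // the list lives inside an object field
}

// Example: pick the cheapest item from an extracted list, or undefined if empty.
function cheapest(result: ItemList): Item | undefined {
  return result.items.reduce<Item | undefined>(
    (best, item) => (best === undefined || item.price < best.price ? item : best),
    undefined,
  );
}

// Hypothetical value extract() could resolve to for this schema.
const result: ItemList = {
  items: [
    { name: "Basic plan", price: 10 },
    { name: "Free plan", price: 0 },
    { name: "Pro plan", price: 25 },
  ],
};

console.log(cheapest(result)?.name); // → Free plan
```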
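You can provide additional context in your schema, such as descriptions on individual fields (zod's .describe() in TypeScript, or Field(description=...) in pydantic), to help the model extract the data more accurately.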
ExtractOptions<T extends z.AnyZodObject>
Provides instructions for extraction
Defines the structure of the data to extract (TypeScript only)
Set iframes: true if the content to extract is inside an iframe.
This field is now deprecated and has no effect.
An XPath that narrows the scope of the extraction. If an XPath is passed in, extract() processes only the contents of the HTML element it points to. This is useful for reducing input tokens and improving extraction accuracy.
Specifies the model to use
Configuration options for the model client. See ClientOptions.
Timeout in milliseconds for waiting for the DOM to settle
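For orientation, the options above can be pictured as one object type. The sketch below is a simplified, dependency-free stand-in. Its property names follow Stagehand's `ExtractOptions` type as we understand it (`instruction`, `schema`, `iframes`, `selector`, `modelName`, `modelClientOptions`, `domSettleTimeoutMs`). Treat them as assumptions and check them against your installed version. The deprecated option, which has no effect, is omitted. The zod schema and `ClientOptions` types are replaced with loose placeholders so the block compiles on its own.

```typescript
// Placeholder for a zod object schema (z.AnyZodObject in the real type).
type SchemaPlaceholder = Record<string, unknown>;

// Placeholder for the model client configuration (ClientOptions in the real type).
type ClientOptionsPlaceholder = Record<string, unknown>;

// Simplified stand-in for ExtractOptions; property names are assumptions.
interface ExtractOptionsSketch<TSchema extends SchemaPlaceholder> {
  instruction?: string; // what to extract, in natural language
  schema?: TSchema; // structure of the data to extract (TypeScript only)
  iframes?: boolean; // set to true if the content is inside an iframe
  selector?: string; // XPath limiting extraction to one element's contents
  modelName?: string; // which model to use
  modelClientOptions?: ClientOptionsPlaceholder; // model client configuration
  domSettleTimeoutMs?: number; // how long to wait for the DOM to settle, in ms
}

// Hypothetical options: extract from a single table and wait up to 5 s.
const options: ExtractOptionsSketch<SchemaPlaceholder> = {
  instruction: "extract the name and price of every plan in the pricing table",
  selector: "//table[@id='pricing']",
  domSettleTimeoutMs: 5000,
};
```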
Promise<ExtractResult<T extends z.AnyZodObject>>
Resolves to the structured data as defined by the provided schema.