Extract

extract() pulls structured data from the current page. You provide a natural-language instruction and a schema, and the call returns data that matches that schema.

For TypeScript, the extract schemas are defined using zod schemas.

For Python, the extract schemas are defined using pydantic models.

Extract a single object

Here is how an extract call might look for a single object:

import { z } from "zod";

// `page` is the page object that exposes extract().
const item = await page.extract({
  instruction: "extract the price of the item",
  schema: z.object({
    price: z.number(),
  }),
});

The returned object will have this shape:

{ price: number }

Extract a link

To extract links or URLs, say so explicitly in your instruction. Here is how an extract call might look for a link or URL:

const extraction = await page.extract({
  instruction: "extract the link to the 'contact us' page",
  schema: z.object({
    link: z.string().url(),
  }),
});

console.log("the link to the contact us page is: ", extraction.link);
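A URL-shaped string can still use a scheme you may not want to open in a browser, such as mailto:. The sketch below shows one way to check an extracted link before navigating to it. It is not part of the library: `toNavigableUrl` is a hypothetical helper, and the sample links are made up for illustration.

```typescript
// Hypothetical helper (not a library API): accept only absolute http(s) URLs.
function toNavigableUrl(link: string): URL | null {
  try {
    const url = new URL(link);
    const isWeb = url.protocol === "http:" || url.protocol === "https:";
    return isWeb ? url : null;
  } catch {
    // new URL() throws when the string is not an absolute URL.
    return null;
  }
}

// Made-up sample values standing in for `extraction.link`.
const sampleLinks = [
  "https://example.com/contact",
  "mailto:help@example.com",
  "/contact",
];

for (const link of sampleLinks) {
  const target = toNavigableUrl(link);
  console.log(link, target ? "is navigable" : "is skipped");
}
```

In practice you would pass `extraction.link` to the helper and navigate only when it returns a URL.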

Extract a list of objects

Here is how an extract call might look for a list of objects.

const apartments = await page.extract({
  instruction:
    "Extract ALL the apartment listings and their details, including address, price, and square feet.",
  schema: z.object({
    list_of_apartments: z.array(
      z.object({
        address: z.string(),
        price: z.string(),
        square_feet: z.string(),
      }),
    ),
  }),
});

console.log("the apartment list is: ", apartments);

The returned object will look like:

{
  list_of_apartments: [
    {
      address: "street address here",
      price: "$1234.00",
      square_feet: "700"
    },
    {
      address: "another address here",
      price: "1010.00",
      square_feet: "500"
    },
    // ...one entry per listing on the page
  ]
}
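Because the schema above declares price and square_feet as strings, the values can arrive in mixed formats ("$1234.00" next to "1010.00"). The following is a minimal sketch of converting them to numbers after extraction. The `RawApartment` and `Apartment` types and both functions are illustrative names, not library APIs, and the parsing rule (drop everything except digits, the decimal point, and a minus sign) is an assumption that will not suit every locale.

```typescript
// Shape of one entry as returned by the string-based schema.
interface RawApartment {
  address: string;
  price: string;
  square_feet: string;
}

// Hypothetical normalized shape with numeric fields.
interface Apartment {
  address: string;
  price: number | null;
  squareFeet: number | null;
}

// Keep digits, "." and "-", then parse; return null when nothing numeric remains.
function parseNumeric(text: string): number | null {
  const cleaned = text.replace(/[^0-9.-]/g, "");
  if (cleaned === "") return null;
  const value = Number(cleaned);
  return Number.isFinite(value) ? value : null;
}

function normalizeApartments(raw: RawApartment[]): Apartment[] {
  return raw.map((entry) => ({
    address: entry.address,
    price: parseNumeric(entry.price),
    squareFeet: parseNumeric(entry.square_feet),
  }));
}

// Sample data mirroring the example output.
const sampleListings: RawApartment[] = [
  { address: "street address here", price: "$1234.00", square_feet: "700" },
  { address: "another address here", price: "1010.00", square_feet: "500" },
];

console.log(normalizeApartments(sampleListings));
```

Keeping the schema fields as strings and converting afterwards lets you decide how to handle values that are not numeric at all, such as "Call for price", which this sketch maps to null.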

Extract with additional context

You can add context to your schema to help the model extract the data more accurately. Attach a description to each field with .describe():

const apartments = await page.extract({
  instruction:
    "Extract ALL the apartment listings and their details, including address, price, and square feet.",
  schema: z.object({
    list_of_apartments: z.array(
      z.object({
        address: z.string().describe("the address of the apartment"),
        price: z.string().describe("the price of the apartment"),
        square_feet: z.string().describe("the square footage of the apartment"),
      }),
    ),
  }),
});

Arguments: `ExtractOptions<T extends z.AnyZodObject>`

instruction (string, required): Provides instructions for extraction.

schema (z.AnyZodObject, required): Defines the structure of the data to extract (TypeScript only).

iframes (boolean): Set iframes: true if the content to extract is inside an iframe.

useTextExtract (boolean, deprecated): This field is now deprecated and has no effect.

selector (string): An XPath that narrows the scope of an extraction. If an XPath is passed in, extract will only process the contents of the HTML element that the XPath points to. Useful for reducing input tokens and increasing extraction accuracy.

modelName (AvailableModel): Specifies the model to use.

modelClientOptions (object): Configuration options for the model client. See ClientOptions.

domSettleTimeoutMs (number): Timeout in milliseconds for waiting for the DOM to settle.
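As a rough illustration of how the optional fields combine with the required ones, the sketch below builds an options object that scopes an extraction to one element. `ScopedExtractOptions` is a simplified stand-in written for this example, not the library's ExtractOptions type, and `withSelector` is a hypothetical helper. The XPath and the timeout value are placeholders.

```typescript
// Simplified stand-in for the options described above (assumption: only the
// fields used in this example are modelled; S is whatever schema you pass).
interface ScopedExtractOptions<S> {
  instruction: string;
  schema: S;
  selector?: string;
  iframes?: boolean;
  domSettleTimeoutMs?: number;
}

// Hypothetical helper: add an XPath selector to a base set of options.
function withSelector<S>(
  base: { instruction: string; schema: S },
  xpath: string,
  domSettleTimeoutMs?: number,
): ScopedExtractOptions<S> {
  const selector = xpath.trim();
  if (selector === "") {
    throw new Error("selector must be a non-empty XPath");
  }
  const options: ScopedExtractOptions<S> = { ...base, selector };
  if (domSettleTimeoutMs !== undefined) {
    options.domSettleTimeoutMs = domSettleTimeoutMs;
  }
  return options;
}

// Placeholder schema value; in real use this would be a zod object schema.
const placeholderSchema = { note: "replace with z.object({...})" };

const scopedOptions = withSelector(
  { instruction: "extract the price of the item", schema: placeholderSchema },
  "/html/body/main", // placeholder XPath
  5000, // placeholder timeout in milliseconds
);

console.log(scopedOptions);
// In real use: const item = await page.extract(scopedOptions);
```

Scoping to the smallest element that contains the data keeps unrelated markup out of the model input, which is the token and accuracy benefit the selector description refers to.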

Returns: `Promise<ExtractResult<T extends z.AnyZodObject>>`

Resolves to the structured data as defined by the provided schema.
