extract() pulls structured data from the current page. Given an instruction and a schema, it returns data that matches that schema.

In TypeScript, extract schemas are defined with zod.

In Python, extract schemas are defined with pydantic models.

Extract a single object

Here is how an extract call might look for a single object:

const item = await page.extract({
  instruction: "extract the price of the item",
  schema: z.object({
    price: z.number(),
  }),
});

The result will have this shape:

{ price: number }

To extract links or URLs, say so explicitly in your instruction. Here is how an extract call might look for a link or URL:

const extraction = await page.extract({
  instruction: "extract the link to the 'contact us' page",
  schema: z.object({
    link: z.string().url(),
  }),
});

console.log("the link to the contact us page is: ", extraction.link);

Extract a list of objects

Here is how an extract call might look for a list of objects.

const apartments = await page.extract({
  instruction:
    "Extract ALL the apartment listings and their details, including address, price, and square feet.",
  schema: z.object({
    list_of_apartments: z.array(
      z.object({
        address: z.string(),
        price: z.string(),
        square_feet: z.string(),
      }),
    ),
  }),
});

console.log("the apartment list is: ", apartments);

The result will look like:

{
  list_of_apartments: [
    {
      address: "street address here",
      price: "$1234.00",
      square_feet: "700",
    },
    {
      address: "another address here",
      price: "1010.00",
      square_feet: "500",
    },
    // ...
  ],
}

Extract with additional context

You can add a description to each schema field with .describe() to give the model extra context and help it extract the data more accurately.

const apartments = await page.extract({
  instruction:
    "Extract ALL the apartment listings and their details, including address, price, and square feet.",
  schema: z.object({
    list_of_apartments: z.array(
      z.object({
        address: z.string().describe("the address of the apartment"),
        price: z.string().describe("the price of the apartment"),
        square_feet: z.string().describe("the square footage of the apartment"),
      }),
    ),
  }),
});

Arguments: ExtractOptions<T extends z.AnyZodObject>

instruction
string
required

Provides instructions for extraction

schema
z.AnyZodObject
required

Defines the structure of the data to extract (TypeScript only)

iframes
boolean

Set iframes: true if the content to extract is inside an iframe.

useTextExtract
boolean
deprecated

This field is now deprecated and has no effect.

selector
string

An XPath that narrows the scope of an extraction. If an XPath is passed in, extract only processes the contents of the HTML element that the XPath points to. This is useful for reducing input tokens and increasing extraction accuracy.
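
As a rough sketch of how a selector might be attached to an existing set of options, the helper below copies the options and adds an XPath. ScopedExtractFields and withSelector are hypothetical names defined locally for illustration, not library types, and the XPath shown is made up; check the exact selector format expected by the library version you are using.

```typescript
// Local stand-in for a few of the documented option fields.
// This is NOT the library's own ExtractOptions type.
interface ScopedExtractFields {
  instruction: string;
  selector?: string;
  iframes?: boolean;
  domSettleTimeoutMs?: number;
}

// Hypothetical helper: return a copy of the options with the extraction
// scope narrowed to the element the XPath points to.
function withSelector<T extends ScopedExtractFields>(
  options: T,
  xpath: string,
): T & { selector: string } {
  if (xpath.trim() === "") {
    throw new Error("selector must be a non-empty XPath");
  }
  // Spread into a new object so the caller's options are left untouched.
  return { ...options, selector: xpath };
}

const base = {
  instruction: "extract the price of the item",
  domSettleTimeoutMs: 5000,
};

// Illustrative XPath only; replace with one that matches the real page.
const scoped = withSelector(base, "//main//section[1]");
console.log(scoped);
```

The resulting object would be passed to page.extract() together with a schema, as in the earlier examples.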

modelName
AvailableModel

Specifies the model to use

modelClientOptions
object

Configuration options for the model client. See ClientOptions.

domSettleTimeoutMs
number

Timeout in milliseconds for waiting for the DOM to settle

Returns: Promise<ExtractResult<T extends z.AnyZodObject>>

Resolves to the structured data as defined by the provided schema.