Lesson 6 — Search and Extract: From Query to Structured Data

⏱ Est. reading time: 4 min Updated on 5/7/2026

Beyond scraping known URLs, Firecrawl can search the web for relevant pages and turn their content into structured JSON.

6.1 Search: Unified Search and Scrape

The Search tool does more than return a list of results: in the same call it also scrapes the result pages and returns their content as Markdown.

Key Parameters:

  • query: Search keywords.
  • limit: Number of results to return.
  • scrapeOptions: Scraping parameters for the result pages (same as Scrape).
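
As a rough illustration of how these three parameters fit together, the sketch below assembles an argument object for a Search call. The helper name build_search_args is hypothetical, and the "formats" key inside scrapeOptions is an assumption based on the note that scrapeOptions mirrors Scrape; check the tool's own reference before relying on it.

```python
import json


def build_search_args(query: str, limit: int = 5, scrape_markdown: bool = True) -> dict:
    """Assemble arguments for a Search call (hypothetical helper, not part of Firecrawl)."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    args = {"query": query, "limit": limit}
    if scrape_markdown:
        # Assumption: scrapeOptions accepts a "formats" list, as Scrape does.
        args["scrapeOptions"] = {"formats": ["markdown"]}
    return args


search_args = build_search_args("Firecrawl MCP tutorial", limit=3)
print(json.dumps(search_args, indent=2))
```

Keeping limit small is a sensible default here, since every extra result is another page to scrape.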

Search Operators:

  • Exact Match: "Firecrawl MCP"
  • Exclusion: -deprecated
  • Domain Limit: site:github.com
  • Title Keyword: intitle:tutorial
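
The operators above are plain text inside the query string, so they can be combined freely. A minimal sketch of composing them follows; build_query is a hypothetical helper written for this lesson, not a Firecrawl function.

```python
def build_query(terms="", exact=None, exclude=(), site=None, intitle=None) -> str:
    """Compose a search query string from the operators listed above."""
    parts = [terms] if terms else []
    if exact:
        parts.append(f'"{exact}"')                   # exact match
    parts.extend(f"-{word}" for word in exclude)     # exclusion
    if site:
        parts.append(f"site:{site}")                 # domain limit
    if intitle:
        parts.append(f"intitle:{intitle}")           # title keyword
    return " ".join(parts)


query = build_query(
    exact="Firecrawl MCP",
    exclude=["deprecated"],
    site="github.com",
    intitle="tutorial",
)
print(query)  # "Firecrawl MCP" -deprecated site:github.com intitle:tutorial
```

The resulting string is what would be passed as the query parameter.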

6.2 Extract: Multi-page Structured Extraction

The Extract tool uses an LLM to pull structured data from one or more URLs according to a schema you define.

Example Scenario: Extract pricing info from three competitor pages.

{
  "urls": ["url1", "url2", "url3"],
  "prompt": "Extract product name, monthly price, and key features",
  "schema": {
    "type": "object",
    "properties": {
      "name": { "type": "string" },
      "price": { "type": "string" },
      "features": { "type": "array", "items": { "type": "string" } }
    }
  }
}
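
Because the output comes from an LLM, it can be worth checking that each returned record actually matches the schema before using it. The sketch below is a deliberately small, standard-library check that only understands the three types used in the schema above (string, array, object); the sample record is invented for illustration and is not real Extract output.

```python
# The schema from the example above.
PRICING_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "string"},
        "features": {"type": "array", "items": {"type": "string"}},
    },
}

# Maps the JSON Schema type names used above to Python types.
JSON_TYPES = {"string": str, "array": list, "object": dict}


def conforms(value, schema) -> bool:
    """Return True if value matches schema (string/array/object only)."""
    expected = JSON_TYPES.get(schema.get("type"))
    if expected is None or not isinstance(value, expected):
        return False
    if expected is dict:
        props = schema.get("properties", {})
        # Absent fields are allowed because the schema declares no "required" list.
        return all(conforms(value[k], sub) for k, sub in props.items() if k in value)
    if expected is list:
        item_schema = schema.get("items")
        return item_schema is None or all(conforms(v, item_schema) for v in value)
    return True


# Invented sample record, for illustration only.
sample = {"name": "Example Plan", "price": "$19/month", "features": ["API access"]}
print(conforms(sample, PRICING_SCHEMA))
```

For anything beyond a quick sanity check, a full JSON Schema validator is the safer choice.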

6.3 Extract vs Scrape (JSON Format)

You might notice that Scrape also supports jsonOptions. What's the difference?

Dimension         | Extract                           | Scrape (JSON Format)
Input             | List of URLs                      | Single URL
Primary Goal      | Batch data collection, comparison | Precise structure for a single page
Schema Definition | Top-level parameter               | Nested within jsonOptions
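
To make the "Schema Definition" row concrete, the sketch below rewrites one Extract request into an equivalent Scrape request per URL, moving prompt and schema from the top level into jsonOptions. The function name is hypothetical, and the "formats": ["json"] entry is an assumption about how Scrape selects JSON output.

```python
def extract_to_scrape_args(extract_args: dict) -> list[dict]:
    """Turn one Extract request into one Scrape (JSON) request per URL."""
    json_options = {
        key: extract_args[key] for key in ("prompt", "schema") if key in extract_args
    }
    return [
        # Assumption: Scrape selects JSON output through a "formats" list.
        {"url": url, "formats": ["json"], "jsonOptions": dict(json_options)}
        for url in extract_args["urls"]
    ]


extract_args = {
    "urls": ["url1", "url2"],  # placeholders, as in the example above
    "prompt": "Extract product name and monthly price",
    "schema": {"type": "object", "properties": {"name": {"type": "string"}}},
}
scrape_requests = extract_to_scrape_args(extract_args)
print(len(scrape_requests), sorted(scrape_requests[0]))
```

One Extract call with N URLs thus corresponds to N separate Scrape calls, which is why Extract suits batch comparison.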

6.4 Practical Tips

  • Precise Filtering: When using Search, combine it with includeDomains or excludeDomains to restrict results to sources you trust, so fewer irrelevant pages are scraped. For example, search only within official documentation and GitHub.
  • Avoid Token Overflow: When extracting from multiple pages, keep the schema concise and request only the most critical fields.
  • Prompt Optimization: In Extract, a clear, specific prompt often improves extraction accuracy more than a complex schema does.
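
One way to apply the token-overflow tip is to keep a full schema for reference and derive a trimmed version for multi-page runs. The sketch below assumes a hypothetical helper, trim_schema, and an invented four-field schema; neither comes from Firecrawl itself.

```python
def trim_schema(schema: dict, keep: list[str]) -> dict:
    """Return a copy of an object schema containing only the fields in keep."""
    props = schema.get("properties", {})
    missing = [name for name in keep if name not in props]
    if missing:
        raise KeyError(f"fields not in schema: {missing}")
    trimmed = {**schema, "properties": {name: props[name] for name in keep}}
    if "required" in schema:
        # Drop required entries that refer to removed fields.
        trimmed["required"] = [n for n in schema["required"] if n in keep]
    return trimmed


# Invented schema for illustration.
full_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "string"},
        "features": {"type": "array", "items": {"type": "string"}},
        "description": {"type": "string"},
    },
    "required": ["name", "description"],
}
lean_schema = trim_schema(full_schema, ["name", "price"])
print(list(lean_schema["properties"]))
```

The original schema is left unchanged, so the full version remains available for single-page runs.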