Lesson 6 — Search and Extract: From Query to Structured Data — Mastering Firecrawl — The Ultimate Guide to AI Web Scraping

Beyond scraping known URLs, Firecrawl can search the web for relevant pages and turn their content into structured JSON.

6.1 Search: Unified Search and Scrape

The Search tool returns more than a list of search results: it also scrapes each result page and returns its content as Markdown.

Key Parameters:

query: Search keywords.
limit: Number of results to return.
scrapeOptions: Scraping parameters for the result pages (same as Scrape).
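
As a rough illustration of how these three parameters fit together, the sketch below assembles a Search payload as a plain Python dictionary. It makes no network call. The helper name build_search_request and the {"formats": ["markdown"]} value are assumptions made for this example, not something the lesson specifies; check the option names against the Scrape tool you are actually using.

```python
import json
from typing import Optional


def build_search_request(query: str, limit: int = 5,
                         scrape_options: Optional[dict] = None) -> dict:
    """Assemble a Search payload from query, limit, and scrapeOptions."""
    if not query.strip():
        raise ValueError("query must not be empty")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    payload = {"query": query, "limit": limit}
    # scrapeOptions is only included when the caller supplies it.
    if scrape_options:
        payload["scrapeOptions"] = scrape_options
    return payload


request = build_search_request(
    "Firecrawl MCP tutorial",
    limit=3,
    scrape_options={"formats": ["markdown"]},  # assumed Scrape option
)
print(json.dumps(request, indent=2))
```

Keeping limit small matters here because every result is also scraped, so each extra result adds a full page of Markdown to the response.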

Search Operators:

Exact Match: "Firecrawl MCP"
Exclusion: -deprecated
Domain Limit: site:github.com
Title Keyword: intitle:tutorial
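
The operators above are just text inside the query string, so they can be composed programmatically. The following is a minimal sketch; build_query and its argument names are hypothetical helpers invented for this example.

```python
def build_query(terms, exact=None, exclude=(), site=None, intitle=None):
    """Combine free-text terms with the four operators listed above."""
    parts = [terms] if terms else []
    if exact:
        parts.append(f'"{exact}"')                    # exact match
    parts.extend(f"-{word}" for word in exclude)      # exclusion
    if site:
        parts.append(f"site:{site}")                  # domain limit
    if intitle:
        parts.append(f"intitle:{intitle}")            # title keyword
    return " ".join(parts)


query = build_query(
    "setup guide",
    exact="Firecrawl MCP",
    exclude=["deprecated"],
    site="github.com",
    intitle="tutorial",
)
print(query)
# setup guide "Firecrawl MCP" -deprecated site:github.com intitle:tutorial
```

The resulting string is what would be passed as the query parameter.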

6.2 Extract: Multi-page Structured Extraction

The Extract tool uses an LLM to pull structured data from one or more URLs according to a schema you define.

Example Scenario: Extract pricing info from three competitor pages.

{
  "urls": ["url1", "url2", "url3"],
  "prompt": "Extract product name, monthly price, and key features",
  "schema": {
    "type": "object",
    "properties": {
      "name": { "type": "string" },
      "price": { "type": "string" },
      "features": { "type": "array", "items": { "type": "string" } }
    }
  }
}
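
Because the output comes from an LLM, it can be worth checking each returned record against the schema before using it. The sketch below is a deliberately small checker that handles only the three types used in the schema above (string, array, object); it is not a full JSON Schema validator. The sample records are made up for illustration and are not real extraction output.

```python
SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "string"},
        "features": {"type": "array", "items": {"type": "string"}},
    },
}

# Only the JSON types that appear in the schema above are supported.
TYPE_MAP = {"string": str, "array": list, "object": dict}


def conforms(value, schema) -> bool:
    """Return True if value matches the (simplified) schema."""
    if not isinstance(value, TYPE_MAP[schema["type"]]):
        return False
    if schema["type"] == "object":
        props = schema.get("properties", {})
        # Check every key the schema knows about; unknown keys are ignored.
        return all(conforms(value[k], props[k]) for k in value if k in props)
    if schema["type"] == "array":
        item_schema = schema.get("items")
        return item_schema is None or all(conforms(v, item_schema) for v in value)
    return True


good = {"name": "Example Plan", "price": "$10/month", "features": ["API access"]}
bad = {"name": "Example Plan", "price": 10}  # price should be a string

print(conforms(good, SCHEMA))  # True
print(conforms(bad, SCHEMA))   # False
```

For production use, a complete validator library would be a safer choice than this hand-rolled check.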

6.3 Extract vs Scrape (JSON Format)

You might notice that Scrape also supports jsonOptions. What's the difference?

Dimension	Extract	Scrape (JSON Format)
Input	List of URLs	Single URL
Primary Goal	Batch data collection, comparison	Precise structure for a single page
Schema Definition	Top-level parameter	Nested within `jsonOptions`
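
To make the structural difference concrete, the sketch below converts one Extract-style payload into a list of per-URL Scrape-style payloads. Only the urls, prompt, schema, and jsonOptions names come from this lesson. The "url" and "formats": ["json"] fields, and the choice to place prompt inside jsonOptions alongside schema, are assumptions made for illustration.

```python
def extract_to_scrape_requests(extract_request):
    """Split one multi-URL Extract payload into single-URL Scrape payloads."""
    # In Extract, prompt and schema are top-level; here they move under jsonOptions.
    json_options = {
        key: extract_request[key]
        for key in ("prompt", "schema")
        if key in extract_request
    }
    return [
        {"url": url, "formats": ["json"], "jsonOptions": json_options}  # assumed shape
        for url in extract_request["urls"]
    ]


extract_request = {
    "urls": ["url1", "url2", "url3"],
    "prompt": "Extract product name, monthly price, and key features",
    "schema": {"type": "object", "properties": {"name": {"type": "string"}}},
}

scrape_requests = extract_to_scrape_requests(extract_request)
print(len(scrape_requests))                    # 3
print(sorted(scrape_requests[0].keys()))       # ['formats', 'jsonOptions', 'url']
```

One Extract call covers all three URLs, while the Scrape route needs one request per page.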

6.4 Practical Tips

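Precise Filtering: When using Search, restrict results to the domains you trust, for example only official documentation and GitHub. The site: operator from 6.1 does this inside the query, and includeDomains or excludeDomains serve the same purpose where your client exposes them. Fewer off-target results means fewer pages scraped.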
Avoid Token Overflow: When extracting from multiple pages, keep the schema concise and request only the most critical fields.
Prompt Optimization: In Extract, a clearly worded prompt often does more to improve extraction success than a more complex schema.
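
One way to apply the token-overflow tip is to trim a larger schema down to the fields you actually need before sending it. This is a minimal sketch; prune_schema is a hypothetical helper, and the full schema is the one from 6.2.

```python
def prune_schema(schema, keep):
    """Return a copy of an object schema containing only the named properties."""
    props = schema.get("properties", {})
    pruned = dict(schema)  # shallow copy so the original schema is left untouched
    pruned["properties"] = {name: props[name] for name in keep if name in props}
    return pruned


full_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "string"},
        "features": {"type": "array", "items": {"type": "string"}},
    },
}

lean_schema = prune_schema(full_schema, ["name", "price"])
print(list(lean_schema["properties"]))  # ['name', 'price']
```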