Lesson 6 — Search and Extract: From Query to Structured Data
Beyond scraping known URLs, Firecrawl can search the web for information and return it as structured JSON.
6.1 Search: Unified Search and Scrape
The Search tool does more than return a list of search results: it also scrapes each result page and returns its content as Markdown.
Key Parameters:
- query: Search keywords.
- limit: Number of results to return.
- scrapeOptions: Scraping parameters for the result pages (same as Scrape).
Search Operators:
- Exact Match: "Firecrawl MCP"
- Exclusion: -deprecated
- Domain Limit: site:github.com
- Title Keyword: intitle:tutorial
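The operators and parameters above can be combined programmatically. The following is a minimal sketch, not the official client: `build_query` and `build_search_args` are hypothetical helper names, and the `formats` key inside `scrapeOptions` is an assumption based on the note that these options match those of Scrape.

```python
import json


def build_query(terms, exact=None, exclude=(), site=None, intitle=None):
    """Compose a search string from the operators listed above (hypothetical helper)."""
    parts = [terms]
    if exact:
        parts.append(f'"{exact}"')                     # exact match
    parts.extend(f"-{word}" for word in exclude)       # exclusion
    if site:
        parts.append(f"site:{site}")                   # domain limit
    if intitle:
        parts.append(f"intitle:{intitle}")             # title keyword
    return " ".join(parts)


def build_search_args(query, limit=5, formats=("markdown",)):
    """Assemble Search arguments; the 'formats' key is an assumption."""
    return {
        "query": query,
        "limit": limit,
        "scrapeOptions": {"formats": list(formats)},
    }


query = build_query(
    "setup guide",
    exact="Firecrawl MCP",
    exclude=["deprecated"],
    site="github.com",
    intitle="tutorial",
)
args = build_search_args(query, limit=3)
print(json.dumps(args, indent=2))
```

Keeping the query builder separate from the argument builder makes it easy to reuse the same operators with different limits or scrape options.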
6.2 Extract: Multi-page Structured Extraction
The Extract tool uses LLMs to extract structured data from one or more URLs according to a schema you define.
Example Scenario: Extract pricing info from three competitor pages.
{
  "urls": ["url1", "url2", "url3"],
  "prompt": "Extract product name, monthly price, and key features",
  "schema": {
    "type": "object",
    "properties": {
      "name": { "type": "string" },
      "price": { "type": "string" },
      "features": { "type": "array", "items": { "type": "string" } }
    }
  }
}
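Because the output comes from an LLM, it can be worth checking each returned record against the schema before using it. The sketch below is a deliberately small checker written for illustration: `matches` is a hypothetical helper, it covers only the three types used in the schema above, and the sample record is made up rather than real Extract output. A full JSON Schema validator would be the better choice in production.

```python
SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "string"},
        "features": {"type": "array", "items": {"type": "string"}},
    },
}

# Only the types used in the schema above; other types pass unchecked.
_JSON_TYPES = {"string": str, "array": list, "object": dict}


def matches(value, schema):
    """Return True if value is consistent with this small schema subset."""
    kind = schema.get("type")
    expected = _JSON_TYPES.get(kind)
    if expected is not None and not isinstance(value, expected):
        return False
    if kind == "object":
        props = schema.get("properties", {})
        # Missing keys are tolerated; present keys must match their sub-schema.
        return all(matches(value[k], sub) for k, sub in props.items() if k in value)
    if kind == "array" and "items" in schema:
        return all(matches(item, schema["items"]) for item in value)
    return True


# Illustrative record, not real Extract output.
record = {"name": "Example Plan", "price": "$10/month", "features": ["API access"]}
print(matches(record, SCHEMA))
```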
6.3 Extract vs Scrape (JSON Format)
You might notice that Scrape also supports jsonOptions. What's the difference?
| Dimension | Extract | Scrape (JSON Format) |
|---|---|---|
| Input | List of URLs | Single URL |
| Primary Goal | Batch data collection, comparison | Precise structure for a single page |
| Schema Definition | Top-level parameter | Nested within jsonOptions |
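The difference in where the schema lives can be seen by building both argument sets from the same prompt and schema. This is a sketch under assumptions: `extract_args` and `scrape_json_args` are hypothetical helpers, the `url` and `formats` keys on the Scrape side are assumed, and the example.com URLs are placeholders.

```python
import json

SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "string"},
    },
}
PROMPT = "Extract product name and monthly price"


def extract_args(urls, prompt, schema):
    """Extract: a list of URLs, with the schema as a top-level parameter."""
    return {"urls": list(urls), "prompt": prompt, "schema": schema}


def scrape_json_args(url, prompt, schema):
    """Scrape (JSON format): one URL, with the schema nested in jsonOptions.

    The 'url' and 'formats' keys are assumptions for illustration.
    """
    return {
        "url": url,
        "formats": ["json"],
        "jsonOptions": {"prompt": prompt, "schema": schema},
    }


batch = extract_args(
    ["https://example.com/a/pricing", "https://example.com/b/pricing"],
    PROMPT,
    SCHEMA,
)
single = scrape_json_args("https://example.com/a/pricing", PROMPT, SCHEMA)
print(json.dumps(batch, indent=2))
print(json.dumps(single, indent=2))
```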
6.4 Practical Tips
- Precise Filtering: When using Search, combine it with includeDomains or excludeDomains to cut down on irrelevant results. For example, search only within official documentation and GitHub.
- Avoid Token Overflow: When extracting from multiple pages, keep the schema concise and extract only the most critical fields.
- Prompt Optimization: In Extract, a clearly written prompt often improves extraction success more than a complex schema does.
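One way to apply the token-overflow tip is to trim a larger schema down to the critical fields before sending it. The helper below is a hypothetical sketch: `prune_schema` is not part of Firecrawl, and the extra fields in `FULL_SCHEMA` are invented for the example.

```python
def prune_schema(schema, keep):
    """Return a copy of an object schema containing only the properties in keep."""
    pruned = dict(schema)
    props = schema.get("properties", {})
    pruned["properties"] = {k: v for k, v in props.items() if k in keep}
    if "required" in schema:
        # Drop required entries that no longer have a matching property.
        pruned["required"] = [k for k in schema["required"] if k in keep]
    return pruned


# Illustrative schema with more fields than a multi-page run needs.
FULL_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "string"},
        "features": {"type": "array", "items": {"type": "string"}},
        "description": {"type": "string"},
        "reviews": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "price", "description"],
}

small = prune_schema(FULL_SCHEMA, keep={"name", "price"})
print(sorted(small["properties"]))
```

The original schema is left unchanged, so the full version can still be used for single-page runs where token usage is less of a concern.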