Lesson 1 — What is Firecrawl: Web Data Infrastructure for AI
1.1 Positioning
Firecrawl is web data infrastructure designed specifically for AI agents and LLM applications. It is not merely a web scraper but a comprehensive platform for acquiring, transforming, and interacting with web data, turning unstructured web content into clean data ready for AI consumption.
Core Value Propositions:
- Search: Built-in search engine; one call returns both search results and page content.
- Scrape: Convert any URL into Markdown, HTML, screenshots, or structured JSON.
- Interact: Perform clicks, form fills, and scrolls in the browser to handle dynamic content.
- Autonomous Research (Agent): Allows AI Agents to browse multiple sites autonomously for complex research tasks.
- File Parsing (Parse): Directly convert local PDF, Word, and Excel files into structured data.
1.2 Core Capabilities at a Glance
| Capability | API Endpoint | MCP Tool Name | Description |
|---|---|---|---|
| Scrape | /v1/scrape | firecrawl_scrape | Scrape a single page with JS rendering support |
| Search | /v1/search | firecrawl_search | Integrated search and scrape |
| Crawl | /v1/crawl | firecrawl_crawl | Batch deep scraping of entire sites |
| Map | /v1/map | firecrawl_map | Discover all URLs of a site |
| Extract | /v1/extract | firecrawl_extract | Structured multi-page extraction via LLM |
| Interact | /v1/scrape + interact | firecrawl_interact | Browser interaction after scraping |
| Parse | /v1/parse | firecrawl_parse | Local file parsing |
| Agent | /v1/agent | firecrawl_agent | Autonomous browsing research agent |
1.3 Architecture Overview
Firecrawl's underlying architecture is built for stability under high concurrency and against sophisticated anti-scraping measures:
- API Server (Express.js): Handles request dispatching, authentication, and routing.
- Worker Queue (BullMQ/Redis): Manages asynchronous tasks like Crawl and Agent jobs.
- Browser Engine (Playwright): A pool of headless browsers for JS rendering and interaction.
- Proxy Pool: Built-in global residential proxies providing three levels of anti-scraping protection.
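The worker-queue design implies that long-running jobs such as Crawl are asynchronous: a request enqueues a job, and the client polls until it finishes. The sketch below models that polling loop with a stubbed status source; the status strings used here are illustrative assumptions, not confirmed API values.

```python
import itertools

# Sketch of the asynchronous job pattern implied by the BullMQ worker
# queue: submit a crawl job, then poll its status until a terminal
# state. The status names ("scraping", "completed") are assumptions.

def poll_until_done(get_status, max_polls: int = 10) -> str:
    """Poll a status source until it reports a terminal state."""
    for _ in range(max_polls):
        status = get_status()
        if status in ("completed", "failed"):
            return status
    return "timeout"

# Stub standing in for repeated status checks of a crawl job.
responses = itertools.chain(["scraping", "scraping"], itertools.repeat("completed"))
result = poll_until_done(lambda: next(responses))
print(result)  # completed
```

In production the stub would be an HTTP GET against the job's status endpoint, ideally with a sleep between polls to avoid hammering the API server.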
1.4 Use Cases
| Scenario | Recommended Tool Combination |
|---|---|
| AI Agents getting real-time web info | Search → Scrape |
| Building RAG knowledge bases | Map → Crawl → Markdown |
| Competitor price monitoring | Extract + JSON Schema |
| Batch technical doc collection | Map (search) → Crawl |
| Scraping sites that require login | Scrape → Interact |
| Local PDF/Word doc parsing | Parse |
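To make the "Building RAG knowledge bases" row concrete, here is a minimal sketch of the Map → Crawl → Markdown flow. The two stub functions stand in for the Map and Crawl endpoints; their names and return shapes are illustrative assumptions, not the real client API.

```python
# Sketch of the Map -> Crawl -> Markdown pipeline for a RAG knowledge
# base. The stubs below stand in for real /v1/map and /v1/crawl calls;
# their return shapes are assumptions for illustration only.

def map_site(base_url: str) -> list[str]:
    """Stub for Map: discover the URLs of a site."""
    return [f"{base_url}/docs/intro", f"{base_url}/docs/api"]

def crawl_pages(urls: list[str]) -> list[dict]:
    """Stub for Crawl: fetch each page as Markdown."""
    return [{"url": u, "markdown": f"# Page at {u}"} for u in urls]

def build_knowledge_base(base_url: str) -> dict[str, str]:
    """Map the site, crawl its pages, index Markdown content by URL."""
    pages = crawl_pages(map_site(base_url))
    return {p["url"]: p["markdown"] for p in pages}

kb = build_knowledge_base("https://example.com")
print(len(kb))  # 2
```

The resulting URL-to-Markdown mapping is what you would then chunk, embed, and load into a vector store.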
1.5 Two "Agent" Concepts: Don't Get Confused
Within the Firecrawl ecosystem, there are two distinct types of Agents with very different responsibilities:
| Feature | Firecrawl Agent (FIRE-1) | LLM Agent (e.g., Claude Code) |
|---|---|---|
| Location | Firecrawl Cloud Servers | Your local development environment |
| Decision Maker | Firecrawl AI decides what to search/visit | LLM decides which MCP tool to call |
| Execution Mode | One call completes all steps | Multiple calls in a loop |
| Billing | Dynamic billing (on-demand) | Standard API call billing |
Key Difference:
- Cloud Mode: Claude Code makes one call, and the Firecrawl Cloud Agent autonomously performs all searching and browsing.
- Local Mode: Claude Code acts as the Agent itself, calling different Firecrawl tools step-by-step to compile the final result.
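The Local Mode loop above can be sketched as follows: the LLM repeatedly picks a Firecrawl tool, observes the result, and stops when it has enough. The decision logic here is a hard-coded stand-in for a real LLM, and the tool invocation is a stub; only the tool names come from the MCP table earlier.

```python
# Sketch of Local Mode: the LLM (stubbed as a fixed plan) chooses which
# Firecrawl MCP tool to call next, in a loop, until the task is done.

def fake_tool(name: str, arg: str) -> str:
    """Stand-in for an MCP tool call such as firecrawl_search."""
    return f"{name} result for {arg}"

def local_mode_agent(task: str) -> list[str]:
    # A real LLM would decide these steps dynamically from the task and
    # prior results; this fixed plan just illustrates the loop shape.
    plan = [("firecrawl_search", task), ("firecrawl_scrape", "top result URL")]
    transcript = []
    for tool, arg in plan:
        transcript.append(fake_tool(tool, arg))
    return transcript

steps = local_mode_agent("latest Firecrawl release notes")
print(len(steps))  # 2
```

Contrast this multi-call loop with Cloud Mode, where the same search-then-browse sequence happens server-side inside a single Agent call.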