Lesson 1 — What is Firecrawl: Web Data Infrastructure for AI
1.1 Positioning
Firecrawl is web data infrastructure designed specifically for AI agents and LLM applications. It is not merely a web scraper but a comprehensive platform for acquiring, transforming, and interacting with web data, turning unstructured web content into clean data ready for AI consumption.
Core Value Propositions:
- Search: Built-in search engine; one call returns both search results and page content.
- Scrape: Convert any URL into Markdown, HTML, screenshots, or structured JSON.
- Interact: Perform clicks, form fills, and scrolls in the browser to handle dynamic content.
- Autonomous Research (Agent): Allows AI Agents to browse multiple sites autonomously for complex research tasks.
- File Parsing (Parse): Directly convert local PDF, Word, and Excel files into structured data.
1.2 Core Capabilities at a Glance
| Capability | API Endpoint | MCP Tool Name | Description |
|---|---|---|---|
| Scrape | /v1/scrape | firecrawl_scrape | Scrape a single page with JS rendering support |
| Search | /v1/search | firecrawl_search | Integrated search and scrape |
| Crawl | /v1/crawl | firecrawl_crawl | Batch deep scraping of entire sites |
| Map | /v1/map | firecrawl_map | Discover all URLs of a site |
| Extract | /v1/extract | firecrawl_extract | Structured multi-page extraction via LLM |
| Interact | /v1/scrape + interact | firecrawl_interact | Browser interaction after scraping |
| Parse | /v1/parse | firecrawl_parse | Local file parsing |
| Agent | /v1/agent | firecrawl_agent | Autonomous browsing research agent |
1.3 Architecture Overview
Firecrawl's underlying architecture is built for stability under high concurrency and against sophisticated anti-scraping measures:
- API Server (Express.js): Handles request dispatching, authentication, and routing.
- Worker Queue (BullMQ/Redis): Manages asynchronous tasks like Crawl and Agent jobs.
- Browser Engine (Playwright): A pool of headless browsers for JS rendering and interaction.
- Proxy Pool: Built-in global residential proxies providing three levels of anti-scraping protection.
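The worker-queue design implies that long-running jobs such as Crawl are asynchronous: a request enqueues a job, and the client polls until it finishes. The sketch below models that polling loop with a stubbed status source; the status strings used here are illustrative assumptions, not confirmed API values.

```python
import itertools

# Sketch of the asynchronous job pattern implied by the BullMQ worker
# queue: submit a crawl job, then poll its status until a terminal
# state. The status names ("scraping", "completed") are assumptions.

def poll_until_done(get_status, max_polls: int = 10) -> str:
    """Poll a status source until it reports a terminal state."""
    for _ in range(max_polls):
        status = get_status()
        if status in ("completed", "failed"):
            return status
    return "timeout"

# Stub standing in for repeated status checks of a crawl job.
responses = itertools.chain(["scraping", "scraping"], itertools.repeat("completed"))
result = poll_until_done(lambda: next(responses))
print(result)  # completed
```

In production the stub would be an HTTP GET against the job's status endpoint, ideally with a sleep between polls to avoid hammering the API server.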
1.4 Use Cases
| Scenario | Recommended Tool Combination |
|---|---|
| AI Agents getting real-time web info | Search → Scrape |
| Building RAG knowledge bases | Map → Crawl → Markdown |
| Competitor price monitoring | Extract + JSON Schema |
| Batch technical doc collection | Map (search) → Crawl |
| Scraping sites that require login | Scrape → Interact |
| Local PDF/Word doc parsing | Parse |
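To make the "Building RAG knowledge bases" row concrete, here is a minimal sketch of the Map → Crawl → Markdown flow. The two stub functions stand in for the Map and Crawl endpoints; their names and return shapes are illustrative assumptions, not the real client API.

```python
# Sketch of the Map -> Crawl -> Markdown pipeline for a RAG knowledge
# base. The stubs below stand in for real /v1/map and /v1/crawl calls;
# their return shapes are assumptions for illustration only.

def map_site(base_url: str) -> list[str]:
    """Stub for Map: discover the URLs of a site."""
    return [f"{base_url}/docs/intro", f"{base_url}/docs/api"]

def crawl_pages(urls: list[str]) -> list[dict]:
    """Stub for Crawl: fetch each page as Markdown."""
    return [{"url": u, "markdown": f"# Page at {u}"} for u in urls]

def build_knowledge_base(base_url: str) -> dict[str, str]:
    """Map the site, crawl its pages, index Markdown content by URL."""
    pages = crawl_pages(map_site(base_url))
    return {p["url"]: p["markdown"] for p in pages}

kb = build_knowledge_base("https://example.com")
print(len(kb))  # 2
```

The resulting URL-to-Markdown mapping is what you would then chunk, embed, and load into a vector store.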
1.5 Two "Agent" Concepts: Don't Get Confused
Within the Firecrawl ecosystem, there are two distinct types of Agents with very different responsibilities:
| Feature | Firecrawl Agent (FIRE-1) | LLM Agent (e.g., Claude Code) |
|---|---|---|
| Location | Firecrawl Cloud Servers | Your local development environment |
| Decision Maker | Firecrawl AI decides what to search/visit | LLM decides which MCP tool to call |
| Execution Mode | One call completes all steps | Multiple calls in a loop |
| Billing | Dynamic billing (on-demand) | Standard API call billing |
Key Difference:
- Cloud Mode: Claude Code makes one call, and the Firecrawl Cloud Agent autonomously performs all searching and browsing.
- Local Mode: Claude Code acts as the Agent itself, calling different Firecrawl tools step-by-step to compile the final result.
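The Local Mode loop above can be sketched as follows: the LLM repeatedly picks a Firecrawl tool, observes the result, and stops when it has enough. The decision logic here is a hard-coded stand-in for a real LLM, and the tool invocation is a stub; only the tool names come from the MCP table earlier.

```python
# Sketch of Local Mode: the LLM (stubbed as a fixed plan) chooses which
# Firecrawl MCP tool to call next, in a loop, until the task is done.

def fake_tool(name: str, arg: str) -> str:
    """Stand-in for an MCP tool call such as firecrawl_search."""
    return f"{name} result for {arg}"

def local_mode_agent(task: str) -> list[str]:
    # A real LLM would decide these steps dynamically from the task and
    # prior results; this fixed plan just illustrates the loop shape.
    plan = [("firecrawl_search", task), ("firecrawl_scrape", "top result URL")]
    transcript = []
    for tool, arg in plan:
        transcript.append(fake_tool(tool, arg))
    return transcript

steps = local_mode_agent("latest Firecrawl release notes")
print(len(steps))  # 2
```

Contrast this multi-call loop with Cloud Mode, where the same search-then-browse sequence happens server-side inside a single Agent call.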