Lesson 9 — Parse and Agent: From Local Files to Autonomous Research
Firecrawl doesn't just handle online webpages; it can also parse local documents into structured output and even act as your autonomous AI research assistant.
9.1 Parse: Structured Parsing for Local Files
The Parse tool allows you to convert unstructured local documents directly into AI-readable Markdown or structured JSON.
Supported Formats:
- PDF (most common)
- Word (.docx, .doc)
- Excel (.xlsx, .xls)
- HTML/RTF
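If you batch-process a folder, it can help to filter files locally before sending them to Parse. A minimal sketch, assuming the extension list above is the complete set of supported formats (check the current documentation). `is_parse_supported` and `filter_supported` are hypothetical helpers, not part of any Firecrawl SDK:

```python
from pathlib import Path

# Extensions taken from the "Supported Formats" list above (assumed complete).
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".doc", ".xlsx", ".xls", ".html", ".rtf"}


def is_parse_supported(path: str) -> bool:
    """Hypothetical pre-check: does the file extension match a listed format?"""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def filter_supported(paths: list[str]) -> list[str]:
    """Keep only the files Parse is expected to accept, preserving order."""
    return [p for p in paths if is_parse_supported(p)]


if __name__ == "__main__":
    files = ["report.PDF", "notes.txt", "budget.xlsx", "memo.doc"]
    print(filter_supported(files))  # ['report.PDF', 'budget.xlsx', 'memo.doc']
```

The check is case-insensitive, so `report.PDF` and `report.pdf` are treated the same.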
Core Feature: Structured PDF Extraction
For contracts or financial reports, Parse's Extract mode pulls specific fields into JSON by combining a natural-language prompt with a schema:
```json
{
  "filePath": "/path/to/contract.pdf",
  "formats": ["json"],
  "jsonOptions": {
    "prompt": "Extract the names of both parties, the start date, and the total amount",
    "schema": { ... }
  }
}
```
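The request above leaves the schema elided. As an illustration only, the sketch below builds a payload with the same keys in Python and pairs it with a hypothetical JSON Schema for the three fields named in the prompt. The field names (`party_a`, `party_b`, `start_date`, `total_amount`) and both helper functions are invented for this example; `missing_required` is a light local check, not full JSON Schema validation:

```python
import json

# Hypothetical schema for the fields named in the prompt; adapt to your documents.
CONTRACT_SCHEMA = {
    "type": "object",
    "properties": {
        "party_a": {"type": "string"},
        "party_b": {"type": "string"},
        "start_date": {"type": "string", "description": "ISO 8601 date"},
        "total_amount": {"type": "number"},
    },
    "required": ["party_a", "party_b", "start_date", "total_amount"],
}


def build_parse_request(file_path: str, prompt: str, schema: dict) -> dict:
    """Assemble a payload with the same keys as the request shown above."""
    return {
        "filePath": file_path,
        "formats": ["json"],
        "jsonOptions": {"prompt": prompt, "schema": schema},
    }


def missing_required(result: dict, schema: dict) -> list[str]:
    """Return required fields that are absent from an extraction result."""
    return [key for key in schema.get("required", []) if key not in result]


if __name__ == "__main__":
    request = build_parse_request(
        "/path/to/contract.pdf",
        "Extract the names of both parties, the start date, and the total amount",
        CONTRACT_SCHEMA,
    )
    print(json.dumps(request, indent=2))
    partial = {"party_a": "Acme", "start_date": "2025-01-01"}
    print(missing_required(partial, CONTRACT_SCHEMA))  # ['party_b', 'total_amount']
```

Marking every field as `required` makes incomplete extractions easy to spot before the data reaches downstream code.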
Tip: When parsing large PDFs, set the `maxPages` parameter to cap how many pages are processed and avoid token overflow.
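One way to pick a value is a back-of-the-envelope token budget. The sketch assumes you have measured roughly how many tokens one page of your documents produces. `choose_max_pages` and its parameters are hypothetical, and the numbers in the usage line are placeholders, not Firecrawl figures. Where `maxPages` sits in the request depends on the API version, so this only computes a value:

```python
def choose_max_pages(token_budget: int, tokens_per_page: int, reserved_tokens: int = 0) -> int:
    """Largest page count whose estimated output fits in the token budget.

    token_budget: tokens your downstream model can accept.
    tokens_per_page: your own measured estimate for this kind of document.
    reserved_tokens: headroom kept for the prompt, schema, and response.
    """
    if tokens_per_page <= 0:
        raise ValueError("tokens_per_page must be positive")
    usable = token_budget - reserved_tokens
    if usable <= 0:
        return 0  # no room left for document content at all
    return usable // tokens_per_page


if __name__ == "__main__":
    # Placeholder numbers: 100k-token budget, ~800 tokens per page, 4k reserved.
    print(choose_max_pages(100_000, 800, reserved_tokens=4_000))  # 120
```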
9.2 Agent (FIRE-1): Autonomous Research Assistant
The Agent is an advanced, cloud-only Firecrawl feature that runs asynchronously. You provide a research topic, and it automatically:
- Searches for relevant webpages.
- Browses multiple pages to extract information.
- Summarizes and outputs the final result.
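To make the three steps concrete, here is a toy version of the search, browse, summarize pattern. It illustrates the general idea only and is not how FIRE-1 is implemented. `run_research` and the `search`, `fetch`, and `summarize` callables are hypothetical stand-ins you would back with real tools:

```python
from typing import Callable


def run_research(
    topic: str,
    search: Callable[[str], list[str]],
    fetch: Callable[[str], str],
    summarize: Callable[[str, list[str]], str],
    max_pages: int = 5,
) -> str:
    """Toy agent loop: find URLs, read each page once, then summarize."""
    urls = search(topic)[:max_pages]      # 1. search for relevant pages
    seen: set[str] = set()
    notes: list[str] = []
    for url in urls:                      # 2. browse pages, skipping repeats
        if url in seen:
            continue
        seen.add(url)
        text = fetch(url)
        if text:
            notes.append(text)
    return summarize(topic, notes)        # 3. condense the collected notes


if __name__ == "__main__":
    pages = {"https://a.example": "Site A notes.", "https://b.example": "Site B notes."}
    result = run_research(
        "compare pricing",
        search=lambda q: list(pages),
        fetch=lambda url: pages.get(url, ""),
        summarize=lambda topic, notes: f"{topic}: " + " ".join(notes),
    )
    print(result)  # compare pricing: Site A notes. Site B notes.
```

Passing the steps in as functions keeps the loop testable with fakes, as in the usage block.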
Use Cases:
- "Research and compare the pricing strategies of Firecrawl and Tavily."
- "Summarize the functional differences between the most popular AI coding assistants in 2026."
9.3 Agent Asynchronous Workflow
Since Agent research tasks typically take 2–5 minutes, the process is as follows:
- Start Task: Call `firecrawl_agent` to receive a Job ID.
- Poll Status: Call `firecrawl_agent_status` every 30 seconds.
- Retrieve Data: Once the task is complete, get the result from the `data` field in the response.
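The polling step can be sketched as a loop with a timeout. Because the workflow is described in terms of tool calls rather than a specific SDK, this sketch takes `start` and `get_status` as plain callables; wiring them to `firecrawl_agent` and `firecrawl_agent_status` is up to your client. Only the `data` field comes from the workflow above. The `status` key and the `"completed"`/`"failed"` values are assumptions, so check the real response shape:

```python
import time
from typing import Any, Callable


def run_agent_job(
    start: Callable[[], str],
    get_status: Callable[[str], dict[str, Any]],
    poll_interval: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Any:
    """Start an agent job, poll until it finishes, and return its `data`."""
    job_id = start()                                # 1. start task, get Job ID
    deadline = clock() + timeout
    while True:
        response = get_status(job_id)               # 2. poll status
        status = response.get("status")             # assumed key
        if status == "completed":                   # assumed value
            return response.get("data")             # 3. retrieve data
        if status == "failed":                      # assumed value
            raise RuntimeError(f"Agent job {job_id} failed: {response}")
        if clock() >= deadline:
            raise TimeoutError(f"Agent job {job_id} still {status!r} after {timeout}s")
        sleep(poll_interval)


if __name__ == "__main__":
    replies = iter([{"status": "processing"}, {"status": "completed", "data": {"summary": "done"}}])
    print(run_agent_job(lambda: "job-123", lambda job_id: next(replies), poll_interval=0))
    # {'summary': 'done'}
```

Injecting `sleep` and `clock` lets you test the timeout path without waiting minutes.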
9.4 Agent vs Other Tools
| Your Need | Recommended Tool |
|---|---|
| Know which URL to get data from | Scrape |
| Know what keywords to search for and want summarized results | Search |
| Vague research goal requiring multi-site data | Agent |
| Process PDF reports stored on your computer | Parse |
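The table can also be read as a simple decision rule. Below is a hypothetical helper. Checks run in this order: a local file, then a known URL, then a multi-site goal, then keywords. That order, and the fallback to Agent when nothing is known, are our own choices rather than an official rule:

```python
from dataclasses import dataclass


@dataclass
class Need:
    local_file: bool = False   # a PDF report on your computer
    known_url: bool = False    # you already know which URL holds the data
    keywords: bool = False     # you know what to search for
    multi_site: bool = False   # vague goal needing data from several sites


def recommend_tool(need: Need) -> str:
    """Map a need to a Firecrawl tool following the comparison table."""
    if need.local_file:
        return "Parse"
    if need.known_url:
        return "Scrape"
    if need.multi_site:
        return "Agent"
    if need.keywords:
        return "Search"
    return "Agent"  # assumed fallback: nothing concrete known, so research broadly


if __name__ == "__main__":
    print(recommend_tool(Need(keywords=True)))  # Search
```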