You can now crawl an entire website with a single API call using Browser Rendering's new /crawl endpoint, available in open beta. Submit a starting URL, and pages are automatically discovered, rendered in a headless browser, and returned in multiple formats, including HTML, Markdown, and structured JSON. This is great for training models, building RAG pipelines, and researching or monitoring content across a site.
Crawl jobs run asynchronously. You submit a URL, receive a job ID, and check back for results as pages are processed.
# Initiate a crawl
curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl' \
  -H 'Authorization: Bearer <apiToken>' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://blog.cloudflare.com/"
  }'

# Check results
curl -X GET 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl/{job_id}' \
  -H 'Authorization: Bearer <apiToken>'
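Because jobs are asynchronous, the "Check results" request usually has to be repeated until the job finishes. The following is a minimal sketch of such a polling loop, not an official client. The helper name `poll_crawl`, the `CF_ACCOUNT_ID` and `CF_API_TOKEN` environment variables, and the idea of detecting completion by matching a regular expression against the raw response are all assumptions; the announcement does not describe the response fields, so the pattern is left to the caller.

```shell
# Sketch: poll a crawl job until its response matches a caller-supplied pattern.
# Assumptions (not from the announcement): the CF_ACCOUNT_ID and CF_API_TOKEN
# environment variables, the helper name poll_crawl, and that a finished job
# can be recognized by a regular expression matched against the raw response.
poll_crawl() {
  local job_id done_pattern max_attempts delay attempt response
  job_id="$1"
  done_pattern="$2"
  max_attempts="${3:-30}"   # give up after this many checks
  delay="${4:-10}"          # seconds to wait between checks
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Same GET request as the "Check results" call.
    response="$(curl -sS -X GET \
      "https://api.cloudflare.com/client/v4/accounts/${CF_ACCOUNT_ID}/browser-rendering/crawl/${job_id}" \
      -H "Authorization: Bearer ${CF_API_TOKEN}")" || return 2
    if printf '%s' "$response" | grep -Eq "$done_pattern"; then
      printf '%s\n' "$response"   # print the final response for the caller
      return 0
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  return 1   # still not finished after max_attempts checks
}
```

A call might look like `poll_crawl "$JOB_ID" "$DONE_PATTERN" 60 5`, where `DONE_PATTERN` is whatever marks a finished job in the response; consult the crawl endpoint documentation for the actual field names before choosing one. The function returns 0 when the pattern matches, 1 when it runs out of attempts, and 2 when `curl` itself fails.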
Key features:
- Use modifiedSince and maxAge to skip pages that haven't changed or were recently fetched, saving time and cost on repeated crawls.
- Set render: false to fetch static HTML without spinning up a browser, for faster crawling of static sites.
- Respects robots.txt directives, including crawl-delay.
Available on both the Workers Free and Paid plans.
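As an illustration of how the incremental and static options might be combined in one request, here is a hedged sketch. The helper names `crawl_body` and `start_crawl`, the `CF_ACCOUNT_ID` and `CF_API_TOKEN` environment variables, the placement of the options at the top level of the JSON body next to `url`, and the use of numeric values for `modifiedSince` and `maxAge` are assumptions; the announcement names the parameters but not their exact shapes or units.

```shell
# Sketch: build a crawl request body that combines the options named above.
# Assumptions (not from the announcement): the helper names, that the options
# sit at the top level of the JSON body next to "url", and that modifiedSince
# and maxAge take numeric values. Check the crawl endpoint documentation for
# the exact shapes and units.
crawl_body() {
  # $1 = starting URL, $2 = modifiedSince value, $3 = maxAge value
  printf '{"url": "%s", "modifiedSince": %s, "maxAge": %s, "render": false}\n' "$1" "$2" "$3"
}

# Submit the crawl with the same POST request as the "Initiate a crawl" call.
start_crawl() {
  curl -sS -X POST \
    "https://api.cloudflare.com/client/v4/accounts/${CF_ACCOUNT_ID}/browser-rendering/crawl" \
    -H "Authorization: Bearer ${CF_API_TOKEN}" \
    -H 'Content-Type: application/json' \
    -d "$(crawl_body "$1" "$2" "$3")"
}
```

Keeping the body in its own function makes it easy to print and inspect the JSON before sending anything. Note that `crawl_body` does no escaping, so it is only suitable for URLs without double quotes or backslashes.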
To get started, refer to the crawl endpoint documentation. If you are setting up your own site to be crawled, review the robots.txt and sitemaps best practices.