Quick Start

Get your first extraction running in under 2 minutes.

terminal
# 1. Get your API key from the dashboard
export CRAWLO_KEY="crw_sk_..."

# 2. Your first extraction
curl -X POST https://api.crawlo.com/v3/extract \
  -H "Authorization: Bearer $CRAWLO_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/data",
    "format": "json",
    "render_js": true
  }'

# 3. Check job status (the job id comes from the extract response)
curl https://api.crawlo.com/v3/jobs/crw_8f3a2b1c \
  -H "Authorization: Bearer $CRAWLO_KEY"

# 4. Download extracted data
curl https://api.crawlo.com/v3/jobs/crw_8f3a2b1c/data \
  -H "Authorization: Bearer $CRAWLO_KEY" \
  -o output.json

Authentication

All requests require an API key passed in the Authorization header. Keys are generated from the dashboard and come in two types:

  • Live keys (crw_sk_live_...) — production requests, billed against your plan.
  • Test keys (crw_sk_test_...) — sandbox environment, no billing, limited to 100 requests/day.
authentication header
Authorization: Bearer crw_sk_live_abc123...

Security: Never expose API keys in client-side code. Use environment variables or a secrets manager. Keys can be rotated from the dashboard at any time.
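
For example, with the Python SDK (shown in full under SDKs below), the key can be read from the same environment variable used in the Quick Start:

read key from environment (Python)
import os

from crawlo import Crawlo

# Read the key from the environment rather than hard-coding it in source.
client = Crawlo(api_key=os.environ["CRAWLO_KEY"])  # raises KeyError if unset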

Core Endpoints

Method   Endpoint             Description
POST     /v3/extract          Single URL extraction
POST     /v3/batch            Batch extraction (up to 1,000 URLs)
GET      /v3/jobs/{id}        Get job status and metadata
GET      /v3/jobs/{id}/data   Download extracted data
GET      /v3/jobs             List recent jobs (paginated)
GET      /v3/usage            Current plan usage and quotas
POST     /v3/webhooks         Register a delivery webhook
GET      /v3/webhooks         List configured webhooks
DELETE   /v3/webhooks/{id}    Remove a webhook
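
Of these, /v3/batch is the only endpoint whose request body is not documented on this page. A sketch of what a batch call might look like follows; the "urls" array and the reuse of the single-extraction parameters are assumptions, so check the batch reference before relying on them.

batch request (Python)
import os

import requests  # illustrative; any HTTP client works

# Hypothetical payload: the "urls" field is an assumption. The remaining
# fields mirror the parameters documented for /v3/extract below.
resp = requests.post(
    "https://api.crawlo.com/v3/batch",
    headers={"Authorization": f"Bearer {os.environ['CRAWLO_KEY']}"},
    json={
        "urls": ["https://example.com/a", "https://example.com/b"],
        "format": "json",
    },
)
resp.raise_for_status()
print(resp.json())  # expected to include a job id, as with /v3/extract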

Extraction Parameters

POST /v3/extract
{ "url": "string", // Target URL (required) "format": "json|csv|xml", // Output format (default: json) "render_js": false, // Enable headless browser "proxy_type": "residential", // residential | datacenter "geo": "US", // ISO 3166-1 country code "timeout_ms": 30000, // Custom timeout (max: 120000) "wait_for": "selector", // CSS selector to wait for (JS only) "headers": { // Custom request headers "Accept-Language": "en-US" }, "delivery": { // Optional delivery config "method": "s3|gcs|webhook|api", "target": "s3://bucket/raw/" } }
Parameter    Required   Type      Default      Description
url          yes        string    -            Target URL to extract
format       no         string    json         Output format: json, csv, xml
render_js    no         boolean   false        Enable Chromium headless rendering
proxy_type   no         string    datacenter   Proxy pool: residential, datacenter
geo          no         string    auto         Proxy geolocation (ISO 3166-1)
timeout_ms   no         integer   30000        Request timeout in milliseconds
wait_for     no         string    -            CSS selector to wait for (requires render_js)
headers      no         object    -            Custom HTTP headers for the request
delivery     no         object    api          Delivery method and target
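
As an example of combining these parameters, the sketch below waits for a rendered element before extracting. It uses the Python SDK from the SDKs section; passing wait_for and timeout_ms as keyword arguments mirroring the REST parameters is an assumption based on the arguments the SDK example does show.

wait_for example (Python)
from crawlo import Crawlo

client = Crawlo(api_key="crw_sk_test_...")  # test key: sandboxed, no billing

# wait_for requires render_js: extraction starts once the selector
# below appears in the rendered page.
job = client.extract(
    url="https://example.com/data",
    render_js=True,
    wait_for="#results-table",  # hypothetical selector for illustration
    timeout_ms=60000,           # raise the 30 s default for slow pages
)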

Delivery Configuration

API (default)

Data is available for download via GET /v3/jobs/{id}/data for 72 hours after extraction.
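
A minimal download sketch with the requests library; a 410 response means the 72-hour TTL has passed and the extraction must be re-run (see Error Handling below):

download job data (Python)
import os

import requests

job_id = "crw_8f3a2b1c"  # returned by the extract call
resp = requests.get(
    f"https://api.crawlo.com/v3/jobs/{job_id}/data",
    headers={"Authorization": f"Bearer {os.environ['CRAWLO_KEY']}"},
)
if resp.status_code == 410:
    raise RuntimeError("data expired (72h TTL); re-run the extraction")
resp.raise_for_status()
with open("output.json", "wb") as f:
    f.write(resp.content)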

Amazon S3

S3 delivery config
"delivery": { "method": "s3", "target": "s3://your-bucket/crawlo/raw/", "credentials_id": "cred_abc123" // configured in dashboard }

Google Cloud Storage

GCS delivery config
"delivery": { "method": "gcs", "target": "gs://your-bucket/crawlo/raw/", "credentials_id": "cred_xyz789" }

Webhook

Webhook delivery config
"delivery": { "method": "webhook", "target": "https://your-api.com/crawlo/callback" }

Response Format

Job created (202 Accepted)

202 response
{ "id": "crw_8f3a2b1c", "status": "processing", "created_at": "2025-02-08T10:30:00Z", "estimated_ms": 2500 }

Job completed

GET /v3/jobs/{id}
{ "id": "crw_8f3a2b1c", "status": "completed", "records_extracted": 48291, "format": "json", "delivery": "s3://bucket/raw/", "processing": "none", "latency_ms": 2340, "ttl_hours": 72, "expires_at": "2025-02-11T10:30:00Z" }

Possible status values: queued, processing, completed, failed, expired.
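
A simple polling loop over these states might look like this sketch (the 2-second interval is arbitrary; the SDK's job.wait(), shown under SDKs, handles this for you):

poll job status (Python)
import os
import time

import requests

HEADERS = {"Authorization": f"Bearer {os.environ['CRAWLO_KEY']}"}

def wait_for_job(job_id: str, interval_s: float = 2.0) -> dict:
    """Poll GET /v3/jobs/{id} until the job reaches a terminal state."""
    while True:
        job = requests.get(
            f"https://api.crawlo.com/v3/jobs/{job_id}", headers=HEADERS
        ).json()
        if job["status"] in ("completed", "failed", "expired"):
            return job
        time.sleep(interval_s)

job = wait_for_job("crw_8f3a2b1c")
if job["status"] != "completed":
    raise RuntimeError(f"job ended as {job['status']}")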

Error Handling

Code   Meaning                                     Action
400    Bad Request: invalid parameters             Check request body
401    Unauthorized: invalid or missing API key    Check Authorization header
403    Forbidden: plan limit or AUP violation      Check plan quotas or AUP
404    Not Found: job ID does not exist            Verify job ID
410    Gone: data expired (72h TTL)                Re-run extraction
429    Too Many Requests: rate limit exceeded      Wait for Retry-After header
500    Internal Server Error                       Retry with exponential backoff
503    Service Unavailable: maintenance            Check status page
error response format
{ "error": { "code": "rate_limit_exceeded", "message": "Rate limit of 50 req/s exceeded", "retry_after": 1.2 } }

Webhooks

Webhooks send a POST request to your endpoint when a job completes or fails. The payload includes the full job object.

webhook payload
{ "event": "job.completed", "timestamp": "2025-02-08T10:32:40Z", "data": { "id": "crw_8f3a2b1c", "status": "completed", "records_extracted": 48291, "download_url": "https://api.crawlo.com/v3/jobs/crw_8f3a2b1c/data" } }

Webhooks include an X-Crawlo-Signature header for payload verification. Delivery timeout is 10 seconds with 3 automatic retries on failure.
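
A verification sketch follows. Important: this page does not document how X-Crawlo-Signature is computed; the scheme below (HMAC-SHA256 of the raw request body with a shared secret, hex-encoded) is a common convention and purely an assumption, so confirm the actual scheme before using it.

verify webhook signature (Python)
import hashlib
import hmac

# ASSUMED scheme: HMAC-SHA256 over the raw body, hex-encoded.
# The actual X-Crawlo-Signature format is not documented on this page.
def verify_signature(raw_body: bytes, signature: str, secret: str) -> bool:
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)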

SDKs

Official SDKs for major languages:

installation
# Python
pip install crawlo

# Node.js
npm install @crawlo/sdk

# PHP
composer require crawlo/sdk

Python example

example.py
from crawlo import Crawlo

client = Crawlo(api_key="crw_sk_live_...")

job = client.extract(
    url="https://example.com/data",
    format="json",
    render_js=True,
    geo="US",
)

# Wait for completion and get data
data = job.wait().data()
print(f"Extracted {job.records_extracted} records")

Node.js example

example.js
import { Crawlo } from '@crawlo/sdk';

const client = new Crawlo('crw_sk_live_...');

const job = await client.extract({
  url: 'https://example.com/data',
  format: 'json',
  renderJs: true,
  geo: 'US'
});

const data = await job.waitForCompletion();
console.log(`Extracted ${data.records_extracted} records`);

Postman Collection

Import our Postman collection to test all endpoints interactively. It includes pre-configured environments for test and live keys.

Download Postman Collection ↗

Need help? For technical questions, contact [email protected]. See Limits & Quotas for rate limits and plan details.