Quick Start

Get your first extraction running in under 2 minutes.

terminal
# 1. Get your API key from the dashboard
export CRAWLO_KEY="crw_sk_..."

# 2. Your first extraction
curl -X POST https://api.crawlo.com/v3/extract \
  -H "Authorization: Bearer $CRAWLO_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/data",
    "format": "json",
    "render_js": true
  }'

# 3. Check job status (the job id comes from the extract response)
curl https://api.crawlo.com/v3/jobs/crw_8f3a2b1c \
  -H "Authorization: Bearer $CRAWLO_KEY"

# 4. Download extracted data
curl https://api.crawlo.com/v3/jobs/crw_8f3a2b1c/data \
  -H "Authorization: Bearer $CRAWLO_KEY" \
  -o output.json

Authentication

All requests require an API key passed in the Authorization header. Keys are generated from the dashboard and come in two types:

  • Live keys (crw_sk_live_...) — production requests, billed against your plan.
  • Test keys (crw_sk_test_...) — sandbox environment, no billing, limited to 100 requests/day.
authentication header
Authorization: Bearer crw_sk_live_abc123...

Security: Never expose API keys in client-side code. Use environment variables or a secrets manager. Keys can be rotated from the dashboard at any time.
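
For example, with the Python SDK (shown in full under SDKs below), the key can be read from the same environment variable used in the Quick Start:

read key from environment (Python)
import os

from crawlo import Crawlo

# Read the key from the environment rather than hard-coding it in source.
client = Crawlo(api_key=os.environ["CRAWLO_KEY"])  # raises KeyError if unset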

Core Endpoints

Method   Endpoint             Description
POST     /v3/extract          Single URL extraction
POST     /v3/batch            Batch extraction (up to 1,000 URLs)
GET      /v3/jobs/{id}        Get job status and metadata
GET      /v3/jobs/{id}/data   Download extracted data
GET      /v3/jobs             List recent jobs (paginated)
GET      /v3/usage            Current plan usage and quotas
POST     /v3/webhooks         Register a delivery webhook
GET      /v3/webhooks         List configured webhooks
DELETE   /v3/webhooks/{id}    Remove a webhook
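
Of these, /v3/batch is the only endpoint whose request body is not documented on this page. A sketch of what a batch call might look like follows; the "urls" array and the reuse of the single-extraction parameters are assumptions, so check the batch reference before relying on them.

batch request (Python)
import os

import requests  # illustrative; any HTTP client works

# Hypothetical payload: the "urls" field is an assumption. The remaining
# fields mirror the parameters documented for /v3/extract below.
resp = requests.post(
    "https://api.crawlo.com/v3/batch",
    headers={"Authorization": f"Bearer {os.environ['CRAWLO_KEY']}"},
    json={
        "urls": ["https://example.com/a", "https://example.com/b"],
        "format": "json",
    },
)
resp.raise_for_status()
print(resp.json())  # expected to include a job id, as with /v3/extract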

Extraction Parameters

POST /v3/extract
{ "url": "string", // Target URL (required) "format": "json|csv|xml", // Output format (default: json) "render_js": false, // Enable headless browser "proxy_type": "residential", // residential | datacenter "geo": "US", // ISO 3166-1 country code "timeout_ms": 30000, // Custom timeout (max: 120000) "wait_for": "selector", // CSS selector to wait for (JS only) "headers": { // Custom request headers "Accept-Language": "en-US" }, "delivery": { // Optional delivery config "method": "s3|gcs|webhook|api", "target": "s3://bucket/raw/" } }
Parameter    Required   Type      Default      Description
url          yes        string    -            Target URL to extract
format       no         string    json         Output format: json, csv, xml
render_js    no         boolean   false        Enable Chromium headless rendering
proxy_type   no         string    datacenter   Proxy pool: residential, datacenter
geo          no         string    auto         Proxy geolocation (ISO 3166-1)
timeout_ms   no         integer   30000        Request timeout in milliseconds
wait_for     no         string    -            CSS selector to wait for (requires render_js)
headers      no         object    -            Custom HTTP headers for the request
delivery     no         object    api          Delivery method and target
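
As an example of combining these parameters, the sketch below waits for a rendered element before extracting. It uses the Python SDK from the SDKs section; passing wait_for and timeout_ms as keyword arguments mirroring the REST parameters is an assumption based on the arguments the SDK example does show.

wait_for example (Python)
from crawlo import Crawlo

client = Crawlo(api_key="crw_sk_test_...")  # test key: sandboxed, no billing

# wait_for requires render_js: extraction starts once the selector
# below appears in the rendered page.
job = client.extract(
    url="https://example.com/data",
    render_js=True,
    wait_for="#results-table",  # hypothetical selector for illustration
    timeout_ms=60000,           # raise the 30 s default for slow pages
)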

Delivery Configuration

API (default)

Data is available for download via GET /v3/jobs/{id}/data for 72 hours after extraction.
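
A minimal download sketch with the requests library; a 410 response means the 72-hour TTL has passed and the extraction must be re-run (see Error Handling below):

download job data (Python)
import os

import requests

job_id = "crw_8f3a2b1c"  # returned by the extract call
resp = requests.get(
    f"https://api.crawlo.com/v3/jobs/{job_id}/data",
    headers={"Authorization": f"Bearer {os.environ['CRAWLO_KEY']}"},
)
if resp.status_code == 410:
    raise RuntimeError("data expired (72h TTL); re-run the extraction")
resp.raise_for_status()
with open("output.json", "wb") as f:
    f.write(resp.content)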

Amazon S3

S3 delivery config
"delivery": { "method": "s3", "target": "s3://your-bucket/crawlo/raw/", "credentials_id": "cred_abc123" // configured in dashboard }

Google Cloud Storage

GCS delivery config
"delivery": { "method": "gcs", "target": "gs://your-bucket/crawlo/raw/", "credentials_id": "cred_xyz789" }

Webhook

Webhook delivery config
"delivery": { "method": "webhook", "target": "https://your-api.com/crawlo/callback" }

Response Format

Job created (202 Accepted)

202 response
{ "id": "crw_8f3a2b1c", "status": "processing", "created_at": "2025-02-08T10:30:00Z", "estimated_ms": 2500 }

Job completed

GET /v3/jobs/{id}
{ "id": "crw_8f3a2b1c", "status": "completed", "records_extracted": 48291, "format": "json", "delivery": "s3://bucket/raw/", "processing": "none", "latency_ms": 2340, "ttl_hours": 72, "expires_at": "2025-02-11T10:30:00Z" }

Possible status values: queued, processing, completed, failed, expired.
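
A simple polling loop over these states might look like this sketch (the 2-second interval is arbitrary; the SDK's job.wait(), shown under SDKs, handles this for you):

poll job status (Python)
import os
import time

import requests

HEADERS = {"Authorization": f"Bearer {os.environ['CRAWLO_KEY']}"}

def wait_for_job(job_id: str, interval_s: float = 2.0) -> dict:
    """Poll GET /v3/jobs/{id} until the job reaches a terminal state."""
    while True:
        job = requests.get(
            f"https://api.crawlo.com/v3/jobs/{job_id}", headers=HEADERS
        ).json()
        if job["status"] in ("completed", "failed", "expired"):
            return job
        time.sleep(interval_s)

job = wait_for_job("crw_8f3a2b1c")
if job["status"] != "completed":
    raise RuntimeError(f"job ended as {job['status']}")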

Error Handling

Code   Meaning                                     Action
400    Bad Request: invalid parameters             Check request body
401    Unauthorized: invalid or missing API key    Check Authorization header
403    Forbidden: plan limit or AUP violation      Check plan quotas or AUP
404    Not Found: job ID does not exist            Verify job ID
410    Gone: data expired (72h TTL)                Re-run extraction
429    Too Many Requests: rate limit exceeded      Wait for Retry-After header
500    Internal Server Error                       Retry with exponential backoff
503    Service Unavailable: maintenance            Check status page
error response format
{ "error": { "code": "rate_limit_exceeded", "message": "Rate limit of 50 req/s exceeded", "retry_after": 1.2 } }

Webhooks

Webhooks send a POST request to your endpoint when a job completes or fails. The payload includes the full job object.

webhook payload
{ "event": "job.completed", "timestamp": "2025-02-08T10:32:40Z", "data": { "id": "crw_8f3a2b1c", "status": "completed", "records_extracted": 48291, "download_url": "https://api.crawlo.com/v3/jobs/crw_8f3a2b1c/data" } }

Webhooks include an X-Crawlo-Signature header for payload verification. Delivery timeout is 10 seconds with 3 automatic retries on failure.
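
A verification sketch follows. Important: this page does not document how X-Crawlo-Signature is computed; the scheme below (HMAC-SHA256 of the raw request body with a shared secret, hex-encoded) is a common convention and purely an assumption, so confirm the actual scheme before using it.

verify webhook signature (Python)
import hashlib
import hmac

# ASSUMED scheme: HMAC-SHA256 over the raw body, hex-encoded.
# The actual X-Crawlo-Signature format is not documented on this page.
def verify_signature(raw_body: bytes, signature: str, secret: str) -> bool:
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)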

SDKs

Official SDKs for major languages:

installation
# Python
pip install crawlo

# Node.js
npm install @crawlo/sdk

# PHP
composer require crawlo/sdk

Python example

example.py
from crawlo import Crawlo

client = Crawlo(api_key="crw_sk_live_...")

job = client.extract(
    url="https://example.com/data",
    format="json",
    render_js=True,
    geo="US",
)

# Wait for completion and get data
data = job.wait().data()
print(f"Extracted {job.records_extracted} records")

Node.js example

example.js
import { Crawlo } from '@crawlo/sdk';

const client = new Crawlo('crw_sk_live_...');

const job = await client.extract({
  url: 'https://example.com/data',
  format: 'json',
  renderJs: true,
  geo: 'US'
});

const data = await job.waitForCompletion();
console.log(`Extracted ${data.records_extracted} records`);

Postman Collection

Import our Postman collection to test all endpoints interactively. It includes pre-configured environments for test and live keys.

Download Postman Collection ↗

Need help? For technical questions, contact [email protected]. See Limits & Quotas for rate limits and plan details.