# ScrAPI

> Privacy-first web scraping API with browser automation, captcha solving, proxy rotation, and AI integration

https://scrapi.tech

## Overview

ScrAPI is a privacy-first, cloud-based web scraping API designed for developers who need reliable, scalable data extraction from any website. It handles the hard parts of web scraping — bot detection bypass, CAPTCHA solving, JavaScript rendering, IP rotation, and geotargeting — so you can focus on your data. ScrAPI does not log, store, or sell any data extracted through the service.

## API Endpoint

Primary endpoint: `https://api.scrapi.tech/v1/scrape`
- Supports GET (basic scraping) and POST (advanced options including browser commands)
- Authentication: API key via `apiKey` URL parameter or `X-API-KEY` header
- OpenAPI specification: https://api.scrapi.tech/scalar/v1
- Interactive playground: https://scrapi.tech/playground
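
As a rough illustration of the GET form, the sketch below builds a request with Python's standard library without sending it. It assumes that options other than `apiKey` can also be passed as query parameters on GET, which the list above suggests but does not state. `build_get_request` and the placeholder key are illustrative names, not part of ScrAPI.

```python
from urllib.parse import urlencode
from urllib.request import Request

SCRAPE_ENDPOINT = "https://api.scrapi.tech/v1/scrape"


def build_get_request(target_url: str, api_key: str, **options: str) -> Request:
    """Build (but do not send) a GET request for the scrape endpoint.

    Hypothetical helper: sending options as query parameters is an assumption.
    """
    query = urlencode({"url": target_url, **options})
    return Request(
        f"{SCRAPE_ENDPOINT}?{query}",
        headers={"X-API-KEY": api_key},  # alternative: apiKey=<key> in the query string
        method="GET",
    )


request = build_get_request(
    "https://example.com", "YOUR_API_KEY", responseFormat="Markdown"
)
print(request.full_url)
# To actually send it: urllib.request.urlopen(request).read()
```

Building the request separately from sending it keeps the example free of network calls and makes the final URL easy to inspect.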

## Core Capabilities

### Scraping Features
- **Universal website scraping**: Extract data from any URL (static or dynamic)
- **Real browser rendering**: Optional headless browser with full JavaScript execution for SPAs, React, Vue, Angular sites
- **Multiple output formats**: JSON (default), raw HTML, or clean Markdown responses — Markdown is ideal for LLM and RAG pipeline consumption
- **Response selectors**: Extract specific content using CSS selectors or XPath queries to reduce payload size
- **Unlimited concurrency & bandwidth**: No arbitrary limits on paid plans
- **Automatic retries**: Smart retry logic on transient failures

### Browser Automation Commands
- **Click elements**: `{ "click": "#buttonId" }` — Automate button clicks and navigation
- **Fill forms**: `{ "input": { "input[name='email']": "value" } }` — Enter text with human-like typing
- **Select dropdowns**: `{ "select": { "select[name='country']": "USA" } }` — Choose dropdown options
- **Scroll pages**: `{ "scroll": 1000 }` — Scroll to trigger lazy-loaded content
- **Wait for elements**: `{ "waitfor": "#results" }` — Delay until specific elements appear
- **Execute JavaScript**: `{ "javascript": "console.log('hello')" }` — Run arbitrary JS in browser context
- **Timed waits**: `{ "wait": 5000 }` — Pause execution in milliseconds (max 15000)
- **Human-like behavior**: All commands execute with realistic mouse movements, variable typing speed, and natural delays
- **Complex workflows**: Chain multiple commands for login flows, multi-step processes, cookie consent, pagination
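
The commands above can be chained into a single ordered list. The following is a minimal sketch of a login-style flow: every selector and credential is a placeholder, `timed_wait` is a hypothetical helper, and rejecting negative waits is an assumption (only the 15000 ms maximum is documented).

```python
import json

MAX_WAIT_MS = 15000  # documented upper bound for the timed "wait" command


def timed_wait(ms: int) -> dict:
    """Return a wait command, rejecting values outside the allowed range."""
    if not 0 <= ms <= MAX_WAIT_MS:
        raise ValueError(f"wait must be between 0 and {MAX_WAIT_MS} ms, got {ms}")
    return {"wait": ms}


# Hypothetical login flow; every selector below is a placeholder.
login_flow = [
    {"click": "#accept-cookies"},
    {"input": {"input[name='email']": "user@example.com"}},
    {"input": {"input[name='password']": "not-a-real-password"}},
    {"click": "#login"},
    {"waitfor": "#results"},
    {"scroll": 1000},
    timed_wait(2000),
]

print(json.dumps(login_flow, indent=2))
```

The resulting list is what would be supplied as the `browserCommands` array in a POST request.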

### Anti-Detection & Privacy
- **Advanced bot detection bypass**: State-of-the-art techniques to defeat anti-bot systems
- **Automatic CAPTCHA solving**: Supports reCAPTCHA v2/v3 (click, invisible, enterprise), hCaptcha (normal, invisible), and Cloudflare (Turnstile, challenge pages)
- **Ads and trackers disabled**: Removes ads and web trackers from browser sessions
- **No data logging**: Extracted content is never stored, inspected, or sold
- **Anti-fingerprinting**: Realistic browser fingerprints, viewports, and user agents

### Proxy Options
- **Free proxy**: Random anonymous proxies for testing (0 extra credits)
- **Data center proxy**: Fast, reliable proxy pool (5 extra credits per request)
- **Residential proxy**: Premium proxies with widest geolocation coverage (10 extra credits per request)
- **Tor proxy**: Reach .onion hidden services with maximum anonymity (1 extra credit per request)
- **Custom proxy**: Use your own proxy infrastructure via `customProxyUrl` (0 extra credits)
- **Geotargeting**: Route requests through specific countries (ISO 3166-1 alpha-3 codes) and cities using `proxyCountry` and `proxyCity`
- **Automatic IP rotation**: Built-in rotation with 10-minute persistence when using sessions
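
To make the geotargeting rules concrete, here is a small sketch that assembles the proxy-related fields and applies light client-side checks. The `proxyType` values are the ones listed under Request Parameters; `proxy_options` is a hypothetical helper, the city is a placeholder, and the check only verifies the three-letter shape of the country code, not that the country is actually available.

```python
import re
from typing import Optional

PROXY_TYPES = {"Free", "DataCenter", "Residential", "Tor"}


def proxy_options(
    proxy_type: Optional[str] = None,
    country: Optional[str] = None,
    city: Optional[str] = None,
) -> dict:
    """Return proxy-related request fields after basic validation."""
    options = {}
    if proxy_type is not None:
        if proxy_type not in PROXY_TYPES:
            raise ValueError(f"unknown proxyType: {proxy_type!r}")
        options["proxyType"] = proxy_type
    if country is not None:
        if proxy_type is None:
            raise ValueError("proxyCountry requires proxyType")
        # Shape check only: ISO 3166-1 alpha-3 codes are three uppercase letters.
        if not re.fullmatch(r"[A-Z]{3}", country):
            raise ValueError("proxyCountry must look like an alpha-3 code, e.g. 'DEU'")
        options["proxyCountry"] = country
    if city is not None:
        options["proxyCity"] = city
    return options


print(proxy_options("Residential", country="DEU", city="Berlin"))
```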

### Visual Capture
- **Screenshots**: Capture full-page PNG screenshots of the final rendered page (2 extra credits)
- **PDF generation**: Convert any web page to a downloadable PDF document (2 extra credits)
- **Video recording**: Record the full browser session as a WEBM video (3 extra credits)
- Capture files are stored temporarily, and the response includes their download URLs

### Session Management
- **Persistent sessions**: Use `sessionId` to maintain the same IP address, user agent, and cookies across multiple requests
- **Cookie forwarding**: Response cookies are returned and can be reused; sessions handle this automatically
- **Custom cookies and headers**: Set request-specific authentication tokens, session IDs, or site configuration
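
One way to picture session reuse is a series of request payloads that share a single `sessionId`. The sketch below assumes any client-chosen string is acceptable as an identifier (a random UUID hex string is used here); `session_payloads` and the example URLs are hypothetical.

```python
import uuid
from typing import Iterable, List, Optional


def session_payloads(urls: Iterable[str], session_id: Optional[str] = None) -> List[dict]:
    """Return one payload per URL, all tagged with the same sessionId."""
    # Assumption: the identifier format is up to the client.
    session_id = session_id or uuid.uuid4().hex
    return [{"url": url, "sessionId": session_id} for url in urls]


payloads = session_payloads(
    ["https://example.com/login", "https://example.com/account"]
)
for payload in payloads:
    print(payload)
```

Because both payloads carry the same identifier, the second request would be expected to reuse the IP address, user agent, and cookies established by the first.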

### Integration Options
- **RESTful API**: Standard HTTP interface for any programming language
- **Webhooks/callbacks**: Async scraping with results POSTed to your endpoint; includes status polling via reference ID
- **Official .NET SDK**: Available via NuGet (`dotnet add package ScrAPI`)
- **Auto-generated SDKs**: Go, Java, PHP, Python, Ruby, TypeScript, Dart (via Microsoft Kiota from OpenAPI spec)
- **MCP Server**: Model Context Protocol server for AI agent integration — available via Docker, NPX, or cloud (SSE/HTTP)
- **Interactive playground**: Test requests and generate code samples at https://scrapi.tech/playground

## Request Parameters

**Required:**
- `url` (string): Target website URL

**Optional:**
- `useBrowser` (boolean): Enable headless browser for JavaScript rendering (5 credits)
- `solveCaptchas` (boolean): Automatically detect and solve CAPTCHAs (30 credits per captcha; enables browser)
- `includeScreenshot` (boolean): Capture PNG screenshot of final page (2 credits; enables browser)
- `includePdf` (boolean): Generate PDF of final page (2 credits; enables browser)
- `includeVideo` (boolean): Record WEBM video of browser session (3 credits; enables browser)
- `proxyType` (string): Proxy type — `Free`, `DataCenter`, `Residential`, `Tor`
- `proxyCountry` (string): ISO 3166-1 alpha-3 country code for geotargeting (requires `proxyType`)
- `proxyCity` (string): City name for fine-grained geotargeting
- `customProxyUrl` (string): Your own proxy URL (`protocol://username:password@host:port`)
- `requestMethod` (string): HTTP method — GET (default), POST, PUT, DELETE, HEAD, PATCH (non-browser only)
- `responseFormat` (string): Response format — `JSON` (default), `HTML`, or `Markdown`
- `responseSelector` (string): CSS selector or XPath query to extract specific content
- `callbackUrl` (string): Webhook URL to receive async results via POST
- `cookies` (object/string): Custom cookies as key/value pairs
- `headers` (object/string): Custom HTTP headers as key/value pairs
- `sessionId` (string): Session identifier for persistent state across requests
- `acceptDialogs` (boolean): Accept popup dialogs instead of cancelling (default: false)
- `browserCommands` (array): Browser automation commands — POST method only
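
The parameters above can be combined in a single POST request. The sketch below builds one with Python's standard library without sending it. It assumes the POST body is JSON with these parameter names as top-level keys, which matches the object and array types listed but should be checked against the OpenAPI specification; `build_post_request`, the target URL, and the selector are illustrative.

```python
import json
from urllib.request import Request

SCRAPE_ENDPOINT = "https://api.scrapi.tech/v1/scrape"


def build_post_request(api_key: str, payload: dict) -> Request:
    """Build (but do not send) a POST request carrying the payload as JSON."""
    if not payload.get("url"):
        raise ValueError("'url' is the only required parameter")
    return Request(
        SCRAPE_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),  # assumption: JSON request body
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


payload = {
    "url": "https://example.com/search",
    "useBrowser": True,
    "responseFormat": "Markdown",
    "responseSelector": "#results",
    "browserCommands": [{"waitfor": "#results"}, {"scroll": 1000}],
}
request = build_post_request("YOUR_API_KEY", payload)
print(request.get_method(), len(request.data), "bytes")
# To actually send it: urllib.request.urlopen(request).read()
```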

## Credit System

ScrAPI uses credit-based billing. Credits are additive per request, and no credits are consumed for failed requests.

| Feature | Credits |
|---|---|
| HTTP client (no browser) | 1 |
| Real browser | 5 |
| Screenshot | 2 |
| PDF generation | 2 |
| Video recording | 3 |
| Data center proxy | 5 |
| Residential proxy | 10 |
| Tor proxy | 1 |
| Captcha solved | 30 per captcha |
| Free proxy | 0 |
| Custom proxy | 0 |
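
To show how the additive costs combine, here is a rough estimator based on the table. It rests on one labeled assumption: the base cost is either 1 (HTTP client) or 5 (real browser), not both. `estimate_credits` is a hypothetical helper, and `"Custom"` is just this sketch's label for a request that uses `customProxyUrl`.

```python
from typing import Optional

PROXY_CREDITS = {
    None: 0, "Free": 0, "Custom": 0, "Tor": 1, "DataCenter": 5, "Residential": 10,
}


def estimate_credits(
    use_browser: bool = False,
    screenshot: bool = False,
    pdf: bool = False,
    video: bool = False,
    solve_captchas: bool = False,
    captchas_solved: int = 0,
    proxy: Optional[str] = None,
) -> int:
    """Estimate the credits for one successful request."""
    # Captures and CAPTCHA solving enable the browser, per the parameter list.
    browser = use_browser or screenshot or pdf or video or solve_captchas
    total = 5 if browser else 1  # assumption: browser base replaces the HTTP base
    total += 2 if screenshot else 0
    total += 2 if pdf else 0
    total += 3 if video else 0
    total += PROXY_CREDITS[proxy]
    total += 30 * captchas_solved if solve_captchas else 0
    return total


print(estimate_credits())                                    # 1
print(estimate_credits(proxy="DataCenter"))                  # 6
print(estimate_credits(use_browser=True, screenshot=True,
                       proxy="Residential"))                 # 17
print(estimate_credits(solve_captchas=True, captchas_solved=1,
                       proxy="Tor"))                         # 36
```

The printed values follow from the table under the stated assumption. Actual charges are determined by the service and may differ.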

## Use Cases

- Scraping sites protected by anti-bot and bot-detection systems
- Extracting data from JavaScript-heavy single-page applications (SPAs)
- Price monitoring and competitive intelligence across regions
- RAG (Retrieval-Augmented Generation) pipeline content ingestion
- AI agent web research via MCP Server integration
- Content aggregation and summarization using Markdown output
- Form automation and login workflows
- Market research and data analytics
- Visual regression monitoring with screenshots
- Report generation and archival with PDF capture
- Multi-step web crawling with session persistence
- Ad verification and geotargeted content validation
- Dark web research via Tor proxy access

## Pricing

- **Free/Test Key**: 1 concurrent request, 20 requests/day
- **Pay-as-you-go**: Credits never expire, flexible for small projects
- **Subscription plans**: For regular usage, with unlimited concurrency
- **Enterprise**: Custom solutions with dedicated support

## Documentation Resources

- Main documentation: https://scrapi.tech/docs
- Getting started guide: https://scrapi.tech/docs/api_details/v1_scrape
- Credit usage and pricing: https://scrapi.tech/docs/credits
- Real browser rendering: https://scrapi.tech/docs/api_details/v1_scrape/use_browser
- Browser commands: https://scrapi.tech/docs/api_details/v1_scrape/browser_commands
- Captcha solving: https://scrapi.tech/docs/api_details/v1_scrape/solve_captchas
- Proxy options: https://scrapi.tech/docs/api_details/v1_scrape/free_proxy
- Geotargeting: https://scrapi.tech/docs/api_details/v1_scrape/geotargeting
- Response formats: https://scrapi.tech/docs/api_details/v1_scrape/html_markdown_response
- Response selectors: https://scrapi.tech/docs/api_details/v1_scrape/response_selector
- Screenshots, PDF, video: https://scrapi.tech/docs/api_details/v1_scrape/screenshot
- Cookies and headers: https://scrapi.tech/docs/api_details/v1_scrape/cookies_headers
- Webhooks/callbacks: https://scrapi.tech/docs/api_details/v1_scrape/callback_webhook
- Sessions: https://scrapi.tech/docs/api_details/v1_scrape/session_id
- MCP Server: https://scrapi.tech/docs/mcp_server
- SDK / API clients: https://scrapi.tech/docs/sdk_client
- Available countries: https://scrapi.tech/docs/api_details/available_countries
- Available cities: https://scrapi.tech/docs/api_details/available_cities
- Credit balance API: https://scrapi.tech/docs/api_details/credit_balance
- OpenAPI reference: https://api.scrapi.tech/scalar/v1

## Support

- Contact: hello@scrapi.tech
- Website: https://scrapi.tech