Getting Started

ScrAPI can be used via a GET or a POST request to https://api.scrapi.tech/v1/scrape

The only required parameter is the url that you want to scrape. The most basic call looks like this: https://api.scrapi.tech/v1/scrape?url=http://deventerprise.com (open it in a browser to view the raw JSON result).

POST is preferred since the JSON request body is more expressive and supports advanced options such as browser commands.
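As a minimal sketch of the basic GET call described above, the snippet below builds the request URL with the Python standard library. The helper name `build_scrape_url` is hypothetical and not part of ScrAPI. Percent-encoding the target URL is an assumption about safe practice rather than a documented requirement.

```python
from urllib.parse import urlencode

# Endpoint taken from the text above.
SCRAPE_ENDPOINT = "https://api.scrapi.tech/v1/scrape"


def build_scrape_url(target_url, **options):
    """Return a GET URL for the scrape endpoint (hypothetical helper).

    `options` are extra query parameters such as useBrowser="true".
    """
    params = {"url": target_url, **options}
    # urlencode percent-encodes reserved characters in each value.
    return f"{SCRAPE_ENDPOINT}?{urlencode(params)}"


basic_url = build_scrape_url("http://deventerprise.com")
print(basic_url)
```

The resulting URL can be opened in a browser or passed to any HTTP client.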

API Keys

When using the test API key (or no API key) you are limited to one concurrent call and twenty free calls per day with minimal queuing capabilities.

If you have purchased a subscription you will have received an email with your API key. You can also retrieve your API key and manage your subscription by logging into the subscriber area.

To use your API key you can either add the apiKey parameter to the URL (https://api.scrapi.tech/v1/scrape?apiKey=01234567-8901-2345-6789-0ABCDEF01234) or supply an X-API-KEY header with your API key as the header value. The simplest option in most cases is adding the URL parameter.
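The two ways of supplying the key can be sketched as follows. This only constructs the request objects and does not send anything. The function names are hypothetical, and the key is the placeholder value from the text above, not a working key.

```python
import urllib.request
from urllib.parse import urlencode

SCRAPE_ENDPOINT = "https://api.scrapi.tech/v1/scrape"
API_KEY = "01234567-8901-2345-6789-0ABCDEF01234"  # placeholder from the docs


def request_with_key_in_url(target_url, api_key):
    """Option 1: pass the key as the apiKey query parameter."""
    query = urlencode({"apiKey": api_key, "url": target_url})
    return urllib.request.Request(f"{SCRAPE_ENDPOINT}?{query}")


def request_with_key_in_header(target_url, api_key):
    """Option 2: pass the key in the X-API-KEY header."""
    query = urlencode({"url": target_url})
    return urllib.request.Request(
        f"{SCRAPE_ENDPOINT}?{query}",
        headers={"X-API-KEY": api_key},
    )


url_request = request_with_key_in_url("https://deventerprise.com", API_KEY)
header_request = request_with_key_in_header("https://deventerprise.com", API_KEY)
```

The header variant keeps the key out of URLs that may end up in logs or browser history, which is one reason to consider it even though the URL parameter is simpler.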

OpenAPI Specification

https://api.scrapi.tech/scalar/v1

Parameter Details

The API supports the following parameters in the body or as URL parameters.

| Name | Data Type | Description |
| --- | --- | --- |
| url | String (URL) | (Required) URL to scrape for content. |
| useBrowser | Boolean | (Optional) Indicates whether to use a full headless browser (will execute JavaScript) or just a regular HTTP client call to retrieve the response (faster but no JavaScript). |
| solveCaptchas | Boolean | (Optional) Indicates whether to attempt to solve any captchas detected on the page. Will automatically enable useBrowser. |
| proxyType | String or Number | (Optional) The type of proxy to use for the scrape request. |
| proxyCountry | String (3) | (Optional) The proxy country to use if you require the scrape request to come from a specific country. |
| customProxyUrl | String (URL) | (Optional) A custom proxy URL to use, in the format protocol://username:password@host:port. |
| requestMethod | String | (Optional) The request method to use (defaults to GET). This option is only available when useBrowser is disabled. |
| responseFormat | String or Number | (Optional) The type of API response (defaults to JSON). Refer to the HTML/Markdown response section. |
| callbackUrl | String (URL) | (Optional) A URL that the response data will be POSTed to once complete. |
| cookies | String | (Optional) A key/value pair list of cookies to provide in the scrape request (format: cookie1=value1;cookie2=value2;). |
| headers | String | (Optional) A key/value pair list of headers to provide in the scrape request (format: header1=value1;header2=value2;). |
| sessionId | String | (Optional) Using a session identifier will reuse the same contextual information for multiple requests. This means the same IP address, user agent and any cookies collected will apply on each request. Sessions are useful to avoid having to bypass multiple captchas, because the request does not change and clearance cookies will remain. |
| acceptDialogs | Boolean | (Optional) By default all popup dialogs are cancelled. If that is not the expected behavior, you can choose to accept any popup dialogs instead. |
| browserCommands | Array | (Optional) List of browser commands to execute once the web page has loaded. Refer to the browser commands section. |

Browser commands are not available when parameters are passed in the URL. Use the POST method for more advanced features.
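The cookies and headers parameters use a semicolon-terminated key=value list when sent as URL parameters. A minimal sketch of producing that string from a dictionary is shown below. Both helper names are hypothetical, and how values that themselves contain `;` or `=` should be escaped is not covered in this section, so the sketch assumes they do not occur.

```python
from urllib.parse import urlencode

SCRAPE_ENDPOINT = "https://api.scrapi.tech/v1/scrape"


def to_pair_list(pairs):
    """Serialize a dict as 'key1=value1;key2=value2;'.

    Assumes keys and values contain no ';' or '=' characters.
    """
    return "".join(f"{key}={value};" for key, value in pairs.items())


def build_url_with_pairs(target_url, cookies=None, headers=None):
    """Build a GET URL that includes optional cookies and headers."""
    params = {"url": target_url}
    if cookies:
        params["cookies"] = to_pair_list(cookies)
    if headers:
        params["headers"] = to_pair_list(headers)
    # urlencode escapes the '=' and ';' inside each pair list.
    return f"{SCRAPE_ENDPOINT}?{urlencode(params)}"


cookie_string = to_pair_list({"cookie1": "value1", "cookie2": "value2"})
print(cookie_string)  # cookie1=value1;cookie2=value2;
```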

JSON Request (POST)

The following request data provides the URL, headers, cookies and any flags you want to set in order to scrape a web page successfully.

{
  "url": "https://deventerprise.com",
  "useBrowser": false,
  "solveCaptchas": false,
  "proxyType": null,
  "proxyCountry": null,
  "customProxyUrl": null,
  "requestMethod": "GET",
  "responseFormat": "JSON",
  "callbackUrl": null,
  "cookies": {},
  "headers": {},
  "sessionId": null,
  "acceptDialogs": false,
  "browserCommands": []
}
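A request body like the one above could be sent from Python roughly as follows. This is a sketch using only the standard library. The function names are hypothetical, and two things are assumptions rather than documented behavior: that the API expects a `Content-Type: application/json` header, and that omitted optional fields fall back to their defaults.

```python
import json
import urllib.request

SCRAPE_ENDPOINT = "https://api.scrapi.tech/v1/scrape"


def build_post_request(payload, api_key=None):
    """Build (but do not send) a POST request carrying a JSON body."""
    headers = {"Content-Type": "application/json"}  # assumed requirement
    if api_key:
        headers["X-API-KEY"] = api_key
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        SCRAPE_ENDPOINT, data=body, headers=headers, method="POST"
    )


def send(request, timeout=60):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Only the fields that differ from the defaults are included here.
payload = {"url": "https://deventerprise.com", "useBrowser": False}
post_request = build_post_request(payload)
```

Calling `send(post_request)` performs the actual network call and counts against your daily or subscription quota.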

JSON Response

The response data contains all the result information about your request, including the HTML data, headers and any cookies. The example below shows all the possible properties that can be returned in the response data. JSON is the default format; response data can also be returned as HTML or Markdown if you prefer.

{
  "requestUrl": "https://www.deventerprise.com",
  "responseUrl": "https://deventerprise.com",
  "duration": "00:00:01.000000",
  "attempts": 1,
  "errorMessages": [
    "Browser Command Click Timeout - Could not find input target #click_me"
  ],
  "captchasSolved": {
    "reCaptchaV2": 1
  },
  "creditsUsed": 35,
  "statusCode": 200,
  "cookies": {
    "session-id": "12345...",
    "timezone": "UTC"
  },
  "headers": {
    "content-encoding": "br",
    "content-type": "text/html; charset=utf-8"
  },
  "content": "<html>...</html>"
}
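One way to consume a response shaped like the example above is to map it onto a small typed object. The `ScrapeResult` class and `parse_scrape_response` function are hypothetical names for illustration. Since the example shows all *possible* properties, the sketch assumes any of them may be absent and falls back to empty defaults.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ScrapeResult:
    status_code: Optional[int]
    content: str
    credits_used: int
    attempts: int
    error_messages: list = field(default_factory=list)
    cookies: dict = field(default_factory=dict)
    headers: dict = field(default_factory=dict)


def parse_scrape_response(raw_json):
    """Convert the JSON response text into a ScrapeResult."""
    data = json.loads(raw_json)
    return ScrapeResult(
        status_code=data.get("statusCode"),
        content=data.get("content", ""),
        credits_used=data.get("creditsUsed", 0),
        attempts=data.get("attempts", 0),
        error_messages=data.get("errorMessages") or [],
        cookies=data.get("cookies") or {},
        headers=data.get("headers") or {},
    )


# Trimmed copy of the sample response shown above.
sample = json.dumps({
    "statusCode": 200,
    "attempts": 1,
    "creditsUsed": 35,
    "errorMessages": [
        "Browser Command Click Timeout - Could not find input target #click_me"
    ],
    "headers": {"content-type": "text/html; charset=utf-8"},
    "content": "<html>...</html>",
})
result = parse_scrape_response(sample)
for message in result.error_messages:
    print("warning:", message)
```

Note that the sample carries an error message alongside a 200 status code, so it is worth checking errorMessages even when the status looks successful.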

Refer to the specific sections on each feature to understand how to use it correctly or head to the Playground to start test driving all the features and generating code.