Getting Started
ScrAPI can be used via a GET or a POST request to https://api.scrapi.tech/v1/scrape
The only required parameter is the url that you want to scrape. The most basic call looks like this: https://api.scrapi.tech/v1/scrape?url=http://deventerprise.com (open it in a browser to view the raw JSON result).
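The basic GET call can be sketched in Python with only the standard library. The helper names `build_scrape_url` and `scrape` and the 60 second timeout are illustrative assumptions, not part of ScrAPI; only the endpoint and the `url` parameter come from the text above.

```python
import json
import urllib.parse
import urllib.request

SCRAPE_ENDPOINT = "https://api.scrapi.tech/v1/scrape"


def build_scrape_url(target_url: str) -> str:
    """Return the GET URL that asks ScrAPI to scrape target_url (hypothetical helper)."""
    # urlencode percent-escapes the target URL so it travels as a single query value.
    return SCRAPE_ENDPOINT + "?" + urllib.parse.urlencode({"url": target_url})


def scrape(target_url: str, timeout: float = 60.0) -> dict:
    """Send the GET request and decode the JSON result (timeout is an assumed value)."""
    with urllib.request.urlopen(build_scrape_url(target_url), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `scrape("http://deventerprise.com")` performs a real network request, so it is left out of the block itself.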
POST is preferred since the JSON request body is more expressive and supports advanced options such as browser commands.
API Keys
When using the test API key (or no API key) you are limited to one concurrent call and twenty free calls per day with minimal queuing capabilities.
If you have purchased a subscription you will have received an email with your API key. You can also retrieve your API key and manage your subscription by logging into the subscriber area.
To use your API key, you can either add the apiKey parameter to the URL (https://api.scrapi.tech/v1/scrape?apiKey=01234567-8901-2345-6789-0ABCDEF01234) or supply an X-API-KEY header with your API key as the header value. The simplest option in most cases is to add the URL parameter.
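Both ways of supplying the key can be sketched as follows, again with Python's standard library. The function names are hypothetical; the `apiKey` parameter and the `X-API-KEY` header are the ones described above. The requests are only constructed here, not sent.

```python
import urllib.parse
import urllib.request

SCRAPE_ENDPOINT = "https://api.scrapi.tech/v1/scrape"


def request_with_key_in_url(target_url: str, api_key: str) -> urllib.request.Request:
    """Option 1: pass the key as the apiKey URL parameter."""
    query = urllib.parse.urlencode({"apiKey": api_key, "url": target_url})
    return urllib.request.Request(f"{SCRAPE_ENDPOINT}?{query}")


def request_with_key_in_header(target_url: str, api_key: str) -> urllib.request.Request:
    """Option 2: pass the key as the X-API-KEY request header."""
    query = urllib.parse.urlencode({"url": target_url})
    # HTTP header names are case-insensitive; urllib stores this one as "X-api-key".
    return urllib.request.Request(
        f"{SCRAPE_ENDPOINT}?{query}", headers={"X-API-KEY": api_key}
    )
```

Either request object can then be passed to `urllib.request.urlopen`. The header variant keeps the key out of URLs that may end up in logs or browser history.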
OpenAPI Specification
https://api.scrapi.tech/scalar/v1
Parameter Details
The API supports the following parameters in the body or as URL parameters.
Name | Data Type | Description |
---|---|---|
url | String (URL) | (Required) URL to scrape for content. |
useBrowser | Boolean | (Optional) Indicate whether you want to use a full headless browser (will execute JavaScript) or just a regular HTTP client call to retrieve the response (faster but no JavaScript). |
solveCaptchas | Boolean | (Optional) Indicate whether you want to attempt to solve any captchas detected on the page. Will automatically enable useBrowser . |
proxyType | String or Number | (Optional) The type of the proxy to use for the scrape request. |
proxyCountry | String (3) | (Optional) The country the proxy should be located in, if you require the scrape request to come from a specific country. |
customProxyUrl | String (URL) | (Optional) A custom proxy URL to use with the format protocol://username:password@host:port . |
requestMethod | String | (Optional) The type of request method that will be used (defaults to GET). This option is only available when useBrowser is disabled. |
responseFormat | String or Number | (Optional) The type of API response (defaults to JSON). Refer to the HTML/Markdown response section. |
callbackUrl | String (URL) | (Optional) A URL that the response data will be POSTed to once complete. |
cookies | String | (Optional) A key/value pair list of cookies to provide in the scrape request (format: cookie1=value1;cookie2=value2;). |
headers | String | (Optional) A key/value pair list of headers to provide in the scrape request (format: header1=value1;header2=value2;). |
sessionId | String | (Optional) Using a session identifier will reuse the same contextual information for multiple requests. This means the same IP address, user agent and any cookies collected will apply on each request. Sessions are useful to avoid having to bypass multiple captchas, because the request does not change and clearance cookies will remain. |
acceptDialogs | Boolean | (Optional) By default all popup dialogs are cancelled. If that is not the expected behavior, you can choose to accept any popup dialogs instead. |
browserCommands | Array | (Optional) List of browser commands to execute once the web page has loaded. Refer to the browser commands section. |
Browser commands are not available when parameters are passed in the URL of a GET request. Use the POST method for more advanced features.
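The key/value string format listed for cookies and headers in the table (`cookie1=value1;cookie2=value2;`) can be produced from a dictionary with a small helper. This is a sketch only: `to_key_value_string` is a hypothetical name, and no escaping is applied because the documentation above does not describe any.

```python
def to_key_value_string(pairs: dict) -> str:
    """Join name/value pairs as 'name=value;' segments, in insertion order."""
    # Each pair is terminated by ';', matching the format shown in the table.
    return "".join(f"{name}={value};" for name, value in pairs.items())
```

The resulting string can be used as the value of the `cookies` or `headers` URL parameter, after the usual URL encoding.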
JSON Request (POST)
The following request data provides the URL, headers, cookies and any flags you want to set in order to scrape a web page successfully.
{
"url": "https://deventerprise.com",
"useBrowser": false,
"solveCaptchas": false,
"proxyType": null,
"proxyCountry": null,
"customProxyUrl": null,
"requestMethod": "GET",
"responseFormat": "JSON",
"callbackUrl": null,
"cookies": {},
"headers": {},
"sessionId": null,
"acceptDialogs": false,
"browserCommands": []
}
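A request body like the one above could be sent from Python roughly as follows. The helper name `build_post_request` is hypothetical, and the `Content-Type: application/json` header is an assumption based on the body being JSON; the endpoint and the optional `X-API-KEY` header come from the earlier sections.

```python
import json
import urllib.request
from typing import Optional

SCRAPE_ENDPOINT = "https://api.scrapi.tech/v1/scrape"


def build_post_request(payload: dict, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the JSON payload."""
    headers = {"Content-Type": "application/json"}  # assumed content type
    if api_key:
        headers["X-API-KEY"] = api_key
    return urllib.request.Request(
        SCRAPE_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

For example, `urllib.request.urlopen(build_post_request({"url": "https://deventerprise.com", "useBrowser": False}))` would submit the scrape.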
JSON Response
The response data contains all the result information about your request, including the HTML data, headers and any cookies. The example below shows all the possible properties that can be returned in the response data. JSON is the default format; response data can also be returned as HTML or Markdown if you prefer.
{
"requestUrl": "https://www.deventerprise.com",
"responseUrl": "https://deventerprise.com",
"duration": "00:00:01.000000",
"attempts": 1,
"errorMessages": [
"Browser Command Click Timeout - Could not find input target #click_me"
],
"captchasSolved": {
"reCaptchaV2": 1
},
"creditsUsed": 35,
"statusCode": 200,
"cookies": {
"session-id": "12345...",
"timezone": "UTC"
},
"headers": {
"content-encoding": "br",
"content-type": "text/html; charset=utf-8"
},
"content": "<html>...</html>"
}
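Once decoded, a response shaped like the example above can be inspected as an ordinary dictionary. The sketch below uses only property names from that example; the helper `summarize_response` and the choice of fields to report are illustrative assumptions.

```python
def summarize_response(result: dict) -> dict:
    """Collect a few fields of interest from a decoded ScrAPI JSON response."""
    return {
        "status": result.get("statusCode"),
        # errorMessages can be present even when the status code is 200,
        # as in the example response, so it is reported separately.
        "errors": result.get("errorMessages") or [],
        # A differing responseUrl indicates the request ended up elsewhere.
        "redirected": result.get("requestUrl") != result.get("responseUrl"),
        "content": result.get("content") or "",
    }
```

Checking `errors` as well as `status` matters here, because the example shows a failed browser command reported alongside a 200 status code.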
Refer to the specific sections on each feature to understand how to use it correctly or head to the Playground to start test driving all the features and generating code.