To compare how web scraping providers handle Amazon review extraction, we tested five providers on the same set of Amazon product review URLs, totaling 2,500 requests across all providers.
Amazon reviews scraping benchmark
Read our benchmark methodology for more detail on our testing process.
Domain coverage and available metadata fields by provider
✅✅ Structured JSON: Provider returns parsed review data with named fields, ready to use without additional parsing.
✅ HTML: Provider returns rendered HTML.
Amazon reviews scraping benchmark results
Bright Data led with a 96% success rate on Amazon and returned the richest structured output of any provider, with 29 JSON fields per review. It was one of three providers that returned structured JSON on this domain, and the only one to include extended fields such as review images, variant details, and product-level rating breakdowns alongside the standard review data. On the 348 URLs where all four top providers succeeded, Bright Data consistently returned the most complete response.
Oxylabs achieved a 92% success rate on Amazon with the fastest average completion time in the benchmark, at 4s per request. It returned 10 structured JSON fields per review. The combination of a high success rate and low latency made it the most efficient option on this domain.
Decodo recorded an 11% success rate on Amazon with an average completion time of 10s on the URLs it processed. Although it used a dedicated Amazon parser with structured JSON output, the API returned empty results for the vast majority of URLs. The successful responses came primarily from correct 404 detection rather than actual review extraction.
Zyte reached a 75% success rate on Amazon with an average completion time of 13s. It returned rendered HTML rather than structured data, with review fields extracted via CSS selectors. While the success rate was lower than the top group, it covered the majority of the test URLs without requiring a domain-specific configuration.
Nimble posted a 92% success rate on Amazon, matching Oxylabs, with an average completion time of 13s. It returned rendered HTML parsed with CSS selectors. The result was consistent across the URL set with no significant drops.
Amazon reviews benchmark methodology
We tested five web scraping providers on 500 Amazon product review URLs. Each provider received the same set of URLs.
Providers and integration types
Three providers returned structured JSON with parsed review fields: Bright Data (29 fields), Oxylabs (10 fields), and Decodo (dedicated Amazon parser). Nimble and Zyte returned rendered HTML, which we parsed using CSS selectors to extract five standard review fields (reviewer_name, review_text, rating, review_date, review_title).
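As a rough illustration of how the five standard fields can be pulled out of rendered HTML, here is a minimal sketch using Python's standard-library `html.parser`. The class names in `FIELD_CLASSES` and the sample markup are hypothetical; real Amazon markup differs and changes over time, and a production pipeline would typically use a dedicated CSS-selector library instead.

```python
from html.parser import HTMLParser

# Hypothetical class names for the five standard review fields.
# Real Amazon selectors are different and change over time.
FIELD_CLASSES = {
    "reviewer_name": "profile-name",
    "review_title": "review-title",
    "review_text": "review-body",
    "rating": "review-rating",
    "review_date": "review-date",
}

class ReviewParser(HTMLParser):
    """Collects the first text node inside each matching element."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for field, cls in FIELD_CLASSES.items():
            if cls in classes:
                self._current = field

    def handle_data(self, data):
        if self._current and data.strip():
            self.fields.setdefault(self._current, data.strip())
            self._current = None

# Sample rendered-HTML snippet (invented for illustration).
html = """
<div class="review">
  <span class="profile-name">Jane D.</span>
  <i class="review-rating">5.0 out of 5 stars</i>
  <a class="review-title">Great product</a>
  <span class="review-date">Reviewed on June 1, 2024</span>
  <span class="review-body">Works exactly as described.</span>
</div>
"""

parser = ReviewParser()
parser.feed(html)
print(parser.fields)
```

The same normalization step means HTML-based providers can be scored against the structured-JSON providers on an equal footing, even though they expose far fewer fields.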
Validation
Each response went through a three-step validation:
- Submission: an HTTP status code in the 200-399 range, or an explicit 404, was required to pass.
- Execution: For async providers, the scraping job had to complete without timeout or error.
- Validation: The response had to contain usable review data. For JSON responses, this meant at least one review with a valid review_text (string) or rating (integer). For HTML responses, at least one CSS selector had to match and return review content.
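The submission and content checks above can be sketched as two small predicates. The function names are ours; the thresholds (200-399 or 404, string `review_text`, integer `rating`) come straight from the validation rules.

```python
def passes_submission(status: int) -> bool:
    # Step 1: success/redirect codes pass, as does an explicit 404.
    return 200 <= status < 400 or status == 404

def passes_json_validation(reviews: list) -> bool:
    # Step 3 (JSON responses): at least one review must carry a
    # valid review_text (string) or rating (integer).
    return any(
        isinstance(r.get("review_text"), str) or isinstance(r.get("rating"), int)
        for r in reviews
    )

ok = passes_submission(301) and passes_submission(404) and not passes_submission(500)
has_data = passes_json_validation([{"review_text": "Solid build", "rating": 4}])
no_data = passes_json_validation([{}])
print(ok, has_data, no_data)
```

The HTML-side check is analogous: at least one of the CSS selectors must match and return non-empty review content.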
Prior to the full benchmark, we sent each provider a set of intentionally broken URLs, confirmed 404 pages, and live pages with zero reviews. This allowed us to map how each provider communicates these edge cases, whether through explicit error codes, HTTP status, or empty response bodies. Pages identified as 404 or containing no reviews were counted as valid, since the provider correctly processed the request and returned an appropriate response.
We then applied a cross-provider verification step across the full results: when a provider returned empty output on a URL where at least one other provider extracted review data, that empty result was reclassified as a failure. This separated extraction failures from pages that had no reviews to return.
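The reclassification rule is simple enough to state in a few lines. In this sketch (our own representation, not any provider's API), each URL maps to a per-provider outcome of `"data"`, `"empty"`, or `"failure"`; an empty result is downgraded to a failure whenever any peer extracted data for the same URL.

```python
def reclassify_empty(results: dict) -> dict:
    """results[url][provider] is one of "data", "empty", "failure".

    An "empty" outcome becomes "failure" when at least one other
    provider returned data for the same URL; if every provider came
    back empty, the page likely has no reviews and "empty" stands.
    """
    out = {}
    for url, per_provider in results.items():
        any_data = any(v == "data" for v in per_provider.values())
        out[url] = {
            p: ("failure" if v == "empty" and any_data else v)
            for p, v in per_provider.items()
        }
    return out

raw = {
    "url-1": {"bright_data": "data", "decodo": "empty"},
    "url-2": {"bright_data": "empty", "decodo": "empty"},
}
verified = reclassify_empty(raw)
print(verified)
```

On `url-1` the empty Decodo result becomes a failure because Bright Data proved reviews were extractable; on `url-2` both empties stand.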
Completion time
Completion time was measured end-to-end from the initial API request to receiving the final response. For async providers, this includes the polling and wait time until results were ready.
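A minimal sketch of that measurement, assuming a generic async provider: the clock starts at submission and stops when the final response arrives, with polling time included. The `submit`/`poll` callables and the job dictionary shape are placeholders, not any specific provider's API.

```python
import time

def timed_fetch(submit, poll, interval=1.0, timeout=600.0):
    """End-to-end completion time: from the initial API request
    until results are ready, including all polling and wait time."""
    start = time.monotonic()
    job = submit()                       # initial API request
    while not job.get("ready"):
        if time.monotonic() - start > timeout:
            raise TimeoutError("execution timeout")
        time.sleep(interval)             # polling interval
        job = poll(job)
    return job["result"], time.monotonic() - start

# Fake provider that becomes ready on the second poll.
state = {"polls": 0}

def submit():
    return {"ready": False}

def poll(job):
    state["polls"] += 1
    if state["polls"] >= 2:
        return {"ready": True, "result": {"reviews": []}}
    return {"ready": False}

result, elapsed = timed_fetch(submit, poll, interval=0.01)
print(state["polls"], round(elapsed, 3))
```

For synchronous providers the loop never runs and the elapsed time is simply the request round-trip.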
Dataset
The 500 test URLs were selected from Amazon product pages with varying review counts and product categories. URLs were cleaned to remove invalid formats and duplicates before testing.
Shared configuration
All providers received identical URLs and were tested under the same conditions:
- Sequential execution: one request at a time, no parallel requests
- Delay between requests: 2 seconds
- Rate limit handling: 30-second wait with up to 3 retries on HTTP 429
- Submission timeout: 300 seconds
- Execution timeout: 600 seconds
- Each URL was tested once per provider
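The shared harness described above can be sketched as a simple sequential loop. Function and variable names are ours; the constants (2-second delay, 30-second wait, 3 retries on HTTP 429) mirror the configuration listed.

```python
import time

def run_sequential(urls, fetch, delay=2.0, rate_limit_wait=30.0, max_retries=3):
    """Issue one request at a time, with a fixed delay between URLs.
    fetch(url) -> (status, body). On HTTP 429, wait and retry up to
    max_retries times before recording the final outcome."""
    results = {}
    for url in urls:
        status, body = fetch(url)
        retries = 0
        while status == 429 and retries < max_retries:
            time.sleep(rate_limit_wait)   # back off on rate limiting
            status, body = fetch(url)
            retries += 1
        results[url] = (status, body)
        time.sleep(delay)                 # pause before the next URL
    return results

# Fake fetch that rate-limits twice, then succeeds.
calls = {"n": 0}

def fake_fetch(url):
    calls["n"] += 1
    return (429, "") if calls["n"] < 3 else (200, "<html>reviews</html>")

urls = ["https://www.amazon.com/product-reviews/EXAMPLE"]
results = run_sequential(urls, fake_fetch, delay=0, rate_limit_wait=0)
print(results)
```

Running requests strictly one at a time keeps the latency numbers comparable across providers, since no request competes with another for bandwidth or rate-limit budget.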
Provider configurations
Bright Data used the Dataset API with a dedicated Amazon Reviews dataset, returning structured JSON with 29 fields per review. The API was polled via the /progress/{snapshot_id} endpoint at 1-second intervals until ready.
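The polling loop for that flow might look like the sketch below. Only the `/progress/{snapshot_id}` path and the 1-second interval come from our setup; the `"status"`/`"ready"` response fields and the `get_progress` helper are assumptions about the response shape, not Bright Data's documented schema.

```python
import time

def wait_for_snapshot(snapshot_id, get_progress, interval=1.0, timeout=600.0):
    """Poll the /progress/{snapshot_id} endpoint until the snapshot
    is ready. get_progress(path) is assumed to perform an authenticated
    GET and return the JSON body as a dict."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        progress = get_progress(f"/progress/{snapshot_id}")
        if progress.get("status") == "ready":   # assumed status value
            return progress
        time.sleep(interval)
    raise TimeoutError(f"snapshot {snapshot_id} not ready within {timeout}s")

# Fake endpoint that reports ready on the third poll.
polls = {"n": 0}

def fake_get(path):
    polls["n"] += 1
    if polls["n"] >= 3:
        return {"status": "ready", "path": path}
    return {"status": "running"}

progress = wait_for_snapshot("s_abc123", fake_get, interval=0.01)
print(progress)
```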
Oxylabs used a dedicated Amazon source API (source: amazon) with structured JSON output, returning 10 fields per review.
Decodo used a dedicated Amazon parser (target: amazon, parse: true) with structured JSON output. Despite using a domain-specific configuration, the API returned empty results for most URLs.
Nimble used the Web API with render: true for JavaScript rendering. All requests returned rendered HTML parsed with CSS selectors.
Zyte used the Extract API with browserHtml: true, returning JavaScript-rendered HTML via a headless browser, parsed with CSS selectors.
FAQs about Amazon review scraping
What is Amazon review scraping?
Amazon review scraping is the automated extraction of customer review data from Amazon product pages, including review text, ratings, author details, and dates. It is commonly used for sentiment analysis, competitor monitoring, product research, and market analysis at scale.
How do scraping providers get past Amazon's anti-bot measures?
Amazon uses rate limiting, CAPTCHAs, and browser fingerprinting to detect automated access. Scraping providers handle this through rotating residential proxies, headless browser rendering, and request throttling. Some providers offer dedicated Amazon APIs that manage these protections internally, while others use general-purpose unblockers that render the page and return HTML.
How many reviews can you extract per request?
Most scraping APIs return between 10 and 30 reviews per request by default. Providers with dedicated Amazon APIs, such as Bright Data and Oxylabs, allow configuring the number of reviews per product through parameters like limit_multiple_results. HTML-based providers return whatever reviews are rendered on the page, which is typically the first page of reviews (around 10).
Can these APIs access reviews that require a login?
The providers tested in this benchmark extract reviews from publicly accessible product pages without authentication. Reviews that are only visible to logged-in users, such as certain Vine reviews or purchase-specific content, are not accessible through these APIs.