
Review Scraping Benchmark: Bright Data, Oxylabs & Decodo

Nazlı Şipi
updated on Apr 15, 2026

We tested 5 web scraping providers across 5 major review platforms, 12,500 requests in total, and measured success rate, completion time, and available metadata fields.

Review scraping benchmark

See the benchmark methodology section below for more details on the testing process.

Domain coverage by provider

  • ✅ = supported, returns HTML
  • ✅ ✅ = supported, returns structured data

Review scraping performance by domain

Available metadata fields for providers with structured JSON responses

Review scraping benchmark results

Bright Data achieved the highest average success rate at 78% across all five review platforms and was the only provider to return structured JSON on four of them: Amazon, Google Maps, Trustpilot, and Yelp. It led on Amazon (96%) and Trustpilot (98%), delivering up to 39 metadata fields per review including verification status, reviewer location, and owner responses. Google Maps was its weakest domain at 39%, though most providers also failed on this domain due to JavaScript-rendered review content.

Oxylabs was the fastest provider in the benchmark at a 5s average completion time, significantly ahead of the next closest at 13s. It posted high success rates on Trustpilot (98%) and Tripadvisor (91%), and matched the top tier on Amazon (92%) with 10 structured JSON fields. It did not return results on Google Maps or Yelp, where it lacked dedicated scraping configurations.

Decodo scored 93% on Trustpilot and 76% on Tripadvisor using its unblocker proxy, demonstrating solid performance on server-rendered review pages. However, it recorded 0% on both Google Maps and Yelp, and only 11% on Amazon despite using a structured API endpoint. Its coverage is limited to two of the five tested platforms, making it the narrowest option in the benchmark for review scraping.

Zyte was one of only two providers to return results on all five platforms, finishing with a 65% average success rate. It performed best on Tripadvisor (86%) and Yelp (57%), maintaining steady extraction across domains. Google Maps was a relative bright spot at 41%, one of the higher scores on a domain where most providers failed. All extraction was HTML-based with CSS selector parsing, so no structured metadata fields were returned beyond the five standard review fields.

Nimble reached 92% on Amazon and 66% on Trustpilot, showing it can handle structured review pages effectively. However, performance dropped to 1% on Google Maps and 31% on Yelp, where JavaScript-heavy rendering limited its HTML-based extraction. Its 52% overall average reflects this uneven platform support, with completion times averaging 20s.

Review scraping benchmark methodology

We tested 5 web scraping API providers across 5 review platforms selected from the Tranco top-ranked review domains: Amazon, Google Maps, Tripadvisor, Trustpilot, and Yelp. Each provider received the same set of 2,500 URLs (500 per platform), and we measured three metrics: success rate, completion time, and available metadata fields.

Providers and integration types

Providers were integrated using two approaches depending on the platform:

  • JSON structured API: The provider returns parsed review data in JSON format with named fields (e.g., reviewer_name, rating, review_text). Bright Data and Oxylabs offered this for select platforms.
  • HTML response: The provider returns rendered HTML, which we parsed using CSS selectors to extract review fields. Decodo, Nimble, and Zyte primarily used this approach.

Note: Decodo returned a JSON structured response for Amazon, but none of the responses contained successful review data. Its 11% success rate on Amazon came entirely from correct 404 detection, so no metadata fields are reported for that combination.
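For the HTML-response providers, extraction amounts to matching review elements in the rendered page. A minimal stand-in sketch using Python's standard-library HTMLParser (the review-text class name is hypothetical; the benchmark's actual CSS selectors differed per platform):

```python
from html.parser import HTMLParser

class ReviewTextExtractor(HTMLParser):
    """Collects the text content of every element whose class attribute
    contains 'review-text' (a hypothetical class name standing in for
    the per-platform CSS selectors used in the benchmark)."""

    def __init__(self):
        super().__init__()
        self.reviews = []   # extracted review texts
        self._depth = 0     # >0 while inside a matched element

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class") or ""
        # Enter a matched element, or descend further inside one
        if self._depth or "review-text" in classes.split():
            self._depth += 1
            if self._depth == 1:
                self.reviews.append("")

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1

    def handle_data(self, data):
        # Accumulate text only while inside a matched element
        if self._depth:
            self.reviews[-1] += data
```

A real pipeline would use a CSS-selector library and per-platform selectors; this sketch only illustrates the extraction step.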

Validation

Each response went through a three-step validation:

  1. Submission: An HTTP status code in the 200–399 range, or 404, was required to pass.
  2. Execution: For async providers, the scraping job had to complete without timeout or error.
  3. Validation: The response had to contain usable review data.
    • For JSON responses: at least one review with a valid review_text (string) or rating (integer).
    • For HTML responses: at least one CSS selector match returning review content.
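The validation rules for JSON responses can be sketched as follows (the function name and response shape are assumptions, not the benchmark's actual code):

```python
def validate_response(status_code: int, reviews: list[dict]) -> bool:
    """Apply the submission and validation checks described above."""
    # Submission: status must be 2xx/3xx or an explicit 404
    if not (200 <= status_code < 400 or status_code == 404):
        return False
    # A correctly signalled 404 counts as a valid result on its own
    if status_code == 404:
        return True
    # Validation: at least one review with a usable review_text (string)
    # or rating (integer)
    return any(
        (isinstance(r.get("review_text"), str) and r["review_text"])
        or isinstance(r.get("rating"), int)
        for r in reviews
    )
```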

Before running the full benchmark, we tested each provider with intentionally broken URLs, confirmed 404 pages, and live pages with zero reviews to map how each provider signals these edge cases. Providers returned different indicators depending on their implementation, including explicit error codes, HTTP 404 status, or empty response bodies.

When a provider correctly identified a page as not found or returned an appropriate response for a page with no reviews, the result was counted as valid. We then applied a cross-provider verification step: if a provider returned empty results on a URL where at least one other provider extracted review data, that empty result was reclassified as a failure. This separated extraction failures from pages that simply had no reviews to return.
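The cross-provider verification step can be sketched as follows, assuming results are tallied as per-URL review counts per provider (names are hypothetical):

```python
def reclassify_empty_results(results: dict) -> dict:
    """results: {url: {provider: review_count}}.
    An empty result on a URL where at least one other provider extracted
    reviews is reclassified as a failure; otherwise it stays valid
    (the page may simply have had no reviews)."""
    verdicts = {}
    for url, by_provider in results.items():
        any_reviews = any(count > 0 for count in by_provider.values())
        verdicts[url] = {
            provider: "valid" if count > 0 or not any_reviews else "fail"
            for provider, count in by_provider.items()
        }
    return verdicts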

Completion time

Completion time was measured end-to-end from the initial API request to receiving the final response. For async providers (e.g., Bright Data dataset API), this includes the polling/wait time until results were ready.

Available metadata fields

For providers returning structured JSON, we counted the total number of unique fields returned across all reviews. For HTML-based responses, the metadata count reflects the fixed set of CSS selector fields used for extraction (5 fields: reviewer_name, review_text, rating, review_date, review_title).
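The field count for structured responses amounts to taking the union of field names across all returned reviews; a minimal sketch:

```python
def count_metadata_fields(reviews: list[dict]) -> int:
    """Count unique top-level field names across all structured reviews."""
    fields = set()
    for review in reviews:
        fields.update(review.keys())
    return len(fields)
```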

Dataset

The 2,500 test URLs were collected from publicly accessible review pages across the five Tranco top-ranked review platforms. URLs were cleaned to remove locale parameters, invalid formats, and duplicates before testing.

Shared configuration

All providers received identical URLs from the same dataset and were tested under the same conditions:

  • Sequential execution: one request at a time, no parallel requests
  • Delay between requests: 2 seconds
  • Rate limit handling: 30-second wait with up to 3 retries on HTTP 429
  • Submission timeout: 300 seconds
  • Execution timeout: 600 seconds
  • Each URL was tested once per provider
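The rate-limit handling above can be sketched as a retry wrapper; the fetch callable is an assumption standing in for the actual provider API call:

```python
import time

def fetch_with_backoff(fetch, url, wait: float = 30, max_retries: int = 3):
    """Issue one request; on HTTP 429, wait and retry up to max_retries times.
    fetch(url) is assumed to return a (status_code, body) tuple."""
    for attempt in range(max_retries + 1):
        status, body = fetch(url)
        if status != 429:
            return status, body
        if attempt < max_retries:
            time.sleep(wait)  # 30-second wait per the shared configuration
    return status, body  # retries exhausted, return the last 429
```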

Provider configurations

Bright Data

Bright Data used two integration methods depending on the domain. For Amazon, Google Maps, Trustpilot, and Yelp, we used the Dataset API, which returns structured JSON with parsed fields. For Tripadvisor, we used a web unblocker that returns rendered HTML, which we parsed locally with CSS selectors.

The Dataset API was polled via the /progress/{snapshot_id} endpoint at 1-second intervals until the status reached "ready". Results were then fetched from the /snapshot/{snapshot_id} endpoint.
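The polling loop can be sketched as follows, with the two endpoint calls injected as callables (the wrapper names are assumptions; the /progress and /snapshot paths are as described above):

```python
import time

def poll_until_ready(get_progress, fetch_snapshot, snapshot_id,
                     interval: float = 1.0, timeout: float = 600):
    """Poll the progress endpoint until status == 'ready', then fetch results.
    get_progress(snapshot_id) -> status string (wraps GET /progress/{snapshot_id});
    fetch_snapshot(snapshot_id) -> parsed results (wraps GET /snapshot/{snapshot_id})."""
    deadline = time.monotonic() + timeout  # 600s execution timeout
    while time.monotonic() < deadline:
        if get_progress(snapshot_id) == "ready":
            return fetch_snapshot(snapshot_id)
        time.sleep(interval)  # 1-second polling interval
    raise TimeoutError(f"snapshot {snapshot_id} not ready within {timeout}s")
```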

Decodo

Decodo used the Universal Scraper API for Amazon. For Google Maps, Tripadvisor, Trustpilot, and Yelp, we used the web unblocker with the X-SU-Headless: HTML header for JavaScript rendering. All requests included a desktop User-Agent header.

Oxylabs

Oxylabs used a dedicated source API for Amazon (source: amazon_reviews) with structured JSON output. For Google Maps, Tripadvisor, Trustpilot, and Yelp, we used the Web Unblocker proxy. Unblocker requests included a desktop User-Agent header.

Nimble

Nimble used the Web API for all domains with render: true for JavaScript rendering. All requests returned rendered HTML, which we parsed with CSS selectors. No domain-specific configuration was applied.

Zyte

Zyte used the Extract API for all domains with browserHtml: true, which returns JavaScript-rendered HTML via a headless browser. No domain-specific configuration was applied.

FAQs about review scraping

Why use automated tools to scrape reviews?

Manual product review scraping is slow and incomplete. Scraping customer reviews with automated tools lets you extract hundreds or thousands of reviews in minutes, saving time and ensuring your data collection captures both positive and negative reviews.

What can scraped reviews be used for?

Scraped reviews provide valuable customer insights for market research. Companies can track customer concerns, measure customer loyalty, and analyze customer preferences over time.

What are the risks of scraping reviews?

Most review platforms restrict automated data extraction, and running web scrapers too aggressively can trigger CAPTCHAs, IP blocks, or bans. To reduce these risks, use a respectful automated process with rate limits, random delays, and residential proxies if needed.

Which fields can be scraped from reviews?

Typical fields include review text, star ratings, user names, dates, and metadata. Some setups also track structured data like location, product category, or business type.

Which platforms can you scrape reviews from?

You can collect customer reviews from various websites, including e-commerce platforms, social media networks, and popular platforms like Amazon, Walmart, Yelp, Google Play, and Trustpilot.

Nazlı Şipi
AI Researcher
Nazlı is a data analyst at AIMultiple. She has prior experience in data analysis across various industries, where she worked on transforming complex datasets into actionable insights.
