
Review Scraping Benchmark: Bright Data, Oxylabs & Decodo

Nazlı Şipi
updated on Apr 15, 2026

We tested 5 web scraping providers across 5 major review platforms, 12,500 requests in total, and measured success rate, completion time, and available metadata fields.

Review scraping benchmark

See the benchmark methodology section below for more details on the testing process.

Domain coverage by provider

  • ✅ = supported, returns HTML
  • ✅ ✅ = supported, returns structured data

Review scraping performance by domain

Available metadata fields for providers with structured JSON responses

Review scraping benchmark results

Bright Data achieved the highest average success rate at 78% across all five review platforms and was the only provider to return structured JSON on four of them: Amazon, Google Maps, Trustpilot, and Yelp. It led on Amazon (96%) and Trustpilot (98%), delivering up to 39 metadata fields per review including verification status, reviewer location, and owner responses. Google Maps was its weakest domain at 39%, though most providers also failed on this domain due to JavaScript-rendered review content.

Oxylabs was the fastest provider in the benchmark at a 5s average completion time, significantly ahead of the next closest at 13s. It posted high success rates on Trustpilot (98%) and Tripadvisor (91%), and matched the top tier on Amazon (92%) with 10 structured JSON fields. It did not return results on Google Maps or Yelp, where it lacked dedicated scraping configurations.

Decodo scored 93% on Trustpilot and 76% on Tripadvisor using its unblocker proxy, demonstrating solid performance on server-rendered review pages. However, it recorded 0% on both Google Maps and Yelp, and only 11% on Amazon despite using a structured API endpoint. Its coverage is limited to two of the five tested platforms, making it the narrowest option in the benchmark for review scraping.

Zyte was one of only two providers to return results on all five platforms, finishing with a 65% average success rate. It performed best on Tripadvisor (86%) and Yelp (57%), maintaining steady extraction across domains. Google Maps was a relative bright spot at 41%, one of the higher scores on a domain where most providers failed. All extraction was HTML-based with CSS selector parsing, so no structured metadata fields were returned beyond the five standard review fields.

Nimble reached 92% on Amazon and 66% on Trustpilot, showing it can handle structured review pages effectively. However, performance dropped to 1% on Google Maps and 31% on Yelp, where JavaScript-heavy rendering limited its HTML-based extraction. Its 52% overall average reflects this uneven platform support, with completion times averaging 20s.

Review scraping benchmark methodology

We tested 5 web scraping API providers across 5 review platforms selected from the Tranco top-ranked review domains: Amazon, Google Maps, Tripadvisor, Trustpilot, and Yelp. Each provider received the same set of 2,500 URLs (500 per platform), and we measured three metrics: success rate, completion time, and available metadata fields.

Providers and integration types

Providers were integrated using two approaches depending on the platform:

  • JSON structured API: The provider returns parsed review data in JSON format with named fields (e.g., reviewer_name, rating, review_text). Bright Data and Oxylabs offered this for select platforms.
  • HTML response: The provider returns rendered HTML, which we parsed using CSS selectors to extract review fields. Decodo, Nimble, and Zyte primarily used this approach.

Note: Decodo returned a JSON structured response for Amazon, but none of the responses contained successful review data. Its 11% success rate on Amazon came entirely from correct 404 detection, so no metadata fields are reported for that combination.
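For the HTML-response providers, extraction amounts to matching review elements in the rendered page. A minimal stand-in sketch using Python's standard-library HTMLParser (the review-text class name is hypothetical; the benchmark's actual CSS selectors differed per platform):

```python
from html.parser import HTMLParser

class ReviewTextExtractor(HTMLParser):
    """Collects the text content of every element whose class attribute
    contains 'review-text' (a hypothetical class name standing in for
    the per-platform CSS selectors used in the benchmark)."""

    def __init__(self):
        super().__init__()
        self.reviews = []   # extracted review texts
        self._depth = 0     # >0 while inside a matched element

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class") or ""
        # Enter a matched element, or descend further inside one
        if self._depth or "review-text" in classes.split():
            self._depth += 1
            if self._depth == 1:
                self.reviews.append("")

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1

    def handle_data(self, data):
        # Accumulate text only while inside a matched element
        if self._depth:
            self.reviews[-1] += data
```

A real pipeline would use a CSS-selector library and per-platform selectors; this sketch only illustrates the extraction step.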

Validation

Each response went through a three-step validation:

  1. Submission: An HTTP status code in the 200–399 range, or 404, was required to pass.
  2. Execution: For async providers, the scraping job had to complete without timeout or error.
  3. Validation: The response had to contain usable review data.
    • For JSON responses: at least one review with a valid review_text (string) or rating (integer).
    • For HTML responses: at least one CSS selector match returning review content.
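The validation rules for JSON responses can be sketched as follows (the function name and response shape are assumptions, not the benchmark's actual code):

```python
def validate_response(status_code: int, reviews: list[dict]) -> bool:
    """Apply the submission and validation checks described above."""
    # Submission: status must be 2xx/3xx or an explicit 404
    if not (200 <= status_code < 400 or status_code == 404):
        return False
    # A correctly signalled 404 counts as a valid result on its own
    if status_code == 404:
        return True
    # Validation: at least one review with a usable review_text (string)
    # or rating (integer)
    return any(
        (isinstance(r.get("review_text"), str) and r["review_text"])
        or isinstance(r.get("rating"), int)
        for r in reviews
    )
```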

Before running the full benchmark, we tested each provider with intentionally broken URLs, confirmed 404 pages, and live pages with zero reviews to map how each provider signals these edge cases. Providers returned different indicators depending on their implementation, including explicit error codes, HTTP 404 status, or empty response bodies.

When a provider correctly identified a page as not found or returned an appropriate response for a page with no reviews, the result was counted as valid. We then applied a cross-provider verification step: if a provider returned empty results on a URL where at least one other provider extracted review data, that empty result was reclassified as a failure. This separated extraction failures from pages that simply had no reviews to return.
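The cross-provider verification step can be sketched as follows, assuming results are tallied as per-URL review counts per provider (names are hypothetical):

```python
def reclassify_empty_results(results: dict) -> dict:
    """results: {url: {provider: review_count}}.
    An empty result on a URL where at least one other provider extracted
    reviews is reclassified as a failure; otherwise it stays valid
    (the page may simply have had no reviews)."""
    verdicts = {}
    for url, by_provider in results.items():
        any_reviews = any(count > 0 for count in by_provider.values())
        verdicts[url] = {
            provider: "valid" if count > 0 or not any_reviews else "fail"
            for provider, count in by_provider.items()
        }
    return verdicts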

Completion time

Completion time was measured end-to-end from the initial API request to receiving the final response. For async providers (e.g., Bright Data dataset API), this includes the polling/wait time until results were ready.

Available metadata fields

For providers returning structured JSON, we counted the total number of unique fields returned across all reviews. For HTML-based responses, the metadata count reflects the fixed set of CSS selector fields used for extraction (5 fields: reviewer_name, review_text, rating, review_date, review_title).
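The field count for structured responses amounts to taking the union of field names across all returned reviews; a minimal sketch:

```python
def count_metadata_fields(reviews: list[dict]) -> int:
    """Count unique top-level field names across all structured reviews."""
    fields = set()
    for review in reviews:
        fields.update(review.keys())
    return len(fields)
```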

Dataset

The 2,500 test URLs were collected from publicly accessible review pages across the five Tranco top-ranked review platforms. URLs were cleaned to remove locale parameters, invalid formats, and duplicates before testing.

Shared configuration

All providers received identical URLs from the same dataset and were tested under the same conditions:

  • Sequential execution: one request at a time, no parallel requests
  • Delay between requests: 2 seconds
  • Rate limit handling: 30-second wait with up to 3 retries on HTTP 429
  • Submission timeout: 300 seconds
  • Execution timeout: 600 seconds
  • Each URL was tested once per provider
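The rate-limit handling above can be sketched as a retry wrapper; the fetch callable is an assumption standing in for the actual provider API call:

```python
import time

def fetch_with_backoff(fetch, url, wait: float = 30, max_retries: int = 3):
    """Issue one request; on HTTP 429, wait and retry up to max_retries times.
    fetch(url) is assumed to return a (status_code, body) tuple."""
    for attempt in range(max_retries + 1):
        status, body = fetch(url)
        if status != 429:
            return status, body
        if attempt < max_retries:
            time.sleep(wait)  # 30-second wait per the shared configuration
    return status, body  # retries exhausted, return the last 429
```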

Provider configurations

Bright Data

Bright Data used two integration methods depending on the domain. For Amazon, Google Maps, Trustpilot, and Yelp, we used the Dataset API, which returns structured JSON with parsed fields. For Tripadvisor, we used a web unblocker that returns rendered HTML, which we parsed locally with CSS selectors.

The Dataset API was polled via the /progress/{snapshot_id} endpoint at 1-second intervals until the status reached "ready". Results were then fetched from the /snapshot/{snapshot_id} endpoint.
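The polling loop can be sketched as follows, with the two endpoint calls injected as callables (the wrapper names are assumptions; the /progress and /snapshot paths are as described above):

```python
import time

def poll_until_ready(get_progress, fetch_snapshot, snapshot_id,
                     interval: float = 1.0, timeout: float = 600):
    """Poll the progress endpoint until status == 'ready', then fetch results.
    get_progress(snapshot_id) -> status string (wraps GET /progress/{snapshot_id});
    fetch_snapshot(snapshot_id) -> parsed results (wraps GET /snapshot/{snapshot_id})."""
    deadline = time.monotonic() + timeout  # 600s execution timeout
    while time.monotonic() < deadline:
        if get_progress(snapshot_id) == "ready":
            return fetch_snapshot(snapshot_id)
        time.sleep(interval)  # 1-second polling interval
    raise TimeoutError(f"snapshot {snapshot_id} not ready within {timeout}s")
```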

Decodo

Decodo used the Universal Scraper API for Amazon. For Google Maps, Tripadvisor, Trustpilot, and Yelp, we used the web unblocker with the X-SU-Headless: HTML header for JavaScript rendering. All requests included a desktop User-Agent header.

Oxylabs

Oxylabs used a dedicated source API for Amazon (source: amazon_reviews) with structured JSON output. For Google Maps, Tripadvisor, Trustpilot, and Yelp, we used the Web Unblocker proxy. Unblocker requests included a desktop User-Agent header.

Nimble

Nimble used the Web API for all domains with render: true for JavaScript rendering. All requests returned rendered HTML, which we parsed with CSS selectors. No domain-specific configuration was applied.

Zyte

Zyte used the Extract API for all domains with browserHtml: true, which returns JavaScript-rendered HTML via a headless browser. No domain-specific configuration was applied.

FAQs about review scraping

Why use automated tools to scrape reviews?

Manual product review scraping is slow and incomplete. Scraping customer reviews with automated tools lets you extract hundreds or thousands of reviews in minutes, saving time and ensuring your data collection captures both positive and negative reviews.

What can scraped reviews be used for?

Scraped reviews provide valuable customer insights for market research. Companies can track customer concerns, measure customer loyalty, and analyze customer preferences over time.

What are the risks of scraping reviews?

Most review platforms restrict automated data extraction, and running web scrapers too aggressively can trigger CAPTCHAs, IP blocks, or bans. To reduce these risks, use a respectful automated process with rate limits, random delays, and residential proxies if needed.

Which fields can be scraped from reviews?

Typical fields include review text, star ratings, user names, dates, and metadata. Some setups also track structured data like location, product category, or business type.

Which platforms can you scrape reviews from?

You can collect customer reviews from various websites, including e-commerce platforms, social media networks, and popular platforms like Amazon, Walmart, Yelp, Google Play, and Trustpilot.

Nazlı Şipi
AI Researcher
Nazlı is a data analyst at AIMultiple. She has prior experience in data analysis across various industries, where she worked on transforming complex datasets into actionable insights.
