We benchmarked four crawl APIs across three domains of varying difficulty at three max depth levels (5, 10, 20) with a 1,000-page limit, measuring crawl coverage, execution time, link discovery, markdown link quality, and title extraction accuracy.
If you aim to:
- Turn web pages into structured data, see our guide on web scraping.
- Crawl entire websites, read on.
Web crawlers benchmark
You can read our benchmark methodology.
Average crawled pages vs cost per 1,000 pages
Crawled pages across domains by max depth
Firecrawl consistently crawled around 100 pages on theregister.com regardless of max depth, approximately 90 pages on entrepreneur.com across all depth levels, and only around 30 pages on amazon.com, likely due to Amazon’s aggressive bot protection. Notably, increasing max depth had virtually no impact on the number of pages Firecrawl was able to crawl across any domain.
Apify demonstrated the most consistent performance, reaching the maximum crawl limit of 1,000 pages on every domain at every depth level without any apparent difficulty, even on heavily protected sites like Amazon.
Cloudflare showed inconsistent behavior across tests:
- On theregister.com at max depth 5, it crawled only 100 pages, but at max depth 20 it reached nearly 1,000 pages.
- As we observed in earlier tests, Cloudflare occasionally crawls just 1 page and then terminates the job entirely. We confirmed this is not a caching issue (cache was disabled) and tested with wait times of up to 1 minute between runs, but the behavior persisted. At max depth 10 on theregister.com, this exact issue occurred: Cloudflare crawled only 1 page before stopping.
- On entrepreneur.com, Cloudflare crawled 780 pages at depth 5, increased to 885 at depth 10, but then dropped sharply to just 172 pages at depth 20. This drop may be related to Cloudflare’s crawl scheduler deprioritizing or timing out deeper link chains, or it could reflect an internal concurrency limit that causes the job to terminate prematurely when the crawl frontier grows too large at higher depths.
- On amazon.com, Cloudflare crawled 905 pages at depth 5, but the number steadily declined as max depth increased, dropping to 809 at depth 10 and 795 at depth 20, suggesting that deeper crawl configurations may cause Cloudflare to spend more time on link discovery overhead rather than actual page retrieval.
Nimble reached or approached the 1,000-page limit on theregister.com across all depth levels (1,000 / 1,000 / 999). On entrepreneur.com, it crawled 1,000 pages at depth 5 but showed slight drops at higher depths (896 at depth 10, 983 at depth 20), possibly because its 7-hour timeout was reached before the crawl completed at deeper levels; all Nimble runs ended with a timeout status. Amazon proved more challenging:
- At depth 5 it managed only 319 pages, but at depth 10 it jumped to 988 pages, then dropped to 906 at depth 20.
- This inconsistency likely reflects the combination of Amazon’s bot protection mechanisms and Nimble’s timeout constraints: deeper crawls take longer to process each page and may encounter more anti-bot challenges along the way.
Execution time across domains by max depth
Firecrawl was the fastest provider across all domains, completing crawls in under 5 minutes (typically 75-265 seconds). This speed comes at the cost of coverage, as Firecrawl also crawled the fewest pages. Essentially, it finishes quickly because it stops early.
Apify took around 2,200-2,400 seconds (~40 minutes) on theregister.com regardless of depth. On entrepreneur.com and amazon.com, execution times were significantly longer at 8,300-15,900 seconds (2-4 hours), reflecting the larger and more complex site structures. Despite the longer times, Apify consistently reached the 1,000-page limit, making it the most reliable in terms of coverage-to-time ratio.
Cloudflare showed timing that mirrors its inconsistent crawl counts:
- On theregister.com at depth 10, it completed in just 1 second, because it only crawled 1 page before stopping.
- On entrepreneur.com at depth 20, it finished in 10 seconds after crawling only 172 pages.
- When Cloudflare does complete a full crawl, times range from 3,500 to 25,200 seconds.
- As max depth increases, Cloudflare appears to prioritize reaching deeper pages over breadth, crawling fewer pages but completing faster. On amazon.com, execution time dropped from 25,200 seconds (timeout) at depth 5 to just 5,660 seconds at depth 20, while crawled pages also decreased from 905 to 795. This suggests Cloudflare’s crawler shifts its strategy at higher depths, spending less time on broad discovery and more on deep traversal.
Nimble hit the 7-hour timeout (25,200 seconds) on every single run across all domains and depth levels. This is notable because in our earlier quick tests with max depth 1, Nimble completed without timing out. In the full benchmark with depths of 5-20 and a 1,000-page limit, it consistently ran until the timeout was reached. Despite this, Nimble still managed to crawl a high number of pages in most cases (~900-1,000 on theregister.com and entrepreneur.com), meaning it is actively crawling throughout the 7 hours but simply never signals completion.
Link text fill rate across providers by max depth
To assess markdown output quality, we measured what percentage of links in each provider’s markdown contain anchor text, the clickable text portion of a link. A missing anchor text (e.g., [](/about) instead of [About Us](/about)) means the crawler failed to extract the link’s label.
- Nimble: 100% across all depths
- Cloudflare: 91-94%
- Firecrawl: 90%
- Apify: 77-78%, roughly 1 in 5 links missing anchor text
Crawl depth had minimal impact on fill rates for any provider, suggesting this is a characteristic of each provider’s parsing engine rather than a crawl setting.
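To make the metric concrete, here is a minimal sketch of how a link text fill rate can be computed from a page's markdown output. The regex is a simplification of whatever parsing the benchmark actually used; it ignores images and nested brackets.

```python
import re

# Matches markdown inline links: [anchor text](url); the lookbehind skips images ![...](...)
LINK_RE = re.compile(r"(?<!!)\[([^\]]*)\]\(([^)]+)\)")

def link_text_fill_rate(markdown: str) -> float:
    """Share of markdown links whose anchor text is non-empty."""
    links = LINK_RE.findall(markdown)
    if not links:
        return 1.0
    filled = sum(1 for text, _url in links if text.strip())
    return filled / len(links)

sample = "[About Us](/about) and [](/privacy)"
print(link_text_fill_rate(sample))  # 0.5 -> one of the two links has anchor text
```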
Link text fill rate across providers by domain
Looking at fill rates across different domains reveals how site complexity affects each provider’s link extraction quality.
- Nimble maintained 100% across all domains.
- Apify showed the most variation: 89% on amazon.com but only 66% on entrepreneur.com, meaning one-third of its links on that site were missing anchor text. This suggests Apify struggles more with content-heavy sites that have complex navigation structures.
- Firecrawl performed best on theregister.com (98%) but dropped to 81% on entrepreneur.com, following a similar pattern to Apify.
- Cloudflare was the most consistent after Nimble, staying between 89-94% regardless of domain.
Entrepreneur.com proved the most challenging domain for link text extraction: both Apify (66%) and Firecrawl (81%) had their lowest scores there, likely due to the site’s heavy use of nested navigation menus and dynamic content elements that are harder to convert cleanly into markdown.
Total links in markdown output across domains by max depth
Link count variance across providers was consistently high (74-97%), indicating that providers extract very different numbers of links from the same pages. To get a more detailed view of this disparity, we measured the total markdown link count per provider.
- Apify returned the most links overall, particularly on amazon.com with over 420K links at depth 5 (~423 per page). On entrepreneur.com it stabilized around 63K regardless of depth. Its output includes ad trackers and tracking pixels alongside page content links.
- Cloudflare peaked at 303K on entrepreneur.com at depth 10 but dropped to 53K at depth 20. On the same entrepreneur.com homepage, Cloudflare extracted 434 links compared to Apify’s 143, capturing full navigation menus and submenus.
- Firecrawl consistently returned 5-9K links across all configurations, limited by its low page count.
- Nimble returned 3-40K links total, averaging 5-28 links per page compared to 60-420 for other providers. On entrepreneur.com’s homepage, Nimble returned 13 links versus Cloudflare’s 434, limited to main article headlines. Its 100% fill rate reflects that the links it did include all had anchor text, rather than indicating comprehensive link coverage. Nimble does not output standard markdown links. Its count includes escaped HTML links found within the markdown output.
Title present rate across providers
Title similarity across providers showed less than 1% deviation across all tests and domains, confirming that when providers do extract a title, they consistently return the same result. Title present rate also remained between 98-100% across all max depth levels, showing that crawl depth has no meaningful impact on title extraction.
When broken down by domain, some differences emerged:
On entrepreneur.com and theregister.com, most providers achieved 99-100% title present rates. Amazon.com was the only domain where meaningful differences appeared: Firecrawl dropped to 93% and Nimble to 95.9%, while Apify maintained 99.6%. This aligns with Amazon’s heavier bot protection, which can block or distort page responses, causing some providers to return pages without extractable titles.
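For reference, here is a minimal sketch of how title presence and cross-provider title similarity can be scored. The actual benchmark pipeline is not shown, and the difflib-based similarity measure below is an assumption.

```python
from difflib import SequenceMatcher

def title_present_rate(titles: list) -> float:
    """Share of crawled pages that returned a non-empty title."""
    if not titles:
        return 0.0
    return sum(1 for t in titles if t and t.strip()) / len(titles)

def title_similarity(a: str, b: str) -> float:
    """Normalized similarity between two providers' titles for the same URL."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()

print(title_present_rate(["Amazon.com", None, "Deals"]))        # ~0.67
print(title_similarity("The Register", "The Register - News"))  # partial match score
```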
What is a web crawler?
A web crawler, sometimes called a “spider” or “agent,” is a bot that browses the internet to index content.
Crawlers have moved beyond search engines and now serve as the Agentic Data Layer. They act as the eyes for autonomous AI agents like Claude Code and OpenAI Operator, assisting with real-time tasks such as competitive research and multi-step transactions.
What does a web crawler do?
Web crawling can be split into three modes, each designed for a different crawler goal.
- Discovery mode (traditional): Search engine bots like Googlebot crawl URLs for indexing, helping people find results through search engines.
- Retrieval mode (RAG): AI bots like ChatGPT-User or PerplexityBot fetch specific pages in real time to answer user prompts. They use markdown instead of HTML to fit the AI model’s token limits.
- Agentic mode (action-oriented): This new type of crawler, emerging in 2026, does more than just read content. Using the Model Context Protocol (MCP), these bots can interact with websites to book flights or run software commands.
In the past, crawlers relied on selectors such as XPath or CSS to extract data; AI-native extraction has since become the norm.
Tools such as Firecrawl and Crawl4AI use natural language instructions to find data. Instead of writing rules for each element, developers can tell the crawler to “extract the product price,” and the AI will find the right value even if the website’s code changes.
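Provider APIs differ, so the endpoint and field names below are purely illustrative (the api.example-crawler.com URL and the `instruction` parameter are hypothetical), but the pattern is typically a single request carrying a natural-language instruction instead of a selector.

```python
import requests

# Hypothetical endpoint and request schema: real providers (Firecrawl, Crawl4AI, etc.)
# expose their own APIs; this only illustrates the instruction-based extraction pattern.
response = requests.post(
    "https://api.example-crawler.com/v1/extract",
    json={
        "url": "https://example.com/product/123",
        "instruction": "extract the product price",  # natural language, not a CSS/XPath selector
    },
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    timeout=60,
)
print(response.json())  # e.g. {"price": "$24.99"}, depending on the provider's response schema
```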
Build vs. buy web crawlers in the AI era
1. Building Your Own Crawler
Ideal for protecting core intellectual property and enabling deep customization. Building now requires developing a proprietary agent layer, not just writing basic Scrapy scripts.
- When to build: Select this approach if your crawler provides a unique competitive advantage. For instance, build your own if you are developing a specialized search engine or require complete control over sensitive or regulated data.
- The toolset: You no longer need to start from scratch. Developers now leverage the Model Context Protocol (MCP) to enable internal AI agents to interact with the web.
2. Using Web Crawling Tools & APIs
Managed tools have advanced from basic scrapers to autonomous agents.
- Zero-maintenance extraction: Modern tools such as Kadoa and Firecrawl use self-healing AI. You specify the required data, such as “Product Price,” rather than its location in the code. If the website layout changes, the tool adapts automatically.
- Compliance as a service: Many providers offer built-in compliance with the EU AI Act. They manage required audit logs and copyright opt-out checks, which are challenging to implement independently.
- Speed to value: Purchasing a platform can move your project from concept to production within weeks.
Figure 5: An explanation of how a URL frontier works.
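The figure itself is not reproduced here; the core idea of a URL frontier is a queue of (URL, depth) pairs plus a visited set, which is also how a max-depth setting bounds a crawl. Below is a minimal sketch; the `fetch_links` helper is hypothetical and stands in for page fetching and link extraction.

```python
from collections import deque
from urllib.parse import urljoin, urldefrag

def crawl(seed_url, fetch_links, max_depth=5, max_pages=1000):
    """Breadth-first crawl driven by a URL frontier (queue + visited set).

    `fetch_links(url)` is assumed to return the hyperlinks found on a page.
    """
    frontier = deque([(seed_url, 0)])   # the URL frontier: pages waiting to be fetched
    visited = set()

    while frontier and len(visited) < max_pages:
        url, depth = frontier.popleft()
        url, _ = urldefrag(url)         # drop #fragments so they do not count twice
        if url in visited or depth > max_depth:
            continue
        visited.add(url)
        for link in fetch_links(url):
            frontier.append((urljoin(url, link), depth + 1))

    return visited
```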

Are web crawlers legal?
In general, web crawling is legal, but depending on how and what you crawl, you could quickly find yourself in a legal bind. Four major pillars determine whether crawling (and the scraping that typically follows) is legal:
1. Public vs. private: Only crawl data that is openly available to the public without an account.
2. Personal information: Steer clear of PII (names, emails, and addresses) unless you have a lawful basis.
3. Server health: Use rate limits to avoid slowing down the server; avoid “DDoSing” a website.
4. Copyright: Articles and images are protected by copyright, but facts (prices, dates) are not.
What is the difference between web crawling and web scraping?
Web scraping is the use of web crawlers to scan and store content from targeted web pages. In other words, web scraping is a specific use case of web crawling aimed at building a targeted dataset, such as pulling all finance news for investment analysis and searching it for specific company names.
Traditionally, once a web crawler had crawled and indexed the elements of a web page, a web scraper extracted data from the indexed page. These days, however, the terms are used interchangeably, with the difference that “crawler” tends to refer to search engine crawlers. As companies other than search engines started using web data, the term “web scraper” gradually took over from “web crawler.”
What are the challenges of web crawling?
1. Database freshness
Websites’ content is updated regularly. Dynamic web pages, for example, change their content based on the activities and behaviors of visitors. This means that the website’s source code does not remain the same after you crawl the website. To provide the most up-to-date information to the user, the web crawler must re-crawl those web pages more frequently.
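One common way to keep a crawl fresh without re-downloading unchanged pages is to send conditional requests using the ETag or Last-Modified values returned on the previous visit. A minimal sketch follows; support for these headers varies by site.

```python
import requests

def fetch_if_changed(url, etag=None, last_modified=None):
    """Re-fetch a page only if the server says it changed since the last crawl."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified

    resp = requests.get(url, headers=headers, timeout=30)
    if resp.status_code == 304:          # Not Modified: reuse the stored copy
        return None, etag, last_modified
    return resp.text, resp.headers.get("ETag"), resp.headers.get("Last-Modified")
```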
2. Crawler traps
Websites employ different techniques, such as crawler traps, to prevent web crawlers from accessing and crawling certain web pages. A crawler trap, or spider trap, causes a web crawler to make an effectively infinite number of requests and become stuck in a crawling loop that wastes its resources. Websites may also create crawler traps unintentionally, for example through calendar pages or faceted navigation that generate endless URL combinations.
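There is no universal fix, but crawlers commonly defend against traps by normalizing URLs and capping how deep or how often the same path pattern is visited. A simplified sketch of such heuristics follows; the parameter names and thresholds are arbitrary choices for illustration.

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse
from collections import Counter

TRACKING_PARAMS = {"sessionid", "sid", "utm_source", "utm_medium", "utm_campaign"}
path_counts = Counter()

def normalize(url: str) -> str:
    """Strip session/tracking parameters so trap URLs collapse to one entry."""
    parts = urlparse(url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if k.lower() not in TRACKING_PARAMS]
    return urlunparse(parts._replace(query=urlencode(sorted(query)), fragment=""))

def looks_like_trap(url: str, max_path_depth: int = 10, max_per_pattern: int = 200) -> bool:
    """Heuristics: overly deep paths, or the same path pattern seen too many times."""
    parts = urlparse(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) > max_path_depth:          # e.g. endless /calendar/2026/01/02/... chains
        return True
    pattern = (parts.netloc, len(segments))     # crude grouping of structurally similar URLs
    path_counts[pattern] += 1
    return path_counts[pattern] > max_per_pattern
```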
3. Network bandwidth
Downloading a large number of irrelevant web pages, running a distributed web crawler, or recrawling many pages all consume significant network bandwidth.
4. Duplicate pages
Web crawler bots generally crawl all duplicate content they encounter on the web, but only one version of a page gets indexed. Duplicate content makes it difficult for search engine bots to determine which version to index and rank. When Googlebot discovers a group of identical web pages, it selects and indexes only one of them to display in response to a user’s search query.
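A simple way to avoid indexing duplicates is to hash a normalized version of each page's body and keep only the first URL per hash. A minimal sketch follows; production systems also rely on canonical tags and near-duplicate hashing such as SimHash.

```python
import hashlib

seen_hashes = {}   # content hash -> first URL seen with that content

def is_duplicate(url: str, page_text: str) -> bool:
    """Return True if an identical page body was already crawled."""
    normalized = " ".join(page_text.split()).lower()      # collapse whitespace, ignore case
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    if digest in seen_hashes:
        return True
    seen_hashes[digest] = url
    return False
```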
Top 3 web crawling best practices
1. Politeness/Crawl rate
Websites set a crawl rate to limit the number of requests made by web crawler bots. The crawl rate indicates how many requests a web crawler can make to your website in a given time interval (e.g., 100 requests per hour). It enables website owners to protect the bandwidth of their web servers and reduce server overload. A web crawler must adhere to the crawl limit of the target website.
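In code, respecting a crawl rate usually amounts to a small throttle that spaces out requests to the same host. A minimal sketch, assuming the limit is expressed as requests per hour:

```python
import time

class PoliteThrottle:
    """Spaces requests so no more than `requests_per_hour` are sent to one host."""

    def __init__(self, requests_per_hour: int = 100):
        self.min_interval = 3600.0 / requests_per_hour   # seconds between requests
        self.last_request = 0.0

    def wait(self):
        elapsed = time.monotonic() - self.last_request
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_request = time.monotonic()

throttle = PoliteThrottle(requests_per_hour=100)
# call throttle.wait() before every request to the same host
```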
2. Robots.txt compliance
A robots.txt file is a text file placed at the root of a website that tells crawlers which pages they are allowed or disallowed to access. It is a voluntary standard, meaning compliant bots respect it but it does not technically prevent access. Following a website’s robots.txt is considered a best practice, and in many jurisdictions, ignoring it may expose you to legal or reputational risk.
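Python's standard library already parses robots.txt. Below is a minimal sketch that checks a URL against the rules (and the optional crawl delay) before fetching it; `MyCrawlerBot` is a placeholder user agent.

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()                                   # fetch and parse the robots.txt file

url = "https://www.example.com/private/page"
if rp.can_fetch("MyCrawlerBot", url):       # check the rules for our user agent
    print("allowed to crawl:", url)
else:
    print("disallowed by robots.txt:", url)

# Some sites also declare a crawl delay; respect it if present.
print("crawl delay:", rp.crawl_delay("MyCrawlerBot"))
```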
3. IP rotation
Websites employ anti-scraping techniques such as CAPTCHAs and browser fingerprinting to manage crawler traffic and reduce web scraping activity. Browser fingerprinting, for instance, is a tracking technique used by websites to gather information about visitors, such as session duration or page views.
This method allows website owners to detect “non-human traffic” and block the bot’s IP address. To avoid detection, you can integrate rotating proxies, such as residential proxies, into your web crawler.
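Proxy endpoints and credentials are provider-specific, so the addresses below are placeholders; the general pattern with the requests library is simply routing each request through the next proxy in a pool.

```python
import itertools
import requests

# Placeholder proxy endpoints; a real pool would come from your proxy provider.
PROXIES = itertools.cycle([
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
    "http://user:pass@proxy3.example.com:8000",
])

def fetch_with_rotation(url: str) -> requests.Response:
    proxy = next(PROXIES)                          # rotate to the next proxy for each request
    return requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=30)
```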
Web crawlers benchmark methodology
We tested four crawl APIs (Apify, Nimble, Cloudflare, Firecrawl) on three domains of varying difficulty: amazon.com (heavy bot protection), entrepreneur.com (complex content site), and theregister.com (news site).
Shared configuration
All providers received identical core settings to ensure a fair comparison:
- Sitemap: Disabled; providers must discover pages through HTML links only
- External links: Disabled; crawlers stay within the target domain
- Subdomains: Enabled; subdomain pages are followed (e.g., india.entrepreneur.com)
- JavaScript rendering: Enabled; all providers use a headless browser
- Cache: Disabled
- Page limit: 1,000 pages per run
- Timeout: 7 hours (25,200 seconds)
- Rate limit handling: 20-second wait with up to 3 retries on HTTP 429
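The rate-limit handling above maps to a simple retry loop. Here is a minimal sketch using the same 20-second wait and 3-retry budget; everything else, such as the request timeout, is an illustrative assumption.

```python
import time
import requests

def fetch_with_retry(url: str, wait_seconds: int = 20, max_retries: int = 3) -> requests.Response:
    """Retry on HTTP 429 with a fixed wait, mirroring the benchmark configuration."""
    for attempt in range(max_retries + 1):
        resp = requests.get(url, timeout=60)
        if resp.status_code != 429:
            return resp
        if attempt < max_retries:
            time.sleep(wait_seconds)          # back off before retrying the rate-limited request
    return resp
```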
Each provider was tested at three max depth levels (5, 10, 20) across all three domains, totaling 36 crawl runs. Providers were tested sequentially (not in parallel), each combination was run once, and crawl status was polled every second.
Apify was configured with the website-content-crawler actor using Playwright/Firefox as its headless browser. Subdomain access was controlled via glob patterns, and Apify’s built-in proxy was used for all requests.
Nimble, Cloudflare, and Firecrawl were configured using their respective REST APIs with the shared settings described above. No additional provider-specific configurations were applied beyond the standardized parameters.
For Cloudflare, we used the Workers Paid plan. The reported cost reflects what we spent to crawl 1,000 pages under this plan. Cloudflare charges based on browser rendering time rather than page count.
For Firecrawl, we used the Hobby plan. The reported cost is the prorated amount for 1,000 credits out of the credits provided in this plan. Effective per-page cost varies depending on the plan tier and whether extra credit packs are purchased.
Comments
Hi Cem, I think there is a misunderstanding regarding the role of robots.txt in the crawling context. Web bots can crawl any website where indexing is allowed, even without a robots.txt present on the top domain, subdomains, ports, and so on. The role of a robots.txt is to control the traffic from web bots so the website is not overloaded by requests.