We benchmarked 5 web scraping providers on Indeed job postings with 2,500 requests, measuring success rate, completion time, and metadata output.
Indeed job postings benchmark
You can read our benchmark methodology for more details on our testing process.
What you can scrape from Indeed job postings
Bright Data was the only provider to return structured JSON for Indeed, delivering 25 parsed fields per job posting. The other four providers returned rendered HTML, which we extracted locally with CSS selectors.
Indeed scrapers output & free trial options
The best Indeed scrapers
Bright Data led the Indeed scraping benchmark with a 100% success rate.
The platform also includes anti-blocking infrastructure, CAPTCHA handling, residential proxies, and JavaScript rendering. Beyond the Dataset API, Bright Data offers Web Unblocker and SERP API products for users who prefer to scrape Indeed directly via proxy.
Oxylabs achieved a 99% success rate on Indeed. Oxylabs’ Web Scraper API processes URLs through the universal source, which handles JavaScript rendering, anti-bot bypass, and IP rotation, then returns rendered HTML for local parsing with CSS selectors.
Decodo posted a 99% success rate on Indeed. We used Decodo’s Web Scraper API to scrape Indeed. It handles JavaScript rendering, evades browser detection, controls request rates, and retries failed attempts automatically. Results come back as rendered HTML. You can choose between a Core plan for simpler jobs or an Advanced plan with premium proxies and robust JS rendering.
Zyte failed to extract any data from Indeed, recording a 0% success rate. Indeed URLs were sent through Zyte’s Extract API with browserHtml: true, which is meant to render JavaScript via a headless browser. The API returned HTTP 200 with full-size HTML on 484 of 500 requests (the other 16 returned HTTP 520 proxy errors), but the rendered output never contained Indeed’s job-detail DOM elements, so no job data could be extracted under CSS-selector validation.
Zyte’s Extract API works as a single-endpoint platform across many sites, but Indeed’s client-side rendering left the response as a JavaScript shell rather than a populated job page in this run.
Nimble reached a 14% success rate on the Indeed benchmark. We sent Indeed URLs through Nimble’s Web Extract API with browser rendering, which returned rendered HTML for parsing. However, Indeed’s inconsistent content rendering across the test set prevented successful CSS-selector extraction of job fields on most pages.
Underneath, Nimble routes traffic through residential IPs with smart proxy selection and backconnect gateways. Search parameters like job title, keyword, and country can be sent with each request.
Indeed robots.txt and scraping policy
Indeed’s robots.txt file outlines which parts of the site can be accessed by bots and which paths are restricted. For example, Indeed blocks or restricts crawling of several internal endpoints such as job pages, search APIs, and GraphQL endpoints. These restrictions are intended to control automated traffic and protect the platform from excessive scraping.
Developers performing Indeed web scraping should always:
- Review the latest Indeed robots.txt rules
- Respect the website’s terms of service
Because robots.txt policies can change over time, it is recommended to check the file regularly before running large-scale scraping processes.
Indeed job postings benchmark methodology
We benchmarked 5 web scraping providers on Indeed job posting extraction. Each provider received the same set of 500 Indeed job posting URLs (individual job pages), submitted sequentially with a 2-second delay between requests. Total: 2,500 requests across the benchmark.
Providers and integration
Each provider was tested using its standard production endpoint. No custom proxies or third-party tooling were inserted between us and the provider.
Bright Data was tested through its dedicated Indeed Dataset API (gd_l4dx9j9sscpvs7no2), which returns parsed JSON.
Oxylabs was tested through its Web Scraper API using source: universal, which returns rendered HTML.
Decodo was tested through its Web Scraper API using headless: html and proxy_pool: premium, which returns rendered HTML.
Nimble was tested through its Web Extract API with render: true and driver: vx10, which returns rendered HTML.
Zyte was tested through its Extract API with browserHtml: true, which returns rendered HTML.
For HTML responses, we parsed the page locally with CSS selectors targeting Indeed’s job-detail elements.
Timeout and rate limiting
Each async request had a 10-minute execution timeout. HTTP 429 responses triggered a 30-second backoff with up to 3 retries; beyond that, the run was recorded as a failure.
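The submission loop and rate-limit handling described above can be sketched roughly as follows. This is an illustration, not the benchmark harness itself: `send` is a hypothetical callable that submits one URL to a provider and returns the HTTP status, and "up to 3 retries" is read here as one initial attempt plus three retries. The 10-minute execution timeout for async jobs is not modeled.

```python
import time

DELAY_BETWEEN_REQUESTS = 2  # seconds between sequential requests
BACKOFF_ON_429 = 30         # seconds to wait after an HTTP 429
MAX_RETRIES = 3             # retries allowed after the first 429


def fetch_with_backoff(url, send, sleep=time.sleep):
    """Return the final HTTP status, or None if still rate-limited.

    `send` is a hypothetical function: send(url) -> int (HTTP status).
    """
    for attempt in range(MAX_RETRIES + 1):
        status = send(url)
        if status != 429:
            return status
        if attempt < MAX_RETRIES:
            sleep(BACKOFF_ON_429)
    return None  # retries exhausted: the run is recorded as a failure


def run_sequentially(urls, send, sleep=time.sleep):
    """Submit URLs one at a time with a fixed delay between requests."""
    results = {}
    for i, url in enumerate(urls):
        if i > 0:
            sleep(DELAY_BETWEEN_REQUESTS)
        results[url] = fetch_with_backoff(url, send, sleep)
    return results
```

Passing `sleep` as a parameter keeps the sketch easy to dry-run without actually waiting.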
Validation rules
Each request went through three checks.
The submission check required an HTTP status of 200 to 399 or 404 from the provider. The execution check required async jobs (Bright Data Dataset API) to finish within timeout without errors; sync providers auto-passed. The validation check required at least one of job_title or company_name to be returned as a non-empty string. For JSON providers, this came from the parsed response. For HTML providers, it came from CSS selector matches.
A request whose response indicated a 404 page (HTTP 404, “page not found” content, or a provider’s explicit “dead page” signal) was also counted as valid, since the provider correctly identified an unavailable listing.
Empty responses with no error were initially counted as valid, then re-checked: if any other provider extracted real job data on the same URL, the empty response was flipped to invalid. 404 detections were exempt from this flip; a provider’s explicit “page doesn’t exist” signal was trusted unless contradicted by real extracted data from another provider.
A run was counted as overall successful only if submission, execution, and validation all passed.
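To make the three checks concrete, here is a minimal sketch of how they might be combined. The `Run` record and its field names are hypothetical, invented for illustration. The empty-response re-check follows one reading of the rule above: 404 detections are never flipped, and an empty, error-free response stays valid only if no other provider extracted real job data for the same URL.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """Hypothetical record of one provider's result for one URL."""
    http_status: int
    is_async: bool = False          # True for dataset-style async jobs
    finished_in_time: bool = True   # async job finished within the timeout
    had_error: bool = False
    empty_response: bool = False    # nothing returned, but no error either
    detected_404: bool = False      # HTTP 404, "not found" page, or dead-page signal
    fields: dict = field(default_factory=dict)


def submission_ok(run):
    return 200 <= run.http_status <= 399 or run.http_status == 404


def execution_ok(run):
    if not run.is_async:
        return True  # sync providers auto-pass
    return run.finished_in_time and not run.had_error


def has_job_data(run):
    """At least one of job_title / company_name is a non-empty string."""
    return any(
        isinstance(run.fields.get(key), str) and run.fields[key] != ""
        for key in ("job_title", "company_name")
    )


def validation_ok(run, others=()):
    """`others` holds the other providers' runs for the same URL."""
    if has_job_data(run) or run.detected_404:
        return True
    # Empty responses start out valid, then flip to invalid if any
    # other provider extracted real job data on the same URL.
    if run.empty_response and not run.had_error:
        return not any(has_job_data(other) for other in others)
    return False


def is_success(run, others=()):
    return submission_ok(run) and execution_ok(run) and validation_ok(run, others)
```

For example, `is_success(Run(200, fields={"job_title": "Engineer"}))` passes all three checks, while a run with HTTP 500 fails at submission regardless of what was extracted.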
Metrics measured
Validation success rate is the share of URLs that passed all three checks.
End-to-end completion time is the time from request submission to response, measured in seconds. For async providers (Bright Data), this includes polling time until the dataset job finished.
Available metadata fields is the number of unique fields across all responses, computed as a set union, for providers returning structured JSON. For HTML providers, it is the fixed five-selector CSS schema we used.
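As a rough illustration of how these three metrics could be computed, consider the sketch below. The function names and the sample values in the usage note are ours, chosen for illustration; they are not benchmark data.

```python
import time


def success_rate(outcomes):
    """Percentage of URLs whose run passed all three checks."""
    return 100 * sum(outcomes) / len(outcomes)


def timed(call):
    """Run `call` and return (result, elapsed seconds).

    For an async provider, `call` would include polling until the job finishes.
    """
    start = time.monotonic()
    result = call()
    return result, time.monotonic() - start


def available_fields(responses):
    """Set union of field names across all parsed JSON responses."""
    names = set()
    for response in responses:
        names.update(response.keys())
    return names
```

With toy inputs, `success_rate([True, True, True, False])` gives 75.0, and `available_fields([{"job_title": "A"}, {"job_title": "B", "salary": "1"}])` yields two unique field names. The field count for a JSON provider is then `len(available_fields(responses))`.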
FAQs
Here are some examples of job listing data that can be scraped from Indeed:
- Job title
- Company name
- Location (city, state, sometimes remote flag)
- Job description/responsibilities
- Salary info (when disclosed or estimated)
- Employment type (full-time, part-time, contract, internship, etc.)
- Date posted / how long ago
- Job URL / posting ID
These fields appear only on some listings or require user interaction:
- Company reviews and ratings
- Application links/buttons (may redirect to employer ATS)
- Recruiter/employer contact info (rare, often hidden or behind logins)
Yes, Indeed offers official public APIs. To access these APIs, you need to become an Indeed partner, set up an app in their Partner Console, get credentials, and use OAuth to get access tokens. Here is how they work and what they provide:
- Job Sync API (GraphQL): Enables ATS (Applicant Tracking System) partners to create, update (upsert), expire, and list job postings on Indeed.
- Employer Data API: Lets users create or update “employer entities” and manage employer attributes so that job seekers see the correct company information.
- Job Update API: For listing and updating job postings by criteria.