Email scraping large site (ScrapeBox Forum, General ScrapeBox Talk)
Email scraping large site - scrapingby - 05-12-2017

So I'm trying to email-scrape this large site and am hitting my limits. I'm using an 18k seed list of URLs with "Grab emails from URL list", but SB kept crashing and I lost the harvested emails. Now I'm trying with just 10 connections (no proxies yet) and it takes ages. As loopline mentioned, I probably need proxies. Now the question is: how many do I need? The site in question has 20 million pages indexed. If the crawler goes through URLs at a rate of X per hour/minute/second, can I assume that if I use more proxies/connections I can simply multiply that rate by the number of proxies? It's been an hour now and it has gone through roughly 70k URLs. Can I extrapolate that it will reach 20 million pages in roughly 285 hours? Or is it not that simple? Please help... I feel stupid.

RE: Email scraping large site - loopline - 06-03-2017

In my experience, scraping runs of that size aren't always linear, but you can try and see. I would get some shared proxies; they should work fine and are inexpensive.
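The extrapolation in the first post can be checked with a quick calculation. Below is a minimal Python sketch, assuming the crawl rate stays constant for the whole run. The `efficiency` parameter is a hypothetical knob (not a ScrapeBox setting) used only to illustrate loopline's point that adding connections or proxies rarely gives a perfectly linear speed-up.

```python
# Back-of-the-envelope crawl-time estimate using the numbers from the thread.
# Assumption: throughput is constant over the whole run. Large runs are often
# not linear, so treat the result as a best case.


def hours_to_finish(total_pages: int, pages_done: int, hours_elapsed: float) -> float:
    """Linear extrapolation: total pages divided by the observed hourly rate."""
    rate_per_hour = pages_done / hours_elapsed
    return total_pages / rate_per_hour


def scaled_hours(base_hours: float, connection_multiplier: float, efficiency: float = 1.0) -> float:
    """Estimate run time after multiplying the number of connections/proxies.

    connection_multiplier: new connection count divided by the current one
    (e.g. going from 10 to 50 connections is 5).
    efficiency: hypothetical factor in (0, 1]. 1.0 means every extra
    connection adds full speed; lower values model diminishing returns
    (rate limits, slow pages, retries). It is an assumption, not a measurement.
    """
    effective_speedup = 1 + (connection_multiplier - 1) * efficiency
    return base_hours / effective_speedup


if __name__ == "__main__":
    base = hours_to_finish(total_pages=20_000_000, pages_done=70_000, hours_elapsed=1.0)
    print(f"10 connections, linear estimate: {base:.1f} h")
    print(f"5x connections, perfect scaling: {scaled_hours(base, 5):.1f} h")
    print(f"5x connections, 60% efficiency:  {scaled_hours(base, 5, 0.6):.1f} h")
```

With the thread's numbers (about 70k URLs in the first hour, 20 million pages total), the linear estimate comes to roughly 286 hours, which matches the poster's guess. Any speed-up from extra proxies should be measured on a short test run rather than assumed.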