05-12-2017, 04:05 PM
So I'm trying to email-scrape this large site and I'm hitting my limits.
I'm using an 18k seed list of URLs with "grab emails from URL list", but
SB kept crashing and I lost the harvested emails. Now I'm trying with
just 10 connections (no proxies yet) and it takes ages.
As loopline mentioned, I probably need proxies.
Now the question is: how many do I need?
The site in question has 20 million pages indexed.
If the crawler is going through URLs at a rate of X per hour/minute/second,
can I assume that using more proxies/connections simply multiplies
that rate by the number of proxies?
It's been an hour now and it has gone through roughly 70k URLs.
Can I extrapolate from that and expect to cover all 20 million pages in
roughly 285 hours, more or less? Or is it not that simple?
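For what it's worth, here's the back-of-the-envelope math I'm doing, as a quick Python sketch. The only real numbers in it are from my run (20M pages, ~70k URLs in the first hour); the per-proxy scaling part is just my assumption, which is exactly what I'm asking about:

```python
# Rough ETA for the crawl, based on the rate observed so far.
# ASSUMPTION: throughput scales linearly with the number of proxies.
# In practice, rate limits and the target server will cap this.

total_pages = 20_000_000   # pages indexed on the target site
urls_per_hour = 70_000     # observed with 10 connections, no proxies

# Straight extrapolation at the current rate
eta_hours = total_pages / urls_per_hour
print(f"Current rate: ~{eta_hours:.0f} hours (~{eta_hours / 24:.0f} days)")

# Hypothetical proxy counts, assuming perfectly linear scaling
for n_proxies in (10, 50, 100):
    print(f"{n_proxies} proxies (linear assumption): ~{eta_hours / n_proxies:.1f} hours")
```

That gives ~286 hours at the current rate, so my 285 figure checks out on paper; whether adding proxies actually divides that number linearly is the part I'm unsure about.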
Please help... I feel stupid.