ScrapeBox Forum
Harvester Stopping Early - Printable Version




Harvester Stopping Early - kissthechicken28 - 08-17-2016

So I've run into an issue recently with the ScrapeBox Harvester. I'll run the scraper and find several million keywords and then plug them into the Harvester (specifically for Bing, if that matters). I'll start it and everything will seem to be going smoothly. However, the Harvester will stop early, say it is complete, and only find a fraction of the sites it is supposed to.

For example, I recently scraped around 2 million keywords. I ran Harvester and it got to 1% on the completion bar then stopped, only finding around 500,000 sites. When I scrolled through the keywords almost all of them were red and showed that zero sites were found.

Any idea what is happening? Thanks.

If it makes a difference, the error log is showing that I am getting 403 errors.


RE: Harvester Stopping Early - adam110 - 10-14-2016

Yup, the same is happening to me too. I add keywords, it does about 20%, and then suddenly it says complete.


RE: Harvester Stopping Early - adam110 - 10-15-2016

Any idea how we can troubleshoot this? I feel many of my keywords are being skipped.


RE: Harvester Stopping Early - adam110 - 10-16-2016

OK, just an update: this is happening to me each and every time.

I ran a list of 10k keywords through the harvester yesterday using 100 proxies and 25 threads.

When I checked my VPS in the morning, it said 100% complete. ScrapeBox then shows you the keywords and how many URLs it was able to scrape from each of them. It also gives you the option to save all keywords that produced 0 results.

I went up and down that log, and not even half of the keywords had actually got results.

So I downloaded all the keywords with 0 results (6500+), which means ScrapeBox did less than half and reported it back as 100% complete.

I then cleared all the keywords and imported the keywords that got 0 results, and ScrapeBox is flying again, scraping thousands of URLs from the keywords it decided to skip (more than half).

Please let me know if there is something I am doing wrong here, or if there is anything I can do to make ScrapeBox actually go through all the keywords I am adding.

I would love to add a large list of keywords, but ScrapeBox doesn't seem to be able to get through the lists I am adding now.

Please help.


RE: Harvester Stopping Early - adam110 - 10-17-2016

Quick Update

Out of the initial 10k keywords I added to ScrapeBox, 6400 came back with 0 results (I noted 6500 in my post above, but it's actually 6400).

I then added those 6400 keywords back into ScrapeBox, and this time it got 0 results for 1279 keywords.

I've now added those 1279 keywords back into ScrapeBox and it's churning away scraping URLs.

I'm not sure if I'm posting in the wrong section, because I'm receiving zero replies here. Is there a better place for me to post, or maybe even an email address where I can contact support?

Because ScrapeBox is not going through all the keywords, I need to attend to my VPS each day, which is not what I want to do.
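
Editor's note: the figures quoted in this post (10k keywords, then 6400 and 1279 left with 0 results) can be turned into per-pass yields with a few lines of Python. This is only a back-of-the-envelope sketch; the function name `pass_stats` is made up for illustration and has nothing to do with ScrapeBox itself.

```python
def pass_stats(counts):
    """counts[0] is the starting keyword count; each later entry is the
    number of keywords still showing 0 results after that pass."""
    total = counts[0]
    rows = []
    for before, after in zip(counts, counts[1:]):
        got = before - after  # keywords that produced results in this pass
        rows.append((before, got, got / before, (total - after) / total))
    return rows


# Figures taken from the post: 10k keywords, 6400 left, then 1279 left.
for before, got, rate, overall in pass_stats([10_000, 6_400, 1_279]):
    print(f"{before:>6} in, {got:>5} with results "
          f"({rate:.1%} of this pass, {overall:.1%} overall)")
```

On those numbers the first pass returned results for about 36% of the keywords, the second pass for about 80% of what was left, and the two passes together covered about 87% of the original list.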


RE: Harvester Stopping Early - loopline - 10-18-2016

Go to settings >> connections timeouts and other settings >> more harvester options >> proxy retries. Max it out.

What happens is that when scrapebox queries an engine and the proxy is blocked, it will try a new one, up to the number you have set in proxy retries. If it gets blocked proxies each time until it reaches the max you have set, it will skip the keyword.

So you can turn that up, and also ideally sort out how to go slow enough that you're not getting a bunch of proxies blocked, so you don't have that issue in the first place. Unless you're using public proxies, at which point to some degree this is a case of it is what it is.
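
Editor's note: the retry rule described above can be sketched as a toy model in Python. This is not ScrapeBox's actual code. The names `harvest` and `is_blocked` are hypothetical, picking the next proxy at random is an assumption, and so is counting the first attempt separately from the retries.

```python
import random


def harvest(keywords, proxies, max_retries, is_blocked, rng=None):
    """Toy model of the proxy-retry rule (illustrative only).

    Each keyword is tried through one proxy. If that proxy is blocked,
    another proxy is tried, up to max_retries more times. If every
    attempt is blocked, the keyword is skipped and ends up with 0 results.
    """
    rng = rng or random.Random()
    done, skipped = [], []
    for keyword in keywords:
        for _attempt in range(1 + max_retries):  # first try plus retries (assumed)
            proxy = rng.choice(proxies)  # assumed: next proxy picked at random
            if not is_blocked(proxy):
                done.append(keyword)
                break
        else:
            # Every attempt hit a blocked proxy, so the keyword is skipped.
            skipped.append(keyword)
    return done, skipped


# Hypothetical pool where 3 of 4 proxies are currently blocked.
pool = ["p1", "p2", "p3", "p4"]
blocked = {"p1", "p2", "p3"}
keywords = [f"kw{i}" for i in range(1000)]
for retries in (0, 2, 10):
    done, skipped = harvest(keywords, pool, retries,
                            lambda p: p in blocked, random.Random(1))
    print(f"retries={retries}: {len(skipped)} keywords skipped")
```

In this model a higher retry limit only helps while some proxies are still unblocked. If the whole pool is blocked, every keyword is skipped no matter how high the limit is set.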


RE: Harvester Stopping Early - adam110 - 10-20-2016

Thanks Loopline - I currently have a big scrape running, so I will wait until it's complete and then change the settings as you noted.

If I select 100 threads and I have 100 proxies, then how does that work?

Does it open a thread for each proxy? And if, say, 3 of the proxies fail, does it switch to another proxy to try to get the results until the total number of proxy retries is used up? Could that result in multiple requests from the same proxy in different threads?

If that is the case, I can see how this could kill the proxies.

My proxies are actually back connect proxies and rotate every 10 mins. I have 100 of them and they seem to be working very well. I just can't seem to get any results from Yahoo.

At the moment I'm only using 20-25 threads per scrape with 100 proxies loaded. Is this a safer setup than maxing things out at 100 threads? Then I can up the retries.

thanks


RE: Harvester Stopping Early - loopline - 10-21-2016

You're going to use a thread for every proxy and get your proxies banned in probably 60 seconds. Yes, it could result in multiple requests from the same proxy, but even 1 request per IP in rapid fire will get your IPs banned.
With back connect it's a bit different, but if you run 1 connection per IP, it will probably be banned within a minute or two (or less), and that leaves you with 8-9 mins of nothing. So I would try like 20 threads with 100 proxies.

You can try going up and see at what point your success rate drops. You can time it and see where the sweet spot is, etc.
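
Editor's note: a rough way to reason about the thread count is to estimate how many requests each proxy ends up carrying per minute. The sketch below assumes every thread fires requests back to back and that the load spreads evenly across the proxies. The 2-second request time is a made-up figure for illustration, not a measured ScrapeBox value, and `requests_per_proxy_per_minute` is a hypothetical helper.

```python
def requests_per_proxy_per_minute(threads, proxies, seconds_per_request):
    """Average load per proxy, assuming threads never idle and proxies
    are used evenly (both are simplifying assumptions)."""
    total_per_minute = threads * 60.0 / seconds_per_request
    return total_per_minute / proxies


# 100 proxies as in the thread; 2.0 s per request is an assumed figure.
for threads in (100, 50, 25, 20):
    rate = requests_per_proxy_per_minute(threads, proxies=100,
                                         seconds_per_request=2.0)
    print(f"{threads:>3} threads: ~{rate:.1f} requests per proxy per minute")
```

Timing a real run at each thread count and comparing the number of keywords with 0 results is still the only way to find the actual sweet spot; the estimate only shows how quickly the per-proxy load grows with the thread count.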


RE: Harvester Stopping Early - adam110 - 10-27-2016

Thanks Loopline - I'm getting way fewer keywords with 0 results now.

I'm using 100 proxies with 16 threads.


RE: Harvester Stopping Early - loopline - 10-28-2016

You're welcome. That's probably still a high thread count, but if it works, run with it.