ScrapeBox Forum
Harvester Problem - Printable Version

+- ScrapeBox Forum (https://www.scrapeboxforum.com)
+-- Forum: ScrapeBox Main Discussion (https://www.scrapeboxforum.com/Forum-scrapebox-main-discussion)
+--- Forum: General ScrapeBox Talk (https://www.scrapeboxforum.com/Forum-general-scrapebox-talk)
+--- Thread: Harvester Problem (/Thread-havester-problem)



Harvester Problem - edge13 - 10-18-2016

Hi,

I'm a new user. I have 20 private proxies from SquidProxies and set connections to 1 when using the custom harvester. The problem is that harvesting can't get very far: usually after around 3,000 URLs I start receiving errors constantly. The last time, I only got fewer than 700 URLs.


Can anyone tell me what the problem is, please?

Thanks!


RE: Harvester Problem - fatbusybee - 10-19-2016

I'm having almost the same issue... (I've always had one, ever since I started using ScrapeBox.)

I am still testing out SB since I purchased it about 3 weeks ago...

I've never got past the URL-harvesting stage (I keep getting an error message like "access violation" etc.)...

I don't use private proxies yet because I'm only scraping a single website.
Any tips to get past this stage?


RE: Harvester Problem - loopline - 10-20-2016

You should send your bug report to ScrapeBox support.

Which function are you using when you get the access violation? If it's one of the main program functions, go into your ScrapeBox folder; there should be a bugreport.txt file. Send that to

scrapeboxhelp (at) gmail (dot) com


RE: Harvester Problem - Taken - 11-08-2016

(10-20-2016, 02:54 AM)loopline Wrote: You should send your bug report to ScrapeBox support.

Which function are you using when you get the access violation? If it's one of the main program functions, go into your ScrapeBox folder; there should be a bugreport.txt file. Send that to

scrapeboxhelp (at) gmail (dot) com

I am having the same issues as the users above. Why does this occur? It's been like this since I purchased the software. I've tried different settings to solve this, but nothing seems to work at all.

I've sent you an email just now and am waiting for a reply. Hopefully it won't be too long :)


RE: Harvester Problem - adam110 - 11-11-2016

Have you tried to access Google manually with your proxies (outside of ScrapeBox)? If so, is Google showing you the results?

I'm scraping with ScrapeBox right now, and as long as you have the proper setup you should be OK to scrape.

I have Squid proxies, but I don't use them for Google scraping because the volume I scrape would kill them.
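adam110's suggestion — checking a proxy against Google by hand, outside of ScrapeBox — can be sketched in a few lines. This is a hypothetical illustration using only the Python standard library; the proxy address shown is a placeholder, not a real one.

```python
# Hypothetical sketch: test whether a proxy can still reach Google
# search results manually, outside of ScrapeBox.
import urllib.request

def proxy_opener(proxy):
    """Build an opener that routes both HTTP and HTTPS through one proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    opener.addheaders = [("User-Agent", "Mozilla/5.0")]
    return opener

def google_ok(proxy, query="scrapebox"):
    """True if Google serves a results page through the proxy; an HTTP
    429/503 or a captcha redirect usually means the IP is blocked."""
    url = "https://www.google.com/search?q=" + query
    try:
        with proxy_opener(proxy).open(url, timeout=15) as resp:
            return resp.status == 200
    except Exception:
        return False

# Example with a placeholder address:
# google_ok("http://user:pass@1.2.3.4:8080")
```

If this returns False (or the browser shows a captcha) for most of your proxies, the problem is the IPs being blocked, not ScrapeBox itself.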


RE: Harvester Problem - loopline - 11-14-2016

(11-08-2016, 03:56 PM)Taken Wrote:
(10-20-2016, 02:54 AM)loopline Wrote: You should send your bug report to ScrapeBox support.

Which function are you using when you get the access violation? If it's one of the main program functions, go into your ScrapeBox folder; there should be a bugreport.txt file. Send that to

scrapeboxhelp (at) gmail (dot) com

I am having the same issues as the users above. Why does this occur? It's been like this since I purchased the software. I've tried different settings to solve this, but nothing seems to work at all.

I've sent you an email just now and am waiting for a reply. Hopefully it won't be too long :)

If you're getting an access violation, then contacting support is the only option. Or are you having another issue?


RE: Harvester Problem - edge13 - 11-18-2016

(11-11-2016, 10:44 AM)adam110 Wrote: Have you tried to access Google manually with your proxies (outside of ScrapeBox)? If so, is Google showing you the results?

I'm scraping with ScrapeBox right now, and as long as you have the proper setup you should be OK to scrape.

I have Squid proxies, but I don't use them for Google scraping because the volume I scrape would kill them.

I'm using Squid proxies as well, and those good people have changed 3 batches of proxies for me already, but I keep getting the same issue.
[Image: scrapebox.png]
[Image: scrapebox-20161116.png]

Can you share your setup with us, please?

Thank you!

(10-20-2016, 02:54 AM)loopline Wrote: You should send your bug report to ScrapeBox support.

Which function are you using when you get the access violation? If it's one of the main program functions, go into your ScrapeBox folder; there should be a bugreport.txt file. Send that to

scrapeboxhelp (at) gmail (dot) com

Thank you loopline. I contacted SquidProxies afterwards; they have changed 3 batches of proxies for me already, but I still have the same problem. I will contact ScrapeBox support for help.


RE: Harvester Problem - loopline - 11-19-2016

How many connections are you running? 50K errors is a LOT. That's most likely coming from blocked IPs. If you go to Help >> Show Error Log and choose Harvester, what are the errors?


RE: Harvester Problem - adam110 - 11-19-2016

Yup, it sounds like you're killing the Squid proxies way too quickly, probably due to your settings.

I have 100 back-connect proxies, so my proxies rotate every 10 minutes.

Here is how I have things set up:

8 threads when running the harvester (100 proxies added), a 17-second wait time per proxy, and the harvester wait time set to 30 seconds. I also have proxy retries set to the max, which is 10 I think. (I'm wondering if ScrapeBox could code something for unlimited retries with delays, or add some kind of back-connect proxies checkbox that applies these settings for us automatically.)

From your screenshots I can see you were able to scrape Google, but I see all those errors too...

Squid are an excellent proxy provider, but I'm not sure they are the right choice for something like this. I personally use Squid proxies for my social accounts, such as Google and YouTube; I don't scrape with them.
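As a rough sanity check on the settings quoted above (this simplified formula is my own addition, not something from the posts), you can estimate how often each individual IP hits Google under a given setup:

```python
# Back-of-the-envelope sketch: upper bound on how often one proxy is
# used, assuming queries are spread evenly across the pool and ignoring
# response time and retries. The example numbers come from this thread.

def queries_per_proxy_per_hour(threads, proxies, wait_seconds):
    """Each thread fires roughly one query every wait_seconds; divide
    the total hourly query volume by the size of the proxy pool."""
    total_per_hour = threads * (3600 / wait_seconds)
    return total_per_hour / proxies

# adam110's setup: 8 threads, 100 proxies, 30 s harvester wait
gentle = queries_per_proxy_per_hour(8, 100, 30)      # ~9.6 queries/IP/hour

# edge13's setup: 1 connection, 20 proxies, no delay
# (assuming ~2 s per query, which is itself an assumption)
aggressive = queries_per_proxy_per_hour(1, 20, 2)    # ~90 queries/IP/hour
```

Under those assumptions, each of edge13's 20 IPs queries Google roughly nine times as often as each of adam110's 100, which fits the pattern of the 20-proxy pool getting blocked quickly.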


RE: Harvester Problem - loopline - 11-22-2016

(10-18-2016, 05:45 PM)edge13 Wrote: Hi,

I'm a new user. I have 20 private proxies from SquidProxies and set connections to 1 when using the custom harvester. The problem is that harvesting can't get very far: usually after around 3,000 URLs I start receiving errors constantly. The last time, I only got fewer than 700 URLs.

Can anyone tell me what the problem is, please?

Thanks!

Your proxies are all getting banned. Typically you need 50-75 proxies or more before you can run even 1 connection. You will need to wait until your proxies are unblocked, and then use the detailed harvester with a delay.

https://www.youtube.com/watch?v=GadX5AXiW34
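Loopline's advice — a bigger pool plus a delay in the detailed harvester — can be sketched as a simple round-robin rotation. This is my own illustration of the principle, not ScrapeBox's actual code: each proxy only comes up again after the whole pool has cycled, so more proxies and a longer delay both stretch the time between hits from the same IP.

```python
# Minimal sketch of round-robin proxy rotation with a fixed delay.
import itertools

def seconds_between_hits(num_proxies, delay_seconds):
    """With round-robin rotation and a fixed delay between queries,
    the SAME proxy is reused once per full cycle of the pool."""
    return num_proxies * delay_seconds

def rotate(proxies):
    """Endless round-robin over the proxy list."""
    return itertools.cycle(proxies)

# 20 proxies with a 5 s delay -> each IP queries Google every 100 s.
# 75 proxies with a 5 s delay -> every 375 s, far less likely to be banned.
```

The exact delay that keeps an IP safe varies, but the direction of the trade-off is the point: pool size times delay is the breathing room each proxy gets.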