ScrapeBox Forum
Google captchas when scraping


Google captchas when scraping - adam110 - 11-11-2016

Hi,

I was wondering whether ScrapeBox will solve Google captchas if prompted while scraping Google?

The reason I'm asking is that I added my captcha details to ScrapeBox and I've been scraping for weeks, but when I look at my captcha balance it has not been touched.

I frequently get many keywords that return zero results when scraping, and I reload them back into the tool. I'm wondering if many of these were presented with a captcha and then skipped.

Thanks


RE: Google captchas when scraping - loopline - 11-11-2016

No, it will not. Captchas are used when posting to sites, but are not used when scraping.

They did have that once upon a time, but it caused way more issues than it solved.

I mean, you might get 50 requests and then get your IP blocked, then solve a captcha, then get 2-3 requests, then get blocked again. Then you solve another captcha, get 2-3 more requests, and then get permanently blocked, or long-term blocked for days with no captcha option.

So solving captchas takes a bunch of time, and then you get a handful of requests before you long-term ban or perma-ban your IP. It's better to just slow down a bit and not get the IPs blocked in the first place, or, if they are public proxies, just get more proxies.
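For anyone scripting their own harvester, the "slow down and rest blocked IPs" idea could be sketched roughly like this in Python. This is only an illustration and not how ScrapeBox works internally; the `ProxyPool` class, the `min_delay` and `cooldown` values, and the proxy names are all made up for the example.

```python
import time


class ProxyPool:
    """Hypothetical helper: space out requests per proxy and rest blocked ones."""

    def __init__(self, proxies, min_delay=10.0, cooldown=3600.0, clock=time.monotonic):
        self.min_delay = min_delay    # assumed wait between requests on one proxy (seconds)
        self.cooldown = cooldown      # assumed rest period after a block (seconds)
        self.clock = clock            # injectable clock, handy for testing
        # proxy -> earliest time at which it may be used again
        self._ready_at = {p: float("-inf") for p in proxies}

    def acquire(self):
        """Return the first proxy that is ready now, or None if all are resting."""
        now = self.clock()
        for proxy, ready_at in self._ready_at.items():
            if ready_at <= now:
                # Reserve the proxy so it is not reused before min_delay has passed.
                self._ready_at[proxy] = now + self.min_delay
                return proxy
        return None

    def report_blocked(self, proxy):
        """Rest a proxy for the full cooldown instead of hammering it further."""
        self._ready_at[proxy] = self.clock() + self.cooldown
```

The calling code would sleep briefly whenever `acquire()` returns `None`, and call `report_blocked()` as soon as a proxy gets a captcha or block page, so the IP is left alone rather than pushed toward a long-term ban.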

I have a video here

https://www.youtube.com/watch?v=GadX5AXiW34

that might be helpful for you


RE: Google captchas when scraping - adam110 - 11-12-2016

Thanks loopline

I thought it would have been good to make it optional, so those who would like to solve Google captchas when scraping can opt in. There could be some settings exposed to the end user, such as the max captchas to solve per proxy in a 5-minute period, etc.
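To make the idea concrete, here is a rough Python sketch of what a "max captchas per proxy in a 5-minute period" setting could look like. It is purely hypothetical and not an existing ScrapeBox option; the `CaptchaBudget` name and the default of 3 solves are assumptions for illustration.

```python
import time
from collections import defaultdict, deque


class CaptchaBudget:
    """Hypothetical limiter: at most max_solves captcha solves per proxy per window."""

    def __init__(self, max_solves=3, window_seconds=300, clock=time.monotonic):
        self.max_solves = max_solves          # assumed cap, e.g. 3 solves
        self.window_seconds = window_seconds  # 300 s = the 5-minute period
        self.clock = clock                    # injectable clock, handy for testing
        self._solves = defaultdict(deque)     # proxy -> timestamps of recent solves

    def try_solve(self, proxy):
        """Record a solve and return True if the proxy has budget left, else False."""
        now = self.clock()
        recent = self._solves[proxy]
        # Drop timestamps that have aged out of the sliding window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_solves:
            return False
        recent.append(now)
        return True
```

The scraper would call `try_solve(proxy)` before sending a captcha to the solving service, and skip or rest the proxy whenever it returns `False`.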

Because I use backconnect (reverse) proxies, I feel this could actually get us more results. Either way, I've upped the wait time and set the retries to max; things are running much slower, but it's more stable.

Thanks for your help


RE: Google captchas when scraping - loopline - 11-12-2016

Sure. I don't get to make the decision; the developers said no. The problem is that support time has to be considered; it's part of the trade-off of a lifetime license with free support at an ultra-inexpensive price. Years ago, when the feature was in, the user base was a fraction of what it is now. That feature alone generated tons of support requests, and they already field issues with Google blocking proxies every single day, as people don't get it.

A handful of users may benefit from such a feature in a big or small way, but if it causes mountains of support, it's a loss on the ScrapeBox team's end. At some point, if they added in all the features that caused tons of support, they would have to raise the price way up or go out of business, and then everyone loses.

Plus, to be honest, they even recently tested this, and you can't know after how many solves an IP will be perma-banned. So you might get 50 or 100 requests in, and then 3 more requests and get the IP permanently or very long-term banned. If you do nothing, in a day or two that IP may come back around in your backconnect pool and work fine again with Google for another 50 to 100 solves or more. If you get it long-term banned in short order, you and everyone else will end up with the whole backconnect pool of IPs perma-banned and no one will be able to scrape anything; then you lose, and you lose big.

It's far better to go another route: doing as you did, or scraping Bing for example, as they are very light on IP bans. Or even using DeeperWeb or the Google API, which are Google-powered but have their own IP bans.


RE: Google captchas when scraping - adam110 - 11-19-2016

Thanks for the explanation, loopline


RE: Google captchas when scraping - loopline - 11-19-2016

You're welcome, cheers!


RE: Google captchas when scraping - Try A Million - 11-25-2016

I was wondering about this as well. I've seen the option to solve captchas for posting but not for harvesting. Makes sense. It's very easy to think it will only need to solve a captcha now and again, only to find it doing that every few requests.