ScrapeBox Forum



I can't extract urls from google - danisuite - 11-10-2021

Hello, I enter keywords so that ScrapeBox extracts the URLs from the Google SERPs for each keyword. I have 35 proxies that passed the Google test, and I have added ScrapeBox to the antivirus whitelist, but I still can't get it to harvest properly.

Most of the time it doesn't get any results, and the few times it does, there are very few for the number of keywords I put in.

What could I be doing wrong?

Thank you very much


RE: I can't extract urls from google - loopline - 11-11-2021

I'm guessing the proxies are blocked. If you go to help >> show error log >> harvester

what are the errors?

503, 429 and 403 are all ip bans.


RE: I can't extract urls from google - danisuite - 11-12-2021

(11-11-2021, 06:06 PM)loopline Wrote: I'm guessing the proxies are blocked. If you go to help >> show error log >> harvester

what are the errors?

503, 429 and 403 are all ip bans.

These are the errors I get. What could be the problem? Thank you very much.


10/11/2021 21:29:32: HTTP: -1 Connect timed out., URL: https://www.google.it/search?complete=0&hl=it&q=la+storia+successo+del+trader+paul+baccaglini&num=100&start=0&filter=0&pws=0
10/11/2021 21:29:52: HTTP: -1 Read timed out., URL: https://www.google.it/search?complete=0&hl=it&q=come+avviare+un+nuovo+business&num=100&start=0&filter=0&pws=0
10/11/2021 21:29:59: HTTP: -1 Connect timed out., URL: https://www.google.it/search?complete=0&hl=it&q=fare+soldi+con+il+poker+online+e+possibile&num=100&start=0&filter=0&pws=0
10/11/2021 21:30:13: HTTP: -1 Read timed out., URL: https://www.google.it/search?complete=0&hl=it&q=guadagnare+con+le+web+serie&num=100&start=0&filter=0&pws=0


RE: I can't extract urls from google - loopline - 11-13-2021

Google itself will never time out, so these are proxy errors if you're using proxies, especially if they're public proxies.

Otherwise, if they're private proxies, it could still be the proxies, but it could also be security software. So make sure you add an exception for the entire ScrapeBox folder in all of your security software.
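
To see at a glance which kind of failure dominates a harvester error log, the lines can be tallied by category. This is only a rough sketch: the regular expression is inferred from the log lines quoted in this thread, not from any official ScrapeBox format description, and the names `LOG_LINE`, `classify` and `summarize` are made up for illustration. It treats 403, 429 and 503 as IP bans and the `-1` entries (connect/read timeouts) as connection failures on the proxy or local side.

```python
import re
from collections import Counter

# Pattern inferred from the log lines quoted in this thread (an assumption,
# not an official ScrapeBox log format specification).
LOG_LINE = re.compile(
    r"^(?P<when>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}): "
    r"HTTP: (?P<code>-?\d+) (?P<message>.*?), "
    r"URL: (?P<url>\S+)"
    r"(?: Proxy: (?P<proxy>\S+))?\s*$"
)

# Status codes described in this thread as IP bans.
IP_BAN_CODES = {403, 429, 503}


def classify(line):
    """Return a rough category for one harvester error log line."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return "unparsed"
    code = int(match.group("code"))
    if code in IP_BAN_CODES:
        return "ip_ban"
    if code == -1:
        # In the logs above, -1 accompanies connect/read timeouts.
        return "connection_failure"
    return "other"


def summarize(lines):
    """Count how many non-empty log lines fall into each category."""
    return Counter(classify(line) for line in lines if line.strip())
```

Mostly `connection_failure` points at dead proxies or security software getting in the way; mostly `ip_ban` means the proxies connect fine but Google is refusing them.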


RE: I can't extract urls from google - zboo - 03-01-2022

Hello,

I have the same problem here:
I use Stormproxies backconnect rotating proxies and had no problems with this method from 2014 on, but since 2021 it no longer seems to work.

Here is an extract from my detailed harvester log:
24/02/2022 14:09:34: HTTP: 429 HTTP/1.1 429 Too Many Requests, URL: https://www.google.com/search?complete=0&hl=en&q=site%3Ainstagram%2Ecom%20%22john%20durand%22%20photo&num=100&start=0&filter=0&pws=0 Proxy: 37.48.118.90:13042

It's the 429 error that loopline describes as an IP ban. Is there a new method or proxy provider I should know about? ScrapeBox is useless to me right now. Thanks a lot.


RE: I can't extract urls from google - loopline - 03-01-2022

Usually the detailed harvester will keep retrying forever. Does it stop for you?

Google proxies are hard to find because Google bans faster than ever, and they don't share why they ban or any data about it. So there are still good proxies in the backconnect pool, but you have to do more retries.

You should be able to let the detailed harvester run indefinitely and get results, although perhaps slowly because of the backconnect proxies.
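
The "keep retrying until a good proxy comes up" idea looks roughly like the loop below. This is not how ScrapeBox implements it internally, just a sketch of the general technique. `fetch_with_retries` and the `fetch(url, proxy)` callable are hypothetical names: `fetch` is whatever function you supply that performs the request through a given proxy, returns `(status, body)`, and raises `OSError` on a timeout or connection failure. With a backconnect service the proxy list may be a single gateway address, since the provider rotates the exit IP behind it.

```python
import itertools

# Status codes described in this thread as IP bans.
BAN_CODES = {403, 429, 503}


def fetch_with_retries(url, proxies, fetch, max_attempts=20):
    """Cycle through proxies until one returns a non-ban response.

    Returns (body, attempts), where attempts lists each failed
    (proxy, reason) pair. body is None if every attempt failed.
    """
    attempts = []
    for attempt, proxy in enumerate(itertools.cycle(proxies), start=1):
        if attempt > max_attempts:
            break
        try:
            status, body = fetch(url, proxy)
        except OSError as exc:
            # Timeouts and refused connections: the proxy itself is bad.
            attempts.append((proxy, f"error: {exc}"))
            continue
        if status in BAN_CODES:
            # The proxy works, but its IP is banned by the target.
            attempts.append((proxy, status))
            continue
        return body, attempts
    return None, attempts
```

Raising `max_attempts` trades speed for a better chance of landing on an unbanned IP, which is the same trade-off as letting the detailed harvester run for a long time.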