02-08-2016, 12:25 PM
Hey guys
I have been using Scrapebox for a while for some smallish tasks, but I am now facing one that I could use some advice on!
I need to scrape Google for the URLs that are indexed for a particular domain and that contain a certain parameter.
I plan to do this using the site: and inurl: operators.
However, there are around 110,000 URLs indexed and I need to extract ALL of them. (Basically these 110K URLs need to be removed from the index, and I need Google to recrawl them in order to find the NOINDEX on them. I plan on doing this by submitting the URLs in an XML sitemap in GWT.)
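For the sitemap step, here is a rough sketch of what I have in mind, assuming the scraped URLs end up in a plain Python list. The function name build_sitemaps and the example.com URLs are just placeholders I made up. As far as I know the sitemaps.org protocol allows at most 50,000 URLs per sitemap file, so 110K URLs would have to be split across at least three files.

```python
from xml.sax.saxutils import escape

# Per-file URL limit from the sitemaps.org protocol.
MAX_URLS_PER_SITEMAP = 50000


def build_sitemaps(urls, chunk_size=MAX_URLS_PER_SITEMAP):
    """Return a list of sitemap XML strings, one per chunk of URLs.

    Hypothetical helper: duplicates are dropped (scrapes tend to
    contain repeats) while keeping the original order.
    """
    unique_urls = list(dict.fromkeys(urls))
    sitemaps = []
    for start in range(0, len(unique_urls), chunk_size):
        chunk = unique_urls[start:start + chunk_size]
        lines = [
            '<?xml version="1.0" encoding="UTF-8"?>',
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
        ]
        for url in chunk:
            # escape() turns & into &amp; so parameterised URLs stay valid XML.
            lines.append("  <url><loc>%s</loc></url>" % escape(url))
        lines.append("</urlset>")
        sitemaps.append("\n".join(lines))
    return sitemaps


if __name__ == "__main__":
    # Placeholder URLs standing in for the scraped list.
    scraped = ["http://example.com/page?id=%d&sort=asc" % i for i in range(5)]
    for number, xml in enumerate(build_sitemaps(scraped, chunk_size=2), start=1):
        with open("sitemap-%d.xml" % number, "w", encoding="utf-8") as handle:
            handle.write(xml)
```

Each file would then be submitted separately in GWT.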
Any thoughts on how many proxies I will need to do this, and the best source of such proxies? I am happy to buy them; I have been using Squid Proxies, but never for such a large scrape.
Any thoughts and opinions greatly welcomed!
Thanks!