04-13-2015, 01:49 PM
Because the harvester is geared towards getting the most URLs using the fewest requests and the least bandwidth, it uses a useragent that makes Google return its older-style page design. The pages Google returns to modern browsers are huge, with a lot of useless fluff/html/js, which can add up to GBs of additional bandwidth on big scrapes.
If you want the full fluff version that a browser gets, you just need to change the useragent in the custom harvester to something modern, like Chrome v41:
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36
Then remove the &num=100 from the URL string.
You do this under settings >> harvester engine configuration. Click Google, remove the &num=100 from the string, update the useragent, and then either click update engine or save it as a new engine.
Bear in mind that the extra fluff in the page may slow things down and will use more bandwidth.
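If you have a lot of engine strings to edit, the same change can be sketched outside the harvester in a few lines of Python. This is only an illustration: the function name strip_num_param and the example URL are made up for the sketch, and the real engine string in your harvester may look different, so check it against what you see in the engine configuration.

```python
import re

# The Chrome v41 useragent quoted above.
MODERN_UA = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36"
)


def strip_num_param(url_string: str) -> str:
    """Remove a num=<digits> parameter from a search URL string.

    Hypothetical helper for illustration; it only does plain text
    substitution and does not validate the URL.
    """
    # Usual case: the parameter follows another one, e.g. "...&num=100".
    cleaned = re.sub(r"&num=\d+", "", url_string)
    # Less common case: it is the first parameter, e.g. "?num=100&q=...".
    cleaned = re.sub(r"\?num=\d+&?", "?", cleaned)
    return cleaned


if __name__ == "__main__":
    # Assumed example string, not the harvester's actual engine string.
    example = "http://www.google.com/search?q=test&num=100&hl=en"
    print(strip_num_param(example))
    print(MODERN_UA)
```

A regex is used instead of a URL parser so that any placeholder tokens in the engine string are left exactly as they were.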