10-15-2011, 05:40 PM
(10-15-2011, 12:38 PM)karlos Wrote: I want to scrape one particular domain to submit all their URLs (about 19,000 pages) for indexing.
The domain apparently doesn't have a sitemap (at least I couldn't find one), and if I put site:http://www.rootdomain.com in the footprint window, use proxies and harvest, Scrapebox deletes 99% of the results, apparently because "the keywords were maybe too similar".
Should I use the search site differently or is there another way to scrape one domain only?
Can you help?
K
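First, go to the toolbar at the top of Scrapebox, select "options" and uncheck "Automatically Remove Duplicate Domains". With that option on, every result from the same domain counts as a duplicate, which is why nearly all of your results get deleted.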
Second, make sure you only have Google, Bing and AOL checked when using the site: command. Yahoo doesn't use the "site:" command as far as I know.
After you're done harvesting, be sure to go to remove/filter > remove duplicate URLs.
Since you're harvesting from three different search engines, you will get a lot of the same URLs.
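If you ever export the harvested list and want to do the same cleanup outside Scrapebox, here is a rough Python sketch of the idea. This is not how Scrapebox does it internally, just an illustration. The function name dedupe_urls and the normalisation rules (ignore http vs https, ignore the #fragment, ignore a trailing slash) are my own assumptions, and it expects every URL to include the http:// or https:// part.

```python
from urllib.parse import urlsplit


def dedupe_urls(urls, domain):
    """Return the first occurrence of each URL on `domain`, in original order.

    Hypothetical helper: two URLs count as duplicates when host, path
    (minus a trailing slash) and query string match. Scheme and fragment
    are ignored. URLs without a scheme are skipped.
    """
    seen = set()
    unique = []
    for raw in urls:
        url = raw.strip()
        if not url:
            continue  # skip blank lines from the export
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
        # Keep only the target domain and its subdomains (e.g. www.)
        if host != domain and not host.endswith("." + domain):
            continue
        key = (host, parts.path.rstrip("/"), parts.query)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique


if __name__ == "__main__":
    harvested = [
        "http://www.rootdomain.com/page-1",
        "http://www.rootdomain.com/page-1/",      # same page from another engine
        "http://www.rootdomain.com/page-1#top",   # same page with a fragment
        "http://www.otherdomain.com/page-1",      # not the domain we want
        "http://www.rootdomain.com/page-2?id=7",
    ]
    for url in dedupe_urls(harvested, "rootdomain.com"):
        print(url)
```

For the sample list this keeps page-1 once and page-2?id=7, and throws away the otherdomain.com entry.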
Problem solved. Then sit back and enjoy a beer.