ScrapeBox Forum
How to keep from scraping same sites twice??




How to keep from scraping same sites twice?? - ringer37 - 04-13-2011

So I've had ScrapeBox for a while now and have had good success with it. But I've been wondering how best to avoid harvesting the same sites over and over again.

When you harvest a new list of sites, how do you make sure you haven't already harvested them before, so that you're always scraping fresh sites?

I've thought about keeping a master list of all scraped sites and checking each new list against it, but I don't know how to do that without merging the master with the new list and removing duplicates, which wouldn't leave me with a fresh list of forums I've never posted to before.

I'd like to hear how you take care of this issue. Thanks.
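
For reference, the check described above does not require a merge. Merging and de-duplicating keeps one copy of every URL, old and new alike, whereas what is wanted is only the entries of the new list that are missing from the master. A minimal sketch in Python, outside ScrapeBox, assuming both lists are plain one-URL-per-line text (the function name and sample URLs are hypothetical):

```python
def fresh_urls(new_urls, master_urls):
    """Return the URLs in new_urls that are not in master_urls, keeping order."""
    # Exact-match comparison on the whole URL string (after trimming whitespace).
    seen = {u.strip() for u in master_urls if u.strip()}
    fresh = []
    for url in new_urls:
        url = url.strip()
        if url and url not in seen:
            fresh.append(url)
            seen.add(url)  # also drops repeats inside the new list itself
    return fresh


# Hypothetical sample data, for illustration only.
master = ["http://a.example/post1", "http://b.example/"]
new = ["http://b.example/", "http://c.example/page", "http://c.example/page"]

print(fresh_urls(new, master))  # ['http://c.example/page']
```

The master list is left untouched here; the fresh URLs can be appended to it afterwards as a separate step.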




RE: How to keep from scraping same sites twice?? - tcp01 - 04-15-2011

I keep a master list in a txt file, and I add the URLs to it.

So say I am harvesting a new set of auto approved blogs, for example. Load them into the URLs harvested box and then go into
Import URL List.
Then select the option to compare the URL list on a domain level.
Then, when prompted, load your master list to compare against.

This will remove from the new list any URLs that you already have on your master list.
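
What a domain-level comparison does can be sketched outside ScrapeBox as follows. This is only an illustration in Python, not ScrapeBox's own code: the function names and sample URLs are hypothetical, and treating `www.example.com` and `example.com` as the same domain is an assumption, since the thread does not say how ScrapeBox normalizes hosts.

```python
from urllib.parse import urlsplit


def domain_of(url):
    """Return the lowercased host of a URL, without a leading 'www.'."""
    url = url.strip()
    if "://" not in url:
        url = "http://" + url  # allow bare entries such as 'example.com/page'
    host = urlsplit(url).hostname or ""
    # Assumption: 'www.' is ignored when comparing domains.
    return host[4:] if host.startswith("www.") else host


def remove_known_domains(new_urls, master_urls):
    """Keep only the new URLs whose domain does not appear in the master list."""
    known = {domain_of(u) for u in master_urls if u.strip()}
    return [u.strip() for u in new_urls
            if u.strip() and domain_of(u) not in known]


# Hypothetical sample data, for illustration only.
master = ["http://www.blog-one.example/post?id=1"]
new = ["http://blog-one.example/other-post", "https://blog-two.example/a"]

print(remove_known_domains(new, master))  # ['https://blog-two.example/a']
```

Because the match is on the domain rather than the full URL, a different page on a site already in the master list is removed as well, and the master list itself is not modified.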





RE: How to keep from scraping same sites twice?? - ringer37 - 04-15-2011

(04-15-2011, 07:51 AM)tcp01 Wrote: I keep a master list in a txt file, and I add the URLs to it.

So say I am harvesting a new set of auto approved blogs, for example. Load them into the URLs harvested box and then go into
Import URL List.
Then select the option to compare the URL list on a domain level.
Then, when prompted, load your master list to compare against.

This will remove from the new list any URLs that you already have on your master list.

It removes duplicates from the new list without merging it together with the master? That's great news to me if true. Gonna give it a try. Thanks.