ScrapeBox Forum
Filtering out Domains?


Filtering out Domains? - telboy - 12-30-2010

So I've harvested quite a few links. Is it possible to extract links from the list based on their domain?

E.g., .govs or .edus?


RE: Filtering out Domains? - s4nt0s - 01-04-2011

You could do it with a program called TextCrawler. You'll have to use a regular expression (regex) to do it, so you might have to post on their forum to find out exactly how.
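If you'd rather script it than use TextCrawler, here is a minimal Python sketch of the same regex idea, assuming your harvested URLs are exported one per line into a list. The helper name `filter_by_tld` is hypothetical, and the pattern checks only the hostname, so a `.edu` that happens to appear in a path or query string won't cause a false match. Accepting two-letter country-code variants such as `.edu.au` or `.gov.uk` is an extra assumption you can drop.

```python
import re
from urllib.parse import urlsplit

# Matches a hostname ending in .gov or .edu, optionally followed by a
# two-letter country code (e.g. .edu.au, .gov.uk). The country-code part
# is an assumption; remove "(\.[a-z]{2})?" to match only plain .gov/.edu.
TLD_PATTERN = re.compile(r"\.(gov|edu)(\.[a-z]{2})?$")


def filter_by_tld(urls):
    """Return only the URLs whose hostname matches TLD_PATTERN (hypothetical helper)."""
    kept = []
    for url in urls:
        url = url.strip()
        if not url:
            continue  # skip blank lines from the exported list
        # urlsplit only fills in .hostname when a scheme is present,
        # so add one for bare entries like "www.example.edu/page".
        parsed = urlsplit(url if "://" in url else "http://" + url)
        host = parsed.hostname or ""  # .hostname is lowercased, port removed
        if TLD_PATTERN.search(host):
            kept.append(url)
    return kept


if __name__ == "__main__":
    harvested = [
        "http://www.agency.gov/blog",
        "https://example.com/post?id=1",
        "http://blogs.example.edu/hello-world/",
        "www.example.edu.au/page",
    ]
    for url in filter_by_tld(harvested):
        print(url)
```

On the sample list this keeps the `.gov`, `.edu` and `.edu.au` entries and drops the `.com` one.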

Instead of going through the hassle, you could always just use a custom footprint before harvesting that only finds .govs or .edus, so you won't have to sift through the results and filter them out afterward.

For example: site:.edu "powered by wordpress" "leave a reply"

That will only find WordPress blogs on .edu domains.
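If you want the same footprint for several domain types, a tiny script can generate one query per TLD for you to paste into ScrapeBox. This is just a sketch: `build_footprints` is a hypothetical name, and it assumes the search engine treats the `site:` operator the same way for each TLD you pass in.

```python
def build_footprints(footprint, tlds):
    """Prefix one base footprint with a site: operator for each TLD (hypothetical helper)."""
    queries = []
    for tld in tlds:
        tld = tld.lstrip(".")  # accept both "edu" and ".edu"
        queries.append(f"site:.{tld} {footprint}")
    return queries


if __name__ == "__main__":
    base = '"powered by wordpress" "leave a reply"'
    for query in build_footprints(base, ["edu", ".gov"]):
        print(query)
```

This prints one line per TLD, e.g. `site:.edu "powered by wordpress" "leave a reply"` followed by the `.gov` version.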