Cleaning Up Harvested URLs
#1
Hi there,

I'm sure there is an option for this, but I'm not sure which one or how it's done. We harvest lots of URLs, and the first step in our process is to remove duplicates.
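(For anyone wanting to replicate that dedupe step outside ScrapeBox, here is a rough Python sketch. The function name is my own for illustration, not anything from ScrapeBox; it keeps the first occurrence of each URL and preserves harvest order.)

```python
def dedupe(urls):
    """Remove duplicate URLs, keeping the first occurrence in harvest order."""
    seen = set()
    unique = []
    for u in urls:
        if u not in seen:
            seen.add(u)
            unique.append(u)
    return unique

harvested = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/a",  # duplicate, will be dropped
]
cleaned = dedupe(harvested)
```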

Then I want to remove the URLs containing certain words, like:

youtube.
wiki
cnn
bbc


What I want is perhaps to create a file for this. I did find a blacklist-words file and edited it, put those words in it, and ran the removal, but those URLs still remained, so maybe there is something wrong with how I am doing it.

It would also be great if you guys could guide me on how to harvest so that URLs containing these stop words are never harvested in the first place.

Thanks again
#2
You want to put those words in a file, 1 per line.

Then put your urls in the urls harvested grid in the upper right hand quadrant of scrapebox.

Then go to remove/filter >> remove urls containing entries from. Then select your file.
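(If you ever need to run the same filter outside ScrapeBox, here is a hedged Python sketch of what that menu option does, as I understand it: drop any URL whose text contains one of the blocklisted words. The file names and function names are my own assumptions, not ScrapeBox internals.)

```python
def filter_urls(urls, blocklist):
    """Return only the URLs that contain none of the blocklisted substrings."""
    return [u for u in urls if not any(word in u.lower() for word in blocklist)]

# One word per line, same as the file loopline describes.
blocklist_words = ["youtube.", "wiki", "cnn", "bbc"]

urls = [
    "https://www.youtube.com/watch?v=abc",
    "https://en.wikipedia.org/wiki/Example",
    "https://example.com/blog/post",
    "https://www.bbc.co.uk/news",
]
kept = filter_urls(urls, blocklist_words)
```

Note the matching is plain substring matching, so "wiki" also removes wikipedia.org URLs, which matches what the original poster listed.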
#3
(11-29-2018, 05:17 AM)loopline Wrote: You want to put those words in a file, 1 per line.  

Then put your urls in the urls harvested grid in the upper right hand quadrant of scrapebox.  

Then go to remove/filter >> remove urls containing entries from.  Then select your file.

Perfect, mate. Thanks!
#4
You're welcome. Cheers!