11-28-2018, 01:37 PM
Hi there,
I am sure there is an option for this, but I am not sure which one it is or how to use it. We harvest lots of URLs, and the first step in our process is to remove duplicates.
Then I want to remove the URLs that contain certain words, like:
youtube.
wiki
cnn
bbc
What I tried: I found a blacklist word file, edited it to include those words, and ran the removal, but those URLs still remained, so maybe there is something wrong with how I am doing it.
It would also be great if you could explain how to harvest so that URLs containing those stop words are not harvested in the first place.
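To make the goal concrete, here is a rough Python sketch of the result I am after, outside of any tool. The function name `filter_urls` and the sample URLs are made up for illustration; it only shows the dedupe-then-filter logic, assuming a plain case-insensitive substring match, and is not how the harvester itself works.

```python
from typing import Iterable, List


def filter_urls(urls: Iterable[str], blacklist: Iterable[str]) -> List[str]:
    """Remove duplicate URLs, then drop any URL containing a blacklisted word.

    Hypothetical helper: matching is a case-insensitive substring test.
    """
    # Normalize the blacklist: trim whitespace, lowercase, skip empty entries.
    words = [w.strip().lower() for w in blacklist if w.strip()]

    seen = set()
    kept = []
    for url in urls:
        u = url.strip()
        if not u or u in seen:
            continue  # skip blank lines and exact duplicates
        seen.add(u)
        if any(w in u.lower() for w in words):
            continue  # URL contains a blacklisted word
        kept.append(u)
    return kept


# Made-up sample data for illustration only.
harvested = [
    "https://www.youtube.com/watch?v=1",
    "https://example.com/page",
    "https://example.com/page",
    "https://en.wikipedia.org/wiki/URL",
    "https://edition.cnn.com/world",
    "https://www.bbc.co.uk/news",
    "https://example.org/",
]
stop_words = ["youtube.", "wiki", "cnn", "bbc"]

clean = filter_urls(harvested, stop_words)
print(clean)
```

One thing to keep in mind with a substring match like this: a short word such as `wiki` also removes any unrelated URL that happens to contain it.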
Thanks again