ScrapeBox Forum
Why Does SB return results that don't have the keywords Im searching for? - Printable Version


Why Does SB return results that don't have the keywords Im searching for? - Nikes - 04-05-2011

Say I'm scraping for ArticleDB sites.

I use something like: "Powered by Article Dashboard".

Then I get around 80k results. After removing dupes I'm left with, say, 10k.

But many of them are not ArticleDB sites and don't even have the text "Powered by Article Dashboard" anywhere on the page. So why does SB return these?
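
One way to weed out such false positives after harvesting is to check each downloaded page for the footprint text yourself. The snippet below is only a rough sketch of that idea, not anything ScrapeBox does: `contains_footprint` and `filter_pages` are hypothetical helper names, and it assumes you already have the page text for each URL in a dictionary.

```python
import re
from typing import Dict, List


def _normalize(text: str) -> str:
    # Collapse runs of whitespace and lowercase, so line breaks or
    # double spaces in the HTML do not hide the phrase.
    return re.sub(r"\s+", " ", text).strip().lower()


def contains_footprint(page_text: str, footprint: str) -> bool:
    """Return True if the footprint phrase occurs in the page text."""
    return _normalize(footprint) in _normalize(page_text)


def filter_pages(pages: Dict[str, str], footprint: str) -> List[str]:
    """Keep only the URLs whose page text contains the footprint.

    `pages` maps URL -> already-downloaded page text (an assumption
    of this sketch; fetching the pages is left out).
    """
    return [url for url, text in pages.items()
            if contains_footprint(text, footprint)]
```

A check like this only sees text that is literally in the downloaded source, so a footer rendered as an image or by a script would still be missed.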


RE: Why Does SB return results that don't have the keywords Im searching for? - Nikes - 04-07-2011

Oh wow, 50+ views and not an answer? Where's the support?
Another thing while I'm here. Why does SB return duplicate domains? Say I'm only harvesting from Google. I get a big list of, say, 5k sites. I remove dupes and I'm left with about 100.
Why bring me a result it's already got?

Also, how do I remove URLs containing any of several things I want removed? Or do you just have to do it over and over?

Thanks.


RE: Why Does SB return results that don't have the keywords Im searching for? - s4nt0s - 04-08-2011

First, I wanted to say that this isn't really an official ScrapeBox support forum. We're just a growing community of ScrapeBox fans who try to help each other when we have free time and know the answers to the questions. I say this because I don't want our lack of "support" to reflect on Sweetfunny (the ScrapeBox developer). This forum and the developer aren't affiliated.

It looks like you need to make a better footprint than "Powered by Article Dashboard". Look for other things on these web pages you can add to your footprint to narrow it down even more. For example, if it's an article site it might say "submit article", so you could add that to your footprint like this:

"Powered by Article Dashboard" And "submit article" And "your keywords"

The footprints can be as long as you want so try to get as much good info in there as possible.
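
If you have a lot of keywords, the combined footprints can be generated instead of typed by hand. This is just a minimal sketch: `build_query` and `queries_for_keywords` are made-up helper names, and the " And " joiner simply mirrors the example above (how a given search engine interprets it is not something this sketch checks).

```python
def build_query(phrases, joiner=" And "):
    """Wrap each phrase in double quotes and join them into one footprint."""
    cleaned = [p.strip().strip('"') for p in phrases if p.strip()]
    return joiner.join('"{}"'.format(p) for p in cleaned)


def queries_for_keywords(base_phrases, keywords, joiner=" And "):
    """Build one footprint per keyword: the base phrases plus that keyword."""
    return [build_query(list(base_phrases) + [kw], joiner) for kw in keywords]


if __name__ == "__main__":
    base = ["Powered by Article Dashboard", "submit article"]
    # "dog training" and "golf" are placeholder keywords for illustration.
    for query in queries_for_keywords(base, ["dog training", "golf"]):
        print(query)
```

Each printed line can then be used as one footprint-plus-keyword search.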

Also, the reason you're getting so many duplicated domains is probably because of your footprint and maybe using closely related keywords. Continue to tweak your footprint to get better results.

As far as removing URLs containing more than one thing you want removed, I'm not sure if ScrapeBox can do that. You might need to use a free text program like Text Crawler. You will need some basic knowledge of regex to make it work.
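
For anyone comfortable with a little scripting, the same regex idea can be sketched outside of any particular tool. This is only an illustration under my own assumptions: `remove_urls_containing` is a hypothetical helper, and the unwanted strings in the usage lines are placeholders. It joins all unwanted strings into a single alternation pattern so the list is filtered in one pass.

```python
import re


def remove_urls_containing(urls, unwanted):
    """Drop every URL that contains any of the unwanted strings.

    Matching is case-insensitive and checks the whole URL, so ".pdf"
    matches anywhere in the URL, not only at the end.
    """
    unwanted = [u for u in unwanted if u]
    if not unwanted:
        return list(urls)
    # re.escape keeps characters such as "." literal inside the pattern.
    pattern = re.compile("|".join(re.escape(u) for u in unwanted),
                         re.IGNORECASE)
    return [url for url in urls if not pattern.search(url)]


if __name__ == "__main__":
    # A comma-separated list can be split into the unwanted strings.
    tokens = [t.strip() for t in "wordpress.com,.pdf,.doc".split(",")]
    harvested = [
        "http://foo.wordpress.com/post",
        "http://example.com/paper.PDF",
        "http://example.com/articles/1",
    ]
    print(remove_urls_containing(harvested, tokens))
```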


RE: Why Does SB return results that don't have the keywords Im searching for? - Nikes - 04-09-2011

Hi, thanks for that.

Still though, SB returns duplicate sites. Why not just skip a site it's already found? Wouldn't that be easier and save time?

Also, regarding removing more than one type of URL from the harvested URLs, and you saying "I'm not sure if Scrapebox can do that."
There is an option to "Remove URL's containing", so you can remove, say, any URL containing blogspot.com or wordpress.com, or remove any URLs that end in .pdf or .doc or .txt, for example.

What I mean is: instead of doing them one by one, is it possible to enter everything at once, something like wordpress.com,blogger.com,livejournal.com,.pdf,.doc,.txt etc., so that any URL containing any of those is removed? Or even add these to a blacklist so they are removed automatically upon harvesting?


RE: Why Does SB return results that don't have the keywords Im searching for? - s4nt0s - 04-10-2011

Scrapebox does automatically remove duplicate domains after harvesting. All you have to do is enable it by going to options and selecting, "automatically remove duplicate domains".
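
To make it clear what removing duplicate domains amounts to, here is a rough sketch that keeps only the first URL seen for each host. It is not how ScrapeBox itself implements the option (that isn't documented here); `remove_duplicate_domains` is a hypothetical helper, and treating "www.example.com" and "example.com" as the same domain is my own assumption.

```python
from urllib.parse import urlsplit


def remove_duplicate_domains(urls):
    """Keep the first URL for each host and drop later ones."""
    seen = set()
    kept = []
    for url in urls:
        host = urlsplit(url).hostname
        if host is None:
            # URLs without a scheme ("example.com/page") have no netloc,
            # so retry with a leading "//".
            host = urlsplit("//" + url).hostname
        if not host:
            continue  # skip blank or unparseable lines
        if host.startswith("www."):
            host = host[4:]  # assumption: www and bare host are one domain
        if host not in seen:
            seen.add(host)
            kept.append(url)
    return kept
```

The same `seen` set could in principle be consulted while URLs are still coming in, rather than only once the list is complete.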

I don't entirely understand your other questions so I'll let someone else chime in with the answer.


RE: Why Does SB return results that don't have the keywords Im searching for? - Nikes - 04-10-2011

(04-10-2011, 07:36 PM)s4nt0s Wrote: Scrapebox does automatically remove duplicate domains after harvesting. All you have to do is enable it by going to options and selecting, "automatically remove duplicate domains".

I don't entirely understand your other questions so I'll let someone else chime in with the answer.

This is true, but it still only does it upon harvest completion, after it tells you which keywords were used and which ones weren't. My point is: why does it even put a URL in the harvested list when it's already found it? Doesn't this just unnecessarily use up proxies?

I'd still like to know how to remove several kinds of URLs in one go with the "Remove URL's containing" feature, rather than having to do them one by one.

Anyone?