ScrapeBox Forum
Unexpected results when using site search

Unexpected results when using site search - gbmarkham - 12-01-2017

I'm using ScrapeBox to help with some research in the real estate market. Using Bing, I'm trying to harvest a list of URLs from a specific site (Realtor.com), one URL for each address. I've copied the addresses into the keyword list box, one per line, with each address enclosed in quotes. I also have "site:realtor.com" in the footprint field. From a list of 30 addresses, 5 failed to return a URL.

While troubleshooting this, I pulled up Bing and entered the search as "site:realtor.com '108 7TH ST ASHLAND OR 97520'". Normally this shows the URL I want at the top of the page, but in this case it doesn't show it at all (at least not on the first 3 pages of results). If I omit the site: operator, the URL shows up as the first result. However, adjusting my ScrapeBox config to account for this (removing the footprint altogether) gives less-than-desirable results. I only want results from realtor.com for each address, but I'm getting a mix of realtor.com and zillow.com.

In this example, the URL I'm trying to grab is: https://www.realtor.com/realestateandhomes-detail/108-7th-St_Ashland_OR_97520_M18700-82449
Once I have the list of correct URLs, the plan is to revisit each page and harvest more real estate stats (year built, # beds/baths, sq ft, etc.).

Any ideas on how to configure this properly for the desired outcome?

Thanks,

Greg


RE: Unexpected results when using site search - loopline - 12-09-2017

Well, you can't necessarily force Bing to give you different results.

So you can just harvest all the results without "site:realtor.com", and when that's done go to filter >> remove URLs not containing, then enter realtor.com. ScrapeBox will filter out all the non-realtor.com URLs.

That's one option.
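
If you ever want to do the same filtering outside ScrapeBox, on an exported URL list, here is a minimal Python sketch. The function name keep_domain and the zillow.com URL are made up for illustration. Note that it compares the host name rather than doing a plain "contains" match, which is a slightly stricter check than the filter described above.

```python
from urllib.parse import urlparse


def keep_domain(urls, domain="realtor.com"):
    """Return only the URLs whose host is `domain` or a subdomain of it."""
    kept = []
    for url in urls:
        # hostname is None for malformed URLs, so fall back to an empty string
        host = (urlparse(url).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            kept.append(url)
    return kept


harvested = [
    "https://www.realtor.com/realestateandhomes-detail/108-7th-St_Ashland_OR_97520_M18700-82449",
    "https://www.zillow.com/homedetails/example",  # hypothetical URL for illustration
]

for url in keep_domain(harvested):
    print(url)
```

Checking the host means a URL that only mentions realtor.com somewhere in its path or query string is dropped as well.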

Otherwise, run them all with "site:", export the ones that weren't found, and then either grab those manually or try the filter method on them.
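
To work out which addresses came back without a URL, a rough sketch like the one below might help. It assumes the realtor.com URL spells out the address the way the example URL in the first post does (108-7th-St_Ashland_OR_97520_...), which will not hold if the site abbreviates a street name differently. The helper names and the second address are hypothetical.

```python
import re
from urllib.parse import urlparse


def tokens(text):
    """Lowercase alphanumeric tokens, e.g. '108 7TH ST' -> {'108', '7th', 'st'}."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def not_found(addresses, urls):
    """Return the addresses whose tokens do not all appear in any URL path."""
    url_tokens = [tokens(urlparse(u).path) for u in urls]
    return [a for a in addresses
            if not any(tokens(a) <= ut for ut in url_tokens)]


addresses = [
    '"108 7TH ST ASHLAND OR 97520"',
    '"1 MAIN ST ANYTOWN OR 00000"',  # hypothetical address for illustration
]
urls = [
    "https://www.realtor.com/realestateandhomes-detail/108-7th-St_Ashland_OR_97520_M18700-82449",
]

for address in not_found(addresses, urls):
    print(address)
```

Anything this prints is a candidate for a second pass without the footprint, or for a manual lookup.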