ScrapeBox Forum
Why not all URLs are returned from site search?




Why not all URLs are returned from site search? - theone - 06-15-2014

I have encountered the same problem with GScraper as well.

I am trying to scrape all URLs on a domain that contain this string:

?page=

A site search in Google returns results containing the string above; however, ScrapeBox and GScraper do not.

What is the problem? Thanks.


RE: Why not all URLs are returned from site search? - loopline - 06-15-2014

What exact footprint do you enter into Google (in a web browser, I mean) to get the results you want? There are a lot of possible answers, so I just need to see what you're seeing in a browser; then I can be specific in my answer and tell you how to get the same thing in ScrapeBox. As for GScraper, I don't give any help with it, as ScrapeBox is a superior product and there's no need for GScraper to exist IMHO. :)


RE: Why not all URLs are returned from site search? - theone - 06-16-2014

In a browser this is what I input:

Quote:
site:domain.com inurl:?page=
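For reference, a footprint like this is sent to Google as the `q` parameter of a search URL, and the `?` and `=` in `?page=` are special characters in URLs, so they must be percent-encoded. The minimal Python sketch below assumes the standard public `https://www.google.com/search` endpoint; it only illustrates the query format and is not how ScrapeBox builds its own requests. `domain.com` is the placeholder domain from this thread, and `google_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def google_search_url(footprint: str) -> str:
    # Assumption: the standard public Google search endpoint; ScrapeBox's
    # harvester may construct its requests differently.
    base = "https://www.google.com/search"
    # urlencode() percent-encodes ':', '?' and '=' and turns spaces into '+'.
    return base + "?" + urlencode({"q": footprint})


footprint = "site:domain.com inurl:?page="
print(google_search_url(footprint))
# https://www.google.com/search?q=site%3Adomain.com+inurl%3A%3Fpage%3D
```

Pasting the printed URL into a browser should run the same search as typing the footprint into Google's search box.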



RE: Why not all URLs are returned from site search? - loopline - 06-16-2014

(06-16-2014, 07:34 AM) theone wrote: In a browser this is what I input:

Quote:
site:domain.com inurl:?page=

Well, that works fine in ScrapeBox. You didn't give the exact domain, so I can't really troubleshoot further.

I suspect the cause is either bad proxies or HTTPS URLs.

Have a look in Google and visit the pages: are the URLs HTTPS? If so, they will only work with the custom harvester in ScrapeBox.

If they are not HTTPS, go to Settings and uncheck both "use custom harvester" and "use multi-threaded harvester". Then run a harvest and tell me what errors you get in the status column. Screenshots are welcome too.

If you still can't get it working, PM me the exact domain you're working with and I can test it more specifically.


RE: Why not all URLs are returned from site search? - theone - 06-17-2014

@loopline

It worked this time after I added inurl:?page= to site:domain.com.
It's strange that the site: operator alone did not pull the pages containing the string while adding inurl: did, because in Google site: alone works for me and I saw all URLs, including those containing the string.


RE: Why not all URLs are returned from site search? - loopline - 06-17-2014

(06-17-2014, 06:55 AM) theone wrote: @loopline

It worked this time after I added inurl:?page= to site:domain.com.
It's strange that the site: operator alone did not pull the pages containing the string while adding inurl: did, because in Google site: alone works for me and I saw all URLs, including those containing the string.

Well, glad it's working anyway.