ScrapeBox Forum
Scraping Google News

Scraping Google News - Nosh - 08-30-2020

Hi everybody!
I would like to scrape results from Spanish Google News. I tried this query string, but I get 0 results. Can anybody help?
https://www.google.com/search?q={KEYWORD}&complete=0&hl=es&pws=0&sxsrf=ALeKk03HvZBF2uo3jX6CcGFg93SSRRAksg:1598728775740&source=lnms&tbm=nws&sa=X&ved=2ahUKEwizr5TmkMHrAhXfyTgGHeIiBcAQ_AUoAnoECGwQBA&biw=1440&bih=489
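The URL above was copied from a browser session, and several of its parameters look like session or layout state rather than part of the query. A minimal sketch for trimming such a URL down, assuming that only `q`, `hl`, `tbm`, `pws` and `complete` define the search and that the rest (`sxsrf`, `ved`, `biw`, `bih`, `sa`, `source`) can be dropped. The keep-list and the function name are assumptions for illustration, not documented Google behaviour:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical keep-list: parameters assumed to define the query itself.
# Everything else (sxsrf, ved, biw, bih, sa, source, ...) is assumed to be
# browser-session or layout state and is dropped.
KEEP_PARAMS = {"q", "hl", "tbm", "pws", "complete"}


def strip_session_params(url: str) -> str:
    """Return url with only the query parameters listed in KEEP_PARAMS."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k in KEEP_PARAMS]
    # safe="{}" keeps a {KEYWORD} placeholder readable instead of %7B...%7D
    query = urlencode(kept, safe="{}")
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


if __name__ == "__main__":
    original = ("https://www.google.com/search?q={KEYWORD}&complete=0&hl=es&pws=0"
                "&sxsrf=ALeKk03HvZBF2uo3jX6CcGFg93SSRRAksg:1598728775740"
                "&source=lnms&tbm=nws&sa=X"
                "&ved=2ahUKEwizr5TmkMHrAhXfyTgGHeIiBcAQ_AUoAnoECGwQBA"
                "&biw=1440&bih=489")
    print(strip_session_params(original))
```

Parameter order is preserved, so the trimmed URL stays easy to compare with the original.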


RE: Scraping Google News - serialscraper - 08-30-2020

Start here, and let me know if you run into any problems:

https://news.google.com/search?q={KEYWORD}&hl=es


RE: Scraping Google News - Nosh - 08-31-2020

Hi,
there is no "real" Google News in Spain, and that's the problem.


RE: Scraping Google News - serialscraper - 08-31-2020

Are you looking for a Spanish news website? If so, let me know which one so that we have a starting point.


RE: Scraping Google News - Nosh - 09-01-2020

I want to scrape this:
https://www.google.es/search?q=bla+bla&client=safari&sxsrf=ALeKk01TSBVY3_J38pz6FucI2kv5X6zYFw:1598959266718&source=lnms&tbm=nws&sa=X&ved=2ahUKEwjRlOm468frAhVTTcAKHZ_9A2wQ_AUoBHoECHgQBg&biw=1694&bih=984

In Spain it's not a "real" Google News section because of this:
https://www.newsmediaalliance.org/google-news-shutdown-in-spain-not-as-bad-as-google-would-have-you-believe/


RE: Scraping Google News - Nosh - 09-08-2020

It should be something like this: https://www.google.se/search?q={KEYWORD}&source=lnms&tbm=nws but I get 0 results.
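To check a template by hand, it helps to see the exact URL it produces for a given keyword. A small sketch, assuming the `{KEYWORD}` placeholder is replaced with a form-encoded keyword (spaces become `+`, as in the `q=bla+bla` URL earlier in the thread); whether ScrapeBox encodes keywords exactly this way is an assumption, and `fill_template` is a hypothetical helper:

```python
from urllib.parse import quote_plus


def fill_template(template: str, keyword: str) -> str:
    """Substitute a form-encoded keyword for every {KEYWORD} placeholder."""
    return template.replace("{KEYWORD}", quote_plus(keyword))


# Templates taken from this thread.
TEMPLATES = [
    "https://www.google.se/search?q={KEYWORD}&source=lnms&tbm=nws",
    "https://news.google.com/search?q={KEYWORD}&hl=es",
]

if __name__ == "__main__":
    for t in TEMPLATES:
        # Paste the printed URLs into a browser to see what should come back.
        print(fill_template(t, "bla bla"))
```

Non-ASCII keywords are percent-encoded as UTF-8 (for example `año` becomes `a%C3%B1o`), which matters for Spanish keyword lists.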


RE: Scraping Google News - Nosh - 09-11-2020

Can anybody help me out with this one?


RE: Scraping Google News - loopline - 09-11-2020

When you run the engine test, what happens? On the test page (on the screen where you set up the engine), it can also help to save the raw HTML after the test, because sometimes the HTML that ScrapeBox ultimately sees is different from what you see in a browser.
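To get a comparable view outside ScrapeBox, you can fetch the page with any HTTP client that does not run JavaScript. A minimal sketch, assuming a plain `urllib` request is close enough to what the engine sends (it will not be identical, since headers, cookies and proxies differ). `save_raw_html` and the User-Agent string are illustrative assumptions:

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def save_raw_html(url: str, out_path: str, timeout: float = 15.0) -> Tuple[Optional[int], int]:
    """Fetch url without executing JavaScript and write the raw body to out_path.

    Returns (HTTP status or None, number of bytes written).
    """
    # Hypothetical User-Agent; set it to whatever your harvester actually sends.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            status = getattr(resp, "status", None)
            body = resp.read()
    except urllib.error.HTTPError as err:
        # Error and block pages (404, 429, ...) are worth saving too.
        status, body = err.code, err.read()
    with open(out_path, "wb") as fh:
        fh.write(body)
    return status, len(body)


if __name__ == "__main__":
    status, size = save_raw_html(
        "https://www.google.se/search?q=bla+bla&tbm=nws", "raw.html")
    print(f"HTTP {status}, saved {size} bytes to raw.html")
```

Opening `raw.html` in a text editor (not a browser) shows the markup before any script runs.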


RE: Scraping Google News - Nosh - 09-12-2020

It says: "Error 0. No links could be retrieved"


RE: Scraping Google News - loopline - 09-12-2020

OK, so save the raw HTML. Once you do that, you can find the markers.

It's one of three things:

1 - You're getting some sort of general error, like a 404 or an IP-block error, in which case looking at the raw HTML should show that pretty easily.

2 - The links are rendered with JavaScript, in which case ScrapeBox won't be able to see them. So check the raw HTML for the links and see whether they are actually there.

3 - The links are there, but your before/after markers are wrong. In that case, look at the raw HTML and determine the correct before and after markers.
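The three checks above can be roughly automated against a saved raw HTML file. A sketch under stated assumptions: markers are plain strings and each link is the text between a before marker and the next after marker (check this against how your engine is actually configured); the block-page phrases, the `diagnose` helper and the example domain are hypothetical:

```python
import re
from typing import List

# Hypothetical phrases that suggest an error or block page (case 1).
BLOCK_HINTS = ("unusual traffic", "captcha")


def extract_between(html: str, before: str, after: str) -> List[str]:
    """Return each substring found between a before marker and the next after marker."""
    pattern = re.escape(before) + "(.*?)" + re.escape(after)
    return re.findall(pattern, html, flags=re.DOTALL)


def diagnose(html: str, before: str, after: str, expected: str) -> str:
    """Sort a saved raw HTML page into one of loopline's three cases.

    expected: a URL or domain you can see in the results in your browser.
    """
    if extract_between(html, before, after):
        return "ok"                # markers already match something
    if any(hint in html.lower() for hint in BLOCK_HINTS):
        return "error_or_block"    # case 1
    if expected not in html:
        return "not_in_raw_html"   # case 2: probably added by JavaScript
    return "markers_wrong"         # case 3: link is there, markers miss it


if __name__ == "__main__":
    with open("raw.html", encoding="utf-8", errors="replace") as fh:
        page = fh.read()
    # Placeholder markers and domain; replace with your own.
    print(diagnose(page, '<a href="', '"', "elpais.com"))
```

Because `expected` is matched as a plain substring, a bare domain is safer than a full URL, which may appear HTML-escaped (`&amp;`) in the raw page. An "ok" result only means the markers matched something, so still check that the extracted strings are the result links you want.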