ScrapeBox Forum
Scraping text between <H>tags

Scraping text between <H>tags - kendog819 - 01-06-2017

I am trying to scan the source code of a page, the way PageScanner does, and grab the content between the <h2> and </h2> tags. I thought PageScanner would do this, but it seems it won't.

Can someone point me in the right direction?

Thanks,
Ken
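Outside ScrapeBox, the same extraction takes only a few lines of Python. This is a minimal sketch, not how PageScanner or any ScrapeBox addon works internally. It assumes the page source has already been downloaded into a string (for example with `urllib.request`). It uses the standard-library `html.parser` rather than a regex, so attributes on the tag (`<h2 class="...">`) and links nested inside the heading are handled. The names `H2Collector` and `extract_h2` are hypothetical.

```python
from html.parser import HTMLParser


class H2Collector(HTMLParser):
    """Collects the text inside every <h2>...</h2> element of a page."""

    def __init__(self):
        super().__init__()   # convert_charrefs=True, so &amp; arrives as "&"
        self.inside = 0      # > 0 while the parser is between <h2> and </h2>
        self.parts = []      # text fragments of the heading being read
        self.headings = []   # finished headings, in document order

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so <H2> also matches here.
        if tag == "h2":
            self.inside += 1

    def handle_endtag(self, tag):
        if tag == "h2" and self.inside:
            self.inside -= 1
            if not self.inside:
                # Collapse newlines/indentation into single spaces.
                self.headings.append(" ".join("".join(self.parts).split()))
                self.parts = []

    def handle_data(self, data):
        # Text from nested tags (e.g. an <a> inside the <h2>) is kept too.
        if self.inside:
            self.parts.append(data)


def extract_h2(html: str) -> list[str]:
    """Return the text of all <h2> headings in an HTML string."""
    parser = H2Collector()
    parser.feed(html)
    parser.close()
    return parser.headings


if __name__ == "__main__":
    page = ('<h1>My blog</h1>'
            '<h2 class="entry-title"><a href="/post-1">First post</a></h2>'
            '<p>Body text</p>'
            '<H2>Second\n    post</H2>')
    print(extract_h2(page))  # ['First post', 'Second post']
```

A parser is used instead of a pattern like `<h2>(.*?)</h2>` because real pages often add attributes to the tag, wrap the title in a link, or split it across lines. A simple regex would miss or mangle those cases.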


RE: Scraping text between <H>tags - Hambone Oblivion - 01-10-2017

(01-06-2017, 08:48 PM) kendog819 Wrote: I am trying to scan the source code of a page like PageScanner would do and grab the content that falls between <h2> and </h2> tags on the pages. I thought that page scanner would do this, but it seems like it won't.

Ken, great question. I was wondering the same thing: I want to scrape the post title from a blog page, which is typically in the <h2> tag. Any help is appreciated.


RE: Scraping text between <H>tags - kendog819 - 01-10-2017

(01-10-2017, 12:13 AM) Hambone Oblivion Wrote:
Ken, great question. I was wondering the same thing: I want to scrape the post title from a blog page, which is typically in the <h2> tag. Any help is appreciated.

https://www.youtube.com/watch?v=X3Ep-NXg4kY

This is the only method I have found that works. It would be nice if it could export to a CSV or XLS sheet, though; right now it only exports to a text file.
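Excel opens CSV files directly, so writing each heading next to its source URL would avoid the plain text export. This is a minimal sketch using Python's standard `csv` module, not a ScrapeBox feature. The column names `url` and `h2` and the function name `write_headings_csv` are illustrative assumptions. It writes one row per heading, so a page with several <h2> tags produces several rows with the same URL.

```python
import csv
import io


def write_headings_csv(rows, out):
    """Write (url, heading) pairs to `out` as CSV, with a header row.

    The csv module quotes commas and double quotes inside a heading,
    so Excel reads each heading as a single cell.
    """
    writer = csv.writer(out)
    writer.writerow(["url", "h2"])
    writer.writerows(rows)


if __name__ == "__main__":
    rows = [
        ("http://example.com/post-1", "First post"),
        ("http://example.com/post-2", 'Tips, tricks and "hacks"'),
    ]
    # For a real file, open it with newline="" as the csv docs recommend:
    # with open("headings.csv", "w", newline="", encoding="utf-8") as f:
    #     write_headings_csv(rows, f)
    buf = io.StringIO()
    write_headings_csv(rows, buf)
    print(buf.getvalue(), end="")
```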


RE: Scraping text between <H>tags - Hambone Oblivion - 01-10-2017

(01-10-2017, 02:45 PM) kendog819 Wrote:
https://www.youtube.com/watch?v=X3Ep-NXg4kY

This is the only method I have found that works. It would be nice if it could export to a CSV or XLS sheet, though; right now it only exports to a text file.

I tried this, and it works. You are correct that it dumps everything into a txt file. I then have to go into Excel and use INDEX and MATCH functions to line up each <h2> heading with its URL. I want to use this information in the comments field, so I organized my data to run line by line. What I don't know, and haven't tested yet, is whether SB goes line by line in the Fast Poster. In other words, if I have names, websites, comments and blogs all in the same order, from line 1 onward, will SB use line 1 of each together, then line 2, and so on? I think it does; I just haven't tested it. Does anyone know for sure?
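The INDEX/MATCH step is a lookup join on the URL. This is a minimal Python sketch under two assumptions: the scraped results are available as (url, heading) pairs, however the text export is split into lines, and the first <h2> on each page is the post title. Neither assumption is confirmed for ScrapeBox's export format. The helper names `first_heading_per_url` and `align_by_url` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def first_heading_per_url(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Keep only the first heading seen for each URL (assumed to be the post title)."""
    titles: Dict[str, str] = {}
    for url, heading in pairs:
        titles.setdefault(url, heading)  # later <h2>s on the same page are ignored
    return titles


def align_by_url(ordered_urls: Iterable[str], titles: Dict[str, str],
                 missing: str = "") -> List[str]:
    """Return one title per URL, in the order of ordered_urls.

    This is the same lookup an Excel INDEX/MATCH does: find the URL in the
    scraped data and return its heading, or `missing` if none was found.
    """
    return [titles.get(url, missing) for url in ordered_urls]


if __name__ == "__main__":
    scraped = [
        ("http://b.example/post", "Second blog title"),
        ("http://a.example/post", "First blog title"),
        ("http://a.example/post", "Related posts"),  # later <h2>, ignored
    ]
    blogs = ["http://a.example/post", "http://b.example/post", "http://c.example/post"]
    for url, title in zip(blogs, align_by_url(blogs, first_heading_per_url(scraped))):
        print(url, "->", title or "(no <h2> found)")
```

Because the lookup is keyed on the URL rather than on line position, the output stays aligned with the blog list even when the scraped pages come back in a different order or some pages have no <h2>.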


RE: Scraping text between <H>tags - loopline - 01-10-2017

No, the Fast Poster picks them at random.