Thread: Scraping text between <H>tags (/Thread-scraping-text-between-h-tags)
Forum: ScrapeBox Forum (https://www.scrapeboxforum.com), General ScrapeBox Talk (https://www.scrapeboxforum.com/Forum-general-scrapebox-talk)
Scraping text between <H>tags - kendog819 - 01-06-2017

I am trying to scan the source code of a page, like Page Scanner would do, and grab the content that falls between the <h2> and </h2> tags on the pages. I thought that Page Scanner would do this, but it seems like it won't. Can someone point me in the direction of being able to accomplish this?

Thanks,
Ken

RE: Scraping text between <H>tags - Hambone Oblivion - 01-10-2017

(01-06-2017, 08:48 PM)kendog819 Wrote: I am trying to scan the source code of a page like PageScanner would do and grab the content that falls between <h2> and </h2> tags on the pages. I thought that page scanner would do this, but it seems like it won't.

Ken, great question. I was wondering the same thing: I want to be able to scrape the post title from a blog page, which is typically in the <h2> tag. Any help is appreciated.

RE: Scraping text between <H>tags - kendog819 - 01-10-2017

(In reply to Hambone Oblivion's post of 01-10-2017, 12:13 AM)

https://www.youtube.com/watch?v=X3Ep-NXg4kY

This seems to be the only way I have found it to work. It would be nice if it were exportable to a CSV or XLS sheet, though. Right now it just exports to a text doc.

RE: Scraping text between <H>tags - Hambone Oblivion - 01-10-2017

(In reply to kendog819's post of 01-10-2017, 02:45 PM)

I tried this, and it works. You are correct, it dumps into a txt file.
I then have to go into Excel and use some INDEX and MATCH functions to get the <h2> tags to match their respective URLs. I want to use this information in the comments field, so I organized my info to run line by line. What I don't know, and haven't tested yet, is whether SB goes line by line in the fast poster. That is, if I have names, websites, comments and blogs all in a particular order from 1 to whatever, will SB go through them line by line? I think it does, I just haven't tested it. Anyone know for sure?

RE: Scraping text between <H>tags - loopline - 01-10-2017

No, fast poster picks things randomly.
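
For anyone who would rather skip the txt file plus Excel INDEX/MATCH step, here is a rough sketch of the same idea done outside ScrapeBox: pull the text between <h2> and </h2> out of page source and write it to a CSV with the URL on the same row. This is not a ScrapeBox feature or anything from the video above, just plain Python using the standard library. It assumes you already have the HTML for each page; the names `H2Collector`, `extract_h2`, `write_rows` and the sample pages are made up for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class H2Collector(HTMLParser):
    """Collects the text found inside each <h2>...</h2> element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.headings = []   # finished heading strings
        self._parts = None   # text chunks of the <h2> being read, else None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so <H2> is matched too.
        if tag == "h2":
            self._parts = []

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._parts is not None:
            # Collapse runs of whitespace and skip empty headings.
            text = " ".join("".join(self._parts).split())
            if text:
                self.headings.append(text)
            self._parts = None


def extract_h2(html):
    """Return the text of every <h2> in the given HTML string."""
    parser = H2Collector()
    parser.feed(html)
    parser.close()
    return parser.headings


def write_rows(pages, out):
    """pages: iterable of (url, html) pairs; out: writable text file object."""
    writer = csv.writer(out)
    writer.writerow(["url", "h2"])
    for url, html in pages:
        for heading in extract_h2(html):
            writer.writerow([url, heading])


# Hypothetical sample data, only here to show the shape of the input.
sample_pages = [
    ("https://example.com/post-1",
     "<html><body><h2 class='t'>First <a href='#'>Post</a></h2></body></html>"),
    ("https://example.com/post-2",
     "<h2>Second &amp; Last</h2><h2>  Another one </h2>"),
]

buffer = io.StringIO()
write_rows(sample_pages, buffer)
print(buffer.getvalue())
```

To write a real file, pass `open("h2.csv", "w", newline="", encoding="utf-8")` instead of the `StringIO` buffer. Because each heading sits on the same row as its URL, the pairing survives in your own spreadsheet without any matching step.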