ScrapeBox Forum
Scraping FreeIndex.co.uk - Printable Version


Scraping FreeIndex.co.uk - tonymgarner - 09-15-2017

Hi

I have used the custom scraper to gather contacts from the FreeIndex.co.uk site. That works fine.

I also wanted to grab the websites. I have tried the external links option (only YouTube links were returned) and the Custom Data option (no data gathered).

This is one of the URLs I am scraping:

https://www.freeindex.co.uk/profile(!daft!-cover-band)_287978.htm

I used the before_after= approach to grab the addresses but, even though I can see the web address in the page source (viewed in IE, press F12), I cannot seem to get at it.

Is it me or is there a foul HTML demon at play here :-)

Thanks in advance


RE: Scraping FreeIndex.co.uk - loopline - 09-15-2017

That's because there is no external link. Here is the code:

<a href="/record_click.asp?id=120635&ctype=profile" target="_blank" class="u" title="http://www.daftonline.co.uk">www.daftonline.co.uk</a>

You can see that the title and the anchor text are the link you want, but the actual href is an internal link. So it probably passes the title/anchor to the internal recording system, counts the click and redirects. But there is no actual valid external link there.

You might get the custom data scraper to work with some regex, matching the title to the anchor or looking for the record_click link and grabbing the anchor or title after it. Not sure, I'm not a regex expert, so I don't even know if it's possible.
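
For what it's worth, here is a rough sketch of the regex idea outside ScrapeBox, in plain Python. It is only an illustration, not how the custom data scraper works internally. It assumes the href attribute comes before the title attribute inside the tag, as in the snippet above. The names WEBSITE_RE and extract_websites are made up for the example.

```python
import re

# Sample anchor copied from the profile page source quoted above.
html = (
    '<a href="/record_click.asp?id=120635&ctype=profile" target="_blank" '
    'class="u" title="http://www.daftonline.co.uk">www.daftonline.co.uk</a>'
)

# Find anchors whose href points at record_click.asp and capture the title value.
# Assumption: href appears before title inside the tag, as in the sample.
WEBSITE_RE = re.compile(
    r'<a\s[^>]*href="/record_click\.asp\?[^"]*"[^>]*\stitle="([^"]+)"',
    re.IGNORECASE,
)

def extract_websites(page_html):
    """Return the title values of all record_click anchors in the page."""
    return WEBSITE_RE.findall(page_html)

print(extract_websites(html))  # ['http://www.daftonline.co.uk']
```

Keying on record_click.asp rather than on class="u" means ordinary links with the same class are skipped.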

Before and After would be very hard as there aren't many unique markers there, but it's possibly doable, like:

before_after=&ctype=profile" target="_blank" class="u" title="|">
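
To show what that mask should pull out, here is a minimal Python sketch of a before/after grab. It assumes the mask is simply "text before|text after" and that everything between the two markers is returned. That is my reading of the option, not ScrapeBox's actual code, and the before_after function below is hypothetical.

```python
def before_after(text, before, after):
    """Return every substring found between `before` and the next `after`."""
    results = []
    pos = 0
    while True:
        start = text.find(before, pos)
        if start == -1:
            break
        start += len(before)
        end = text.find(after, start)
        if end == -1:
            break
        results.append(text[start:end])
        pos = end + len(after)
    return results

# Sample anchor copied from the profile page source quoted above.
html = (
    '<a href="/record_click.asp?id=120635&ctype=profile" target="_blank" '
    'class="u" title="http://www.daftonline.co.uk">www.daftonline.co.uk</a>'
)

# The mask from the post, split on the pipe into its two markers.
mask = '&ctype=profile" target="_blank" class="u" title="|">'
before, after = mask.split("|", 1)

print(before_after(html, before, after))  # ['http://www.daftonline.co.uk']
```

The before marker has to match the raw page source character for character, including spacing and attribute order, so if the site changes its markup the mask will stop matching.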


RE: Scraping FreeIndex.co.uk - tonymgarner - 09-16-2017

Did it more by accident than design. Used before_after in the CDG to get a wider pool of text, then whittled it down and got the URL.

Would be great if the CDG recorded the mask name / field name with the data grabbed. It would help to stitch it all back together in Excel.

Happy Scraping
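
Until something like that exists, one way to stitch the data back together is to key each grabbed value by its source URL and a field name, then write one row per URL to a CSV that Excel can open. This is only a sketch: the "website" and "phone" field names and the rows_to_csv helper are hypothetical, and it assumes you already have the grabbed values grouped per URL.

```python
import csv
import io

def rows_to_csv(records, field_names):
    """Build CSV text with one row per URL and one column per field name."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["url"] + field_names, lineterminator="\n")
    writer.writeheader()
    for url, fields in records.items():
        row = {"url": url}
        # Fields that were not grabbed for this URL are left blank.
        row.update({name: fields.get(name, "") for name in field_names})
        writer.writerow(row)
    return out.getvalue()

# Grabbed values keyed by profile URL, then by (hypothetical) field name.
records = {
    "https://www.freeindex.co.uk/profile(!daft!-cover-band)_287978.htm": {
        "website": "http://www.daftonline.co.uk",
    },
}

print(rows_to_csv(records, ["website", "phone"]))
```

Saving the returned text to a .csv file gives a sheet with a url column plus one column per field, so a missing value shows up as an empty cell instead of shifting the rest of the row.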