ScrapeBox Forum
[Email Scraper Plugin - Custom Crawler] How to websites with no distinct markers - Printable Version




[Email Scraper Plugin - Custom Crawler] How to websites with no distinct markers - PixAtom - 07-10-2018

Hello,

I purchased the Email Scraper Plugin and I'm trying to scrape a directory. I want to scrape each company's name and email address.

However, the website I'm scraping doesn't use unique markers. For example, the company name is just a <span>, so ScrapeBox also captures every other <span> on the page.
Do you know any solution to this?
I've tried writing the marker with the code above the <span> to make it unique, but ScrapeBox doesn't allow line breaks in the marker.

Ex:
Code:
<h2>Company</h2>
 <span>ScrapeBox</span>
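To illustrate why a bare <span> marker is ambiguous, here is a minimal Python sketch of plain before/after-marker matching. It is not how ScrapeBox works internally, and the sample page and the `between_markers` helper are hypothetical.

```python
import re

def between_markers(html: str, before: str, after: str) -> list[str]:
    """Return every substring found between a literal `before` and `after` marker."""
    pattern = re.escape(before) + r"(.*?)" + re.escape(after)
    return re.findall(pattern, html, flags=re.DOTALL)

# Hypothetical page: one company name plus unrelated <span> elements.
page = (
    "<span>Menu</span>\n"
    "<h2>Company</h2>\n"
    " <span>ScrapeBox</span>\n"
    "<span>Footer</span>\n"
)

# A bare <span> marker matches every span on the page.
print(between_markers(page, "<span>", "</span>"))  # ['Menu', 'ScrapeBox', 'Footer']

# Including the heading and the exact whitespace before the span makes the marker unique.
print(between_markers(page, "<h2>Company</h2>\n <span>", "</span>"))  # ['ScrapeBox']
```

The longer marker only works because it reproduces the newline and the space before <span> exactly.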


Also, some companies have no email, so the marker that gets detected is the one for the next company's email in the directory, and all the names and emails end up mixed up.
I'm trying to figure out how to solve this and would love some help...
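A minimal Python sketch of the mix-up described above, assuming the names and emails are each pulled with their own marker pair and then paired up by position. The HTML and the class names are hypothetical.

```python
import re

# Hypothetical directory listing: the second company has no email.
page = """
<div class="entry"><span>Alpha</span><a class="mail">a@alpha.test</a></div>
<div class="entry"><span>Beta</span></div>
<div class="entry"><span>Gamma</span><a class="mail">g@gamma.test</a></div>
"""

# Each field is extracted independently across the whole page.
names = re.findall(r"<span>(.*?)</span>", page)
emails = re.findall(r'<a class="mail">(.*?)</a>', page)

# Pairing the two lists by position shifts every email after the gap,
# and zip() silently drops the last name.
for name, email in zip(names, emails):
    print(name, email)
# Alpha a@alpha.test
# Beta g@gamma.test
```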


RE: [Email Scraper Plugin - Custom Crawler] How to websites with no distinct markers - loopline - 07-11-2018

ScrapeBox can work with line breaks: you can use #13#10 (carriage return followed by line feed), but I don't think it needs it; it should search without it.
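The codes 13 and 10 are the ASCII values of carriage return and line feed. As a rough illustration of what such a marker is meant to match, here is a hypothetical Python decoder. It is not ScrapeBox's own parser, and it deliberately handles only the two codes mentioned here.

```python
import re

# Assumption: only #13 (carriage return) and #10 (line feed) are translated;
# any other text, including other #-numbers, stays literal.
_CODES = {"13": "\r", "10": "\n"}

def decode_marker(marker: str) -> str:
    """Turn a marker such as '<h2>#13#10<span>' into the literal text it stands for."""
    # (?!\d) stops '#130' from being read as '#13' followed by '0'.
    return re.sub(r"#(13|10)(?!\d)", lambda m: _CODES[m.group(1)], marker)

print(repr(decode_marker("<h2>#13#10<span>")))  # '<h2>\r\n<span>'
```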

I wouldn't know unless I saw the actual html though.

As for the missing emails, I'm not sure about that.


RE: [Email Scraper Plugin - Custom Crawler] How to websites with no distinct markers - loopline - 07-11-2018

I'll find out about the skipped emails.


RE: [Email Scraper Plugin - Custom Crawler] How to websites with no distinct markers - PixAtom - 07-11-2018

(07-11-2018, 04:31 AM)loopline Wrote: Scrapebox can work with line breaks, you can use #13#10  but I don't think it needs it, it should search without it.  

I wouldn't know unless I saw the actual html though.  

Thanks for your help.

Here is the source code around the COMPANY field:
Code:
           <div class="lbb-result__header">

 <h2>
   <span>COMPANY</span>
   - <small>CITY</small>
   
 </h2>

I tried using 
Code:
<div class="lbb-result__header"> <h2> <span>
or 
Code:
<div class="lbb-result__header"><h2><span>
or 
Code:
<div class="lbb-result__header">#13#10<h2>#13#10<span>
as the "before" marker (with a closing </span> as the "after" marker), without any success.
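A literal marker only works if it appears in the page source character for character. This Python sketch checks the three attempts above against the snippet, assuming LF line endings and the indentation roughly as pasted (the real whitespace may differ, since forums often alter it).

```python
# Assumed raw HTML with LF line endings; the real page's whitespace may differ.
html = (
    '           <div class="lbb-result__header">\n'
    "\n"
    " <h2>\n"
    "   <span>COMPANY</span>\n"
    "   - <small>CITY</small>\n"
    "   \n"
    " </h2>\n"
)

candidates = [
    '<div class="lbb-result__header"> <h2> <span>',
    '<div class="lbb-result__header"><h2><span>',
    '<div class="lbb-result__header">\r\n<h2>\r\n<span>',  # the #13#10 attempt, decoded
]

for marker in candidates:
    # Substring test: True only on an exact character-for-character match.
    print(marker in html)  # prints False for each candidate
```

None of them match because the real text has two line breaks after the div and extra spaces before <h2> and <span>.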

   


RE: [Email Scraper Plugin - Custom Crawler] How to websites with no distinct markers - loopline - 07-12-2018

You will likely need to account for every space, tab, line feed, and carriage return.
So

Code:
<div class="lbb-result__header">#13#10#13#10 <h2>#13#10   <span>
should work.
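Instead of counting whitespace by hand, one option is to copy the exact text from the raw page source and convert only the line-break characters into #13/#10 codes. The thread doesn't say whether the site sends Windows (CRLF) or Unix (LF-only) line endings; if it is LF-only, the correct marker contains #10 without #13, which may be one reason a #13#10 marker fails. A hypothetical Python helper:

```python
def encode_marker(raw: str) -> str:
    """Convert CR and LF characters into #13/#10 codes.

    Hypothetical helper: spaces and tabs are left as literal characters.
    """
    return raw.replace("\r", "#13").replace("\n", "#10")

# Raw snippet assuming Unix (LF-only) line endings.
raw = '<div class="lbb-result__header">\n\n <h2>\n   <span>'
print(encode_marker(raw))
# <div class="lbb-result__header">#10#10 <h2>#10   <span>
```

With CRLF line endings the same snippet would produce #13#10 pairs instead.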


RE: [Email Scraper Plugin - Custom Crawler] How to websites with no distinct markers - PixAtom - 07-12-2018

(07-12-2018, 05:52 AM)loopline Wrote: You will likely need to take into account every space, tab, line feed and carriage return.  
So

Code:
<div class="lbb-result__header">#13#10#13#10 <h2>#13#10   <span>
Should work

I tried it, but it's not working either.


RE: [Email Scraper Plugin - Custom Crawler] How to websites with no distinct markers - loopline - 07-12-2018

Not sure. There is probably a space I missed or something, but you get the idea. Go back to the HTML, compare it with what I wrote, and build the marker the same way, but make it match the exact HTML.
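If matching the exact whitespace keeps failing, a script outside ScrapeBox can treat any run of whitespace between tags as equivalent. This is a Python regex sketch, not a plugin feature; `tolerant_marker` is a hypothetical helper and the sample HTML assumes CRLF line endings.

```python
import re

def tolerant_marker(*tags: str) -> str:
    """Join literal tags so that any whitespace (spaces, tabs, CR, LF) may sit between them."""
    return r"\s*".join(re.escape(tag) for tag in tags)

# Assumed raw HTML with CRLF line endings.
html = (
    '           <div class="lbb-result__header">\r\n'
    "\r\n"
    " <h2>\r\n"
    "   <span>COMPANY</span>\r\n"
    "   - <small>CITY</small>\r\n"
    " </h2>\r\n"
)

before = tolerant_marker('<div class="lbb-result__header">', "<h2>", "<span>")
names = re.findall(before + r"(.*?)</span>", html, flags=re.DOTALL)
print(names)  # ['COMPANY']
```

The same pattern also matches if the page uses LF-only line endings or different indentation.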

As for the skipped data, the plugin only looks for markers.

So if the email isn't there, it proceeds to the next matching marker; there is no other way to do it. You would need a custom scraper coded by a developer that knows to move on to the next entry when the email isn't present and keeps everything organized the way you want.
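The idea behind such a custom scraper can be sketched in Python: split the page into one chunk per entry first, then look for the name and email only inside that chunk, leaving the email blank when it is missing. The markup, class names, and `scrape_entries` function are hypothetical.

```python
import re

# Hypothetical listing: each company sits in its own <div class="entry"> block,
# and the second one has no email.
page = """
<div class="entry"><span>Alpha</span><a class="mail">a@alpha.test</a></div>
<div class="entry"><span>Beta</span></div>
<div class="entry"><span>Gamma</span><a class="mail">g@gamma.test</a></div>
"""

def scrape_entries(html: str) -> list[tuple[str, str]]:
    """Extract (name, email) per entry, leaving email empty when it is missing."""
    rows = []
    # One chunk per entry, so a search can never run into the next company.
    for block in re.findall(r'<div class="entry">(.*?)</div>', html, flags=re.DOTALL):
        name = re.search(r"<span>(.*?)</span>", block)
        email = re.search(r'<a class="mail">(.*?)</a>', block)
        rows.append((name.group(1) if name else "", email.group(1) if email else ""))
    return rows

for name, email in scrape_entries(page):
    print(f"{name}\t{email}")
```

The non-greedy `</div>` match assumes entries contain no nested <div> elements; a real page with nesting would need an HTML parser such as Python's `html.parser`.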


RE: [Email Scraper Plugin - Custom Crawler] How to websites with no distinct markers - PixAtom - 07-13-2018

OK, thanks for your help!


RE: [Email Scraper Plugin - Custom Crawler] How to websites with no distinct markers - loopline - 07-17-2018

You're welcome, cheers!