[Email Scraper Plugin - Custom Crawler] How to websites with no distinct markers - ScrapeBox Forum (https://www.scrapeboxforum.com), General ScrapeBox Talk
[Email Scraper Plugin - Custom Crawler] How to websites with no distinct markers - PixAtom - 07-10-2018

Hello,

I purchased the Email Scraper Plugin and I'm trying to scrape a directory. I want to scrape each company's name and email. However, the website I'm scraping doesn't use unique markers. For example, the company name is just a <span>, so ScrapeBox catches all the other <span> elements on the page. Do you know any solution to this?

I've tried writing the marker with the code above the <span> to make it unique, but ScrapeBox doesn't allow line breaks. Ex:

Code:
<h2>Company</h2>

Also, some companies have no email, so the marker detected is the one for the next company's email in the directory, and all the names and emails get mixed up. I'm trying to figure out how to solve this and would love some help...

RE: [Email Scraper Plugin - Custom Crawler] How to websites with no distinct markers - loopline - 07-11-2018

ScrapeBox can work with line breaks; you can use #13#10 (carriage return + line feed), but I don't think it needs it, it should search without it. I wouldn't know unless I saw the actual HTML, though.

As for the email not being there, I'm not sure about that.

RE: [Email Scraper Plugin - Custom Crawler] How to websites with no distinct markers - loopline - 07-11-2018

I'll find out about the email skipping.

RE: [Email Scraper Plugin - Custom Crawler] How to websites with no distinct markers - PixAtom - 07-11-2018

(07-11-2018, 04:31 AM) loopline Wrote: Scrapebox can work with line breaks, you can use #13#10 but I don't think it needs it, it should search without it.

Thanks for your help.
This is the source code with the COMPANY field:

Code:
<div class="lbb-result__header">

I tried using:

Code:
<div class="lbb-result__header"> <h2> <span>

Code:
<div class="lbb-result__header"><h2><span>

Code:
<div class="lbb-result__header">#13#10<h2>#13#10<span>

RE: [Email Scraper Plugin - Custom Crawler] How to websites with no distinct markers - loopline - 07-12-2018

You will likely need to account for every space, tab, line feed and carriage return. So:

Code:
<div class="lbb-result__header">#13#10#13#10 <h2>#13#10 <span>

RE: [Email Scraper Plugin - Custom Crawler] How to websites with no distinct markers - PixAtom - 07-12-2018

(07-12-2018, 05:52 AM) loopline Wrote: You will likely need to take into account every space, tab, line feed and carriage return.

I tried, but it's not working either...

RE: [Email Scraper Plugin - Custom Crawler] How to websites with no distinct markers - loopline - 07-12-2018

Not sure. There is probably a space I missed or something, but you get the idea. You can revisit the HTML, compare it with what I have and build it the same way, but make it match the exact HTML.

As for the actual skipping of data, it's only looking for markers. So if the email isn't there, it's going to proceed to the next marker; there is no other way to do it. You would need a custom scraper coded by a developer that knows to go to the next entry if the email isn't present and keeps everything together the way you want.

RE: [Email Scraper Plugin - Custom Crawler] How to websites with no distinct markers - PixAtom - 07-13-2018

OK, thanks for your help!

RE: [Email Scraper Plugin - Custom Crawler] How to websites with no distinct markers - loopline - 07-17-2018

You're welcome, cheers!
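For anyone landing on this thread, here is a minimal Python sketch of the kind of custom scraper loopline mentions. It is not a ScrapeBox feature and not how the plugin works internally. Assumptions: every listing starts with the `<div class="lbb-result__header">` block quoted above, emails are published as `mailto:` links inside each listing (hypothetical markup), and the sample page and company names are made up. `naive_pairs` models pure marker matching as loopline describes it, `per_listing` searches inside one listing at a time, and `expand_char_codes` is a hypothetical helper showing that `#13#10` stands for a literal CR LF pair, which will not match a page that uses bare line feeds or indentation. The `\s*` in the name pattern absorbs any mix of spaces, tabs, CR and LF between the tags.

```python
import re

# Hypothetical page: the header markup is the one quoted in the thread; the
# mailto links and the company names are made up for illustration.
SAMPLE_HTML = """
<div class="lbb-result__header">
    <h2>
        <span>Acme Plumbing</span>
    </h2>
</div>
<a href="mailto:contact@acme.example">contact@acme.example</a>

<div class="lbb-result__header">
  <h2>
    <span>Beta Bakery</span>
  </h2>
</div>
<p>No email listed</p>

<div class="lbb-result__header">
<h2><span>Gamma Garage</span></h2>
</div>
<a href="mailto:info@gamma.example">info@gamma.example</a>
"""

# Assumption: every listing begins with this header div.
HEADER = '<div class="lbb-result__header">'

# \s* accepts any run of spaces, tabs, CR and LF between the tags,
# so the exact whitespace on the page no longer has to be reproduced.
NAME_RE = re.compile(
    r'<div class="lbb-result__header">\s*<h2>\s*<span>(.*?)</span>', re.S
)
# Assumption: emails are published as mailto: links.
EMAIL_RE = re.compile(r'href="mailto:([^"]+)"')


def expand_char_codes(marker):
    """Replace #NN codes with the character they stand for (#13 = CR, #10 = LF)."""
    return re.sub(r"#(\d+)", lambda m: chr(int(m.group(1))), marker)


def naive_pairs(html):
    """Pair the i-th name with the i-th email found anywhere on the page.

    This models pure marker matching: one missing email shifts every later pair.
    """
    names = [name.strip() for name in NAME_RE.findall(html)]
    emails = EMAIL_RE.findall(html)
    return list(zip(names, emails))


def per_listing(html):
    """Search for name and email inside each listing separately.

    A listing without an email yields None instead of borrowing the next one.
    """
    results = []
    # Everything before the first header is page preamble, so drop chunk 0.
    for chunk in html.split(HEADER)[1:]:
        block = HEADER + chunk
        name_match = NAME_RE.search(block)
        email_match = EMAIL_RE.search(block)
        name = name_match.group(1).strip() if name_match else None
        email = email_match.group(1) if email_match else None
        results.append((name, email))
    return results


if __name__ == "__main__":
    print(naive_pairs(SAMPLE_HTML))
    print(per_listing(SAMPLE_HTML))
```

On the sample page, `naive_pairs` gives Beta Bakery the Gamma Garage address and drops Gamma Garage entirely, which is the mix-up described in the first post, while `per_listing` returns `("Beta Bakery", None)` and keeps the other two pairs correct. Splitting per listing first is what lets the scraper "know to go to the next entry"; it only works if the listings really share a common start marker.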