Scraping for emails - Printable Version

Scraping for emails - Mikkyo - 06-19-2015

Hi everyone,

I'm new to using ScrapeBox. I'm looking to scrape email addresses for particular niches, and I'm hoping to scrape Checkatrade.com for email addresses.

Can this be done? Has anyone done it before?

thanks,

Mikkyo

---EDIT---

I've been testing on about 30 page URLs that have email addresses on them. Whenever I run the email grabber I get an 'email grabber complete, but couldn't find valid emails' notification.

Is this because they aren't, say, Googlemail accounts but rather domain emails, like [email protected]?

RE: Scraping for emails - Mikkyo - 06-19-2015

I've got a list of 738 webpages, and I know at least half of them, if not more, have email addresses on them.

Yet whenever I run the grabber I get that 'no email addresses found' message...

RE: Scraping for emails - loopline - 06-19-2015

Can you post a few example urls? I can give you more specific info if I can see the code on the end website.

RE: Scraping for emails - Mikkyo - 06-22-2015

(06-19-2015, 08:53 PM)loopline Wrote: Can you post a few example urls? I can give you more specific info if I can see the code on the end website.

Example pages: http://www.checkatrade.com/AJones/
http://www.checkatrade.com/SouthCoastBoilers/
---
Also now trying to extract external links from pages like this: http://www.yell.com/ucs/UcsSearchAction.do?keywords=plumber&location=Bristol&scrambleSeed=1960283414&pageNum=2 and getting error redirect messages.

RE: Scraping for emails - loopline - 06-22-2015

The issue is that the email is not in the page source as plain text. ScrapeBox doesn't execute JavaScript: it works over raw sockets so it can be multi-threaded and fast, and it only sees the HTML the server sends. The email address looks like this:
ajones247@hotmail.co.uk

and then JavaScript decodes it to be the email address.

So this will not work with ScrapeBox.
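
To illustrate the point above, here is a minimal Python sketch of what a source-only email grabber roughly does. This is not ScrapeBox's actual code: the regex, the function name grab_emails, and both sample pages are assumptions made up for the example, and the obfuscation shown is just one hypothetical scheme, not necessarily the one Checkatrade uses.

```python
import re

# A simple email pattern (an assumption for this sketch, not ScrapeBox's own).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def grab_emails(page_source: str) -> list[str]:
    """Return unique email-like strings found in the raw page source, in order."""
    found = []
    for match in EMAIL_RE.findall(page_source):
        if match not in found:
            found.append(match)
    return found


# Hypothetical page with the address written directly in the HTML.
plain_page = '<p>Contact: <a href="mailto:info@example.com">info@example.com</a></p>'

# Hypothetical page where a script builds the address in the browser.
# The raw source never contains a literal "@", so a regex scan finds nothing.
obfuscated_page = """
<script>
  var user = "info", host = "example" + "." + "com";
  document.write(user + String.fromCharCode(64) + host);
</script>
"""

print(grab_emails(plain_page))       # ['info@example.com']
print(grab_emails(obfuscated_page))  # []
```

The second page only shows the address after a browser runs the script, which is why a tool that reads the raw source comes back empty.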

The link extractor works fine for me on that page. Are you using V2? Does it do this both with and without proxies? It's possible your IP(s) are just blocked by Yell for too many requests.

RE: Scraping for emails - Mikkyo - 06-23-2015

I haven't bought any proxies. I did start harvesting some yesterday, though, and will try using them today once they've been tested.

I did try it with a few proxies and it then worked, which is good. So I need to use the link extractor with proxies to get it to work.

RE: Scraping for emails - loopline - 06-26-2015

Well, the link extractor doesn't inherently need proxies. But if the end site has blocked your IP, perhaps because of too many requests, then yes, you need to use proxies and make sure you don't go too fast. If you're hammering away on 1 domain with 100 connections, that's sucking up a lot of their server resources and may make them ban your proxies too, so just be mindful of the number of connections you're hitting 1 domain with concurrently.

1 connection per IP is probably a good ratio.
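
For anyone scripting this outside ScrapeBox, the idea of capping concurrent connections per domain can be sketched in Python roughly like this. It is only an illustration under assumptions: PerDomainLimiter, crawl, and the fetch callback are hypothetical names, and the sketch assumes a single IP, so a cap of 1 per domain matches the 1 connection per IP ratio.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse


class PerDomainLimiter:
    """Caps the number of simultaneous requests to any single domain."""

    def __init__(self, max_per_domain: int = 1):
        self._max = max_per_domain
        self._lock = threading.Lock()
        self._semaphores = {}  # domain -> threading.Semaphore

    def _semaphore_for(self, url: str) -> threading.Semaphore:
        domain = urlparse(url).netloc.lower()
        with self._lock:
            if domain not in self._semaphores:
                self._semaphores[domain] = threading.Semaphore(self._max)
            return self._semaphores[domain]

    def run(self, fetch, url: str):
        # Wait here until a slot for this URL's domain is free.
        with self._semaphore_for(url):
            return fetch(url)


def crawl(urls, fetch, max_per_domain: int = 1, workers: int = 8):
    """Call fetch(url) for every URL, never exceeding the per-domain cap.

    fetch is a caller-supplied function (hypothetical here) that does the
    actual request. Results come back in the same order as urls.
    """
    limiter = PerDomainLimiter(max_per_domain)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda u: limiter.run(fetch, u), urls))
```

With workers=8 and max_per_domain=1, up to eight different domains can be fetched at once, but any one domain only ever sees a single open connection.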