Scraping Emails from behance.net
#1
Hi guys,

I think I need some help with this site. I have scraped 150K URLs from Behance.net, but none of the email scraper options will scrape emails from the pages. I have the Email Scraper Plugin and I have tried the grab/crawl option, but nothing works. All other sites are fine except Behance, which ironically is the most important site for me. An example list of URLs is below. If anyone can scrape emails from these pages, please let me know how.

https://www.behance.net/jessiewhitmill/followers
https://www.behance.net/michelapicchi
https://www.behance.net/anusard/resume
https://www.behance.net/Carelessconundrum/resume
https://www.behance.net/Maryfergin
https://www.behance.net/insborges
https://www.behance.net/mrkc/resume
https://www.behance.net/yousefah/followers
https://www.behance.net/saysomething
https://www.behance.net/wallisonmedeiros
https://www.behance.net/Arcy22/appreciated
https://www.behance.net/AlexiaLou/collections_following
https://www.behance.net/AJVillacentino
https://www.behance.net/frederiquegravier/resume
https://www.behance.net/RickyDP
https://www.behance.net/zoshuacolah/followers
#2
I'm checking with support. Behance uses some sort of JavaScript, and when you run the email scraper it returns this generic message instead of the profile page:

<h1 id="we-noticed">We notice you are using an outdated version of Internet Explorer.</h1>
<h2 id="browser-not-supported">This version is not supported by Behance.</h2>

I tried Apple iPhone user agents and the latest Chrome user agent, and it always returns the message above. So I suspect there is a JavaScript check that will stop it from working with ScrapeBox. This is basically scraping protection on the site, so it probably won't work.

However, I am not 100% certain yet.
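
For anyone checking their own results, here is a minimal Python sketch of how the placeholder page could be told apart from a real profile page before looking for emails. The two element ids come from the markup quoted above; the function names and the email pattern are my own assumptions for illustration, not anything ScrapeBox itself uses.

```python
import re

# Element ids taken from the placeholder markup quoted in this post.
PLACEHOLDER_IDS = ("we-noticed", "browser-not-supported")

# Deliberately simple pattern (an assumption for this sketch); it will
# miss obfuscated addresses such as "name [at] example [dot] com".
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def is_placeholder_page(html):
    """Return True if the HTML contains either placeholder element id."""
    return any('id="%s"' % marker in html for marker in PLACEHOLDER_IDS)


def extract_emails(html):
    """Return sorted unique emails, or an empty list for the placeholder page."""
    if is_placeholder_page(html):
        return []
    return sorted(set(EMAIL_RE.findall(html)))
```

If `is_placeholder_page` returns True for every URL in the list, the scraper never received the real profile content, so no email pattern could match.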
#3
Great, thanks for the reply.
#4
So, just to confirm: if you send the exact same headers as a regular browser sends, the same thing happens. So this is their generic response to JavaScript being turned off. This site will not work, because ScrapeBox uses raw sockets and threads, and these do not support JavaScript.
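
A rough Python sketch of that test, for anyone who wants to repeat it outside ScrapeBox. The header values are illustrative assumptions, not a capture from a real browser session, and the helper names are hypothetical. The point is only that a plain HTTP client sends headers and reads the response body; it never runs the page's scripts.

```python
import urllib.request

# Illustrative browser-like headers (assumed values, not a real capture).
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/77.0.3865.120 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url):
    """Build a GET request that carries the browser-like headers."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


def fetch_html(url, timeout=10.0):
    """Fetch the raw HTML. No JavaScript is executed at any point."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def looks_blocked(html):
    """True if the body contains the 'browser not supported' element id."""
    return 'id="browser-not-supported"' in html
```

Calling `looks_blocked(fetch_html(url))` on one of the URLs from the first post would show whether the placeholder still comes back with these headers. I have not included an expected result, since the site's behaviour may change.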
#5
(10-22-2019, 10:16 PM)loopline Wrote: So, just to confirm: if you send the exact same headers as a regular browser sends, the same thing happens. So this is their generic response to JavaScript being turned off. This site will not work, because ScrapeBox uses raw sockets and threads, and these do not support JavaScript.

I appreciate the info. Thanks anyway.
#6
You're welcome, cheers!