
jamesmel · 05-28-2016, 05:30 PM

I've got a profile site I'd like to scrape.

The format of the start URL I'm interested in is: http://example.com/artsits/new-york. On that page is a list of artists in New York, 10 per page, with pagination (next / prev buttons) between the pages. There are hundreds of pages.

- I'm trying to scrape a list of all the profile page URLs from the start URL.
- Once I have that list of URLs, I want to go to each profile page and extract selected data (e.g. name, email, URL), using either the HTML path or XPath (or something similar).

Previously I was using Kimonify for this, which worked OK, but it doesn't support proxies or a crawl-rate limit, so it often gets itself banned because it powers through the crawl too quickly.

Can this be done with ScrapeBox?

**loopline** · 05-28-2016, 11:28 PM

In theory, yes. I'd say it's done in 2 segments. So if page 2, page 3, page 4, etc. all have a structured URL like

http://example.com/artsits/new-york?page2
http://example.com/artsits/new-york?page3
http://example.com/artsits/new-york?page4

then you could quickly fabricate the URLs for all of the listing pages. You can use the merge feature to do this: just generate a list of numbers like 1 through 5000, or however many pages there are, and then merge a URL with them

http://example.com/artsits/new-york?page%kw%

Info here:
http://scrapeboxfaq.com/how-do-i-use-tok...rge-option
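
For anyone who wants to see what the merge step amounts to outside ScrapeBox, here is a minimal Python sketch. It assumes the listing pages really do follow the `?page2`, `?page3` pattern shown above, which should be checked against the live site. `build_page_urls` is a hypothetical helper written for this example, not a ScrapeBox function.

```python
def build_page_urls(template, first_page, last_page, token="%kw%"):
    """Replace the token in the template with each page number in turn."""
    return [
        template.replace(token, str(number))
        for number in range(first_page, last_page + 1)
    ]


# Assumed URL pattern, copied from the post above.
urls = build_page_urls("http://example.com/artsits/new-york?page%kw%", 2, 4)

for url in urls:
    print(url)
```

Raising `last_page` to the real page count gives the full list without querying the site at all.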

Then use the Link Extractor addon to extract internal links from each one of the generated pages, which will effectively give you the URLs of the profile pages.
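
As a rough illustration of what "extract internal links" means, the sketch below keeps only links that point at the same host as the page they were found on. It uses only the Python standard library and is not how the addon itself is implemented. The sample HTML and the `/profile/...` path are made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(html, page_url):
    """Return absolute, de-duplicated links on the same host as page_url."""
    parser = LinkCollector()
    parser.feed(html)
    host = urlparse(page_url).netloc
    found = []
    for href in parser.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        if urlparse(absolute).netloc == host and absolute not in found:
            found.append(absolute)
    return found


# Hypothetical listing-page markup, for illustration only.
sample_html = (
    '<a href="/profile/jane-doe">Jane</a> '
    '<a href="http://other.example.org/x">External</a> '
    '<a href="/profile/jane-doe">Jane again</a>'
)
links = internal_links(sample_html, "http://example.com/artsits/new-york?page2")
print(links)
```

On a real listing page the result will also include navigation and pagination links, so you would likely still filter the list down to the profile URL pattern afterwards.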

Then you run that list through the custom data scraper to get your info like name, email, URL, etc.

So I guess that's 3 steps, but one of them is quick and dirty in ScrapeBox without having to query the site a bunch.
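
To give a feel for the extraction step, here is a small Python sketch that pulls name, email and URL out of a profile page with regular expressions. The markup and the patterns are assumptions invented for the example; the real patterns depend entirely on how the target site's profile pages are built, and this is not meant to mirror the custom data scraper's own rule format.

```python
import re

# Hypothetical patterns; adjust them to the real profile page markup.
FIELD_PATTERNS = {
    "name": re.compile(r"<h1[^>]*>(.*?)</h1>", re.S),
    "email": re.compile(r"mailto:([^\"'?>\s]+)"),
    "url": re.compile(r'<a[^>]*class="website"[^>]*href="([^"]+)"'),
}


def extract_profile(html):
    """Return one record per page; a field is None when its pattern misses."""
    record = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(html)
        record[field] = match.group(1).strip() if match else None
    return record


# Made-up profile page fragment, for illustration only.
sample = (
    '<h1 class="artist"> Jane Doe </h1>'
    '<a href="mailto:jane@example.com">Email</a>'
    '<a class="website" href="http://janedoe.example.com">Site</a>'
)
profile = extract_profile(sample)
print(profile)
```

Returning `None` for a missing field keeps one row per profile, which makes it easier to spot pages where a pattern did not match.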

jamesmel · 05-30-2016, 01:26 PM

Thanks Loopline. When you say custom data scraper, is that an addon for ScrapeBox?


**loopline** · 05-31-2016, 01:01 AM

It's just part of the main ScrapeBox. Here is a video:
https://www.youtube.com/watch?v=X3Ep-NXg4kY

You're welcome.
