Looplines Scrapebox List

Consolidating Custom Data
#1
OK, so I'm using the custom data grabber (CDG) to get different sets of data from
a set of profiles, but I also want to grab the emails. As far as I understand it,
I cannot use the CDG to grab emails. What would be the best way or process
to get both data sets?

Ideally I want to create profiles of people with name, follower count and social media data, plus their
email contact data scraped from one or multiple URLs.
 
My approach would be the following:
1. Use an email crawler that adds URL information.
2. Take the custom data grabber and run those URLs.
3. Somehow consolidate the information using Excel or the help of a coder.


Ideally, though, I'd like to take custom data grabs from different sites and combine them in one database.
Because emails are in most cases a scarce resource on any one platform, I'd like to search for them on multiple platforms.

My approach for this would be:

1. Grab links by crawling a domain/site that has a relatively large number of the contacts I'm looking for.
2. Use the custom data grabber to find the name and data like follower count for relevance segmentation.
3. Use an email crawler on those URLs.
4. Use the name as the search term in a search for "site:otherdomain.com" to get more information/emails from other domains.
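
Step 4 can be prepared outside Scrapebox with a few lines of script. This is only a sketch: the function name build_queries, the quoting of the name and the sample values are assumptions for illustration. The resulting lines could then be pasted in as keywords.

```python
def build_queries(names, domains):
    """Return one 'site:' search query per (name, domain) pair."""
    queries = []
    for name in names:
        for domain in domains:
            # Quote the name so it is searched as an exact phrase.
            queries.append(f'site:{domain} "{name}"')
    return queries


# Hypothetical sample data: names from the CDG run, domains to search next.
names = ["Jane Doe", "John Smith"]
domains = ["otherdomain.com"]

for query in build_queries(names, domains):
    print(query)
```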

Please advise if there is an easier process for this or if I'm overthinking stuff again.


Also, when I'm exporting, why can't there be a way to just export ONE URL with all the custom fields, instead of having the custom fields repeat all the time?
Reply
#2
The custom fields are 1 per line, and each line is the output of 1 mask worth of data in the custom data grabber, so there is no way to combine that data inside of Scrapebox.

You could grab emails with Scrapebox; just write a regex to do that in the custom data grabber.

I would have a coder just whip up something that looks at the URL that is tacked on (you can save the source URL with all custom data) and then combines everything that comes from the same URL.
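
As a rough idea of what such a script could look like, here is a minimal sketch. It assumes the export has already been read into (url, field, value) rows; that layout, the field names and the function names are assumptions, since the actual export format depends on how the masks are set up.

```python
import csv
from collections import defaultdict


def merge_by_url(rows):
    """Group (url, field, value) rows into one record per URL."""
    merged = defaultdict(dict)
    for url, field, value in rows:
        if field in merged[url]:
            # Same field seen again for this URL: keep both values.
            merged[url][field] += "; " + value
        else:
            merged[url][field] = value
    return dict(merged)


def write_csv(merged, path):
    """Write one line per URL with one column per custom field."""
    fields = sorted({f for record in merged.values() for f in record})
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["url"] + fields)
        writer.writeheader()
        for url, record in merged.items():
            # Fields missing for a URL are left as empty cells.
            writer.writerow({"url": url, **record})


# Hypothetical sample rows, one per mask result.
rows = [
    ("http://example.com/a", "name", "Alice"),
    ("http://example.com/a", "followers", "1200"),
    ("http://example.com/b", "name", "Bob"),
    ("http://example.com/a", "email", "alice@example.com"),
]
merged = merge_by_url(rows)
```

Calling write_csv(merged, "profiles.csv") would then give one line per URL, which can be opened in Excel.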
Reply
#3
(06-01-2017, 06:32 PM)loopline Wrote: The custom fields are 1 per line, and each line is the output of 1 mask worth of data in the custom data grabber, so there is no way to combine that data inside of Scrapebox.

You could grab emails with Scrapebox; just write a regex to do that in the custom data grabber.

I would have a coder just whip up something that looks at the URL that is tacked on (you can save the source URL with all custom data) and then combines everything that comes from the same URL.

What should I tell the coder in the briefing? I'm not sure how best to explain what he needs to do. I tried explaining it to a few guys and they couldn't help.


Is there a way to put the data fields into separate files, so I get one file for each custom data grab entry? I'm currently having problems getting the data imported into
a database. It would have been much easier if it were in XML format.
Reply
#4
Scrapebox wanted to build a fully featured scraper, which would have solved all your problems. The current scraper was never intended to do anything besides basic data scraping. But the components needed to build a fully featured scraper do not exist for Delphi (which is what Scrapebox is written in). So until those components are created, the full scraper is on hold.

That aside, you can lock each URL to the data scraped, meaning the URL the data came from, but that's the only option.

If you wanted to get separate files, you would need to create separate modules and run each one. So if you had 5 sets of data, create each set as its own module and run each one once, thus running the total set of URLs 5 times. If that makes sense.
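
If XML is what the database import needs, it could be produced outside Scrapebox once the data is grouped per URL. The following is only a sketch: the element names (profiles, profile, field) and the dict layout are made up for illustration and are not something Scrapebox outputs.

```python
import xml.etree.ElementTree as ET


def records_to_xml(merged):
    """Turn {url: {field: value}} into an XML string, one <profile> per URL."""
    root = ET.Element("profiles")
    for url, record in merged.items():
        profile = ET.SubElement(root, "profile", url=url)
        for field, value in record.items():
            # One <field name="..."> element per custom data field.
            ET.SubElement(profile, "field", name=field).text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample: data already grouped by source URL.
merged = {
    "http://example.com/a": {"name": "Alice", "followers": "1200"},
    "http://example.com/b": {"name": "Bob"},
}
xml_text = records_to_xml(merged)
print(xml_text)
```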
Reply
#5
OK, np.

I'm having a few problems with scraping using regexes that seem to work on regex101.com but not in SB.

Is there something to take into account when working with regex in SB?

I have some versions that work while others don't.

Specifically, scraping FB pages seems to have issues.

I tried these:
>(.*@.*)<

[a-zA-Z0-9_.+-]+(@|&#064;)[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+


@ is replaced with &#064; in Facebook pages.


Social media regexes and the regular email regex seem to work.
Any ideas?
Reply
#6
(06-13-2017, 03:52 PM)scrapingby Wrote: OK, np.

I'm having a few problems with scraping using regexes that seem to work on regex101.com but not in SB.

Is there something to take into account when working with regex in SB?

I have some versions that work while others don't.
Reply
#7
I'm not a regex expert, to be honest; you can hit up any coding forum, though. There are a lot of "flavors" of regex, and Scrapebox is written in Delphi, so it's Delphi's version of regex.
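
One way to sidestep the encoded @ on Facebook pages is to decode HTML entities in the saved page source first and then run a plain email pattern. The sketch below uses Python's re module, which is a different flavor from the one in Scrapebox, so it only shows the idea and would need to run outside SB; the sample HTML and the function name are made up.

```python
import html
import re

# Plain email pattern; no special case for the encoded "@" is needed
# because entities are decoded before matching.
EMAIL_RE = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+")


def find_emails(page_source):
    """Return the unique email addresses found in a page's HTML source."""
    # html.unescape turns entities such as &#064; back into "@".
    text = html.unescape(page_source)
    return sorted(set(EMAIL_RE.findall(text)))


# Hypothetical sample with one encoded and one plain address.
sample = (
    '<a href="mailto:jane&#064;example.com">jane&#064;example.com</a> '
    "<span>bob@example.org</span>"
)
emails = find_emails(sample)
print(emails)
```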
Reply



